# Interactive Audio Robotic Exploration

This repository contains the implementation accompanying the ICRA 2026 paper "CAVER: Closed-Loop Autonomous Audio-Visual Exploration and Reconstruction".

**Paper:** arXiv
**Project Page:** https://robin-lab.cs.utexas.edu/CAVER/

## Overview

CAVER is a robotic system for autonomous exploration and acoustic data collection in unknown environments. It integrates vision-based object detection with Grounded Segment Anything (Grounded SAM), multiple exploration policies, and synchronized audio recording, allowing a robotic arm to systematically interact with and analyze objects in its workspace.

## Key Features

- **Autonomous Exploration:** Multiple exploration strategies, including random, curious, and object-cycling policies
- **Vision Integration:** Grounded SAM for robust object detection and segmentation
- **Audio Collection:** Synchronized audio recording during interactions
- **ROS Architecture:** Full ROS integration with MoveIt motion planning
- **Simulation Support:** Gazebo simulation environment for testing
- **Real-World Deployment:** Hardware interface for physical robot execution
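To illustrate how multiple exploration strategies can coexist behind one interface, here is a minimal sketch of a policy abstraction with random and object-cycling variants. The class and method names (`ExplorationPolicy`, `select`) are assumptions for illustration, not the repository's actual API:

```python
import random

class ExplorationPolicy:
    """Chooses which detected object the arm interacts with next.
    Hypothetical interface; the repository's policies/ module may differ."""

    def select(self, objects):
        raise NotImplementedError


class RandomPolicy(ExplorationPolicy):
    """Pick a detected object uniformly at random."""

    def select(self, objects):
        return random.choice(objects)


class ObjectCyclingPolicy(ExplorationPolicy):
    """Visit detected objects in a fixed round-robin order."""

    def __init__(self):
        self._next = 0

    def select(self, objects):
        obj = objects[self._next % len(objects)]
        self._next += 1
        return obj
```

A "curious" policy would follow the same interface but rank objects by an uncertainty score instead of choosing randomly or cyclically.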

## Installation

### Prerequisites

- Ubuntu 20.04+ / ROS Noetic
- Python 3.8+
- Conda/Miniconda
- CUDA-compatible GPU (recommended for Grounded SAM)

### Setup

1. Clone the repository:

   ```shell
   git clone https://github.com/yourusername/interactive_audio.git
   cd interactive_audio
   ```

2. Create the conda environment:

   ```shell
   conda env create -f environment_audio.yml
   conda activate interactive_audio
   ```

3. Install ROS dependencies:

   ```shell
   sudo apt-get update
   sudo apt-get install ros-noetic-moveit ros-noetic-octomap
   ```

## Usage

### Real Robot Deployment

For deployment on physical hardware:

```shell
roslaunch launch/real.launch
```

### Running Experiments

Run the main experiment script:

```shell
python scripts/GSAM_interactive_audio/experiment_executor.py
```

## Project Structure

```
├── launch/                               # ROS launch configurations
├── scripts/
│   └── GSAM_interactive_audio/
│       ├── experiment_executor.py        # Main experiment runner
│       └── GSAM_agent/                   # Core agent components
│           ├── grounded_sam_relay.py     # Grounded SAM interface
│           ├── motion_planner.py         # Motion planning
│           ├── policy_manager.py         # Policy management
│           ├── camera_manager.py         # Camera handling
│           ├── policies/                 # Exploration policies
│           └── knn/                      # Uncertainty estimation
└── utils/                                # Shared utilities
```
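The `knn/` directory provides nearest-neighbor uncertainty estimation. A common formulation of that idea scores a candidate interaction point by its mean distance to the k closest previously explored points, so less-explored regions score higher. The sketch below shows that generic technique, not the repository's exact estimator:

```python
import numpy as np

def knn_uncertainty(candidate, visited, k=3):
    """Mean Euclidean distance from `candidate` to its k nearest
    previously visited points; larger values mean less-explored regions.
    Generic k-NN novelty sketch, not the repository's implementation."""
    visited = np.asarray(visited, dtype=float)
    dists = np.linalg.norm(visited - np.asarray(candidate, dtype=float), axis=1)
    k = min(k, len(dists))
    return float(np.sort(dists)[:k].mean())
```

A curiosity-driven policy could then prefer the candidate object whose location maximizes this score.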

## Configuration

Key configuration files:

- `launch/sensor_setup.yaml`: Sensor parameters
- `utils/config.py`: System configuration
- `environment_audio.yml`: Python dependencies
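Sensor parameters in `sensor_setup.yaml` can be read with PyYAML. The keys shown below (`microphone`, `camera`, etc.) are illustrative guesses at what a sensor config might contain, not the file's actual schema:

```python
import io
import yaml  # PyYAML; assumed to be among the environment's dependencies

# Illustrative stand-in for sensor_setup.yaml; the real file's keys may differ.
example = """
microphone:
  sample_rate: 44100
  channels: 1
camera:
  frame_rate: 30
"""

def load_sensor_config(stream):
    """Parse a sensor_setup.yaml-style stream into a nested dict."""
    return yaml.safe_load(stream)

config = load_sensor_config(io.StringIO(example))
```

In practice you would pass an open file handle for the real YAML file instead of the inline example.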

## Citation

If you use this code in your research, please cite our ICRA 2026 paper:

```bibtex
@inproceedings{yourpaper2026,
  title={CAVER: Closed-Loop Autonomous Audio-Visual Exploration and Reconstruction},
  author={Your Name and Co-authors},
  booktitle={IEEE International Conference on Robotics and Automation (ICRA)},
  year={2026}
}
```

## License

This project is licensed under the MIT License - see the LICENSE file for details.
