This repository contains the implementation accompanying the ICRA 2026 paper "CAVER: Closed-Loop Autonomous Audio-Visual Exploration and Reconstruction".
Paper: arXiv
Project Page: https://robin-lab.cs.utexas.edu/CAVER/
CAVER is a robotic system for autonomous exploration and acoustic data collection in unknown environments. It integrates vision-based object detection with Grounded SAM (Grounded Segment Anything), multiple exploration policies, and synchronized audio recording, allowing a robotic arm to systematically interact with and analyze objects in its workspace.
- Autonomous Exploration: Multiple exploration strategies including random, curious, and object-cycling policies
- Vision Integration: Grounded SAM for robust object detection and segmentation
- Audio Collection: Synchronized audio recording during interactions
- ROS Architecture: Full ROS integration with MoveIt motion planning
- Simulation Support: Gazebo simulation environment for testing
- Real-world Deployment: Hardware interface for physical robot execution
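The exploration strategies listed above (random, curious, object-cycling) can be thought of as interchangeable policies behind a common interface. The sketch below is illustrative only; the class and method names are assumptions, not the repository's actual API in `GSAM_agent/policies/`:

```python
# Hypothetical policy interface; the real classes in
# scripts/GSAM_interactive_audio/GSAM_agent/policies/ may differ.
import random
from abc import ABC, abstractmethod

class ExplorationPolicy(ABC):
    """Chooses the next detected object to interact with."""

    @abstractmethod
    def select(self, objects):
        """Return one element of `objects`."""

class RandomPolicy(ExplorationPolicy):
    """Picks a detected object uniformly at random."""

    def select(self, objects):
        return random.choice(objects)

class ObjectCyclingPolicy(ExplorationPolicy):
    """Visits each detected object in turn, wrapping around."""

    def __init__(self):
        self._i = 0

    def select(self, objects):
        obj = objects[self._i % len(objects)]
        self._i += 1
        return obj
```

A policy manager could then swap strategies at runtime by holding any `ExplorationPolicy` instance and calling `select()` on each detection cycle.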
- Ubuntu 20.04+ / ROS Noetic
- Python 3.8+
- Conda/Miniconda
- CUDA-compatible GPU (recommended for Grounded SAM)
Clone the repository:
```bash
git clone https://github.com/yourusername/interactive_audio.git
cd interactive_audio
```
Create conda environment:
```bash
conda env create -f environment_audio.yml
conda activate interactive_audio
```
Install ROS dependencies:
```bash
sudo apt-get update
sudo apt-get install ros-noetic-moveit ros-noetic-octomap
```
For deployment on physical hardware:
```bash
roslaunch launch/real.launch
```
Execute the main experiment script:
```bash
python scripts/GSAM_interactive_audio/experiment_executor.py
```
```
├── launch/                            # ROS launch configurations
├── scripts/
│   └── GSAM_interactive_audio/
│       ├── experiment_executor.py     # Main experiment runner
│       └── GSAM_agent/                # Core agent components
│           ├── grounded_sam_relay.py  # Grounded SAM interface
│           ├── motion_planner.py      # Motion planning
│           ├── policy_manager.py      # Policy management
│           ├── camera_manager.py      # Camera handling
│           ├── policies/              # Exploration policies
│           └── knn/                   # Uncertainty estimation
└── utils/                             # Shared utilities
```
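The `knn/` directory above handles uncertainty estimation. One common k-nearest-neighbour formulation scores a new observation by its mean distance to the closest stored feature vectors, so far-from-memory regions look more uncertain and thus more worth exploring. The following is a minimal sketch of that general idea, not the repository's implementation; the function name and feature representation are assumptions:

```python
# Illustrative k-NN uncertainty estimate; NOT the code in knn/.
import math

def knn_uncertainty(query, memory, k=3):
    """Mean Euclidean distance from `query` to its k nearest
    neighbours in `memory`. Larger values suggest the feature
    lies in a less-explored region of the space."""
    dists = sorted(math.dist(query, m) for m in memory)
    return sum(dists[:k]) / min(k, len(dists))
```

A curiosity-driven policy could prefer the object whose features maximize this score.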
Key configuration files:
- `launch/sensor_setup.yaml`: Sensor parameters
- `utils/config.py`: System configuration
- `environment_audio.yml`: Python dependencies
If you use this code in your research, please cite our ICRA 2026 paper:
```bibtex
@inproceedings{yourpaper2026,
  title={CAVER: Closed-Loop Autonomous Audio-Visual Exploration and Reconstruction},
  author={Your Name and Co-authors},
  booktitle={IEEE International Conference on Robotics and Automation (ICRA)},
  year={2026}
}
```
This project is licensed under the MIT License - see the LICENSE file for details.