This repository contains the implementation accompanying the ICRA 2026 paper "CAVER: Closed-Loop Autonomous Audio-Visual Exploration and Reconstruction".
Paper: arXiv
Project Page: https://robin-lab.cs.utexas.edu/CAVER/
CAVER is a robotic system for autonomous exploration and acoustic data collection in unknown environments. It integrates vision-based object detection with Grounded SAM (Grounded Segment Anything), multiple exploration policies, and synchronized audio recording, allowing a robotic arm to systematically interact with and analyze objects in its workspace.
- Autonomous Exploration: Multiple exploration strategies including random, curious, and object-cycling policies
- Vision Integration: Grounded SAM for robust object detection and segmentation
- Audio Collection: Synchronized audio recording during interactions
- ROS Architecture: Full ROS integration with MoveIt motion planning
- Simulation Support: Gazebo simulation environment for testing
- Real-world Deployment: Hardware interface for physical robot execution
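The exploration strategies listed above (random, curious, object-cycling) can be thought of as interchangeable policies behind a common interface. The sketch below is illustrative only; the class and method names are assumptions, not the repository's actual API in `GSAM_agent/policies/`:

```python
# Hypothetical policy interface; the real classes in
# scripts/GSAM_interactive_audio/GSAM_agent/policies/ may differ.
import random
from abc import ABC, abstractmethod

class ExplorationPolicy(ABC):
    """Chooses the next detected object to interact with."""

    @abstractmethod
    def select(self, objects):
        """Return one element of `objects`."""

class RandomPolicy(ExplorationPolicy):
    """Picks a detected object uniformly at random."""

    def select(self, objects):
        return random.choice(objects)

class ObjectCyclingPolicy(ExplorationPolicy):
    """Visits each detected object in turn, wrapping around."""

    def __init__(self):
        self._i = 0

    def select(self, objects):
        obj = objects[self._i % len(objects)]
        self._i += 1
        return obj
```

A policy manager could then swap strategies at runtime by holding any `ExplorationPolicy` instance and calling `select()` on each detection cycle.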
- Ubuntu 20.04+ / ROS Noetic
- Python 3.8+
- Conda/Miniconda
- CUDA-compatible GPU (recommended for Grounded SAM)
Clone the repository:
```bash
git clone https://github.com/yourusername/interactive_audio.git
cd interactive_audio
```
Create conda environment:
```bash
conda env create -f environment_audio.yml
conda activate interactive_audio
```
Install ROS dependencies:
```bash
sudo apt-get update
sudo apt-get install ros-noetic-moveit ros-noetic-octomap
```
For deployment on physical hardware:
```bash
roslaunch launch/real.launch
```
Execute the main experiment script:
```bash
python scripts/GSAM_interactive_audio/experiment_executor.py
```
```
├── launch/                            # ROS launch configurations
├── scripts/
│   └── GSAM_interactive_audio/
│       ├── experiment_executor.py     # Main experiment runner
│       └── GSAM_agent/                # Core agent components
│           ├── grounded_sam_relay.py  # Grounded SAM interface
│           ├── motion_planner.py      # Motion planning
│           ├── policy_manager.py      # Policy management
│           ├── camera_manager.py      # Camera handling
│           ├── policies/              # Exploration policies
│           └── knn/                   # Uncertainty estimation
└── utils/                             # Shared utilities
```
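The `knn/` directory above handles uncertainty estimation. One common k-nearest-neighbour formulation scores a new observation by its mean distance to the closest stored feature vectors, so far-from-memory regions look more uncertain and thus more worth exploring. The following is a minimal sketch of that general idea, not the repository's implementation; the function name and feature representation are assumptions:

```python
# Illustrative k-NN uncertainty estimate; NOT the code in knn/.
import math

def knn_uncertainty(query, memory, k=3):
    """Mean Euclidean distance from `query` to its k nearest
    neighbours in `memory`. Larger values suggest the feature
    lies in a less-explored region of the space."""
    dists = sorted(math.dist(query, m) for m in memory)
    return sum(dists[:k]) / min(k, len(dists))
```

A curiosity-driven policy could prefer the object whose features maximize this score.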
Key configuration files:
- `launch/sensor_setup.yaml`: Sensor parameters
- `utils/config.py`: System configuration
- `environment_audio.yml`: Python dependencies
If you use this code in your research, please cite our ICRA 2026 paper:
```bibtex
@inproceedings{yourpaper2026,
  title={CAVER: Closed-Loop Autonomous Audio-Visual Exploration and Reconstruction},
  author={Your Name and Co-authors},
  booktitle={IEEE International Conference on Robotics and Automation (ICRA)},
  year={2026}
}
```
This project is licensed under the MIT License - see the LICENSE file for details.