shiftBench-AV

Domain Shift / Benchmarking / Fusion Robustness

Open-source Python toolkit for uncertainty-aware sensor-fusion benchmarking under domain shift and sensor degradation.

This repository is designed for research and reproducible benchmarking workflows, with a practical focus on:

  • uncertainty-aware fusion evaluation
  • domain-shift scenario stress testing
  • sensor perturbation and fault injection
  • reproducible benchmark reporting

Capabilities

  • Multiple fusion methods:
    • inverse_variance (baseline; see the fusion sketch after this list)
    • adaptive_reliability (disagreement-aware reliability weighting)
    • counterfactual_consensus (novel method in this repo)
  • Sensor perturbation engine:
    • gaussian_noise
    • dropout
    • bias_drift
    • temporal_jitter
    • uncertainty_inflation
  • Scenario-driven domain shift benchmarking via JSON config
  • Dataset support:
    • synthetic generated dataset
    • SQLite table dataset
  • Metrics:
    • Accuracy
    • Negative Log-Likelihood (NLL)
    • Expected Calibration Error (ECE)
    • Uncertainty-Risk Correlation
  • CLI tools for benchmark execution and demo DB creation
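
For reference, the inverse_variance baseline corresponds to standard precision-weighted fusion. The sketch below illustrates that rule in plain NumPy; it is a textbook formulation, not the code in src/fusionbench/fusion/uncertainty.py.

# Standard precision-weighted (inverse-variance) fusion of per-sensor estimates.
import numpy as np

def inverse_variance_fuse(means, variances):
    """Fuse sensor means with weights proportional to 1 / variance."""
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    weights = 1.0 / variances
    fused_mean = np.sum(weights * means) / np.sum(weights)
    fused_variance = 1.0 / np.sum(weights)
    return fused_mean, fused_variance

# Example: camera, lidar, radar estimates with differing uncertainty
print(inverse_variance_fuse([0.8, 0.6, 0.7], [0.04, 0.10, 0.02]))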

Repository structure

configs/
  sample_benchmark.json
  sqlite_benchmark.json
  nuscenes_mini_benchmark.json
  nuscenes_mini_benchmark_adaptive.json
  nuscenes_mini_benchmark_novel.json
scripts/
  build_nuscenes_sqlite.py
  generate_result_graphs.py
examples/
  run_sqlite_demo.sh
  nuscenes_mini_results_baseline_again.json
  nuscenes_mini_results_adaptive.json
  nuscenes_mini_results_novel.json
  figures/
    nll_comparison.svg
    uncertainty_correlation_comparison.svg
src/fusionbench/
  bench/
    metrics.py
    runner.py
  cli/
    main.py
  core/
    types.py
    utils.py
  data/
    synthetic.py
    sqlite_store.py
  domain_shift/
    scenarios.py
  fusion/
    uncertainty.py
  perturbations/
    operators.py
tests/
pyproject.toml
README.md

Prerequisites

  • Python 3.9+
  • macOS/Linux/Windows shell

Installation

1. Create and activate virtual environment

python3 -m venv .venv
source .venv/bin/activate

2. Install package in editable mode

pip install -e .

3. Optional: install dev dependencies

pip install -e .[dev]

Offline/no-network fallback

If your environment cannot download packages, you can still run the CLI directly from the source tree without installing:

PYTHONPATH=src python3 -m fusionbench.cli.main run --config configs/sample_benchmark.json --output examples/sample_results.json

Quick start (synthetic dataset)

Run the included benchmark config:

fusionbench run \
  --config configs/sample_benchmark.json \
  --output examples/sample_results.json

The command prints the results as JSON to the terminal and writes them to examples/sample_results.json.
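
The exact layout of the results file depends on the runner, so a safe first step is to load it and look at its top-level structure before assuming any keys. A minimal sketch:

# Inspect the benchmark output without assuming its exact schema.
import json
from pprint import pprint

with open("examples/sample_results.json") as fh:
    results = json.load(fh)

# Print the top-level structure before digging deeper.
pprint(sorted(results.keys()) if isinstance(results, dict) else results)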

Dataset used for advanced experiment

The advanced experiment in this README was run on nuScenes v1.0-mini, converted into the toolkit's SQLite format (sample_id, label, camera_*, lidar_*, radar_* columns); a schema-inspection sketch follows the list below.

  • Converted DB path: data/nuscenes_mini_samples.db
  • Conversion script: scripts/build_nuscenes_sqlite.py
  • Note: the raw dataset itself is not committed to this repository.
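
Because the exact camera_*/lidar_*/radar_* column names depend on the conversion, a quick way to confirm the layout is to ask SQLite for the table schema. A minimal sketch, assuming the table is named samples as in the sqlite configs:

# List the columns of the converted nuScenes table.
import sqlite3

con = sqlite3.connect("data/nuscenes_mini_samples.db")
# "samples" matches the table name used in the sqlite dataset configs;
# adjust it if your conversion wrote a different table.
for _cid, name, col_type, *_rest in con.execute("PRAGMA table_info(samples)"):
    print(f"{name:24s} {col_type}")
con.close()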

SQLite-backed benchmarking workflow

1. Create demo SQLite dataset

fusionbench make-demo-db \
  --db-path examples/demo_samples.db \
  --n-samples 1500 \
  --seed 11

2. Run benchmark using SQLite dataset

fusionbench run \
  --config configs/sqlite_benchmark.json \
  --output examples/sqlite_results.json

3. One-command demo script

bash examples/run_sqlite_demo.sh

CLI reference

fusionbench run

Run benchmark from config.

fusionbench run --config <path/to/config.json> --output <path/to/results.json>

fusionbench make-demo-db

Create a synthetic SQLite dataset.

fusionbench make-demo-db --db-path <db.sqlite> [--n-samples 1500] [--seed 42]

Config format

Top-level JSON keys:

  • benchmark
  • dataset
  • scenarios

benchmark

{
  "name": "domain_shift_fusion_baseline",
  "seed": 42,
  "calibration_bins": 10,
  "fusion_method": "inverse_variance"
}

For advanced fusion methods you can also provide:

  • fusion_method: inverse_variance | adaptive_reliability | counterfactual_consensus
  • Method parameters, e.g. uncertainty_power, disagreement_gain, agreement_gamma, counterfactual_beta
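
A config using an advanced method can also be generated programmatically. The sketch below assumes the method parameters sit alongside fusion_method inside the benchmark block; the parameter values and the output file name are illustrative only:

# Sketch: writing a benchmark config for an advanced fusion method.
import json

config = {
    "benchmark": {
        "name": "domain_shift_fusion_adaptive",
        "seed": 42,
        "calibration_bins": 10,
        "fusion_method": "adaptive_reliability",
        "uncertainty_power": 1.0,    # illustrative value, not a tuned default
        "disagreement_gain": 0.5,    # illustrative value, not a tuned default
    },
    "dataset": {"source": "synthetic", "n_samples": 1500, "seed": 123},
    "scenarios": [],  # fill with scenario objects as described below
}

with open("configs/my_adaptive_benchmark.json", "w") as fh:
    json.dump(config, fh, indent=2)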

dataset (synthetic)

{
  "source": "synthetic",
  "n_samples": 1500,
  "seed": 123
}

dataset (sqlite)

{
  "source": "sqlite",
  "db_path": "examples/demo_samples.db",
  "table": "samples",
  "limit": 1000
}

scenarios

Each scenario supports an operations list of perturbations.

{
  "name": "compound_night_shift",
  "description": "Multimodal degradation under low light and weather",
  "operations": [
    { "type": "gaussian_noise", "target": "camera", "std": 0.11 },
    { "type": "dropout", "target": "lidar", "probability": 0.15 },
    { "type": "bias_drift", "target": "radar", "offset": 0.05 },
    { "type": "temporal_jitter", "target": "all", "window": 4 },
    { "type": "uncertainty_inflation", "target": "all", "factor": 1.2 }
  ]
}
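
To make the operation semantics concrete, the sketch below applies three of the operation types above to stand-in score arrays with NumPy. It mirrors the configured parameters (std, probability, offset) but is only a conceptual illustration, not the operators.py implementation:

# Conceptual illustration of gaussian_noise, dropout, and bias_drift.
import numpy as np

rng = np.random.default_rng(42)
camera = rng.uniform(0.0, 1.0, size=8)   # stand-in per-sample sensor scores
lidar = rng.uniform(0.0, 1.0, size=8)
radar = rng.uniform(0.0, 1.0, size=8)

# gaussian_noise: additive noise with the configured std
camera = camera + rng.normal(0.0, 0.11, size=camera.shape)

# dropout: each reading is lost with the configured probability
# (zeroing shown here; operators.py may mark missing values differently)
keep = rng.random(lidar.shape) >= 0.15
lidar = np.where(keep, lidar, 0.0)

# bias_drift: constant offset applied to the whole channel
radar = radar + 0.05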

Metrics interpretation

  • Accuracy: classification correctness of the thresholded fused score
  • NLL: probabilistic quality (lower is better)
  • ECE: calibration gap between confidence and empirical accuracy (lower is better)
  • Uncertainty-Risk Correlation: whether higher uncertainty tracks prediction failures (higher correlation is better)
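
As a concrete reference for ECE, the sketch below bins the fused positive-class probability into equal-width bins (calibration_bins in the config controls the bin count) and averages the per-bin |accuracy - confidence| gap, weighted by bin occupancy. metrics.py may instead use the confidence-of-the-predicted-class variant:

# Minimal ECE sketch for binary predictions.
import numpy as np

def expected_calibration_error(confidences, labels, n_bins=10):
    """Occupancy-weighted |empirical accuracy - mean confidence| over
    equal-width bins of the fused positive-class probability."""
    p = np.asarray(confidences, dtype=float)
    y = np.asarray(labels, dtype=float)
    bin_ids = np.minimum((p * n_bins).astype(int), n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        mask = bin_ids == b
        if not mask.any():
            continue
        ece += mask.mean() * abs(y[mask].mean() - p[mask].mean())
    return float(ece)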

nuScenes mini experiment results

Compared methods

  • inverse_variance (baseline)
  • adaptive_reliability
  • counterfactual_consensus (novel)

Numeric summary

Scenario                      Method                      Accuracy  NLL       ECE       Uncertainty-Risk Corr
baseline                      inverse_variance            0.655941  1.385254  0.322141  -0.008785
baseline                      adaptive_reliability        0.655941  0.711751  0.193716   0.338457
baseline                      counterfactual_consensus    0.660891  1.259207  0.322773   0.259458
weather_and_occlusion_shift   inverse_variance            0.655941  1.431917  0.298746  -0.028407
weather_and_occlusion_shift   adaptive_reliability        0.655941  0.785094  0.202290   0.181156
weather_and_occlusion_shift   counterfactual_consensus    0.658416  1.667270  0.305797   0.174910

Graphs

  • NLL comparison: examples/figures/nll_comparison.svg
  • Uncertainty-risk correlation comparison: examples/figures/uncertainty_correlation_comparison.svg

Quick interpretation

  • adaptive_reliability gives the best probabilistic quality on this dataset (lower NLL, lower ECE).
  • counterfactual_consensus improves uncertainty-risk correlation substantially over baseline and slightly improves accuracy.
  • Under stronger perturbation, counterfactual_consensus keeps better uncertainty correlation than baseline, but NLL remains an open optimization target.

Testing

Run the unit test suite:

pytest

Reproducibility guidance

  • Keep seed fixed in both benchmark and dataset configs.
  • Version-control your config files and output JSON.
  • Compare scenarios using identical dataset and model settings.

Extending this project

  1. Plug in real sensor datasets by implementing an additional loader under src/fusionbench/data/.
  2. Add advanced fusion methods under src/fusionbench/fusion/.
  3. Add perturbation types (occlusion masks, blur kernels, packet loss models) in src/fusionbench/perturbations/operators.py.
  4. Add domain-specific metrics in src/fusionbench/bench/metrics.py.
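
As an example of step 4, a new metric can usually be added as a plain function over fused probabilities and labels. The sketch below shows a Brier score; the function name and signature are illustrative, so follow whatever convention the existing functions in metrics.py use:

# Sketch of a domain-specific metric to sit alongside NLL and ECE.
import numpy as np

def brier_score(fused_probabilities, labels):
    """Mean squared error between fused probabilities and binary labels
    (lower is better); a common complement to NLL and ECE."""
    p = np.asarray(fused_probabilities, dtype=float)
    y = np.asarray(labels, dtype=float)
    return float(np.mean((p - y) ** 2))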

Typical troubleshooting

  • fusionbench: command not found
    • Ensure the virtual environment is active and that pip install -e . completed successfully.
  • dataset.db_path is required
    • This error means dataset.source is set to sqlite but no db_path was supplied; add a db_path to the dataset block or switch source back to synthetic.
  • SQLite table errors
    • Confirm the target table exists and columns match expected schema.

References

  1. Holger Caesar et al., "nuScenes: A multimodal dataset for autonomous driving", CVPR 2020.
    https://arxiv.org/abs/1903.11027
  2. nuScenes dataset website:
    https://www.nuscenes.org/

License

MIT (see LICENSE).

Last verified

Verified on March 4, 2026 with:

  • PYTHONPATH=src PYTEST_DISABLE_PLUGIN_AUTOLOAD=1 python3 -m pytest -q -> 6 passed
  • PYTHONPATH=src python3 -m fusionbench.cli.main run --config configs/sample_benchmark.json --output examples/sample_results.json
  • PYTHONPATH=src python3 -m fusionbench.cli.main make-demo-db --db-path examples/demo_samples.db --n-samples 1500 --seed 11
  • PYTHONPATH=src python3 -m fusionbench.cli.main run --config configs/sqlite_benchmark.json --output examples/sqlite_results.json
  • python3 scripts/build_nuscenes_sqlite.py --nuscenes-meta-dir data/v1.0-mini --out-db data/nuscenes_mini_samples.db
  • PYTHONPATH=src python3 -m fusionbench.cli.main run --config configs/nuscenes_mini_benchmark.json --output examples/nuscenes_mini_results_baseline_again.json
  • PYTHONPATH=src python3 -m fusionbench.cli.main run --config configs/nuscenes_mini_benchmark_adaptive.json --output examples/nuscenes_mini_results_adaptive.json
  • PYTHONPATH=src python3 -m fusionbench.cli.main run --config configs/nuscenes_mini_benchmark_novel.json --output examples/nuscenes_mini_results_novel.json
