Vision-Language-Action Model (VLA)

A tiny Vision-Language-Action (VLA) model for robotic manipulation tasks, built around Mamba2 state-space blocks for temporal modeling and QLoRA for parameter-efficient fine-tuning.

Features

  • Efficient vision encoding with spatial attention mechanisms
  • Language understanding through transformer encoders
  • Temporal modeling using optimized Mamba2 blocks
  • QLoRA adaptation for parameter-efficient fine-tuning
  • Support for multimodal inputs including contact forces
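
The repository does not document the architecture itself in this README, so the sketch below is only one plausible way the pieces listed above could fit together in PyTorch. All class names, dimensions, and argument names are assumptions rather than the repository's actual API, a GRU stands in for the real Mamba2 temporal blocks, and QLoRA quantization is omitted.

```python
import torch
import torch.nn as nn

class TinyVLA(nn.Module):
    """Hypothetical tiny VLA: vision + language + contact forces -> actions."""

    def __init__(self, d_model=256, vocab_size=1000, action_dim=7):
        super().__init__()
        # Vision encoding: patchify frames, then spatial self-attention over patches
        self.patchify = nn.Conv2d(3, d_model, kernel_size=16, stride=16)
        self.spatial_attn = nn.MultiheadAttention(d_model, num_heads=4, batch_first=True)
        # Language understanding: small transformer encoder over token embeddings
        self.tok = nn.Embedding(vocab_size, d_model)
        self.lang = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True), num_layers=2
        )
        # Extra multimodal input: 6-axis contact-force readings per timestep
        self.force_proj = nn.Linear(6, d_model)
        # Temporal modeling: the repo uses Mamba2 blocks; a GRU is a simple stand-in here
        self.temporal = nn.GRU(d_model, d_model, batch_first=True)
        # Action head predicting one action vector per timestep
        self.head = nn.Linear(d_model, action_dim)

    def forward(self, images, tokens, forces):
        # images: (B, T, 3, 224, 224), tokens: (B, L), forces: (B, T, 6)
        B, T = images.shape[:2]
        patches = self.patchify(images.flatten(0, 1))          # (B*T, D, 14, 14)
        patches = patches.flatten(2).transpose(1, 2)           # (B*T, 196, D)
        vis, _ = self.spatial_attn(patches, patches, patches)  # spatial attention
        vis = vis.mean(dim=1).view(B, T, -1)                   # pooled per frame
        lang = self.lang(self.tok(tokens)).mean(dim=1)         # pooled instruction (B, D)
        fused = vis + self.force_proj(forces) + lang.unsqueeze(1)
        out, _ = self.temporal(fused)                          # temporal modeling over T steps
        return self.head(out)                                  # (B, T, action_dim)
```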

Installation

[Installation instructions]

Usage

The best checkpoint (43.4% accuracy @2%) is available at: https://www.dropbox.com/scl/fi/36i9j8nx54uqbpqwycock/best_model.pth?rlkey=bhfq2grswkuc9iu6r8w5bcefh&st=x4hgvyly&dl=0

[Basic usage examples]
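
As a rough illustration of loading the checkpoint above and running inference, one might do something like the following. The checkpoint's exact contents and the real model class are not documented here, so this assumes a plain state_dict and reuses the hypothetical TinyVLA class from the architecture sketch.

```python
import torch

model = TinyVLA()                                   # hypothetical class, see sketch above
state = torch.load("best_model.pth", map_location="cpu")
model.load_state_dict(state)                        # assumes the file holds a plain state_dict
model.eval()

with torch.no_grad():
    images = torch.randn(1, 8, 3, 224, 224)         # 8-frame video clip
    tokens = torch.randint(0, 1000, (1, 16))        # tokenized language instruction
    forces = torch.randn(1, 8, 6)                   # contact-force readings per frame
    actions = model(images, tokens, forces)         # (1, 8, action_dim)

print(actions.shape)
```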

Citation

If you use this code in your research, please cite: [Your preferred citation format]

License

Apache 2.0
