deepily/cosa

CoSA: Collection of Small Agents

CoSA is a modular framework for building, training, and deploying specialized LLM-powered agents. It provides the infrastructure for Lupin, a voice-first conversational AI system with trust-aware human-in-the-loop decision making.

(Illustration: Genie robots with microphones)

Overview

CoSA implements a collection of targeted agents, each specialized for a specific task:

  • Text generation and completion
  • Mathematics and calculations
  • Calendar management and scheduling
  • Weather reporting
  • Todo list management
  • Code execution and debugging
  • Hybrid TTS Streaming: Fast, reliable text-to-speech with no word truncation
  • And more...

TTS Implementation Architecture

The system includes two high-performance TTS solutions optimized for different use cases:

Hybrid TTS (OpenAI)

Architecture: OpenAI TTS → FastAPI → WebSocket → Client

  • Server: stream_tts_hybrid() - forwards OpenAI chunks via WebSocket
  • Client: Collects all chunks, then plays complete audio file
  • Benefits: 50% faster than complete file approach, zero truncation, universal compatibility
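The collect-then-play client behavior can be sketched as a small buffering function. This is an illustrative sketch, not the actual CoSA client: the frame shapes (raw `bytes` frames for audio, a `{"status": "done"}` text frame as the end-of-stream sentinel) are assumptions for the example.

```python
import json

def collect_hybrid_audio(messages):
    """Buffer every audio chunk from a hybrid TTS stream, then return the
    complete audio for one-shot playback (the 'zero truncation' property:
    nothing plays until the whole file has arrived).

    `messages` is any iterable of WebSocket frames: bytes frames carry
    audio; a text frame '{"status": "done"}' ends the stream (assumed format).
    """
    chunks = []
    for frame in messages:
        if isinstance(frame, bytes):
            chunks.append(frame)                     # audio payload: keep buffering
        elif json.loads(frame).get("status") == "done":
            break                                    # server signalled end of stream
    return b"".join(chunks)                          # play this complete file once
```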

Instant Mode TTS (ElevenLabs)

Architecture: ElevenLabs Streaming API → FastAPI → WebSocket → Client

  • Server: Direct WebSocket streaming with progressive chunk delivery
  • Client: Immediate playback of audio chunks as received
  • Benefits: Ultra-low latency, real-time streaming, significantly faster than hybrid mode
  • Use Case: Interactive conversations requiring immediate audio response
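By contrast, instant mode hands each chunk to the audio device as it arrives. A minimal sketch, under the same assumed frame format as above, with playback abstracted as a callback:

```python
import json

def stream_instant_audio(messages, play_chunk):
    """Play audio chunks the moment they arrive instead of buffering.
    `play_chunk` stands in for the audio device; frame shapes are the
    same illustrative assumptions as the hybrid sketch. Returns the
    number of chunks played."""
    played = 0
    for frame in messages:
        if isinstance(frame, bytes):
            play_chunk(frame)     # hand off to the audio device immediately
            played += 1
        elif json.loads(frame).get("status") == "done":
            break
    return played
```

The trade-off versus hybrid mode: latency to first audio drops to one chunk, but any gap in chunk delivery is audible, whereas the hybrid buffer hides network jitter entirely.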

Endpoints:

  • /api/get-audio - Hybrid OpenAI approach for reliability
  • /api/get-audio-elevenlabs - Instant ElevenLabs streaming for speed

Project Structure

  • /agents: Individual agent implementations
    • agent_base.py: Abstract base class for all agents
    • llm.py, llm_v0.py: LLM service integration (legacy)
    • /v010: Current agent architecture with Pydantic XML processing
    • /io_models/: Pydantic XML models and utilities
      • xml_models.py: Core XML response models with template generation
      • utils/prompt_template_processor.py: Dynamic template processing
    • /v1: New modular LLM client architecture
      • llm_client.py: Unified client for all LLM providers
      • llm_client_factory.py: Factory pattern for client creation
      • token_counter.py: Cross-provider token counting
    • Specialized agents for math, calendaring, weather, etc.
  • /app: Core application components
    • configuration_manager.py: Settings management with inheritance
    • util_llm_client.py: Client for LLM service communication
  • /memory: Data persistence and memory management
  • /rest: REST API infrastructure
    • Queue management, WebSocket routers, authentication
    • Producer-consumer pattern with event-driven processing
  • /tools: External integrations and tools
    • search_gib.py: Internal search capabilities
    • search_kagi.py: Integration with Kagi search API
  • /training: Model training infrastructure
    • peft_trainer.py: PEFT (Parameter-Efficient Fine-Tuning) implementation
    • quantizer.py: Model quantization for deployment
    • xml_coordinator.py: Structured XML training data generation/validation
  • /utils: Shared utility functions

Getting Started

Prerequisites

  • Python 3.9+
  • PyTorch
  • Transformers library
  • Hugging Face account (for model access)

For a complete list of dependencies, see the requirements.txt file.

Installation

# Clone the repository
git clone git@github.com:deepily/cosa.git
cd cosa

# Install dependencies
pip install -r requirements.txt

Usage

CoSA is designed to be used as a submodule/subtree within the parent "Lupin" project (formerly genie-in-the-box), but can also be used independently for agent development.

TBD: Usage examples and API documentation will be provided in future updates.

LLM Model Training

CoSA includes tools for fine-tuning and deploying LLM models using Parameter-Efficient Fine-Tuning (PEFT):

# Example: Fine-tune a model using PEFT
python -m cosa.training.peft_trainer \
  --model "mistralai/Mistral-7B-Instruct-v0.2" \
  --model-name "Mistral-7B-Instruct-v0.2" \
  --test-train-path "/path/to/training/data" \
  --lora-dir "/path/to/output/lora" \
  --post-training-stats

For detailed instructions on using the PEFT trainer, including all available options, data format requirements, and advanced features like GPU management, please refer to the PEFT Trainer README.

CoSA Framework Code Flow Diagram

The following diagrams show how the CoSA (Collection of Small Agents) framework processes a request:

1. Entry Points (FastAPI)

FastAPI Server (fastapi_app/main.py) - CURRENT
     |
     ├── WebSocket endpoints
     ├── REST API endpoints
     └── Async handlers
     
Flask Server (app.py) - DEPRECATED/REMOVED
     ├── /push endpoint (migrated to FastAPI)
     ├── /api/upload-and-transcribe-* (migrated)
     └── Socket.IO connections (replaced with WebSockets)

2. Request Flow Architecture

User Request (voice/text)
     |
     v
MultiModalMunger (preprocessing)
     |
     v
TodoFifoQueue.push_job()
     ├── Check for similar snapshots
     ├── Parse salutations
     ├── Get question gist (via Gister)
     └── Route to agent via LLM
          |
          v
     Agent Router (LLM-based)
          ├── "agent router go to calendar" → CalendaringAgent
          ├── "agent router go to math" → MathAgent
          ├── "agent router go to todo list" → TodoListAgent
          ├── "agent router go to date and time" → DateAndTimeAgent
          ├── "agent router go to weather" → WeatherAgent
          └── "agent router go to receptionist" → ReceptionistAgent
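The final dispatch step above can be sketched as a lookup table keyed on the routing command the LLM emits. This is a hypothetical simplification: the real router is LLM-based, and the agent names here are class names from the diagram, not imports.

```python
# Hypothetical dispatch table mirroring the routing commands in the diagram;
# in CoSA the command string itself is produced by an LLM.
ROUTING_TABLE = {
    "agent router go to calendar":      "CalendaringAgent",
    "agent router go to math":          "MathAgent",
    "agent router go to todo list":     "TodoListAgent",
    "agent router go to date and time": "DateAndTimeAgent",
    "agent router go to weather":       "WeatherAgent",
    "agent router go to receptionist":  "ReceptionistAgent",
}

def route(llm_output: str) -> str:
    """Map the LLM's routing command to an agent class name, falling back
    to the receptionist for anything unrecognized."""
    return ROUTING_TABLE.get(llm_output.strip().lower(), "ReceptionistAgent")
```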

3. Queue Management System

TodoFifoQueue (pending jobs)
     |
     v
RunningFifoQueue.enter_running_loop()
     ├── Pop from TodoQueue
     ├── Execute job (Agent or SolutionSnapshot)
     └── Route to appropriate queue:
          ├── DoneQueue (successful)
          └── DeadQueue (errors)
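The running loop above reduces to a drain-and-route pattern. A minimal sketch, modelling jobs as zero-argument callables (the real queues carry Agent or SolutionSnapshot objects):

```python
from collections import deque

def enter_running_loop(todo: deque, done: list, dead: list) -> None:
    """Drain the todo queue FIFO-style, executing each job and routing it
    to the done queue on success or the dead queue on error."""
    while todo:
        job = todo.popleft()      # FIFO: oldest job first
        try:
            result = job()        # execute the agent / snapshot
            done.append(result)   # success path -> DoneQueue
        except Exception as exc:
            dead.append(exc)      # error path -> DeadQueue
```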

4. Agent Execution Flow

AgentBase (abstract)
     |
     ├── run_prompt() → LlmClient → LLM Service
     ├── run_code() → RunnableCode → Python exec()
     └── run_formatter() → RawOutputFormatter
          |
          v
     do_all() orchestrates the complete flow
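This is the Template Method pattern: `do_all()` fixes the order of the steps while subclasses supply the step implementations. A sketch using the method names from the diagram (signatures and the `EchoAgent` subclass are illustrative):

```python
from abc import ABC, abstractmethod

class AgentBase(ABC):
    """Template Method sketch: do_all() orchestrates the fixed flow,
    concrete agents implement the individual steps."""

    @abstractmethod
    def run_prompt(self): ...

    @abstractmethod
    def run_code(self): ...

    @abstractmethod
    def run_formatter(self): ...

    def do_all(self):
        # The fixed orchestration: prompt -> code -> formatted output.
        self.prompt_response = self.run_prompt()
        self.code_response = self.run_code()
        return self.run_formatter()

class EchoAgent(AgentBase):
    """Trivial concrete agent used only to exercise the template."""
    def run_prompt(self):    return "prompt"
    def run_code(self):      return "code"
    def run_formatter(self): return f"{self.prompt_response}+{self.code_response}"
```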

5. Core Components

ConfigurationManager

  • Singleton pattern
  • Manages lupin-app.ini settings (formerly gib-app.ini)
  • Environment variable overrides
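The singleton-plus-override behavior can be sketched as follows. The setting key and the `LUPIN_` environment prefix are made up for the example; the real class reads lupin-app.ini.

```python
import os

class ConfigurationManager:
    """Singleton sketch: one shared instance holds file-backed defaults,
    and environment variables win over the file values."""
    _instance = None

    def __new__(cls):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            # Stand-in for values parsed from lupin-app.ini.
            cls._instance._settings = {"tts_default_url": "ws://localhost:7999"}
        return cls._instance

    def get(self, key, default=None):
        # An env var like LUPIN_TTS_DEFAULT_URL overrides the file value.
        return os.environ.get(f"LUPIN_{key.upper()}",
                              self._settings.get(key, default))
```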

LlmClient/LlmClientFactory

  • Unified interface for multiple LLM providers
  • Supports OpenAI, Groq, Google, Anthropic
  • Handles streaming/non-streaming modes
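The factory side reduces to a vendor registry. A sketch with stand-in client classes (the real factory also resolves model names, streaming flags, and credentials):

```python
class OpenAiClient:   # stand-ins for the real provider clients
    vendor = "openai"

class GroqClient:
    vendor = "groq"

class LlmClientFactory:
    """Factory sketch: select a client class from a vendor key so callers
    never import provider-specific code directly."""
    _registry = {"openai": OpenAiClient, "groq": GroqClient}

    @classmethod
    def create(cls, vendor: str):
        try:
            return cls._registry[vendor]()
        except KeyError:
            raise ValueError(f"Unknown LLM vendor: {vendor}")
```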

SolutionSnapshot

  • Serializes successful agent runs
  • Stores code, prompts, responses
  • Enables solution reuse
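The serialize-for-reuse idea can be sketched as a JSON round trip. The field set here is a minimal assumption; the real class also stores embeddings and run metadata.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class SolutionSnapshot:
    """Minimal sketch of a serialized agent run."""
    question: str
    code: str
    response: str

    def to_json(self) -> str:
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, payload: str) -> "SolutionSnapshot":
        return cls(**json.loads(payload))
```

A later request matching `question` can then replay the stored `code` instead of re-running the full LLM pipeline.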

Memory Components

  • InputAndOutputTable: Logs all I/O
  • EmbeddingManager: Manages embeddings (singleton)
  • GistNormalizer: Text preprocessing (singleton)
  • SolutionSnapshotManager: Manages saved solutions

6. Data Flow Example

1. User: "What's the weather today?"
2. FastAPI receives request
3. MultiModalMunger processes input
4. TodoFifoQueue:
   - Checks for similar snapshots
   - No match found
   - Routes to weather agent via LLM
5. WeatherAgent created and queued
6. RunningFifoQueue executes:
   - Calls agent.do_all()
   - Agent queries weather API
   - Formats response
7. Results sent to DoneQueue
8. Audio response generated via TTS
9. Response sent to user

Key Design Patterns

  • Singleton: ConfigurationManager, EmbeddingManager, GistNormalizer
  • Abstract Factory: LlmClientFactory
  • Template Method: AgentBase.do_all()
  • Queue-based Architecture: Async job processing
  • Serialization: SolutionSnapshot for persistence

The framework handles voice and text input, routes requests to specialized agents, executes code dynamically, and maintains a memory of successful solutions for reuse.

Development Guidelines

Please refer to CLAUDE.md for detailed code style and development guidelines.

Research and Development

For current research and planning documents, see the RND directory, which includes:

Architecture and Refactoring

Implementation Plans

Analysis and Strategy

What's New in v0.1.5 — Voice-First Human in the Loop

Trust-Aware Decision Proxy

  • Universal Prediction Engine (UPE) — 7 prediction slices with response_type filtering to prevent cross-type contamination
  • Bayesian Beta-Bernoulli Trust Model — Per-agent trust learning with conjugate prior updates
  • Thompson Sampling — Exploration-exploitation balance for auto-approve vs. escalate decisions
  • Conformal Prediction — Calibrated confidence intervals with statistical guarantees
  • LanceDB Preference Embeddings — Semantic similarity search with response_type filtering and MC option validation
  • L1-L5 Trust Escalation — Five trust levels from "always ask" to "full autonomy" with circuit breaker pattern
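The Beta-Bernoulli plus Thompson-sampling pieces above compose naturally: each approval or rejection is a conjugate update to a per-agent Beta posterior, and a posterior draw decides auto-approve vs. escalate. A sketch under assumed thresholds (the 0.8 cutoff and uniform prior are illustrative, not CoSA's actual values):

```python
import random

class BetaBernoulliTrust:
    """Per-agent trust sketch: successes/failures update Beta(alpha, beta);
    Thompson sampling draws from the posterior to make the decision."""

    def __init__(self, alpha=1.0, beta=1.0):
        self.alpha, self.beta = alpha, beta   # Beta(1, 1) = uniform prior

    def update(self, approved: bool) -> None:
        if approved:
            self.alpha += 1.0                 # conjugate update: success
        else:
            self.beta += 1.0                  # conjugate update: failure

    def mean(self) -> float:
        # Posterior mean of the approval probability.
        return self.alpha / (self.alpha + self.beta)

    def should_auto_approve(self, threshold=0.8, rng=random) -> bool:
        # Thompson sampling: one posterior draw; approve if it clears the
        # threshold, otherwise escalate to the human.
        return rng.betavariate(self.alpha, self.beta) >= threshold
```

Early on, the wide posterior makes draws fall below the threshold often, so the system asks the human; as evidence accumulates the posterior tightens and auto-approval becomes the common case.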

Integration Test Infrastructure

  • Hot-Swap Config — Running dev server toggles between config blocks at runtime via /api/init?config_block_id=...
  • GET /api/server-info — Unauthenticated introspection endpoint (config block, masked DB URL, environment)
  • swap_database() — Runtime database environment switching (development/testing/production)
  • Database Disambiguation — lupin_db split into lupin_db_dev and lupin_db_prod

Credential Consolidation

  • Unified ~/.lupin/config — Three credential stores collapsed into one file
  • Fail-hard on missing config — Removed all legacy fallbacks; FileNotFoundError with migration instructions
  • Strict Project Detection — KNOWN_PROJECTS registry + is_known_project() for MCP validation

Voice & Notification Infrastructure

  • user_initiated_message type for voice input routing
  • QualifierClassification model + display_qualifier_widget notification field
  • Programmatic session ID regex tightened to require hyphen
  • Dead event cleanup — Removed active_conversation_changed (emitted but never subscribed)

New Agents & Agent Enhancements

  • SWE Team Agent — 4-phase agentic software development with trust-aware decision proxy
  • Everyday Calculator Agent — Natural language calculator with MathAgent fallback
  • CRUD for DataFrames Agent — Voice-controlled create/read/update/delete for Pandas DataFrames
  • Notification Proxy Agent — Phi-4 LLM fuzzy script matching for automated interactive testing

CJ Flow (COSA Jobs Flow)

  • Agentic Job System — Background execution engine for long-running Claude Agent SDK tasks
  • Deep Research + Podcast Generator — Research-to-podcast chained pipeline
  • Dry-Run Mode — Test agentic jobs without API costs
  • job_state_transition events for real-time job status via WebSocket

Testing (2,075+ unit tests)

  • +905 unit tests across trust engine, session bridge, hooks, credentials, prediction engine
  • WebSocket tests: 50/50 passing
  • Integration tests: 136 passed (comprehensive auth, admin, queue filtering)
  • Interactive proxy tests: 12 scenarios across Calculator, CRUD, and Expediter agents

Earlier Milestones

  • v0.1.4 — cosa-voice MCP Server, Runtime Argument Expeditor, batch voice questions
  • v0.1.3 — CJ Flow agentic job system, JWT WebSocket auth, unified LoRA training
  • v0.1.2 — LanceDB migration with 100% feature parity
  • v0.1.1 — WebSocket FastAPI test suite
  • v0.1.0 — Complete Flask elimination, FastAPI-only architecture

Infrastructure Foundation (pre-v0.1.0)

  • Pydantic XML Migration — All 8 agents migrated with 4 core models and 3-tier strategy
  • Design by Contract Documentation — 100% coverage across all 73 Python modules
  • Modular LLM Client Architecture — Vendor-agnostic support for OpenAI, Groq, Anthropic, Google
  • Producer-Consumer Queue — 6,700x performance improvement via event-driven processing
  • WebSocket User Routing — Persistent user-centric event routing with multi-session support

License

This project is licensed under the terms specified in the LICENSE file.
