CoSA is a modular framework for building, training, and deploying specialized LLM-powered agents. It provides the infrastructure for Lupin, a voice-first conversational AI system with trust-aware human-in-the-loop decision making.
CoSA implements a collection of targeted agents, each specialized for specific tasks:
- Text generation and completion
- Mathematics and calculations
- Calendar management and scheduling
- Weather reporting
- Todo list management
- Code execution and debugging
- Hybrid TTS Streaming: Fast, reliable text-to-speech with no word truncation
- And more...
The system includes two high-performance TTS solutions optimized for different use cases:
Architecture: OpenAI TTS → FastAPI → WebSocket → Client
- Server: `stream_tts_hybrid()` forwards OpenAI chunks via WebSocket
- Client: Collects all chunks, then plays the complete audio file
- Benefits: 50% faster than complete file approach, zero truncation, universal compatibility
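The collect-then-play pattern above can be sketched in pure Python with asyncio; the names `fake_tts_chunks` and `hybrid_client` are illustrative stand-ins, not CoSA APIs:

```python
import asyncio

async def fake_tts_chunks():
    # Stand-in for the OpenAI TTS stream; chunk boundaries are arbitrary
    for chunk in [b"hel", b"lo ", b"wor", b"ld"]:
        await asyncio.sleep(0)  # yield control, as a network read would
        yield chunk

async def hybrid_client(chunk_source):
    """Collect every chunk before playback, so no word is ever truncated."""
    buffer = bytearray()
    async for chunk in chunk_source:
        buffer.extend(chunk)
    # Only now would the complete audio be handed to the player
    return bytes(buffer)

audio = asyncio.run(hybrid_client(fake_tts_chunks()))
```

The trade-off is clear in the sketch: nothing plays until the last chunk arrives, which is why this mode favors reliability over latency.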
Architecture: ElevenLabs Streaming API → FastAPI → WebSocket → Client
- Server: Direct WebSocket streaming with progressive chunk delivery
- Client: Immediate playback of audio chunks as received
- Benefits: Ultra-low latency, real-time streaming, significantly faster than hybrid mode
- Use Case: Interactive conversations requiring immediate audio response
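For contrast, a sketch of the immediate-playback pattern; names are again hypothetical, and a real client would hand each chunk to an audio device rather than a callback:

```python
import asyncio

async def fake_elevenlabs_stream():
    # Stand-in for the ElevenLabs streaming API
    for chunk in [b"in", b"st", b"ant"]:
        await asyncio.sleep(0)
        yield chunk

async def streaming_client(chunk_source, play):
    """Play each chunk the moment it arrives; no buffering step."""
    async for chunk in chunk_source:
        play(chunk)

played = []
asyncio.run(streaming_client(fake_elevenlabs_stream(), played.append))
```

Playback begins on the first chunk, which is what makes this mode suitable for interactive conversation.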
Endpoints:
- `/api/get-audio`: Hybrid OpenAI approach for reliability
- `/api/get-audio-elevenlabs`: Instant ElevenLabs streaming for speed
- `/agents`: Individual agent implementations
  - `agent_base.py`: Abstract base class for all agents
  - `llm.py`, `llm_v0.py`: LLM service integration (legacy)
  - `/v010`: Current agent architecture with Pydantic XML processing
    - `/io_models/`: Pydantic XML models and utilities
      - `xml_models.py`: Core XML response models with template generation
      - `utils/prompt_template_processor.py`: Dynamic template processing
  - `/v1`: New modular LLM client architecture
    - `llm_client.py`: Unified client for all LLM providers
    - `llm_client_factory.py`: Factory pattern for client creation
    - `token_counter.py`: Cross-provider token counting
- Specialized agents for math, calendaring, weather, etc.
- `/app`: Core application components
  - `configuration_manager.py`: Settings management with inheritance
  - `util_llm_client.py`: Client for LLM service communication
- `/memory`: Data persistence and memory management
- `/rest`: REST API infrastructure
  - Queue management, WebSocket routers, authentication
  - Producer-consumer pattern with event-driven processing
- `/tools`: External integrations and tools
  - `search_gib.py`: Internal search capabilities
  - `search_kagi.py`: Integration with the Kagi search API
- `/training`: Model training infrastructure
  - `peft_trainer.py`: PEFT (Parameter-Efficient Fine-Tuning) implementation
  - `quantizer.py`: Model quantization for deployment
  - `xml_coordinator.py`: Structured XML training data generation/validation
- `/utils`: Shared utility functions
- Python 3.9+
- PyTorch
- Transformers library
- Hugging Face account (for model access)
For a complete list of dependencies, see the requirements.txt file.
```bash
# Clone the repository
git clone git@github.com:deepily/cosa.git
cd cosa

# Install dependencies
pip install -r requirements.txt
```

CoSA is designed to be used as a submodule/subtree within the parent "Lupin" project (formerly genie-in-the-box), but it can also be used independently for agent development.
TBD: Usage examples and API documentation will be provided in future updates.
CoSA includes tools for fine-tuning and deploying LLM models using Parameter-Efficient Fine-Tuning (PEFT):
```bash
# Example: Fine-tune a model using PEFT
python -m cosa.training.peft_trainer \
    --model "mistralai/Mistral-7B-Instruct-v0.2" \
    --model-name "Mistral-7B-Instruct-v0.2" \
    --test-train-path "/path/to/training/data" \
    --lora-dir "/path/to/output/lora" \
    --post-training-stats
```

For detailed instructions on using the PEFT trainer, including all available options, data format requirements, and advanced features like GPU management, please refer to the PEFT Trainer README.
Here's how the CoSA (Collection of Small Agents) framework works:
```
FastAPI Server (fastapi_app/main.py) - CURRENT
|
├── WebSocket endpoints
├── REST API endpoints
└── Async handlers

Flask Server (app.py) - DEPRECATED/REMOVED
├── /push endpoint (migrated to FastAPI)
├── /api/upload-and-transcribe-* (migrated)
└── Socket.IO connections (replaced with WebSockets)
```
```
User Request (voice/text)
|
v
MultiModalMunger (preprocessing)
|
v
TodoFifoQueue.push_job()
├── Check for similar snapshots
├── Parse salutations
├── Get question gist (via Gister)
└── Route to agent via LLM
|
v
Agent Router (LLM-based)
├── "agent router go to calendar"      → CalendaringAgent
├── "agent router go to math"          → MathAgent
├── "agent router go to todo list"     → TodoListAgent
├── "agent router go to date and time" → DateAndTimeAgent
├── "agent router go to weather"       → WeatherAgent
└── "agent router go to receptionist"  → ReceptionistAgent
```
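Assuming the router LLM emits exactly the command strings listed above, the dispatch step can be sketched as a simple lookup; the table and `route()` helper are illustrative, not the framework's actual code:

```python
# Hypothetical routing table mirroring the router LLM's command strings
ROUTING_TABLE = {
    "agent router go to calendar": "CalendaringAgent",
    "agent router go to math": "MathAgent",
    "agent router go to todo list": "TodoListAgent",
    "agent router go to date and time": "DateAndTimeAgent",
    "agent router go to weather": "WeatherAgent",
    "agent router go to receptionist": "ReceptionistAgent",
}

def route(llm_command: str) -> str:
    """Map the router LLM's output to an agent class name.

    Unrecognized commands fall back to the receptionist (an assumption
    made for this sketch, not documented framework behavior).
    """
    return ROUTING_TABLE.get(llm_command.strip().lower(), "ReceptionistAgent")
```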
```
TodoFifoQueue (pending jobs)
|
v
RunningFifoQueue.enter_running_loop()
├── Pop from TodoQueue
├── Execute job (Agent or SolutionSnapshot)
└── Route to appropriate queue:
    ├── DoneQueue (successful)
    └── DeadQueue (errors)
```
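One iteration of that loop can be sketched with the standard-library `queue` module; the names are hypothetical, and the real loop operates on Agent and SolutionSnapshot objects rather than bare callables:

```python
from queue import Empty, SimpleQueue

def running_loop_step(todo: SimpleQueue, done: list, dead: list) -> bool:
    """One pass: pop a job, execute it, route the result by outcome."""
    try:
        job = todo.get_nowait()
    except Empty:
        return False                 # nothing pending; caller may exit the loop
    try:
        done.append(job())           # a job here is any callable (e.g. do_all)
    except Exception as exc:
        dead.append(exc)             # failures land in the dead queue
    return True

todo, done, dead = SimpleQueue(), [], []
todo.put(lambda: "forecast: sunny")
todo.put(lambda: 1 / 0)              # a job that fails
while running_loop_step(todo, done, dead):
    pass
```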
```
AgentBase (abstract)
|
├── run_prompt()    → LlmClient → LLM Service
├── run_code()      → RunnableCode → Python exec()
└── run_formatter() → RawOutputFormatter
|
v
do_all() orchestrates the complete flow
```
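The template-method shape above can be sketched as follows; the `EchoAgent` subclass is purely illustrative and not part of CoSA:

```python
from abc import ABC, abstractmethod

class AgentBaseSketch(ABC):
    """Template method: do_all() fixes the order, subclasses fill in the steps."""

    @abstractmethod
    def run_prompt(self) -> str: ...

    @abstractmethod
    def run_code(self, prompt_result: str) -> str: ...

    def run_formatter(self, raw: str) -> str:
        # Default formatting hook; subclasses may override
        return raw.strip()

    def do_all(self) -> str:
        # The fixed orchestration: prompt → code → formatter
        return self.run_formatter(self.run_code(self.run_prompt()))

class EchoAgent(AgentBaseSketch):
    def run_prompt(self) -> str:
        return "hello"

    def run_code(self, prompt_result: str) -> str:
        return f"  {prompt_result.upper()}  "
```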
ConfigurationManager
- Singleton pattern
- Manages `lupin-app.ini` settings (formerly `gib-app.ini`)
- Environment variable overrides
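A minimal sketch of the singleton-with-env-overrides idea; this is not the real class, and CoSA's implementation also handles inheritance between config blocks:

```python
import os

class ConfigurationManagerSketch:
    """Singleton: one shared instance, file-based defaults, env-var overrides."""
    _instance = None

    def __new__(cls, defaults=None):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
            cls._instance._settings = dict(defaults or {})
        return cls._instance  # later constructor calls return the same object

    def get(self, key, default=None):
        # Environment variables win over file-based settings
        return os.environ.get(key.upper(), self._settings.get(key, default))

cfg = ConfigurationManagerSketch({"tts_mode": "hybrid"})
```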
LlmClient/LlmClientFactory
- Unified interface for multiple LLM providers
- Supports OpenAI, Groq, Google, Anthropic
- Handles streaming/non-streaming modes
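The factory idea can be sketched as below; the vendor set is taken from the list above, and the `Sketch` suffix flags that these are illustrative classes, not CoSA's:

```python
from dataclasses import dataclass

@dataclass
class LlmClientSketch:
    vendor: str
    model: str
    streaming: bool = False

class LlmClientFactorySketch:
    """One creation path for every supported vendor."""
    SUPPORTED = {"openai", "groq", "google", "anthropic"}

    @classmethod
    def create(cls, vendor: str, model: str, streaming: bool = False) -> LlmClientSketch:
        if vendor not in cls.SUPPORTED:
            raise ValueError(f"Unsupported vendor: {vendor}")
        return LlmClientSketch(vendor, model, streaming)
```

Callers depend only on the factory, so adding a provider means extending one registry rather than touching every call site.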
SolutionSnapshot
- Serializes successful agent runs
- Stores code, prompts, responses
- Enables solution reuse
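A round-trippable snapshot can be sketched as a dataclass; the fields are reduced to three for illustration, while the real snapshot stores considerably more:

```python
import json
from dataclasses import asdict, dataclass

@dataclass
class SolutionSnapshotSketch:
    """Minimal serializable record of a successful agent run."""
    question: str
    code: str
    response: str

    def to_json(self) -> str:
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, payload: str) -> "SolutionSnapshotSketch":
        return cls(**json.loads(payload))

snap = SolutionSnapshotSketch("What is 2+2?", "print(2+2)", "4")
restored = SolutionSnapshotSketch.from_json(snap.to_json())
```

Because the record round-trips losslessly, a matching future question can replay the stored code instead of re-invoking the LLM.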
Memory Components
- `InputAndOutputTable`: Logs all I/O
- `EmbeddingManager`: Manages embeddings (singleton)
- `GistNormalizer`: Text preprocessing (singleton)
- `SolutionSnapshotManager`: Manages saved solutions
1. User: "What's the weather today?"
2. FastAPI receives request
3. MultiModalMunger processes input
4. TodoFifoQueue:
- Checks for similar snapshots
- No match found
- Routes to weather agent via LLM
5. WeatherAgent created and queued
6. RunningFifoQueue executes:
- Calls agent.do_all()
- Agent queries weather API
- Formats response
7. Results sent to DoneQueue
8. Audio response generated via TTS
9. Response sent to user
- Singleton: ConfigurationManager, EmbeddingManager, GistNormalizer
- Abstract Factory: LlmClientFactory
- Template Method: AgentBase.do_all()
- Queue-based Architecture: Async job processing
- Serialization: SolutionSnapshot for persistence
The framework elegantly handles voice/text input, routes to specialized agents, executes code dynamically, and maintains a memory of successful solutions for reuse.
Please refer to CLAUDE.md for detailed code style and development guidelines.
For current research and planning documents, see the RND directory, which includes:
- LLM Client Architecture Refactoring Plan: Comprehensive plan for improving the v010 LLM client architecture
- LLM Client Refactoring Progress: Progress tracker for the LLM client refactoring project
- LLM Refactoring Analysis: Analysis of LLM component refactoring needs
- Agent Migration v000 to v010 Plan: Migration strategy for agent architecture
- Screen Reader Agent Implementation Plan: Plan for screen reader accessibility agent
- Agent Factory Testing Plan: Testing strategy for agent factory components
- CI Testing Implementation Plan: Continuous integration testing setup
- LLM Prompt Format Analysis: Analysis of prompt formatting approaches
- Prompt Templating Strategies: Strategies for prompt template management
- Python Package Distribution Plan: Plan for package distribution strategy
- Versioning and CI/CD Strategy: Version management and deployment strategy
- Universal Prediction Engine (UPE) — 7 prediction slices with response_type filtering to prevent cross-type contamination
- Bayesian Beta-Bernoulli Trust Model — Per-agent trust learning with conjugate prior updates
- Thompson Sampling — Exploration-exploitation balance for auto-approve vs. escalate decisions
- Conformal Prediction — Calibrated confidence intervals with statistical guarantees
- LanceDB Preference Embeddings — Semantic similarity search with `response_type` filtering and MC option validation
- L1-L5 Trust Escalation — Five trust levels from "always ask" to "full autonomy" with circuit breaker pattern
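The Beta-Bernoulli update and Thompson draw described above can be sketched as follows; the threshold, per-agent bookkeeping, and escalation logic are omitted, and all names here are illustrative:

```python
import random

class TrustModelSketch:
    """Per-agent trust as a Beta(alpha, beta) posterior over approval rate."""

    def __init__(self, alpha: float = 1.0, beta: float = 1.0):
        self.alpha, self.beta = alpha, beta  # Beta(1, 1) = uniform prior

    def update(self, approved: bool) -> None:
        # Conjugate update: an approval bumps alpha, a rejection bumps beta
        if approved:
            self.alpha += 1
        else:
            self.beta += 1

    def thompson_sample(self) -> float:
        # Draw from the posterior; auto-approve when the draw clears a
        # threshold, otherwise escalate to the human (threshold not shown)
        return random.betavariate(self.alpha, self.beta)

model = TrustModelSketch()
for _ in range(20):
    model.update(True)   # twenty approvals
model.update(False)      # one rejection
```

Sampling, rather than using the posterior mean directly, is what gives Thompson sampling its exploration: an under-observed agent still occasionally draws high and gets a chance to earn trust.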
- Hot-Swap Config — Running dev server toggles between config blocks at runtime via `/api/init?config_block_id=...`
- `GET /api/server-info` — Unauthenticated introspection endpoint (config block, masked DB URL, environment)
- `swap_database()` — Runtime database environment switching (development/testing/production)
- Database Disambiguation — `lupin_db` split into `lupin_db_dev` and `lupin_db_prod`
- Unified `~/.lupin/config` — Three credential stores collapsed into one file
- Fail-hard on missing config — Removed all legacy fallbacks; `FileNotFoundError` with migration instructions
- Strict Project Detection — `KNOWN_PROJECTS` registry + `is_known_project()` for MCP validation
- `user_initiated_message` type for voice input routing
- `QualifierClassification` model + `display_qualifier_widget` notification field
- Programmatic session ID regex tightened to require a hyphen
- Dead event cleanup — Removed `active_conversation_changed` (emitted but never subscribed)
- SWE Team Agent — 4-phase agentic software development with trust-aware decision proxy
- Everyday Calculator Agent — Natural language calculator with MathAgent fallback
- CRUD for DataFrames Agent — Voice-controlled create/read/update/delete for Pandas DataFrames
- Notification Proxy Agent — Phi-4 LLM fuzzy script matching for automated interactive testing
- Agentic Job System — Background execution engine for long-running Claude Agent SDK tasks
- Deep Research + Podcast Generator — Research-to-podcast chained pipeline
- Dry-Run Mode — Test agentic jobs without API costs
- `job_state_transition` events for real-time job status via WebSocket
- +905 unit tests across trust engine, session bridge, hooks, credentials, prediction engine
- WebSocket tests: 50/50 passing
- Integration tests: 136 passed (comprehensive auth, admin, queue filtering)
- Interactive proxy tests: 12 scenarios across Calculator, CRUD, and Expediter agents
- v0.1.4 — cosa-voice MCP Server, Runtime Argument Expeditor, batch voice questions
- v0.1.3 — CJ Flow agentic job system, JWT WebSocket auth, unified LoRA training
- v0.1.2 — LanceDB migration with 100% feature parity
- v0.1.1 — WebSocket FastAPI test suite
- v0.1.0 — Complete Flask elimination, FastAPI-only architecture
- Pydantic XML Migration — All 8 agents migrated with 4 core models and 3-tier strategy
- Design by Contract Documentation — 100% coverage across all 73 Python modules
- Modular LLM Client Architecture — Vendor-agnostic support for OpenAI, Groq, Anthropic, Google
- Producer-Consumer Queue — 6,700x performance improvement via event-driven processing
- WebSocket User Routing — Persistent user-centric event routing with multi-session support
This project is licensed under the terms specified in the LICENSE file.