Migrate from LiteLLM to Synapse with a single command. This tool reads your LiteLLM YAML configuration and generates equivalent Synapse API payloads, a ready-to-run shell script, and a detailed migration report.
| Capability | LiteLLM | Synapse |
|---|---|---|
| OpenAI-compatible proxy | Yes | Yes |
| Multi-provider routing | Basic (round-robin, cost) | Advanced (5-dimension scoring, rules, A/B testing) |
| Semantic caching | No | Yes (L1 Redis + L2 Milvus, <50ms cache hits) |
| Provider budget controls | Basic | Per-provider monthly budgets with enforcement zones |
| Rate limiting | Per-model RPM/TPM | Tiered rate limits with burst support |
| API key management | Virtual keys | Virtual keys + scoped key analytics + anomaly detection |
| User management | No | Full CRUD with SSO/SCIM integration |
| Team hierarchy | No | Teams with budget inheritance and key organization |
| Observability dashboard | Basic Prometheus | Full analytics: savings, routing decisions, model usage |
| Self-hosted deployment | Docker Compose | Helm chart for Kubernetes (VPC, air-gapped, edge, hybrid) |
| Context memory | No | Cross-session memory with project-based organization |
- Python 3.10+
- PyYAML
```bash
pip install pyyaml
```

```bash
python litellm_migrate.py \
  --input /path/to/litellm_config.yaml \
  --output ./migration-output/
```

This creates:
- `migration-output/payloads/` --- JSON payloads for each Synapse API call
- `migration-output/apply.sh` --- Shell script with curl commands
- `migration-output/migration_report.md` --- Report with warnings and manual steps
```bash
cat migration-output/migration_report.md
```

The report highlights:
- Mapped: LiteLLM settings that map directly to Synapse
- Manual: Settings needing manual configuration (e.g., custom callbacks, guardrails)
- Skipped: LiteLLM-specific settings with no Synapse equivalent
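For illustration, an excerpt of a report might look like the following (the entries and exact layout here are hypothetical; your report reflects your own config):

```markdown
## Mapped
- model_list (6 models) -> models.json, providers.json
- router_settings.routing_strategy -> routing_config.json

## Manual
- litellm_settings.callbacks -> re-create alerting/logging hooks in Synapse

## Skipped
- litellm_settings.set_verbose (no Synapse equivalent)
```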
To generate the payloads and apply them to a running gateway in one step:

```bash
python litellm_migrate.py \
  --input /path/to/litellm_config.yaml \
  --output ./migration-output/ \
  --apply \
  --gateway-url https://your-synapse.example.com \
  --api-key sk-syn-xxx
```

To validate the generated payloads against a live gateway without applying them:

```bash
python litellm_migrate.py \
  --input /path/to/litellm_config.yaml \
  --output ./migration-output/ \
  --validate \
  --gateway-url https://your-synapse.example.com \
  --api-key sk-syn-xxx
```

The tool generates the following files:

| File | Description | Synapse API Endpoint |
|---|---|---|
| `integrations.json` | Provider API credential entries | `POST /api/v1/model-catalog/integrations` |
| `providers.json` | Provider configs with budgets and rate limits | `POST /api/v1/model-catalog/providers` |
| `models.json` | Model definitions with pricing and capabilities | `POST /api/v1/model-catalog/models` |
| `routing_config.json` | Routing strategy and traffic-splitting rules | `PUT /api/v1/routing/config` |
| `rate_limits.json` | Per-model rate limit aggregates | `PUT /api/v1/rate-limits/{tenant}` |
| `virtual_keys.json` | Virtual key definitions | `POST /api/v1/virtual-keys` |
| `apply.sh` | Executable shell script with curl commands | N/A |
| `migration_report.md` | Summary report | N/A |
| LiteLLM Prefix | Synapse Provider Type |
|---|---|
| `openai/` | `OPEN_AI` |
| `azure/` | `AZURE_OPEN_AI` |
| `anthropic/` | `ANTHROPIC` |
| `bedrock/` | `AWS_BEDROCK` |
| `vertex_ai/` | `VERTEX_AI` |
| `gemini/` | `GEMINI` |
| `cohere/` | `COHERE` |
| `groq/` | `GROQ` |
| `together_ai/` | `TOGETHER` |
| `replicate/` | `REPLICATE` |
| `mistral/` | `MISTRAL` |
| `fireworks_ai/` | `FIREWORKS` |
| `perplexity/` | `PERPLEXITY` |
| `deepinfra/` | `DEEP_INFRA` |
| `xai/` | `XAI` |
| `deepseek/`, `huggingface/`, `ollama/`, `vllm/` | `CUSTOM` (OpenAI-compatible) |
| LiteLLM Strategy | Synapse Strategy |
|---|---|
| `simple-shuffle` | `cost_optimized` |
| `latency-based-routing` | `latency_sensitive` |
| `cost-based-routing` | `cost_optimized` |
| `usage-based-routing` | `balanced` |
When a model name appears multiple times in `model_list` with different providers (e.g., `gpt-4` on both OpenAI and Azure), the tool generates traffic-splitting routing rules with weights proportional to each deployment's RPM.
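The weighting logic can be sketched as follows (a simplified illustration, not the tool's exact implementation; the deployment names and RPM values are hypothetical):

```python
def split_weights(deployments):
    """Weight each deployment in proportion to its configured RPM."""
    total_rpm = sum(d["rpm"] for d in deployments)
    return {d["name"]: d["rpm"] / total_rpm for d in deployments}

# Two hypothetical gpt-4 deployments from a LiteLLM model_list
deployments = [
    {"name": "openai/gpt-4", "rpm": 600},
    {"name": "azure/gpt-4", "rpm": 400},
]
print(split_weights(deployments))  # {'openai/gpt-4': 0.6, 'azure/gpt-4': 0.4}
```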
The only change to your application code is the `base_url`:
```python
from openai import OpenAI

# Before (LiteLLM)
client = OpenAI(
    base_url="http://litellm-proxy:4000",
    api_key="sk-litellm-xxx",
)

# After (Synapse)
client = OpenAI(
    base_url="https://synapse.example.com/v1",
    api_key="sk-syn-xxx",
)

# Everything else stays the same
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
)
```

```typescript
import OpenAI from "openai";

// Before (LiteLLM)
const client = new OpenAI({
  baseURL: "http://litellm-proxy:4000",
  apiKey: "sk-litellm-xxx",
});

// After (Synapse)
const client = new OpenAI({
  baseURL: "https://synapse.example.com/v1",
  apiKey: "sk-syn-xxx",
});
```

For minimal code changes across many services:
```bash
# Before
export OPENAI_BASE_URL=http://litellm-proxy:4000
export OPENAI_API_KEY=sk-litellm-xxx

# After
export OPENAI_BASE_URL=https://synapse.example.com/v1
export OPENAI_API_KEY=sk-syn-xxx
```

After migration, take advantage of capabilities LiteLLM doesn't offer.
Automatic caching for semantically similar queries. No code changes required:
```python
# Use the OpenAI SDK's raw-response wrapper to read the Synapse headers
raw = client.chat.completions.with_raw_response.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "What are the benefits of cloud computing?"}],
)
response = raw.parse()  # the regular ChatCompletion object
print(f"Cache: {raw.headers.get('x-synapse-cache-status')}")  # HIT or MISS
```

Override routing per-request:
```python
# Route to the cheapest capable model
response = client.chat.completions.create(
    model="auto",
    messages=[{"role": "user", "content": "What is 2+2?"}],
    extra_headers={"X-WorldFlow-Routing": "cheapest"},
)
```

Every Synapse response includes cost and routing metadata:
| Header | Description |
|---|---|
| `x-synapse-cache-status` | `HIT` or `MISS` |
| `x-worldflow-model` | Which model handled the request |
| `x-worldflow-cost` | Cost in USD |
| `x-worldflow-cost-saved` | Savings vs. most expensive alternative |
| `x-worldflow-routing-reason` | Why this model was selected |
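These headers make per-request cost logging straightforward. A minimal helper, assuming the header names from the table above and lowercase header keys (the sample values are illustrative):

```python
SYNAPSE_HEADERS = (
    "x-synapse-cache-status",
    "x-worldflow-model",
    "x-worldflow-cost",
    "x-worldflow-cost-saved",
    "x-worldflow-routing-reason",
)

def extract_request_metadata(headers):
    """Collect Synapse cost/routing headers from a response headers mapping."""
    return {k: headers[k] for k in SYNAPSE_HEADERS if k in headers}

# Illustrative values; real ones come from the raw response's headers
sample = {
    "x-synapse-cache-status": "MISS",
    "x-worldflow-model": "gpt-4o-mini",
    "x-worldflow-cost": "0.00042",
    "content-type": "application/json",
}
print(extract_request_metadata(sample))
```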
Migration checklist:

- Run `litellm_migrate.py` in dry-run mode and review the report
- Create integrations for each LLM provider
- Create providers with budget and rate limit configuration
- Register models with pricing metadata
- Configure routing strategy and rules
- Create API keys for each team/service
- Update `base_url` in application code or environment variables
- Verify cache headers appear in responses
- Confirm routing decisions via `/api/v1/routing/simulate`
- Set up users and role assignments (if using SSO)
- Review savings analytics after 24 hours of traffic
If issues arise during migration, rollback is straightforward:
- Revert `base_url` to the LiteLLM endpoint
- Both systems can run in parallel during the transition
- Synapse does not modify LiteLLM's configuration
- Synapse Documentation
- Detailed Migration Guide --- Step-by-step walkthrough with advanced scenarios
- Self-Hosted Deployment --- Private VPC deployment
Apache 2.0