Skip to content

Temikus/butter

Repository files navigation

Butter logo

Butter comic

Release CI Go Report Card Go Version License

A blazingly fast AI proxy gateway written in Go. Butter sits between your application and AI providers, offering a unified OpenAI-compatible API with minimal latency overhead.

Inspired by Bifrost, but with a focus on simplicity, extensibility via WASM plugins, and raw performance.

Your App ──▶ Butter ──▶ OpenAI / Anthropic / Gemini / Groq / Mistral / ...
                │
                ├── Unified OpenAI-compatible API
                ├── Automatic failover & retries
                ├── Weighted key rotation
                └── Plugin hooks (Go + WASM)

Features

  • OpenAI-compatible API — /v1/chat/completions (streaming & non-streaming), /v1/embeddings, /v1/models
  • 9 providers: OpenAI, Anthropic, Gemini, OpenRouter, Groq, Mistral, Together.ai, Fireworks, Perplexity — shared openaicompat base for any OpenAI-compatible API
  • Anthropic & Gemini format translation (OpenAI requests automatically converted to/from native formats)
  • Multi-provider routing with model-specific provider lists and priority/round-robin strategies
  • Weighted random key selection with per-key model allowlists
  • Multi-provider failover with configurable retry-on status codes and exponential backoff
  • WASM plugin sandbox via Extism/wazero — load external .wasm plugins with zero CGo, full sandbox isolation
  • Prompt injection guard WASM plugin — scans requests for ~60 injection patterns across 7 categories (instruction override, jailbreak, prompt extraction, etc.) with Unicode bypass detection; block, log, or tag modes
  • Plugin system with ordered hook chains (pre_http, post_http, pre_llm, post_llm, stream chunks, observability traces)
  • Plugin short-circuit support (plugins can reject or rewrite requests before they reach the provider)
  • Built-in rate limiter plugin (token bucket, global or per-IP, configurable RPM)
  • Built-in request logging plugin (structured slog, provider/model/status/duration)
  • Built-in Prometheus metrics plugin (OTel SDK instruments, /metrics endpoint)
  • Built-in distributed tracing plugin (OTel SDK, OTLP HTTP export)
  • Response caching (in-memory LRU or Redis backend; SHA256 cache key; temperature=0 non-streaming only)
  • Config hot-reload (mtime polling, atomic engine swap — no restart required)
  • Application keys for usage tracking and attribution — vend btr_ tokens, track per-key request/token counts, optional enforcement via require_key
  • Raw HTTP passthrough for provider-native endpoints (/native/{provider}/*)
  • Health check endpoint (/healthz)
  • Graceful shutdown (SIGINT/SIGTERM)
  • Multi-stage Docker image (distroless base)

Coming soon:

  • More providers (Azure OpenAI, AWS Bedrock, Vertex AI)

Quick Start

Prerequisites

1. Install

Download the latest binary from GitHub Releases, or build from source:

git clone https://github.com/temikus/butter.git
cd butter
go build -o pkg/bin/butter ./cmd/butter/

2. Configure

cp config.example.yaml config.yaml

Edit config.yaml or set environment variables:

export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."
export OPENROUTER_API_KEY="sk-or-v1-..."

The config file supports ${ENV_VAR} substitution, so the default config.example.yaml works out of the box once the environment variables are set.

Example config.yaml
server:
  address: ":8080"
  read_timeout: 30s
  write_timeout: 120s

providers:
  openai:
    base_url: https://api.openai.com/v1
    keys:
      - key: "${OPENAI_API_KEY}"
        weight: 1

  anthropic:
    base_url: https://api.anthropic.com/v1
    keys:
      - key: "${ANTHROPIC_API_KEY}"
        weight: 1

  openrouter:
    base_url: https://openrouter.ai/api/v1
    keys:
      - key: "${OPENROUTER_API_KEY}"
        weight: 1

routing:
  default_provider: openrouter
  models:
    "gpt-4o":
      providers: [openai, openrouter]
      strategy: priority
    "claude-sonnet-4-20250514":
      providers: [anthropic, openrouter]
      strategy: priority
  failover:
    enabled: true
    max_retries: 3
    retry_on: [429, 500, 502, 503, 504]
    backoff:
      initial: 100ms
      multiplier: 2.0
      max: 5s

plugins:
  ratelimit:
    requests_per_minute: 60
    per_ip: false
  requestlog:
    level: info
  metrics: {}

3. Run

./pkg/bin/butter -config config.yaml

You should see:

{"level":"INFO","msg":"butter listening","address":":8080"}

4. Send a request

Non-streaming:

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o-mini",
    "messages": [{"role": "user", "content": "Say hello in three languages"}]
  }'

Streaming:

curl http://localhost:8080/v1/chat/completions \
  --no-buffer \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o-mini",
    "messages": [{"role": "user", "content": "Count to 5"}],
    "stream": true
  }'

Health check:

curl http://localhost:8080/healthz
# ok

Drop-in replacement

Butter is compatible with any OpenAI SDK client. Just point the base URL at your Butter instance:

Python (openai SDK):

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="unused",  # Butter uses its own configured keys
)

response = client.chat.completions.create(
    model="openai/gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)

Node.js (openai SDK):

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "http://localhost:8080/v1",
  apiKey: "unused",
});

const completion = await client.chat.completions.create({
  model: "openai/gpt-4o-mini",
  messages: [{ role: "user", content: "Hello!" }],
});
console.log(completion.choices[0].message.content);

Development

A justfile is provided for common tasks:

just build              # Build binary (with commit hash)
just build-release      # Build with full version info from git
just serve              # Run with config (auto-loads API keys from ~/.openai/api-key etc.)
just test               # Run all tests with race detector
just test-integration   # Run integration tests (mock servers, no real API calls)
just lint               # Run golangci-lint
just check              # Run vet + lint + test + integration tests
just bench              # Run benchmarks with allocation reporting
just release-snapshot   # Test GoReleaser locally (no publish)
just docker-build       # Build multi-stage Docker image
just docker-run         # Run Docker container with config
just build-example-wasm # Compile example WASM plugin (requires TinyGo)
just build-injection-guard # Compile prompt injection guard plugin (requires TinyGo)
just build-wasm         # Build all WASM plugins
just clean              # Remove built binaries and compiled WASM plugins

Or use Go directly:

go run ./cmd/butter/ -config config.yaml
go test ./... -v -race -count=1
go test ./... -bench=. -benchmem

Project structure

butter/
├── cmd/butter/                  Main binary
├── internal/
│   ├── config/                  YAML config with env var substitution + hot-reload watcher
│   ├── transport/               HTTP server and handlers
│   ├── proxy/                   Core dispatch engine (routing, failover, key selection)
│   ├── appkey/                  Application key store (usage tracking, token counting)
│   ├── cache/                   Cache interface + in-memory LRU and Redis backends
│   ├── plugin/                  Plugin system (interfaces, chain, manager)
│   │   ├── wasm/                WASM plugin host (Extism/wazero)
│   │   └── builtin/
│   │       ├── ratelimit/       Token bucket rate limiter plugin
│   │       ├── requestlog/      Request logging plugin
│   │       ├── metrics/         Prometheus metrics plugin (OTel SDK)
│   │       └── tracing/         Distributed tracing plugin (OTel, OTLP HTTP)
│   └── provider/
│       ├── provider.go          Provider interface & types
│       ├── registry.go          Thread-safe provider registry
│       ├── openaicompat/        Reusable base for OpenAI-compatible APIs
│       ├── openai/              OpenAI provider
│       ├── anthropic/           Anthropic provider (format translation)
│       ├── gemini/              Google Gemini provider (format translation)
│       ├── openrouter/          OpenRouter provider
│       ├── groq/                Groq provider
│       ├── mistral/             Mistral provider
│       ├── together/            Together.ai provider
│       ├── fireworks/           Fireworks provider
│       └── perplexity/          Perplexity provider
├── plugin/sdk/                  Public JSON ABI types for WASM plugin authors
├── plugins/
│   ├── example-wasm/            Example WASM plugin (TinyGo, pre_http hook)
│   └── prompt-injection-guard/  Prompt injection detection WASM plugin
├── tests/integration/           Integration tests with mock provider servers
├── config.example.yaml
├── Dockerfile                   Multi-stage distroless image
├── justfile
└── go.mod

Performance Targets

Metric Target
Per-request overhead (no plugins) <50us
Per-request overhead (built-in plugins) <100us
Per-request overhead (1 WASM plugin) <150us
Streaming TTFB overhead <1ms
Memory at idle <30MB

License

Apache 2.0 License. See LICENSE for details.