Butter

A blazingly fast AI proxy gateway written in Go. Butter sits between your application and AI providers, offering a unified OpenAI-compatible API with minimal latency overhead.

Inspired by Bifrost, but with a focus on simplicity, extensibility via WASM plugins, and raw performance.

Your App ──▶ Butter ──▶ OpenAI / Anthropic / Gemini / Groq / Mistral / ...
                │
                ├── Unified OpenAI-compatible API
                ├── Automatic failover & retries
                ├── Weighted key rotation
                └── Plugin hooks (Go + WASM)

Features

OpenAI-compatible API — /v1/chat/completions (streaming & non-streaming), /v1/embeddings, /v1/models
9 providers: OpenAI, Anthropic, Gemini, OpenRouter, Groq, Mistral, Together.ai, Fireworks, Perplexity — shared openaicompat base for any OpenAI-compatible API
Anthropic & Gemini format translation (OpenAI requests automatically converted to/from native formats)
Multi-provider routing with model-specific provider lists and priority/round-robin strategies
Weighted random key selection with per-key model allowlists
Multi-provider failover with configurable retry-on status codes and exponential backoff
WASM plugin sandbox via Extism/wazero — load external .wasm plugins with zero CGo, full sandbox isolation
Prompt injection guard WASM plugin — scans requests for ~60 injection patterns across 7 categories (instruction override, jailbreak, prompt extraction, etc.) with Unicode bypass detection; block, log, or tag modes
Plugin system with ordered hook chains (pre_http, post_http, pre_llm, post_llm, stream chunks, observability traces)
Plugin short-circuit support (plugins can reject or rewrite requests before they reach the provider)
Built-in rate limiter plugin (token bucket, global or per-IP, configurable RPM)
Built-in request logging plugin (structured slog, provider/model/status/duration)
Built-in Prometheus metrics plugin (OTel SDK instruments, /metrics endpoint)
Built-in distributed tracing plugin (OTel SDK, OTLP HTTP export)
Response caching (in-memory LRU or Redis backend; SHA256 cache key; applies only to non-streaming requests with temperature=0)
Config hot-reload (mtime polling, atomic engine swap — no restart required)
Application keys for usage tracking and attribution — vend btr_ tokens, track per-key request/token counts, optional enforcement via require_key
Raw HTTP passthrough for provider-native endpoints (/native/{provider}/*)
Health check endpoint (/healthz)
Graceful shutdown (SIGINT/SIGTERM)
Multi-stage Docker image (distroless base)

Coming soon:

More providers (Azure OpenAI, AWS Bedrock, Vertex AI)

Quick Start

Prerequisites

Go 1.25+ (uses enhanced ServeMux pattern routing)
An API key for a supported provider (OpenAI, Anthropic, Google Gemini, OpenRouter, Groq, Mistral, Together.ai, Fireworks, Perplexity, or any OpenAI-compatible API)

1. Install

Download the latest binary from GitHub Releases, or build from source:

git clone https://github.com/temikus/butter.git
cd butter
go build -o pkg/bin/butter ./cmd/butter/

2. Configure

cp config.example.yaml config.yaml

Edit config.yaml or set environment variables:

export OPENAI_API_KEY="sk-..."
export ANTHROPIC_API_KEY="sk-ant-..."
export OPENROUTER_API_KEY="sk-or-v1-..."

The config file supports ${ENV_VAR} substitution, so the default config.example.yaml works out of the box once the environment variables are set.
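
As an illustration of what `${ENV_VAR}` substitution means, the sketch below expands references in a config fragment with the Go standard library's `os.Expand`. This is an assumption about the general technique, not a description of Butter's loader: `expandConfig` is a hypothetical helper, and `os.Expand` also expands the bare `$NAME` form, which Butter may or may not do.

```go
package main

import (
	"fmt"
	"os"
)

// expandConfig replaces ${NAME} (and $NAME) references in raw with the
// value returned by lookup. Unknown names expand to the empty string
// when lookup returns "".
func expandConfig(raw string, lookup func(string) string) string {
	return os.Expand(raw, lookup)
}

func main() {
	raw := "keys:\n  - key: \"${OPENAI_API_KEY}\"\n    weight: 1\n"
	// os.Getenv reads the variable from the process environment.
	fmt.Print(expandConfig(raw, os.Getenv))
}
```

Passing the lookup function explicitly keeps the helper easy to exercise without touching the real environment.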

Example config.yaml

server:
  address: ":8080"
  read_timeout: 30s
  write_timeout: 120s

providers:
  openai:
    base_url: https://api.openai.com/v1
    keys:
      - key: "${OPENAI_API_KEY}"
        weight: 1

  anthropic:
    base_url: https://api.anthropic.com/v1
    keys:
      - key: "${ANTHROPIC_API_KEY}"
        weight: 1

  openrouter:
    base_url: https://openrouter.ai/api/v1
    keys:
      - key: "${OPENROUTER_API_KEY}"
        weight: 1

routing:
  default_provider: openrouter
  models:
    "gpt-4o":
      providers: [openai, openrouter]
      strategy: priority
    "claude-sonnet-4-20250514":
      providers: [anthropic, openrouter]
      strategy: priority
  failover:
    enabled: true
    max_retries: 3
    retry_on: [429, 500, 502, 503, 504]
    backoff:
      initial: 100ms
      multiplier: 2.0
      max: 5s

plugins:
  ratelimit:
    requests_per_minute: 60
    per_ip: false
  requestlog:
    level: info
  metrics: {}

3. Run

./pkg/bin/butter -config config.yaml

You should see:

{"level":"INFO","msg":"butter listening","address":":8080"}

4. Send a request

Non-streaming:

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o-mini",
    "messages": [{"role": "user", "content": "Say hello in three languages"}]
  }'

Streaming:

curl http://localhost:8080/v1/chat/completions \
  --no-buffer \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-4o-mini",
    "messages": [{"role": "user", "content": "Count to 5"}],
    "stream": true
  }'

Health check:

curl http://localhost:8080/healthz
# ok

Drop-in replacement

Butter is compatible with any OpenAI SDK client. Just point the base URL at your Butter instance:

Python (openai SDK):

from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8080/v1",
    api_key="unused",  # Butter uses its own configured keys
)

response = client.chat.completions.create(
    model="openai/gpt-4o-mini",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)

Node.js (openai SDK):

import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "http://localhost:8080/v1",
  apiKey: "unused", // Butter uses its own configured keys
});

const completion = await client.chat.completions.create({
  model: "openai/gpt-4o-mini",
  messages: [{ role: "user", content: "Hello!" }],
});
console.log(completion.choices[0].message.content);
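
The same request can be made from Go with only the standard library. The following is a sketch, not an official client: `chat`, the request/response structs, and `fakeGateway` are hypothetical names. `fakeGateway` is a local stand-in server so the example runs without a live Butter instance or provider keys.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

type message struct {
	Role    string `json:"role"`
	Content string `json:"content"`
}

type chatRequest struct {
	Model    string    `json:"model"`
	Messages []message `json:"messages"`
}

type chatResponse struct {
	Choices []struct {
		Message message `json:"message"`
	} `json:"choices"`
}

// chat posts one user message to baseURL + "/chat/completions" and
// returns the content of the first choice.
func chat(client *http.Client, baseURL, model, prompt string) (string, error) {
	body, err := json.Marshal(chatRequest{
		Model:    model,
		Messages: []message{{Role: "user", Content: prompt}},
	})
	if err != nil {
		return "", err
	}
	resp, err := client.Post(baseURL+"/chat/completions", "application/json", bytes.NewReader(body))
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		b, _ := io.ReadAll(io.LimitReader(resp.Body, 4096))
		return "", fmt.Errorf("gateway returned %s: %s", resp.Status, b)
	}
	var out chatResponse
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return "", err
	}
	if len(out.Choices) == 0 {
		return "", fmt.Errorf("response contained no choices")
	}
	return out.Choices[0].Message.Content, nil
}

// fakeGateway stands in for Butter and returns a canned completion.
func fakeGateway() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.URL.Path != "/v1/chat/completions" {
			http.NotFound(w, r)
			return
		}
		w.Header().Set("Content-Type", "application/json")
		io.WriteString(w, `{"choices":[{"message":{"role":"assistant","content":"Hello!"}}]}`)
	}))
}

func main() {
	srv := fakeGateway()
	defer srv.Close()
	// Against a real instance, use "http://localhost:8080/v1" as the base URL.
	reply, err := chat(srv.Client(), srv.URL+"/v1", "openai/gpt-4o-mini", "Hello!")
	fmt.Println(reply, err) // prints "Hello! <nil>"
}
```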

Development

A justfile is provided for common tasks:

just build              # Build binary (with commit hash)
just build-release      # Build with full version info from git
just serve              # Run with config (auto-loads API keys from ~/.openai/api-key etc.)
just test               # Run all tests with race detector
just test-integration   # Run integration tests (mock servers, no real API calls)
just lint               # Run golangci-lint
just check              # Run vet + lint + test + integration tests
just bench              # Run benchmarks with allocation reporting
just release-snapshot   # Test GoReleaser locally (no publish)
just docker-build       # Build multi-stage Docker image
just docker-run         # Run Docker container with config
just build-example-wasm # Compile example WASM plugin (requires TinyGo)
just build-injection-guard # Compile prompt injection guard plugin (requires TinyGo)
just build-wasm         # Build all WASM plugins
just clean              # Remove built binaries and compiled WASM plugins

Or use Go directly:

go run ./cmd/butter/ -config config.yaml
go test ./... -v -race -count=1
go test ./... -bench=. -benchmem

Project structure

butter/
├── cmd/butter/                  Main binary
├── internal/
│   ├── config/                  YAML config with env var substitution + hot-reload watcher
│   ├── transport/               HTTP server and handlers
│   ├── proxy/                   Core dispatch engine (routing, failover, key selection)
│   ├── appkey/                  Application key store (usage tracking, token counting)
│   ├── cache/                   Cache interface + in-memory LRU and Redis backends
│   ├── plugin/                  Plugin system (interfaces, chain, manager)
│   │   ├── wasm/                WASM plugin host (Extism/wazero)
│   │   └── builtin/
│   │       ├── ratelimit/       Token bucket rate limiter plugin
│   │       ├── requestlog/      Request logging plugin
│   │       ├── metrics/         Prometheus metrics plugin (OTel SDK)
│   │       └── tracing/         Distributed tracing plugin (OTel, OTLP HTTP)
│   └── provider/
│       ├── provider.go          Provider interface & types
│       ├── registry.go          Thread-safe provider registry
│       ├── openaicompat/        Reusable base for OpenAI-compatible APIs
│       ├── openai/              OpenAI provider
│       ├── anthropic/           Anthropic provider (format translation)
│       ├── gemini/              Google Gemini provider (format translation)
│       ├── openrouter/          OpenRouter provider
│       ├── groq/                Groq provider
│       ├── mistral/             Mistral provider
│       ├── together/            Together.ai provider
│       ├── fireworks/           Fireworks provider
│       └── perplexity/          Perplexity provider
├── plugin/sdk/                  Public JSON ABI types for WASM plugin authors
├── plugins/
│   ├── example-wasm/            Example WASM plugin (TinyGo, pre_http hook)
│   └── prompt-injection-guard/  Prompt injection detection WASM plugin
├── tests/integration/           Integration tests with mock provider servers
├── config.example.yaml
├── Dockerfile                   Multi-stage distroless image
├── justfile
└── go.mod
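
As a rough picture of what a "thread-safe provider registry" like `registry.go` could look like, here is a minimal sketch guarded by a `sync.RWMutex`. It is not copied from the repository: the `Provider` interface is reduced to a single hypothetical `Name` method, and `Registry`, `NewRegistry`, `Register`, `Get`, and `Names` are illustrative names.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// Provider is a minimal stand-in for the real provider interface.
type Provider interface {
	Name() string
}

// Registry maps provider names to implementations and is safe for
// concurrent use: writers take the lock, readers share it.
type Registry struct {
	mu        sync.RWMutex
	providers map[string]Provider
}

func NewRegistry() *Registry {
	return &Registry{providers: make(map[string]Provider)}
}

// Register adds or replaces the provider stored under p.Name().
func (r *Registry) Register(p Provider) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.providers[p.Name()] = p
}

// Get looks up a provider by name.
func (r *Registry) Get(name string) (Provider, bool) {
	r.mu.RLock()
	defer r.mu.RUnlock()
	p, ok := r.providers[name]
	return p, ok
}

// Names returns the registered names in sorted order.
func (r *Registry) Names() []string {
	r.mu.RLock()
	defer r.mu.RUnlock()
	names := make([]string, 0, len(r.providers))
	for n := range r.providers {
		names = append(names, n)
	}
	sort.Strings(names)
	return names
}

// static is a trivial Provider used only for this example.
type static string

func (s static) Name() string { return string(s) }

func main() {
	reg := NewRegistry()
	var wg sync.WaitGroup
	for _, n := range []string{"openai", "anthropic", "groq"} {
		wg.Add(1)
		go func(n string) { // concurrent registration is safe
			defer wg.Done()
			reg.Register(static(n))
		}(n)
	}
	wg.Wait()
	fmt.Println(reg.Names()) // prints "[anthropic groq openai]"
}
```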

Performance Targets

Metric	Target
Per-request overhead (no plugins)	< 50 µs
Per-request overhead (built-in plugins)	< 100 µs
Per-request overhead (1 WASM plugin)	< 150 µs
Streaming TTFB overhead	< 1 ms
Memory at idle	< 30 MB

License

Apache 2.0 License. See LICENSE for details.
