One local endpoint. Every model you have access to. Any API format.
opencode-llm-proxy is an OpenCode plugin that starts a local HTTP server on http://127.0.0.1:4010. It translates between the API format your tool speaks and whichever LLM provider OpenCode has configured — so you never reconfigure the same models twice.
```
Your tool (OpenAI / Anthropic / Gemini SDK)
        │
        ▼  http://127.0.0.1:4010
opencode-llm-proxy
        │
        ▼  OpenCode SDK
GitHub Copilot · Anthropic · Gemini · Ollama · OpenRouter · Bedrock · …
```
Supported API formats — all with streaming:
| Format | Endpoint |
|---|---|
| OpenAI Chat Completions | POST /v1/chat/completions |
| OpenAI Responses API | POST /v1/responses |
| Anthropic Messages API | POST /v1/messages |
| Google Gemini | POST /v1beta/models/:model:generateContent |
Most LLM tools speak exactly one API dialect. OpenCode already manages connections to every provider you use. This proxy bridges the two — your tools keep working as-is, and you change which model they use in one place.
Common situations it solves:
- You have a GitHub Copilot subscription. Open WebUI, Chatbox, or a VS Code extension only accepts an OpenAI-compatible URL. Point them at the proxy — done.
- You run Ollama locally. Your Python scripts use the OpenAI SDK. Set `base_url` to the proxy and use your Ollama model IDs directly.
- You want to swap models without code changes. Your app talks to the proxy; you change the model in OpenCode config.
- You want to share your models on a LAN. Expose the proxy on `0.0.0.0` and give teammates the URL.
- You use the Anthropic SDK but want to route through GitHub Copilot or Bedrock. No code change in the SDK: just point it at the proxy.
```bash
npm install opencode-llm-proxy
```

Add to `opencode.json`:

```json
{
  "plugin": ["opencode-llm-proxy"]
}
```

Start OpenCode; the proxy starts automatically:

```bash
opencode
```

Send a request:

```bash
curl http://127.0.0.1:4010/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "github-copilot/claude-sonnet-4.6",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'
```

Install from npm:

```bash
npm install opencode-llm-proxy
```

Add to your global `~/.config/opencode/opencode.json` (works everywhere) or a project-level `opencode.json`:

```json
{
  "plugin": ["opencode-llm-proxy"]
}
```

Or download the plugin file directly. Global (loaded for every OpenCode session):

```bash
curl -o ~/.config/opencode/plugins/llm-proxy.js \
  https://raw.githubusercontent.com/KochC/opencode-llm-proxy/main/index.js
```

Per-project (loaded only in this directory):

```bash
mkdir -p .opencode/plugins
curl -o .opencode/plugins/llm-proxy.js \
  https://raw.githubusercontent.com/KochC/opencode-llm-proxy/main/index.js
```

The proxy is configured through environment variables:

| Variable | Default | Description |
|---|---|---|
| `OPENCODE_LLM_PROXY_HOST` | `127.0.0.1` | Bind address. Set to `0.0.0.0` to expose on a LAN or in Docker. |
| `OPENCODE_LLM_PROXY_PORT` | `4010` | TCP port. |
| `OPENCODE_LLM_PROXY_TOKEN` | (unset) | Bearer token required on every request. Unset = no auth. |
| `OPENCODE_LLM_PROXY_CORS_ORIGIN` | `*` | `Access-Control-Allow-Origin` value for browser clients. |
For example, to expose the proxy on your LAN with token auth:

```bash
OPENCODE_LLM_PROXY_HOST=0.0.0.0 \
OPENCODE_LLM_PROXY_TOKEN=my-secret \
opencode
```

OpenAI SDK (TypeScript):

```typescript
import OpenAI from "openai"

const client = new OpenAI({
  baseURL: "http://127.0.0.1:4010/v1",
  apiKey: "unused",
})

const response = await client.chat.completions.create({
  model: "github-copilot/claude-sonnet-4.6",
  messages: [{ role: "user", content: "Explain recursion." }],
})
```

OpenAI SDK (Python):

```python
from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:4010/v1", api_key="unused")

response = client.chat.completions.create(
    model="ollama/qwen2.5-coder",
    messages=[{"role": "user", "content": "Write a Python function to reverse a string."}],
)
print(response.choices[0].message.content)
```

Anthropic SDK (Python):

```python
import anthropic

client = anthropic.Anthropic(
    base_url="http://127.0.0.1:4010",
    api_key="unused",
)

message = client.messages.create(
    model="anthropic/claude-3-5-sonnet",
    max_tokens=1024,
    messages=[{"role": "user", "content": "What is the Pythagorean theorem?"}],
)
print(message.content[0].text)
```

Anthropic SDK (TypeScript):

```typescript
import Anthropic from "@anthropic-ai/sdk"

const client = new Anthropic({
  baseURL: "http://127.0.0.1:4010",
  apiKey: "unused",
})

const message = await client.messages.create({
  model: "anthropic/claude-opus-4",
  max_tokens: 1024,
  messages: [{ role: "user", content: "Explain async/await." }],
})
```

Google Generative AI SDK (JavaScript; note the base URL is passed as the second argument to `getGenerativeModel`, not to the constructor):

```typescript
import { GoogleGenerativeAI } from "@google/generative-ai"

const genAI = new GoogleGenerativeAI("unused")
const model = genAI.getGenerativeModel(
  { model: "google/gemini-2.0-flash" },
  { baseUrl: "http://127.0.0.1:4010" },
)
const result = await model.generateContent("What is machine learning?")
console.log(result.response.text())
```

LangChain (Python):

```python
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(
    model="anthropic/claude-3-5-sonnet",
    openai_api_base="http://127.0.0.1:4010/v1",
    openai_api_key="unused",
)
response = llm.invoke("What are the SOLID principles?")
print(response.content)
```

Open WebUI:

- Settings → Connections → OpenAI API
- Set API Base URL to `http://127.0.0.1:4010/v1`
- Leave API Key blank (or set it to your `OPENCODE_LLM_PROXY_TOKEN`)
- Save; all your OpenCode models appear in the model picker

> Running Open WebUI in Docker? Use `http://host.docker.internal:4010/v1` and set `OPENCODE_LLM_PROXY_HOST=0.0.0.0`.
Settings → AI Provider → OpenAI API → set API Host to http://127.0.0.1:4010.
In `~/.continue/config.json`:

```json
{
  "models": [
    {
      "title": "Claude via OpenCode",
      "provider": "openai",
      "model": "anthropic/claude-3-5-sonnet",
      "apiBase": "http://127.0.0.1:4010/v1",
      "apiKey": "unused"
    }
  ]
}
```

In `~/.config/zed/settings.json`:

```json
{
  "language_models": {
    "openai": {
      "api_url": "http://127.0.0.1:4010/v1",
      "available_models": [
        {
          "name": "github-copilot/claude-sonnet-4.6",
          "display_name": "Claude (OpenCode)",
          "max_tokens": 8096
        }
      ]
    }
  }
}
```

```bash
curl http://127.0.0.1:4010/v1/models | jq '.data[].id'
# "github-copilot/claude-sonnet-4.6"
# "anthropic/claude-3-5-sonnet"
# "ollama/qwen2.5-coder"
# ...
```

Use `provider/model` for clarity. Bare model IDs (e.g. `gpt-4o`) work if unambiguous across your providers.
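Because IDs follow the `provider/model` convention, the `/v1/models` listing is easy to slice by provider. A small illustrative helper (`models_for_provider` is ours, not part of the proxy):

```python
def models_for_provider(model_ids: list[str], provider: str) -> list[str]:
    """Keep only the IDs belonging to one provider, using the
    provider/model prefix convention."""
    return [m for m in model_ids if m.startswith(provider + "/")]

ids = [
    "github-copilot/claude-sonnet-4.6",
    "anthropic/claude-3-5-sonnet",
    "ollama/qwen2.5-coder",
]
print(models_for_provider(ids, "ollama"))  # ['ollama/qwen2.5-coder']
```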
To force a specific provider without changing the model string, add the header:

```
x-opencode-provider: anthropic
```

The health check returns:

```json
{ "healthy": true, "service": "opencode-openai-proxy" }
```

`GET /v1/models` returns all models from all configured providers in OpenAI list format.
`POST /v1/chat/completions`: OpenAI Chat Completions. Required fields: `model`, `messages`. Optional: `stream`, `temperature`, `max_tokens`.

`POST /v1/responses`: OpenAI Responses API. Required fields: `model`, `input`. Optional: `instructions`, `stream`, `max_output_tokens`.

`POST /v1/messages`: Anthropic Messages API. Required fields: `model`, `messages`. Optional: `system`, `max_tokens`, `stream`. Errors are returned in Anthropic format: `{ "type": "error", "error": { "type": "...", "message": "..." } }`.

`POST /v1beta/models/:model:generateContent`: Google Gemini, non-streaming. Model name in the URL path. Required field: `contents`. Optional: `systemInstruction`, `generationConfig`.

The streaming variant behaves the same but returns a newline-delimited JSON stream.
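Since the Gemini route embeds the model name in the URL path rather than the JSON body, building the request URL looks roughly like this (base URL and path pattern from this document; the helper name and default are ours):

```python
def gemini_generate_url(model: str, base: str = "http://127.0.0.1:4010") -> str:
    """Build the non-streaming Gemini endpoint URL; the model name
    is part of the path, not the request body."""
    return f"{base}/v1beta/models/{model}:generateContent"

print(gemini_generate_url("gemini-2.0-flash"))
# http://127.0.0.1:4010/v1beta/models/gemini-2.0-flash:generateContent
```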
Each request:

- Is authenticated if `OPENCODE_LLM_PROXY_TOKEN` is set
- Has its model resolved from `provider/model`, a bare model ID, or the Gemini URL path
- Creates a temporary OpenCode session (visible in the session list)
- Sends the prompt via `client.session.prompt` / `client.session.promptAsync`
- Returns the response in the same format as the request
Streaming uses OpenCode's client.event.subscribe() SSE stream. Text deltas are forwarded in real time.
- Text only — image, audio, and file inputs are ignored
- No tool/function calling — all OpenCode tools are disabled for proxy sessions
- No cross-request session state — send full conversation history on every request
- Temperature and max tokens are advisory (passed as system prompt hints)
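Because there is no cross-request session state, multi-turn chat means resending the whole transcript each turn. A sketch of the minimal client-side bookkeeping this implies (helper names are ours):

```python
# The proxy keeps no session state between requests, so the client
# owns the conversation history and resends all of it every turn.
history: list[dict] = []

def next_payload(user_text: str, model: str = "ollama/qwen2.5-coder") -> dict:
    """Append the new user turn and return a full request body."""
    history.append({"role": "user", "content": user_text})
    return {"model": model, "messages": list(history)}

def record_reply(text: str) -> None:
    history.append({"role": "assistant", "content": text})

payload = next_payload("Hello!")
record_reply("Hi! How can I help?")
payload = next_payload("Summarize our chat.")
print(len(payload["messages"]))  # 3: every prior turn is included
```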
MIT