lls-openai

Llama Stack server using the remote::passthrough provider with OpenAI, Gemini, and Anthropic. No API key stored on the server — keys are passed per-request via headers.
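
Credentials travel in the X-LlamaStack-Provider-Data request header as a JSON object; the provider reads passthrough_api_key and passthrough_url from it on every call:

{"passthrough_api_key": "sk-your-key-here", "passthrough_url": "https://api.openai.com"}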

Setup

uv sync

Create a .env file with your API keys:

cat > .env <<EOF
OPENAI_API_KEY=sk-your-key-here
GEMINI_API_KEY=your-gemini-key-here
ANTHROPIC_API_KEY=your-anthropic-key-here
EOF

Start the server

make server

Or manually:

uv run llama stack run run.yaml

The server starts on http://localhost:8321 with no API key required at boot.
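
To verify it is up without any credentials, hit the health endpoint (path assumed from current llama-stack releases):

curl -s http://localhost:8321/v1/health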

Registered models

The run.yaml explicitly registers models via registered_resources (no auto-discovery):

Model ID                               Provider      Display Name
openai/gpt-4o                          openai        GPT-4o
openai-mini/gpt-4o-mini                openai-mini   GPT-4o Mini
openai-mini/gpt-4.1-nano               openai-mini   GPT-4.1 Nano
gemini/gemini-2.5-flash-lite           gemini        Gemini 2.5 Flash Lite
anthropic/claude-haiku-4-5-20251001    anthropic     Claude Haiku 4.5
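
The corresponding run.yaml section looks roughly like this (a sketch; the exact nesting and field names may vary across llama-stack versions):

registered_resources:
  models:
    - model_id: openai/gpt-4o
      provider_id: openai
      provider_model_id: gpt-4o
    - model_id: anthropic/claude-haiku-4-5-20251001
      provider_id: anthropic
      provider_model_id: claude-haiku-4-5-20251001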

List models

curl -s http://localhost:8321/v1/models | jq '.data[].id'

Export your API keys first (the inference examples below read them from the environment):

export $(cat .env)
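
Note: export $(cat .env) only handles simple KEY=value lines. If your .env contains comments or quoted values, this variant is more robust:

set -a && source .env && set +a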

Inference (regular)

curl -s http://localhost:8321/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "X-LlamaStack-Provider-Data: {\"passthrough_api_key\": \"$OPENAI_API_KEY\", \"passthrough_url\": \"https://api.openai.com\"}" \
  -d '{
    "model": "openai/gpt-4o",
    "messages": [{"role": "user", "content": "Hello!"}]
  }' | jq '.choices[0].message.content'

Inference with Gemini

curl -s http://localhost:8321/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "X-LlamaStack-Provider-Data: {\"passthrough_api_key\": \"$GEMINI_API_KEY\", \"passthrough_url\": \"https://generativelanguage.googleapis.com/v1beta/openai\"}" \
  -d '{
    "model": "gemini/gemini-2.5-flash-lite",
    "messages": [{"role": "user", "content": "Which model are you?"}]
  }' | jq '.choices[0].message.content'

Inference with Anthropic (Claude)

Anthropic exposes an OpenAI-compatible endpoint, so remote::passthrough works here too.

curl -s http://localhost:8321/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "X-LlamaStack-Provider-Data: {\"passthrough_api_key\": \"$ANTHROPIC_API_KEY\", \"passthrough_url\": \"https://api.anthropic.com\"}" \
  -d '{
    "model": "anthropic/claude-haiku-4-5-20251001",
    "messages": [{"role": "user", "content": "Which model are you?"}]
  }' | jq '.choices[0].message.content'

Inference (streaming)

curl -s -N http://localhost:8321/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "X-LlamaStack-Provider-Data: {\"passthrough_api_key\": \"$OPENAI_API_KEY\", \"passthrough_url\": \"https://api.openai.com\"}" \
  -d '{
    "model": "openai/gpt-4o",
    "stream": true,
    "messages": [{"role": "user", "content": "Count from 1 to 5."}]
  }'
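
The output is a stream of server-sent events in the standard OpenAI streaming format, which the passthrough relays verbatim; each chunk looks roughly like:

data: {"choices":[{"index":0,"delta":{"content":"1"}}]}
data: [DONE]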

Embeddings (OpenAI)

curl -s http://localhost:8321/v1/embeddings \
  -H "Content-Type: application/json" \
  -H "X-LlamaStack-Provider-Data: {\"passthrough_api_key\": \"$OPENAI_API_KEY\", \"passthrough_url\": \"https://api.openai.com\"}" \
  -d '{
    "model": "openai/text-embedding-3-small",
    "input": "Hello, this is a test sentence for embeddings"
  }' | jq '.'

Embeddings (Gemini)

curl -s http://localhost:8321/v1/embeddings \
  -H "Content-Type: application/json" \
  -H "X-LlamaStack-Provider-Data: {\"passthrough_api_key\": \"$GEMINI_API_KEY\", \"passthrough_url\": \"https://generativelanguage.googleapis.com/v1beta/openai\"}" \
  -d '{
    "model": "gemini/gemini-embedding-001",
    "input": "Hello, this is a test sentence for embeddings"
  }' | jq '.'

Note: Gemini embeddings currently fail because remote::passthrough is a pure proxy: Gemini's OpenAI-compatible endpoint returns responses missing the required index and usage fields, which trips response validation. OpenAI embeddings work because their responses include all expected fields.
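
For comparison, a spec-compliant OpenAI embeddings response includes both fields the validator expects (sketch, vector truncated):

{
  "object": "list",
  "data": [{"object": "embedding", "index": 0, "embedding": [0.0023, -0.0091]}],
  "model": "text-embedding-3-small",
  "usage": {"prompt_tokens": 8, "total_tokens": 8}
}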

Python client

make client

Or:

export $(cat .env) && uv run python main.py
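
main.py is not reproduced here, but a minimal client along these lines works against this server (a sketch assuming the openai package; the header mirrors the curl examples above):

import json
import os

from openai import OpenAI

# Point the stock OpenAI client at the local Llama Stack server.
client = OpenAI(
    base_url="http://localhost:8321/v1",
    api_key="none",  # the server itself requires no key at boot
    default_headers={
        "X-LlamaStack-Provider-Data": json.dumps({
            "passthrough_api_key": os.environ["OPENAI_API_KEY"],
            "passthrough_url": "https://api.openai.com",
        })
    },
)

resp = client.chat.completions.create(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}],
)
print(resp.choices[0].message.content)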

Why remote::passthrough instead of remote::openai?

The remote::openai provider requires a valid API key at server startup because it auto-discovers models by calling OpenAI's /v1/models endpoint, and it validates each model registered via registered_resources against that discovered list.

remote::passthrough skips all of that:

  • register_model() does zero validation — any model ID is accepted
  • No auto-discovery at startup
  • API key is only needed at inference time, passed per-request

Tradeoffs:

  • The X-LlamaStack-Provider-Data header requires both passthrough_url and passthrough_api_key, even when base_url is set in run.yaml
  • No OpenAIMixin processing (no embedding metadata, no stream_options for usage stats in streaming)
  • Pure proxy — less integrated with llama-stack's model management
