ADR-800: Dynamic model catalog and OpenRouter support (#357)
Conversation
Proposes replacing hardcoded model lists with a database-backed catalog per provider, adding OpenRouter as a fourth inference endpoint, and integrating per-model pricing into cost estimation.
…R-800)
- Add provider_model_catalog table (migration 059) with pricing, capabilities, and curation fields per model per provider
- Seed catalog with known models and pricing (migration 060)
- Add OpenRouterProvider using OpenAI SDK with custom base URL
- Add fetch_model_catalog() to all providers for dynamic discovery
- Add model_catalog.py with upsert/query/enable/default/pricing helpers
- Add admin API endpoints: GET/POST/PUT /admin/models/catalog/*
- Add configure.py 'models' subcommand: list, refresh, enable, disable, default, price
- Update CostEstimator to read pricing from catalog before env vars
- Wire OpenRouter into provider factory, rate limiter, and configure.py
- Update list_available_models() on all providers to prefer catalog
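The catalog-before-env-var pricing order can be sketched roughly as follows (the function name, dict shape, and the OPENROUTER_INPUT_PRICE env var are illustrative assumptions, not the actual identifiers):

```python
import os

# Sketch only: catalog price wins; a legacy per-provider env var is the fallback.
def price_per_million(catalog, provider, model):
    row = catalog.get((provider, model))
    if row and row.get("input_price") is not None:
        return row["input_price"]
    env_value = os.environ.get(f"{provider.upper()}_INPUT_PRICE")  # hypothetical name
    return float(env_value) if env_value else None

catalog = {("openrouter", "gpt-4o"): {"input_price": 2.5}}
print(price_per_million(catalog, "openrouter", "gpt-4o"))   # 2.5
print(price_per_million(catalog, "openrouter", "unknown"))  # None (no env fallback set)
```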
Replace hardcoded OpenAI setup in guided-init.sh with interactive flow:
- Step 4: Choose provider (OpenAI, Anthropic, OpenRouter)
- Step 5: Enter and validate API key
- Step 6: Refresh model catalog, present filtered menu, user picks model

OpenRouter shows a curated subset (GPT-4o, Claude, Gemini, Llama, etc.) with option [0] to show all 200+ models. Ollama noted as post-init config.

Also adds --tsv, --category, --limit flags to configure.py models list for machine-parseable output used by the init script.
Needed by the OpenRouter and Ollama fetch_model_catalog() implementations, which call requests.get() directly rather than going through the OpenAI SDK.
Follow existing pattern — INSERT INTO public.schema_migrations at the end of each migration file.
058 uses AGE Cypher DDL (CREATE VLABEL/ELABEL) which can fail on cold start when AGE isn't fully initialized. Wrap each label creation in EXCEPTION handler so failures log a notice instead of aborting — labels get created on first use anyway. Also add ON_ERROR_STOP=1 to psql in migrate-db.sh so failed migrations don't self-register via the INSERT at the end of the file. Previously, a mid-file failure would still record the migration as applied, then the runner would exit, blocking all subsequent migrations.
Split migrations into two phases:
- schema/migrations/ — standard SQL (tables, indexes, permissions)
- schema/migrations-warm/ — AGE/Cypher DDL (needs running graph engine)

Move 058 (precreate_graph_labels) to migrations-warm. After cold migrations, restart postgres so AGE is fully initialized, verify it's healthy, then apply warm migrations.

Also:
- Migration runner continues past failures instead of stopping
- configure.py ai-provider handles missing catalog table gracefully
- Runner supports --warm flag to select migration directory
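The two-phase runner order can be sketched like this (a Python sketch of the flow only — the real runner is the migrate-db.sh shell script, and the 060 file name is a guess):

```python
# Sketch of the cold/warm phase ordering; apply_migration stands in for psql.
applied = []

def apply_migration(name):
    applied.append(name)

def run_migrations(cold, warm, restart_postgres, age_healthy):
    for m in cold:                       # phase 1: plain SQL, safe on a cold database
        try:
            apply_migration(m)
        except Exception as exc:         # continue past failures instead of stopping
            print(f"{m} failed: {exc}")
    restart_postgres()                   # restart so AGE finishes initializing
    if not age_healthy():
        raise RuntimeError("AGE not healthy; refusing warm migrations")
    for m in warm:                       # phase 2: AGE/Cypher DDL
        try:
            apply_migration(m)
        except Exception as exc:
            print(f"{m} failed: {exc}")

run_migrations(
    cold=["059_provider_model_catalog.sql", "060_seed_model_catalog.sql"],
    warm=["058_precreate_graph_labels.sql"],
    restart_postgres=lambda: None,
    age_healthy=lambda: True,
)
print(applied)
```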
…input

The ai_extraction_config table has a CHECK constraint that only allowed openai, anthropic, ollama. Add openrouter via migration 061. Also remove the -s flag from the API key read so input is visible — easier to verify the key was pasted correctly.
NUMERIC(12,6) overflows on high-cost OpenRouter models (image generation, specialized). Use unconstrained NUMERIC, which handles arbitrary precision.
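For reference, NUMERIC(12,6) allows at most six digits before and six after the decimal point, so anything above 999999.999999 overflows. A quick Decimal sketch of that bound (the helper name and the sample price are invented):

```python
from decimal import Decimal

# NUMERIC(12,6): 12 significant digits total, 6 after the decimal point,
# so the largest representable value is 999999.999999.
NUMERIC_12_6_MAX = Decimal("999999.999999")

def fits_numeric_12_6(value):
    """True if value would fit in a NUMERIC(12,6) column."""
    quantized = value.quantize(Decimal("0.000001"))
    return abs(quantized) <= NUMERIC_12_6_MAX

print(fits_numeric_12_6(Decimal("15.00")))       # True  — typical per-1M-token price
print(fits_numeric_12_6(Decimal("1234567.89")))  # False — overflows the column
```

Unconstrained NUMERIC has no such ceiling, which is why the migration dropped the precision spec.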
No backwards compatibility needed — merge table creation, seed data, openrouter constraint, and NUMERIC widening into one migration.
[$] cycles through: unsorted → cheapest first → most expensive first. [0] toggles between curated and full model list. Both compose — sort applies to whichever list is currently shown.
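A minimal sketch of how the two controls compose (data, field names, and the helper are invented for illustration):

```python
# [0] toggles curated/full, [$] cycles the sort; the sort is applied to
# whichever list is currently visible.
SORTS = [None, "asc", "desc"]  # unsorted -> cheapest first -> most expensive first

def visible_models(models, curated_only, sort_index):
    shown = [m for m in models if m["curated"] or not curated_only]
    sort = SORTS[sort_index % len(SORTS)]
    if sort:
        shown = sorted(shown, key=lambda m: m["price"], reverse=(sort == "desc"))
    return [m["id"] for m in shown]

models = [
    {"id": "gpt-4o", "price": 5.0, "curated": True},
    {"id": "llama-3-8b", "price": 0.2, "curated": True},
    {"id": "niche-model", "price": 9.0, "curated": False},
]
print(visible_models(models, curated_only=True, sort_index=1))
# ['llama-3-8b', 'gpt-4o']
print(visible_models(models, curated_only=False, sort_index=2))
# ['niche-model', 'gpt-4o', 'llama-3-8b']
```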
The catalog ID is the 2nd positional arg which argparse maps to provider_name, not model_id. Fall back to provider_name for actions that expect a catalog ID.
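The mismatch can be reproduced with a stripped-down parser (a sketch — the real configure.py argument names may differ): with two optional positionals, a lone ID binds to the first one.

```python
import argparse

# With both positionals optional, `models enable 42` fills provider_name,
# leaving model_id empty — hence the fallback for ID-taking actions.
parser = argparse.ArgumentParser(prog="configure.py")
sub = parser.add_subparsers(dest="command")
models = sub.add_parser("models")
models.add_argument("action")
models.add_argument("provider_name", nargs="?")
models.add_argument("model_id", nargs="?")

args = parser.parse_args(["models", "enable", "42"])
catalog_id = args.model_id or args.provider_name  # fall back to provider_name
print(args.provider_name, args.model_id, catalog_id)
# 42 None 42
```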
4096 truncates extraction responses for verbose models via OpenRouter. Bump to 16384 as an interim fix — the proper solution is catalog-driven token limits per model.
- Add --max-tokens flag to configure.py ai-provider
- Store max_tokens in ai_extraction_config table (column already existed)
- OpenRouterProvider reads max_tokens from config (default 16384)
- Init flow prompts for max tokens with press-enter-to-accept default
- Factory passes max_tokens from database config to provider
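The resolution order above (database config first, provider default second) amounts to something like this sketch — the helper name and config shape are assumptions:

```python
# Sketch: the stored max_tokens wins; otherwise the provider default applies.
DEFAULT_MAX_TOKENS = 16384

def resolve_max_tokens(db_config):
    value = db_config.get("max_tokens")
    return int(value) if value else DEFAULT_MAX_TOKENS

print(resolve_max_tokens({"max_tokens": 32768}))  # 32768 — explicit config
print(resolve_max_tokens({}))                     # 16384 — fallback default
```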
Code Review: ADR-800 Dynamic Model Catalog & OpenRouter Support

PR Size: ~2500 lines across 15 files -- thorough review focused on patterns, security, and structural concerns.

What this changes: Adds a database-backed model catalog per provider, an OpenRouter provider, and catalog-driven pricing for cost estimation.

SECURITY: API Key Exposed in Process List

Problem: The API key is passed as a command-line argument:

```bash
docker exec kg-operator python /workspace/operator/configure.py api-key "$AI_PROVIDER" --key "$AI_KEY"
```

This makes the key visible in the process list while the command runs.

Why it matters: On multi-user systems or if any monitoring tool captures process lists, the API key and admin password are transiently exposed in plaintext.

Suggestion: Pipe the key via stdin or use an environment variable passed to docker exec:

```bash
echo "$AI_KEY" | docker exec -i kg-operator python /workspace/operator/configure.py api-key "$AI_PROVIDER" --key-stdin
```

Severity: Medium. The exposure window is brief and this is a local operator tool, but it is worth addressing before the init flow sees wider use.

SECURITY: Admin Routes Missing Authentication

Problem: The other admin route files pull in auth dependencies:

```python
from ..dependencies.auth import CurrentUser, require_role, require_permission
# ...
_: None = Depends(require_permission("extraction_config", "read"))
```

The models routes are registered with no such dependency.

Why it matters: An unauthenticated user could enable expensive models, change defaults, or refresh the catalog (which makes outbound API calls using stored keys).

Suggestion: Add the same require_permission dependencies used by the other admin routes.

Severity: High -- this should be addressed before merge.

QUALITY: Repeated Connection Boilerplate in Routes

Problem: Every route handler repeats the same pattern:

```python
client = AGEClient()
conn = client.pool.getconn()
try:
    # ...
finally:
    client.pool.putconn(conn)
```

This appears 6 times in 180 lines. It violates DRY and creates risk -- if someone forgets the finally block, connections leak.

Why it matters: Connection leak risk, maintenance burden, and inconsistency across handlers.

Suggestion: Extract a context manager or FastAPI dependency:

```python
def get_db_connection():
    client = AGEClient()
    conn = client.pool.getconn()
    try:
        yield conn
    finally:
        client.pool.putconn(conn)

# Then in routes:
@admin_router.get("/catalog")
async def get_catalog(conn=Depends(get_db_connection), ...):
    results = list_catalog(conn, ...)
```

Check if other route files in this project already have such a dependency -- if so, reuse it.

Severity: Low-medium. Functional but should be cleaned up.

QUALITY: Error Detail Leaks Internal State

Problem: Exception messages are passed directly to the HTTP response:

```python
raise HTTPException(status_code=500, detail=str(e))
```

This can expose database connection strings, SQL errors, file paths, and stack trace fragments to API consumers.

Suggestion: Log the full error server-side (already done), but return a generic message to the client:

```python
raise HTTPException(status_code=500, detail="Failed to list catalog")
```

SCHEMA: ADR Specifies NUMERIC(12,6) but Migration Uses Unconstrained NUMERIC

Problem: The ADR specifies NUMERIC(12,6) for pricing columns, while the merged migration uses unconstrained NUMERIC.

Assessment: This is actually fine in practice -- PostgreSQL's unconstrained NUMERIC stores values exactly and avoids the overflow seen with high-cost models.

Severity: Informational -- no action needed on the code, but the ADR text is now slightly misleading.

DESIGN: set_model_default Has a Race Condition

Problem: Clearing the previous default and setting the new one happen in separate statements, so concurrent calls could leave zero or two defaults.

Suggestion: This is fine for the current single-operator use pattern, but if this ever gets concurrent API access, consider doing both updates in one transaction with a row lock.

Severity: Low -- single-operator tool, but worth noting.

DESIGN: CostEstimator Creates DB Connections Per Call

Problem: Each pricing lookup constructs its own client and checks out a pool connection.

Why it matters: Inconsistency with the route handlers, plus unnecessary per-call overhead.

Suggestion: Consider passing a connection through from the caller, or at least caching the client instance.

Severity: Low -- pool handles this fine at current scale.

ROBUSTNESS: guided-init.sh API Key Input Not Masked

Problem: The API key is read with plain

```bash
read -p "${AI_KEY_PROMPT}: " AI_KEY
```

Suggestion: Use

```bash
read -s -p "${AI_KEY_PROMPT}: " AI_KEY
echo ""
```

Severity: Medium -- usability/security issue during interactive setup. (Note: the key input was deliberately unmasked in an earlier commit so a pasted key can be visually verified -- that tradeoff is worth revisiting.)

POSITIVE: OpenRouterProvider Follows Existing Patterns Well

The provider reuses the OpenAI SDK with a custom base URL and matches the structure of the existing providers.

POSITIVE: Two-Phase Migration Design

The cold/warm migration split (schema/migrations/ vs schema/migrations-warm/) cleanly resolves the AGE cold-start race.

POSITIVE: fetch_model_catalog as Non-Abstract Default

Making fetch_model_catalog() a non-abstract default means providers without a discovery API keep working unchanged.

POSITIVE: Schema Design

The provider_model_catalog table's pricing, capability, and curation fields are well factored.

Summary

Recommendation: Fix the missing authentication on the admin model routes before merge; the remaining items can be follow-ups.
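On the Python side, the --key-stdin flag the review suggests could look roughly like this (a sketch: the flag name comes from the review's shell one-liner, and the helper is not merged code):

```python
import argparse
import io
import sys

# Reading the key from stdin keeps it out of the process argument list,
# which is what `echo "$AI_KEY" | ... --key-stdin` relies on.
def read_api_key(argv, stdin=sys.stdin):
    parser = argparse.ArgumentParser()
    parser.add_argument("--key")
    parser.add_argument("--key-stdin", action="store_true")
    args = parser.parse_args(argv)
    if args.key_stdin:
        return stdin.readline().strip()
    return args.key

print(read_api_key(["--key-stdin"], stdin=io.StringIO("sk-test-123\n")))
# sk-test-123
```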
Summary
- OpenRouterProvider class using OpenAI SDK with custom base URL, supporting 200+ models through a single API key
- provider_model_catalog database table replaces hardcoded model lists; stores per-model pricing, capabilities, and operator curation state
- operator.sh init now offers provider selection (OpenAI/Anthropic/OpenRouter), API key validation, catalog-driven model browsing with price sort, and configurable max_tokens
- schema/migrations/ (cold SQL) separated from schema/migrations-warm/ (AGE/Cypher DDL) to fix a cold-start race condition with migration 058
- CostEstimator reads pricing from catalog table before falling back to env vars

Key files
- api/app/lib/ai_providers.py (OpenRouterProvider + fetch_model_catalog on all providers)
- api/app/lib/model_catalog.py, api/app/routes/models.py
- schema/migrations/059_provider_model_catalog.sql
- operator/configure.py (models subcommand), operator/lib/guided-init.sh
- operator/database/migrate-db.sh (--warm, continue-on-fail), operator/lib/start-infra.sh
- api/app/services/job_analysis.py

ADR
docs/architecture/ai-embeddings/ADR-800-dynamic-model-catalog-and-openrouter-support.md

Test plan
- operator.sh init with OpenRouter — provider selection, key validation, catalog refresh (346 models), model selection with price sort, max_tokens prompt
- configure.py models list/refresh/enable/disable/default/price commands
- Admin API endpoints (/admin/models/catalog/*)