
ADR-800: Dynamic model catalog and OpenRouter support#357

Open
aaronsb wants to merge 14 commits into main from feature/adr-086-model-catalog

Conversation

aaronsb (Owner) commented Mar 15, 2026

Summary

  • OpenRouter provider — new OpenRouterProvider class using the OpenAI SDK with a custom base URL, supporting 200+ models through a single API key
  • Dynamic model catalog — provider_model_catalog database table replaces hardcoded model lists; stores per-model pricing, capabilities, and operator curation state
  • Interactive init flow — operator.sh init now offers provider selection (OpenAI/Anthropic/OpenRouter), API key validation, catalog-driven model browsing with price sort, and configurable max_tokens
  • Two-phase migrations — split schema/migrations/ (cold SQL) from schema/migrations-warm/ (AGE/Cypher DDL) to fix a cold-start race condition with migration 058
  • Catalog API endpoints — admin routes for listing, refreshing, enabling/disabling, and pricing models
  • Cost estimator integration — CostEstimator reads pricing from the catalog table before falling back to env vars
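The catalog-before-env lookup order described in the last bullet can be sketched as follows (a minimal illustration; function and variable names are mine, not the actual CostEstimator code):

```python
import os
from decimal import Decimal

def resolve_price(catalog_price, env_var, default=None):
    """Prefer the catalog's per-model price; fall back to an env var,
    then to a caller-supplied default. Illustrative sketch only."""
    if catalog_price is not None:
        return Decimal(str(catalog_price))
    env_value = os.environ.get(env_var)
    if env_value is not None:
        return Decimal(env_value)
    return default
```

Using Decimal throughout avoids float rounding on very small per-token prices.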

Key files

Area        Files
Provider    api/app/lib/ai_providers.py (OpenRouterProvider + fetch_model_catalog on all providers)
Catalog     api/app/lib/model_catalog.py, api/app/routes/models.py
Schema      schema/migrations/059_provider_model_catalog.sql
Operator    operator/configure.py (models subcommand), operator/lib/guided-init.sh
Migrations  operator/database/migrate-db.sh (--warm, continue-on-fail), operator/lib/start-infra.sh
Cost        api/app/services/job_analysis.py

ADR

docs/architecture/ai-embeddings/ADR-800-dynamic-model-catalog-and-openrouter-support.md

Test plan

  • Cold operator.sh init with OpenRouter — provider selection, key validation, catalog refresh (346 models), model selection with price sort, max_tokens prompt
  • Document ingestion via OpenRouter/GPT-5.1 — concepts extracted successfully
  • MCP search against extracted concepts — working
  • Two-phase migrations — cold migrations apply, postgres restarts, warm migration 058 applies after AGE init
  • Cold init with OpenAI (regression)
  • Cold init with Anthropic (regression)
  • configure.py models list/refresh/enable/disable/default/price commands
  • Admin API endpoints (/admin/models/catalog/*)

aaronsb added 14 commits March 15, 2026 15:33
Proposes replacing hardcoded model lists with a database-backed catalog
per provider, adding OpenRouter as a fourth inference endpoint, and
integrating per-model pricing into cost estimation.
…R-800)

- Add provider_model_catalog table (migration 059) with pricing, capabilities,
  and curation fields per model per provider
- Seed catalog with known models and pricing (migration 060)
- Add OpenRouterProvider using OpenAI SDK with custom base URL
- Add fetch_model_catalog() to all providers for dynamic discovery
- Add model_catalog.py with upsert/query/enable/default/pricing helpers
- Add admin API endpoints: GET/POST/PUT /admin/models/catalog/*
- Add configure.py 'models' subcommand: list, refresh, enable, disable, default, price
- Update CostEstimator to read pricing from catalog before env vars
- Wire OpenRouter into provider factory, rate limiter, and configure.py
- Update list_available_models() on all providers to prefer catalog
Replace hardcoded OpenAI setup in guided-init.sh with interactive flow:
- Step 4: Choose provider (OpenAI, Anthropic, OpenRouter)
- Step 5: Enter and validate API key
- Step 6: Refresh model catalog, present filtered menu, user picks model

OpenRouter shows curated subset (GPT-4o, Claude, Gemini, Llama, etc.)
with option [0] to show all 200+ models. Ollama noted as post-init config.

Also adds --tsv, --category, --limit flags to configure.py models list
for machine-parseable output used by the init script.
Needed by OpenRouter and Ollama fetch_model_catalog() which use
requests.get() directly rather than the OpenAI SDK.
Follow existing pattern — INSERT INTO public.schema_migrations at the
end of each migration file.
058 uses AGE Cypher DDL (CREATE VLABEL/ELABEL) which can fail on cold
start when AGE isn't fully initialized. Wrap each label creation in
EXCEPTION handler so failures log a notice instead of aborting — labels
get created on first use anyway.

Also add ON_ERROR_STOP=1 to psql in migrate-db.sh so failed migrations
don't self-register via the INSERT at the end of the file. Previously,
a mid-file failure would still record the migration as applied, then
the runner would exit, blocking all subsequent migrations.
Split migrations into two phases:
- schema/migrations/     — standard SQL (tables, indexes, permissions)
- schema/migrations-warm/ — AGE/Cypher DDL (needs running graph engine)

Move 058 (precreate_graph_labels) to migrations-warm. After cold
migrations, restart postgres so AGE is fully initialized, verify it's
healthy, then apply warm migrations.

Also:
- Migration runner continues past failures instead of stopping
- configure.py ai-provider handles missing catalog table gracefully
- Runner supports --warm flag to select migration directory
…input

The ai_extraction_config table has a CHECK constraint that only allowed
openai, anthropic, ollama. Add openrouter via migration 061.

Also remove -s flag from API key read so input is visible — easier to
verify the key was pasted correctly.
NUMERIC(12,6) overflows on high-cost OpenRouter models (image generation,
specialized). Use unconstrained NUMERIC which handles arbitrary precision.
No backwards compatibility needed — merge table creation, seed data,
openrouter constraint, and NUMERIC widening into one migration.
[$] cycles through: unsorted → cheapest first → most expensive first.
[0] toggles between curated and full model list. Both compose — sort
applies to whichever list is currently shown.
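The composition described above can be sketched as a pure function over the current view state (field names are hypothetical, not the actual configure.py data model):

```python
def apply_view(models, sort_mode, show_all):
    """Compose the [0] curated/full toggle with the [$] sort cycle.
    sort_mode is one of 'unsorted', 'cheapest', 'priciest'; the sort
    always applies to whichever list the toggle currently selects."""
    view = list(models) if show_all else [m for m in models if m["curated"]]
    if sort_mode == "cheapest":
        view.sort(key=lambda m: m["price"])
    elif sort_mode == "priciest":
        view.sort(key=lambda m: m["price"], reverse=True)
    return view
```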
The catalog ID is the 2nd positional arg which argparse maps to
provider_name, not model_id. Fall back to provider_name for actions
that expect a catalog ID.
4096 truncates extraction responses for verbose models via OpenRouter.
Bump to 16384 as interim fix — proper solution is catalog-driven token
limits per model.
- Add --max-tokens flag to configure.py ai-provider
- Store max_tokens in ai_extraction_config table (column already existed)
- OpenRouterProvider reads max_tokens from config (default 16384)
- Init flow prompts for max tokens with press-enter-to-accept default
- Factory passes max_tokens from database config to provider
aaronsb (Owner, Author) commented Mar 15, 2026

Code Review: ADR-800 Dynamic Model Catalog & OpenRouter Support

PR Size: ~2500 lines across 15 files -- thorough review focused on patterns, security, and structural concerns.

What this changes: Adds a provider_model_catalog database table for runtime model discovery, implements an OpenRouter provider for 200+ model access via unified API, wires up operator init flow with interactive model selection, and introduces a two-phase (cold/warm) migration strategy.


SECURITY: API Key Exposed in Process List

Location: operator/lib/guided-init.sh:385

Problem: The API key is passed as a command-line argument to docker exec:

docker exec kg-operator python /workspace/operator/configure.py api-key "$AI_PROVIDER" --key "$AI_KEY"

This makes the key visible in ps aux output, docker inspect, and host process listings for the duration of the command. Same issue at line 561 with --max-tokens (benign) and line 308 with --password (also a concern).

Why it matters: On multi-user systems or if any monitoring tool captures process lists, the API key and admin password are transiently exposed in plaintext.

Suggestion: Pipe the key via stdin or use an environment variable passed to docker exec:

echo "$AI_KEY" | docker exec -i kg-operator python /workspace/operator/configure.py api-key "$AI_PROVIDER" --key-stdin

Or use -e flag: docker exec -e API_KEY="$AI_KEY" ... and read from env inside the script.

Severity: Medium. The exposure window is brief and this is a local operator tool, but it is worth addressing before the init flow sees wider use.


SECURITY: Admin Routes Missing Authentication

Location: api/app/routes/models.py -- all endpoints (lines 32-179)

Problem: The admin_router endpoints have no authentication or authorization checks. Compare with api/app/routes/extraction.py which uses:

from ..dependencies.auth import CurrentUser, require_role, require_permission
# ...
_: None = Depends(require_permission("extraction_config", "read"))

The models routes are registered at /admin/models/* but anyone with network access to the API can list, refresh, enable, disable, or modify pricing for catalog models.

Why it matters: An unauthenticated user could enable expensive models, change defaults, or refresh the catalog (which makes outbound API calls using stored keys).

Suggestion: Add require_permission dependencies matching the pattern in extraction.py. At minimum, read operations should require "model_catalog", "read" and mutating operations should require "model_catalog", "write".

Severity: High -- this should be addressed before merge. The /admin/models/catalog/refresh endpoint is particularly sensitive since it triggers outbound API calls with stored credentials.


QUALITY: Repeated Connection Boilerplate in Routes

Location: api/app/routes/models.py -- every endpoint (lines 39-49, 83-93, 103-115, etc.)

Problem: Every route handler repeats the same pattern:

client = AGEClient()
conn = client.pool.getconn()
try:
    # ...
finally:
    client.pool.putconn(conn)

This appears 6 times in 180 lines. It violates DRY and creates risk -- if someone forgets the finally, connections leak.

Why it matters: Connection leak risk, maintenance burden, and inconsistency (some handlers catch HTTPException and re-raise, others don't).

Suggestion: Extract a context manager or FastAPI dependency:

def get_db_connection():
    client = AGEClient()
    conn = client.pool.getconn()
    try:
        yield conn
    finally:
        client.pool.putconn(conn)

# Then in routes:
@admin_router.get("/catalog")
async def get_catalog(conn=Depends(get_db_connection), ...):
    results = list_catalog(conn, ...)

Check if other route files in this project already have such a dependency -- if so, reuse it.

Severity: Low-medium. Functional but should be cleaned up.


QUALITY: Error Detail Leaks Internal State

Location: api/app/routes/models.py:49, 97, 115, 133, 151, 179

Problem: Exception messages are passed directly to the HTTP response:

raise HTTPException(status_code=500, detail=str(e))

This can expose database connection strings, SQL errors, file paths, and stack trace fragments to API consumers.

Suggestion: Log the full error server-side (already done), but return a generic message to the client:

raise HTTPException(status_code=500, detail="Failed to list catalog")

SCHEMA: ADR Specifies NUMERIC(12,6) but Migration Uses Unconstrained NUMERIC

Location: schema/migrations/059_provider_model_catalog.sql:25-27 vs ADR-800 line 59-61

Problem: The ADR specifies NUMERIC(12, 6) for pricing columns, but the migration uses bare NUMERIC (unconstrained). Commit f9011a9e explicitly widened from constrained to unconstrained with the message "widen pricing columns."

Assessment: This is actually fine in practice -- PostgreSQL's unconstrained NUMERIC avoids truncation issues with very small per-token prices (OpenRouter prices can be extremely small fractions). The ADR should be updated to match the implementation, or a note added explaining the deviation.

Severity: Informational -- no action needed on the code, but the ADR text is now slightly misleading.


DESIGN: set_model_default Has a Race Condition

Location: api/app/lib/model_catalog.py:144-178

Problem: set_model_default performs a read-then-two-writes pattern without wrapping in a transaction:

  1. SELECT the provider/category (line 152-157)
  2. UPDATE to clear existing default (line 163-168)
  3. UPDATE to set new default (line 171-176)
  4. COMMIT (line 177)

Although psycopg's implicit transaction spans all three statements until the final conn.commit(), the partial unique index idx_catalog_default could still cause trouble if two concurrent requests set defaults for the same provider+category simultaneously -- the second request's step 3 would violate the unique index before the first request's step 2 commits.

Suggestion: This is fine for the current single-operator use pattern, but if this ever gets concurrent API access, consider adding SELECT ... FOR UPDATE on the initial read, or using a single CTE-based statement.

Severity: Low -- single-operator tool, but worth noting.
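One way to serialize writers per provider+category is a transaction-scoped advisory lock -- a variant of the suggestion above, since a plain FOR UPDATE on the target row does not block a concurrent writer targeting a different row in the same group. Table and column names are assumed from the migration; this is a sketch, not the actual model_catalog.py code:

```python
def set_model_default_serialized(conn, catalog_id):
    """Clear-and-set the default under a per-(provider, category)
    advisory lock so concurrent default changes serialize. Sketch only."""
    with conn.cursor() as cur:
        cur.execute(
            "SELECT provider, category FROM provider_model_catalog WHERE id = %s",
            (catalog_id,),
        )
        provider, category = cur.fetchone()
        # Any concurrent call for the same provider+category blocks here
        # until this transaction commits, avoiding the transient
        # unique-index violation.
        cur.execute(
            "SELECT pg_advisory_xact_lock(hashtext(%s || ':' || %s))",
            (provider, category),
        )
        cur.execute(
            "UPDATE provider_model_catalog SET is_default = FALSE "
            "WHERE provider = %s AND category = %s AND is_default",
            (provider, category),
        )
        cur.execute(
            "UPDATE provider_model_catalog SET is_default = TRUE WHERE id = %s",
            (catalog_id,),
        )
    conn.commit()
```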


DESIGN: CostEstimator Creates DB Connections Per Call

Location: api/app/services/job_analysis.py:49-62

Problem: _get_catalog_price creates a new AGEClient and gets a connection from the pool on every cost lookup. During job analysis, this is called twice (extraction + embedding). The broader pattern in model_catalog.py functions also expects a connection to be passed in, but this helper creates its own.

Why it matters: Inconsistency -- model_catalog.py functions take conn as a parameter (good, testable), but the cost estimator creates its own connections (harder to test, pool churn).

Suggestion: Consider passing a connection through from the caller, or at least caching the client instance.

Severity: Low -- pool handles this fine at current scale.
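A minimal way to avoid the per-call construction is memoizing the client (AGEClient is the project's class, stubbed here so the sketch is self-contained):

```python
from functools import lru_cache

class AGEClient:  # stand-in for the project's real client class
    pass

@lru_cache(maxsize=1)
def shared_age_client():
    """Construct AGEClient once and reuse it across cost lookups,
    so _get_catalog_price stops paying client setup on every call."""
    return AGEClient()
```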


ROBUSTNESS: guided-init.sh API Key Input Not Masked

Location: operator/lib/guided-init.sh:374

Problem: The API key is read with plain read -p, not read -s -p (silent mode). The key is visible on screen as the user types.

read -p "${AI_KEY_PROMPT}: " AI_KEY

Suggestion: Use read -s -p and echo a newline after:

read -s -p "${AI_KEY_PROMPT}: " AI_KEY
echo ""

Severity: Medium -- usability/security issue during interactive setup. (Note: I see the ADR mentions the key input was changed to "unmask" in commit 986d6726 -- if this was intentional for copy-paste UX, document the tradeoff.)


POSITIVE: OpenRouterProvider Follows Existing Patterns Well

The OpenRouterProvider class (ai_providers.py:954-1288) is well-structured:

  • Follows the same AIProvider ABC interface as all other providers
  • Properly delegates embeddings to embedding_provider (OpenRouter doesn't serve embeddings)
  • Uses _load_api_key with the standard fallback chain (ADR-031)
  • Integrates with the rate limiter (get_provider_max_retries)
  • The fetch_model_catalog implementation correctly converts OpenRouter's per-token pricing strings to per-1M-token numerics
  • MIME type detection in describe_image matches the OpenAI provider pattern
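The per-token to per-1M-token conversion praised above amounts to something like this (a sketch, assuming OpenRouter reports pricing as per-token USD strings):

```python
from decimal import Decimal

PER_MILLION = Decimal(1_000_000)

def per_token_to_per_million(price_str):
    """Convert a per-token USD price string to the catalog's per-1M-token
    numeric. Decimal avoids float rounding on tiny per-token prices."""
    return Decimal(price_str) * PER_MILLION
```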

POSITIVE: Two-Phase Migration Design

The cold/warm migration split (migrate-db.sh --warm + schema/migrations-warm/) is a practical solution to the AGE extension chicken-and-egg problem. Cold migrations run pure SQL before AGE is fully initialized; warm migrations run after a PostgreSQL restart brings AGE online. The migration runner handles both phases cleanly, and start-infra.sh orchestrates the restart between them (lines 127-140).

POSITIVE: fetch_model_catalog as Non-Abstract Default

Making fetch_model_catalog a concrete method with an empty-list default (ai_providers.py:207) rather than an abstract method is the right call. It follows Open/Closed -- existing providers work unchanged, new providers opt in. Good application of Interface Segregation.

POSITIVE: Schema Design

The provider_model_catalog table design is solid:

  • Composite unique on (provider, model_id, category) handles the overlap case well
  • Partial unique index for is_default per provider+category is elegant
  • ON CONFLICT DO NOTHING in seed data is idempotent
  • upstream_provider for OpenRouter model provenance tracking

Summary

Category                 Count  Action
Security (auth missing)    1    Block -- add auth before merge
Security (key exposure)    2    Fix before wider deployment
Code quality               3    Address in follow-up
Schema/design              2    Informational
Positive patterns          4    --

Recommendation: Fix the missing authentication on /admin/models/* routes before merge. The API key exposure in guided-init.sh and the input masking are secondary but should be addressed soon. Everything else is solid work -- the OpenRouter integration, catalog design, and two-phase migration are well-thought-out.
