This project provides a CLI to build or update a persisted vectorstore from
`data/gcam_simulations.json` and to query it by embedding similarity to a free-text topic.
AI integrations use pydantic-ai with an OpenAI-compatible embeddings endpoint.
The default persistence backend is FAISS (.faiss index + JSON metadata), which
scales better than storing full embedding vectors in JSON.
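The sketch below puts rough numbers on that claim. It compares embeddings serialized as JSON text with the same vectors packed as raw float32 values, which is how a flat (uncompressed) FAISS index holds them in memory. The project's actual index type is not stated here. The 3072 dimension is the default output size of `text-embedding-3-large`. `N_VECTORS` and the random values are placeholders, not real data.

```python
import json
import random
import struct

DIM = 3072        # default output size of text-embedding-3-large
N_VECTORS = 100   # hypothetical number of simulations

random.seed(0)
# Placeholder vectors standing in for real embeddings.
vectors = [[random.uniform(-1.0, 1.0) for _ in range(DIM)] for _ in range(N_VECTORS)]

# Embeddings stored inline as JSON text: each float is written out as decimal digits.
json_bytes = len(json.dumps(vectors).encode("utf-8"))

# The same vectors as contiguous little-endian float32 values, 4 bytes per component.
binary_bytes = sum(len(struct.pack(f"<{DIM}f", *vec)) for vec in vectors)

print(f"JSON:    {json_bytes / 1e6:.1f} MB")
print(f"float32: {binary_bytes / 1e6:.1f} MB")
print(f"ratio:   {json_bytes / binary_bytes:.1f}x")
```

The gap grows linearly with the number of simulations. A FAISS index also avoids parsing all of that text on every load.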
- Create and activate a virtual environment.
- Install the project in editable mode:

```bash
pip install -e .
```

- Configure OpenAI credentials in a `.env` file (if using the OpenAI backend):
```env
OPENAI_API_KEY=your_api_key
OPENAI_BASE_URL=https://your-api-url/v1
OPENAI_EMBEDDING_MODEL=text-embedding-3-large
EXPLORER_VECTORSTORE_PATH=data/gcam_vectorstore.json
```

`OPENAI_API_URL` is optional. You can use `OPENAI_BASE_URL` instead, as in the example above.
EXPLORER_VECTORSTORE_PATH is optional. If set, CLI commands use it as the default
for --vectorstore-path.
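With `argparse`, one common way to get this behaviour is to read the environment variable when the parser is built. An explicit `--vectorstore-path` then still overrides it. This is a sketch, not the project's actual parser. `build_parser` and `FALLBACK_PATH` are hypothetical names, and whether the real CLI has any fallback when the variable is unset is an assumption.

```python
import argparse
import os
from typing import Mapping

# Hypothetical fallback used only when EXPLORER_VECTORSTORE_PATH is unset.
FALLBACK_PATH = "data/gcam_vectorstore.json"


def build_parser(environ: Mapping[str, str] = os.environ) -> argparse.ArgumentParser:
    """Sketch of a parser whose --vectorstore-path default comes from the environment."""
    parser = argparse.ArgumentParser(prog="explorer-cli")
    parser.add_argument(
        "--vectorstore-path",
        # Read once, when the parser is built; an explicit flag still overrides it.
        default=environ.get("EXPLORER_VECTORSTORE_PATH", FALLBACK_PATH),
        help="Path to the vectorstore metadata JSON file",
    )
    return parser


# No flag given, so the environment value becomes the default.
print(build_parser({"EXPLORER_VECTORSTORE_PATH": "custom/store.json"}).parse_args([]).vectorstore_path)
# custom/store.json
```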
Default behavior is an incremental update:

- existing `simulation_id` entries keep their persisted embeddings;
- only new `simulation_id` entries are embedded.
```bash
explorer-cli build \
  --json-path data/gcam_simulations.json \
  --vectorstore-path data/gcam_vectorstore.json \
  --env-path /path/to/.env
```

Use `--overwrite` to force a complete rebuild:
```bash
explorer-cli build \
  --json-path data/gcam_simulations.json \
  --vectorstore-path data/gcam_vectorstore.json \
  --env-path /path/to/.env \
  --overwrite
```

The build command writes metadata to `data/gcam_vectorstore.json` and vectors to
`data/gcam_vectorstore.faiss`.
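The incremental and `--overwrite` rules can be sketched in plain Python. This is not the project's code. `update_embeddings`, `faiss_path_for`, the `embed` callback and the `description` text field are hypothetical. The real embedding call goes through pydantic-ai, and the schema of `gcam_simulations.json` is not shown here. The sketch keeps stored entries whose `simulation_id` no longer appears in the JSON, because how the real tool handles removed simulations is not specified.

```python
from pathlib import Path
from typing import Callable, Dict, List

Vector = List[float]


def faiss_path_for(metadata_path: str) -> Path:
    """Sibling index file: data/gcam_vectorstore.json -> data/gcam_vectorstore.faiss."""
    return Path(metadata_path).with_suffix(".faiss")


def update_embeddings(
    simulations: List[dict],
    existing: Dict[str, Vector],
    embed: Callable[[List[str]], List[Vector]],
    overwrite: bool = False,
) -> Dict[str, Vector]:
    """Return embeddings keyed by simulation_id, embedding only what is missing.

    overwrite=True discards stored vectors and re-embeds everything (like --overwrite).
    """
    kept: Dict[str, Vector] = {} if overwrite else dict(existing)
    # Only simulations without a persisted embedding need an embeddings call.
    pending = [s for s in simulations if s["simulation_id"] not in kept]
    if pending:
        texts = [s["description"] for s in pending]  # hypothetical text field
        for sim, vector in zip(pending, embed(texts)):
            kept[sim["simulation_id"]] = vector
    return kept


def _demo_embed(texts: List[str]) -> List[Vector]:
    # Stand-in for the real embeddings endpoint: one 1-D "vector" per text.
    return [[float(len(t))] for t in texts]


sims = [
    {"simulation_id": "a", "description": "x"},
    {"simulation_id": "b", "description": "yy"},
]
print(update_embeddings(sims, {"a": [9.0]}, _demo_embed))
# {'a': [9.0], 'b': [2.0]}  ("a" keeps its stored vector; only "b" is embedded)
```

Batching all pending texts into one `embed` call keeps the number of API requests proportional to new simulations rather than to the whole dataset.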
```bash
explorer-cli query \
  --vectorstore-path data/gcam_vectorstore.json \
  --topic "water cooling demand in urban systems" \
  --env-path /path/to/.env \
  --min-score 0.25 \
  --top-k 2
```

`--min-score` is optional. When it is set, the following apply:

- only matches with a score >= the `--min-score` value are returned;
- `--top-k` is ignored, so `--top-k 2` has no effect in the example above;
- `top_k` is omitted from the response payload.

Use `--ids-only` to return only `simulation_id` values in the results.
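The selection rules above can be sketched in plain Python. The sketch assumes cosine similarity as the score, because the metric is not stated here. The payload keys `results`, `score` and `min_score` are also hypothetical. Only two behaviours come from the description above: `top_k` is dropped in threshold mode, and `--ids-only` returns bare `simulation_id` values.

```python
import math
from typing import Dict, List, Optional


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(
    query_vec: List[float],
    store: Dict[str, List[float]],
    top_k: int = 5,
    min_score: Optional[float] = None,
    ids_only: bool = False,
) -> dict:
    """Rank stored vectors against the query and apply top-k or threshold selection."""
    scored = sorted(
        ((sim_id, cosine(query_vec, vec)) for sim_id, vec in store.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )
    payload: dict = {}
    if min_score is not None:
        # Threshold mode: keep every match at or above min_score; top_k is ignored
        # and left out of the payload.
        hits = [(i, s) for i, s in scored if s >= min_score]
        payload["min_score"] = min_score
    else:
        hits = scored[:top_k]
        payload["top_k"] = top_k
    if ids_only:
        payload["results"] = [sim_id for sim_id, _ in hits]
    else:
        payload["results"] = [{"simulation_id": i, "score": s} for i, s in hits]
    return payload
```

In threshold mode, the number of results depends on the data, not on a fixed count. A broad topic can return many matches and a narrow one none.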
Both build and query support --env-path if your environment file is not at
the default .env location.
Start a local API server that uses the same query logic as explorer-cli query:
```bash
explorer-cli serve \
  --vectorstore-path data/gcam_vectorstore.json \
  --env-path /path/to/.env \
  --host 127.0.0.1 \
  --port 5000
```

Open http://127.0.0.1:5000/ for a simple query tester UI.
Query it:
```bash
curl -X POST "http://127.0.0.1:5000/search" \
  -H "Content-Type: application/json" \
  -d '{"topic":"sectoral water demand","min_score":0.25,"top_k":5}'
```

To return only IDs via the API:
```bash
curl -X POST "http://127.0.0.1:5000/search" \
  -H "Content-Type: application/json" \
  -d '{"topic":"sectoral water demand","ids_only":true}'
```

Health check:
```bash
curl "http://127.0.0.1:5000/health"
```

To run the server on a remote instance:

- Build the vectorstore once on the instance (or copy the existing `.json` and `.faiss` files).
- Run with a public bind address for external access:
```bash
explorer-cli serve \
  --vectorstore-path data/gcam_vectorstore.json \
  --host 0.0.0.0 \
  --port 5000
```

- Open the instance security group for your chosen port (for example, 5000) to inbound
  traffic from trusted IP ranges only.
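For scripted access, the `/search` and `/health` requests from the `curl` examples can be made with Python's standard library. This is a sketch. `search_api` and `is_healthy` are hypothetical helper names. The response is returned as whatever JSON the server sends, because its exact shape is not documented here.

```python
import json
import urllib.request
from typing import Any, Optional


def search_api(
    base_url: str,
    topic: str,
    min_score: Optional[float] = None,
    top_k: Optional[int] = None,
    ids_only: bool = False,
    timeout: float = 10.0,
) -> Any:
    """POST a query to /search and return the decoded JSON response."""
    body: dict = {"topic": topic}
    # Send only the optional fields that were set, as in the curl examples.
    if min_score is not None:
        body["min_score"] = min_score
    if top_k is not None:
        body["top_k"] = top_k
    if ids_only:
        body["ids_only"] = True
    request = urllib.request.Request(
        base_url.rstrip("/") + "/search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def is_healthy(base_url: str, timeout: float = 5.0) -> bool:
    """True if GET /health answers with HTTP 200; False on any connection or HTTP error."""
    try:
        with urllib.request.urlopen(base_url.rstrip("/") + "/health", timeout=timeout) as response:
            return response.status == 200
    except OSError:  # URLError, HTTPError and timeouts are all OSError subclasses
        return False


if __name__ == "__main__":
    base = "http://127.0.0.1:5000"
    if is_healthy(base):
        print(search_api(base, "sectoral water demand", min_score=0.25, top_k=5))
```

Against a remote instance, pass its public address (for example `http://<instance-address>:5000`) as `base_url`.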