JGCRI/explorer

Explorer CLI

This project provides a CLI that builds or updates a persisted vectorstore from data/gcam_simulations.json and queries it by similarity. AI integrations use pydantic-ai with an OpenAI-compatible embeddings endpoint. The default persistence backend is FAISS (a .faiss index plus JSON metadata), which scales better than storing full embedding vectors in JSON.
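To give a rough sense of why a binary index is more compact than JSON-encoded vectors, the sketch below compares the size of one 3072-dimensional vector (the output size of text-embedding-3-large) serialized as JSON text against the same values packed as 32-bit floats. It uses random values and the standard library only; it is an illustration, not the project's storage code.

```python
import json
import random
from array import array

DIM = 3072  # output dimension of text-embedding-3-large

random.seed(0)
vector = [random.uniform(-1.0, 1.0) for _ in range(DIM)]

# JSON stores every float as decimal text plus separators.
json_bytes = len(json.dumps(vector).encode("utf-8"))
# A packed single-precision array stores a fixed number of bytes per value.
packed_bytes = len(array("f", vector).tobytes())

print(f"JSON text:     {json_bytes} bytes")
print(f"packed floats: {packed_bytes} bytes")
```

The gap grows linearly with the number of simulations, which is one reason to keep only metadata in the JSON file.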

Setup

  1. Create and activate a virtual environment.
  2. Install the project in editable mode:
pip install -e .
  3. Configure OpenAI credentials in a .env file (if using the OpenAI backend):
OPENAI_API_KEY=your_api_key
OPENAI_BASE_URL=https://your-api-url/v1
OPENAI_EMBEDDING_MODEL=text-embedding-3-large
EXPLORER_VECTORSTORE_PATH=data/gcam_vectorstore.json

OPENAI_API_URL is optional; OPENAI_BASE_URL is accepted as an alternative name. EXPLORER_VECTORSTORE_PATH is also optional; when set, CLI commands use it as the default for --vectorstore-path.
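The lookup described above might be sketched as follows. The function name `resolve_settings` is hypothetical, and the precedence between OPENAI_API_URL and OPENAI_BASE_URL is an assumption (the source does not say which wins when both are set).

```python
from typing import Mapping, Optional


def resolve_settings(
    env: Mapping[str, str],
    cli_vectorstore_path: Optional[str] = None,
) -> dict:
    """Resolve the endpoint URL and vectorstore path from env-style settings."""
    # Assumption: OPENAI_API_URL takes precedence when both names are set.
    base_url = env.get("OPENAI_API_URL") or env.get("OPENAI_BASE_URL")
    # An explicit --vectorstore-path value overrides the environment default.
    vectorstore_path = cli_vectorstore_path or env.get("EXPLORER_VECTORSTORE_PATH")
    return {"base_url": base_url, "vectorstore_path": vectorstore_path}


settings = resolve_settings(
    {
        "OPENAI_BASE_URL": "https://your-api-url/v1",
        "EXPLORER_VECTORSTORE_PATH": "data/gcam_vectorstore.json",
    }
)
print(settings)
```

In a real run the mapping would be `os.environ` after the .env file has been loaded.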

Build or update vectorstore

The default behavior is an incremental update:

  • existing simulation_id entries keep their persisted embeddings
  • only new simulation_id entries are embedded
explorer-cli build \
  --json-path data/gcam_simulations.json \
  --vectorstore-path data/gcam_vectorstore.json \
  --env-path /path/to/.env
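The incremental rule above can be sketched in a few lines. This is a minimal illustration, not the project's implementation: `incremental_update`, the in-memory `store` dict, the `text` field, and the stand-in `fake_embed` function are all hypothetical.

```python
from typing import Callable, Dict, Iterable, List


def incremental_update(
    store: Dict[str, List[float]],
    records: Iterable[dict],
    embed: Callable[[str], List[float]],
) -> List[str]:
    """Embed only records whose simulation_id is not stored yet.

    Returns the ids that were newly embedded.
    """
    newly_embedded = []
    for record in records:
        sim_id = record["simulation_id"]
        if sim_id in store:
            continue  # keep the persisted embedding untouched
        store[sim_id] = embed(record["text"])  # "text" is a hypothetical field
        newly_embedded.append(sim_id)
    return newly_embedded


def fake_embed(text: str) -> List[float]:
    # Stand-in for a call to the embeddings endpoint.
    return [float(len(text))]


store = {"sim-1": [0.5]}
records = [
    {"simulation_id": "sim-1", "text": "already embedded"},
    {"simulation_id": "sim-2", "text": "new entry"},
]
print(incremental_update(store, records, fake_embed))  # → ['sim-2']
```

Skipping known ids avoids paying for embedding calls on entries that have not changed.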

Use --overwrite to force a complete rebuild:

explorer-cli build \
  --json-path data/gcam_simulations.json \
  --vectorstore-path data/gcam_vectorstore.json \
  --env-path /path/to/.env \
  --overwrite

Build writes metadata to data/gcam_vectorstore.json and vectors to data/gcam_vectorstore.faiss.
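Judging from the two file names above, the index path appears to be the metadata path with its suffix swapped. A small sketch under that assumption (the helper name `index_path_for` is hypothetical):

```python
from pathlib import Path


def index_path_for(metadata_path: str) -> Path:
    """Derive the .faiss index path that sits next to the JSON metadata file."""
    # Assumption: only the file suffix differs between the two files.
    return Path(metadata_path).with_suffix(".faiss")


print(index_path_for("data/gcam_vectorstore.json").as_posix())
# → data/gcam_vectorstore.faiss
```

Both files need to be kept together when copying a vectorstore to another machine.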

Query vectorstore

explorer-cli query \
  --vectorstore-path data/gcam_vectorstore.json \
  --topic "water cooling demand in urban systems" \
  --env-path /path/to/.env \
  --min-score 0.25 \
  --top-k 2

--min-score is optional. When it is set, only matches with score >= min-score are returned, --top-k is ignored, and top_k is omitted from the response payload (so --top-k 2 in the example above has no effect). Use --ids-only to return only the simulation_id values in results.
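The interaction between these options can be sketched as below. Only the option names, `simulation_id`, and the omission of `top_k` come from the text above; the function name `select_matches`, the default of 5, and the other payload keys are assumptions made for illustration.

```python
from typing import List, Optional, Tuple


def select_matches(
    scored: List[Tuple[str, float]],
    top_k: int = 5,  # assumed default, not taken from the source
    min_score: Optional[float] = None,
    ids_only: bool = False,
) -> dict:
    """Filter (simulation_id, score) pairs the way the options are described."""
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
    if min_score is not None:
        # Threshold mode: top_k is ignored and left out of the payload.
        kept = [pair for pair in ranked if pair[1] >= min_score]
        payload = {"min_score": min_score}
    else:
        kept = ranked[:top_k]
        payload = {"top_k": top_k}
    if ids_only:
        payload["results"] = [sim_id for sim_id, _ in kept]
    else:
        payload["results"] = [
            {"simulation_id": sim_id, "score": score} for sim_id, score in kept
        ]
    return payload


scored = [("sim-a", 0.41), ("sim-b", 0.30), ("sim-c", 0.12)]
print(select_matches(scored, top_k=1, min_score=0.25, ids_only=True))
# → {'min_score': 0.25, 'results': ['sim-a', 'sim-b']}
```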

Both build and query support --env-path if your environment file is not at the default .env location.

Run local Flask API via CLI

Start a local API server that uses the same query logic as explorer-cli query:

explorer-cli serve \
  --vectorstore-path data/gcam_vectorstore.json \
  --env-path /path/to/.env \
  --host 127.0.0.1 \
  --port 5000

Open http://127.0.0.1:5000/ for a simple query tester UI.

Query it:

curl -X POST "http://127.0.0.1:5000/search" \
  -H "Content-Type: application/json" \
  -d '{"topic":"sectoral water demand","min_score":0.25,"top_k":5}'

To return IDs only via API:

curl -X POST "http://127.0.0.1:5000/search" \
  -H "Content-Type: application/json" \
  -d '{"topic":"sectoral water demand","ids_only":true}'

Health check:

curl "http://127.0.0.1:5000/health"
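The same requests can be issued from Python with the standard library. The sketch below only builds the request object so it can be inspected without a running server; `build_search_request` is a hypothetical helper, and the shape of the JSON response is not documented here, so the commented-out call simply prints whatever comes back.

```python
import json
import urllib.request
from typing import Optional


def build_search_request(
    base_url: str,
    topic: str,
    min_score: Optional[float] = None,
    top_k: Optional[int] = None,
    ids_only: bool = False,
) -> urllib.request.Request:
    """Build a POST /search request mirroring the curl examples."""
    body = {"topic": topic}
    if min_score is not None:
        body["min_score"] = min_score
    if top_k is not None:
        body["top_k"] = top_k
    if ids_only:
        body["ids_only"] = True
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_search_request(
    "http://127.0.0.1:5000", "sectoral water demand", ids_only=True
)
print(request.full_url, request.data.decode("utf-8"))

# With the server running, send the request and decode the JSON response:
# with urllib.request.urlopen(request, timeout=10) as response:
#     print(json.load(response))
```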

EC2 deployment notes (Ubuntu)

  • Build the vectorstore once on the instance (or copy existing .json + .faiss files).
  • Run with a public bind address for external access:
explorer-cli serve \
  --vectorstore-path data/gcam_vectorstore.json \
  --host 0.0.0.0 \
  --port 5000
  • Open the instance security group for your chosen port (for example, 5000) from trusted IP ranges.

About

AI capabilities for the GCIMS explorer app
