himanshugarg06/data-context
Context Layer

This repository implements a Context Layer: a service that turns a natural-language question into a Resolved Query Plan that is backed by evidence, bound to real schema, and compiled into verifiable SQL.

Demo: Google Drive


The problem we are solving


Business metric definitions are scattered across tables, Slack threads, and docs/PDFs. When business users ask questions in English, the system has to guess:

  • which metric definition to use,
  • which tables/columns map to it,
  • and what business rules apply.

That guessing creates ambiguity, broken dashboards, and untrustworthy SQL, and it is why AI tools hallucinate. In large orgs the data lives in tables, but the metric definitions live in Slack, docs, tickets, and tribal knowledge.

Solution

A pluggable Context Layer service that resolves user questions into a Resolved Query Plan using schema context, lineage expansion, unstructured evidence, deterministic binding, and verification checks. In essence, it is a layer that supplies the business context and exact metric definitions needed to query the data tables.

Given a question, it:

  1. QuerySpec agent extracts a formal intent object (strict JSON, Zod‑validated).
  2. Schema agent searches the catalog and expands via lineage (plus optional join inference).
  3. Evidence agent pulls Slack/docs/PDF chunks using hybrid retrieval.
  4. Metric agent generates multiple MetricSpecCandidates with evidence citations.
  5. Binding + join planner deterministically maps hints → columns and discards invalid candidates.
  6. Ranking agent scores candidates and selects the best.
  7. SQL agent builds deterministic SQL and optionally refines on SQL errors.
  8. Verification agent runs safety checks (join duplication, null join keys, rate bounds).
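The eight stages compose sequentially. As a rough sketch (all types and stage bodies below are simplified stand-ins, not the repository's actual modules):

```typescript
// Illustrative pipeline sketch; every function here is a hypothetical
// stand-in for the corresponding agent described above.
interface QuerySpec { metric_terms: string[]; group_by: string[] }
interface Candidate { metric: string; columns: string[] }

const extractQuerySpec = async (q: string): Promise<QuerySpec> =>
  ({ metric_terms: [q], group_by: [] });                      // 1. formal intent
const generateCandidates = async (spec: QuerySpec): Promise<Candidate[]> =>
  spec.metric_terms.map((m) => ({ metric: m, columns: [] })); // 4. metric candidates
const bind = (c: Candidate): Candidate | null => c;           // 5. deterministic binding
const rank = (cs: Candidate[]): Candidate => cs[0];           // 6. pick the best

async function resolveQuestion(q: string): Promise<Candidate> {
  const spec = await extractQuerySpec(q);
  const candidates = await generateCandidates(spec);
  // Binding can reject a candidate (null); invalid candidates are discarded.
  const bound = candidates.map(bind).filter((c): c is Candidate => c !== null);
  return rank(bound); // SQL generation and verification would follow here
}
```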

Multi‑agent mode

Multi-agent mode is used primarily for complex queries, because the task spans multiple failure modes that a single pass often misses:

  • Different experts catch different errors: schema fit, evidence grounding, join plausibility, and SQL syntax are distinct checks.
  • Debate/critique reduces hallucinations: one agent proposes, another challenges mismatches against actual tables/columns.
  • It improves robustness on messy data: if one agent overfits to noisy evidence, another can correct it using schema constraints.
  • It gives a safety net: verification and SQL-repair agents can fix issues before execution.
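The debate loop itself is simple; a hedged sketch of one propose/critique round (the real agents, prompts, and stopping criteria live in the codebase, with MULTI_AGENT_MAX_ROUNDS bounding the loop):

```typescript
// Simplified propose/critique loop; Proposal and the three agent functions
// are illustrative stand-ins, not the repository's real implementation.
interface Proposal { sql: string; issues: string[] }

const propose = async (question: string): Promise<Proposal> =>
  ({ sql: `SELECT 1 /* plan for: ${question} */`, issues: [] });

// The critic challenges mismatches against actual tables/columns;
// an empty issue list means the proposal is accepted.
const critique = async (p: Proposal): Promise<string[]> =>
  p.sql.trim().toUpperCase().startsWith("SELECT") ? [] : ["not a SELECT statement"];

const revise = async (p: Proposal, issues: string[]): Promise<Proposal> =>
  ({ ...p, issues });

async function debate(question: string, maxRounds = 1): Promise<Proposal> {
  let current = await propose(question);
  for (let round = 0; round < maxRounds; round++) {
    const issues = await critique(current);
    if (issues.length === 0) break; // critic accepts: stop early
    current = await revise(current, issues);
  }
  return current;
}
```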

How to run

  1. Install dependencies
npm install
  2. Start pgvector (if you don’t already have it) and create the pgvector extension (once per pgvector DB)
docker run --name pgvector \
  -e POSTGRES_PASSWORD=postgres \
  -e POSTGRES_USER=postgres \
  -e POSTGRES_DB=context \
  -p 5433:5432 \
  -d pgvector/pgvector:pg16
psql "postgres://postgres:postgres@127.0.0.1:5433/context" -c "CREATE EXTENSION IF NOT EXISTS vector;"

  3. Set environment variables

Update .env with correct values:

# Core
PROVIDER_MODE=real
OPENAI_API_KEY=...
PORT=3000

# Data DB (your real warehouse)
DATA_DATABASE_URL=postgres://postgres:postgres@127.0.0.1:5432/finance
WAREHOUSE_DIALECT=postgres

# Vector DB
VECTOR_STORE=pgvector
PGVECTOR_DATABASE_URL=postgres://postgres:postgres@127.0.0.1:5433/context

# Unstructured sources
DOCS_DIR=./docs/finance
SLACK_BOT_TOKEN=...
SLACK_CHANNEL_ALLOWLIST=...   # optional

MULTI_AGENT_MODE=debate
MULTI_AGENT_MAX_ROUNDS=1
SQL_REPAIR_MODE=llm
SQL_REFINER_MODE=llm
  4. Load mock financial/product data (enter finance when prompted for the database name)
node src/mock_data/scripts/load_finance_static.mjs
  5. Run migrations (pgvector DB)
PGVECTOR_DATABASE_URL=postgres://postgres:postgres@127.0.0.1:5433/context \
  node --import tsx src/scripts/migrate.ts
  6. Run the server
npm run dev
  7. Create schema embeddings and generate synonyms
curl -X POST http://127.0.0.1:3000/admin/schema/refresh
curl -X POST http://127.0.0.1:3000/admin/schema/synonyms/refresh \
  -H 'Content-Type: application/json' \
  -d '{"mode":"llm","include_described":true,"max_tables":25,"max_columns":300,"max_synonyms":6}'
  8. Sync docs + Slack
curl -X POST http://127.0.0.1:3000/admin/docs/sync
curl -X POST http://127.0.0.1:3000/admin/slack/sync

The /admin/slack/sync endpoint triggers Slack ingestion: it fetches channels (filtered by SLACK_CHANNEL_ALLOWLIST if set), pulls recent messages and thread replies, normalizes them into chunks, and stores them in the unstructured store (JSONL and/or pgvector). It returns counts for channels, messages, and chunks indexed.

Embeddings are stored in pgvector (the unstructured_chunks table).

  9. Open the frontend at http://127.0.0.1:3000/ and start asking questions. Behind the scenes it calls the following endpoint:
curl -X POST http://127.0.0.1:3000/resolve \
  -H 'content-type: application/json' \
  -d '{"question_text":"What is MTD net P&L by portfolio for active accounts?"}'
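The same call from Node, for scripting (a minimal sketch; it assumes the server from the steps above is running on port 3000 and returns a JSON body):

```typescript
// Minimal client for the /resolve endpoint. The endpoint and request body
// come from the curl example above; the response is treated as opaque JSON.
function buildResolvePayload(questionText: string): string {
  return JSON.stringify({ question_text: questionText });
}

async function resolve(
  questionText: string,
  base = "http://127.0.0.1:3000"
): Promise<unknown> {
  const res = await fetch(`${base}/resolve`, {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: buildResolvePayload(questionText),
  });
  if (!res.ok) throw new Error(`resolve failed: HTTP ${res.status}`);
  return res.json();
}
```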

Sample queries

“What is MTD net P&L by portfolio for active accounts?”

“Give YTD P&L (total, realized, unrealized) by desk.”

“Exclude FX impact: report net P&L by portfolio for QTD.”

“Average balance by region for active accounts this quarter.”

“Show daily gross vs net P&L for the Wealth desk last week.”

“What are total cash balances by product line for the most recent date?”

“What is NAV by portfolio for 2025‑01‑31?”

“List top 5 accounts by ending balance last month.”

Health Endpoint

  • GET /health — returns active providers and connectivity status

QuerySpec

{
  question_text: string;
  metric_terms: string[];
  measure_intent: { kind: "count"|"sum"|"avg"|"rate"|"distinct_count", numerator?: string|null, denominator?: string|null };
  grain: string|null;
  time_window: { start?: string, end?: string, tz?: string } | null;
  group_by: string[];
  filters: Array<{ field_hint: string; op: string; value: any }>;
  entities: string[];
  ambiguity_notes: string[];
  assumptions: string[];
}
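The QuerySpec agent emits this shape as strict JSON, validated with Zod before use. A dependency-free sketch of the kind of structural check that validation performs (only a few fields shown; the real Zod schema checks every field strictly):

```typescript
// Partial structural check mirroring what the Zod schema enforces for a few
// QuerySpec fields; illustrative only, not the repository's validator.
const MEASURE_KINDS = ["count", "sum", "avg", "rate", "distinct_count"] as const;

function isQuerySpecShape(x: unknown): boolean {
  if (typeof x !== "object" || x === null) return false;
  const o = x as Record<string, unknown>;
  const intent = o.measure_intent as Record<string, unknown> | null | undefined;
  return (
    typeof o.question_text === "string" &&
    Array.isArray(o.metric_terms) &&
    o.metric_terms.every((t) => typeof t === "string") &&
    intent != null &&
    MEASURE_KINDS.includes(intent.kind as (typeof MEASURE_KINDS)[number])
  );
}
```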

Verification Checks

Implemented in src/verify/checks.ts:

  1. sql_runs
  2. row_count_nonnegative
  3. join_duplication_check (dup_ratio <= 1.05)
  4. null_join_key_rate (<= 0.2)
  5. rate_bounds (0 ≤ value ≤ 1)
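Each check reduces to a simple ratio or bound. A sketch of the arithmetic using the thresholds listed above (function names are illustrative; the real implementations in src/verify/checks.ts run against query results, not plain numbers):

```typescript
// Illustrative versions of three checks; thresholds match the list above.
function joinDuplicationOk(rowsAfterJoin: number, rowsBefore: number): boolean {
  const dupRatio = rowsBefore === 0 ? 1 : rowsAfterJoin / rowsBefore;
  return dupRatio <= 1.05; // joins may not inflate row counts by more than 5%
}

function nullJoinKeyRateOk(nullKeys: number, totalRows: number): boolean {
  const rate = totalRows === 0 ? 0 : nullKeys / totalRows;
  return rate <= 0.2; // at most 20% of join keys may be null
}

function rateInBounds(value: number): boolean {
  return value >= 0 && value <= 1; // rates must be valid proportions
}
```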

Context Layer Architecture

About

The context layer for your data
