EvalGate

A CLI release gate for prompt and model changes to structured AI features.

EvalGate runs a saved JSONL dataset against a prompt and model, validates the output against a JSON schema, computes deterministic metrics, and writes artifacts you can use locally or in CI.
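To make "JSONL dataset" and "deterministic metrics" concrete, here is a minimal sketch, not EvalGate's actual implementation. The row fields `input` and `expected`, and the helper names `parseJsonl` and `exactMatchAccuracy`, are assumptions made for illustration; the real dataset shape and metric set may differ.

```typescript
// Hypothetical row shape; EvalGate's real dataset fields may differ.
interface Row {
  input: string;
  expected: string;
}

// Parse JSONL: one JSON object per non-empty line.
function parseJsonl(text: string): Row[] {
  return text
    .split("\n")
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => JSON.parse(line) as Row);
}

// Exact-match accuracy is deterministic: the same rows and predictions
// always produce the same score.
function exactMatchAccuracy(rows: Row[], predictions: string[]): number {
  if (rows.length === 0) return 0;
  let correct = 0;
  rows.forEach((row, i) => {
    if (predictions[i] === row.expected) correct += 1;
  });
  return correct / rows.length;
}

const dataset = parseJsonl(
  '{"input":"refund please","expected":"billing"}\n' +
    '{"input":"app crashes","expected":"bug"}\n'
);

// One of the two predictions matches its expected label.
const accuracy = exactMatchAccuracy(dataset, ["billing", "billing"]);
console.log(accuracy); // 0.5
```

Because the score depends only on the saved dataset and the model outputs, two runs over identical outputs can be compared directly.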

Best fit:

classification
structured extraction
tagging and routing

Most valuable use case:

a team has a prompt or model change for a structured AI feature and needs one repeatable command to decide whether it is still safe to ship

Quickstart

corepack enable
pnpm install
pnpm evalgate:sample

That sample uses:

Artifacts are written to:

.artifacts/report.json
.artifacts/summary.md
.artifacts/junit.xml

Core Commands

Create a starter config:

pnpm evalgate:init

Run an eval:

pnpm evalgate run --dataset ./my-dataset.jsonl --config ./evalgate.config.json

Create a baseline from a finished run:

pnpm evalgate baseline create --from ./report.json --out ./baseline.json

Compare a report to a baseline:

pnpm evalgate compare --report ./report.json --baseline ./baseline.json

Fail on gate or regression:

pnpm evalgate run \
  --dataset ./my-dataset.jsonl \
  --config ./evalgate.config.json \
  --baseline ./baseline.json \
  --fail-on-gate \
  --fail-on-regression
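The two failure modes can be sketched as follows. This is an illustration of the idea only: the metric names, the `tolerance` parameter, and the functions `failedGates` and `regressions` are hypothetical, and the sketch assumes every metric is "higher is better". How EvalGate actually defines gates and regressions is set by its config and is not shown here.

```typescript
type Metrics = Record<string, number>;

// Gate check: a metric fails when it is below its configured minimum.
function failedGates(metrics: Metrics, minimums: Metrics): string[] {
  return Object.entries(minimums)
    .filter(([name, min]) => (metrics[name] ?? 0) < min)
    .map(([name]) => name);
}

// Regression check: a metric regresses when it drops below the baseline
// value by more than the allowed tolerance.
function regressions(
  current: Metrics,
  baseline: Metrics,
  tolerance = 0
): string[] {
  return Object.entries(baseline)
    .filter(([name, base]) => (current[name] ?? 0) < base - tolerance)
    .map(([name]) => name);
}

// Hypothetical metric values for one run and its baseline.
const current: Metrics = { accuracy: 0.9, schema_valid_rate: 1.0 };
const baseline: Metrics = { accuracy: 0.94, schema_valid_rate: 1.0 };

const gateFailures = failedGates(current, { accuracy: 0.85 });
const regressed = regressions(current, baseline);

// A non-zero exit code is what lets CI block the release.
const exitCode = gateFailures.length > 0 || regressed.length > 0 ? 1 : 0;
console.log({ gateFailures, regressed, exitCode });
```

In this example the run passes its absolute gate (0.9 is above 0.85) but still fails, because accuracy dropped relative to the baseline. That is the case the two separate flags distinguish.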

Output

EvalGate always writes report.json.

By default it also writes:

summary.md
junit.xml

Optional:

sarif.json via --formats summary,junit,sarif

Useful flags:

--output-dir ./artifacts/evalgate
--out ./artifacts/report.json
--formats summary,junit,sarif

Key report fields:

schema_version
tool_version
provider
model
prompt_version
dataset_sha256
config_sha256
git_sha
git_branch
started_at
finished_at
duration_ms
failure_counts_by_type
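Fields such as dataset_sha256 and config_sha256 let you confirm that two reports were produced from the same inputs. A minimal sketch of computing such a digest with Node's built-in crypto module is below. The helper name `sha256Hex` is hypothetical, and whether EvalGate hashes the raw file bytes or a normalized form is an assumption not confirmed by this README.

```typescript
import { createHash } from "node:crypto";

// Hex-encoded SHA-256 digest of the given content.
function sha256Hex(content: string): string {
  return createHash("sha256").update(content).digest("hex");
}

// Hypothetical dataset contents; in practice this would be read from disk.
const original = '{"input":"refund please","expected":"billing"}\n';
const edited = '{"input":"refund please","expected":"refunds"}\n';

// Any change to the content changes the digest, so a mismatch between two
// reports signals that they were not run on the same dataset.
const sameDataset = sha256Hex(original) === sha256Hex(edited);
console.log(sameDataset); // false
```

Comparing these digests before comparing metrics avoids treating a dataset change as a model regression.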

Example

The repo ships with one complete walkthrough:

examples/ticket-triage/README.md

That example shows:

the dataset file
the config file
the exact command to run
the terminal output
the generated report artifacts

CI

GitHub Actions example:

docs/github-actions-example.yml

Live Provider

For OpenAI-backed runs:

export OPENAI_API_KEY=your_key_here
pnpm evalgate:sample:openai

See .env.example for environment variables.

Do not commit generated reports built from sensitive datasets. report.json, summary.md, and other artifacts can include raw inputs, outputs, and diffs.

License

MIT. See LICENSE.
