v100

v100 is an experimental agent harness for studying long-horizon LLM behavior.

It provides a modular runtime for tool-using language model agents where every run is treated as an observable experiment. Model calls, tool execution, context compression, and delegation events are emitted as structured traces that can be replayed and analyzed.

The goal is to make agent behavior measurable and reproducible so different prompting strategies, tool policies, and orchestration approaches can be systematically evaluated.

Features

Durable traces — every run is logged as append-only JSONL (runs/<id>/trace.jsonl) with 21 structured event types
Two solvers — ReAct loop (default) and Plan-Execute with automatic replanning
Sandbox execution — Docker-based isolated containers with hardened security, snapshots, and apply-back
Run metadata + scoring — attach names/tags and score outcomes for later analysis
Evaluation tooling — per-run stats, run comparisons, experiments, and batch bench execution
Trace-derived diagnostics — efficiency/behavior metrics and automatic failure classification
Phase 300 optimization — autonomous agent refinement via reflective scoring and prompt mutation
Self-evolution engine — agents can distill trajectories and author new tools at runtime
Delegated sub-agents — agent tool can spawn bounded child loops
Named specialist agents — config-driven roles via [agents.<name>] and role dispatching
Coordination patterns — orchestrate tool supports fanout and pipeline execution
Shared run state — blackboard tools provide cross-agent coordination via vectorized memory
Reflection turn — agents perform an internal confidence-check before executing dangerous tools
Streaming — real-time token streaming from providers that support it
Tool execution — 26+ built-in tools: file system, shell, git, patch, curl, ripgrep search, semantic parsing, sql_search, graphviz, reflect, and multi-agent coordination
Dangerous tool confirmation — CLI stdin prompt or TUI Ctrl+Y/Ctrl+N
Budget enforcement — hard limits on steps, tokens, and cost
Build verification loop — automatically runs go build ./... after every workspace mutation and injects compiler errors as a diagnostic alert
Resume & replay — pick up any run from its trace; pretty-print transcripts
Six providers — Codex (ChatGPT subscription), Gemini subscription, OpenAI API, Anthropic API, Minimax, or local Ollama
Two UIs — line-by-line CLI (default) or Bubble Tea "Mission Control" TUI (--tui)
Dev supervisor — restart on demand by creating .v100-reload

Install

git clone https://github.com/tripledoublev/v100
cd v100
go build -ldflags "-X main.version=v0.2.5" -o v100 ./cmd/v100/

Requires Go 1.21+. Optional: rg (ripgrep) for project_search, patch for patch_apply, docker for sandbox execution, mpv or ffplay for radio.

Pre-built binaries are available on the releases page.

Quick start

# Initialize config
v100 config init

# Fill ~/.config/v100/oauth_credentials.json with your OAuth client values
# (only needed for Codex/Gemini subscription providers)

# Authenticate with ChatGPT subscription
v100 login

# Verify setup
v100 doctor

# Start a run (uses ChatGPT subscription by default)
v100 run

# Start a non-interactive run (executes prompt then exits)
v100 run --exit "Summarize the project structure"

# With a step budget
v100 run --budget-steps 10

# Use OpenAI API
v100 run --provider openai

# Use Anthropic API
v100 run --provider anthropic

# Use Minimax (MiniMax-M2.5)
v100 run --provider minimax

# Use local Ollama
v100 run --provider ollama --model qwen3.5:9b

# Use plan-execute solver with replanning
v100 run --solver plan_execute --max-replans 3

# Enable sandbox execution
v100 run --sandbox

# Enable "Mission Control" TUI
v100 run --tui

# Name and tag runs for later querying
v100 run --name "parser refactor" --tag team=core --tag sprint=12

Development checks

Run the tests, build, and vet locally, then use the repo-local lint wrapper so that local lint results match CI:

GOCACHE="$PWD/.gocache" go test ./...
GOCACHE="$PWD/.gocache" go build ./...
GOCACHE="$PWD/.gocache" go vet ./...
./scripts/lint.sh

Local Sandbox Image

For Docker sandbox runs, build the repo-local image that includes the binaries the harness expects:

./scripts/build-sandbox-image.sh

That image installs Go, git, patch, ripgrep, and curl, which avoids the missing-patch failure seen with a plain golang base image.

Providers

ChatGPT subscription (default)

Uses your existing ChatGPT Plus/Pro plan — no separate API billing. Authenticate with v100 login after filling ~/.config/v100/oauth_credentials.json:

v100 login   # opens browser → completes OAuth → saves token

Model: gpt-5.4

OpenAI API

Standard pay-as-you-go API access.

export OPENAI_API_KEY=sk-...
v100 run --provider openai --model gpt-4o

Anthropic API

Claude API access via API key.

export ANTHROPIC_API_KEY=sk-ant-...
v100 run --provider anthropic --model claude-sonnet-4-20250514

Or authenticate interactively:

v100 login --provider anthropic   # prompts for API key, saves to ~/.config/v100/anthropic_auth.json

Gemini subscription

Uses your existing Gemini Pro / Google One AI Premium plan — no separate API billing. Authenticate with v100 login --provider gemini after filling ~/.config/v100/oauth_credentials.json:

v100 login --provider gemini
v100 run --provider gemini
v100 run --provider gemini --model gemini-2.5-pro

Models: gemini-2.5-flash (default), gemini-2.5-pro, gemini-3-pro-preview, gemini-3-flash-preview

Minimax

Runs the MiniMax-M2.5 model. Authenticate once, then select the provider:

v100 login --provider minimax
v100 run --provider minimax

Ollama (local)

Run fully local models via Ollama (no API key required).

ollama run qwen3.5:9b
v100 run --provider ollama --model qwen3.5:9b

Provider matrix

Provider	Auth	Default model	Streaming	Best for
`codex`	OAuth (`v100 login`)	`gpt-5.4`	yes	subscription-backed coding runs
`openai`	`OPENAI_API_KEY`	`gpt-4o`	yes	API-driven experiments
`anthropic`	`ANTHROPIC_API_KEY` or `v100 login --provider anthropic`	`claude-sonnet-4-20250514`	yes	Claude API experiments
`gemini`	OAuth (`v100 login --provider gemini`)	`gemini-2.5-flash`	yes	subscription-backed Gemini runs
`minimax`	OAuth (`v100 login --provider minimax`)	`MiniMax-M2.5`	yes	high-fidelity research runs
`ollama`	local daemon	`qwen3.5:9b`	yes	fully local runs

OAuth client config for subscription providers lives at ~/.config/v100/oauth_credentials.json.

Known limitations

Provider behavior differs noticeably; the same prompt can produce very different tool-use patterns across providers.
Subscription providers require a local OAuth client config file before v100 login will work.
Underspecified prompts can still trigger over-eager exploration on some models. When evaluating a new provider or prompt style, set --budget-steps and --budget-tokens and keep the default dangerous-tool confirmation.

Solvers

v100 supports two solver strategies that control how the agent loop processes tasks.

ReactSolver (default)

Classic ReAct (Reasoning + Acting) loop. The model receives the conversation, produces reasoning and tool calls, observes results, and repeats until done.

v100 run --solver react
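
The loop described above can be sketched in a few lines of Go. This is a minimal illustration, not v100's implementation: the types ToolCall, Reply, Model, and Tool and the function RunReAct are hypothetical names invented for the example, and the model is a scripted stand-in rather than a real provider call.

```go
package main

import "fmt"

// ToolCall and Reply are hypothetical types for this sketch, not v100's API.
type ToolCall struct{ Name, Input string }

type Reply struct {
	Text  string
	Calls []ToolCall // empty means the model considers the task done
}

// Model and Tool stand in for a provider call and a tool execution.
type Model func(history []string) Reply
type Tool func(input string) string

// RunReAct alternates model calls and tool executions until the model stops
// requesting tools or maxSteps is reached. It returns the final history.
func RunReAct(model Model, tools map[string]Tool, prompt string, maxSteps int) []string {
	history := []string{"user: " + prompt}
	for step := 0; step < maxSteps; step++ {
		reply := model(history)
		history = append(history, "assistant: "+reply.Text)
		if len(reply.Calls) == 0 {
			break // no tool calls: the loop is done
		}
		for _, call := range reply.Calls {
			out := "unknown tool"
			if tool, ok := tools[call.Name]; ok {
				out = tool(call.Input)
			}
			// The observation is appended so the next model call can see it.
			history = append(history, "tool "+call.Name+": "+out)
		}
	}
	return history
}

func main() {
	// A scripted model: first asks for a directory listing, then finishes.
	model := func(history []string) Reply {
		if len(history) == 1 {
			return Reply{Text: "I should list files.", Calls: []ToolCall{{Name: "fs_list", Input: "."}}}
		}
		return Reply{Text: "Done."}
	}
	tools := map[string]Tool{"fs_list": func(string) string { return "README.md go.mod" }}
	for _, line := range RunReAct(model, tools, "Summarize the project structure", 10) {
		fmt.Println(line)
	}
}
```

The maxSteps argument plays the role of a step budget: the loop stops even if the model keeps requesting tools.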

PlanExecuteSolver

Two-phase strategy: first generates a plan, then executes it with ReAct. If execution fails, the solver can replan and retry up to --max-replans times. Checkpoints are created before execution so the workspace can be restored on failure.

v100 run --solver plan_execute --max-replans 3
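
As a rough sketch of the control flow described above (plan, checkpoint, execute, restore and replan on failure), assuming hypothetical hook functions rather than v100's real interfaces. Hooks, PlanExecute, and errBuild are names made up for this example.

```go
package main

import (
	"errors"
	"fmt"
)

// errBuild is a sample failure used by the demo below.
var errBuild = errors.New("build failed")

// Hooks bundles the operations this sketch needs. The names are illustrative
// and are not taken from v100's source.
type Hooks struct {
	Plan       func(task string, lastErr error) string // produce or revise a plan
	Execute    func(plan string) error                 // run the plan, e.g. with a ReAct loop
	Checkpoint func() int                              // snapshot the workspace, return an ID
	Restore    func(id int)                            // roll the workspace back
}

// PlanExecute tries the initial plan plus up to maxReplans revised plans.
// It returns the number of replans used and the last execution error.
func PlanExecute(h Hooks, task string, maxReplans int) (int, error) {
	var lastErr error
	for attempt := 0; attempt <= maxReplans; attempt++ {
		plan := h.Plan(task, lastErr)
		id := h.Checkpoint() // checkpoint before execution
		if lastErr = h.Execute(plan); lastErr == nil {
			return attempt, nil
		}
		h.Restore(id) // undo partial work before replanning
	}
	return maxReplans, lastErr
}

func main() {
	calls := 0
	h := Hooks{
		Plan: func(task string, _ error) string { return "plan for " + task },
		Execute: func(string) error {
			calls++
			if calls < 3 {
				return errBuild // the first two attempts fail
			}
			return nil
		},
		Checkpoint: func() int { return calls },
		Restore:    func(id int) { fmt.Println("restored checkpoint", id) },
	}
	replans, err := PlanExecute(h, "refactor parser", 3)
	fmt.Println("replans:", replans, "err:", err)
}
```

Passing the previous error into Plan is one plausible way to let the replanning step react to what went wrong; whether v100 does this is not stated here.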

Sandbox

v100 can execute tool operations inside an isolated Docker container instead of directly on the host.

Setup

v100 run --sandbox

Security

The Docker executor applies hardened security defaults:

--cap-drop ALL — drops all Linux capabilities
--security-opt no-new-privileges — prevents privilege escalation
--pids-limit 64 — limits child processes
Seccomp profile that blocks mount, ptrace, kexec_load, and other sensitive syscalls
Network isolation configurable via network_tier

Configuration

[sandbox]
enabled = false
backend = "docker"              # "host" or "docker"
image = "ubuntu:22.04"
network_tier = "off"            # "off" (isolated) or "open" (bridge)
memory_mb = 512
cpus = 1.0
apply_back = "manual"           # "manual", "on_success", or "never"
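
To make the relationship between these settings and the security defaults concrete, here is a hedged sketch that turns a config into a `docker run` argument list. The capability, privilege, and PID flags come from the Security list above. The SandboxConfig struct, the DockerRunArgs function, and the mapping of network_tier, memory_mb, and cpus onto --network, --memory, and --cpus are assumptions of this example, and the seccomp profile is left out.

```go
package main

import (
	"fmt"
	"strconv"
)

// SandboxConfig mirrors the [sandbox] keys shown above; the struct itself is
// hypothetical.
type SandboxConfig struct {
	Image       string
	NetworkTier string // "off" or "open"
	MemoryMB    int
	CPUs        float64
}

// DockerRunArgs builds an argument list for `docker run`. How v100 actually
// maps network_tier, memory_mb, and cpus to Docker flags is assumed here.
func DockerRunArgs(c SandboxConfig) []string {
	network := "none" // assumed meaning of network_tier = "off"
	if c.NetworkTier == "open" {
		network = "bridge"
	}
	return []string{
		"run", "--rm",
		"--cap-drop", "ALL",
		"--security-opt", "no-new-privileges",
		"--pids-limit", "64",
		"--network", network,
		"--memory", strconv.Itoa(c.MemoryMB) + "m",
		"--cpus", strconv.FormatFloat(c.CPUs, 'f', -1, 64),
		c.Image,
	}
}

func main() {
	cfg := SandboxConfig{Image: "ubuntu:22.04", NetworkTier: "off", MemoryMB: 512, CPUs: 1.0}
	fmt.Println(DockerRunArgs(cfg))
}
```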

Snapshots and restore

During sandboxed runs, the solver can create workspace snapshots (checkpoints). These can be restored later:

# List checkpoints for a run
v100 restore --list <run_id>

# Restore to a specific checkpoint
v100 restore <run_id> <checkpoint_id>

# Resume from restored state
v100 resume <run_id>

Apply-back

After a sandboxed run, changes can be merged back to the host workspace:

on_success — automatically apply changes when the run ends successfully
manual — prompt for confirmation
never — keep changes only in the sandbox
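
The three modes amount to a small decision function. The sketch below is illustrative only: ShouldApplyBack and its confirm callback (a stand-in for the interactive prompt) are hypothetical names, and treating unrecognized values like "never" is an assumption.

```go
package main

import "fmt"

// ShouldApplyBack decides whether sandbox changes are merged into the host
// workspace. mode is the apply_back setting; confirm is called only in
// "manual" mode and stands in for an interactive prompt.
func ShouldApplyBack(mode string, runSucceeded bool, confirm func() bool) bool {
	switch mode {
	case "on_success":
		return runSucceeded
	case "manual":
		return confirm()
	default: // "never" and any unrecognized value keep changes in the sandbox
		return false
	}
}

func main() {
	yes := func() bool { return true }
	for _, mode := range []string{"on_success", "manual", "never"} {
		fmt.Printf("%-10s succeeded=false -> %v\n", mode, ShouldApplyBack(mode, false, yes))
	}
}
```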

Commands

Command	Description
`v100 run`	Start interactive agent run
`v100 resume <run_id>`	Continue a run from its trace (`--auto --unsafe` supported)
`v100 restore <run_id> [checkpoint_id]`	Restore sandbox checkpoint
`v100 replay <run_id>`	Pretty-print run transcript
`v100 runs [-n N] [--provider X] [--failed] [--all]`	List recent runs with optional filtering
`v100 tools`	List registered tools
`v100 providers`	List configured providers
`v100 config init`	Write default config to `~/.config/v100/config.toml`
`v100 doctor`	Check provider auth, tools, run dir
`v100 login [--provider <name>]`	Authenticate via browser OAuth or API key
`v100 logout [--provider <name>]`	Remove stored auth token
`v100 score <run_id> <pass|fail|partial> [notes...]`	Score a completed run
`v100 distill <run_id>`	Distill a run trace into ShareGPT format
`v100 stats <run_id>`	Compute stats from trace events
`v100 metrics <run_id>`	Compute trace-derived efficiency/behavior metrics
`v100 compare <run_id> <run_id> [...]`	Compare multiple runs side-by-side
`v100 bench <bench.toml>`	Batch-run prompt/provider/model variants
`v100 experiment create <name>`	Create a new experiment
`v100 experiment run <exp_id> --prompt <text>`	Execute all experiment variants
`v100 experiment results <exp_id>`	Display statistical results
`v100 analyze <run_id>`	Automated behavioral analysis
`v100 digest <run_id>`	Compact failure digest for a completed run
`v100 mutate <run_id>`	Suggest improved prompt based on failure analysis
`v100 diff <run_a> <run_b>`	Find divergence point between traces
`v100 query [--tag k=v ...] [--score <verdict>]`	Filter runs by metadata
`v100 dev`	Rebuild/restart dev binary on `.v100-reload`

`v100 run` flags

--provider string              Provider name (codex, gemini, openai, ollama, anthropic, minimax)
--model string                 Model override
--solver string                Solver strategy: react (default), plan_execute
--max-replans int              Max replans for plan_execute solver
--sandbox                      Enable isolated sandbox execution
--streaming                    Enable real-time token streaming (default: true)
--budget-steps int             Max steps before halting
--budget-tokens int            Max tokens before halting
--budget-cost float            Max cost in USD before halting
--max-tool-calls-per-step int  Max tool calls per step
--confirm-tools string         always | dangerous | never (default: dangerous)
--auto                         Auto-approve all tool calls
--unsafe                       Disable path guardrails
--workspace string             Workspace directory for tool operations
--name string                  Human-readable run name (meta.json)
--tag key=value                Repeatable run tags (meta.json)
--temperature float            Sampling temperature
--top-p float                  Nucleus sampling parameter
--top-k int                    Top-k sampling parameter
--max-tokens int               Max output tokens per model call
--seed int                     Random seed for reproducibility
--exit                         Finalize and exit after initial prompt completes
--tui                          Enable "Mission Control" TUI
--tui-no-alt                   Disable alternate screen
--tui-plain                    Force monochrome rendering
--tui-debug                    Write TUI debug log in run directory

The default workspace is the directory from which v100 run is launched.
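
The three --budget-* flags describe hard limits that are checked as a run progresses. A minimal sketch of such a check follows. Budget, Usage, and Exceeded are hypothetical names, and treating zero as "no limit" is an assumption suggested by budget_cost_usd = 0.0 in the default config, not something confirmed here.

```go
package main

import "fmt"

// Budget holds the three hard limits. Zero means "no limit" in this sketch.
type Budget struct {
	Steps   int
	Tokens  int
	CostUSD float64
}

// Usage is the running total for a single run.
type Usage struct {
	Steps   int
	Tokens  int
	CostUSD float64
}

// Exceeded reports which limit, if any, has been reached.
func (b Budget) Exceeded(u Usage) (string, bool) {
	switch {
	case b.Steps > 0 && u.Steps >= b.Steps:
		return "steps", true
	case b.Tokens > 0 && u.Tokens >= b.Tokens:
		return "tokens", true
	case b.CostUSD > 0 && u.CostUSD >= b.CostUSD:
		return "cost", true
	}
	return "", false
}

func main() {
	b := Budget{Steps: 10, Tokens: 100000}
	fmt.Println(b.Exceeded(Usage{Steps: 3, Tokens: 4200}))
	fmt.Println(b.Exceeded(Usage{Steps: 10, Tokens: 9000}))
}
```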

Deterministic replay

v100 replay --deterministic <run_id>
v100 replay --deterministic --step <run_id>
v100 replay --deterministic --replace-model gpt-5.4 <run_id>
v100 replay --deterministic --inject-tool project_search="parser.go:123" <run_id>

In deterministic mode, model responses and tool outputs are replayed from trace records. --step pauses between model/tool events for debugger-style inspection. --replace-model runs recorded model.call prompts against a different model and prints a counterfactual response. --inject-tool overrides recorded tool outputs in replayed prompts for what-if experiments.
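
The idea behind --inject-tool can be illustrated with a small sketch: recorded tool outputs are replayed as-is unless an override is supplied for that tool name. ToolResult and ReplayTools are hypothetical names, and the recorded outputs in main are made-up sample data, not real trace content.

```go
package main

import "fmt"

// ToolResult is a hypothetical, simplified view of a recorded tool event.
type ToolResult struct {
	Tool   string
	Output string
}

// ReplayTools returns the tool outputs a replay would feed back to the model:
// recorded outputs by default, replaced by an injected value when one is
// given for that tool name. The input slice is left unmodified.
func ReplayTools(recorded []ToolResult, inject map[string]string) []ToolResult {
	out := make([]ToolResult, len(recorded))
	for i, r := range recorded {
		if v, ok := inject[r.Tool]; ok {
			r.Output = v // r is a copy, so the recording stays intact
		}
		out[i] = r
	}
	return out
}

func main() {
	recorded := []ToolResult{
		{Tool: "project_search", Output: "lexer.go:40"},
		{Tool: "fs_read", Output: "package parser"},
	}
	inject := map[string]string{"project_search": "parser.go:123"}
	for _, r := range ReplayTools(recorded, inject) {
		fmt.Printf("%s => %s\n", r.Tool, r.Output)
	}
}
```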

Tools

Tool	Danger	Description
`fs_read`	safe	Read file contents
`fs_write`	dangerous	Write/append to file
`fs_list`	safe	List directory entries
`fs_mkdir`	safe	Create directory
`fs_outline`	safe	Semantic outline of functions/types (Go only)
`sh`	dangerous	Execute shell command
`git_status`	safe	Git status
`git_diff`	safe	Git diff
`git_commit`	dangerous	Stage and commit
`git_push`	dangerous	Push current branch
`sem_diff`	safe	Semantic entity-level diffing
`sem_impact`	safe	Impact analysis for specific code entities
`sem_blame`	safe	Entity-level blame for a file
`patch_apply`	dangerous	Apply unified diff
`project_search`	safe	Ripgrep search
`sql_search`	dangerous	Execute SQL against local SQLite
`graphviz`	safe	Render DOT files to images
`curl_fetch`	safe	Fetch URL content
`agent`	dangerous	Spawn a bounded sub-agent run
`dispatch`	dangerous	Dispatch a task to a named agent role
`orchestrate`	dangerous	Coordinate multiple dispatches (fanout/pipeline)
`blackboard_read`	safe	Read shared run blackboard
`blackboard_write`	dangerous	Append/overwrite shared run blackboard
`reflect`	safe	Meta-cognitive self-critique and plan evaluation
`blackboard_search`	safe	Search vectorized blackboard memory
`blackboard_store`	dangerous	Store a record in vectorized blackboard

Dangerous tools require confirmation unless --auto or --confirm-tools never is set.
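
Read together with the Danger column, the rule above reduces to a small policy function. The sketch below uses a hypothetical NeedsConfirmation function. That --auto takes precedence over --confirm-tools always is an assumption based on the flag's description ("Auto-approve all tool calls").

```go
package main

import "fmt"

// NeedsConfirmation models the documented policy: --auto approves everything,
// otherwise the --confirm-tools mode decides.
func NeedsConfirmation(confirmTools string, auto, dangerous bool) bool {
	if auto {
		return false
	}
	switch confirmTools {
	case "always":
		return true
	case "never":
		return false
	default: // "dangerous" is the documented default
		return dangerous
	}
}

func main() {
	fmt.Println(NeedsConfirmation("dangerous", false, true))  // e.g. sh, fs_write
	fmt.Println(NeedsConfirmation("dangerous", false, false)) // e.g. fs_read, git_diff
	fmt.Println(NeedsConfirmation("always", true, true))      // --auto set
}
```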

Config

Default location: ~/.config/v100/config.toml

[providers.codex]
type = "codex"
default_model = "gpt-5.4"

[providers.openai]
type = "openai"
default_model = "gpt-4o"
[providers.openai.auth]
env = "OPENAI_API_KEY"

[providers.anthropic]
type = "anthropic"
default_model = "claude-sonnet-4-20250514"
[providers.anthropic.auth]
env = "ANTHROPIC_API_KEY"

[providers.ollama]
type = "ollama"
default_model = "qwen3.5:9b"
base_url = "http://localhost:11434"

[providers.gemini]
type = "gemini"
default_model = "gemini-2.5-flash"

[providers.minimax]
type = "minimax"
default_model = "MiniMax-M2.5"

[sandbox]
enabled = false
backend = "docker"
image = "ubuntu:22.04"
network_tier = "off"
apply_back = "manual"

[agents.researcher]
system_prompt = "You are a researcher agent. Find and read relevant code and return concise findings. Do not modify files."
tools = ["fs_read", "fs_list", "project_search"]
model = ""
budget_steps = 15
budget_tokens = 20000
budget_cost_usd = 0.0

[defaults]
provider = "codex"
solver = "react"
confirm_tools = "dangerous"
budget_steps = 50
budget_tokens = 100000
budget_cost_usd = 0.0
tool_timeout_ms = 30000
max_tool_calls_per_step = 50
context_limit = 80000

Run layout

runs/<run_id>/
  trace.jsonl     # append-only event log (21 event types)
  meta.json       # run metadata: name/tags/provider/model/score
  blackboard.md   # shared scratchpad for multi-agent coordination
  artifacts/      # per-run artifact files
  tui.debug.log   # optional, if --tui-debug
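
Because trace.jsonl holds one JSON object per line, it is easy to post-process with standard tooling. The following sketch counts events per type. The "type" field name and the "tool.call" event name in the sample are assumptions made for illustration; inspect a real trace.jsonl for the actual keys. CountEvents and CountString are hypothetical helpers.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// CountEvents tallies events per type in a JSONL stream.
func CountEvents(r io.Reader) (map[string]int, error) {
	counts := make(map[string]int)
	sc := bufio.NewScanner(r)
	sc.Buffer(make([]byte, 0, 64*1024), 16*1024*1024) // allow long lines
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" {
			continue // tolerate blank lines
		}
		var ev struct {
			Type string `json:"type"` // assumed field name
		}
		if err := json.Unmarshal([]byte(line), &ev); err != nil {
			return nil, fmt.Errorf("bad trace line: %w", err)
		}
		counts[ev.Type]++
	}
	return counts, sc.Err()
}

// CountString is a convenience wrapper for in-memory traces.
func CountString(s string) (map[string]int, error) {
	return CountEvents(strings.NewReader(s))
}

func main() {
	// Inline sample; in practice open runs/<run_id>/trace.jsonl with os.Open.
	sample := `{"type":"model.call"}
{"type":"tool.call"}
{"type":"model.call"}`
	counts, err := CountString(sample)
	if err != nil {
		panic(err)
	}
	fmt.Println(counts["model.call"], counts["tool.call"])
}
```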

Evaluation workflow

# Score a run
v100 score <run_id> pass "completed task without manual fixes"

# Inspect one run
v100 stats <run_id>
v100 metrics <run_id>

# Compare several runs
v100 compare <run_id> <run_id> [...]

# Query by metadata
v100 query --tag team=core --score pass

# Automated behavioral analysis
v100 analyze <run_id>

# Find where two runs diverged
v100 diff <run_a> <run_b>
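
For custom analysis outside the CLI, the same kind of filtering that v100 query performs can be approximated by reading each run's meta.json. This is only a sketch: the Meta struct and its JSON key names are guesses based on the fields listed under Run layout, and Matches is a hypothetical helper.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Meta is a guess at the shape of meta.json; the key names are assumptions.
type Meta struct {
	Name  string            `json:"name"`
	Tags  map[string]string `json:"tags"`
	Score string            `json:"score"`
}

// Matches reports whether a run carries every requested tag and, when score
// is non-empty, the requested verdict.
func Matches(m Meta, tags map[string]string, score string) bool {
	for k, v := range tags {
		if m.Tags[k] != v {
			return false
		}
	}
	return score == "" || m.Score == score
}

func main() {
	raw := `{"name":"parser refactor","tags":{"team":"core","sprint":"12"},"score":"pass"}`
	var m Meta
	if err := json.Unmarshal([]byte(raw), &m); err != nil {
		panic(err)
	}
	fmt.Println(Matches(m, map[string]string{"team": "core"}, "pass"))
	fmt.Println(Matches(m, map[string]string{"team": "infra"}, ""))
}
```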

Batch benchmarks

v100 bench ./bench.toml

name = "prompt-rewrite-v1"

[[prompts]]
message = "Refactor parser for streaming mode."

[[variants]]
name = "codex-default"
provider = "codex"
model = "gpt-5.4"
budget_steps = 20

Experiments

# Create an experiment with 3 repeats per variant
v100 experiment create my-experiment --repeats 3 \
  --variants gpt-4o:react --variants claude-sonnet-4-20250514:plan_execute

# Run all trials
v100 experiment run <exp_id> --prompt "Implement a linked list in Go"

# View results with statistical analysis
v100 experiment results <exp_id>

Dogfooding

For a concrete operator loop with ten runnable tasks, sandbox drills, and a starter bench file, see dogfood/README.md and dogfood/smoke.bench.toml.

Multi-agent quick examples

Call dispatch with {"agent":"researcher","task":"Find replay implementation and list key files."}
Call orchestrate with {"pattern":"fanout","tasks":[{"agent":"researcher","task":"Map replay"},{"agent":"reviewer","task":"List risks"}]}
Call blackboard_read with {}
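
When generating these payloads programmatically, typed structs help keep the JSON well-formed. The sketch below mirrors the orchestrate example above. Task, OrchestrateArgs, and Validate are hypothetical names, and the validation rules are assumptions, since the checks v100 performs are not described here.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Task and OrchestrateArgs mirror the JSON shown in the examples above.
type Task struct {
	Agent string `json:"agent"`
	Task  string `json:"task"`
}

type OrchestrateArgs struct {
	Pattern string `json:"pattern"` // "fanout" or "pipeline"
	Tasks   []Task `json:"tasks"`
}

// Validate rejects payloads with an unknown pattern or no tasks.
func (a OrchestrateArgs) Validate() error {
	if a.Pattern != "fanout" && a.Pattern != "pipeline" {
		return fmt.Errorf("unknown pattern %q", a.Pattern)
	}
	if len(a.Tasks) == 0 {
		return fmt.Errorf("at least one task is required")
	}
	return nil
}

func main() {
	args := OrchestrateArgs{
		Pattern: "fanout",
		Tasks: []Task{
			{Agent: "researcher", Task: "Map replay"},
			{Agent: "reviewer", Task: "List risks"},
		},
	}
	if err := args.Validate(); err != nil {
		panic(err)
	}
	b, err := json.Marshal(args)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(b))
}
```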

Debugging a run

# Verify auth, provider setup, and local tools
v100 doctor
v100 providers

# Inspect one run in more detail
v100 stats <run_id>
v100 metrics <run_id>
v100 replay <run_id>
v100 replay --deterministic <run_id>

Dev mode

v100 dev runs a supervisor that rebuilds/restarts the local binary when .v100-reload exists in the project root.

v100 dev
touch .v100-reload
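
A sentinel-file trigger like this is typically implemented by checking for the file and consuming it. The sketch below shows one way to do that. ConsumeReload and RequestReload are hypothetical helpers, and removing the file after detection is an assumption; the text above only says that the file's presence triggers a rebuild/restart.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

const sentinel = ".v100-reload"

// RequestReload creates the sentinel file, like `touch .v100-reload`.
func RequestReload(dir string) error {
	return os.WriteFile(filepath.Join(dir, sentinel), nil, 0o644)
}

// ConsumeReload reports whether the sentinel exists in dir and removes it so
// that a single touch triggers a single restart.
func ConsumeReload(dir string) bool {
	path := filepath.Join(dir, sentinel)
	if _, err := os.Stat(path); err != nil {
		return false
	}
	return os.Remove(path) == nil
}

func main() {
	dir, err := os.MkdirTemp("", "v100-dev")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)

	fmt.Println(ConsumeReload(dir)) // no sentinel yet
	if err := RequestReload(dir); err != nil {
		panic(err)
	}
	fmt.Println(ConsumeReload(dir)) // sentinel found and removed
	fmt.Println(ConsumeReload(dir)) // already consumed
}
```

A supervisor would call ConsumeReload on a timer or from a file watcher and rebuild when it returns true.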

TUI keybinds

Key	Action
`Enter`	Send message
`Tab`	Cycle focus (input → transcript → trace → status)
`Alt+R`	Open Radio Station Selector
`Ctrl+M`	Toggle Visual Inspector Dashboard
`Ctrl+T`	Toggle trace pane
`Ctrl+S`	Toggle status pane
`Ctrl+A`	Copy full plain-text transcript
`Ctrl+Y`	Approve dangerous tool
`Ctrl+N`	Deny dangerous tool
`Ctrl+C`	Quit

Files created

v100 config init and the login flows create a small set of local files:

~/.config/v100/config.toml — main runtime config
~/.config/v100/oauth_credentials.json — local OAuth client config for Codex/Gemini
~/.config/v100/auth.json — Codex subscription token after v100 login
~/.config/v100/gemini_auth.json — Gemini subscription token after v100 login --provider gemini
~/.config/v100/minimax_auth.json — Minimax auth token after v100 login --provider minimax
~/.config/v100/anthropic_auth.json — Anthropic API key after v100 login --provider anthropic
runs/<run_id>/ — trace, metadata, and artifacts for each run

License

MIT
