v100 is an experimental agent harness for studying long-horizon LLM behavior.
It provides a modular runtime for tool-using language model agents where every run is treated as an observable experiment. Model calls, tool execution, context compression, and delegation events are emitted as structured traces that can be replayed and analyzed.
The goal is to make agent behavior measurable and reproducible so different prompting strategies, tool policies, and orchestration approaches can be systematically evaluated.
- Durable traces — every run is logged as append-only JSONL (`runs/<id>/trace.jsonl`) with 21 structured event types
- Two solvers — ReAct loop (default) and Plan-Execute with automatic replanning
- Sandbox execution — Docker-based isolated containers with hardened security, snapshots, and apply-back
- Run metadata + scoring — attach names/tags and score outcomes for later analysis
- Evaluation tooling — per-run stats, run comparisons, experiments, and batch bench execution
- Trace-derived diagnostics — efficiency/behavior metrics and automatic failure classification
- Phase 300 optimization — autonomous agent refinement via reflective scoring and prompt mutation
- Self-evolution engine — agents can distill trajectories and author new tools at runtime
- Delegated sub-agents — `agent` tool can spawn bounded child loops
- Named specialist agents — config-driven roles via `[agents.<name>]` and role dispatching
- Coordination patterns — `orchestrate` tool supports `fanout` and `pipeline` execution
- Shared run state — blackboard tools provide cross-agent coordination via vectorized memory
- Reflection turn — agents perform an internal confidence check before executing dangerous tools
- Streaming — real-time token streaming from providers that support it
- Tool execution — 26+ built-in tools: file system, shell, git, patch, curl, ripgrep search, semantic parsing, sql_search, graphviz, reflect, and multi-agent coordination
- Dangerous tool confirmation — CLI stdin prompt or TUI Ctrl+Y/Ctrl+N
- Budget enforcement — hard limits on steps, tokens, and cost
- Build verification loop — automatically runs `go build ./...` after every workspace mutation and injects compiler errors as a diagnostic alert
- Resume & replay — pick up any run from its trace; pretty-print transcripts
- Six providers — Codex (ChatGPT subscription), Gemini subscription, OpenAI API, Anthropic API, Minimax, or local Ollama
- Two UIs — line-by-line CLI (default) or Bubble Tea "Mission Control" TUI (`--tui`)
- Dev supervisor — restart on demand by creating `.v100-reload`
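Because every trace is plain append-only JSONL, runs can be post-processed outside the harness with a few lines of Go. The sketch below assumes each event carries a top-level `type` field — an assumption about the schema for illustration, not a documented contract:

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"strings"
)

// countEventTypes tallies events in an append-only JSONL trace by their
// (assumed) top-level "type" field. Malformed lines are skipped.
func countEventTypes(trace string) map[string]int {
	counts := make(map[string]int)
	sc := bufio.NewScanner(strings.NewReader(trace))
	for sc.Scan() {
		var ev map[string]any
		if err := json.Unmarshal(sc.Bytes(), &ev); err != nil {
			continue
		}
		if t, ok := ev["type"].(string); ok {
			counts[t]++
		}
	}
	return counts
}

func main() {
	// Illustrative events; real traces contain 21 structured event types.
	trace := `{"type":"model.call","step":1}
{"type":"tool.exec","tool":"fs_read"}
{"type":"model.call","step":2}`
	fmt.Println(countEventTypes(trace)["model.call"]) // 2
}
```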
```sh
git clone https://github.com/tripledoublev/v100
cd v100
go build -ldflags "-X main.version=v0.2.5" -o v100 ./cmd/v100/
```

Requires Go 1.21+. Optional: `rg` (ripgrep) for `project_search`, `patch` for `patch_apply`, Docker for sandbox execution, `mpv` or `ffplay` for radio.
Pre-built binaries are available on the releases page.
```sh
# Initialize config
v100 config init

# Fill ~/.config/v100/oauth_credentials.json with your OAuth client values
# (only needed for Codex/Gemini subscription providers)

# Authenticate with ChatGPT subscription
v100 login

# Verify setup
v100 doctor

# Start a run (uses ChatGPT subscription by default)
v100 run

# Start a non-interactive run (executes prompt then exits)
v100 run --exit "Summarize the project structure"

# With a step budget
v100 run --budget-steps 10

# Use OpenAI API
v100 run --provider openai

# Use Anthropic API
v100 run --provider anthropic

# Use Minimax (MiniMax-M2.5)
v100 run --provider minimax

# Use local Ollama
v100 run --provider ollama --model qwen3.5:9b

# Use plan-execute solver with replanning
v100 run --solver plan_execute --max-replans 3

# Enable sandbox execution
v100 run --sandbox

# Enable "Mission Control" TUI
v100 run --tui

# Name and tag runs for later querying
v100 run --name "parser refactor" --tag team=core --tag sprint=12
```

Use the repo-local lint wrapper so local lint matches CI:
```sh
GOCACHE="$PWD/.gocache" go test ./...
GOCACHE="$PWD/.gocache" go build ./...
GOCACHE="$PWD/.gocache" go vet ./...
./scripts/lint.sh
```

For Docker sandbox runs, build the repo-local image that includes the binaries the harness expects:

```sh
./scripts/build-sandbox-image.sh
```

That image installs Go, git, patch, ripgrep, and curl, which avoids the missing-`patch` failure seen with a plain `golang` base image.
Uses your existing ChatGPT Plus/Pro plan — no separate API billing. Authenticate with `v100 login` after filling `~/.config/v100/oauth_credentials.json`:

```sh
v100 login   # opens browser → completes OAuth → saves token
```

Model: `gpt-5.4`
Standard pay-as-you-go API access.
```sh
export OPENAI_API_KEY=sk-...
v100 run --provider openai --model gpt-4o
```

Claude API access via API key.
```sh
export ANTHROPIC_API_KEY=sk-ant-...
v100 run --provider anthropic --model claude-sonnet-4-20250514
```

Or authenticate interactively:

```sh
v100 login --provider anthropic   # prompts for API key, saves to ~/.config/v100/anthropic_auth.json
```

Uses your existing Gemini Pro / Google One AI Premium plan — no separate API billing. Authenticate with `v100 login --provider gemini` after filling `~/.config/v100/oauth_credentials.json`:

```sh
v100 login --provider gemini
v100 run --provider gemini
v100 run --provider gemini --model gemini-2.5-pro
```

Models: `gemini-2.5-flash` (default), `gemini-2.5-pro`, `gemini-3-pro-preview`, `gemini-3-flash-preview`
Advanced model support via MiniMax-M2.5.
```sh
v100 login --provider minimax
v100 run --provider minimax
```

Run fully local models via Ollama (no API key required).

```sh
ollama run qwen3.5:9b
v100 run --provider ollama --model qwen3.5:9b
```

| Provider | Auth | Default model | Streaming | Best for |
|---|---|---|---|---|
| `codex` | OAuth (`v100 login`) | `gpt-5.4` | yes | subscription-backed coding runs |
| `openai` | `OPENAI_API_KEY` | `gpt-4o` | yes | API-driven experiments |
| `anthropic` | `ANTHROPIC_API_KEY` or `v100 login --provider anthropic` | `claude-sonnet-4-20250514` | yes | Claude API experiments |
| `gemini` | OAuth (`v100 login --provider gemini`) | `gemini-2.5-flash` | yes | subscription-backed Gemini runs |
| `minimax` | OAuth (`v100 login --provider minimax`) | `MiniMax-M2.5` | yes | high-fidelity research runs |
| `ollama` | local daemon | `qwen3.5:2b` | yes | fully local runs |
OAuth client config for subscription providers lives at `~/.config/v100/oauth_credentials.json`.
- Provider behavior differs noticeably; the same prompt can produce very different tool-use patterns across providers.
- Subscription providers require a local OAuth client config file before `v100 login` will work.
- Underspecified prompts can still trigger over-eager exploration on some models. Use `--budget-steps`, `--budget-tokens`, and the default dangerous-tool confirmation when evaluating a new provider or prompt style.
v100 supports two solver strategies that control how the agent loop processes tasks.
Classic ReAct (Reasoning + Acting) loop. The model receives the conversation, produces reasoning and tool calls, observes results, and repeats until done.
```sh
v100 run --solver react
```

Two-phase strategy: first generates a plan, then executes it with ReAct. If execution fails, the solver can replan and retry up to `--max-replans` times. Checkpoints are created before execution so the workspace can be restored on failure.

```sh
v100 run --solver plan_execute --max-replans 3
```

v100 can execute tool operations inside an isolated Docker container instead of directly on the host.
```sh
v100 run --sandbox
```

The Docker executor applies hardened security defaults:

- `--cap-drop ALL` — drops all Linux capabilities
- `--security-opt no-new-privileges` — prevents privilege escalation
- `--pids-limit 64` — limits child processes
- Seccomp profile blocking mount, ptrace, kexec_load, and other sensitive syscalls
- Network isolation configurable via `network_tier`
```toml
[sandbox]
enabled = false
backend = "docker"       # "host" or "docker"
image = "ubuntu:22.04"
network_tier = "off"     # "off" (isolated) or "open" (bridge)
memory_mb = 512
cpus = 1.0
apply_back = "manual"    # "manual", "on_success", or "never"
```

During sandboxed runs, the solver can create workspace snapshots (checkpoints). These can be restored later:
```sh
# List checkpoints for a run
v100 restore --list <run_id>

# Restore to a specific checkpoint
v100 restore <run_id> <checkpoint_id>

# Resume from restored state
v100 resume <run_id>
```

After a sandboxed run, changes can be merged back to the host workspace:

- `on_success` — automatically apply changes when the run ends successfully
- `manual` — prompt for confirmation
- `never` — keep changes only in the sandbox
| Command | Description |
|---|---|
| `v100 run` | Start interactive agent run |
| `v100 resume <run_id>` | Continue a run from its trace (`--auto --unsafe` supported) |
| `v100 restore <run_id> [checkpoint_id]` | Restore sandbox checkpoint |
| `v100 replay <run_id>` | Pretty-print run transcript |
| `v100 runs [-n N] [--provider X] [--failed] [--all]` | List recent runs with optional filtering |
| `v100 tools` | List registered tools |
| `v100 providers` | List configured providers |
| `v100 config init` | Write default config to `~/.config/v100/config.toml` |
| `v100 doctor` | Check provider auth, tools, run dir |
| `v100 login [--provider <name>]` | Authenticate via browser OAuth or API key |
| `v100 logout [--provider <name>]` | Remove stored auth token |
| `v100 score <run_id> <pass\|fail\|partial> [notes...]` | Score a completed run |
| `v100 distill <run_id>` | Distill a run trace into ShareGPT format |
| `v100 stats <run_id>` | Compute stats from trace events |
| `v100 metrics <run_id>` | Compute trace-derived efficiency/behavior metrics |
| `v100 compare <run_id> <run_id> [...]` | Compare multiple runs side-by-side |
| `v100 bench <bench.toml>` | Batch-run prompt/provider/model variants |
| `v100 experiment create <name>` | Create a new experiment |
| `v100 experiment run <exp_id> --prompt <text>` | Execute all experiment variants |
| `v100 experiment results <exp_id>` | Display statistical results |
| `v100 analyze <run_id>` | Automated behavioral analysis |
| `v100 digest <run_id>` | Compact failure digest for a completed run |
| `v100 mutate <run_id>` | Suggest improved prompt based on failure analysis |
| `v100 diff <run_a> <run_b>` | Find divergence point between traces |
| `v100 query [--tag k=v ...] [--score <verdict>]` | Filter runs by metadata |
| `v100 dev` | Rebuild/restart dev binary on `.v100-reload` |
```
--provider string              Provider name (codex, gemini, openai, ollama, anthropic, minimax)
--model string                 Model override
--solver string                Solver strategy: react (default), plan_execute
--max-replans int              Max replans for plan_execute solver
--sandbox                      Enable isolated sandbox execution
--streaming                    Enable real-time token streaming (default: true)
--budget-steps int             Max steps before halting
--budget-tokens int            Max tokens before halting
--budget-cost float            Max cost in USD before halting
--max-tool-calls-per-step int  Max tool calls per step
--confirm-tools string         always | dangerous | never (default: dangerous)
--auto                         Auto-approve all tool calls
--unsafe                       Disable path guardrails
--workspace string             Workspace directory for tool operations
--name string                  Human-readable run name (meta.json)
--tag key=value                Repeatable run tags (meta.json)
--temperature float            Sampling temperature
--top-p float                  Nucleus sampling parameter
--top-k int                    Top-k sampling parameter
--max-tokens int               Max output tokens per model call
--seed int                     Random seed for reproducibility
--exit                         Finalize and exit after initial prompt completes
--tui                          Enable "Mission Control" TUI
--tui-no-alt                   Disable alternate screen
--tui-plain                    Force monochrome rendering
--tui-debug                    Write TUI debug log in run directory
```
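Conceptually, the budget flags translate into hard caps checked as the run progresses. A minimal sketch with illustrative names (not v100's actual types), assuming a zero limit means "unlimited":

```go
package main

import "fmt"

// Budget mirrors the --budget-steps / --budget-tokens / --budget-cost
// limits. A zero value for a field disables that cap (an assumption).
type Budget struct {
	MaxSteps  int
	MaxTokens int
	MaxCost   float64
}

// Exceeded reports whether any configured limit has been crossed.
func (b Budget) Exceeded(steps, tokens int, cost float64) bool {
	if b.MaxSteps > 0 && steps >= b.MaxSteps {
		return true
	}
	if b.MaxTokens > 0 && tokens >= b.MaxTokens {
		return true
	}
	if b.MaxCost > 0 && cost >= b.MaxCost {
		return true
	}
	return false
}

func main() {
	b := Budget{MaxSteps: 10, MaxTokens: 100000}
	fmt.Println(b.Exceeded(3, 5000, 0.0))  // false: under all caps
	fmt.Println(b.Exceeded(10, 5000, 0.0)) // true: step cap hit
}
```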
Default workspace is the current directory where `v100 run` is launched.
```sh
v100 replay --deterministic <run_id>
v100 replay --deterministic --step <run_id>
v100 replay --deterministic --replace-model gpt-5.4 <run_id>
v100 replay --deterministic --inject-tool project_search="parser.go:123" <run_id>
```

In deterministic mode, model responses and tool outputs are replayed from trace records.

- `--step` pauses between model/tool events for debugger-style inspection.
- `--replace-model` runs recorded `model.call` prompts against a different model and prints a counterfactual response.
- `--inject-tool` overrides recorded tool outputs in replayed prompts for what-if experiments.
| Tool | Danger | Description |
|---|---|---|
| `fs_read` | safe | Read file contents |
| `fs_write` | dangerous | Write/append to file |
| `fs_list` | safe | List directory entries |
| `fs_mkdir` | safe | Create directory |
| `fs_outline` | safe | Semantic outline of functions/types (Go only) |
| `sh` | dangerous | Execute shell command |
| `git_status` | safe | Git status |
| `git_diff` | safe | Git diff |
| `git_commit` | dangerous | Stage and commit |
| `git_push` | dangerous | Push current branch |
| `sem_diff` | safe | Semantic entity-level diffing |
| `sem_impact` | safe | Impact analysis for specific code entities |
| `sem_blame` | safe | Entity-level blame for a file |
| `patch_apply` | dangerous | Apply unified diff |
| `project_search` | safe | Ripgrep search |
| `sql_search` | dangerous | Execute SQL against local SQLite |
| `graphviz` | safe | Render DOT files to images |
| `curl_fetch` | safe | Fetch URL content |
| `agent` | dangerous | Spawn a bounded sub-agent run |
| `dispatch` | dangerous | Dispatch a task to a named agent role |
| `orchestrate` | dangerous | Coordinate multiple dispatches (fanout/pipeline) |
| `blackboard_read` | safe | Read shared run blackboard |
| `blackboard_write` | dangerous | Append/overwrite shared run blackboard |
| `reflect` | safe | Meta-cognitive self-critique and plan evaluation |
| `blackboard_search` | safe | Search vectorized blackboard memory |
| `blackboard_store` | dangerous | Store a record in vectorized blackboard |
Dangerous tools require confirmation unless `--auto` or `--confirm-tools never` is set.
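The confirmation gate reduces to a small per-call policy decision. A sketch of that logic under assumed names (the real implementation is v100's own; only the flag semantics come from this README):

```go
package main

import "fmt"

// needsConfirmation applies the --confirm-tools policy to one tool call.
// mode is "always", "dangerous", or "never"; auto mirrors the --auto flag.
func needsConfirmation(mode string, dangerous, auto bool) bool {
	if auto || mode == "never" {
		return false
	}
	if mode == "always" {
		return true
	}
	// default "dangerous" policy: only gate dangerous tools
	return dangerous
}

func main() {
	fmt.Println(needsConfirmation("dangerous", true, false))  // true: e.g. sh, fs_write
	fmt.Println(needsConfirmation("dangerous", false, false)) // false: e.g. fs_read
	fmt.Println(needsConfirmation("dangerous", true, true))   // false: --auto bypasses
}
```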
Default location: `~/.config/v100/config.toml`

```toml
[providers.codex]
type = "codex"
default_model = "gpt-5.4"

[providers.openai]
type = "openai"
default_model = "gpt-4o"

[providers.openai.auth]
env = "OPENAI_API_KEY"

[providers.anthropic]
type = "anthropic"
default_model = "claude-sonnet-4-20250514"

[providers.anthropic.auth]
env = "ANTHROPIC_API_KEY"

[providers.ollama]
type = "ollama"
default_model = "qwen3.5:9b"
base_url = "http://localhost:11434"

[providers.gemini]
type = "gemini"
default_model = "gemini-2.5-flash"

[providers.minimax]
type = "minimax"
default_model = "MiniMax-M2.5"

[sandbox]
enabled = false
backend = "docker"
image = "ubuntu:22.04"
network_tier = "off"
apply_back = "manual"

[agents.researcher]
system_prompt = "You are a researcher agent. Find and read relevant code and return concise findings. Do not modify files."
tools = ["fs_read", "fs_list", "project_search"]
model = ""
budget_steps = 15
budget_tokens = 20000
budget_cost_usd = 0.0

[defaults]
provider = "codex"
solver = "react"
confirm_tools = "dangerous"
budget_steps = 50
budget_tokens = 100000
budget_cost_usd = 0.0
tool_timeout_ms = 30000
max_tool_calls_per_step = 50
context_limit = 80000
```

```
runs/<run_id>/
  trace.jsonl     # append-only event log (21 event types)
  meta.json       # run metadata: name/tags/provider/model/score
  blackboard.md   # shared scratchpad for multi-agent coordination
  artifacts/      # per-run artifact files
  tui.debug.log   # optional, if --tui-debug
```
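The orchestration examples later in this README mention a `reviewer` role alongside `researcher`. A hypothetical config for it, following the same `[agents.<name>]` shape as above (all field values here are illustrative, not shipped defaults):

```toml
[agents.reviewer]
system_prompt = "You are a reviewer agent. Read proposed changes and report risks and defects. Do not modify files."
tools = ["fs_read", "fs_list", "git_diff", "project_search"]
model = ""
budget_steps = 15
budget_tokens = 20000
budget_cost_usd = 0.0
```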
```sh
# Score a run
v100 score <run_id> pass "completed task without manual fixes"

# Inspect one run
v100 stats <run_id>
v100 metrics <run_id>

# Compare several runs
v100 compare <run_id> <run_id> [...]

# Query by metadata
v100 query --tag team=core --score pass

# Automated behavioral analysis
v100 analyze <run_id>

# Find where two runs diverged
v100 diff <run_a> <run_b>
```

```sh
v100 bench ./bench.toml
```

```toml
name = "prompt-rewrite-v1"

[[prompts]]
message = "Refactor parser for streaming mode."

[[variants]]
name = "codex-default"
provider = "codex"
model = "gpt-5.4"
budget_steps = 20
```

```sh
# Create an experiment with 3 repeats per variant
v100 experiment create my-experiment --repeats 3 \
  --variants gpt-4o:react --variants claude-sonnet-4-20250514:plan_execute

# Run all trials
v100 experiment run <exp_id> --prompt "Implement a linked list in Go"

# View results with statistical analysis
v100 experiment results <exp_id>
```

For a concrete operator loop with ten runnable tasks, sandbox drills, and a starter bench file, see `dogfood/README.md` and `dogfood/smoke.bench.toml`.
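Since each run's meta.json records its score verdict, aggregate analysis can also happen outside the harness. A sketch that assumes pass/fail/partial verdicts collected from run metadata (the exact meta.json schema is not documented here):

```go
package main

import "fmt"

// passRate computes the fraction of runs whose verdict is "pass".
// The verdicts would come from each run's meta.json score field.
func passRate(verdicts []string) float64 {
	if len(verdicts) == 0 {
		return 0
	}
	passes := 0
	for _, v := range verdicts {
		if v == "pass" {
			passes++
		}
	}
	return float64(passes) / float64(len(verdicts))
}

func main() {
	fmt.Println(passRate([]string{"pass", "fail", "pass", "partial"})) // 0.5
}
```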
Call `dispatch` with `{"agent":"researcher","task":"Find replay implementation and list key files."}`

Call `orchestrate` with `{"pattern":"fanout","tasks":[{"agent":"researcher","task":"Map replay"},{"agent":"reviewer","task":"List risks"}]}`

Call `blackboard_read` with `{}`
```sh
# Verify auth, provider setup, and local tools
v100 doctor
v100 providers

# Inspect one run in more detail
v100 stats <run_id>
v100 metrics <run_id>
v100 replay <run_id>
v100 replay --deterministic <run_id>
```

`v100 dev` runs a supervisor that rebuilds/restarts the local binary when `.v100-reload` exists in the project root.

```sh
v100 dev
touch .v100-reload
```

| Key | Action |
|---|---|
| `Enter` | Send message |
| `Tab` | Cycle focus (input → transcript → trace → status) |
| `Alt+R` | Open Radio Station Selector |
| `Ctrl+M` | Toggle Visual Inspector Dashboard |
| `Ctrl+T` | Toggle trace pane |
| `Ctrl+S` | Toggle status pane |
| `Ctrl+A` | Copy full plain-text transcript |
| `Ctrl+Y` | Approve dangerous tool |
| `Ctrl+N` | Deny dangerous tool |
| `Ctrl+C` | Quit |
`v100 config init` and the login flows create a small set of local files:

- `~/.config/v100/config.toml` — main runtime config
- `~/.config/v100/oauth_credentials.json` — local OAuth client config for Codex/Gemini
- `~/.config/v100/auth.json` — Codex subscription token after `v100 login`
- `~/.config/v100/gemini_auth.json` — Gemini subscription token after `v100 login --provider gemini`
- `~/.config/v100/minimax_auth.json` — Minimax auth token after `v100 login --provider minimax`
- `~/.config/v100/anthropic_auth.json` — Anthropic API key after `v100 login --provider anthropic`
- `runs/<run_id>/` — trace, metadata, and artifacts for each run