LLM-driven browser automation. Reads page state via the accessibility tree, decides actions with an LLM, and executes in a loop until the goal is done.
90% pass rate on WebBench-50. Default model: gpt-5.4.
```sh
curl -fsSL https://raw.githubusercontent.com/tangle-network/browser-agent-driver/main/scripts/install.sh | sh
```

Installs the `bad` command to `~/.local/bin`, downloads Playwright Chromium, and adds PATH instructions. Requires Node.js 20+.
Or via npm:

```sh
npm i -g @tangle-network/browser-agent-driver
npx playwright install chromium
```

Or as a project dependency with pnpm:

```sh
pnpm add @tangle-network/browser-agent-driver
pnpm add -D playwright
```

```ts
import { chromium } from 'playwright'
import { PlaywrightDriver, BrowserAgent } from '@tangle-network/browser-agent-driver'

const browser = await chromium.launch()
const page = await browser.newPage()
const driver = new PlaywrightDriver(page)

const runner = new BrowserAgent({
  driver,
  config: { model: 'gpt-5.4' },
})

const result = await runner.run({
  goal: 'Sign in and navigate to settings',
  startUrl: 'https://app.example.com',
  maxTurns: 30,
})

console.log(result.success, `${result.turns.length} turns`)
await browser.close()
```

```sh
# single task
bad run --goal "Sign up for account" --url http://localhost:3000

# test suite from case file
bad run --cases ./cases.json

# authenticated session
bad run --goal "Open settings" --url https://app.example.com \
  --storage-state ./.auth/session.json

# speed-optimized mode
bad run --cases ./cases.json --mode fast-explore

# evidence-rich mode for signoff
bad run --cases ./cases.json --mode full-evidence
```

Create `browser-agent-driver.config.ts` in your project root:
```ts
import { defineConfig } from '@tangle-network/browser-agent-driver'

export default defineConfig({
  model: 'gpt-5.4',
  headless: true,
  concurrency: 4,
  maxTurns: 30,
  timeoutMs: 300_000,
  outputDir: './test-results',
  reporters: ['junit', 'html'],
})
```

The config file is auto-detected by both the CLI and the programmatic API; CLI flags override config values. `.ts`, `.js`, and `.mjs` are supported.
```ts
import { TestRunner } from '@tangle-network/browser-agent-driver'

// runner: a TestRunner instance (construction omitted)
const suite = await runner.runSuite([
  {
    id: 'login',
    name: 'User login flow',
    startUrl: 'https://app.example.com/login',
    goal: 'Log in with test credentials',
    successCriteria: [
      { type: 'url-contains', value: '/dashboard' },
      { type: 'element-visible', selector: '[data-testid="user-menu"]' },
    ],
  },
])
```

The LLM can perform: `click`, `type`, `press`, `hover`, `select`, `scroll`, `navigate`, `wait`, `evaluate`, `verifyPreview`, `complete`, `abort`.
Each turn: observe page (a11y tree + optional screenshot) → LLM decides action → execute → verify effect → repeat.
Recovery is automatic: cookie consent, modal blockers, stuck loops (A-B-A-B oscillation), and selector failures are handled before the agent continues.
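The turn loop described above can be sketched as follows. This is an illustrative outline only, not the package's actual internals: every type and function name here is hypothetical.

```typescript
// Hypothetical sketch of the observe → decide → execute → verify loop.
// Names are illustrative, not the real API of @tangle-network/browser-agent-driver.
type Action =
  | { kind: 'click'; selector: string }
  | { kind: 'type'; selector: string; text: string }
  | { kind: 'complete' }
  | { kind: 'abort'; reason: string }

interface Observation {
  a11yTree: string // serialized accessibility tree
  url: string
}

async function runLoop(
  goal: string,
  maxTurns: number,
  observe: () => Promise<Observation>,
  decide: (goal: string, obs: Observation) => Promise<Action>,
  execute: (action: Action) => Promise<void>,
): Promise<boolean> {
  for (let turn = 0; turn < maxTurns; turn++) {
    const obs = await observe()            // a11y tree + optional screenshot
    const action = await decide(goal, obs) // LLM picks the next action
    if (action.kind === 'complete') return true
    if (action.kind === 'abort') return false
    await execute(action)                  // driver performs it; effect is verified before the next turn
  }
  return false // turn budget exhausted
}
```

Recovery steps (cookie consent, stuck-loop detection) would slot in between `execute` and the next `observe`.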
bad design-audit is a vision-powered design quality analyzer with a closed-loop improvement mode. It auto-classifies the page, runs ground-truth measurements (axe-core + WCAG contrast math), then evaluates visual quality with a composable rubric — and ranks the top fixes by ROI.
```sh
# Audit any URL — auto-classifies, no profile needed
bad design-audit --url https://your-app.com

# Multi-page crawl with cross-page systemic detection
bad design-audit --url https://your-app.com --pages 10

# Closed-loop fix: dispatch findings to a coding agent that edits source files
bad design-audit --url http://localhost:3000 \
  --evolve claude-code \
  --project-dir ~/my-app

# Other evolve modes: codex, opencode, css (browser injection), or any custom CLI
bad design-audit --url http://localhost:3000 --evolve "aider --message"

# Pure DOM token extraction (no LLM)
bad design-audit --url https://your-app.com --extract-tokens
```

Reports open with Top Fixes (by ROI): the five highest-leverage fixes ranked by (impact × blast / effort). Findings that appear on multiple pages collapse into systemic findings. Verified end-to-end: a deliberately-bad fixture went 3.0 → 5.0 (+2.0) over 2 evolve rounds with claude-code rewriting actual source files.
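The ROI ranking described above can be sketched in a few lines. The `Finding` fields and helper below are hypothetical, assumed only to illustrate the (impact × blast / effort) score; the actual report schema may differ.

```typescript
// Hypothetical sketch of ROI-ranked findings: impact × blast ÷ effort.
// Field names are illustrative; the real report schema may differ.
interface Finding {
  id: string
  impact: number // severity of the issue (e.g. 1–5)
  blast: number  // how many pages/components it touches
  effort: number // estimated cost to fix (higher = harder)
}

function topFixes(findings: Finding[], n = 5): Finding[] {
  const roi = (f: Finding) => (f.impact * f.blast) / f.effort
  return [...findings].sort((a, b) => roi(b) - roi(a)).slice(0, n)
}
```

A high-impact, wide-blast, low-effort finding therefore always outranks a cosmetic one-page nit.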
See Design Audit Guide for the full pipeline, custom rubric fragments, and starter-foundry integration.
- Configuration Reference — all config options
- CLI Reference — commands, modes, profiles, auth
- Design Audit — vision-powered design quality + ROI-ranked closed-loop improvement
- Memory System — trajectory store, app knowledge, selector cache
- Benchmarks & Experiments — tiered gates, AB specs, research cycles
- Wallet & EVM Apps — MetaMask, DeFi testing, RPC interception, Anvil forks
- Providers — OpenAI, Anthropic, Codex CLI, Claude Code, sandbox backend
- Reporters & Sinks — JUnit, HTML, webhooks, custom sinks
- Custom Drivers — implement the `Driver` interface
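As a rough illustration of what implementing a driver involves, here is a minimal recording driver against a hypothetical subset of the interface. The real `Driver` interface shipped by the package may have a different shape; treat this as a sketch, not the actual contract, and check the package types.

```typescript
// Minimal illustrative driver. The real Driver interface in
// @tangle-network/browser-agent-driver may differ; check the exported types.
interface MinimalDriver {
  navigate(url: string): Promise<void>
  click(selector: string): Promise<void>
  type(selector: string, text: string): Promise<void>
  snapshot(): Promise<string> // serialized a11y tree
}

// A no-op driver that just records actions — handy for dry runs and tests.
class RecordingDriver implements MinimalDriver {
  log: string[] = []
  async navigate(url: string) { this.log.push(`navigate ${url}`) }
  async click(selector: string) { this.log.push(`click ${selector}`) }
  async type(selector: string, text: string) { this.log.push(`type ${selector} ${text}`) }
  async snapshot() { return '<empty>' }
}
```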
Ships Codex skills under skills/ for test execution discipline and agent-friendly UX conventions.
```sh
npm run skills:install
```

Releases are tag-triggered via `.github/workflows/publish-npm.yml`. Push a `browser-agent-driver-vX.Y.Z` tag to publish.
```sh
pnpm build            # TypeScript → dist/
pnpm test             # vitest
pnpm lint             # type-check
pnpm check:boundaries
```

Dual-licensed under MIT and Apache 2.0. See LICENSE.