LLM-driven browser automation. Reads page state via the accessibility tree, decides actions with an LLM, and executes in a loop until the goal is done.
90% pass rate on WebBench-50. Default model: gpt-5.4.
```sh
curl -fsSL https://raw.githubusercontent.com/tangle-network/browser-agent-driver/main/scripts/install.sh | sh
```

Installs the `bad` command to `~/.local/bin`, downloads Playwright Chromium, and adds PATH instructions. Requires Node.js 20+.
Or via npm:

```sh
npm i -g @tangle-network/browser-agent-driver
npx playwright install chromium
```

Or as a project dependency with pnpm:

```sh
pnpm add @tangle-network/browser-agent-driver
pnpm add -D playwright
```

```ts
import { chromium } from 'playwright'
import { PlaywrightDriver, BrowserAgent } from '@tangle-network/browser-agent-driver'

const browser = await chromium.launch()
const page = await browser.newPage()
const driver = new PlaywrightDriver(page)

const runner = new BrowserAgent({
  driver,
  config: { model: 'gpt-5.4' },
})

const result = await runner.run({
  goal: 'Sign in and navigate to settings',
  startUrl: 'https://app.example.com',
  maxTurns: 30,
})

console.log(result.success, `${result.turns.length} turns`)
await browser.close()
```

```sh
# single task
bad run --goal "Sign up for account" --url http://localhost:3000

# test suite from case file
bad run --cases ./cases.json

# authenticated session
bad run --goal "Open settings" --url https://app.example.com \
  --storage-state ./.auth/session.json

# speed-optimized mode
bad run --cases ./cases.json --mode fast-explore

# evidence-rich mode for signoff
bad run --cases ./cases.json --mode full-evidence
```

Create `browser-agent-driver.config.ts` in your project root:
```ts
import { defineConfig } from '@tangle-network/browser-agent-driver'

export default defineConfig({
  model: 'gpt-5.4',
  headless: true,
  concurrency: 4,
  maxTurns: 30,
  timeoutMs: 300_000,
  outputDir: './test-results',
  reporters: ['junit', 'html'],
})
```

The config file is auto-detected by both the CLI and the programmatic API; CLI flags override config values. `.ts`, `.js`, and `.mjs` are supported.
```ts
import { TestRunner } from '@tangle-network/browser-agent-driver'

// runner: a TestRunner instance (construction omitted)
const suite = await runner.runSuite([
  {
    id: 'login',
    name: 'User login flow',
    startUrl: 'https://app.example.com/login',
    goal: 'Log in with test credentials',
    successCriteria: [
      { type: 'url-contains', value: '/dashboard' },
      { type: 'element-visible', selector: '[data-testid="user-menu"]' },
    ],
  },
])
```

The LLM can perform: `click`, `type`, `press`, `hover`, `select`, `scroll`, `navigate`, `wait`, `evaluate`, `verifyPreview`, `complete`, `abort`.
Each turn: observe page (a11y tree + optional screenshot) → LLM decides action → execute → verify effect → repeat.
Recovery is automatic: cookie consent, modal blockers, stuck loops (A-B-A-B oscillation), and selector failures are handled before the agent continues.
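The turn loop described above can be sketched as follows. This is an illustrative outline only, not the package's actual internals: every type and function name here is hypothetical.

```typescript
// Hypothetical sketch of the observe → decide → execute → verify loop.
// Names are illustrative, not the real API of @tangle-network/browser-agent-driver.
type Action =
  | { kind: 'click'; selector: string }
  | { kind: 'type'; selector: string; text: string }
  | { kind: 'complete' }
  | { kind: 'abort'; reason: string }

interface Observation {
  a11yTree: string // serialized accessibility tree
  url: string
}

async function runLoop(
  goal: string,
  maxTurns: number,
  observe: () => Promise<Observation>,
  decide: (goal: string, obs: Observation) => Promise<Action>,
  execute: (action: Action) => Promise<void>,
): Promise<boolean> {
  for (let turn = 0; turn < maxTurns; turn++) {
    const obs = await observe()            // a11y tree + optional screenshot
    const action = await decide(goal, obs) // LLM picks the next action
    if (action.kind === 'complete') return true
    if (action.kind === 'abort') return false
    await execute(action)                  // driver performs it; effect is verified before the next turn
  }
  return false // turn budget exhausted
}
```

Recovery steps (cookie consent, stuck-loop detection) would slot in between `execute` and the next `observe`.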
bad design-audit is a vision-powered design quality analyzer with a closed-loop improvement mode. It auto-classifies the page, runs ground-truth measurements (axe-core + WCAG contrast math), then evaluates visual quality with a composable rubric — and ranks the top fixes by ROI.
```sh
# Audit any URL — auto-classifies, no profile needed
bad design-audit --url https://your-app.com

# Multi-page crawl with cross-page systemic detection
bad design-audit --url https://your-app.com --pages 10

# Closed-loop fix: dispatch findings to a coding agent that edits source files
bad design-audit --url http://localhost:3000 \
  --evolve claude-code \
  --project-dir ~/my-app

# Other evolve modes: codex, opencode, css (browser injection), or any custom CLI
bad design-audit --url http://localhost:3000 --evolve "aider --message"

# Pure DOM token extraction (no LLM)
bad design-audit --url https://your-app.com --extract-tokens
```

Reports open with Top Fixes (by ROI): the five highest-leverage fixes ranked by (impact × blast / effort). Findings that appear on multiple pages collapse into systemic findings. Verified end-to-end: a deliberately-bad fixture went 3.0 → 5.0 (+2.0) over 2 evolve rounds with claude-code rewriting actual source files.
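The ROI ranking described above can be sketched in a few lines. The `Finding` fields and helper below are hypothetical, assumed only to illustrate the (impact × blast / effort) score; the actual report schema may differ.

```typescript
// Hypothetical sketch of ROI-ranked findings: impact × blast ÷ effort.
// Field names are illustrative; the real report schema may differ.
interface Finding {
  id: string
  impact: number // severity of the issue (e.g. 1–5)
  blast: number  // how many pages/components it touches
  effort: number // estimated cost to fix (higher = harder)
}

function topFixes(findings: Finding[], n = 5): Finding[] {
  const roi = (f: Finding) => (f.impact * f.blast) / f.effort
  return [...findings].sort((a, b) => roi(b) - roi(a)).slice(0, n)
}
```

A high-impact, wide-blast, low-effort finding therefore always outranks a cosmetic one-page nit.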
See Design Audit Guide for the full pipeline, custom rubric fragments, and starter-foundry integration.
- Configuration Reference — all config options
- CLI Reference — commands, modes, profiles, auth
- Design Audit — vision-powered design quality + ROI-ranked closed-loop improvement
- Memory System — trajectory store, app knowledge, selector cache
- Benchmarks & Experiments — tiered gates, AB specs, research cycles
- Wallet & EVM Apps — MetaMask, DeFi testing, RPC interception, Anvil forks
- Providers — OpenAI, Anthropic, Codex CLI, Claude Code, sandbox backend
- Reporters & Sinks — JUnit, HTML, webhooks, custom sinks
- Custom Drivers — implement the `Driver` interface
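As a rough illustration of what implementing a driver involves, here is a minimal recording driver against a hypothetical subset of the interface. The real `Driver` interface shipped by the package may have a different shape; treat this as a sketch, not the actual contract, and check the package types.

```typescript
// Minimal illustrative driver. The real Driver interface in
// @tangle-network/browser-agent-driver may differ; check the exported types.
interface MinimalDriver {
  navigate(url: string): Promise<void>
  click(selector: string): Promise<void>
  type(selector: string, text: string): Promise<void>
  snapshot(): Promise<string> // serialized a11y tree
}

// A no-op driver that just records actions — handy for dry runs and tests.
class RecordingDriver implements MinimalDriver {
  log: string[] = []
  async navigate(url: string) { this.log.push(`navigate ${url}`) }
  async click(selector: string) { this.log.push(`click ${selector}`) }
  async type(selector: string, text: string) { this.log.push(`type ${selector} ${text}`) }
  async snapshot() { return '<empty>' }
}
```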
Ships Codex skills under skills/ for test execution discipline and agent-friendly UX conventions.
```sh
npm run skills:install
```

Releases are tag-triggered via `.github/workflows/publish-npm.yml`. Push a `browser-agent-driver-vX.Y.Z` tag to publish.
```sh
pnpm build            # TypeScript → dist/
pnpm test             # vitest
pnpm lint             # type-check
pnpm check:boundaries
```

Dual-licensed under MIT and Apache 2.0. See LICENSE.