Agent-aware code quality system for multi-agent codebases.
In 2026, code is written by fleets of AI agents. Arbiter knows who wrote each line — human or AI — and scores quality accordingly.
| Feature | Traditional Tools | Arbiter |
|---|---|---|
| Agent attribution | None | First-class: tracks Claude, Codex, Gemini, Copilot, humans |
| Per-commit scoring | Repo-wide only | Scores each commit's changed files individually |
| Diff analysis | N/A | Score only what changed in a PR/branch |
| Transparency | Opaque score | Every score decomposes into lint + security + complexity |
| Agent-specific gates | N/A | Different quality thresholds per agent trust tier |
| Tool integration | Proprietary | Wraps tools you already trust: ruff, Bandit, radon, vulture |
| Dashboard | SaaS login | Single HTML file with per-agent timelines, commit feed, fleet view |
| Dependencies | Heavy | Analysis tools only; core is stdlib Python |
```bash
git clone https://github.com/hummbl-dev/arbiter.git
cd arbiter

# Install (makes `arbiter` command available)
pip install ".[analyzers]"

# Quick score (no persistence)
arbiter score /path/to/your/repo

# Full analysis with per-commit agent attribution
arbiter analyze /path/to/your/repo

# Score only files changed since main
arbiter diff /path/to/your/repo --base main

# Agent leaderboard
arbiter agents

# Start dashboard
arbiter serve --port 8080
# Open http://localhost:8080
```

Run without installing:

```bash
PYTHONPATH=src python -m arbiter score /path/to/your/repo
```

Docker:

```bash
docker build -t arbiter .
docker run -p 8080:8080 -v /path/to/repo:/repo:ro arbiter
```

```
Git Repo ──→ [Git Historian] ──→ [Analyzer Runner] ──→ [Scoring Engine] ──→ [SQLite Store]
                  │                     │                     │                    │
          agent attribution      tool invocation       weighted rubric        trend data
          (Co-Authored-By,       (ruff, radon,         (lint 35%,                 │
           email matching)        vulture, bandit)      security 30%,             ├──→ REST API
                                                        complexity 35%)           └──→ Dashboard

                                 ┌─────────────┐
                                 │Diff Analyzer│ ←── v0.2: scores only changed files per commit/branch
                                 └─────────────┘
```
Every commit is scored against only the files it changed, not the entire repo. This makes the agent leaderboard meaningful — a commit that touches 1 clean file scores differently than one that touches 10 messy files.
arbiter diff scores only files changed since a base branch. Ideal for CI/PR quality gates — fast, scoped, actionable.
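A minimal sketch of the diff-scoping idea. The function name and the exact git invocation are assumptions for illustration; Arbiter's real implementation may differ:

```python
import subprocess

def changed_files(repo: str, base: str = "main") -> list[str]:
    """List files changed on the current branch relative to `base`.

    Hypothetical helper: `base...HEAD` diffs against the merge base,
    so only this branch's own changes are scored.
    """
    out = subprocess.run(
        ["git", "-C", repo, "diff", "--name-only", f"{base}...HEAD"],
        capture_output=True, text=True, check=True,
    )
    return [f for f in out.stdout.splitlines() if f]
```

Scoring only this list, rather than the whole tree, is what keeps a PR gate fast and scoped to the author's actual changes.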
Arbiter identifies which agent authored each commit:
- Co-Authored-By trailer — `Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>`
- Author email — maps `noreply@anthropic.com` → claude, `codex@openai.com` → codex
- Default — "human" if no agent pattern matches
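The three rules above can be sketched in a few lines. The pattern table and email map below are illustrative stand-ins, not Arbiter's actual internals:

```python
import re

# Illustrative data only; Arbiter loads these from agents.yml.
CO_AUTHOR_PATTERNS = {
    "claude": re.compile(r"Claude\s+(Opus|Sonnet|Haiku)"),
}
EMAIL_MAP = {
    "noreply@anthropic.com": "claude",
    "codex@openai.com": "codex",
}
TRAILER = re.compile(
    r"^Co-Authored-By:\s*(?P<name>.+?)\s*<(?P<email>[^>]+)>", re.MULTILINE
)

def attribute(commit_message: str, author_email: str) -> str:
    # 1. A Co-Authored-By trailer wins if it matches a known agent.
    for m in TRAILER.finditer(commit_message):
        for agent, pattern in CO_AUTHOR_PATTERNS.items():
            if pattern.search(m.group("name")):
                return agent
        if m.group("email") in EMAIL_MAP:
            return EMAIL_MAP[m.group("email")]
    # 2. Otherwise, fall back to the author's email address.
    if author_email in EMAIL_MAP:
        return EMAIL_MAP[author_email]
    # 3. Default: human.
    return "human"
```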
Configure in `agents.yml`:

```yaml
agents:
  - name: claude
    emails: [noreply@anthropic.com]
    co_author_patterns: ["Claude\\s+(Opus|Sonnet|Haiku)"]
    trust_tier: verified
    quality_threshold: 70.0
  - name: gemini
    trust_tier: probation
    quality_threshold: 80.0  # Higher bar for probationary agents
```

| Analyzer | Tool | What It Finds |
|---|---|---|
| Lint | ruff | Style violations, import errors, bugbear patterns |
| Complexity | radon | Cyclomatic complexity (grade A-F per function) |
| Security | bandit | Hardcoded secrets, shell injection, dangerous patterns |
| Dead Code | vulture | Unused functions, imports, variables |
| Duplication | AST hash | Near-duplicate function bodies |
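The AST-hash duplication check can be approximated as below. Note this simplified sketch hashes the raw `ast.dump`, which is identifier-sensitive; true *near*-duplicate detection presumably normalizes names first, so treat this as the idea, not Arbiter's implementation:

```python
import ast
import hashlib

def function_fingerprints(source: str) -> dict[str, str]:
    """Map each function name to a hash of its body's AST dump.

    Functions with identical bodies get identical fingerprints,
    flagging copy-pasted code.
    """
    fingerprints = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef):
            # Dump only the body, so the function's own name doesn't
            # affect the hash. Line numbers are excluded by default.
            dump = ast.dump(ast.Module(body=node.body, type_ignores=[]))
            fingerprints[node.name] = hashlib.sha256(dump.encode()).hexdigest()[:12]
    return fingerprints
```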
Deterministic. Same code → same score. Always.
```
Overall = Lint (35%) + Security (30%) + Complexity (35%)

Penalty points by severity:
  CRITICAL: 50 | HIGH: 20 | MEDIUM: 5 | LOW: 1

Score = 100 - (total_penalty / LOC) * normalization_factor

Grades: A (90+) | B (80+) | C (70+) | D (60+) | F (<60)
```
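A hedged sketch of this rubric in Python, with a worked example. The penalty table and grade cutoffs come from the rubric above; the `normalization_factor` is not specified here, so the value below is an assumption:

```python
# Penalties and cutoffs from the rubric above; NORMALIZATION is an
# assumed value, not Arbiter's documented factor.
PENALTY = {"CRITICAL": 50, "HIGH": 20, "MEDIUM": 5, "LOW": 1}
NORMALIZATION = 100  # assumption

def score(findings: list[str], loc: int) -> float:
    """Score a file from its finding severities and line count."""
    total_penalty = sum(PENALTY[sev] for sev in findings)
    return max(0.0, 100.0 - (total_penalty / loc) * NORMALIZATION)

def grade(s: float) -> str:
    for cutoff, letter in ((90, "A"), (80, "B"), (70, "C"), (60, "D")):
        if s >= cutoff:
            return letter
    return "F"

# Example: 1 HIGH + 2 LOW = 22 penalty points in 1,000 LOC
# → 100 - (22 / 1000) * 100 = 97.8 → grade A
```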
Single HTML file with Chart.js. No build step, no React, no npm.
- Score Card — Big number + breakdown bars
- Agent Leaderboard — Who writes the best code? Color-coded by agent
- Per-Agent Quality Timeline — Score over time per agent (not just repo-wide)
- Commit Feed — Recent commits with agent, score, changes, timestamp
- Hotspot Files — Ranked by finding count
- Fleet View — Multi-repo quality grid with color-coded scores
- Tabbed UI — Overview, Commits, Fleet tabs
```
GET /api/score                  Current repo score
GET /api/agents                 Agent leaderboard
GET /api/agents/{name}/trend    Per-agent quality over time
GET /api/trend?days=30          Quality over time
GET /api/worst?limit=20         Worst files
GET /api/commits                Recent commits with scores
GET /api/commits/{hash}         Detail for one commit
GET /api/fleet                  Fleet report (multi-repo)
GET /api/health                 System health
```
```
arbiter analyze <repo>                      # Full analysis + per-commit scoring + persist
arbiter score <repo> [--json] [--exclude]   # Quick score (no persist)
arbiter diff <repo> [--base main] [--json]  # Score only changed files vs base branch
arbiter agents                              # Agent leaderboard
arbiter trend [--days 30]                   # Quality trend
arbiter worst [--limit 20]                  # Worst files
arbiter commits [--agent claude]            # Recent commits
arbiter audit-fleet <directory>             # Audit all repos in a directory
arbiter fleet-report                        # Fleet quality summary
arbiter triage                              # Auto-classify repos: green/yellow/red/archive
arbiter fix <repo> [--dry-run]              # Auto-fix ruff findings + before/after score
arbiter serve [--port 8080]                 # API + dashboard
```

Run the test suite:

```bash
pip install ".[test]"
PYTHONPATH=src python -m pytest tests/ -v
# 78 tests, <7 seconds
```

Requirements:

- Python 3.11+
- git (for historian)
- Optional: ruff, radon, vulture, bandit (for full analysis)
- Docker (for containerized deployment)
This repo is part of the HUMMBL cognitive AI architecture. Related repos:
| Repo | Purpose |
|---|---|
| hummbl-governance | Governance primitives that Arbiter scores repos against |
| base120 | Deterministic cognitive framework: 120 mental models across 6 transformations |
| mcp-server | Model Context Protocol server for Base120 integration |
| agentic-patterns | Stdlib-only safety patterns for agentic AI systems |
| governed-iac-reference | Reference architecture for governed infrastructure-as-code |
Learn more at hummbl.io.
MIT — see LICENSE.
Built by HUMMBL LLC from production experience coordinating Claude, Codex, Gemini, and human engineers on a 6,000+ test codebase.