
Arbiter

Agent-aware code quality system for multi-agent codebases.

In 2026, code is written by fleets of AI agents. Arbiter knows who wrote each line — human or AI — and scores quality accordingly.

What Makes Arbiter Different

Feature              | Traditional Tools | Arbiter
Agent attribution    | None              | First-class: tracks Claude, Codex, Gemini, Copilot, humans
Per-commit scoring   | Repo-wide only    | Scores each commit's changed files individually
Diff analysis        | N/A               | Scores only what changed in a PR/branch
Transparency         | Opaque score      | Every score decomposes into lint + security + complexity
Agent-specific gates | N/A               | Different quality thresholds per agent trust tier
Tool integration     | Proprietary       | Wraps tools you already trust: ruff, Bandit, radon, vulture
Dashboard            | SaaS login        | Single HTML file with per-agent timelines, commit feed, fleet view
Dependencies         | Heavy             | Analysis tools only; core is stdlib Python

Quick Start

git clone https://github.com/hummbl-dev/arbiter.git
cd arbiter

# Install (makes `arbiter` command available)
pip install ".[analyzers]"

# Quick score (no persistence)
arbiter score /path/to/your/repo

# Full analysis with per-commit agent attribution
arbiter analyze /path/to/your/repo

# Score only files changed since main
arbiter diff /path/to/your/repo --base main

# Agent leaderboard
arbiter agents

# Start dashboard
arbiter serve --port 8080
# Open http://localhost:8080

Without install (PYTHONPATH)

PYTHONPATH=src python -m arbiter score /path/to/your/repo

With Docker

docker build -t arbiter .
docker run -p 8080:8080 -v /path/to/repo:/repo:ro arbiter

Architecture

Git Repo ──→ [Git Historian] ──→ [Analyzer Runner] ──→ [Scoring Engine] ──→ [SQLite Store]
                  │                      │                     │                    │
           agent attribution      tool invocation        weighted rubric       trend data
           (Co-Authored-By,       (ruff, radon,          (lint 35%,             │
            email matching)        vulture, bandit)        security 30%,        ├──→ REST API
                                                           complexity 35%)     └──→ Dashboard
             ┌─────────────┐
             │Diff Analyzer│ ←── v0.2: scores only changed files per commit/branch
             └─────────────┘

Per-Commit Scoring (v0.2)

Every commit is scored against only the files it changed, not the entire repo. This makes the agent leaderboard meaningful — a commit that touches 1 clean file scores differently than one that touches 10 messy files.

Diff Mode (v0.2)

arbiter diff scores only files changed since a base branch. Ideal for CI/PR quality gates — fast, scoped, actionable.
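As an illustration, a CI gate could be built on top of this command. The sketch below is hypothetical: the `diff_score` and `passes` helpers are invented names, and the assumption that `--json` emits an object with a top-level `score` field is not confirmed by this README.

```python
import json
import subprocess

# Hypothetical CI gate: fail the build when the diff score falls
# below a threshold. Assumes `arbiter diff --json` prints a JSON
# object with a numeric "score" field (shape is an assumption).
def diff_score(repo: str, base: str = "main") -> dict:
    out = subprocess.run(
        ["arbiter", "diff", repo, "--base", base, "--json"],
        capture_output=True, text=True, check=True,
    ).stdout
    return json.loads(out)

def passes(result: dict, threshold: float = 70.0) -> bool:
    # Gate on the overall score for the changed files only
    return result["score"] >= threshold
```

In CI, exit non-zero when `passes(...)` is false to block the merge.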

Agent Attribution

Arbiter identifies which agent authored each commit:

  1. Co-Authored-By trailer — e.g. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
  2. Author email — maps noreply@anthropic.com → claude, codex@openai.com → codex
  3. Default — "human" if no agent pattern matches

Configure in agents.yml:

agents:
  - name: claude
    emails: [noreply@anthropic.com]
    co_author_patterns: ["Claude\\s+(Opus|Sonnet|Haiku)"]
    trust_tier: verified
    quality_threshold: 70.0
  - name: gemini
    trust_tier: probation
    quality_threshold: 80.0  # Higher bar for probationary agents
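The three-step resolution order can be sketched in a few lines of Python. This is an illustrative mock, not Arbiter's internals: the AGENTS structure mirrors the agents.yml fields above, and the `attribute` helper is a hypothetical name.

```python
import re

# Hypothetical sketch of the attribution order: trailer, then email,
# then "human". Mirrors the agents.yml fields shown above.
AGENTS = [
    {"name": "claude",
     "emails": ["noreply@anthropic.com"],
     "co_author_patterns": [r"Claude\s+(Opus|Sonnet|Haiku)"]},
    {"name": "codex",
     "emails": ["codex@openai.com"],
     "co_author_patterns": []},
]

def attribute(author_email: str, commit_message: str) -> str:
    # 1. Co-Authored-By trailer takes precedence
    for line in commit_message.splitlines():
        if line.strip().lower().startswith("co-authored-by:"):
            for agent in AGENTS:
                if any(re.search(p, line) for p in agent["co_author_patterns"]):
                    return agent["name"]
    # 2. Fall back to author email
    for agent in AGENTS:
        if author_email in agent["emails"]:
            return agent["name"]
    # 3. Default
    return "human"
```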

Analyzers (pluggable)

Analyzer    | Tool     | What It Finds
Lint        | ruff     | Style violations, import errors, bugbear patterns
Complexity  | radon    | Cyclomatic complexity (grade A-F per function)
Security    | bandit   | Hardcoded secrets, shell injection, dangerous patterns
Dead Code   | vulture  | Unused functions, imports, variables
Duplication | AST hash | Near-duplicate function bodies
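The "AST hash" technique behind the Duplication analyzer can be sketched as follows. This is a hypothetical illustration of the approach, not Arbiter's actual implementation: parse functions, strip identifier names, and hash the normalized body so renamed copies collide.

```python
import ast
import hashlib

class _StripNames(ast.NodeTransformer):
    """Replace every identifier with "_" so renamed copies hash alike."""
    def visit_Name(self, node: ast.Name) -> ast.AST:
        return ast.copy_location(ast.Name(id="_", ctx=node.ctx), node)

def body_hash(func: ast.FunctionDef) -> str:
    # Hash the function body only, ignoring its name and signature
    module = ast.Module(body=func.body, type_ignores=[])
    normalized = _StripNames().visit(module)
    return hashlib.sha256(ast.dump(normalized).encode()).hexdigest()

def find_duplicates(source: str) -> dict[str, list[str]]:
    # Group function names by the hash of their normalized bodies
    groups: dict[str, list[str]] = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef):
            groups.setdefault(body_hash(node), []).append(node.name)
    return {h: names for h, names in groups.items() if len(names) > 1}
```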

Scoring

Deterministic. Same code → same score. Always.

Overall = Lint (35%) + Security (30%) + Complexity (35%)

Penalty points by severity:
  CRITICAL: 50 | HIGH: 20 | MEDIUM: 5 | LOW: 1

Score = 100 - (total_penalty / LOC) * normalization_factor

Grades: A (90+) | B (80+) | C (70+) | D (60+) | F (<60)
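A sketch of the formula above. The severity weights and grade cutoffs are taken from this README; the NORMALIZATION constant is an assumed placeholder, since the actual normalization_factor is not documented here.

```python
# Penalty weights and grade cutoffs from the README; NORMALIZATION
# is an assumption, not Arbiter's real factor.
PENALTY = {"CRITICAL": 50, "HIGH": 20, "MEDIUM": 5, "LOW": 1}
NORMALIZATION = 100.0  # assumed: scales penalty-per-LOC into points

def score(findings: list[str], loc: int) -> float:
    # Sum severity penalties, normalize by lines of code, clamp at 0
    total = sum(PENALTY[sev] for sev in findings)
    return max(0.0, 100.0 - (total / max(loc, 1)) * NORMALIZATION)

def grade(s: float) -> str:
    for cutoff, letter in [(90, "A"), (80, "B"), (70, "C"), (60, "D")]:
        if s >= cutoff:
            return letter
    return "F"
```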

Dashboard (v2)

Single HTML file with Chart.js. No build step, no React, no npm.

  • Score Card — Big number + breakdown bars
  • Agent Leaderboard — Who writes the best code? Color-coded by agent
  • Per-Agent Quality Timeline — Score over time per agent (not just repo-wide)
  • Commit Feed — Recent commits with agent, score, changes, timestamp
  • Hotspot Files — Ranked by finding count
  • Fleet View — Multi-repo quality grid with color-coded scores
  • Tabbed UI — Overview, Commits, Fleet tabs

API

GET /api/score                  Current repo score
GET /api/agents                 Agent leaderboard
GET /api/agents/{name}/trend    Per-agent quality over time
GET /api/trend?days=30          Quality over time
GET /api/worst?limit=20         Worst files
GET /api/commits                Recent commits with scores
GET /api/commits/{hash}         Detail for one commit
GET /api/fleet                  Fleet report (multi-repo)
GET /api/health                 System health
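A minimal stdlib client sketch for these endpoints. The paths come from the list above; the `endpoint` and `fetch` helper names are hypothetical, and `fetch` requires `arbiter serve` to be running.

```python
import json
import urllib.parse
import urllib.request

def endpoint(base: str, path: str, **params) -> str:
    # Build a full URL like http://localhost:8080/api/trend?days=30
    url = base.rstrip("/") + path
    if params:
        url += "?" + urllib.parse.urlencode(params)
    return url

def fetch(url: str) -> dict:
    # Requires a running `arbiter serve` instance
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)

# e.g. fetch(endpoint("http://localhost:8080", "/api/trend", days=30))
```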

CLI Commands

arbiter analyze <repo>                     # Full analysis + per-commit scoring + persist
arbiter score <repo> [--json] [--exclude]  # Quick score (no persist)
arbiter diff <repo> [--base main] [--json] # Score only changed files vs base branch
arbiter agents                             # Agent leaderboard
arbiter trend [--days 30]                  # Quality trend
arbiter worst [--limit 20]                 # Worst files
arbiter commits [--agent claude]           # Recent commits
arbiter audit-fleet <directory>            # Audit all repos in a directory
arbiter fleet-report                       # Fleet quality summary
arbiter triage                             # Auto-classify repos: green/yellow/red/archive
arbiter fix <repo> [--dry-run]             # Auto-fix ruff findings + before/after score
arbiter serve [--port 8080]                # API + dashboard

Tests

pip install ".[test]"
PYTHONPATH=src python -m pytest tests/ -v
# 78 tests, <7 seconds

Requirements

  • Python 3.11+
  • git (for historian)
  • Optional: ruff, radon, vulture, bandit (for full analysis)
  • Docker (for containerized deployment)

HUMMBL Ecosystem

This repo is part of the HUMMBL cognitive AI architecture. Related repos:

Repo                   | Purpose
hummbl-governance      | Governance primitives that Arbiter scores repos against
base120                | Deterministic cognitive framework: 120 mental models across 6 transformations
mcp-server             | Model Context Protocol server for Base120 integration
agentic-patterns       | Stdlib-only safety patterns for agentic AI systems
governed-iac-reference | Reference architecture for governed infrastructure-as-code

Learn more at hummbl.io.

License

MIT — see LICENSE.


Built by HUMMBL LLC from production experience coordinating Claude, Codex, Gemini, and human engineers on a 6,000+ test codebase.
