Fetches a bug from Azure DevOps, runs multi-perspective AI root cause analysis, evaluates and verifies each result, then uses semantic consensus to select the best answer. Supports cross-model voting across OpenAI, Anthropic, and Google models via the GitHub Copilot API.
- Multi-Agent RCA — 3 perspective agents (Flow, Signal, Data), each using a different model by default (GPT-4o, Claude Sonnet, Gemini) for maximum diversity. Override with `--model` to use a single model for all agents.
- Multi-Run Mode — Run the same model N times with varied temperatures to reduce randomness.
- Cross-Model Vote — Run different models (e.g. GPT-4o, Claude Sonnet, Gemini) and let them compete; a consensus LLM picks the winner.
- 5-Dimension Scoring — Each RCA is scored on Specificity, Causality, Evidence, Uncertainty Awareness, and Signal Usage (max 14 points).
- Verification — Each RCA is checked against extracted context (stack traces, files, repro steps) and signals; fabrications are flagged.
- Semantic Consensus — An LLM groups candidates by root cause theme, then selects the best group using a weighted formula (group size × 2 + avg score + verified count × 2).
- Bug Type Classification — Automatically classifies bugs as `runtime_error`, `config_data`, or `feature_request` and adapts evaluation thresholds accordingly.
- Table Signal Extraction — Detects blank cells, posting groups, GL accounts, and field headers from embedded HTML tables and Excel attachments.
- Excel Attachment Parsing — Downloads and parses `.xlsx` attachments from Azure DevOps using openpyxl.
- Memory Store — Verified, consensus-backed RCAs are stored to disk for future similarity lookups.
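For illustration, the weighted formula from the Semantic Consensus bullet reduces to a one-liner (the helper name is hypothetical; the real implementation lives in `consensus.py`):

```python
def consensus_score(group_size: int, avg_score: float, verified_count: int) -> float:
    # Weighted formula from the feature list: larger agreeing groups and
    # more verified members outweigh a slightly higher average score.
    return group_size * 2 + avg_score + verified_count * 2
```

For example, a group of 2 candidates with an average score of 7.5 and both members verified scores 2*2 + 7.5 + 2*2 = 15.5.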
```
Fetch bug from ADO → Preprocess & extract signals → Classify bug type
→ Extract verifiable context → Retrieve similar past cases from memory
→ Multi-agent / Multi-run / Cross-model RCA
→ Evaluate each (5-dimension scoring)
→ Verify each against context + signals
→ LLM semantic consensus with weighted scoring
→ Store verified result to memory
```
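The stages above can be sketched as a single orchestration function. This is a hypothetical outline, not the actual module API; the real steps live in `fetch_bug.py`, `preprocess.py`, and friends, so each stage is passed in as a callable to keep the sketch self-contained:

```python
def run_pipeline(work_item_id, *, fetch, extract_signals, classify,
                 extract_context, recall, analyze, evaluate, verify,
                 pick_winner, store):
    bug = fetch(work_item_id)                        # Fetch bug from ADO
    signals = extract_signals(bug)                   # Preprocess & extract signals
    bug_type = classify(bug, signals)                # Classify bug type
    context = extract_context(bug)                   # Extract verifiable context
    past_cases = recall(bug)                         # Similar past cases from memory
    candidates = analyze(bug, signals, context, bug_type, past_cases)
    for c in candidates:
        c["score"] = evaluate(c)                     # 5-dimension scoring
        c["verified"] = verify(c, context, signals)  # Check against context + signals
    winner = pick_winner(candidates)                 # Weighted semantic consensus
    if winner["verified"]:
        store(winner)                                # Persist to RCA memory
    return winner
```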
- Copy `.env.example` to `.env` and fill in your values: `cp .env.example .env`
- Install dependencies: `pip install -r requirements.txt`
- Configure `.env`:
| Variable | Description |
|---|---|
| `ADO_TOKEN` | Azure DevOps Personal Access Token (needs Work Items read scope) |
| `ADO_ORG` | Azure DevOps organization name |
| `ADO_PROJECT` | Azure DevOps project name |
| `GITHUB_TOKEN` | GitHub PAT (for calling AI models via the GitHub Copilot API) |
| `API_ENDPOINT` | API endpoint (default: `https://api.githubcopilot.com`) |
| `MODEL` | Default model (e.g. `gpt-4o`, `claude-sonnet-4`, `gemini-2.5-pro`) |
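A filled-in `.env` might look like this (all values below are placeholders):

```
ADO_TOKEN=xxxxxxxxxxxxxxxxxxxx
ADO_ORG=contoso
ADO_PROJECT=ERP-Platform
GITHUB_TOKEN=ghp_xxxxxxxxxxxxxxxxxxxx
API_ENDPOINT=https://api.githubcopilot.com
MODEL=gpt-4o
```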
```bash
# Multi-agent mode (default — 3 agents × 3 different models)
python main.py 596528

# Force all agents to use the same model
python main.py 596528 --model claude-sonnet-4

# Multi-run mode (5 identical runs, same model)
python main.py 596528 --runs 5

# Cross-model vote (run multiple models and let them compete)
python main.py 596528 --models gpt-4o claude-sonnet-4 gpt-4.1

# List all available models
python main.py --list-models
```

| Flag | Description |
|---|---|
| `work_item_id` | Azure DevOps work item ID to analyze |
| `--model`, `-m` | Force a single model for all agents (default: diverse rotation) |
| `--runs`, `-r` | Number of multi-run passes (0 = multi-agent mode) |
| `--models` | Space-separated list of models for cross-model voting |
| `--list-models` | List all available models and exit |
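The flag set above maps naturally onto `argparse`. A sketch, assuming the behaviors described in the table (not the actual `main.py`):

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        description="Multi-perspective RCA for Azure DevOps bugs")
    parser.add_argument("work_item_id", nargs="?", type=int,
                        help="Azure DevOps work item ID to analyze")
    parser.add_argument("--model", "-m",
                        help="Force a single model for all agents")
    parser.add_argument("--runs", "-r", type=int, default=0,
                        help="Number of multi-run passes (0 = multi-agent mode)")
    parser.add_argument("--models", nargs="+",
                        help="Space-separated list of models for cross-model voting")
    parser.add_argument("--list-models", action="store_true",
                        help="List all available models and exit")
    return parser

# Example: cross-model vote between two models
args = build_parser().parse_args(["596528", "--models", "gpt-4o", "claude-sonnet-4"])
```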
The GitHub Copilot endpoint supports the following models:
| Provider | Models |
|---|---|
| OpenAI | gpt-4o, gpt-4o-mini, gpt-4.1, gpt-4.1-mini, gpt-4.1-nano, o3-mini, o4-mini |
| Anthropic | claude-sonnet-4, claude-opus-4, claude-3.5-sonnet |
| Google | gemini-2.5-pro |
Reasoning models (o3-mini, o4-mini) are automatically handled — temperature and system role are disabled.
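One common way to implement that special-casing when building OpenAI-compatible chat requests (a sketch under assumed request shapes; the tool's actual logic in `rca_agent.py`/`config.py` may differ):

```python
REASONING_MODELS = {"o3-mini", "o4-mini"}

def build_request(model: str, system_prompt: str, user_prompt: str,
                  temperature: float = 0.7) -> dict:
    # Reasoning models reject `temperature` and the `system` role, so fold
    # the system prompt into the user message and omit temperature.
    if model in REASONING_MODELS:
        return {
            "model": model,
            "messages": [
                {"role": "user", "content": f"{system_prompt}\n\n{user_prompt}"},
            ],
        }
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
        "temperature": temperature,
    }
```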
The tool outputs a ranked comparison table, consensus analysis, and the winning RCA:
```
============================================================
RCA Candidates — Comparison
============================================================
Rank  Agent/Model              Score  S C E U Sig  Verified  Status
───── ──────────────────────   ─────  ─ ─ ─ ─ ───  ────────  ──────────
1     Flow (gpt-4o)            11/14  3 3 2 1 2    YES       ★ WINNER
2     Signal (claude-sonnet-4) 10/14  3 2 2 1 2    YES
3     Data (gemini-2.5-pro)     9/14  2 2 2 1 2    no

============================================================
Consensus & Winner Selection
============================================================
Winner Selection:
  Consensus size: 2 (strong)
  Consensus score: 15.5 (= size×2 + avg_score + verified×2)
  Reasoning: gpt-4o and claude-sonnet-4 agree on the root cause...

============================================================
Final RCA — Winner: gpt-4o
Score: 11/14 | Verified: YES | Consensus: 2/3
============================================================
Root Cause:
...

Memory: STORED (passed triple filter: consensus + score + verified)
```
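The "triple filter" mentioned in the output can be expressed as a simple gate. The threshold values here are illustrative assumptions, not the tool's actual defaults:

```python
def should_store(consensus_size: int, score: int, verified: bool,
                 min_consensus: int = 2, min_score: int = 10) -> bool:
    # All three conditions must hold before an RCA is written to memory:
    # consensus backing, a sufficient quality score, and verification.
    return consensus_size >= min_consensus and score >= min_score and verified
```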
```
bug-agent-demo/
├── main.py               # Entry point & CLI
├── fetch_bug.py          # Azure DevOps REST API + Excel attachment parsing
├── preprocess.py         # HTML cleaning, signal extraction, bug classification
├── signal_extractor.py   # Keyword/pattern signals + table signal extraction
├── context_loader.py     # Extract verifiable context (stack traces, files, lines)
├── rca_agent.py          # Single-run & multi-run RCA via OpenAI-compatible API
├── multi_agent.py        # Multi-agent RCA (3 perspective agents)
├── rca_evaluator.py      # 5-dimension RCA quality scoring
├── rca_verifier.py       # Verification against context + signals
├── consensus.py          # LLM-based semantic consensus with weighted scoring
├── memory_store.py       # RCA memory storage & similarity lookup
├── config.py             # Configuration, model registry, validation
├── requirements.txt      # Python dependencies
├── .env.example          # Template for secrets
├── prompts/
│   ├── rca_prompt.txt    # RCA system prompt
│   ├── agent_data.txt    # Data-focused agent prompt
│   ├── agent_flow.txt    # Flow-focused agent prompt
│   └── agent_signal.txt  # Signal-focused agent prompt
└── data/
    ├── bug_*.json        # Fetched bugs (created at runtime)
    └── rca_memory/       # Stored RCA results
```