Skip to content

Jiawen-CS/bug-rca-agent

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

2 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Bug RCA Agent — Multi-Agent Trustworthy RCA Engine

Fetches a bug from Azure DevOps, runs multi-perspective AI root cause analysis, evaluates and verifies each result, then uses semantic consensus to select the best answer. Supports cross-model voting across OpenAI, Anthropic, and Google models via the GitHub Copilot API.

Features

  • Multi-Agent RCA — 3 perspective agents (Flow, Signal, Data), each using a different model by default (GPT-4o, Claude Sonnet, Gemini) for maximum diversity. Override with --model to use a single model for all agents.
  • Multi-Run Mode — Run the same model N times with varied temperatures to reduce randomness.
  • Cross-Model Vote — Run different models (e.g. GPT-4o, Claude Sonnet, Gemini) and let them compete; a consensus LLM picks the winner.
  • 5-Dimension Scoring — Each RCA is scored on Specificity, Causality, Evidence, Uncertainty Awareness, and Signal Usage (max 14 points).
  • Verification — Each RCA is checked against extracted context (stack traces, files, repro steps) and signals; fabrications are flagged.
  • Semantic Consensus — An LLM groups candidates by root cause theme, then selects the best group using a weighted formula (group size × 2 + avg score + verified count × 2).
  • Bug Type Classification — Automatically classifies bugs as runtime_error, config_data, or feature_request and adapts evaluation thresholds accordingly.
  • Table Signal Extraction — Detects blank cells, posting groups, GL accounts, and field headers from embedded HTML tables and Excel attachments.
  • Excel Attachment Parsing — Downloads and parses .xlsx attachments from Azure DevOps using openpyxl.
  • Memory Store — Verified, consensus-backed RCAs are stored to disk for future similarity lookups.

Pipeline

Fetch bug from ADO → Preprocess & extract signals → Classify bug type
  → Extract verifiable context → Retrieve similar past cases from memory
  → Multi-agent / Multi-run / Cross-model RCA
  → Evaluate each (5-dimension scoring)
  → Verify each against context + signals
  → LLM semantic consensus with weighted scoring
  → Store verified result to memory

Setup

  1. Copy .env.example to .env and fill in your values:
cp .env.example .env
  1. Install dependencies:
pip install -r requirements.txt
  1. Configure .env:
Variable Description
ADO_TOKEN Azure DevOps Personal Access Token (needs Work Items read scope)
ADO_ORG Azure DevOps organization name
ADO_PROJECT Azure DevOps project name
GITHUB_TOKEN GitHub PAT (for calling AI models via GitHub Copilot API)
API_ENDPOINT API endpoint (default: https://api.githubcopilot.com)
MODEL Default model (e.g. gpt-4o, claude-sonnet-4, gemini-2.5-pro)

Usage

# Multi-agent mode (default — 3 agents × 3 different models)
python main.py 596528

# Force all agents to use the same model
python main.py 596528 --model claude-sonnet-4

# Multi-run mode (5 identical runs, same model)
python main.py 596528 --runs 5

# Cross-model vote (run multiple models and let them compete)
python main.py 596528 --models gpt-4o claude-sonnet-4 gpt-4.1

# List all available models
python main.py --list-models

CLI Options

Flag Description
work_item_id Azure DevOps work item ID to analyze
--model, -m Force a single model for all agents (default: diverse rotation)
--runs, -r Number of multi-run passes (0 = multi-agent mode)
--models Space-separated list of models for cross-model voting
--list-models List all available models and exit

Supported Models

The GitHub Copilot endpoint supports the following models:

Provider Models
OpenAI gpt-4o, gpt-4o-mini, gpt-4.1, gpt-4.1-mini, gpt-4.1-nano, o3-mini, o4-mini
Anthropic claude-sonnet-4, claude-opus-4, claude-3.5-sonnet
Google gemini-2.5-pro

Reasoning models (o3-mini, o4-mini) are automatically handled — temperature and system role are disabled.

Output

The tool outputs a ranked comparison table, consensus analysis, and the winning RCA:

============================================================
  RCA Candidates — Comparison
============================================================

  Rank  Agent/Model            Score  S C E U Sig  Verified  Status
  ───── ────────────────────── ───── ─ ─ ─ ─ ─── ──────── ──────────
  1     Flow (gpt-4o)          11/14  3 3 2 1   2       YES  ★ WINNER
  2     Signal (claude-sonnet-4) 10/14  3 2 2 1   2       YES
  3     Data (gemini-2.5-pro)   9/14  2 2 2 1   2        no

============================================================
  Consensus & Winner Selection
============================================================

  Winner Selection:
    Consensus size:  2 (strong)
    Consensus score: 15.5 (= size×2 + avg_score + verified×2)
    Reasoning: gpt-4o and claude-sonnet-4 agree on the root cause...

============================================================
  Final RCA — Winner: gpt-4o
  Score: 11/14 | Verified: YES | Consensus: 2/3
============================================================

  Root Cause:
    ...

  Memory: STORED (passed triple filter: consensus + score + verified)

Project Structure

bug-agent-demo/
├── main.py              # Entry point & CLI
├── fetch_bug.py         # Azure DevOps REST API + Excel attachment parsing
├── preprocess.py        # HTML cleaning, signal extraction, bug classification
├── signal_extractor.py  # Keyword/pattern signals + table signal extraction
├── context_loader.py    # Extract verifiable context (stack traces, files, lines)
├── rca_agent.py         # Single-run & multi-run RCA via OpenAI-compatible API
├── multi_agent.py       # Multi-agent RCA (3 perspective agents)
├── rca_evaluator.py     # 5-dimension RCA quality scoring
├── rca_verifier.py      # Verification against context + signals
├── consensus.py         # LLM-based semantic consensus with weighted scoring
├── memory_store.py      # RCA memory storage & similarity lookup
├── config.py            # Configuration, model registry, validation
├── requirements.txt     # Python dependencies
├── .env.example         # Template for secrets
├── prompts/
│   ├── rca_prompt.txt   # RCA system prompt
│   ├── agent_data.txt   # Data-focused agent prompt
│   ├── agent_flow.txt   # Flow-focused agent prompt
│   └── agent_signal.txt # Signal-focused agent prompt
└── data/
    ├── bug_*.json       # Fetched bugs (created at runtime)
    └── rca_memory/      # Stored RCA results

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages