
Causal Decision Intelligence Engine — 22-cell pipeline with Multi-Agent Debate, REST API, and Interactive Dashboard


WhyLab: Causal Decision Intelligence Engine


"Don't just predict the future. Cause it."

WhyLab Dashboard
▲ Interactive dashboard — ROI simulator, AI debate, CATE explorer, causal graph

WhyLab is a Decision Intelligence Engine powered by Multi-Agent Debate and a Causal Audit Framework for autonomous agent systems.

  • v1.0: Causal inference pipeline (22-Cell), MCP server, interactive dashboard
  • v2.0: Causal audit engine for agent self-improvement (drift detection, sensitivity analysis, Lyapunov stability)

🎯 Why WhyLab?

  • For POs: "Rollout or Not?" — Get actionable verdicts (e.g., "ROI +12%, Risk Low → Rollout").
  • For Data Scientists: SOTA accuracy (T-Learner PEHE 1.164 on IHDP) + IV/DiD/RDD/Granger out-of-the-box.
  • For Devs: 3 lines of code to integrate causal AI into your pipeline.
```python
import whylab

result = whylab.analyze(data, treatment='coupon', outcome='purchase')
print(result.verdict)   # "CAUSAL"
result.summary()        # ATE, CI, Meta-learners, Sensitivity, Debate verdict
```

What Makes WhyLab Different?

| Feature | DoWhy | EconML | CausalML | WhyLab |
|---|---|---|---|---|
| Causal Graph Modeling | O | - | - | O |
| Meta-Learners (S/T/X/DR/R) | - | O | O | O |
| Double Machine Learning | - | O | - | O |
| Refutation Tests | O | - | - | O |
| IV / DiD / RDD | - | - | - | O |
| Granger / CausalImpact | - | - | - | O |
| Structural Counterfactual | - | - | - | O |
| AI Agent Auto-Debate | - | - | - | O |
| Auto Verdict (CAUSAL/NOT) | - | - | - | O |
| Auto Discovery (PC + LLM) | - | - | - | O |
| Native MCP v2 Server | - | - | - | O |
| Policy Simulator (What-If) | - | - | - | O |
| Interactive Dashboard | - | - | - | O |
| REST API Server | - | - | - | O |

Architecture: 22-Cell Pipeline + MCP Integration

```
                        ┌──────────────────────────────────┐
                        │     External AI Agents           │
                        │  (Claude Desktop, GPT, etc.)     │
                        └──────────┬───────────────────────┘
                                   │ MCP Protocol
                        ┌──────────▼───────────────────────┐
                        │   WhyLab MCP Server (v2)         │
                        │   7 Tools + 3 Resources          │
                        └──────────┬───────────────────────┘
                                   │
  ┌────────────────────────────────▼──────────────────────────────┐
  │                    22-Cell Causal Engine                      │
  │                                                               │
  │  Data → Discovery → AutoCausal → Causal → MetaLearner →       │
  │    Conformal → Explain → Refutation → Sensitivity →           │
  │      QuasiExp → Temporal → Counterfactual → DoseResponse →    │
  │        DeepCATE → Fairness → Benchmark →                      │
  │          Viz → Debate → Export → Report                       │
  └──────────────────────────┬────────────────────────────────────┘
                             │
              ┌──────────────▼──────────────┐
              │   Next.js Dashboard         │
              │   Policy Simulator (What-If)│
              │   CATE Explorer             │
              │   AI Debate Verdict         │
              └─────────────────────────────┘
```
📋 Full 22-Cell Reference
| # | Cell | Role |
|---|---|---|
| 1 | DataCell | SCM-based synthetic data + external CSV/SQL/BigQuery |
| 2 | DiscoveryCell | Auto causal graph discovery (PC + LLM hybrid) |
| 3 | AutoCausalCell | Data profiling → methodology auto-recommendation |
| 4 | CausalCell | DML estimation (Linear/Forest/Sparse) |
| 5 | MetaLearnerCell | 5 meta-learners (S/T/X/DR/R) + Oracle ensemble |
| 6 | ConformalCell | Distribution-free confidence intervals |
| 7 | ExplainCell | SHAP-based feature importance & explanations |
| 8 | RefutationCell | Placebo, Bootstrap, Random Cause tests |
| 9 | SensitivityCell | E-value, Overlap, GATES analysis |
| 10 | QuasiExperimentalCell | IV (2SLS), DiD (parallel trend), Sharp RDD |
| 11 | TemporalCausalCell | Granger causality, CausalImpact, lag correlation |
| 12 | CounterfactualCell | Structural counterfactuals, Manski bounds, ITE ranking |
| 13 | DoseResponseCell | Continuous treatment dose-response estimation |
| 14 | DeepCateCell | Deep learning-based CATE estimation |
| 15 | FairnessAuditCell | Algorithmic fairness across protected groups |
| 16 | BenchmarkCell | Automated IHDP/ACIC/Jobs evaluation |
| 17 | VizCell | Publication-ready figures |
| 18 | DebateCell | 3-agent LLM debate (Growth Hacker / Risk Manager / PO) |
| 19 | ExportCell | JSON serialization + LLM debate results |
| 20 | ReportCell | Automated analysis reports |
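To make the MetaLearnerCell row concrete, here is a minimal sketch of the T-Learner idea (the estimator behind WhyLab's best IHDP score): fit one outcome model per treatment arm, then take the difference of their predictions as the per-unit effect. This uses a plain least-squares model on synthetic data, not WhyLab's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 5))
T = rng.integers(0, 2, size=n)
# True effect is heterogeneous: 1 + X[:, 1], so the ATE is 1.0.
Y = X[:, 0] + T * (1.0 + X[:, 1]) + rng.normal(scale=0.1, size=n)

def fit(Xa, ya):
    """Ordinary least squares with an intercept column."""
    A = np.c_[np.ones(len(Xa)), Xa]
    beta, *_ = np.linalg.lstsq(A, ya, rcond=None)
    return beta

# T-Learner: separate models for control (T=0) and treated (T=1).
b0 = fit(X[T == 0], Y[T == 0])
b1 = fit(X[T == 1], Y[T == 1])
A = np.c_[np.ones(n), X]
cate = A @ b1 - A @ b0    # per-unit effect, ≈ 1 + X[:, 1]
print(round(cate.mean(), 2))   # ATE estimate, close to 1.0
```

Averaging `cate` recovers the ATE; the spread of `cate` across units is what the CATE Explorer in the dashboard visualizes.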

Multi-Agent Debate System

Three AI agents simulate real organizational decision-making:

  1. Growth Hacker (10 evidence types): Finds revenue opportunities from causal signals
  2. Risk Manager (8 attack vectors): Warns about potential losses and model vulnerabilities
  3. Product Owner (Judge): Synthesizes Growth vs Risk → delivers actionable verdict
    • 🚀 Rollout 100% | ⚖️ A/B Test 5% | 🛑 Reject

Supports LLM-enhanced debate (Gemini API) with automatic rule-based fallback.
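The synthesis step can be pictured with a toy rule-based version. The function name, thresholds, and inputs below are hypothetical; the real DebateCell weighs 10 evidence types against 8 attack vectors and can delegate the arguments to an LLM.

```python
def debate_verdict(ate: float, ci_low: float, ci_high: float,
                   e_value: float) -> str:
    """Toy 3-agent synthesis: the Growth Hacker argues upside, the
    Risk Manager argues robustness, the PO-judge combines both."""
    growth_case = ate > 0 and ci_low > 0       # positive and significant effect
    risk_case = e_value < 1.5 or ci_low <= 0   # weak robustness to confounding
    if growth_case and not risk_case:
        return "Rollout 100%"
    if growth_case and risk_case:
        return "A/B Test 5%"
    return "Reject"

print(debate_verdict(ate=0.12, ci_low=0.05, ci_high=0.19, e_value=2.1))
# → Rollout 100%
```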

Native MCP v2 Server — Agent Interop Standard

WhyLab ships with a built-in Model Context Protocol (MCP) server, enabling seamless integration with any MCP-compatible AI agent (Claude Desktop, Cursor, etc.).

7 Tools:

| Tool | Description |
|---|---|
| run_analysis | Execute full causal pipeline (Scenario A or B) |
| get_debate_verdict | Get AI debate result (CAUSAL / NOT_CAUSAL / UNCERTAIN) |
| simulate_intervention | What-If policy simulation (intensity × target ratio → ROI) |
| ask_rag | Natural language Q&A with persona (Growth / Risk / PO) |
| compare_scenarios | Side-by-side scenario comparison |
| run_drift_check | Causal drift detection (ATE/CATE shift monitoring) |
| get_monitoring_status | Current monitoring system health |

3 Resources: whylab://data/latest · whylab://report/latest · whylab://benchmark/summary

```shell
# Quick start: connect Claude Desktop to WhyLab
python -m engine.server.mcp_server
```

```json
// claude_desktop_config.json
{
  "mcpServers": {
    "whylab": {
      "command": "python",
      "args": ["-m", "engine.server.mcp_server"]
    }
  }
}
```

Benchmark Results

Evaluated on 3 standard causal inference benchmarks (10 replications each):

IHDP (Hill 2011, n=747, p=25)

| Method | √PEHE | ATE Bias |
|---|---|---|
| T-Learner | 1.164 ± 0.024 | 0.039 ± 0.031 |
| DR-Learner | 1.194 ± 0.034 | 0.038 ± 0.029 |
| Ensemble | 1.214 ± 0.025 | 0.046 ± 0.034 |
| X-Learner | 1.324 ± 0.029 | 0.035 ± 0.024 |
| S-Learner | 1.383 ± 0.033 | 0.064 ± 0.040 |
| LinearDML | 1.465 ± 0.024 | 0.066 ± 0.061 |

Ref: BART ~1.0 (Hill 2011), GANITE ~1.9 (Yoon 2018), CEVAE ~2.7 (Louizos 2017)

ACIC (Dorie 2019, n=4802, p=58)

| Method | √PEHE | ATE Bias |
|---|---|---|
| S-Learner | 0.491 ± 0.017 | 0.018 ± 0.013 |
| X-Learner | 0.569 ± 0.009 | 0.020 ± 0.011 |
| Ensemble | 0.612 ± 0.013 | 0.013 ± 0.007 |
| LinearDML | 0.614 ± 0.010 | 0.071 ± 0.025 |
| DR-Learner | 0.799 ± 0.017 | 0.040 ± 0.018 |
| T-Learner | 0.835 ± 0.013 | 0.041 ± 0.018 |

Jobs (LaLonde 1986, n=722, p=8)

| Method | √PEHE | ATE Bias |
|---|---|---|
| LinearDML | 170.5 ± 32.3 | 39.2 ± 36.6 |
| S-Learner | 288.4 ± 11.3 | 79.2 ± 36.8 |
| X-Learner | 377.2 ± 22.4 | 38.6 ± 16.3 |
| Ensemble | 381.8 ± 18.4 | 39.8 ± 33.8 |
| T-Learner | 482.7 ± 23.2 | 35.2 ± 21.7 |
| DR-Learner | 535.0 ± 29.3 | 34.9 ± 25.2 |
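For reference, the two metrics reported above can be computed as follows — a generic numpy sketch, assuming access to true and predicted individual treatment effects, which semi-synthetic benchmarks like IHDP provide:

```python
import numpy as np

def sqrt_pehe(tau_true: np.ndarray, tau_pred: np.ndarray) -> float:
    """Root of the Precision in Estimation of Heterogeneous Effects:
    RMSE between true and predicted individual treatment effects."""
    return float(np.sqrt(np.mean((tau_true - tau_pred) ** 2)))

def ate_bias(tau_true: np.ndarray, tau_pred: np.ndarray) -> float:
    """Absolute error of the average treatment effect."""
    return float(abs(tau_pred.mean() - tau_true.mean()))

tau_true = np.array([1.0, 2.0, 3.0])
tau_pred = np.array([1.5, 2.0, 2.5])
print(round(sqrt_pehe(tau_true, tau_pred), 3))  # 0.408
print(ate_bias(tau_true, tau_pred))             # 0.0
```

Note that a method can have zero ATE bias while still missing the heterogeneity entirely, which is why both columns are reported.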

Quick Start

Prerequisites

  • Python 3.9+
  • Node.js 18+ (Dashboard)

Installation

```shell
# Clone
git clone https://github.com/Yesol-Pilot/WhyLab.git
cd WhyLab

# Python
pip install -e ".[all]"

# Dashboard
cd dashboard; npm install
```

Usage

1. Python SDK (3 Lines)

```python
import whylab

result = whylab.analyze("data.csv", treatment="T", outcome="Y")
result.summary()
```

2. CLI — Causal Pipeline

```shell
python -m engine.pipeline --scenario A   # Credit limit -> Default
python -m engine.pipeline --scenario B   # Marketing coupon -> Signup
```

3. REST API Server

```shell
# Start
uvicorn whylab.server:app --reload --port 8000

# Analyze
curl -X POST http://localhost:8000/api/v1/analyze \
  -H "Content-Type: application/json" \
  -d '{"treatment": "T", "outcome": "Y", "data_path": "data.csv"}'

# Available methods
curl http://localhost:8000/api/v1/methods
```

4. Connect Your Data (CSV / SQL / BigQuery)

```shell
# CSV
python -m engine.cli --data "sales.csv" --treatment coupon --outcome purchase

# PostgreSQL
python -m engine.cli --data "postgresql://user:pass@host/db" \
  --db-query "SELECT * FROM users" --treatment coupon --outcome purchase

# BigQuery
python -m engine.cli --data "my-gcp-project" --source-type bigquery \
  --db-query "SELECT * FROM dataset.table" --treatment treatment --outcome outcome
```

5. Ask Questions (RAG Agent)

```shell
python -m engine.cli --query "Does the coupon have an effect?" --persona growth_hacker
python -m engine.cli --query "Are there any risks?" --persona risk_manager
```

6. Run Benchmarks

```shell
python -m engine.pipeline --benchmark ihdp acic jobs \
  --replications 10 --output results/ --latex
```

7. Launch Dashboard

```shell
cd dashboard; npm run dev
# Open http://localhost:4000
```

8. Docker (GPU)

```shell
docker compose up whylab       # Default pipeline
docker compose up benchmark    # Benchmark mode
docker compose up pipeline     # Full pipeline + Debate
```

Project Structure

```
WhyLab/
  engine/
    cells/            # 22 modular analysis cells
    agents/           # 11 AI agents (debate, discovery, architect, etc.)
    connectors/       # Multi-source data (CSV/SQL/BigQuery)
    monitoring/       # Causal drift detection & alerting
    data/             # Benchmark data loaders (IHDP/ACIC/Jobs)
    rag/              # RAG-based Q&A agent (multi-turn, persona)
    server/           # MCP Protocol server (7 tools, 3 resources)
    audit.py          # Governance: analysis audit trail (JSONL)
    config.py         # Central configuration (no magic numbers)
    orchestrator.py   # 22-cell pipeline orchestrator
    cli.py            # CLI entry point
    audit/            # v2.0 Causal Audit Engine
      drift_monitor.py  # Information-theoretic drift detection (R1)
      sensitivity.py    # E-value + Partial R² (R2)
      lyapunov.py       # ζ stability controller (R5)
      outbox.py         # Transactional outbox (C3)
      llm_judge/        # ARES evaluator framework (R3)
      methods/          # DML, CausalImpact, GSC
    telemetry/          # OTel dynamic sampling (C4)
    deploy/             # Shadow deployment controller (P4)
  whylab/
    api.py            # 3-line SDK (analyze → CausalResult)
    server.py         # SDK REST API server (port 8000)
  api/
    main.py           # Dashboard Backend API (port 4001)
  dashboard/          # Next.js interactive dashboard
    components/       #   PolicySimulator, WhatIfSimulator, DebateVerdict, ...
  tests/              # 142 tests (v1 pipeline + v2 audit engine)
  results/            # Benchmark output (JSON + LaTeX)
  .github/workflows/  # CI (80% gate) + Deploy + PyPI Release (OIDC)
```

v2.0 — Causal Audit Engine

An audit framework that prevents autonomous agents from diverging due to hallucination feedback loops.
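The core idea — catch an agent whose estimated effects are sliding before the feedback loop amplifies the error — can be pictured with a minimal rolling-window check on the ATE series. This is an illustrative sketch with hypothetical names and thresholds; the actual drift monitor uses information-theoretic, entropy-inverse weighting.

```python
import statistics

def ate_drift(history: list[float], window: int = 5,
              threshold: float = 2.0) -> bool:
    """Flag drift when the latest ATE estimate sits more than
    `threshold` standard deviations from the recent window mean."""
    if len(history) <= window:
        return False                     # not enough history yet
    ref = history[-window - 1:-1]        # the window before the latest point
    mu = statistics.mean(ref)
    sigma = statistics.stdev(ref)
    if sigma == 0:
        return history[-1] != mu
    return abs(history[-1] - mu) > threshold * sigma

stable = [0.10, 0.11, 0.09, 0.10, 0.11, 0.10]
print(ate_drift(stable))                 # False: within normal variation
print(ate_drift(stable[:-1] + [0.45]))   # True: the last estimate jumped
```

When the flag fires, a guarded agent would pause self-improvement and re-run sensitivity analysis instead of retraining on its own possibly-hallucinated outputs.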

Component Maturity

| Component | Maturity | Description |
|---|---|---|
| Drift Index (R1) | ✅ Production | Information-theoretic dynamic weights (entropy-inverse) |
| E-value Sensitivity (R2) | ✅ Production | VanderWeele 2017 + Cinelli 2020 Partial R² |
| Lyapunov ζ Controller (R5) | ✅ Production | ζ_max bound clipping, convergence tracking |
| Outbox Pattern (C3) | ✅ Production | WAL-based at-least-once delivery + DLQ |
| Partitioning + Rollup (C2) | ✅ Production | Weekly partitions, daily rollup (permanent) |
| OTel Dynamic Sampling (C4) | ✅ Production | 5% normal, 100% errors/DI-spikes |
| Shadow Deploy (P4) | ✅ Production | 3-phase promotion, cost circuit breaker |
| Chaos Tests (C1) | ✅ Tests | Retry storms, DLQ, partial failures |
| ARES Evaluator (R3) | ⚠️ Framework | Monte Carlo + Beta-Binomial CI; LLM mock only |
| CausalFlip (R3) | ⚠️ Framework | Keyword-based judge; needs real LLM integration |
| Who&When Benchmark (R4) | ⚠️ Internal | Synthetic data only; NOT validated on real dataset |

Legend: ✅ Production-ready | ⚠️ Framework/Mock (needs real integration)
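The E-value used by the sensitivity component has a closed form (VanderWeele & Ding 2017): for an observed risk ratio RR ≥ 1, E = RR + sqrt(RR·(RR − 1)) — the minimum strength of association an unmeasured confounder would need with both treatment and outcome to explain the effect away. A direct sketch (the function name is ours, not WhyLab's API):

```python
import math

def e_value(rr: float) -> float:
    """E-value for an observed risk ratio (VanderWeele & Ding 2017).
    Ratios below 1 are inverted first so the same formula applies."""
    if rr < 1:
        rr = 1 / rr
    return rr + math.sqrt(rr * (rr - 1))

print(round(e_value(2.0), 3))  # 3.414
```

Reading it: an RR of 2.0 survives any confounder whose associations with treatment and outcome are both below 3.41 on the risk-ratio scale; an E-value near 1 means even weak confounding could nullify the result.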

Quick Test (v2.0 Audit Engine)

```shell
# All 142 tests (~4 seconds)
python -m pytest tests/ -q --tb=short

# By module
python -m pytest tests/test_audit.py tests/test_methods.py         # Core audit
python -m pytest tests/test_sensitivity.py tests/test_lyapunov.py  # R2 + R5
python -m pytest tests/test_causal_flip.py                         # R3 ARES
python -m pytest tests/test_otel.py tests/test_shadow.py           # C4 + P4
```

Tests

```shell
# Full suite
python -m pytest tests/ -v
```

142 tests passing (v1.0 Pipeline + v2.0 Audit Engine)


Citation

If you use WhyLab in your research, please cite:

```bibtex
@software{whylab2026,
  title={WhyLab: Causal Decision Intelligence Engine with Multi-Agent Debate},
  author={Yesol Heo},
  year={2026},
  url={https://github.com/Yesol-Pilot/WhyLab}
}
```

License

MIT License
