
Add finding validation pass to /review — reduce false positives #610

Open
kristjanakkermann wants to merge 1 commit into garrytan:main from kristjanakkermann:feat/review-validation-pass

Conversation


@kristjanakkermann kristjanakkermann commented Mar 28, 2026

Hi Garry and team — thank you for open-sourcing gstack. I've been using it daily on a Rails monolith (payments, LLM integrations, POS terminals) and the /review skill consistently catches things I would have missed. This is a fantastic tool.

This PR proposes a targeted addition to /review that I believe would meaningfully improve its precision without sacrificing any of its existing coverage.

The Problem: False Positives Erode Trust

After running /review across dozens of branches, I noticed a pattern: the two-pass checklist is excellent at finding issues, but CRITICAL findings go straight to Fix-First without independent verification. This means occasional false positives — a "race condition" that's actually guarded by a DB constraint, a "missing validation" that's handled in a before_action upstream, a flagged issue that predates the branch — reach the user labeled as critical.

Each false positive teaches the developer to trust the reviewer a little less. Over time, this is the difference between a tool people rely on and one they skip. I think /review deserves better, because the detection quality is already there — it just needs a precision gate.

Inspiration: Anthropic's Native Code Review Plugin

I compared gstack's /review with Anthropic's official code-review plugin (shipped in claude-code-plugins). Their approach has an elegant mechanism that gstack currently lacks:

  1. Multiple agents find issues independently (similar to gstack's two-pass)
  2. Separate validation subagents are then launched per finding to independently verify each issue
  3. Only validated findings survive to the output

The key insight is that the finder and the validator operate with different cognitive biases. The finder is primed to spot problems (confirmation bias toward flagging). The validator starts fresh with just the claim and the code, and must prove the claim true. This adversarial setup naturally filters false positives — the same principle as having a different person review your code review.

What This PR Adds

A new Step 4.9: Finding Validation Pass, inserted between the finding-producing steps (4, 4.5, 4.75) and Fix-First (Step 5). Three changes total:

1. New Step 4.9 — The Precision Gate

For each CRITICAL finding from Step 4, a parallel validation subagent is launched with:

  • The finding description and the file:line cited
  • The PR/branch context (commit messages, stated intent from Step 1.5)
  • The relevant code context (not just the diff hunk — the surrounding code)

Each subagent must independently confirm:

  • The issue actually exists in the current code (not a misread of the diff)
  • The issue is not already handled elsewhere (guards, validations, constraints the reviewer missed)
  • The issue is not a pre-existing condition (was it introduced by this branch?)
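The dispatch described above can be sketched in Python. This is a hypothetical illustration only — `run_subagent` stands in for whatever mechanism the skill actually uses to launch a validation subagent, and the finding fields (`severity`, `description`, `location`) are placeholder names, not gstack's real schema:

```python
# Sketch of Step 4.9's per-finding validation dispatch (illustrative only).
from concurrent.futures import ThreadPoolExecutor

def build_validation_prompt(finding, branch_context, code_context):
    """Assemble the three inputs each validation subagent receives:
    the claim, the branch context, and the surrounding code."""
    return (
        f"Claim: {finding['description']} at {finding['location']}\n"
        f"Branch context: {branch_context}\n"
        f"Code context:\n{code_context}\n"
        "Independently confirm: the issue exists in the current code, "
        "is not already handled elsewhere, and was introduced by this branch."
    )

def validate_findings(findings, branch_context, get_code_context, run_subagent):
    """Launch one validation subagent per CRITICAL finding, in parallel."""
    critical = [f for f in findings if f["severity"] == "CRITICAL"]
    prompts = [
        build_validation_prompt(f, branch_context, get_code_context(f))
        for f in critical
    ]
    with ThreadPoolExecutor() as pool:
        return list(pool.map(run_subagent, prompts))
```

Because each subagent sees only one claim plus the code, it starts without the finder's confirmation bias.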

2. Three-Way Classification (an improvement over binary)

Rather than binary keep/discard, findings are classified as:

  • VALIDATED — subagent confirms with evidence → proceeds to Fix-First as CRITICAL
  • REJECTED — subagent proves the issue doesn't exist or is already handled → filtered out silently
  • UNCERTAIN — subagent can't confirm or deny → downgraded from CRITICAL to INFORMATIONAL with note "Unverified — flagged for manual review"

The UNCERTAIN category is intentional. "I can't prove it's a bug" is very different from "I proved it's not a bug" — especially for security findings. Downgrading preserves the signal without the false confidence of a CRITICAL label, and lets the developer make the final call.
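The three-way verdict handling might look like the following sketch. The verdict strings and finding fields are illustrative assumptions, not the skill's actual schema:

```python
# Sketch of the VALIDATED / REJECTED / UNCERTAIN handling (illustrative only).
def apply_verdict(finding, verdict):
    """Map a validator verdict onto what enters Fix-First."""
    if verdict == "VALIDATED":
        return finding                      # proceeds as CRITICAL
    if verdict == "REJECTED":
        return None                         # filtered out silently
    # UNCERTAIN: preserve the signal without the false confidence
    downgraded = dict(finding, severity="INFORMATIONAL")
    downgraded["note"] = "Unverified — flagged for manual review"
    return downgraded

def filter_findings(findings_with_verdicts):
    """Apply verdicts and drop rejected findings."""
    kept = (apply_verdict(f, v) for f, v in findings_with_verdicts)
    return [f for f in kept if f is not None]
```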

3. Cost-Conscious Design

Not every finding needs validation. The step is selective:

| Finding type | Validated? | Why |
| --- | --- | --- |
| CRITICAL (SQL safety, races, XSS, LLM trust) | Yes — Opus subagents | Highest stakes, highest false-positive cost |
| CLAUDE.md compliance | Yes — Sonnet subagents | Lower stakes, cheaper model sufficient |
| INFORMATIONAL (Pass 2) | No (unless data safety) | Cost not justified for lower severity |
| Design review (Step 4.5) | No | Visual/judgment-based, not code-verifiable |
| Test coverage gaps (Step 4.75) | No | Factual — the test either exists or it doesn't |

The false positive checklist is adapted from Anthropic's proven list: pre-existing issues, intentionally correct patterns, pedantic nitpicks, linter-catchable issues, general quality concerns, and CLAUDE.md items silenced by lint-ignore comments.
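The routing in the table above could be expressed as a lookup. Model names, finding-type keys, and the data-safety exception are illustrative placeholders, not the skill's real configuration:

```python
# Sketch of the cost-conscious validation routing (illustrative only).
# Maps finding type -> (model to use, whether to validate at all).
VALIDATION_ROUTING = {
    "CRITICAL":          ("opus",   True),   # highest stakes
    "CLAUDE_MD":         ("sonnet", True),   # cheaper model sufficient
    "INFORMATIONAL":     (None,     False),  # skipped by default
    "DESIGN_REVIEW":     (None,     False),  # judgment-based, not verifiable
    "TEST_COVERAGE_GAP": (None,     False),  # factual, no validation needed
}

def route_finding(finding_type, touches_data_safety=False):
    """Return (model, should_validate) for a finding type."""
    if finding_type == "INFORMATIONAL" and touches_data_safety:
        return ("opus", True)   # the "unless data safety" exception
    return VALIDATION_ROUTING.get(finding_type, (None, False))
```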

Why I Think This Belongs in gstack

gstack already has the adversarial review in Step 5.7 (cross-model Codex + Claude synthesis). That catches issues the structured review misses. This validation pass does the complementary thing: it removes findings that shouldn't be there. Together, they give /review both high recall (catch everything) and high precision (only flag what's real).

The addition is ~40 lines of Markdown, zero new dependencies, zero new binaries, and it respects the existing architecture — it slots cleanly between 4.75 and 5 without touching any other step's logic.

Changes

  • review/SKILL.md: New Step 4.9 between test coverage (4.75) and Fix-First (5)
  • review/SKILL.md: Step 5 header notes that CRITICAL findings are pre-validated
  • review/SKILL.md: Important Rules section adds validation-first principle

Test Plan

  • Run /review on a branch with a known false positive (e.g., a "race condition" guarded by a DB constraint) — verify Step 4.9 REJECTS it
  • Run /review on a branch with a real SQL injection — verify Step 4.9 VALIDATES it
  • Run /review on a small diff (<3 CRITICAL findings) — verify validation subagents fire
  • Run /review on a clean branch — verify no regression in the "No issues found" path
  • Check token cost delta on a medium-sized diff (~100 lines) with 2-3 CRITICAL findings

Thank you for considering this. Happy to iterate on the approach if you'd prefer a different design. I genuinely appreciate the work you and the team have put into making this available to everyone.

Introduce independent subagent validation for CRITICAL findings before
they enter Fix-First, reducing false positives and building reviewer trust.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
