Add finding validation pass to /review — reduce false positives #610
Open
kristjanakkermann wants to merge 1 commit into garrytan:main from
Introduce independent subagent validation for CRITICAL findings before they enter Fix-First, reducing false positives and building reviewer trust.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Hi Garry and team, thank you for open-sourcing gstack. I've been using it daily on a Rails monolith (payments, LLM integrations, POS terminals), and the /review skill consistently catches things I would have missed. This is a fantastic tool.
This PR proposes a targeted addition to /review that I believe would meaningfully improve its precision without sacrificing any of its existing coverage.
The Problem: False Positives Erode Trust
After running /review across dozens of branches, I noticed a pattern: the two-pass checklist is excellent at finding issues, but CRITICAL findings go straight to Fix-First without independent verification. As a result, occasional false positives reach the user labeled as critical: a "race condition" that is actually guarded by a DB constraint, a "missing validation" that is handled upstream in a before_action, or a flagged issue that predates the branch.
Each false positive teaches the developer to trust the reviewer a little less. Over time, this is the difference between a tool people rely on and one they skip. I think /review deserves better, because the detection quality is already there; it just needs a precision gate.
Inspiration: Anthropic's Native Code Review Plugin
I compared gstack's /review with Anthropic's official code-review plugin (shipped in claude-code-plugins). Their approach has an elegant mechanism that gstack currently lacks: every flagged issue is independently validated before it is reported.
The key insight is that the finder and the validator operate with different cognitive biases. The finder is primed to spot problems (a confirmation bias toward flagging). The validator starts fresh with only the claim and the code, and must prove the claim true. This adversarial setup naturally filters out false positives; it is the same principle as having a different person review your code review.
What This PR Adds
A new Step 4.9: Finding Validation Pass, inserted between the finding-producing steps (4, 4.5, 4.75) and Fix-First (Step 5). Three changes total:
1. New Step 4.9 — The Precision Gate
For each CRITICAL finding from Step 4, a parallel validation subagent is launched with:
Each subagent must independently confirm:
2. Three-Way Classification (an improvement over binary)
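Rather than a binary keep/discard decision, each finding is classified as one of three verdicts: VALIDATED (the claim is confirmed and the finding stays CRITICAL), UNCERTAIN (the claim can be neither proven nor disproven, so the finding is downgraded), or REJECTED (the claim is shown to be false and the finding is dropped).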
The UNCERTAIN category is intentional. "I can't prove it's a bug" is very different from "I proved it's not a bug" — especially for security findings. Downgrading preserves the signal without the false confidence of a CRITICAL label, and lets the developer make the final call.
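Step 4.9 itself is Markdown instructions for the agent, not code, so the following is only a hypothetical Python model of the control flow described in items 1 and 2: one independent validator per CRITICAL finding, run in parallel, followed by three-way routing. The names Verdict, Finding, route, validation_pass, and toy_validator are illustrative and not part of gstack. Two further assumptions: UNCERTAIN findings are downgraded to INFORMATIONAL, and every CRITICAL finding is validated (the selectivity rules from item 3 are not modeled). toy_validator is a keyword stand-in for what would really be an LLM subagent.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, replace
from enum import Enum
from typing import Callable, List, Optional


class Verdict(Enum):
    VALIDATED = "VALIDATED"  # claim confirmed against the code
    UNCERTAIN = "UNCERTAIN"  # claim could be neither proven nor disproven
    REJECTED = "REJECTED"    # claim shown to be false (a false positive)


@dataclass(frozen=True)
class Finding:
    severity: str  # "CRITICAL" or "INFORMATIONAL" (assumed labels)
    path: str
    line: int
    claim: str


def route(finding: Finding, verdict: Verdict) -> Optional[Finding]:
    """Map a validator's verdict to the finding (if any) that reaches Fix-First."""
    if verdict is Verdict.VALIDATED:
        return finding  # stays CRITICAL
    if verdict is Verdict.UNCERTAIN:
        # Downgraded but still reported; the developer makes the final call.
        return replace(finding, severity="INFORMATIONAL")
    return None  # REJECTED: dropped before Fix-First


def validation_pass(
    findings: List[Finding],
    validate: Callable[[Finding], Verdict],
) -> List[Finding]:
    """Run one independent validator per CRITICAL finding, in parallel."""
    critical = [f for f in findings if f.severity == "CRITICAL"]
    others = [f for f in findings if f.severity != "CRITICAL"]
    with ThreadPoolExecutor() as pool:
        # pool.map preserves input order, so verdicts line up with `critical`.
        verdicts = list(pool.map(validate, critical))
    kept: List[Finding] = []
    for finding, verdict in zip(critical, verdicts):
        routed = route(finding, verdict)
        if routed is not None:
            kept.append(routed)
    return kept + others  # non-CRITICAL findings pass through unvalidated


def toy_validator(finding: Finding) -> Verdict:
    """Hypothetical keyword stand-in for an LLM validation subagent."""
    if "race condition" in finding.claim:
        return Verdict.REJECTED
    if "SQL injection" in finding.claim:
        return Verdict.VALIDATED
    return Verdict.UNCERTAIN


if __name__ == "__main__":
    findings = [
        Finding("CRITICAL", "app/models/order.rb", 42, "race condition on balance update"),
        Finding("CRITICAL", "app/controllers/search_controller.rb", 7, "SQL injection via params[:q]"),
        Finding("CRITICAL", "app/services/refund.rb", 19, "missing authorization check"),
    ]
    for f in validation_pass(findings, toy_validator):
        print(f.severity, f.path)
```

Keeping the validator as a plain callable is the point of the design: it receives only the finding (the claim plus its location), never the finder's reasoning, which mirrors the "starts fresh" property described above.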
3. Cost-Conscious Design
Not every finding needs validation. The step is selective:
The false-positive checklist is adapted from the one in Anthropic's plugin: pre-existing issues, intentionally correct patterns, pedantic nitpicks, linter-catchable issues, general quality concerns, and CLAUDE.md violations explicitly silenced in the code (e.g., by a lint-ignore comment).
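One checklist item, pre-existing issues, is mechanical enough to sketch in code. This sketch is not part of the PR: it assumes each finding carries a file path and a line number in the branch's version of the file, and treats a finding as pre-existing when that line was not added by the branch's unified diff. added_lines and is_pre_existing are hypothetical helpers; in Step 4.9 this judgment is made by the validation subagent reading the code, not by a parser.

```python
import re
from typing import Dict, Optional, Set

# Matches hunk headers such as "@@ -10,3 +10,4 @@" and captures the new-file start line.
HUNK_RE = re.compile(r"^@@ -\d+(?:,\d+)? \+(\d+)(?:,\d+)? @@")


def added_lines(unified_diff: str) -> Dict[str, Set[int]]:
    """Map each file in a git-style unified diff to the new-file line numbers it adds.

    Simplified sketch: assumes well-formed `git diff` output and does not handle
    renames, binary files, or changed lines whose content begins with "--" or "++".
    """
    result: Dict[str, Set[int]] = {}
    current: Optional[str] = None
    new_line = 0
    for line in unified_diff.splitlines():
        if line.startswith("+++ "):
            path = line[4:]
            # Strip the conventional "b/" prefix that git uses for the new file.
            current = path[2:] if path.startswith("b/") else path
            result.setdefault(current, set())
        elif line.startswith("--- "):
            continue  # old-file header; only new-file lines matter here
        elif (m := HUNK_RE.match(line)) is not None:
            new_line = int(m.group(1))
        elif current is None or line.startswith("\\"):
            continue  # preamble before the first file, or "\ No newline at end of file"
        elif line.startswith("+"):
            result[current].add(new_line)
            new_line += 1
        elif line.startswith("-"):
            continue  # removed line: does not advance the new-file counter
        else:
            new_line += 1  # context line
    return result


def is_pre_existing(path: str, line: int, unified_diff: str) -> bool:
    """True if the flagged line was not added by this diff (hypothetical helper)."""
    return line not in added_lines(unified_diff).get(path, set())
```

A finding on an unchanged line is only a candidate for rejection, not proof of a false positive: a new call site can make old code newly dangerous, so the validator should still read the surrounding code before discarding it.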
Why I Think This Belongs in gstack
gstack already has an adversarial review in Step 5.7 (cross-model Codex + Claude synthesis), which catches issues the structured review misses. This validation pass does the complementary thing: it removes findings that shouldn't be there. Together, they give /review both high recall (catching the real issues) and high precision (flagging only what's real).
The addition is ~40 lines of Markdown with zero new dependencies and zero new binaries, and it respects the existing architecture: it slots cleanly between Steps 4.75 and 5 without touching any other step's logic.
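The recall/precision split above can be stated with the standard definitions, where TP, FP, and FN count true positives, false positives, and false negatives among the findings /review reports:

$$
\text{precision} = \frac{TP}{TP + FP}, \qquad \text{recall} = \frac{TP}{TP + FN}
$$

Step 5.7 aims to raise recall by surfacing issues that would otherwise be false negatives, while Step 4.9 aims to raise precision by shrinking FP. The main way a validation pass can cost recall is by rejecting a real issue, which turns a true positive into a false negative; the UNCERTAIN downgrade limits that risk by keeping unproven findings visible.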
Changes
- review/SKILL.md: New Step 4.9 between test coverage (4.75) and Fix-First (5)
- review/SKILL.md: Step 5 header notes that CRITICAL findings are pre-validated
- review/SKILL.md: Important Rules section adds a validation-first principle
Test Plan
- /review on a branch with a known false positive (e.g., a "race condition" guarded by a DB constraint): verify Step 4.9 REJECTS it
- /review on a branch with a real SQL injection: verify Step 4.9 VALIDATES it
- /review on a small diff (<3 CRITICAL findings): verify validation subagents fire
- /review on a clean branch: verify no regression in the "No issues found" path
Thank you for considering this. Happy to iterate on the approach if you'd prefer a different design. I genuinely appreciate the work you and the team have put into making this available to everyone.