## Context
From a PR review patterns audit across 7 MongoDB Agent Skills, testing/evidence gaps appeared in 8 instances across 4 PRs. The most common review comment was some form of "have you tested agent performance with and without this file?" — a question that should be answered by eval results, not reviewer intuition.
Issue #39 added support for recognizing the `evals/` directory. This issue proposes going further: checking that eval results actually exist.
## Proposed check
Add an opt-in warning-level check that verifies a skill's `evals/` directory contains result files, not just test definitions.
### Behavior
- Severity: warning (opt-in via flag, e.g., `--require-eval-results`)
- Check: If the `evals/` directory exists, verify it contains at least one file matching a results pattern (e.g., `*results*`, `*output*`, `*report*`, or `*.json` with a results-like structure)
- If `evals/` exists but contains only definition/config files and no results: warn
- If `evals/` does not exist: warn (when the opt-in flag is set)
### Message
> No eval results found in evals/ directory. Consider running evals and including results to demonstrate skill effectiveness.
### Why opt-in
Not all workflows will have eval infrastructure set up. Making the check opt-in (and eventually on by default in CI mode) allows gradual adoption.
## Examples from PR reviews
| PR | Issue |
|----|-------|
| 6 | "Have we tested agent performance with and without this file?" (asked for 3 reference files) |
| 5 | "Have you found agents unable to adhere to patterns without this guidance?" |
| 7 | "Did you find in testing that the skill needed this?" (asked twice) |
| 7 | Meta: "I wonder if there is a way to indicate what information a skill needs for efficacy" |
## Related
Builds on #39 (recognize the `evals/` directory). Part of a series of checks derived from PR review pattern analysis.