docs: add esbuild benchmark results and pilot test #199

Open
jafreck wants to merge 1 commit into main from docs/benchmark-esbuild

@jafreck jafreck commented Mar 14, 2026

Summary

Add benchmark results and pilot test for Lore vs baseline evaluation on the esbuild codebase.

Changes

  • docs/benchmark-results-esbuild.md — Full benchmark report: aggregate metrics, per-task detail, tool usage breakdown, ground-truth corrections, and takeaways
  • tests/benchmark/pilot-esbuild.test.ts — 12 benchmark tasks targeting esbuild (callers, callees, blast radius, implementations, test mapping, complexity, dependency graph, cross-file consumers, circular deps, most-called functions, composite queries, deletion impact)

Key Results (12 tasks, 1 iteration each)

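| Metric              | Control | Lore  | Delta   |
|---------------------|---------|-------|---------|
| First-pass accuracy | 25.0%   | 66.7% | +41.7pp |
| Mean tokens         | 7,938   | 7,072 | -10.9%  |
| Mean tool calls     | 25.8    | 22.3  | -14%    |
| Lore tool usage     | 0%      | 83%   |         |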
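
The Delta column mixes two units: percentage points (pp) for accuracy, and relative percent change for tokens and tool calls. A minimal sketch of that arithmetic follows. The helper names `deltaPp` and `deltaRelPct` are hypothetical and are not taken from the benchmark code. The pass counts of 3 and 8 out of 12 are inferred from the reported 25.0% and 66.7%, not stated in the PR.

```typescript
// Hypothetical helpers for illustration; not part of the PR's code.

// Absolute difference between two percentages, in percentage points.
function deltaPp(controlPct: number, lorePct: number): number {
  return lorePct - controlPct;
}

// Relative change of the Lore value against the control value, in percent.
function deltaRelPct(control: number, lore: number): number {
  return ((lore - control) / control) * 100;
}

// Assumption: 3/12 and 8/12 passing tasks, inferred from 25.0% and 66.7%.
const controlAccuracy = (3 / 12) * 100;
const loreAccuracy = (8 / 12) * 100;

const accuracyDelta = deltaPp(controlAccuracy, loreAccuracy).toFixed(1); // "41.7"
const tokenDelta = deltaRelPct(7938, 7072).toFixed(1); // "-10.9"
const toolCallDelta = deltaRelPct(25.8, 22.3).toFixed(0); // "-14"

console.log(`accuracy: +${accuracyDelta}pp`);
console.log(`tokens: ${tokenDelta}%`);
console.log(`tool calls: ${toolCallDelta}%`);
```

With a single iteration per task, each task moves accuracy by roughly 8.3pp, so the accuracy delta corresponds to five tasks under the inferred counts.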

Note: 8 of 12 expected answers need ground-truth corrections (noted in the results); outputs and timing/token metrics remain valid.

- Add benchmark results for Lore vs baseline on esbuild codebase
- Add pilot-esbuild.test.ts with 12 benchmark tasks targeting esbuild
- Results show +41.7pp first-pass accuracy, -10.9% token usage with Lore
- 8 of 12 expected answers need ground-truth corrections (noted in results)

codecov bot commented Mar 14, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 87.78%. Comparing base (eaa10ac) to head (557685d).

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #199   +/-   ##
=======================================
  Coverage   87.78%   87.78%           
=======================================
  Files          76       76           
  Lines        8610     8610           
  Branches     2708     2708           
=======================================
  Hits         7558     7558           
  Misses       1052     1052           
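
The coverage figure can be checked from the Hits and Lines rows above. This is a small sketch that assumes coverage is simply hits divided by lines, rounded to two decimals; Codecov's exact rounding and any partial-line handling are not shown in this report, and `coveragePct` is a hypothetical name.

```typescript
// Values copied from the coverage diff table; identical for base and head.
const hits = 7558;
const misses = 1052;
const lines = 8610;

// Assumed formula: covered lines over total lines, as a percentage.
function coveragePct(coveredLines: number, totalLines: number): number {
  return (coveredLines / totalLines) * 100;
}

const coverage = coveragePct(hits, lines).toFixed(2); // "87.78"
const countsAddUp = hits + misses === lines; // true

console.log(`coverage: ${coverage}%, hits + misses === lines: ${countsAddUp}`);
```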

