
feat: add parallel shard policy benchmark matrix#310

Merged
flyingrobots merged 15 commits into main from feat/parallel-shard-policy-bench
Mar 29, 2026

Conversation

Owner

@flyingrobots flyingrobots commented Mar 28, 2026

Summary

  • add explicit shard-assignment and delta-accumulation policy seams for parallel execution benchmarking
  • add the dedicated 1 worker = 1 shard = 1 delta comparison policy
  • expand parallel_baseline with a parallel_policy_matrix study across loads 100 / 1000 / 10000
  • export criterion results to repo-local JSON plus a static inline HTML report
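
As a sketch of the policy seam this PR adds, the types might look like the following. The names (ShardAssignmentPolicy, DeltaAccumulationPolicy, ParallelExecutionPolicy) match the walkthrough; the exact variants, fields, and helper are assumptions for illustration, not the committed API.

```rust
/// How shards are handed to workers (hypothetical variants).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ShardAssignmentPolicy {
    /// Workers claim shards from a shared queue.
    DynamicSteal,
    /// Shards are pre-assigned round-robin across workers.
    StaticRoundRobin,
    /// One worker per shard (the "1 worker = 1 shard = 1 delta" extreme).
    DedicatedPerShard,
}

/// How emitted deltas are grouped (hypothetical variants).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum DeltaAccumulationPolicy {
    /// One TickDelta per worker.
    PerWorker,
    /// One TickDelta per non-empty shard.
    PerShard,
}

/// The combined policy passed to the policy-aware entrypoints.
#[derive(Clone, Copy, Debug)]
pub struct ParallelExecutionPolicy {
    pub assignment: ShardAssignmentPolicy,
    pub accumulation: DeltaAccumulationPolicy,
}

impl ParallelExecutionPolicy {
    /// The dedicated comparison policy benchmarked in this PR.
    pub fn dedicated() -> Self {
        Self {
            assignment: ShardAssignmentPolicy::DedicatedPerShard,
            accumulation: DeltaAccumulationPolicy::PerShard,
        }
    }

    /// How many deltas an empty workload should yield under this policy:
    /// zero items means zero non-empty shards, so PerShard yields none.
    pub fn expected_deltas_for_empty_input(&self, workers: usize) -> usize {
        match self.accumulation {
            DeltaAccumulationPolicy::PerShard => 0,
            DeltaAccumulationPolicy::PerWorker => workers,
        }
    }
}
```

The `expected_deltas_for_empty_input` helper encodes the empty-input contract that the review comments below flag on the fast path.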

Outputs

  • raw JSON: docs/benchmarks/parallel-policy-matrix.json
  • static report: docs/benchmarks/parallel-policy-matrix-inline.html
  • study notes: docs/benchmarks/PARALLEL_POLICY_MATRIX.md

Initial Read

On this machine, the best policy shifts with load:

  • 100 items: static_per_worker / 1w
  • 1000 items: static_per_worker / 4w
  • 10000 items: static_per_worker / 8w

The dedicated 1 worker = 1 shard = 1 delta comparison policy is dramatically slower at every tested load, so it looks useful as a comparison extreme, not as a default direction.

Verification

  • cargo test -p warp-core parallel::exec::tests --message-format short
  • cargo clippy -p warp-core --lib -- -D warnings -D missing_docs
  • cargo bench -p warp-benches --bench parallel_baseline -- --list
  • cargo bench -p warp-benches --bench parallel_baseline -- parallel_policy_matrix
  • make bench-policy-export
  • pnpm exec prettier --check ...
  • pnpm exec markdownlint-cli2 ...

Summary by CodeRabbit

  • New Features

    • Configurable parallel execution policies and a policy-matrix benchmark with baked/exportable results and an inline policy report tab.
  • Documentation

    • Added policy-matrix guide and design-review notes; updated benchmark and baking guidance for the new bake/export workflow.
  • Chores

    • Replaced Python bake with Rust-based bench commands, added Makefile targets and open/print behavior, and vendored additional CSS assets for reports.
  • Tests

    • Added/updated tests to validate policy variants, merge determinism, and artifact checks.

@coderabbitai
Contributor

coderabbitai bot commented Mar 28, 2026

Warning

Rate limit exceeded

@flyingrobots has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 21 minutes and 39 seconds before requesting another review.

Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 21 minutes and 39 seconds.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: bfb545d1-721b-4d4f-9dc7-1341730c320c

📥 Commits

Reviewing files that changed from the base of the PR and between c235096 and cf4dada.

📒 Files selected for processing (4)
  • crates/warp-benches/benches/parallel_baseline.rs
  • docs/benchmarks/parallel-policy-matrix.json
  • docs/benchmarks/report-inline.html
  • xtask/src/main.rs
📝 Walkthrough

Adds configurable parallel execution policies and policy-aware entrypoints to warp-core; replaces the Python bench bake with an xtask-based bake/export/check pipeline that inlines vendor CSS; revamps the benchmarks dashboard into a two-tab view including a parallel-policy matrix and adds policy benchmark artifacts and docs.

Changes

Changes by cohort:

  • Parallel execution API & impl (crates/warp-core/src/parallel/exec.rs, crates/warp-core/src/parallel/mod.rs, crates/warp-core/src/lib.rs): Introduce ShardAssignmentPolicy, DeltaAccumulationPolicy, and ParallelExecutionPolicy plus the policy-driven entrypoints execute_parallel_with_policy / execute_parallel_sharded_with_policy. Dispatch execution by policy variant, adjust empty-input semantics for dedicated-per-shard, and add tests.
  • Benchmarks & benchmark data (crates/warp-benches/benches/parallel_baseline.rs, crates/warp-benches/benches/README.md, docs/benchmarks/parallel-policy-matrix.json): Add a parallel_policy_matrix benchmark sweeping loads, policies, and worker counts with stable labeling and merge post-processing; include a static policy-matrix JSON export and README updates.
  • Build tooling / xtask & Makefile (xtask/src/main.rs, xtask/Cargo.toml, Makefile, scripts/bench_bake.py): Remove scripts/bench_bake.py; add xtask bench subcommands (Bake / PolicyExport / CheckArtifacts) and a bench-vendor Make target; implement policy JSON export, inline HTML baking with vendor CSS, provenance/digest checks, and artifact validation. Add a sha2 dependency.
  • Dashboard & front-end assets (docs/benchmarks/index.html, docs/benchmarks/vendor/*, docs/benchmarks/vendor/.gitignore): Rewrite the dashboard as a two-tab UI (Core overview / Parallel policy matrix) with a policy rendering pipeline, hash-based tab sync, a centralized tooltip, and policy views. Add open-props.min.css and normalize.dark.min.css and un-ignore them.
  • Docs & design notes (docs/benchmarks/PARALLEL_POLICY_MATRIX.md, docs/plans/parallel-merge-and-footprint-design-review.md, docs/plans/parallel-merge-and-footprint-optimizations.md, docs/benchmarks/BENCHMARK_GUIDE.md, docs/benchmarks/RESERVE_BENCHMARK.md): Add policy-matrix documentation and a design review; update bake integration guidance to use xtask and refresh benchmark guide examples.
  • Tests & test helper refactors (crates/warp-core/tests/...): Centralize test helpers (merged_ops, merge_conflict) to standardize merge assertions and reduce unwrap panics.
  • Misc infra (.gitignore, docs/benchmarks/vendor/*.css): Ignore Python cache files; add vendor CSS files (static assets) to the repo.

Sequence Diagram(s)

sequenceDiagram
    participant Runner as Runner
    participant PolicyMux as PolicyRouter
    participant Executor as ShardExecutor
    participant Accum as DeltaAccumulator
    participant Merger as ResultMerger

    Runner->>PolicyMux: call execute_parallel_with_policy(items, workers, policy)
    PolicyMux->>Executor: configure assignment & worker plan
    Executor->>Executor: assign/claim shards (dynamic/static/dedicated)
    Executor->>Accum: emit TickDelta(s) (per-worker or per-shard)
    Accum->>Merger: deliver deltas
    Merger->>Merger: canonicalize, sort, deduplicate by key
    Merger-->>Runner: return unified Vec<WarpOp>
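The merge step at the bottom of the diagram (canonicalize, sort, deduplicate by key) can be sketched with stand-in types: u64 keys and u32 origins standing in for the (WarpOpKey, OpOrigin) pairs the real merge works on, and plain vectors standing in for TickDeltas of WarpOps.

```rust
// Illustrative merge: flatten per-worker deltas, impose the canonical
// (key, origin) order, then keep one op per key.
fn merge_deltas(deltas: Vec<Vec<(u64, u32)>>) -> Vec<(u64, u32)> {
    let mut ops: Vec<(u64, u32)> = deltas.into_iter().flatten().collect();
    // Canonical order: key first, then origin, so the result is
    // deterministic regardless of worker completion order.
    ops.sort_unstable();
    // Deduplicate by key, keeping the first (lowest-origin) op per key.
    ops.dedup_by_key(|op| op.0);
    ops
}
```

Because the sort runs before the dedup, the winner for each duplicated key is policy-independent, which is what makes the per-policy benchmark results comparable.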

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested labels

tooling

Poem

Policies carve work across the cores,
Shards are claimed, then merged once more.
Bake the matrix, inline the style,
Tabs flip open—benchmarks compile. 📊

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
  • Description Check: ✅ Passed (check skipped: CodeRabbit's high-level summary is enabled)
  • Title Check: ✅ Passed (the title accurately describes the main change: a benchmark matrix for parallel shard policies (dynamic/static/dedicated) across multiple policy configurations and workload sizes)
  • Docstring Coverage: ✅ Passed (coverage is 100.00%, above the required 80.00% threshold)

✏️ Tip: You can configure your own custom pre-merge checks in the settings.



@chatgpt-codex-connector chatgpt-codex-connector bot left a comment


💡 Codex Review

if items.is_empty() {
    // Can't use vec![TickDelta::new(); workers] because TickDelta doesn't impl Clone
    return (0..workers).map(|_| TickDelta::new()).collect();
}

P2: Preserve per-shard semantics for empty workloads

The empty-input fast path returns workers empty deltas before checking policy, so DeltaAccumulationPolicy::PerShard (including DedicatedPerShard) reports non-zero delta counts when items is empty. That contradicts the policy contract (“one delta per non-empty shard”) and can skew policy-matrix measurements or any caller logic that interprets delta count as shard activity.


Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 9

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
crates/warp-core/src/parallel/exec.rs (1)

303-306: ⚠️ Potential issue | 🟡 Minor

Respect PerShard semantics on the empty-input fast path.

Lines 303-306 always return workers empty deltas. For DeltaAccumulationPolicy::PerShard and ShardAssignmentPolicy::DedicatedPerShard, zero items means zero non-empty shards, so the correct result is Vec::new(). The current branch makes len() lie about the selected policy and will skew any future zero-load benchmark.

Suggested fix
     if items.is_empty() {
-        // Can't use vec![TickDelta::new(); workers] because TickDelta doesn't impl Clone
-        return (0..workers).map(|_| TickDelta::new()).collect();
+        return match (policy.assignment, policy.accumulation) {
+            (
+                ShardAssignmentPolicy::DynamicSteal | ShardAssignmentPolicy::StaticRoundRobin,
+                DeltaAccumulationPolicy::PerWorker,
+            ) => (0..workers).map(|_| TickDelta::new()).collect(),
+            _ => Vec::new(),
+        };
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/warp-core/src/parallel/exec.rs` around lines 303 - 306, The
empty-input fast path currently always returns (0..workers).map(|_|
TickDelta::new()).collect(), which violates DeltaAccumulationPolicy::PerShard
when ShardAssignmentPolicy::DedicatedPerShard (zero items should yield zero
shards); modify the items.is_empty() branch to check the accumulation/assignment
policies (DeltaAccumulationPolicy::PerShard and
ShardAssignmentPolicy::DedicatedPerShard) and return Vec::new() in that case,
otherwise keep returning workers empty TickDelta instances (still using
TickDelta::new() because TickDelta doesn't impl Clone).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/warp-core/src/parallel/exec.rs`:
- Around line 852-855: The loop that builds a fixture NodeId is truncating i
with `bytes[0] = i as u8` (in the for i in 0..count block) which triggers
clippy::cast_possible_truncation and will reuse IDs for count > 256; replace the
plain cast with an explicit checked conversion such as `u8::try_from(i)` and
handle the failure (e.g., propagate/return an error or call expect/panic with a
clear message) so the code fails fast instead of silently truncating when
constructing NodeId from the bytes array.
- Around line 882-893: The two test sites use expect() which is banned by the
lint profile; replace the expect calls around merge_deltas_ok(...) and
merge_deltas_ok(deltas) in this block so they explicitly match the Result and
panic! with the original error context on Err. Concretely, in the baseline
assignment (using merge_deltas_ok(execute_parallel_with_policy(...))) and inside
the for policy loop (merging deltas), replace the expect(...) calls with a match
that on Ok returns the merged value and on Err calls panic! with a descriptive
message including the error (preserve the existing messages "baseline merge
failed" and "policy merge failed" plus the error/debug info) so merge_deltas_ok,
execute_parallel_with_policy, baseline and the policies loop keep full failure
context without using expect().
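
The checked-conversion fix described in the first prompt can be sketched as follows. The fixture helper and byte layout are hypothetical stand-ins (the real code builds a NodeId); only the u8::try_from pattern is the point.

```rust
// Hypothetical fixture helper: `i as u8` would silently wrap at 256 and
// reuse IDs, whereas u8::try_from fails fast with a clear message.
fn fixture_bytes(i: usize) -> [u8; 8] {
    let mut bytes = [0u8; 8];
    bytes[0] = u8::try_from(i).expect("fixture count must fit in u8");
    bytes
}
```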

In `@docs/benchmarks/PARALLEL_POLICY_MATRIX.md`:
- Around line 44-47: The two markdown links pointing to absolute machine-local
paths for parallel-policy-matrix.json and parallel-policy-matrix-inline.html
should be replaced with repo-relative paths (or relative links) so the docs work
across environments; update the link targets in
docs/benchmarks/PARALLEL_POLICY_MATRIX.md to something like
docs/benchmarks/parallel-policy-matrix.json and
docs/benchmarks/parallel-policy-matrix-inline.html (or use
./parallel-policy-matrix.json if colocated) and keep the same link text so the
rendered docs and CI artifacts resolve correctly.

In `@docs/plans/parallel-merge-and-footprint-design-review.md`:
- Around line 49-66: The doc incorrectly states the live merge path sorts by
(WarpOpKey, OpOrigin); update the text to split the live engine path from the
validation/test path and cite the correct functions: note that
execute_parallel_sharded() yields one TickDelta per worker,
engine_impl::merge_parallel_deltas() (live path) sorts only by WarpOpKey, while
parallel/merge.rs::merge_deltas() and delta_validate/test helpers add OpOrigin
and perform the global (WarpOpKey, OpOrigin) sort; also mention
TickDelta::into_parts_unsorted() explicitly to explain the unsorted worker
emission order so the subsequent k-way-merge proof obligations use the correct
key for each path.

In `@docs/plans/parallel-merge-and-footprint-optimizations.md`:
- Around line 11-18: The document contains a contradiction: the top-level
disposition downgrades k-way merge and shard-aware cross-shard footprint
skipping to conditional/hypothesis status, but later sections still assert
sorted-run assumptions and cross-shard independence as proven facts; update all
later references (search for terms "k-way merge", "canonical (WarpOpKey,
OpOrigin) order", and "shard-aware cross-shard footprint skipping") to use
conditional language (e.g., "if inputs can be proven/enforced to be sorted" and
"hypothesis / not proven-safe against default scheduler") and add a short note
linking back to the disposition that these optimizations require formal proof or
scheduler guarantees before being treated as implementation-ready.

In `@Makefile`:
- Around line 164-170: The Makefile target bench-policy-export currently runs
the export then unconditionally runs the optional formatter via "pnpm exec
prettier", causing the whole target to fail if pnpm is missing; update the
bench-policy-export target so that formatting is conditional (check for pnpm
existence before running prettier) or split the formatter into a separate target
(e.g., bench-policy-export-format) and have bench-policy-export only run the
python export command (scripts/bench_parallel_policy_bake.py) while the new
bench-policy-export-format invokes pnpm exec prettier on
docs/benchmarks/parallel-policy-matrix-inline.html when pnpm is available.

In `@scripts/bench_parallel_policy_bake.py`:
- Around line 127-139: The logging prints call out_path.relative_to(ROOT) which
raises ValueError for absolute paths; update bake_html to compute a display path
safely by attempting out_path.relative_to(ROOT) in a try/except and falling back
to out_path (or str(out_path)) for absolute paths before calling print; apply
the same safe-display change to the analogous JSON write/logging location (the
code that writes/prints the parallel-policy-matrix.json output) so both
bake_html and the JSON output use the same try/except fallback instead of
unconditionally calling relative_to(ROOT).
- Around line 127-137: The current bake_html function uses a brittle marker
string (marker) that expects single quotes and thus misses the template variant;
change the injection to use an explicit placeholder token (e.g. "<!--
POLICY_MATRIX_INLINE_DATA -->") instead of matching formatted source text:
update bake_html to look for that placeholder (using TEMPLATE and
build_inline_script) and inject the inline script there, and also add the same
placeholder into the docs/benchmarks/parallel-policy-matrix.html template at the
intended injection point so the branch reliably matches.
- Around line 31-64: The load_estimate function currently swallows parse/schema
errors and returns None, making malformed estimates.json indistinguishable from
a missing benchmark; change load_estimate so that if the file exists but parsing
or schema validation fails it raises a descriptive exception (e.g., ValueError
or re-raise the JSON error) instead of returning None, while still returning
None only when no estimates.json is present for any of the
("new","base","change") kinds; update the try/except around json.loads and
subsequent access (in function load_estimate) to let errors bubble up with a
clear message referencing the file path (p / "estimates.json") and ensure
callers of load_estimate treat exceptions as fatal export errors rather than
ignoring None results.

---

Outside diff comments:
In `@crates/warp-core/src/parallel/exec.rs`:
- Around line 303-306: The empty-input fast path currently always returns
(0..workers).map(|_| TickDelta::new()).collect(), which violates
DeltaAccumulationPolicy::PerShard when ShardAssignmentPolicy::DedicatedPerShard
(zero items should yield zero shards); modify the items.is_empty() branch to
check the accumulation/assignment policies (DeltaAccumulationPolicy::PerShard
and ShardAssignmentPolicy::DedicatedPerShard) and return Vec::new() in that
case, otherwise keep returning workers empty TickDelta instances (still using
TickDelta::new() because TickDelta doesn't impl Clone).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 1d20286e-8995-44da-ad1b-8655ca7103dc

📥 Commits

Reviewing files that changed from the base of the PR and between 6e5cf4c and f6cf00d.

📒 Files selected for processing (13)
  • Makefile
  • crates/warp-benches/benches/README.md
  • crates/warp-benches/benches/parallel_baseline.rs
  • crates/warp-core/src/lib.rs
  • crates/warp-core/src/parallel/exec.rs
  • crates/warp-core/src/parallel/mod.rs
  • docs/benchmarks/PARALLEL_POLICY_MATRIX.md
  • docs/benchmarks/parallel-policy-matrix-inline.html
  • docs/benchmarks/parallel-policy-matrix.html
  • docs/benchmarks/parallel-policy-matrix.json
  • docs/plans/parallel-merge-and-footprint-design-review.md
  • docs/plans/parallel-merge-and-footprint-optimizations.md
  • scripts/bench_parallel_policy_bake.py

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/benchmarks/parallel-policy-matrix-inline.html`:
- Around line 620-1011: The inline baked payload assigned to
window.__POLICY_MATRIX__ contains dead fields "path" and "series"; remove those
keys from every object in the array so each entry only includes policy, workers,
load, mean_ns, lb_ns, and ub_ns (i.e., strip "path" and "series" when generating
or emitting the inline JSON during the bake step that produces
window.__POLICY_MATRIX__). Ensure the bake/emit logic that serializes the
benchmark results drops those fields (or post-process the produced array before
embedding) so committed docs no longer contain repo-local target/criterion/...
paths or series values.

In `@docs/benchmarks/parallel-policy-matrix.html`:
- Around line 313-314: The code is inserting untrusted benchmark labels into the
DOM via item.innerHTML which opens XSS; change all places that interpolate
policy or workers into HTML (e.g., the block using item.innerHTML with
COLORS[policy] and the other sections mentioned around lines 440-460, 502-525,
571-600) to construct DOM nodes programmatically: create the swatch element with
document.createElement, set its style.background using COLORS[policy], create a
text node or set textContent for the label (policy/workers) and append children
to legend or table cells instead of using innerHTML; ensure no unescaped strings
are ever passed into innerHTML or inserted as HTML.
- Around line 269-270: The bake step fails because
scripts/bench_parallel_policy_bake.py relies on an exact formatted script
snippet ("<script>\n      const DATA_URL = 'parallel-policy-matrix.json';")
which no longer matches; change the HTML (parallel-policy-matrix.html) to
include an explicit, invariant placeholder token (e.g. <!--
POLICY_MATRIX_PLACEHOLDER --> or a single-line marker like
window.__POLICY_MATRIX_PLACEHOLDER__) where the payload should be injected, and
update scripts/bench_parallel_policy_bake.py to locate that placeholder token
instead of matching the formatted <script> + DATA_URL string; ensure the loader
still calls fetch(DATA_URL) and run() as before but the bake script replaces
only the placeholder so the inlined payload is present before run() is invoked.
- Around line 531-549: The overall scoreboard calculation (inside bySeries ->
overall) currently computes avgSlowdown and avgRank by dividing by rows.length,
which lets series missing some loads rank unfairly; before computing for each
series (label, rows) verify it covers the full set of expected loads (e.g.,
derive expectedLoads from bestByLoad.keys() or perLoadRanks load keys) by
comparing rows.map(r => r.load) to that set; if the series is incomplete either
skip adding it to overall or set its overall fields to N/A/null so it is
excluded from ranking; update the block that computes normalized, avgSlowdown,
avgRank (and selection of bestRow) to only run when the coverage check passes
(use symbols: bySeries, overall, caseLabel, bestByLoad, perLoadRanks, rows,
normalized, avgSlowdown, avgRank, bestRow).

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 112cf0e8-ffb0-4c87-b1a0-f4605962b18e

📥 Commits

Reviewing files that changed from the base of the PR and between f6cf00d and 492f163.

📒 Files selected for processing (2)
  • docs/benchmarks/parallel-policy-matrix-inline.html
  • docs/benchmarks/parallel-policy-matrix.html

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

♻️ Duplicate comments (4)
docs/benchmarks/PARALLEL_POLICY_MATRIX.md (1)

44-47: ⚠️ Potential issue | 🟠 Major

Machine-local absolute paths break portability.

Lines 44-46 contain /Users/james/git/echo/... paths that resolve only on one workstation. Use repo-relative paths instead.

-- raw JSON: [parallel-policy-matrix.json](/Users/james/git/echo/docs/benchmarks/parallel-policy-matrix.json)
-- unified static benchmarks page:
-  [report-inline.html](/Users/james/git/echo/docs/benchmarks/report-inline.html)
+- raw JSON: [parallel-policy-matrix.json](parallel-policy-matrix.json)
+- unified static benchmarks page:
+  [report-inline.html](report-inline.html)

As per coding guidelines: docs/**/*.md: Flag factual errors and stale cross-references. Do not flag stylistic preferences in prose.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/benchmarks/PARALLEL_POLICY_MATRIX.md` around lines 44 - 47, Replace the
machine-local absolute paths to the JSON and HTML reports with repo-relative
links so the markdown is portable: update the references to
parallel-policy-matrix.json and report-inline.html to use relative paths (e.g.,
./benchmarks/parallel-policy-matrix.json and ./benchmarks/report-inline.html or
whatever the correct repo-relative location is) and keep the instruction to open
the "Parallel policy matrix" tab unchanged.
Makefile (1)

164-170: ⚠️ Potential issue | 🟡 Minor

pnpm dependency should be conditional to avoid breaking exports.

Line 170 unconditionally runs pnpm exec prettier, so bench-policy-export fails if pnpm isn't installed even though the core export (JSON generation + bake) completed successfully.

 bench-policy-export: vendor-d3
 	@echo "Exporting parallel policy matrix JSON..."
 	@python3 scripts/bench_parallel_policy_bake.py \
 	  --json-out docs/benchmarks/parallel-policy-matrix.json
 	@echo "Baking unified inline report..."
 	@python3 scripts/bench_bake.py --out docs/benchmarks/report-inline.html
-	@pnpm exec prettier --write docs/benchmarks/report-inline.html >/dev/null
+	@if command -v pnpm >/dev/null 2>&1; then \
+	  pnpm exec prettier --write docs/benchmarks/report-inline.html >/dev/null; \
+	else \
+	  echo "[bench-policy-export] pnpm not found; skipping prettier"; \
+	fi
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Makefile` around lines 164 - 170, The Makefile target bench-policy-export
currently runs pnpm exec prettier unconditionally (pnpm exec prettier --write
docs/benchmarks/report-inline.html) which breaks the target when pnpm is not
installed; update the bench-policy-export recipe to conditionally run the
prettier step by testing for pnpm (e.g., command -v pnpm >/dev/null) or by
making the pnpm invocation optional/fail-safe so that
scripts/bench_parallel_policy_bake.py and scripts/bench_bake.py still produce
docs/benchmarks/parallel-policy-matrix.json and
docs/benchmarks/report-inline.html without causing the make target to fail when
pnpm is absent. Ensure the conditional preserves the original log output and
exit status when prettier is skipped.
scripts/bench_parallel_policy_bake.py (2)

131-132: ⚠️ Potential issue | 🟡 Minor

relative_to(ROOT) crashes for absolute paths outside repository.

Line 132 will raise ValueError if --json-out is an absolute path like /tmp/parallel-policy-matrix.json (write succeeds, then logging crashes).

Proposed fix
+def display_path(path: Path) -> str:
+    try:
+        return str(path.relative_to(ROOT))
+    except ValueError:
+        return str(path)
+
+
 def main():
     # ... existing code ...
     json_out.write_text(json.dumps(payload, indent=2) + "\n")
-    print(f"[bench-parallel-policy] Wrote {json_out.relative_to(ROOT)}")
+    print(f"[bench-parallel-policy] Wrote {display_path(json_out)}")
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/bench_parallel_policy_bake.py` around lines 131 - 132, The print line
using json_out.relative_to(ROOT) will raise ValueError for absolute paths
outside the repo (e.g., /tmp/...), so change logging to safely compute a
relative path: attempt Path.relative_to(ROOT) in a try/except or use
os.path.relpath to fall back to the absolute path; update the print call that
references json_out and ROOT to use the safe result (e.g., try
json_out.relative_to(ROOT) and on exception use json_out or
Path(os.path.relpath(json_out, ROOT))). Ensure you adjust the print statement
near where json_out is written so logging never crashes when --json-out is
outside the repository.

27-60: ⚠️ Potential issue | 🟠 Major

Silent failure on malformed benchmark data can corrupt results.

Lines 50-51 and 58-59 return None for parse errors, making malformed estimates.json indistinguishable from a missing benchmark. If one case has corrupt data, it vanishes from the matrix without any failure signal, potentially changing which policy appears to "win."

Proposed fix: fail loudly on existing but malformed files
 def load_estimate(bench_dir: Path):
     for kind in ("new", "base", "change"):
         p = bench_dir / kind / "estimates.json"
         if p.exists():
             try:
                 obj = json.loads(p.read_text())
                 mean = (
                     obj.get("mean", {}).get("point_estimate")
                     if isinstance(obj.get("mean"), dict)
                     else None
                 )
                 if mean is None and isinstance(obj.get("Mean"), dict):
                     mean = obj["Mean"].get("point_estimate")
                 lb = (
                     obj.get("mean", {})
                     .get("confidence_interval", {})
                     .get("lower_bound")
                 )
                 ub = (
                     obj.get("mean", {})
                     .get("confidence_interval", {})
                     .get("upper_bound")
                 )
                 if mean is None:
-                    return None
+                    raise RuntimeError(f"missing point_estimate in {p}")
                 return {
                     "path": str(p.relative_to(ROOT)),
                     "mean_ns": float(mean),
                     "lb_ns": float(lb) if lb is not None else None,
                     "ub_ns": float(ub) if ub is not None else None,
                 }
-            except (json.JSONDecodeError, KeyError, TypeError, ValueError):
-                return None
+            except (json.JSONDecodeError, KeyError, TypeError, ValueError) as exc:
+                raise RuntimeError(f"failed to parse {p}: {exc}") from exc
     return None
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/bench_parallel_policy_bake.py` around lines 27 - 60, The function
load_estimate currently swallows parse errors and returns None for both missing
and malformed estimates.json; update load_estimate so that it still returns None
when no estimates.json exists, but if p.exists() and parsing/validation fails
(JSONDecodeError, KeyError, TypeError, ValueError or missing required fields
like mean), raise an explicit exception (e.g., ValueError or RuntimeError) that
includes the relative path (str(p.relative_to(ROOT))) and the underlying error
message instead of returning None; locate the parsing block in load_estimate and
change the except handler to re-raise a descriptive error (or raise a new one)
so malformed files fail loudly while keeping the missing-file path behavior
unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/benchmarks/index.html`:
- Around line 399-420: Add keyboard arrow navigation to the tab set: attach a
keydown handler to elements with class "tab-button" (e.g., elements with id
"tab-overview" and "tab-policy") that listens for ArrowLeft and ArrowRight,
moves focus to the previous/next tab button, and updates aria-selected
(true/false) and focus accordingly while ensuring only the active tab is
focusable (manage tabindex or focus programmatically). Ensure the handler uses
aria-controls ("overview-view" / "policy-view") to keep the active view in sync
and prevents default scrolling on arrow keys.
- Around line 9-12: The page currently imports Open Props via the two `@import`
rules "@import 'https://unpkg.com/open-props';" and "@import
'https://unpkg.com/open-props/normalize.dark.min.css'", which breaks in
air-gapped/offline environments; replace those remote imports by vendoring the
Open Props files into the repo (e.g., open-props.css and normalize.dark.min.css)
and update the stylesheet imports in the document to reference the local files,
or implement a simple fallback that attempts the CDN import and on failure loads
the local copies (or add a build step that bundles these CSS assets into a
single local file); update references where the two `@import` lines appear so the
page will work offline.

In `@scripts/bench_parallel_policy_bake.py`:
- Around line 63-90: The path parsing in parse_case can silently misparse
unexpected names; add a validation step in collect_results that calls parse_case
and then verifies the returned "policy" and "workers" against explicit allowed
sets (e.g., expected_policies and expected_workers like {"dedicated_per_shard",
...} and {"1w","4w","8w","dedicated"}) and skip or log and ignore entries that
fail validation (return None or continue) so misnamed BenchmarkId entries don't
get silently accepted; update places that consume parse_case results to handle
None/invalid entries gracefully.
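The validation step the comment describes could look like the sketch below. The function name `validate_case` and the allowed sets are assumptions taken from the comment's examples, not the script's actual identifiers:

```python
from typing import Optional

# Assumed allowed sets, mirroring the review comment's examples; the real
# bench_parallel_policy_bake.py may define different policies and worker tokens.
EXPECTED_POLICIES = {"static_per_worker", "dedicated_per_shard"}
EXPECTED_WORKERS = {"1w", "4w", "8w", "dedicated"}


def validate_case(case: Optional[dict]) -> Optional[dict]:
    """Drop parsed cases whose policy or worker token is not recognized."""
    if case is None:
        return None
    if case.get("policy") not in EXPECTED_POLICIES:
        return None
    if case.get("workers") not in EXPECTED_WORKERS:
        return None
    return case
```

Callers in `collect_results` would then `continue` past a `None` result instead of silently accepting a misnamed BenchmarkId entry.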

---

Duplicate comments:
In `@docs/benchmarks/PARALLEL_POLICY_MATRIX.md`:
- Around line 44-47: Replace the machine-local absolute paths to the JSON and
HTML reports with repo-relative links so the markdown is portable: update the
references to parallel-policy-matrix.json and report-inline.html to use relative
paths (e.g., ./benchmarks/parallel-policy-matrix.json and
./benchmarks/report-inline.html or whatever the correct repo-relative location
is) and keep the instruction to open the "Parallel policy matrix" tab unchanged.

In `@Makefile`:
- Around line 164-170: The Makefile target bench-policy-export currently runs
pnpm exec prettier unconditionally (pnpm exec prettier --write
docs/benchmarks/report-inline.html) which breaks the target when pnpm is not
installed; update the bench-policy-export recipe to conditionally run the
prettier step by testing for pnpm (e.g., command -v pnpm >/dev/null) or by
making the pnpm invocation optional/fail-safe so that
scripts/bench_parallel_policy_bake.py and scripts/bench_bake.py still produce
docs/benchmarks/parallel-policy-matrix.json and
docs/benchmarks/report-inline.html without causing the make target to fail when
pnpm is absent. Ensure the conditional preserves the original log output and
exit status when prettier is skipped.

In `@scripts/bench_parallel_policy_bake.py`:
- Around line 131-132: The print line using json_out.relative_to(ROOT) will
raise ValueError for absolute paths outside the repo (e.g., /tmp/...), so change
logging to safely compute a relative path: attempt Path.relative_to(ROOT) in a
try/except or use os.path.relpath to fall back to the absolute path; update the
print call that references json_out and ROOT to use the safe result (e.g., try
json_out.relative_to(ROOT) and on exception use json_out or
Path(os.path.relpath(json_out, ROOT))). Ensure you adjust the print statement
near where json_out is written so logging never crashes when --json-out is
outside the repository.
- Around line 27-60: The function load_estimate currently swallows parse errors
and returns None for both missing and malformed estimates.json; update
load_estimate so that it still returns None when no estimates.json exists, but
if p.exists() and parsing/validation fails (JSONDecodeError, KeyError,
TypeError, ValueError or missing required fields like mean), raise an explicit
exception (e.g., ValueError or RuntimeError) that includes the relative path
(str(p.relative_to(ROOT))) and the underlying error message instead of returning
None; locate the parsing block in load_estimate and change the except handler to
re-raise a descriptive error (or raise a new one) so malformed files fail loudly
while keeping the missing-file path behavior unchanged.
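The safe path-logging fallback suggested above (for `--json-out` targets outside the repo) can be sketched as a small helper; the name `display_path` is illustrative, not from the script:

```python
import os
from pathlib import Path


def display_path(path: Path, root: Path) -> str:
    """Prefer a repo-relative path; fall back to os.path.relpath for
    paths outside root so logging never raises ValueError."""
    try:
        return str(path.relative_to(root))
    except ValueError:
        return os.path.relpath(path, root)
```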

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: a3ca12ef-7780-4adc-8c2f-c37dcf62d2f3

📥 Commits

Reviewing files that changed from the base of the PR and between 492f163 and 85aa10a.

📒 Files selected for processing (8)
  • .gitignore
  • Makefile
  • crates/warp-benches/benches/README.md
  • docs/benchmarks/PARALLEL_POLICY_MATRIX.md
  • docs/benchmarks/index.html
  • docs/benchmarks/report-inline.html
  • scripts/bench_bake.py
  • scripts/bench_parallel_policy_bake.py


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 9

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
Makefile (1)

82-106: 🧹 Nitpick | 🔵 Trivial

Pin D3 to specific version for deterministic vendor snapshots.

D3 currently uses @7 (latest v7.x), while Open Props pins to @1.7.16. Unpinned versions drift between builds, breaking reproducibility for benchmarks. Pin D3 to @7.9.0.

Pin D3 version
-  curl -fsSL https://unpkg.com/d3@7/dist/d3.min.js -o docs/benchmarks/vendor/d3.v7.min.js; \
+  curl -fsSL https://unpkg.com/d3@7.9.0/dist/d3.min.js -o docs/benchmarks/vendor/d3.v7.min.js; \
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Makefile` around lines 82 - 106, Update the D3 download URL in the Makefile's
bench-vendor target to pin the D3 version to 7.9.0 for deterministic vendor
snapshots: replace the current unpkg reference that uses the floating `@7` tag
with an explicit `@7.9.0` (the curl invocation for d3 in the bench-vendor
target), leaving the rest of the logic and the vendor-d3 target unchanged.
♻️ Duplicate comments (1)
docs/benchmarks/PARALLEL_POLICY_MATRIX.md (1)

44-47: ⚠️ Potential issue | 🟠 Major

Absolute machine-local paths break cross-environment usage.

Lines 45 and 47 contain absolute paths (/Users/james/git/echo/...) that resolve only on the author's workstation. These links are broken in the repo browser, docs site, CI artifacts, and on other developers' machines. Replace with relative paths.

🔧 Proposed fix
-- raw JSON with provenance metadata:
-  [parallel-policy-matrix.json](/Users/james/git/echo/docs/benchmarks/parallel-policy-matrix.json)
+  [parallel-policy-matrix.json](parallel-policy-matrix.json)
 - unified static benchmarks page:
-  [report-inline.html](/Users/james/git/echo/docs/benchmarks/report-inline.html)
+  [report-inline.html](report-inline.html)
   Open the `Parallel policy matrix` tab.

As per coding guidelines, Echo documentation must avoid stale cross-references and remain factually usable across environments.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/benchmarks/PARALLEL_POLICY_MATRIX.md` around lines 44 - 47, The markdown
currently links to absolute machine-local paths for the two artifacts; replace
those absolute URLs with repository-relative links to the files named
parallel-policy-matrix.json and report-inline.html (e.g.,
./parallel-policy-matrix.json and ./report-inline.html or simply
[parallel-policy-matrix.json](parallel-policy-matrix.json) and
[report-inline.html](report-inline.html)) so the links work across environments,
CI, and the docs site.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@docs/BENCHMARK_GUIDE.md`:
- Around line 136-146: The docs reference benchmarks that are missing from the
BENCH_CORE_GROUP_KEYS constant; update the constant BENCH_CORE_GROUP_KEYS to
include the missing benchmark keys (e.g., "reserve_independence" and
"parallel_policy_matrix" or any other documented keys) so it matches the actual
benchmarks (like reserve_independence) referenced in
docs/benchmarks/RESERVE_BENCHMARK.md and benches, or alternatively remove the
stale mentions of those keys from the guide so documentation and
BENCH_CORE_GROUP_KEYS stay consistent; edit the BENCH_CORE_GROUP_KEYS
declaration in xtask/src/main.rs to add the missing string entries or remove the
stale doc references accordingly.

In `@docs/benchmarks/index.html`:
- Around line 615-619: The start() function currently calls the async run()
without handling promise rejections, so any thrown error is swallowed; update
start() to handle run()'s promise (either make start() async and await run()
inside try/catch or call run().catch(...)) and in the catch block log the error
(console.error or existing logger) and update the UI state (replace the stale
"Loading..." text with an error message or hide the loader) so failures are
visible; locate the start() function and the async run() function to implement
the try/catch or .catch handling.
- Around line 1515-1518: The log scale domain for x (defined via d3.scaleLog())
can degenerate when loads has a single unique value; add a guard before creating
the domain to detect when loads[0] === loads[loads.length - 1] and expand the
domain slightly (e.g., multiply/divide by a small factor) or switch to a
linear/point fallback so domain([...]) is not identical; update the code around
the x scale creation (variable x and the loads array usage) to compute a
safeDomain that is used in .domain(safeDomain) instead of using loads[0] and
loads[loads.length - 1] directly.

In `@docs/benchmarks/vendor/open-props.min.css`:
- Line 1: The vendored CSS file docs/benchmarks/vendor/open-props.min.css is
missing a trailing newline (starts with :where(html){...} and ends without EOF
newline); fix by adding a single newline character at the end of that file so
tooling and diff viewers handle the file cleanly—no other changes to the
minified content required.

In `@Makefile`:
- Around line 177-180: The Makefile uses the macOS-only command open directly in
the bench-open-inline target (and the duplicate line above it) instead of the
cross-platform variable; update the bench-open-inline target and the preceding
invocation to use $(OPEN) (the variable defined earlier as OPEN) so the open
action delegates to the platform-detection logic already present in the
Makefile.

In `@xtask/src/main.rs`:
- Around line 994-1008: The parsing currently treats only specific worker
suffixes (e.g., 1w/4w/8w) so unknown counts like "16w" fall through and get
labelled as "dedicated"; update the logic in split_policy_case (and the branch
creating ParsedPolicyCase) to generically recognize a trailing "_<number>w" or
"-<number>w" suffix (e.g., via a regex or string strip) and return that
extracted workers string and the base policy name, and keep existing behavior
for no-suffix cases; also add a regression test that asserts a policy like
"static_per_worker_16w/1000" yields policy = "static_per_worker" (or the
intended base) and workers = "16w" to prevent future regressions (apply same fix
where similar parsing occurs around the other block noted).
- Around line 919-924: The code is embedding a wall-clock timestamp and local
machine identifiers into the benchmark payload via generated_at
(OffsetDateTime::now_utc().format(&Rfc3339)) and machine
(local_benchmark_machine_descriptor()), which causes churn and leaks hostnames;
change the payload construction so generated_at and machine are omitted by
default (or only included behind an explicit opt-in flag), keep git_sha
(git_short_head_sha()) for provenance, and preserve criterion_root_display
(display_repo_relative) as optional; update any serialization/structs that
expect these fields to accept None/absent values instead of always populating
generated_at and machine.
- Around line 1118-1158: The template matching is brittle because
inline_benchmark_vendor_styles currently searches for the full <link> literals
(BENCH_OPEN_PROPS_LINK_TAG / BENCH_NORMALIZE_DARK_LINK_TAG); change it to match
a stable semantic marker (e.g. the data-bench-inline attribute or an explicit
HTML comment) instead of exact attribute order/whitespace and pass the actual
template path through the error path so failures point at the provided
--template. Concretely: update replace_once_or_bail to accept an identifier for
the template path (e.g. template_path: &Path) and match on a short marker string
(like "data-bench-inline=\"open-props\"" and
"data-bench-inline=\"normalize-dark\"") when called from
inline_benchmark_vendor_styles, and modify the bail! message to include the
provided template_path (not the hardcoded docs/benchmarks/index.html) so errors
reference the real template.

---

Outside diff comments:
In `@Makefile`:
- Around line 82-106: Update the D3 download URL in the Makefile's bench-vendor
target to pin the D3 version to 7.9.0 for deterministic vendor snapshots:
replace the current unpkg reference that uses the floating `@7` tag with an
explicit `@7.9.0` (the curl invocation for d3 in the bench-vendor target),
leaving the rest of the logic and the vendor-d3 target unchanged.

---

Duplicate comments:
In `@docs/benchmarks/PARALLEL_POLICY_MATRIX.md`:
- Around line 44-47: The markdown currently links to absolute machine-local
paths for the two artifacts; replace those absolute URLs with
repository-relative links to the files named parallel-policy-matrix.json and
report-inline.html (e.g., ./parallel-policy-matrix.json and ./report-inline.html
or simply [parallel-policy-matrix.json](parallel-policy-matrix.json) and
[report-inline.html](report-inline.html)) so the links work across environments,
CI, and the docs site.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: f6a758b1-d2f7-401c-b136-ded3f5d14d11

📥 Commits

Reviewing files that changed from the base of the PR and between 85aa10a and f0af6bc.

📒 Files selected for processing (13)
  • Makefile
  • crates/warp-benches/benches/README.md
  • docs/BENCHMARK_GUIDE.md
  • docs/benchmarks/PARALLEL_POLICY_MATRIX.md
  • docs/benchmarks/RESERVE_BENCHMARK.md
  • docs/benchmarks/index.html
  • docs/benchmarks/parallel-policy-matrix.json
  • docs/benchmarks/report-inline.html
  • docs/benchmarks/vendor/.gitignore
  • docs/benchmarks/vendor/normalize.dark.min.css
  • docs/benchmarks/vendor/open-props.min.css
  • scripts/bench_bake.py
  • xtask/src/main.rs
💤 Files with no reviewable changes (1)
  • scripts/bench_bake.py


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (4)
xtask/src/main.rs (4)

1118-1142: ⚠️ Potential issue | 🟡 Minor

Report marker failures against the actual --template path.

This bail! still hardcodes docs/benchmarks/index.html, so cargo xtask bench bake --template ... points at the wrong file when marker replacement fails. Thread the caller's template path through inline_benchmark_vendor_styles() / replace_inline_link_tag() and use it here.

Also applies to: 1146-1156

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@xtask/src/main.rs` around lines 1118 - 1142, The error message uses a
hardcoded path instead of the actual template path, so change
inline_benchmark_vendor_styles to accept and propagate the caller's template
path (currently passed as template: &str) down into replace_inline_link_tag and
ensure replace_inline_link_tag uses that path in its bail!/context messages
instead of "docs/benchmarks/index.html"; specifically thread the template path
through the calls to replace_inline_link_tag from inline_benchmark_vendor_styles
and update replace_inline_link_tag's error reporting to reference the provided
template path (and adjust its signature if needed) so marker replacement
failures point at the real --template file.

919-924: ⚠️ Potential issue | 🟠 Major

Stop serializing volatile local provenance into the benchmark payload.

generated_at forces churn on every bake, and machine can still leak a workstation identifier because local_benchmark_machine_descriptor() folds HOSTNAME / COMPUTERNAME into label. The dashboard already treats both fields as optional, so default them to None (or gate them behind an explicit opt-in) and keep git_sha as the stable provenance field.

Based on learnings: Avoid introducing sources of nondeterminism without a mitigation plan.

Also applies to: 927-933

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@xtask/src/main.rs` around lines 919 - 924, Stop serializing volatile local
provenance by removing the dynamic timestamp and local machine info: set
generated_at to None instead of calling OffsetDateTime::now_utc()/format and set
machine to None instead of calling local_benchmark_machine_descriptor(); keep
git_sha as Some(git_short_head_sha()?) as the stable provenance field and leave
criterion_root_display unchanged (or gate generated_at/machine behind an
explicit opt-in flag if you need them); make the same change for the other
occurrence referenced around the block handling generated_at/machine (also
applies to the later block using generated_at, machine,
local_benchmark_machine_descriptor).

994-1008: ⚠️ Potential issue | 🟠 Major

Parse worker counts generically instead of hardcoding 1w/4w/8w.

Any new worker count is silently mislabeled. static_per_worker_16w/1000 currently falls through the two-segment branch and is exported as policy = "static_per_worker_16w", workers = "dedicated". Parse a trailing _<number>w token generically and add a regression test for an unlisted count.

Suggested fix
 fn split_policy_case(policy_case: &str) -> Option<(String, String)> {
-    for suffix in ["_1w", "_4w", "_8w"] {
-        if let Some(policy) = policy_case.strip_suffix(suffix) {
-            return Some((policy.to_owned(), suffix.trim_start_matches('_').to_owned()));
-        }
-    }
-    None
+    let (policy, workers) = policy_case.rsplit_once('_')?;
+    let digits = workers.strip_suffix('w')?;
+    if digits.is_empty() || !digits.chars().all(|ch| ch.is_ascii_digit()) {
+        return None;
+    }
+    Some((policy.to_owned(), workers.to_owned()))
 }

Also applies to: 1019-1025

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@xtask/src/main.rs` around lines 994 - 1008, The code currently only
recognizes specific worker tokens and mis-parses policies like
"static_per_worker_16w/1000"; update split_policy_case (the function that
returns Some((policy, workers))) to generically detect a trailing suffix of the
form _<digits>w (e.g. _16w) using a simple pattern/regex, strip that suffix into
policy (e.g. "static_per_worker") and set workers to the captured "<digits>w"
string instead of defaulting to "dedicated"; apply the same change to the second
occurrence referenced in the diff (lines ~1019-1025) and add a unit/regression
test that parses "static_per_worker_16w/1000" (or similar unlisted counts) to
assert policy == "static_per_worker" and workers == "16w".

937-965: ⚠️ Potential issue | 🟠 Major

Don't emit a valid-looking policy matrix when collection failed.

collect_policy_matrix_rows() returns an empty vector when the group directory is absent and silently drops unreadable cases with continue. run_bench_bake() / run_bench_policy_export() then overwrite the tracked JSON/HTML with an empty or partial matrix and exit 0. Make row collection return Result<Vec<PolicyMatrixRow>> and fail on a missing group, zero rows, or any load error for a discovered case.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@xtask/src/main.rs` around lines 937 - 965, Change collect_policy_matrix_rows
to return Result<Vec<PolicyMatrixRow>, Box<dyn std::error::Error>> (or your
crate error type) instead of Vec, and make it fail fast: if
criterion_root.is_dir() is false return Err("missing group"), propagate
std::fs::read_dir errors instead of continuing, treat a failed
parse_policy_case(relative_dir) as an error (return Err) and propagate any
load_criterion_estimate(&path, repo_root) errors instead of skipping; after
traversal return Err if results.is_empty(). Update callers run_bench_bake and
run_bench_policy_export to handle the Result (bubble the error or exit nonzero)
so we never overwrite tracked JSON/HTML with an empty or partial matrix.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@xtask/src/main.rs`:
- Around line 1106-1114: The current injection logic silently does nothing if
the template contains neither BENCH_INLINE_DATA_MARKER nor a closing "</body>",
so change the block that handles html injection (the code using
html.contains(BENCH_INLINE_DATA_MARKER), html.replacen(...), and
html.replace("</body>", ...)) to detect the fallback-no-op case and return/fail
with a clear error instead of continuing; specifically, after checking for
BENCH_INLINE_DATA_MARKER and before or after attempting the "</body>"
replacement, verify whether the replacement actually occurred (or that html
contains either BENCH_INLINE_DATA_MARKER or "</body>") and if not, propagate an
Err or call the existing failure path for the bench bake command with a message
mentioning missing BENCH_INLINE_DATA_MARKER and "</body>" so the bake operation
exits with a hard error.

---

Duplicate comments:
In `@xtask/src/main.rs`:
- Around line 1118-1142: The error message uses a hardcoded path instead of the
actual template path, so change inline_benchmark_vendor_styles to accept and
propagate the caller's template path (currently passed as template: &str) down
into replace_inline_link_tag and ensure replace_inline_link_tag uses that path
in its bail!/context messages instead of "docs/benchmarks/index.html";
specifically thread the template path through the calls to
replace_inline_link_tag from inline_benchmark_vendor_styles and update
replace_inline_link_tag's error reporting to reference the provided template
path (and adjust its signature if needed) so marker replacement failures point
at the real --template file.
- Around line 919-924: Stop serializing volatile local provenance by removing
the dynamic timestamp and local machine info: set generated_at to None instead
of calling OffsetDateTime::now_utc()/format and set machine to None instead of
calling local_benchmark_machine_descriptor(); keep git_sha as
Some(git_short_head_sha()?) as the stable provenance field and leave
criterion_root_display unchanged (or gate generated_at/machine behind an
explicit opt-in flag if you need them); make the same change for the other
occurrence referenced around the block handling generated_at/machine (also
applies to the later block using generated_at, machine,
local_benchmark_machine_descriptor).
- Around line 994-1008: The code currently only recognizes specific worker
tokens and mis-parses policies like "static_per_worker_16w/1000"; update
split_policy_case (the function that returns Some((policy, workers))) to
generically detect a trailing suffix of the form _<digits>w (e.g. _16w) using a
simple pattern/regex, strip that suffix into policy (e.g. "static_per_worker")
and set workers to the captured "<digits>w" string instead of defaulting to
"dedicated"; apply the same change to the second occurrence referenced in the
diff (lines ~1019-1025) and add a unit/regression test that parses
"static_per_worker_16w/1000" (or similar unlisted counts) to assert policy ==
"static_per_worker" and workers == "16w".
- Around line 937-965: Change collect_policy_matrix_rows to return
Result<Vec<PolicyMatrixRow>, Box<dyn std::error::Error>> (or your crate error
type) instead of Vec, and make it fail fast: if criterion_root.is_dir() is false
return Err("missing group"), propagate std::fs::read_dir errors instead of
continuing, treat a failed parse_policy_case(relative_dir) as an error (return
Err) and propagate any load_criterion_estimate(&path, repo_root) errors instead
of skipping; after traversal return Err if results.is_empty(). Update callers
run_bench_bake and run_bench_policy_export to handle the Result (bubble the
error or exit nonzero) so we never overwrite tracked JSON/HTML with an empty or
partial matrix.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 27b01826-17c6-4657-9782-671fe0aca4bc

📥 Commits

Reviewing files that changed from the base of the PR and between f0af6bc and fb7efc9.

📒 Files selected for processing (4)
  • docs/benchmarks/index.html
  • docs/benchmarks/parallel-policy-matrix.json
  • docs/benchmarks/report-inline.html
  • xtask/src/main.rs


@coderabbitai coderabbitai bot left a comment


♻️ Duplicate comments (8)
Makefile (2)

179-183: ⚠️ Potential issue | 🟡 Minor

Use $(OPEN) instead of hardcoded open in inline-report targets.

Line 180 and Line 183 bypass the cross-platform launcher defined at Line 112–114, so these targets fail on non-macOS systems.

Proposed fix
 	@echo "Opening inline report..."
-	@open docs/benchmarks/report-inline.html
+	@if [ -n "$(OPEN)" ]; then \
+	  $(OPEN) "docs/benchmarks/report-inline.html" >/dev/null 2>&1 || echo "Open file: docs/benchmarks/report-inline.html" ; \
+	else \
+	  echo "Open file: docs/benchmarks/report-inline.html" ; \
+	fi

 bench-open-inline:
-	@open docs/benchmarks/report-inline.html
+	@if [ -n "$(OPEN)" ]; then \
+	  $(OPEN) "docs/benchmarks/report-inline.html" >/dev/null 2>&1 || echo "Open file: docs/benchmarks/report-inline.html" ; \
+	else \
+	  echo "Open file: docs/benchmarks/report-inline.html" ; \
+	fi
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Makefile` around lines 179 - 183, Replace hardcoded open invocations in the
inline report targets with the cross-platform launcher variable so targets use
the shared OPEN variable; update both occurrences where "open
docs/benchmarks/report-inline.html" is used (the implicit inline-report command
and the bench-open-inline target) to use $(OPEN) instead of the literal "open"
so non-macOS systems respect the launcher defined earlier (OPEN).

185-195: ⚠️ Potential issue | 🟡 Minor

Do not make policy export depend on pnpm availability.

Line 194 can fail the whole bench-policy-export target even when JSON/HTML export already succeeded.

Proposed fix
 	@cargo xtask bench check-artifacts \
 	  --html docs/benchmarks/report-inline.html \
 	  --json docs/benchmarks/parallel-policy-matrix.json
-	@pnpm exec prettier --write docs/benchmarks/report-inline.html >/dev/null
+	@if command -v pnpm >/dev/null 2>&1; then \
+	  pnpm exec prettier --write docs/benchmarks/report-inline.html >/dev/null; \
+	else \
+	  echo "[bench-policy-export] pnpm not found; skipping prettier"; \
+	fi
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Makefile` around lines 185 - 195, The bench-policy-export Makefile target
should not fail if pnpm/prettier is absent; change the final formatting step
(the line invoking "@pnpm exec prettier --write
docs/benchmarks/report-inline.html >/dev/null") to be non-fatal so JSON/HTML
exports still count as success—for example, guard the command by checking pnpm
exists, or prefix it with a dash to ignore failure (e.g., "-@pnpm ...") or
append "|| true" so the bench-policy-export target completes even when
pnpm/prettier is unavailable.
docs/benchmarks/PARALLEL_POLICY_MATRIX.md (1)

48-51: ⚠️ Potential issue | 🟠 Major

Replace machine-local absolute links with repo-relative links.

Line 49 and Line 51 currently point to a local filesystem path, which breaks in repo browsers, CI artifacts, and other machines.

Proposed fix
 - raw JSON with provenance metadata:
-  [parallel-policy-matrix.json](/Users/james/git/echo/docs/benchmarks/parallel-policy-matrix.json)
+  [parallel-policy-matrix.json](parallel-policy-matrix.json)
 - unified static benchmarks page:
-  [report-inline.html](/Users/james/git/echo/docs/benchmarks/report-inline.html)
+  [report-inline.html](report-inline.html)

As per coding guidelines, documentation under docs/**/*.md must avoid stale cross-references and remain factually usable across environments.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/benchmarks/PARALLEL_POLICY_MATRIX.md` around lines 48 - 51, Replace the
machine-local absolute filesystem links with repo-relative paths: update the
references to
"/Users/james/git/echo/docs/benchmarks/parallel-policy-matrix.json" and
"/Users/james/git/echo/docs/benchmarks/report-inline.html" so they point to
"docs/benchmarks/parallel-policy-matrix.json" and
"docs/benchmarks/report-inline.html" (or simply "parallel-policy-matrix.json"
and "report-inline.html" if relative to this markdown file) in the
PARALLEL_POLICY_MATRIX.md content to ensure links work in repo browsers and CI.
xtask/src/main.rs (5)

1172-1196: ⚠️ Potential issue | 🟡 Minor

Report the actual template that failed marker replacement.

replace_inline_link_tag() always blames docs/benchmarks/index.html, even when --template points somewhere else. When a custom template is missing a data-bench-inline=... marker, the operator gets sent to the wrong file. Thread the real template_path through the style inliner and use it in the error.

Suggested fix
-fn inline_benchmark_vendor_styles(template: &str, repo_root: &Path) -> Result<String> {
+fn inline_benchmark_vendor_styles(
+    template: &str,
+    template_path: &Path,
+    repo_root: &Path,
+) -> Result<String> {
@@
     html = replace_inline_link_tag(
         &html,
         BENCH_OPEN_PROPS_INLINE_MARKER,
         &format!(
             "<style data-bench-inline=\"open-props\">\n{}\n</style>",
             open_props.trim()
         ),
+        template_path,
         repo_root,
     )?;
@@
     html = replace_inline_link_tag(
         &html,
         BENCH_NORMALIZE_DARK_INLINE_MARKER,
         &format!(
             "<style data-bench-inline=\"normalize-dark\">\n{}\n</style>",
             normalize_dark.trim()
         ),
+        template_path,
         repo_root,
     )?;
     Ok(html)
 }
@@
 fn replace_inline_link_tag(
     haystack: &str,
     marker: &str,
     replacement: &str,
+    template_path: &Path,
     repo_root: &Path,
 ) -> Result<String> {
     let Some(marker_index) = haystack.find(marker) else {
         bail!(
             "benchmark template is missing expected marker `{marker}` while baking from {}",
-            display_repo_relative(Path::new("docs/benchmarks/index.html"), repo_root)
+            display_repo_relative(template_path, repo_root)
         );
     };

Also applies to: 1200-1210

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@xtask/src/main.rs` around lines 1172 - 1196, The error message for missing
inline markers currently always references docs/benchmarks/index.html; modify
inline_benchmark_vendor_styles to accept and pass the actual template path
(e.g., a template_path: &Path or &str) into replace_inline_link_tag so
replace_inline_link_tag can include that real template_path in its Err context
instead of the hardcoded docs/benchmarks/index.html; update the calls that
construct the replacement for BENCH_OPEN_PROPS_INLINE_MARKER and
BENCH_NORMALIZE_DARK_INLINE_MARKER to pass through the template_path and ensure
replace_inline_link_tag's error messages use that template_path when reporting a
missing marker.

1158-1168: ⚠️ Potential issue | 🟠 Major

Fail if the template has no payload injection point.

If the template contains neither BENCH_INLINE_DATA_MARKER nor </body>, html.replace("</body>", ...) is a no-op and bench bake still writes HTML without any embedded benchmark payload. Bail when neither replacement path matches.

Suggested fix
     if html.contains(BENCH_INLINE_DATA_MARKER) {
         html = html.replacen(
             BENCH_INLINE_DATA_MARKER,
             &(inject + BENCH_INLINE_DATA_MARKER),
             1,
         );
-    } else {
-        html = html.replace("</body>", &(inject + "</body>"));
+    } else if html.contains("</body>") {
+        html = html.replacen("</body>", &(inject + "</body>"), 1);
+    } else {
+        bail!(
+            "benchmark template is missing both `{}` and `</body>`",
+            BENCH_INLINE_DATA_MARKER
+        );
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@xtask/src/main.rs` around lines 1158 - 1168, The current injection logic can
silently fail when the template contains neither BENCH_INLINE_DATA_MARKER nor a
</body> tag; after calling inline_benchmark_vendor_styles and
build_benchmark_inline_script (and before doing html.replacen/replace),
explicitly check whether html.contains(BENCH_INLINE_DATA_MARKER) or
html.contains("</body>") and if neither is true return an Err (or bail) with a
clear message (e.g., "template has no payload injection point") so the caller
(bench bake) fails instead of producing HTML without the payload; update the
function that contains this logic to propagate the Result accordingly.

817-825: ⚠️ Potential issue | 🟠 Major

Stop baking wall-clock time and local host identity into tracked benchmark artifacts.

baked_at guarantees churn on every bake, and local_benchmark_machine_descriptor() can leak the workstation name twice (hostname and label) into both the committed JSON and the inlined HTML. Line 12 explicitly says to avoid timestamp churn where possible. Keep baked_git_sha for provenance, but make baked_at and machine opt-in or omit them by default.

Also applies to: 973-987, 1262-1283
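The opt-in shape could look like the following minimal sketch. The struct, field names, and the `include_provenance_extras` flag are stand-ins for the real xtask types, not the actual implementation; the point is that default bakes produce `None` for both fields so repeated exports stay byte-stable.

```rust
// Stand-in for the real PolicyMatrixPayload in xtask; field names are
// illustrative. With #[serde(skip_serializing_if = "Option::is_none")]
// on baked_at and machine, None fields vanish from the committed JSON.
struct PolicyMatrixPayload {
    baked_git_sha: String,     // always kept for provenance
    baked_at: Option<String>,  // opt-in only
    machine: Option<String>,   // opt-in only
}

fn build_payload(sha: &str, include_provenance_extras: bool) -> PolicyMatrixPayload {
    PolicyMatrixPayload {
        baked_git_sha: sha.to_owned(),
        // Populated only behind an explicit flag, so default bakes do not
        // churn timestamps or leak the workstation name.
        baked_at: include_provenance_extras.then(|| "2026-03-28T00:00:00Z".to_owned()),
        machine: include_provenance_extras.then(|| "example-host".to_owned()),
    }
}

fn main() {
    let default_payload = build_payload("2f4d2f0", false);
    assert!(default_payload.baked_at.is_none());
    assert!(default_payload.machine.is_none());
    assert_eq!(default_payload.baked_git_sha, "2f4d2f0");
}
```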

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@xtask/src/main.rs` around lines 817 - 825, PolicyMatrixPayload currently
bakes wall-clock time and local host identity into artifacts via the baked_at
and machine fields and by calling local_benchmark_machine_descriptor(); make
these opt-in and omitted by default by adding #[serde(skip_serializing_if =
"Option::is_none")] to the baked_at and machine fields on struct
PolicyMatrixPayload, stop populating baked_at and machine unless an explicit
opt-in flag is passed (e.g., --include-timestamps or --include-machine), and
only call local_benchmark_machine_descriptor() when that flag is set; keep
baked_git_sha populated as-is for provenance.

1048-1079: ⚠️ Potential issue | 🟠 Major

Parse worker suffixes generically instead of hardcoding 1w/4w/8w.

dynamic_per_worker_16w/1000 currently falls through the two-segment branch and is exported as policy = "dynamic_per_worker_16w", workers = "dedicated". That silently corrupts the matrix as soon as a new worker count is added. Parse the trailing _<digits>w suffix generically and add a regression test for an unlisted worker count.

Suggested fix
 fn split_policy_case(policy_case: &str) -> Option<(String, String)> {
-    for suffix in ["_1w", "_4w", "_8w"] {
-        if let Some(policy) = policy_case.strip_suffix(suffix) {
-            return Some((policy.to_owned(), suffix.trim_start_matches('_').to_owned()));
-        }
-    }
-    None
+    let (policy, workers) = policy_case.rsplit_once('_')?;
+    let digits = workers.strip_suffix('w')?;
+    if digits.is_empty() || !digits.chars().all(|ch| ch.is_ascii_digit()) {
+        return None;
+    }
+    Some((policy.to_owned(), workers.to_owned()))
 }
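With the generic parser in place, the requested regression test could look like this sketch (the function body is copied from the suggested fix above; the test cases are illustrative):

```rust
// Suggested generic parser: accept any trailing "_<digits>w" suffix.
fn split_policy_case(policy_case: &str) -> Option<(String, String)> {
    let (policy, workers) = policy_case.rsplit_once('_')?;
    let digits = workers.strip_suffix('w')?;
    if digits.is_empty() || !digits.chars().all(|ch| ch.is_ascii_digit()) {
        return None;
    }
    Some((policy.to_owned(), workers.to_owned()))
}

fn main() {
    // Previously hardcoded counts still parse.
    assert_eq!(
        split_policy_case("static_per_worker_4w"),
        Some(("static_per_worker".to_owned(), "4w".to_owned()))
    );
    // Regression: an unlisted count no longer falls through and gets
    // mislabeled as the dedicated policy.
    assert_eq!(
        split_policy_case("dynamic_per_worker_16w"),
        Some(("dynamic_per_worker".to_owned(), "16w".to_owned()))
    );
    // Cases without a worker suffix are rejected, not guessed at.
    assert_eq!(split_policy_case("dedicated"), None);
    assert_eq!(split_policy_case("policy_w"), None);
}
```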
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@xtask/src/main.rs` around lines 1048 - 1079, split_policy_case currently only
recognizes hardcoded "_1w/_4w/_8w" suffixes; change it to detect any trailing
"_<digits>w" generically by finding the last '_' in policy_case, extracting the
suffix, verifying it ends with 'w' and that the substring before 'w' is all
digits, then return (policy_without_suffix.to_owned(), suffix.to_owned()) so
workers becomes e.g. "16w"; update split_policy_case to use this logic so
ParsedPolicyCase branches handle arbitrary worker counts and add a regression
test for an unlisted count like "dynamic_per_worker_16w/1000".

991-1019: ⚠️ Potential issue | 🟠 Major

Do not write a valid-looking payload when collection failed.

!criterion_root.is_dir() returns an empty matrix, read_dir failures are skipped, entries.flatten() discards per-entry I/O errors, and unreadable estimates.json files are ignored. bench bake / bench policy-export can therefore overwrite the tracked artifacts with an empty or partial policy matrix and still exit 0. Make collect_policy_matrix_rows() fallible and fail when no rows are produced or when a discovered case cannot be loaded.
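The guard at the end of collection could be factored as a small helper like the sketch below. `PolicyRow`, the `skipped` counter, and the string error type are stand-ins for the real xtask types; the real fix would propagate per-entry I/O errors with `?` rather than counting them.

```rust
// Stand-in row type; the real xtask struct has more fields.
#[derive(Debug, PartialEq)]
struct PolicyRow {
    policy: String,
    workers: String,
    load: u32,
}

// Refuse to hand back a "valid-looking" matrix when collection was
// incomplete: any unreadable case or an empty result is an error, so
// bench bake / policy-export cannot overwrite tracked artifacts with
// partial data and still exit 0.
fn require_complete(rows: Vec<PolicyRow>, skipped: usize) -> Result<Vec<PolicyRow>, String> {
    if skipped > 0 {
        return Err(format!("{skipped} discovered case(s) could not be loaded"));
    }
    if rows.is_empty() {
        return Err("no policy-matrix rows found; refusing to overwrite artifacts".to_owned());
    }
    Ok(rows)
}

fn main() {
    assert!(require_complete(Vec::new(), 0).is_err());
    assert!(require_complete(Vec::new(), 2).is_err());
    let row = PolicyRow {
        policy: "static_per_worker".to_owned(),
        workers: "4w".to_owned(),
        load: 1000,
    };
    assert!(require_complete(vec![row], 0).is_ok());
}
```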

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@xtask/src/main.rs` around lines 991 - 1019, collect_policy_matrix_rows
currently swallows errors and returns an empty/partial matrix; change it to a
fallible API (e.g., fn collect_policy_matrix_rows(...) ->
Result<Vec<PolicyMatrixRow>, anyhow::Error>) and propagate failures instead of
continuing: return Err when criterion_root is not a directory, propagate
read_dir errors rather than skipping entries, surface per-entry errors (don't
use entries.flatten()), fail if strip_prefix or parse_policy_case returns None,
and propagate load_criterion_estimate errors instead of ignoring them; also
return Err if the final Vec is empty so callers (e.g., bench bake /
policy-export) cannot overwrite artifacts with an empty matrix and update
callers to handle the Result.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@docs/benchmarks/PARALLEL_POLICY_MATRIX.md`:
- Around line 48-51: Replace the machine-local absolute filesystem links with
repo-relative paths: update the references to
"/Users/james/git/echo/docs/benchmarks/parallel-policy-matrix.json" and
"/Users/james/git/echo/docs/benchmarks/report-inline.html" so they point to
"docs/benchmarks/parallel-policy-matrix.json" and
"docs/benchmarks/report-inline.html" (or simply "parallel-policy-matrix.json"
and "report-inline.html" if relative to this markdown file) in the
PARALLEL_POLICY_MATRIX.md content to ensure links work in repo browsers and CI.

In `@Makefile`:
- Around line 179-183: Replace hardcoded open invocations in the inline report
targets with the cross-platform launcher variable so targets use the shared OPEN
variable; update both occurrences where "open
docs/benchmarks/report-inline.html" is used (the implicit inline-report command
and the bench-open-inline target) to use $(OPEN) instead of the literal "open"
so non-macOS systems respect the launcher defined earlier (OPEN).
- Around line 185-195: The bench-policy-export Makefile target should not fail
if pnpm/prettier is absent; change the final formatting step (the line invoking
"@pnpm exec prettier --write docs/benchmarks/report-inline.html >/dev/null") to
be non-fatal so JSON/HTML exports still count as success—for example, guard the
command by checking pnpm exists, or prefix it with a dash to ignore failure
(e.g., "-@pnpm ...") or append "|| true" so the bench-policy-export target
completes even when pnpm/prettier is unavailable.

In `@xtask/src/main.rs`:
- Around line 1172-1196: The error message for missing inline markers currently
always references docs/benchmarks/index.html; modify
inline_benchmark_vendor_styles to accept and pass the actual template path
(e.g., a template_path: &Path or &str) into replace_inline_link_tag so
replace_inline_link_tag can include that real template_path in its Err context
instead of the hardcoded docs/benchmarks/index.html; update the calls that
construct the replacement for BENCH_OPEN_PROPS_INLINE_MARKER and
BENCH_NORMALIZE_DARK_INLINE_MARKER to pass through the template_path and ensure
replace_inline_link_tag's error messages use that template_path when reporting a
missing marker.
- Around line 1158-1168: The current injection logic can silently fail when the
template contains neither BENCH_INLINE_DATA_MARKER nor a </body> tag; after
calling inline_benchmark_vendor_styles and build_benchmark_inline_script (and
before doing html.replacen/replace), explicitly check whether
html.contains(BENCH_INLINE_DATA_MARKER) or html.contains("</body>") and if
neither is true return an Err (or bail) with a clear message (e.g., "template
has no payload injection point") so the caller (bench bake) fails instead of
producing HTML without the payload; update the function that contains this logic
to propagate the Result accordingly.
- Around line 817-825: PolicyMatrixPayload currently bakes wall-clock time and
local host identity into artifacts via the baked_at and machine fields and by
calling local_benchmark_machine_descriptor(); make these opt-in and omitted by
default by adding #[serde(skip_serializing_if = "Option::is_none")] to the
baked_at and machine fields on struct PolicyMatrixPayload, stop populating
baked_at and machine unless an explicit opt-in flag is passed (e.g.,
--include-timestamps or --include-machine), and only call
local_benchmark_machine_descriptor() when that flag is set; keep baked_git_sha
populated as-is for provenance.
- Around line 1048-1079: split_policy_case currently only recognizes hardcoded
"_1w/_4w/_8w" suffixes; change it to detect any trailing "_<digits>w"
generically by finding the last '_' in policy_case, extracting the suffix,
verifying it ends with 'w' and that the substring before 'w' is all digits, then
return (policy_without_suffix.to_owned(), suffix.to_owned()) so workers becomes
e.g. "16w"; update split_policy_case to use this logic so ParsedPolicyCase
branches handle arbitrary worker counts and add a regression test for an
unlisted count like "dynamic_per_worker_16w/1000".
- Around line 991-1019: collect_policy_matrix_rows currently swallows errors and
returns an empty/partial matrix; change it to a fallible API (e.g., fn
collect_policy_matrix_rows(...) -> Result<Vec<PolicyMatrixRow>, anyhow::Error>)
and propagate failures instead of continuing: return Err when criterion_root is
not a directory, propagate read_dir errors rather than skipping entries, surface
per-entry errors (don't use entries.flatten()), fail if strip_prefix or
parse_policy_case returns None, and propagate load_criterion_estimate errors
instead of ignoring them; also return Err if the final Vec is empty so callers
(e.g., bench bake / policy-export) cannot overwrite artifacts with an empty
matrix and update callers to handle the Result.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 7b94d08f-fd36-4cc7-b950-06381ecfd104

📥 Commits

Reviewing files that changed from the base of the PR and between f0af6bc and 2f4d2f0.

📒 Files selected for processing (9)
  • Makefile
  • crates/warp-benches/Cargo.toml
  • crates/warp-benches/benches/README.md
  • crates/warp-benches/benches/parallel_baseline.rs
  • docs/benchmarks/PARALLEL_POLICY_MATRIX.md
  • docs/benchmarks/index.html
  • docs/benchmarks/parallel-policy-matrix.json
  • docs/benchmarks/report-inline.html
  • xtask/src/main.rs


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
crates/warp-core/tests/parallel_merge_warpopkey.rs (1)

493-494: 🧹 Nitpick | 🔵 Trivial

Bare .unwrap() usage here is inconsistent with the file's new pattern.

Lines 493-494 still use .unwrap() (via .position(...).unwrap()), while the rest of this file was refactored to use .expect() with descriptive messages. Switch these two calls to .expect() so a failure says which op was missing from result_ops.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/warp-core/tests/parallel_merge_warpopkey.rs` around lines 493 - 494,
Replace the bare .unwrap() calls on the iterator position lookups for delete_idx
and upsert_idx with explicit expect messages to match the test error-handling
style: change result_ops.iter().position(|op| op == &delete_op).unwrap() and the
upsert equivalent to use .expect("delete_op not found in result_ops") and
.expect("upsert_op not found in result_ops") respectively so the failure points
(delete_idx, upsert_idx, result_ops, delete_op, upsert_op) produce clear
messages when the test fails.
crates/warp-core/src/parallel/exec.rs (1)

303-326: ⚠️ Potential issue | 🟠 Major

DedicatedPerShard does not honor workers, and the empty fast path lies about that.

Lines 323-326 route dedicated mode straight to execute_dedicated_per_shard(view, &shards), so workers never affects non-empty execution. But Lines 303-306 still return a vector of `workers` empty deltas before dispatch. That makes any 1w/4w/8w comparison for this policy semantically misleading, and it also violates the PerShard contract on empty workloads. Split dedicated mode into a workerless path (or exclude it from the worker matrix), and return Vec::new() when there are zero non-empty shards.
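The empty-workload branching could be shaped like this sketch. The enum and `Delta` here are stand-ins for the real warp-core types, assuming dedicated mode is kept workerless rather than excluded from the matrix:

```rust
// Stand-ins for the warp-core policy enum and delta type.
#[derive(Clone, Copy)]
enum ShardAssignmentPolicy {
    StaticPerWorker,
    DynamicPerWorker,
    DedicatedPerShard,
}

#[derive(Default, Clone)]
struct Delta;

// Empty-workload fast path that matches each policy's contract:
// dedicated mode spawns per shard, so zero shards means zero deltas;
// worker-based modes keep one (empty) delta slot per worker.
fn empty_workload_deltas(policy: ShardAssignmentPolicy, workers: usize) -> Vec<Delta> {
    match policy {
        ShardAssignmentPolicy::DedicatedPerShard => Vec::new(),
        ShardAssignmentPolicy::StaticPerWorker | ShardAssignmentPolicy::DynamicPerWorker => {
            vec![Delta::default(); workers]
        }
    }
}

fn main() {
    assert!(empty_workload_deltas(ShardAssignmentPolicy::DedicatedPerShard, 8).is_empty());
    assert_eq!(empty_workload_deltas(ShardAssignmentPolicy::StaticPerWorker, 4).len(), 4);
    assert_eq!(empty_workload_deltas(ShardAssignmentPolicy::DynamicPerWorker, 1).len(), 1);
}
```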

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/warp-core/src/parallel/exec.rs` around lines 303 - 326, The empty
fast-path and dispatch need to respect DedicatedPerShard: change the initial
empty-items branch and the match for ShardAssignmentPolicy::DedicatedPerShard so
that dedicated mode is treated workerless — after partition_into_shards(items)
if shards.is_empty() return Vec::new() for DedicatedPerShard (instead of
returning a vec sized by workers), and route DedicatedPerShard to
execute_dedicated_per_shard without relying on the workers parameter;
alternatively exclude DedicatedPerShard from the worker-matrix by checking
policy.assignment before returning the workers-sized empty deltas so
PerShard/DedicatedPerShard semantics are preserved (use symbols
partition_into_shards, execute_dedicated_per_shard, and the match on
policy.assignment/policy.accumulation to implement this).
♻️ Duplicate comments (6)
Makefile (1)

193-202: ⚠️ Potential issue | 🟡 Minor

Don’t make export success depend on optional pnpm.

cargo xtask bench policy-export and cargo xtask bench bake can both succeed, then the unconditional formatter flips the whole target red. Gate pnpm behind an availability check or move formatting into a separate target.

Proposed fix
-	@pnpm exec prettier --write docs/benchmarks/report-inline.html >/dev/null
+	@if command -v pnpm >/dev/null 2>&1; then \
+	  pnpm exec prettier --write docs/benchmarks/report-inline.html >/dev/null; \
+	else \
+	  echo "[bench-policy-export] pnpm not found; skipping prettier"; \
+	fi
#!/bin/bash
set -euo pipefail
make -n bench-policy-export | sed -n '1,80p'
if command -v pnpm >/dev/null 2>&1; then
  printf 'pnpm present\n'
else
  printf 'pnpm missing\n'
fi

Expected: the dry-run still shows an unconditional pnpm exec prettier ...; if pnpm is missing, the target fails after the export/bake commands.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Makefile` around lines 193 - 202, The bench-policy-export make target
currently runs `pnpm exec prettier --write docs/benchmarks/report-inline.html`
unconditionally which makes the whole target fail when pnpm is absent; modify
the `bench-policy-export` recipe (target name: bench-policy-export) to gate the
formatter by checking for pnpm availability (e.g., run `command -v pnpm` or
similar) before invoking `pnpm exec prettier ...`, or alternatively move the
formatting step into a separate target (e.g., bench-policy-export-format) and
call it optionally so `cargo xtask bench policy-export` and `cargo xtask bench
bake` can succeed without pnpm.
docs/benchmarks/PARALLEL_POLICY_MATRIX.md (1)

48-51: ⚠️ Potential issue | 🟠 Major

Replace machine-local links with repo-relative paths.

These /Users/james/... targets only resolve on one workstation. They are broken in the repo browser, docs site, and CI artifacts.

Proposed fix
-  [parallel-policy-matrix.json](/Users/james/git/echo/docs/benchmarks/parallel-policy-matrix.json)
+  [parallel-policy-matrix.json](parallel-policy-matrix.json)
 ...
-  [report-inline.html](/Users/james/git/echo/docs/benchmarks/report-inline.html)
+  [report-inline.html](report-inline.html)
As per coding guidelines, `docs/**/*.md`: “Flag factual errors and stale cross-references.”
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/benchmarks/PARALLEL_POLICY_MATRIX.md` around lines 48 - 51, The markdown
contains machine-local absolute paths for the two artifact links (the lines
referencing parallel-policy-matrix.json and report-inline.html); update those
link targets to repo-relative paths (e.g., point to
docs/benchmarks/parallel-policy-matrix.json and
docs/benchmarks/report-inline.html or the correct relative location in the
repository) so they resolve in the repo browser, docs site, and CI
artifacts—edit the two link hrefs in PARALLEL_POLICY_MATRIX.md to use the
repo-relative paths instead of /Users/james/...
xtask/src/main.rs (4)

1212-1216: 🧹 Nitpick | 🔵 Trivial

Use the caller’s template path in the error message.

This always blames docs/benchmarks/index.html, even when --template points somewhere else. Thread the actual template path through inline_benchmark_vendor_styles / replace_inline_link_tag so custom-template failures are actionable.

As per coding guidelines, xtask/**: “Review for correctness and usability, not for performance micro-optimizations.”

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@xtask/src/main.rs` around lines 1212 - 1216, The error message that reports a
missing marker currently always references "docs/benchmarks/index.html"; update
the call path so the actual template path provided by the caller is threaded
into inline_benchmark_vendor_styles and replace_inline_link_tag and used in the
bail! message instead of the hardcoded path. Concretely, add a template_path
parameter (or carry the existing template argument) through
inline_benchmark_vendor_styles -> replace_inline_link_tag, and change the bail!
invocation that constructs the message around haystack.find(marker) to include
that template_path (or its display) so failures reflect the real --template
input.

991-1019: ⚠️ Potential issue | 🟠 Major

Fail the export on empty or partial policy matrices.

Missing target/criterion/parallel_policy_matrix returns Vec::new(), and unreadable cases are dropped with continue. cargo xtask bench policy-export still writes a valid-looking payload and exits 0, which can overwrite committed docs with incomplete data. Return Result<Vec<_>> here, bail on a missing group or unreadable discovered case, and add a regression test for both paths.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@xtask/src/main.rs` around lines 991 - 1019, The function
collect_policy_matrix_rows currently returns an empty Vec and silently skips
unreadable cases; change its signature to return Result<Vec<PolicyMatrixRow>,
Box<dyn std::error::Error>> (or your crate's Error type) so failures are
propagated, use early Err for a missing or unreadable criterion_root instead of
returning Vec::new(), replace the internal continue paths (where
parse_policy_case(...) or load_criterion_estimate(...) fail) with error returns
(or map_err + ?), and update any callers (e.g., the bench policy-export flow) to
handle the Result and fail the export on error; add regression tests that assert
export fails when the policy matrix dir is missing and when a discovered case is
unreadable.

1166-1174: ⚠️ Potential issue | 🟠 Major

Hard-fail when the template has no data injection point.

If a custom --template contains neither BENCH_INLINE_DATA_MARKER nor </body>, the fallback is a no-op and bench bake still emits an HTML file with no benchmark payload. Switch the fallback to replacen plus an explicit error, and add a regression test for the missing-marker case.

Proposed fix
-    } else {
-        html = html.replace("</body>", &(inject + "</body>"));
+    } else if html.contains("</body>") {
+        html = html.replacen("</body>", &(inject + "</body>"), 1);
+    } else {
+        bail!(
+            "benchmark template is missing both `{}` and `</body>`",
+            BENCH_INLINE_DATA_MARKER
+        );
     }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@xtask/src/main.rs` around lines 1166 - 1174, The current fallback uses
html.replace("</body>", ...) which silently no-ops if the template lacks both
BENCH_INLINE_DATA_MARKER and </body>; change the else branch to use
html.replacen("</body>", &(inject + "</body>"), 1) and then check if the
replacement actually occurred (e.g., compare before/after or inspect the return)
and hard-fail with a clear error if no injection point was found; update the
logic surrounding BENCH_INLINE_DATA_MARKER, inject, and html to perform the
replacen + explicit error path and add a regression test for the missing-marker
case to assert that bench bake fails when neither BENCH_INLINE_DATA_MARKER nor
</body> exist.

973-987: ⚠️ Potential issue | 🟠 Major

Stop emitting wall-clock and machine identity by default.

baked_at guarantees diff churn on every export, and machine.hostname / machine.label leak workstation identifiers into the committed JSON and inlined HTML. Keep baked_git_sha for provenance; make timestamp and machine metadata opt-in or omit them by default. The payload already uses Option<_>, so this can be done without a format break.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@xtask/src/main.rs` around lines 973 - 987, The code currently always
populates baked_at and machine in the PolicyMatrixPayload causing churn and
leaking host identifiers; change the construction in main.rs so baked_at is None
and machine is None by default (keep baked_git_sha as-is for provenance), and if
you want to make them optional enablement via an explicit opt-in (e.g., env var
or config) then call OffsetDateTime::now_utc().format(&Rfc3339) and
local_benchmark_machine_descriptor() only when that opt-in is set; update the
variables baked_at and machine and the PolicyMatrixPayload initializer
accordingly (referencing baked_at, baked_git_sha,
local_benchmark_machine_descriptor, and PolicyMatrixPayload).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/warp-core/src/parallel/exec.rs`:
- Around line 875-915: The tests all_parallel_policies_preserve_merged_ops (and
the similar block at 918-950) use ParallelExecutionPolicy::DYNAMIC_PER_WORKER as
the oracle which only proves policy agreement, not correctness; change the tests
to compute the ground truth by calling execute_serial() (or a fixed expected op
set) on the same GraphView/items and use that result as the baseline to compare
every policy (including the 1-worker-per-shard case) so any shared bug in
partition_into_shards() or execute_shard_into_delta() is detected; ensure you
replace the baseline_result/ baseline usage that currently runs
execute_parallel_with_policy(..., DYNAMIC_PER_WORKER) with the serial execution
result and keep the existing merge_deltas_ok/ assert_eq comparisons against that
serial baseline.
- Around line 23-92: Mark ShardAssignmentPolicy and DeltaAccumulationPolicy with
#[non_exhaustive] and make ParallelExecutionPolicy's fields private (e.g.,
change pub assignment/accumulation to non-pub) so callers can only use the
provided consts (DEFAULT, DYNAMIC_PER_WORKER, etc.); update any
constructor/usage sites to access those consts instead. Fix
execute_dedicated_per_shard (the function dispatch currently ignoring the
workers parameter) to either respect the workers argument when spawning/limiting
per-shard workers or clearly document that workers is intentionally ignored for
DedicatedPerShard and adjust the empty-workload fast-path to return semantics
consistent with that decision. Finally, update the test
all_parallel_policies_preserve_merged_ops to compare each parallel policy's
result against a serial execution oracle (run the serial executor for the same
input) instead of comparing to another policy baseline to catch shared
implementation bugs.

In `@crates/warp-core/tests/parallel_merge_warpopkey.rs`:
- Around line 328-336: The code redundantly asserts the error variant then
re-checks it; replace the assert+if-let pattern by directly extracting the
conflict from merged using a single pattern binding since the assert guarantees
the variant — for example, use a guarded let binding like: let
Err(MergeError::Conflict(conflict)) = merged else { unreachable!() }; targeting
the symbols merged and MergeError::Conflict (also apply the same change at the
similar block around lines 782-790).
- Around line 94-98: The test currently asserts merged.is_ok() then matches
merged with if let Ok(ops) = merged { ops } else { return }, which is redundant
and triggers clippy; replace that pattern by calling merged.unwrap() (e.g., let
ops = merged.unwrap()) so the assert and extraction are consistent; apply the
same change for all other occurrences of this assert + manual if-let/else-return
pattern in this test file (instances where a variable like merged is asserted
with assert!(...is_ok()) and then destructured with if let Ok(...) { ... } else
{ return }).

In `@Makefile`:
- Around line 204-219: bench-policy-bake and bench-policy-open-inline call
$(OPEN) with a quoted path that treats the fragment "#parallel-policy" as part
of the filename; change the argument to a proper file URI so the fragment is
preserved by browsers (e.g. build
"file://$(PWD)/docs/benchmarks/report-inline.html#parallel-policy" without
surrounding quotes) and pass that to $(OPEN) (keep the existing >/dev/null 2>&1
fallback and echo fallback logic intact); update the calls in the
bench-policy-bake and bench-policy-open-inline targets that currently use
$(OPEN) "docs/benchmarks/report-inline.html#parallel-policy" to use a
constructed file:// URI instead.

---

Outside diff comments:
In `@crates/warp-core/src/parallel/exec.rs`:
- Around line 303-326: The empty fast-path and dispatch need to respect
DedicatedPerShard: change the initial empty-items branch and the match for
ShardAssignmentPolicy::DedicatedPerShard so that dedicated mode is treated
workerless — after partition_into_shards(items) if shards.is_empty() return
Vec::new() for DedicatedPerShard (instead of returning a vec sized by workers),
and route DedicatedPerShard to execute_dedicated_per_shard without relying on
the workers parameter; alternatively exclude DedicatedPerShard from the
worker-matrix by checking policy.assignment before returning the workers-sized
empty deltas so PerShard/DedicatedPerShard semantics are preserved (use symbols
partition_into_shards, execute_dedicated_per_shard, and the match on
policy.assignment/policy.accumulation to implement this).

In `@crates/warp-core/tests/parallel_merge_warpopkey.rs`:
- Around line 493-494: Replace the bare .unwrap() calls on the iterator position
lookups for delete_idx and upsert_idx with explicit expect messages to match the
test error-handling style: change result_ops.iter().position(|op| op ==
&delete_op).unwrap() and the upsert equivalent to use .expect("delete_op not
found in result_ops") and .expect("upsert_op not found in result_ops")
respectively so the failure points (delete_idx, upsert_idx, result_ops,
delete_op, upsert_op) produce clear messages when the test fails.

---

Duplicate comments:
In `@docs/benchmarks/PARALLEL_POLICY_MATRIX.md`:
- Around line 48-51: The markdown contains machine-local absolute paths for the
two artifact links (the lines referencing parallel-policy-matrix.json and
report-inline.html); update those link targets to repo-relative paths (e.g.,
point to docs/benchmarks/parallel-policy-matrix.json and
docs/benchmarks/report-inline.html or the correct relative location in the
repository) so they resolve in the repo browser, docs site, and CI
artifacts—edit the two link hrefs in PARALLEL_POLICY_MATRIX.md to use the
repo-relative paths instead of /Users/james/...

In `@Makefile`:
- Around line 193-202: The bench-policy-export make target currently runs `pnpm
exec prettier --write docs/benchmarks/report-inline.html` unconditionally which
makes the whole target fail when pnpm is absent; modify the
`bench-policy-export` recipe (target name: bench-policy-export) to gate the
formatter by checking for pnpm availability (e.g., run `command -v pnpm` or
similar) before invoking `pnpm exec prettier ...`, or alternatively move the
formatting step into a separate target (e.g., bench-policy-export-format) and
call it optionally so `cargo xtask bench policy-export` and `cargo xtask bench
bake` can succeed without pnpm.

In `@xtask/src/main.rs`:
- Around line 1212-1216: The error message that reports a missing marker
currently always references "docs/benchmarks/index.html"; update the call path
so the actual template path provided by the caller is threaded into
inline_benchmark_vendor_styles and replace_inline_link_tag and used in the bail!
message instead of the hardcoded path. Concretely, add a template_path parameter
(or carry the existing template argument) through inline_benchmark_vendor_styles
-> replace_inline_link_tag, and change the bail! invocation that constructs the
message around haystack.find(marker) to include that template_path (or its
display) so failures reflect the real --template input.
- Around line 991-1019: The function collect_policy_matrix_rows currently
returns an empty Vec and silently skips unreadable cases; change its signature
to return Result<Vec<PolicyMatrixRow>, Box<dyn std::error::Error>> (or your
crate's Error type) so failures are propagated, use early Err for a missing or
unreadable criterion_root instead of returning Vec::new(), replace the internal
continue paths (where parse_policy_case(...) or load_criterion_estimate(...)
fail) with error returns (or map_err + ?), and update any callers (e.g., the
bench policy-export flow) to handle the Result and fail the export on error; add
regression tests that assert export fails when the policy matrix dir is missing
and when a discovered case is unreadable.
- Around line 1166-1174: The current fallback uses html.replace("</body>", ...)
which silently no-ops if the template lacks both BENCH_INLINE_DATA_MARKER and
</body>; change the else branch to use html.replacen("</body>", &(inject +
"</body>"), 1) and then check if the replacement actually occurred (e.g.,
compare before/after or inspect the return) and hard-fail with a clear error if
no injection point was found; update the logic surrounding
BENCH_INLINE_DATA_MARKER, inject, and html to perform the replacen + explicit
error path and add a regression test for the missing-marker case to assert that
bench bake fails when neither BENCH_INLINE_DATA_MARKER nor </body> exist.
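The replacen-plus-verification shape can be sketched as a standalone function (not the actual xtask code):

```rust
// Sketch: inject inline benchmark data before </body>, hard-failing when the
// template has no injection point rather than letting replace() silently no-op.
fn inject_before_body(html: &str, inject: &str) -> Result<String, String> {
    let needle = "</body>";
    if !html.contains(needle) {
        // neither the data marker nor </body>: refuse to bake
        return Err("no </body> injection point found in template".into());
    }
    // replacen(.., 1) touches only the first occurrence
    Ok(html.replacen(needle, &format!("{inject}</body>"), 1))
}

fn main() {
    assert!(inject_before_body("<html></html>", "<script></script>").is_err());
}
```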
- Around line 973-987: The code currently always populates baked_at and machine
in the PolicyMatrixPayload causing churn and leaking host identifiers; change
the construction in main.rs so baked_at is None and machine is None by default
(keep baked_git_sha as-is for provenance), and if you want to make them optional
enablement via an explicit opt-in (e.g., env var or config) then call
OffsetDateTime::now_utc().format(&Rfc3339) and
local_benchmark_machine_descriptor() only when that opt-in is set; update the
variables baked_at and machine and the PolicyMatrixPayload initializer
accordingly (referencing baked_at, baked_git_sha,
local_benchmark_machine_descriptor, and PolicyMatrixPayload).
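The opt-in default can be sketched like this; `WARP_BENCH_BAKE_METADATA` is a hypothetical variable name and the struct is a stand-in for `PolicyMatrixPayload`:

```rust
// Sketch: omit volatile metadata by default; populate it only on explicit
// opt-in so committed artifacts stay byte-stable and leak no hostnames.
#[derive(Debug, PartialEq)]
struct BakeMetadata {
    baked_git_sha: String,    // kept: stable provenance
    baked_at: Option<String>, // None unless opted in
    machine: Option<String>,  // None unless opted in
}

fn bake_metadata(git_sha: &str, opt_in: bool) -> BakeMetadata {
    BakeMetadata {
        baked_git_sha: git_sha.to_string(),
        // real code would format OffsetDateTime::now_utc() as RFC 3339 and
        // call local_benchmark_machine_descriptor() here
        baked_at: opt_in.then(|| "1970-01-01T00:00:00Z".to_string()),
        machine: opt_in.then(|| "example-host".to_string()),
    }
}

fn main() {
    let opt_in = std::env::var_os("WARP_BENCH_BAKE_METADATA").is_some();
    let meta = bake_metadata("2629822", opt_in);
    println!("{meta:?}");
}
```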
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 17c455a2-0624-4ee3-9591-bb76d592c207

📥 Commits

Reviewing files that changed from the base of the PR and between f0af6bc and 13cfb98.

📒 Files selected for processing (14)
  • Makefile
  • crates/warp-benches/Cargo.toml
  • crates/warp-benches/benches/README.md
  • crates/warp-benches/benches/parallel_baseline.rs
  • crates/warp-core/src/parallel/exec.rs
  • crates/warp-core/tests/parallel_merge_warpopkey.rs
  • docs/BENCHMARK_GUIDE.md
  • docs/benchmarks/PARALLEL_POLICY_MATRIX.md
  • docs/benchmarks/index.html
  • docs/benchmarks/parallel-policy-matrix.json
  • docs/benchmarks/report-inline.html
  • docs/plans/parallel-merge-and-footprint-design-review.md
  • docs/plans/parallel-merge-and-footprint-optimizations.md
  • xtask/src/main.rs



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/warp-benches/benches/parallel_baseline.rs`:
- Around line 53-72: Remove the pre-merge pairwise conflict scan inside
merge_for_commit_path: delete the for window in flat.windows(2) { assert!(...);
} loop (the assertion that compares window[0].0 and window[1].0) so the function
mirrors the release merge path; keep the sorting, flat.dedup_by(|a, b| a.0 ==
b.0), and the final flat.into_iter().map(|(_, op)| op).collect() unchanged.

In `@crates/warp-core/tests/parallel_parallel_exec.rs`:
- Around line 79-87: The helper merged_ops(deltas: Vec<TickDelta>) currently
hides test metadata by collapsing merge failures into a single generic
assertion; change its signature to accept a context: &str (e.g.,
merged_ops(deltas: Vec<TickDelta>, ctx: &str)) and propagate that context into
the merge assertion and panic path: call merge_deltas_ok(deltas) as before but
when asserting success include the context in the error message (e.g.,
assert!(result.is_ok(), "merge failed for {ctx}: {result:?}"), and when
unwrapping the Ok branch include ctx in the unreachable/panic message), so
callers can pass run metadata (seed/worker/load) to improve triage while keeping
behavior otherwise identical.
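A stand-in sketch of the context-carrying helper (`TickDelta` and `merge_deltas_ok` are simplified here, not the real warp-core types):

```rust
// Sketch: thread run metadata (seed/worker/load) into the failure message so a
// failing merge identifies its case instead of a generic assertion.
type TickDelta = Vec<u32>;

fn merge_deltas_ok(deltas: Vec<TickDelta>) -> Result<Vec<u32>, String> {
    Ok(deltas.into_iter().flatten().collect())
}

fn merged_ops(deltas: Vec<TickDelta>, ctx: &str) -> Vec<u32> {
    match merge_deltas_ok(deltas) {
        Ok(ops) => ops,
        // ctx lands in the panic message, e.g. "seed=7 workers=4 load=100"
        Err(err) => panic!("merge failed for {ctx}: {err:?}"),
    }
}

fn main() {
    assert_eq!(merged_ops(vec![vec![1, 2], vec![3]], "seed=7 workers=4"), vec![1, 2, 3]);
}
```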

In `@docs/benchmarks/parallel-policy-matrix.json`:
- Around line 73-81: Regenerate the "best policy by load" takeaways and update
the inline narrative/study notes using the actual numbers in this JSON (keys:
"policy", "load", "mean_ns", "lb_ns", "ub_ns", "series"); specifically, for
load=100 do not claim a clear winner because the two static 1w rows overlap on
confidence intervals, for load=1000 choose static_per_worker:1w (mean_ns
253363.697) instead of static_per_worker:4w, and for load=10000 choose
static_per_shard:8w (mean_ns 1984663.141) instead of static_per_worker:8w;
repeat corresponding corrections for the other ranges mentioned (lines around
103-121, 233-241, 333-371) so the published recommendations reflect the dataset
exactly.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 4eb1590d-f28c-43af-95b7-1fc83b040e73

📥 Commits

Reviewing files that changed from the base of the PR and between 13cfb98 and 75c2fc4.

📒 Files selected for processing (4)
  • crates/warp-benches/benches/parallel_baseline.rs
  • crates/warp-core/tests/parallel_parallel_exec.rs
  • docs/benchmarks/parallel-policy-matrix.json
  • docs/benchmarks/report-inline.html


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

♻️ Duplicate comments (3)
crates/warp-core/tests/parallel_merge_warpopkey.rs (1)

24-31: ⚠️ Potential issue | 🟡 Minor

Collapse redundant assert+destructure control flow into single extraction path.

Line 26/35 assert success/variant, then Line 27/39 re-check with let ... else { unreachable! }. Same pattern appears again at Line 461-465. Use one extraction expression (expect or let ... else panic!) to avoid duplicate branching and dead fallback paths in tests.

Proposed refactor
 fn merged_ops(deltas: Vec<TickDelta>) -> Vec<WarpOp> {
-    let merged = merge_deltas_ok(deltas);
-    assert!(merged.is_ok(), "merge should succeed: {merged:?}");
-    let Ok(ops) = merged else {
-        unreachable!("assert above guarantees merge success");
-    };
+    let Ok(ops) = merge_deltas_ok(deltas) else {
+        panic!("merge should succeed");
+    };
     ops
 }
 
 fn merge_conflict(deltas: Vec<TickDelta>) -> warp_core::MergeConflict {
-    let merged = merge_deltas_ok(deltas);
-    assert!(
-        matches!(&merged, Err(MergeError::Conflict(_))),
-        "expected MergeError::Conflict, got: {merged:?}"
-    );
-    let Err(MergeError::Conflict(conflict)) = merged else {
-        unreachable!("assert above guarantees merge conflict");
-    };
+    let Err(MergeError::Conflict(conflict)) = merge_deltas_ok(deltas) else {
+        panic!("expected MergeError::Conflict");
+    };
     *conflict
 }
@@
-    let delete_idx = result_ops.iter().position(|op| op == &delete_op);
-    let upsert_idx = result_ops.iter().position(|op| op == &upsert_op);
-    assert!(delete_idx.is_some(), "DeleteNode must be present");
-    assert!(upsert_idx.is_some(), "UpsertNode must be present");
-    let (Some(delete_idx), Some(upsert_idx)) = (delete_idx, upsert_idx) else {
-        unreachable!("asserts above guarantee indices");
-    };
+    let delete_idx = result_ops
+        .iter()
+        .position(|op| op == &delete_op)
+        .expect("DeleteNode must be present");
+    let upsert_idx = result_ops
+        .iter()
+        .position(|op| op == &upsert_op)
+        .expect("UpsertNode must be present");
#!/bin/bash
# Read-only verification: find remaining assert+unreachable and double-check patterns in this test file.
rg -n -C2 'unreachable!\("assert above guarantees' crates/warp-core/tests/parallel_merge_warpopkey.rs
rg -n -C3 'assert!\(.*is_ok\(\)|assert!\(.*matches!\(' crates/warp-core/tests/parallel_merge_warpopkey.rs
rg -n -C3 'position\(\|.*\)\s*;[[:space:]]*$' crates/warp-core/tests/parallel_merge_warpopkey.rs

Also applies to: 33-43, 459-465

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/warp-core/tests/parallel_merge_warpopkey.rs` around lines 24 - 31, The
helper merged_ops currently calls merge_deltas_ok, then asserts merged.is_ok()
and immediately re-destructures with `let Ok(ops) = merged else { unreachable!()
}`; collapse this into a single extraction by replacing the assert+separate
destructure with a single fallible-unwrapping expression (e.g., call
merge_deltas_ok(...).expect("merge should succeed") or use `let Ok(ops) =
merge_deltas_ok(...) else { panic!(...) }`) so you remove the redundant assert
and unreachable path—apply the same change to the other duplicate occurrences in
this test (the two places around lines 33-43 and 459-465) referencing merged_ops
and merge_deltas_ok to keep tests concise and avoid dead fallback branches.
Makefile (1)

196-205: ⚠️ Potential issue | 🟡 Minor

Don't make export success depend on Prettier.

bench-policy-export already finished the cargo export/bake at Lines 198-204. The unconditional pnpm exec prettier on Line 205 turns that success into a target failure on machines without pnpm or without JS deps. Guard it or move formatting to a separate target.

Proposed fix
-	@pnpm exec prettier --write docs/benchmarks/report-inline.html >/dev/null
+	@if command -v pnpm >/dev/null 2>&1; then \
+	  pnpm exec prettier --write docs/benchmarks/report-inline.html >/dev/null; \
+	else \
+	  echo "[bench-policy-export] pnpm not found; skipping prettier"; \
+	fi
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@Makefile` around lines 196 - 205, The Makefile target bench-policy-export
currently runs pnpm exec prettier unconditionally which can make the whole
target fail on systems without pnpm or JS deps; modify the bench-policy-export
target so the cargo steps (cargo xtask bench policy-export, bake,
check-artifacts) remain the required success path and the prettier formatting is
either moved to a separate target (e.g., bench-policy-format) or
guarded/optional (e.g., check for pnpm and JS deps before running or append a
fallback like `|| true`) so a missing pnpm does not mark bench-policy-export as
failed.
crates/warp-benches/benches/parallel_baseline.rs (1)

50-69: ⚠️ Potential issue | 🟠 Major

merge_for_commit_path still diverges from the production merge shape.

merge_deltas() in crates/warp-core/src/parallel/merge.rs sorts (WarpOpKey, OpOrigin), validates writes to newly created warps, and groups same-key ops by full equality. This helper drops origins entirely and replaces that with a benchmark-only windows(2) assert plus key-only dedup. The new policy matrix is therefore timing a different merge stage than the engine commit path.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/warp-benches/benches/parallel_baseline.rs` around lines 50 - 69,
merge_for_commit_path currently drops OpOrigin and only dedups by key, diverging
from production merge_deltas; update merge_for_commit_path to mirror
merge_deltas by preserving the op origin when collecting (e.g., collect
(op.sort_key(), op_origin, op)), sort by the compound (WarpOpKey, OpOrigin)
ordering used by merge_deltas, run the same validation step that ensures writes
to newly created warps are allowed (the "validate writes to newly created warps"
check from merge_deltas), and perform grouping/dedup by full equality of the
grouped items (key+origin+op) rather than key-only so same-key but non-equal ops
are not collapsed; reference merge_for_commit_path,
TickDelta::into_ops_unsorted, merge_deltas, WarpOpKey/OpOrigin and ensure the
final return maps back to Vec<WarpOp> after applying those checks.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/warp-core/src/parallel/exec.rs`:
- Around line 317-323: The public functions execute_parallel_sharded_with_policy
and execute_parallel_with_policy must not abort on invalid input; replace the
assert!(workers > 0) checks by making the workers parameter non-zero or by
returning a Result error: either change the signature to accept
std::num::NonZeroUsize for workers (update all call sites) or keep usize and
validate at the start returning Result<Vec<TickDelta>, InvalidInputError>
(create a simple error type) when workers == 0, and remove the panic/assert
statements from both function bodies.
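The `NonZeroUsize` variant can be sketched like this; `shard_bounds` is a hypothetical helper, not the warp-core signature:

```rust
use std::num::NonZeroUsize;

// Sketch: make workers == 0 unrepresentable at the type level, so the
// assert!(workers > 0) and its abort path disappear entirely.
fn shard_bounds(items: usize, workers: NonZeroUsize) -> Vec<(usize, usize)> {
    let w = workers.get(); // guaranteed >= 1, no runtime check needed
    let chunk = items.div_ceil(w);
    (0..w)
        .map(|i| (i * chunk, ((i + 1) * chunk).min(items)))
        .filter(|&(lo, hi)| lo < hi)
        .collect()
}

fn main() {
    // callers must prove non-zero once, at the boundary
    assert!(NonZeroUsize::new(0).is_none());
    let bounds = shard_bounds(10, NonZeroUsize::new(4).unwrap());
    assert_eq!(bounds, vec![(0, 3), (3, 6), (6, 9), (9, 10)]);
}
```

Updating call sites to construct the `NonZeroUsize` pushes the validation to the API boundary, which is usually preferable to returning a `Result` from a hot execution path.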
- Around line 906-940: The test all_parallel_policies_match_serial_oracle
currently only compares parallel merges to the serial oracle for 4 workers;
extend it (and the similar blocks around execute_parallel_with_policy /
merge_deltas_ok) to run and assert semantic equivalence against execute_serial()
for worker counts 1, 4, and 8. Concretely, for each ParallelExecutionPolicy in
the policies array iterate over worker_counts [1,4,8], call
execute_parallel_with_policy(view, &items, workers, policy), merge with
merge_deltas_ok, ensure the result is Ok, then assert_eq!(merged, serial_oracle)
(where serial_oracle is produced by execute_serial(view, &items)). Apply the
same worker-count driven comparison in the other affected test blocks that call
execute_parallel_with_policy and only check len() today.

---

Duplicate comments:
In `@crates/warp-benches/benches/parallel_baseline.rs`:
- Around line 50-69: merge_for_commit_path currently drops OpOrigin and only
dedups by key, diverging from production merge_deltas; update
merge_for_commit_path to mirror merge_deltas by preserving the op origin when
collecting (e.g., collect (op.sort_key(), op_origin, op)), sort by the compound
(WarpOpKey, OpOrigin) ordering used by merge_deltas, run the same validation
step that ensures writes to newly created warps are allowed (the "validate
writes to newly created warps" check from merge_deltas), and perform
grouping/dedup by full equality of the grouped items (key+origin+op) rather than
key-only so same-key but non-equal ops are not collapsed; reference
merge_for_commit_path, TickDelta::into_ops_unsorted, merge_deltas,
WarpOpKey/OpOrigin and ensure the final return maps back to Vec<WarpOp> after
applying those checks.

In `@crates/warp-core/tests/parallel_merge_warpopkey.rs`:
- Around line 24-31: The helper merged_ops currently calls merge_deltas_ok, then
asserts merged.is_ok() and immediately re-destructures with `let Ok(ops) =
merged else { unreachable!() }`; collapse this into a single extraction by
replacing the assert+separate destructure with a single fallible-unwrapping
expression (e.g., call merge_deltas_ok(...).expect("merge should succeed") or
use `let Ok(ops) = merge_deltas_ok(...) else { panic!(...) }`) so you remove the
redundant assert and unreachable path—apply the same change to the other
duplicate occurrences in this test (the two places around lines 33-43 and
459-465) referencing merged_ops and merge_deltas_ok to keep tests concise and
avoid dead fallback branches.

In `@Makefile`:
- Around line 196-205: The Makefile target bench-policy-export currently runs
pnpm exec prettier unconditionally which can make the whole target fail on
systems without pnpm or JS deps; modify the bench-policy-export target so the
cargo steps (cargo xtask bench policy-export, bake, check-artifacts) remain the
required success path and the prettier formatting is either moved to a separate
target (e.g., bench-policy-format) or guarded/optional (e.g., check for pnpm and
JS deps before running or append a fallback like `|| true`) so a missing pnpm
does not mark bench-policy-export as failed.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 01afa788-1905-4c2f-a8f8-269e1166aba8

📥 Commits

Reviewing files that changed from the base of the PR and between 75c2fc4 and 6182672.

📒 Files selected for processing (4)
  • Makefile
  • crates/warp-benches/benches/parallel_baseline.rs
  • crates/warp-core/src/parallel/exec.rs
  • crates/warp-core/tests/parallel_merge_warpopkey.rs


@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
crates/warp-core/src/parallel/exec.rs (1)

298-306: 🧹 Nitpick | 🔵 Trivial

execute_parallel_sharded doesn't cap workers at NUM_SHARDS, but execute_parallel does.

Lines 263–268 cap workers to avoid spawning more threads than shards. This function skips that cap. With workers > NUM_SHARDS, you spawn extra threads that immediately hit shard_id >= NUM_SHARDS and exit — pure overhead on every call.

The same inconsistency exists between execute_parallel_with_policy (caps at line 375–376) and execute_parallel_sharded_with_policy (no cap).

Either apply the cap uniformly or document that _sharded variants intentionally allow uncapped workers for benchmarking raw thread overhead.

Proposed fix
 pub fn execute_parallel_sharded(
     view: GraphView<'_>,
     items: &[ExecItem],
     workers: usize,
 ) -> Vec<TickDelta> {
     assert!(workers >= 1, "need at least one worker");
-    let workers = NonZeroUsize::new(workers).map_or(NonZeroUsize::MIN, |w| w);
+    let capped = workers.min(NUM_SHARDS);
+    let workers = NonZeroUsize::new(capped).map_or(NonZeroUsize::MIN, |w| w);
     execute_parallel_sharded_with_policy(view, items, workers, ParallelExecutionPolicy::DEFAULT)
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@crates/warp-core/src/parallel/exec.rs` around lines 298 - 306, The sharded
variants fail to cap requested worker count to NUM_SHARDS, causing wasted
threads that immediately exit; update execute_parallel_sharded and
execute_parallel_sharded_with_policy to mirror the cap logic used in
execute_parallel/execute_parallel_with_policy by reducing the input workers to
at most NUM_SHARDS (wrap into a NonZeroUsize the same way you do for
execute_parallel), then call the existing _with_policy path—i.e., compute
workers = min(workers, NUM_SHARDS) before proceeding so no more threads than
NUM_SHARDS are spawned.
♻️ Duplicate comments (4)
xtask/src/main.rs (4)

1286-1290: ⚠️ Potential issue | 🟡 Minor

Report the actual template path in replacement failures.

This error always points at docs/benchmarks/index.html, even when bench bake --template ... is using a different file. Thread the caller's template path through here so the failure identifies the broken template instead of the default one.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@xtask/src/main.rs` around lines 1286 - 1290, The bail! message in the
marker-search block always references the hardcoded docs/benchmarks/index.html
path; instead thread the caller-provided template path into this code and report
it on failure. Modify the function containing the marker lookup (the block using
let Some(marker_index) = haystack.find(marker) and the bail! call) to accept a
template_path parameter (or use the existing template variable from the caller)
and replace the hardcoded Path::new("docs/benchmarks/index.html") passed to
display_repo_relative with that template_path so the error message names the
actual template that failed.

1238-1248: ⚠️ Potential issue | 🟠 Major

Missing inline-data anchor should be fatal.

Appending before </body> is not a safe fallback here. If the consumer script runs earlier, the page can initialize before window.__CRITERION_* exists, and if </body> is absent replace is a silent no-op. Require BENCH_INLINE_DATA_MARKER, or at minimum verify that exactly one replacement happened before the consumer script.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@xtask/src/main.rs` around lines 1238 - 1248, The code currently falls back to
inserting the benchmark inline script before "</body>" if
BENCH_INLINE_DATA_MARKER is missing; instead make the absence of
BENCH_INLINE_DATA_MARKER a fatal error (or at least assert exactly one
replacement occurred) to avoid race/No-op issues: after generating html via
inline_benchmark_vendor_styles(...) and inject via
build_benchmark_inline_script(...), check
html.contains(BENCH_INLINE_DATA_MARKER) and return an Err (with context) if
false, or perform a single html.replacen(BENCH_INLINE_DATA_MARKER, &(inject +
BENCH_INLINE_DATA_MARKER), 1) and verify that exactly one replacement happened
before continuing so the consumer script (window.__CRITERION_*) is guaranteed to
be present.

986-1005: ⚠️ Potential issue | 🟠 Major

Omit nondeterministic machine metadata from committed benchmark payloads.

baked_at changes on every bake, and machine can serialize the local hostname into tracked JSON/HTML. That guarantees artifact churn and leaks workstation identifiers even when the benchmark results are identical. Keep baked_git_sha for provenance, but make timestamp/host metadata opt-in or omit them by default.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@xtask/src/main.rs` around lines 986 - 1005, Summary: Remove nondeterministic
baked_at and machine metadata from the produced PolicyMatrixPayload to avoid
churn and leaking hostnames. Fix: stop computing and assigning baked_at and
machine in xtask::main: remove or omit the OffsetDateTime::now_utc() formatting
and the local_benchmark_machine_descriptor() call, and set the corresponding
PolicyMatrixPayload fields to None instead of Some(...). Keep baked_git_sha,
baked_source_digest, criterion_root, and results as-is; if you want
timestamp/host back later make them opt-in behind a flag or config in the code
that constructs PolicyMatrixPayload.

1065-1113: ⚠️ Potential issue | 🟠 Major

Validate the discovered policy-case set instead of silently walking the tree.

This exporter treats whatever parseable directories happen to exist as ground truth and drops unreadable or misparsed cases with continue. One stray leftover directory or one read failure is enough to publish a valid-looking but wrong matrix and exit 0. Build an explicit expected (policy, workers, load) set and fail on missing, unreadable, or unexpected rows.

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@crates/warp-benches/benches/parallel_baseline.rs`:
- Around line 320-329: The policy_label function currently collapses any unknown
ParallelExecutionPolicy variant into "unknown_policy" via a wildcard arm; remove
the "_ =>" fallback so the match is exhaustive (list every current
ParallelExecutionPolicy variant explicitly in policy_label) so the compiler will
error when a new variant is added instead of silently labeling it
"unknown_policy". Ensure you reference the ParallelExecutionPolicy enum and the
policy_label function and add explicit arms for all existing variants (or
otherwise return a compile-time-failing branch) so new variants force an update.
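A sketch of the exhaustive match; the variants below are illustrative, not the real `ParallelExecutionPolicy`:

```rust
// Sketch: no `_ =>` arm, so adding a ParallelExecutionPolicy variant becomes a
// compile error here instead of silently labeling it "unknown_policy".
enum ParallelExecutionPolicy {
    StaticPerWorker,
    StaticPerShard,
    OneWorkerOneShardOneDelta,
}

fn policy_label(policy: &ParallelExecutionPolicy) -> &'static str {
    match policy {
        ParallelExecutionPolicy::StaticPerWorker => "static_per_worker",
        ParallelExecutionPolicy::StaticPerShard => "static_per_shard",
        ParallelExecutionPolicy::OneWorkerOneShardOneDelta => "one_worker_one_shard_one_delta",
    }
}

fn main() {
    assert_eq!(policy_label(&ParallelExecutionPolicy::StaticPerShard), "static_per_shard");
}
```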

In `@xtask/Cargo.toml`:
- Line 19: The artifact check only validates inputs via
compute_policy_matrix_source_digest() and baked_source_digest but does not
verify that the committed results[] in
docs/benchmarks/parallel-policy-matrix.json were actually derived from
target/criterion/*/estimates.json; update run_bench_check_artifacts() to either
(A) re-derive the results array from the estimates.json files (using the same
logic that produces mean_ns/lb_ns/ub_ns) and compare it byte-for-byte with the
committed results[] (fail on mismatch), or (B) include a hash of the serialized
results[] into baked_source_digest when computing
compute_policy_matrix_source_digest() so any manual edits to results[] change
the digest and cause the check to fail—refer to baked_source_digest,
compute_policy_matrix_source_digest(), run_bench_check_artifacts(), and the
results[] field in docs/benchmarks/parallel-policy-matrix.json when implementing
the chosen approach.

In `@xtask/src/main.rs`:
- Line 995: The code currently uses display_repo_relative(...) (e.g., assigned
to criterion_root_display) which emits platform-native separators; normalize
repo-relative artifact paths to use forward-slash separators before
serializing/hashing so Windows backslashes don't change digests. Replace calls
to display_repo_relative (and any variables like criterion_root_display /
path_display at the other sites you noted) with a version that returns a
repo-relative string with '/' separators (either by changing
display_repo_relative to normalize separators or by post-processing its output
to replace '\' with '/'), and use that normalized string for serialization and
baked_source_digest computation.
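One way to normalize, sketched as a standalone function rather than the actual `display_repo_relative`:

```rust
use std::path::Path;

// Sketch: render a repo-relative path with '/' separators regardless of host
// OS, so serialized paths and the digest over them stay platform-stable.
fn display_repo_relative_normalized(path: &Path) -> String {
    path.components()
        .map(|c| c.as_os_str().to_string_lossy())
        .collect::<Vec<_>>()
        .join("/")
}

fn main() {
    let p = Path::new("docs").join("benchmarks").join("parallel-policy-matrix.json");
    // on Windows the native Display would use '\'; this stays '/'
    assert_eq!(
        display_repo_relative_normalized(&p),
        "docs/benchmarks/parallel-policy-matrix.json"
    );
}
```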
- Around line 851-871: The freshness check currently only includes the
policy-matrix tree and a hardcoded docs/benchmarks/index.html; extend it to
include the Criterion input files and the actual template used so stale core
rows or templates are detected. In run_bench_bake, when you compute the
baked_source_digest (and the equivalent places at lines ~902-946, ~1019-1033,
~1307-1317), add paths derived from collect_core_benchmark_rows(&criterion_root,
&repo_root) (i.e., the Criterion output files under criterion_root that produced
core_data/core_missing) and include the template_path (the file read into
template) in the set of files hashed; ensure bake_benchmark_report(...) and
write_policy_matrix_payload(...) behavior is unchanged but their inputs are
included in the digest so check-artifacts will fail if core_data, core_missing,
or the template are stale.

---

Outside diff comments:
In `@crates/warp-core/src/parallel/exec.rs`:
- Around line 298-306: The sharded variants fail to cap requested worker count
to NUM_SHARDS, causing wasted threads that immediately exit; update
execute_parallel_sharded and execute_parallel_sharded_with_policy to mirror the
cap logic used in execute_parallel/execute_parallel_with_policy by reducing the
input workers to at most NUM_SHARDS (wrap into a NonZeroUsize the same way you
do for execute_parallel), then call the existing _with_policy path—i.e., compute
workers = min(workers, NUM_SHARDS) before proceeding so no more threads than
NUM_SHARDS are spawned.

---

Duplicate comments:
In `@xtask/src/main.rs`:
- Around line 1286-1290: The bail! message in the marker-search block always
references the hardcoded docs/benchmarks/index.html path; instead thread the
caller-provided template path into this code and report it on failure. Modify
the function containing the marker lookup (the block using let
Some(marker_index) = haystack.find(marker) and the bail! call) to accept a
template_path parameter (or use the existing template variable from the caller)
and replace the hardcoded Path::new("docs/benchmarks/index.html") passed to
display_repo_relative with that template_path so the error message names the
actual template that failed.
- Around line 1238-1248: The code currently falls back to inserting the
benchmark inline script before "</body>" if BENCH_INLINE_DATA_MARKER is missing;
instead make the absence of BENCH_INLINE_DATA_MARKER a fatal error (or at least
assert exactly one replacement occurred) to avoid race/No-op issues: after
generating html via inline_benchmark_vendor_styles(...) and inject via
build_benchmark_inline_script(...), check
html.contains(BENCH_INLINE_DATA_MARKER) and return an Err (with context) if
false, or perform a single html.replacen(BENCH_INLINE_DATA_MARKER, &(inject +
BENCH_INLINE_DATA_MARKER), 1) and verify that exactly one replacement happened
before continuing so the consumer script (window.__CRITERION_*) is guaranteed to
be present.
- Around line 986-1005: Summary: Remove nondeterministic baked_at and machine
metadata from the produced PolicyMatrixPayload to avoid churn and leaking
hostnames. Fix: stop computing and assigning baked_at and machine in
xtask::main: remove or omit the OffsetDateTime::now_utc() formatting and the
local_benchmark_machine_descriptor() call, and set the corresponding
PolicyMatrixPayload fields to None instead of Some(...). Keep baked_git_sha,
baked_source_digest, criterion_root, and results as-is; if you want
timestamp/host back later make them opt-in behind a flag or config in the code
that constructs PolicyMatrixPayload.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: b73991cb-8d5e-45e2-9916-49011a2d529d

📥 Commits

Reviewing files that changed from the base of the PR and between 6182672 and c235096.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (8)
  • crates/warp-benches/benches/parallel_baseline.rs
  • crates/warp-core/src/parallel/exec.rs
  • crates/warp-core/tests/parallel_parallel_exec.rs
  • docs/benchmarks/index.html
  • docs/benchmarks/parallel-policy-matrix.json
  • docs/benchmarks/report-inline.html
  • xtask/Cargo.toml
  • xtask/src/main.rs

@flyingrobots flyingrobots merged commit 2629822 into main Mar 29, 2026
31 checks passed
@flyingrobots flyingrobots deleted the feat/parallel-shard-policy-bench branch March 29, 2026 20:21