
fix: address review comments on model runner memory/latency optimizations #995

Merged

patricklundquist merged 2 commits into bolt-runner-perf-optimizations-240856538465656847 from copilot/sub-pr-994 on Mar 19, 2026

Conversation


Copilot AI commented Mar 19, 2026

Follow-up to the model runner perf PR, addressing all reviewer feedback on correctness and robustness issues.

Why

The previous optimization introduced four bugs:

  1. Mutating resp.status.code/description in-place left stale proto fields (details, internal_details, stack_trace) on the response.
  2. runner_item_stream called next() directly on its argument, raising TypeError if passed a plain iterable (e.g. a list).
  3. itertools.chain([first_runner_item], ...) allocated a one-element list on every call, which is counterproductive on the hot path.
  4. runner_item_stream used num_total > 0 and num_success == num_total as its success condition, while the predict/generate paths use num_success == num_total alone; as a result, an empty resp.outputs was reported as RUNNER_PROCESSING_FAILED instead of SUCCESS.
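Bugs 2 and 3 can be reproduced outside the runner with a minimal sketch; the stream_buggy/stream_fixed helpers below are hypothetical stand-ins for runner_item_stream, not the real code:

```python
import itertools

def stream_buggy(runner_items):
    # Bug 2: next() requires an iterator; a plain list raises TypeError.
    first = next(runner_items)
    # Bug 3: a one-element list is allocated on every call.
    return itertools.chain([first], runner_items)

def stream_fixed(runner_items):
    # iter() is a no-op on iterators and wraps any plain iterable.
    runner_items = iter(runner_items)
    first = next(runner_items)
    # A one-element tuple is cheaper to build than a one-element list.
    return itertools.chain((first,), runner_items)

try:
    stream_buggy([1, 2, 3])
except TypeError as exc:
    print("buggy:", exc)   # buggy: 'list' object is not an iterator

print("fixed:", list(stream_fixed([1, 2, 3])))  # fixed: [1, 2, 3]
```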

How

  • Stale status fields: Call resp.status.Clear() before setting code/description in all three paths (runner_item_predict, runner_item_generate, runner_item_stream).
  • iter() wrapper: runner_item_stream now does runner_item_iterator = iter(runner_item_iterator) before next().
  • Tuple over list: itertools.chain((first_runner_item,), runner_item_iterator).
  • Empty-outputs alignment: Unified success condition to num_success == num_total and failure code to FAILURE across all three paths.
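The effect of Clear()-before-set can be illustrated with a minimal stand-in for the status proto; the Status class below only mimics protobuf's Message.Clear(), whereas the real fix operates on resp.status:

```python
class Status:
    """Minimal stand-in for the response status proto message."""
    _FIELDS = ("code", "description", "details", "internal_details", "stack_trace")

    def __init__(self):
        self.Clear()

    def Clear(self):
        # Like protobuf Message.Clear(): reset every field to its default.
        for field in self._FIELDS:
            setattr(self, field, "")

status = Status()
# A previous failure populated the auxiliary fields.
status.code = "FAILURE"
status.details = "traceback from earlier attempt"

# Buggy pattern: overwrite code/description only; details goes stale.
status.code = "SUCCESS"
print(status.details)        # traceback from earlier attempt

# Fixed pattern: Clear() first, then set the new code/description.
status.Clear()
status.code = "SUCCESS"
print(repr(status.details))  # ''
```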

Tests

Added tests/runners/test_model_runner_unit.py (13 tests, no API credentials required):

  • Empty resp.outputs → SUCCESS in predict/generate/stream
  • Stale status fields (details, internal_details) cleared in all paths
  • Plain list passed to runner_item_stream does not raise TypeError
  • Empty input stream yields no output items
  • Mixed and all-failed output scenarios for the stream path
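The empty-outputs case those tests cover comes down to the success predicate; a minimal sketch of the old stream-path condition versus the unified one (function names are illustrative):

```python
def unified_status(num_success, num_total):
    # Unified condition: empty resp.outputs (0 == 0) is SUCCESS,
    # matching the predict/generate paths.
    return "SUCCESS" if num_success == num_total else "FAILURE"

def old_stream_status(num_success, num_total):
    # Old stream-path condition: empty outputs fell through to failure.
    if num_total > 0 and num_success == num_total:
        return "SUCCESS"
    return "RUNNER_PROCESSING_FAILED"

print(unified_status(0, 0))     # SUCCESS
print(old_stream_status(0, 0))  # RUNNER_PROCESSING_FAILED
print(unified_status(2, 3))     # FAILURE
```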

Notes



- Call resp.status.Clear() before mutating code/description to prevent stale fields
- Wrap runner_item_iterator with iter() in runner_item_stream to accept any iterable
- Use tuple (first_runner_item,) instead of list [first_runner_item] in itertools.chain
- Fix empty outputs to be treated as SUCCESS (aligned across predict/generate/stream)
- Align FAILURE status code consistently across all three paths
- Add unit tests covering all edge cases

Co-authored-by: patricklundquist <1460278+patricklundquist@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Optimize model runner memory and latency" to "fix: address review comments on model runner memory/latency optimizations" on Mar 19, 2026
Copilot AI requested a review from patricklundquist March 19, 2026 14:43
@patricklundquist patricklundquist marked this pull request as ready for review March 19, 2026 16:06
@patricklundquist patricklundquist merged commit 73aafef into bolt-runner-perf-optimizations-240856538465656847 Mar 19, 2026
1 check passed
@patricklundquist patricklundquist deleted the copilot/sub-pr-994 branch March 19, 2026 16:06
