fix: address review comments on model runner memory/latency optimizations #995
Merged
patricklundquist merged 2 commits into bolt-runner-perf-optimizations-240856538465656847 on Mar 19, 2026
Conversation
- Call `resp.status.Clear()` before mutating code/description to prevent stale fields
- Wrap `runner_item_iterator` with `iter()` in `runner_item_stream` to accept any iterable
- Use tuple `(first_runner_item,)` instead of list `[first_runner_item]` in `itertools.chain`
- Fix empty outputs to be treated as SUCCESS (aligned across predict/generate/stream)
- Align FAILURE status code consistently across all three paths
- Add unit tests covering all edge cases

Co-authored-by: patricklundquist <1460278+patricklundquist@users.noreply.github.com>
Copilot changed the title from "[WIP] Optimize model runner memory and latency" to "fix: address review comments on model runner memory/latency optimizations" on Mar 19, 2026.
Merged commit 73aafef into bolt-runner-perf-optimizations-240856538465656847. 1 check passed.
Follow-up to the model runner perf PR, addressing all reviewer feedback on correctness and robustness issues.
Why
The previous optimization introduced four bugs:
- Mutating `resp.status.code`/`description` in-place left stale proto fields (`details`, `internal_details`, `stack_trace`) on the response.
- `runner_item_stream` called `next()` directly on its argument, raising `TypeError` if passed a plain iterable (e.g. a list).
- `itertools.chain([first_runner_item], ...)` allocated a one-element list on every call, which is counterproductive on the hot path.
- `runner_item_stream` used `num_total > 0 and num_success == num_total` instead of `num_success == num_total`, diverging from the predict/generate paths: an empty `resp.outputs` was treated as `RUNNER_PROCESSING_FAILED` instead of `SUCCESS`.

How
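Combining the iterator-related fixes, the hot-path handling can be sketched like this (a minimal stand-in; only the identifier names come from the PR, the function body is an assumption):

```python
import itertools

def stream_items(runner_item_iterator):
    # Simplified stand-in for runner_item_stream's iterator handling.
    # iter() makes the function accept any iterable, not just iterators.
    runner_item_iterator = iter(runner_item_iterator)
    first_runner_item = next(runner_item_iterator)  # peek at the first item
    # A tuple literal avoids allocating a fresh one-element list per call.
    yield from itertools.chain((first_runner_item,), runner_item_iterator)

# A plain list works now; calling next([1, 2, 3]) directly would raise TypeError.
assert list(stream_items([1, 2, 3])) == [1, 2, 3]
```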
- Call `resp.status.Clear()` before setting `code`/`description` in all three paths (`runner_item_predict`, `runner_item_generate`, `runner_item_stream`).
- Add an `iter()` wrapper: `runner_item_stream` now does `runner_item_iterator = iter(runner_item_iterator)` before calling `next()`.
- Use `itertools.chain((first_runner_item,), runner_item_iterator)` with a tuple instead of a list.
- Align the success check to `num_success == num_total` and the failure code to `FAILURE` across all three paths.

Tests
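The empty-outputs edge case reduces to the condition change described above; a small sketch (the helper names here are hypothetical, only the two conditions come from the PR):

```python
def is_success(num_success: int, num_total: int) -> bool:
    # Aligned check: empty outputs (num_total == 0) count as success.
    return num_success == num_total

def is_success_old_stream(num_success: int, num_total: int) -> bool:
    # The divergent stream-path check the PR removed.
    return num_total > 0 and num_success == num_total

# Empty resp.outputs: the old stream check reported failure, the new one success.
assert is_success(0, 0) is True
assert is_success_old_stream(0, 0) is False
# Non-empty behavior is unchanged.
assert is_success(3, 3) and not is_success(2, 3)
```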
Added `tests/runners/test_model_runner_unit.py` (13 tests, no API credentials required) covering:

- empty `resp.outputs` → `SUCCESS` in predict/generate/stream
- stale fields (`details`, `internal_details`) cleared in all paths
- `runner_item_stream` does not raise `TypeError` on a plain iterable

Notes
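One note on why `Clear()` matters: protobuf's `Clear()` resets every field to its default, so no stale `details`/`stack_trace` can survive a status rewrite. A minimal Python stand-in (the class is illustrative, not the real proto message; only the field names come from the PR):

```python
class Status:
    # Minimal stand-in for the response's protobuf status message.
    def __init__(self):
        self.code = 0
        self.description = ""
        self.details = ""
        self.stack_trace = ""

    def Clear(self):
        # Protobuf's Clear() resets every field to its default value.
        self.__init__()

status = Status()
status.code, status.description = 500, "boom"
status.details, status.stack_trace = "why it failed", "traceback..."

# Buggy path: overwriting code/description leaves the other fields stale.
status.code, status.description = 0, "Ok"
assert status.details == "why it failed"  # stale field survives

# Fixed path: Clear() first, then set the new code/description.
status.Clear()
status.code, status.description = 0, "Ok"
assert status.details == "" and status.stack_trace == ""
```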