perf: optimize model runner memory and latency#994
perf: optimize model runner memory and latency#994patricklundquist wants to merge 11 commits intomasterfrom
Conversation
1. Fix massive memory leak / blocking bug in `runner_item_stream` by using `next()` and `itertools.chain()` instead of `list()`. 2. Eliminate list allocations for status checks. 3. Mutate status messages in place instead of allocating new ones. Co-authored-by: patricklundquist <1460278+patricklundquist@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
Optimizes the model runner hot paths to reduce allocations and avoid materializing streaming iterators into memory, improving latency and preventing high memory usage during streaming.
Changes:
- Replaces per-output
successeslist construction with counter-based aggregation in predict/generate/stream paths. - Updates streaming setup to peek the first request via
next()and rebuild the iterator withitertools.chain()instead oflist(). - Mutates
resp.statusfields directly instead of allocating a newStatusproto.
|
@copilot open a new pull request to apply changes based on the comments in this thread |
|
@patricklundquist I've opened a new pull request, #995, to work on those changes. Once the pull request is ready, I'll request review from you. |
…ions (#995) * Initial plan * fix: address review comments on model runner optimizations - Call resp.status.Clear() before mutating code/description to prevent stale fields - Wrap runner_item_iterator with iter() in runner_item_stream to accept any iterable - Use tuple (first_runner_item,) instead of list [first_runner_item] in itertools.chain - Fix empty outputs to be treated as SUCCESS (aligned across predict/generate/stream) - Align FAILURE status code consistently across all three paths - Add unit tests covering all edge cases Co-authored-by: patricklundquist <1460278+patricklundquist@users.noreply.github.com> --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: patricklundquist <1460278+patricklundquist@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
This PR optimizes the ModelRunner hot paths by reducing allocations during status aggregation and by fixing runner_item_stream to avoid materializing entire request streams into memory.
Changes:
- Replaced per-output boolean list allocations with success/total counters in
runner_item_predict,runner_item_generate, andrunner_item_stream, and now clears/mutatesresp.statusin place. - Updated
runner_item_streamto usenext()+itertools.chain()to safely peek the first item without consuming the entire iterable into a list. - Added unit tests covering status aggregation behavior and iterable handling for streaming.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
clarifai/runners/models/model_runner.py |
Improves latency/memory by avoiding intermediate lists and by streaming safely without buffering the entire iterator. |
tests/runners/test_model_runner_unit.py |
Adds unit coverage for status aggregation semantics and streaming iterator/iterable handling. |
Pre-computed cast type lookup dictionaries in `_MethodInfo` and removed dynamic reflection per-input (e.g. `hasattr`, `__args__`, `__origin__`) inside `_convert_input_protos_to_python`. This prevents a massive O(N) overhead during streaming and batching requests where every item was previously type-checked dynamically. Also cached `_get_method_infos()` lookups in wrappers. Co-authored-by: patricklundquist <1460278+patricklundquist@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
|
@copilot update test assertions per: =========================== short test summary info ===========================
|
|
@patricklundquist I've opened a new pull request, #997, to work on those changes. Once the pull request is ready, I'll request review from you. |
…tatus code (#997) * Initial plan * fix: update test assertion to use RUNNER_PROCESSING_FAILED for all-failed stream outputs Co-authored-by: patricklundquist <1460278+patricklundquist@users.noreply.github.com> --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: patricklundquist <1460278+patricklundquist@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
Optimizes the Clarifai runner hot path to reduce memory usage and allocations during status aggregation and streaming, and refactors input-proto casting in ModelClass to avoid repeated work.
Changes:
- Optimize
ModelRunnerstatus aggregation to avoid per-output list allocations and clear stale status fields in-place. - Fix
runner_item_streamiterator handling by peeking withnext()and rebuilding the stream viaitertools.chain()instead of materializing the full iterator. - Add unit tests covering status aggregation behavior and iterable-vs-iterator handling for streaming.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
clarifai/runners/models/model_runner.py |
Reduces allocations in status aggregation; updates stream handling to avoid list materialization. |
clarifai/runners/models/model_class.py |
Caches method registry lookups and precomputes cast targets to streamline input conversion. |
tests/runners/test_model_runner_unit.py |
Adds unit tests for status aggregation and iterator/iterable behavior in streaming. |
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
|
@copilot resolve merge conflicts to maintain functionality and best performance. |
|
@patricklundquist I've opened a new pull request, #1000, to work on those changes. Once the pull request is ready, I'll request review from you. |
…ast-path and counter-based aggregation (#1000) * Initial plan * Merge origin/master: resolve conflict in runner_item_generate by combining fast-path single-output optimization with counter-based multi-output aggregation Co-authored-by: patricklundquist <1460278+patricklundquist@users.noreply.github.com> Agent-Logs-Url: https://github.com/Clarifai/clarifai-python/sessions/f0c05276-4c5a-4a7b-a438-99f3746bef0a --------- Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com> Co-authored-by: patricklundquist <1460278+patricklundquist@users.noreply.github.com>
Minimum allowed line rate is |
runner_item_streamby usingnext()anditertools.chain()instead oflist().