
perf: optimize model runner memory and latency #994

Open

patricklundquist wants to merge 11 commits into master from bolt-runner-perf-optimizations-240856538465656847
Conversation

@patricklundquist (Contributor) commented Mar 18, 2026

  1. Fix a massive memory leak / blocking bug in `runner_item_stream` by using `next()` and `itertools.chain()` instead of `list()`.
  2. Eliminate list allocations for status checks.
  3. Mutate status messages in place instead of allocating new ones.
  4. Optimize `_convert_input_protos_to_python` in `ModelClass`.

Co-authored-by: patricklundquist <1460278+patricklundquist@users.noreply.github.com>
Copilot AI left a comment

Pull request overview

Optimizes the model runner hot paths to reduce allocations and avoid materializing streaming iterators into memory, improving latency and preventing high memory usage during streaming.

Changes:

  • Replaces per-output successes list construction with counter-based aggregation in predict/generate/stream paths.
  • Updates streaming setup to peek the first request via next() and rebuild the iterator with itertools.chain() instead of list().
  • Mutates resp.status fields directly instead of allocating a new Status proto.
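The peek-then-rebuild pattern described above can be sketched as follows (a minimal illustration with hypothetical names; the actual runner code differs):

```python
import itertools

def peek_first(runner_items):
    """Pull the first item from a possibly-unbounded stream without
    buffering the whole thing, then rebuild an equivalent iterator."""
    it = iter(runner_items)  # accept any iterable, not just iterators
    try:
        first = next(it)
    except StopIteration:
        return None, iter(())  # empty stream
    # Re-attach the peeked item in front of the untouched remainder.
    return first, itertools.chain((first,), it)

def infinite_items():
    """Stand-in for a gRPC request stream that never terminates."""
    n = 0
    while True:
        yield n
        n += 1

# With list() this call would block forever and grow memory without bound;
# with next() + chain() it returns immediately.
first, stream = peek_first(infinite_items())
```

The key difference from `list(runner_items)` is that only one item is ever pulled eagerly, so an unbounded (or merely large) stream neither blocks the peek nor gets materialized in memory.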

@patricklundquist (Contributor, Author):

@copilot open a new pull request to apply changes based on the comments in this thread


Copilot AI commented Mar 19, 2026

@patricklundquist I've opened a new pull request, #995, to work on those changes. Once the pull request is ready, I'll request review from you.

…ions (#995)

* Initial plan

* fix: address review comments on model runner optimizations

- Call resp.status.Clear() before mutating code/description to prevent stale fields
- Wrap runner_item_iterator with iter() in runner_item_stream to accept any iterable
- Use tuple (first_runner_item,) instead of list [first_runner_item] in itertools.chain
- Fix empty outputs to be treated as SUCCESS (aligned across predict/generate/stream)
- Align FAILURE status code consistently across all three paths
- Add unit tests covering all edge cases
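The "Clear() before mutating" fix above guards against stale fields surviving an in-place update. A self-contained sketch (using a plain-Python stand-in class rather than the real Status protobuf, purely for illustration):

```python
class Status:
    """Stand-in for the Status protobuf message (illustration only)."""
    def __init__(self):
        self.code = 0
        self.description = ""
        self.details = ""

    def Clear(self):
        # Protobuf messages expose Clear() to reset all fields.
        self.__init__()

def set_status_in_place(status, code, description):
    # Clear first so fields set on a previous use (e.g. a stale
    # `details` string) do not leak into the new status.
    status.Clear()
    status.code = code
    status.description = description

s = Status()
s.details = "left over from an earlier response"
set_status_in_place(s, 10000, "Ok")
# s.details is now "" rather than the stale leftover
```

Without the `Clear()`, reusing a message object would keep whatever fields the previous update happened not to overwrite, which is exactly the bug the review comment flagged.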

Co-authored-by: patricklundquist <1460278+patricklundquist@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: patricklundquist <1460278+patricklundquist@users.noreply.github.com>
Copilot AI left a comment

Pull request overview

This PR optimizes the ModelRunner hot paths by reducing allocations during status aggregation and by fixing runner_item_stream to avoid materializing entire request streams into memory.

Changes:

  • Replaced per-output boolean list allocations with success/total counters in runner_item_predict, runner_item_generate, and runner_item_stream, and now clears/mutates resp.status in place.
  • Updated runner_item_stream to use next() + itertools.chain() to safely peek the first item without consuming the entire iterable into a list.
  • Added unit tests covering status aggregation behavior and iterable handling for streaming.
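The counter-based aggregation in the first bullet replaces a per-response boolean list with two integers. A hedged sketch (the `SUCCESS` value and the return labels are stand-ins, not the actual `status_code_pb2` constants):

```python
# Before: allocated a throwaway list per response just to count successes.
#   successes = [out.status.code == SUCCESS for out in outputs]

SUCCESS = 10000  # stand-in for status_code_pb2.SUCCESS (hypothetical value)

def aggregate_status(output_codes):
    """Single pass, no intermediate list: count instead of collect."""
    total = 0
    succeeded = 0
    for code in output_codes:
        total += 1
        if code == SUCCESS:
            succeeded += 1
    if total == 0 or succeeded == total:
        return "SUCCESS"   # empty outputs treated as success, per the fix
    if succeeded == 0:
        return "FAILURE"
    return "MIXED_STATUS"
```

Besides dropping the allocation, the single pass keeps the empty-outputs and all-failed edge cases explicit, which is what the added unit tests exercise.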

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

clarifai/runners/models/model_runner.py: Improves latency/memory by avoiding intermediate lists and by streaming safely without buffering the entire iterator.
tests/runners/test_model_runner_unit.py: Adds unit coverage for status aggregation semantics and streaming iterator/iterable handling.

google-labs-jules bot and others added 4 commits March 19, 2026 09:49
Pre-computed cast type lookup dictionaries in `_MethodInfo` and removed
dynamic reflection per-input (e.g. `hasattr`, `__args__`, `__origin__`)
inside `_convert_input_protos_to_python`.

This prevents a massive O(N) overhead during streaming and batching
requests where every item was previously type-checked dynamically.

Also cached `_get_method_infos()` lookups in wrappers.

Co-authored-by: patricklundquist <1460278+patricklundquist@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
@patricklundquist (Contributor, Author):

@copilot update test assertions per:

=========================== short test summary info ===========================
FAILED tests/runners/test_model_runner_unit.py::TestRunnerItemStreamStatus::test_all_failed_outputs - assert 25608 == 10020

  • where 25608 = code: RUNNER_PROCESSING_FAILED\ndescription: "Failed"\n.code
  • where code: RUNNER_PROCESSING_FAILED\ndescription: "Failed"\n = status {\n code: RUNNER_PROCESSING_FAILED\n description: "Failed"\n}\noutputs {\n status {\n code: FAILURE\n }\n}\n.status
  •  where status {\n  code: RUNNER_PROCESSING_FAILED\n  description: "Failed"\n}\noutputs {\n  status {\n    code: FAILURE\n  }\n}\n = multi_output_response {\n  status {\n    code: RUNNER_PROCESSING_FAILED\n    description: "Failed"\n  }\n  outputs {\n    status {\n      code: FAILURE\n    }\n  }\n}\n.multi_output_response
    
  • and 10020 = status_code_pb2.FAILURE


Copilot AI commented Mar 19, 2026

@patricklundquist I've opened a new pull request, #997, to work on those changes. Once the pull request is ready, I'll request review from you.

Copilot AI and others added 2 commits March 19, 2026 14:44
…tatus code (#997)

* Initial plan

* fix: update test assertion to use RUNNER_PROCESSING_FAILED for all-failed stream outputs

Co-authored-by: patricklundquist <1460278+patricklundquist@users.noreply.github.com>

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: patricklundquist <1460278+patricklundquist@users.noreply.github.com>
Copilot AI left a comment

Pull request overview

Optimizes the Clarifai runner hot path to reduce memory usage and allocations during status aggregation and streaming, and refactors input-proto casting in ModelClass to avoid repeated work.

Changes:

  • Optimize ModelRunner status aggregation to avoid per-output list allocations and clear stale status fields in-place.
  • Fix runner_item_stream iterator handling by peeking with next() and rebuilding the stream via itertools.chain() instead of materializing the full iterator.
  • Add unit tests covering status aggregation behavior and iterable-vs-iterator handling for streaming.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

clarifai/runners/models/model_runner.py: Reduces allocations in status aggregation; updates stream handling to avoid list materialization.
clarifai/runners/models/model_class.py: Caches method registry lookups and precomputes cast targets to streamline input conversion.
tests/runners/test_model_runner_unit.py: Adds unit tests for status aggregation and iterator/iterable behavior in streaming.

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@patricklundquist (Contributor, Author):

@copilot resolve merge conflicts to maintain functionality and best performance.


Copilot AI commented Mar 24, 2026

@patricklundquist I've opened a new pull request, #1000, to work on those changes. Once the pull request is ready, I'll request review from you.

Copilot AI and others added 2 commits March 24, 2026 10:28
…ast-path and counter-based aggregation (#1000)

* Initial plan

* Merge origin/master: resolve conflict in runner_item_generate by combining fast-path single-output optimization with counter-based multi-output aggregation

Co-authored-by: patricklundquist <1460278+patricklundquist@users.noreply.github.com>
Agent-Logs-Url: https://github.com/Clarifai/clarifai-python/sessions/f0c05276-4c5a-4a7b-a438-99f3746bef0a

---------

Co-authored-by: copilot-swe-agent[bot] <198982749+Copilot@users.noreply.github.com>
Co-authored-by: patricklundquist <1460278+patricklundquist@users.noreply.github.com>
@github-actions

Code Coverage

Package Line Rate
clarifai 45%
clarifai.cli 62%
clarifai.cli.templates 67%
clarifai.cli.templates.toolkits 100%
clarifai.client 65%
clarifai.client.auth 67%
clarifai.constants 100%
clarifai.datasets 100%
clarifai.datasets.export 69%
clarifai.datasets.upload 75%
clarifai.datasets.upload.loaders 37%
clarifai.models 100%
clarifai.rag 0%
clarifai.runners 52%
clarifai.runners.models 60%
clarifai.runners.pipeline_steps 39%
clarifai.runners.pipelines 72%
clarifai.runners.utils 62%
clarifai.runners.utils.data_types 72%
clarifai.schema 100%
clarifai.urls 58%
clarifai.utils 65%
clarifai.utils.evaluation 16%
clarifai.workflows 95%
Summary 60% (11642 / 19284)

Minimum allowed line rate is 50%

@patricklundquist patricklundquist marked this pull request as ready for review March 24, 2026 22:36
@patricklundquist patricklundquist requested a review from zeiler March 25, 2026 16:47