Make fusion rewrite backend specific #2004
Draft
ricardoV94 wants to merge 7 commits into pymc-devs:v3
All concrete ScalarOp subclasses already define impl (either directly or via ScalarInnerGraphOp). Making it abstract enforces this at instantiation time rather than at call time.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
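To illustrate the instantiation-time versus call-time distinction, here is a minimal sketch using Python's standard `abc` module. The class names (`ScalarOpSketch`, `AddSketch`, `BrokenSketch`) and the helper `try_instantiate` are hypothetical and are not PyTensor classes; whether the real ScalarOp relies on `abc` in exactly this way is an assumption.

```python
from abc import ABC, abstractmethod


class ScalarOpSketch(ABC):
    """Hypothetical stand-in for a scalar op base class (not PyTensor's ScalarOp)."""

    @abstractmethod
    def impl(self, *inputs):
        """Compute the scalar result from Python-level inputs."""


class AddSketch(ScalarOpSketch):
    """Concrete subclass that defines impl, so it can be instantiated."""

    def impl(self, x, y):
        return x + y


class BrokenSketch(ScalarOpSketch):
    """Subclass that forgets to define impl."""


def try_instantiate(cls):
    # With an abstract impl, the failure surfaces here, when the object is
    # created, instead of later when impl is first called.
    try:
        cls()
    except TypeError as exc:
        return f"TypeError: {exc}"
    return "ok"


add_result = AddSketch().impl(2, 3)
broken_status = try_instantiate(BrokenSketch)
```

In this sketch `BrokenSketch()` raises `TypeError` immediately, which is the earlier failure mode the commit message describes.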
Change py_perform_fn to convert inner ops via op.impl instead of wrapping op.perform with storage allocation. This removes per-element storage allocation overhead in Composite.impl and Composite.perform.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
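The difference between the two conversion styles can be sketched roughly as follows. This is an illustration under assumptions, not the actual py_perform_fn code: all function names here are hypothetical, and the nested-list storage cell only mimics the `output_storage[0][0] = value` convention used by perform-style methods.

```python
def add_impl(x, y):
    # Hypothetical scalar implementation: returns its result directly.
    return x + y


def mul_impl(x, y):
    return x * y


def add_perform(inputs, output_storage):
    # perform-style: the result is written into a storage cell.
    output_storage[0][0] = add_impl(*inputs)


def mul_perform(inputs, output_storage):
    output_storage[0][0] = mul_impl(*inputs)


def composite_via_perform(x, y, z):
    # Computes (x + y) * z, allocating a fresh storage cell for every
    # inner op on every call, i.e. once per element.
    storage = [[None]]
    add_perform((x, y), storage)
    tmp = storage[0][0]
    storage = [[None]]
    mul_perform((tmp, z), storage)
    return storage[0][0]


def composite_via_impl(x, y, z):
    # Same computation by chaining impl calls, with no storage cells.
    return mul_impl(add_impl(x, y), z)


via_perform = composite_via_perform(2, 3, 4)
via_impl = composite_via_impl(2, 3, 4)
```

Both functions compute the same value; the second simply avoids the list allocations that the first performs per call.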
Rewrite Elemwise.perform to use a single _create_node_ufunc method that builds a closure with inplace logic, dtype handling, and ufunc selection:

- nfunc_spec path: uses numpy/scipy ufuncs directly, with out= for inplace and sig= for discrete->float dtype coercion
- frompyfunc path (<=32 operands): C iteration loop via np.frompyfunc, with .astype() for object->correct dtype conversion. No inplace (destroy_map is a permission, not an obligation; frompyfunc already allocates).
- Blockwise vectorize fallback (>32 operands): _vectorize_node_perform with inplace_mapping support
- Scalar (0-d) outputs without nfunc_spec: calls impl directly with np.asarray wrapper

Removes stale self.ufunc/self.nfunc attributes and __getstate__/__setstate__. Renames fake_node to dummy_node. Adds out= and inplace_mapping parameters to _vectorize_node_perform in blockwise.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
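For readers unfamiliar with the two main paths, the NumPy behaviour they rely on can be sketched as below. This is not the _create_node_ufunc implementation; `scalar_impl`, `elemwise_frompyfunc`, and `elemwise_add_inplace` are hypothetical names chosen for illustration, and only the `np.frompyfunc`, `.astype`, and `out=` calls are real NumPy API.

```python
import numpy as np


def scalar_impl(x, y):
    # Hypothetical scalar implementation standing in for a scalar op's impl.
    return x * y + 1.0


def elemwise_frompyfunc(x, y, out_dtype):
    # np.frompyfunc runs the iteration loop in C but returns object arrays,
    # so the result is cast to the intended dtype afterwards.
    ufunc = np.frompyfunc(scalar_impl, 2, 1)
    return ufunc(x, y).astype(out_dtype)


def elemwise_add_inplace(x, y):
    # A native NumPy ufunc can write into an existing buffer via out=,
    # which is how an inplace update avoids a new allocation.
    np.add(x, y, out=x)
    return x


a = np.array([1.0, 2.0, 3.0])
b = np.array([10.0, 20.0, 30.0])

fused = elemwise_frompyfunc(a, b, "float64")
summed = elemwise_add_inplace(a.copy(), b)
```

Because `np.frompyfunc` always allocates a new object array, there is nothing to gain from honouring an inplace permission on that path, which matches the reasoning in the commit message.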
Test that Composite.impl works with scalar ops that only define impl (no c_code). This exercises the py_perform_fn path with impl_convert.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
FusionOptimizer now takes a backend parameter ("c" or "numba") that
determines which ops are fuseable:
- C fusion: "cxx_only" tag, checks scalar ops have C implementations
(cached, since supports_c_code is expensive)
- Numba fusion: "numba" tag, fuses unconditionally
Python mode does not benefit from fusion: running each op through
frompyfunc's C iteration loop is faster than paying the per-element
overhead of a fused Composite.impl.
Also adds py-mode perform benchmarks (nfunc_spec vs frompyfunc paths)
and dummy scalar ops for testing.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
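The backend-dependent fuseability rule described in this commit could look roughly like the sketch below. It is an assumption-laden illustration, not the FusionOptimizer code: `FakeScalarOp`, `supports_c_code_sketch`, and `is_fuseable` are hypothetical names, the real supports_c_code may take different arguments, and the only details taken from the commit message are the two backend names, their tags, and the idea of caching the expensive C check.

```python
from functools import lru_cache

# Tags named in the commit message, keyed by backend.
BACKEND_TAGS = {"c": "cxx_only", "numba": "numba"}


class FakeScalarOp:
    """Hypothetical scalar op used only for this sketch."""

    def __init__(self, name, has_c_code):
        self.name = name
        self.has_c_code = has_c_code


@lru_cache(maxsize=None)
def supports_c_code_sketch(op):
    # Stand-in for an expensive check; lru_cache ensures each op is
    # inspected only once (ops are hashed by identity here).
    return op.has_c_code


def is_fuseable(op, backend):
    if backend not in BACKEND_TAGS:
        raise ValueError(f"Unknown fusion backend: {backend!r}")
    if backend == "c":
        # C fusion requires a C implementation for the scalar op.
        return supports_c_code_sketch(op)
    # Numba fusion fuses unconditionally.
    return True


with_c = FakeScalarOp("add", has_c_code=True)
py_only = FakeScalarOp("custom", has_c_code=False)

c_results = (is_fuseable(with_c, "c"), is_fuseable(py_only, "c"))
numba_results = (is_fuseable(with_c, "numba"), is_fuseable(py_only, "numba"))
```

Rejecting any other backend string mirrors the statement that the parameter is either "c" or "numba"; how the real rewrite handles an unexpected value is not stated in the source.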
Closes # ?
This depends on the frozen Apply PR so Composite doesn't require C to even compute hash/eq!