Make fusion rewrite backend specific #2004

Draft
ricardoV94 wants to merge 7 commits into pymc-devs:v3 from ricardoV94:fusion_backend_specific

@ricardoV94
Member

Closes # ?

This depends on the frozen Apply PR, so that Composite no longer requires C even to compute hash/eq.

  • Make ScalarOp.impl an abstract method
  • Clean up ScalarInnerGraphOp py_perform_fn to use impl directly
  • Clean up Elemwise perform method
  • Add regression test for Composite with ops without C code
  • Make FusionOptimizer backend-specific (C and Numba)

ricardoV94 and others added 7 commits March 26, 2026 10:10
All concrete ScalarOp subclasses already define impl (either directly
or via ScalarInnerGraphOp). Making it abstract enforces this at
instantiation time rather than at call time.
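
To illustrate the difference between failing at instantiation time and failing at call time, here is a minimal sketch using Python's standard `abc` module. The class names (`ScalarOpSketch`, `AddSketch`, `MissingImplSketch`) and the helper `can_instantiate` are hypothetical stand-ins, not the actual PyTensor classes, and the real `ScalarOp` may enforce abstractness through a different mechanism.

```python
from abc import ABC, abstractmethod


class ScalarOpSketch(ABC):
    """Hypothetical stand-in for a scalar op base class."""

    @abstractmethod
    def impl(self, *inputs):
        """Compute the op on plain Python/NumPy scalars."""


class AddSketch(ScalarOpSketch):
    # Concrete subclass: defines impl, so it can be instantiated.
    def impl(self, x, y):
        return x + y


class MissingImplSketch(ScalarOpSketch):
    # Forgets to define impl: instantiation itself raises TypeError.
    pass


def can_instantiate(cls):
    """Return True if cls() succeeds, False if it raises TypeError."""
    try:
        cls()
    except TypeError:
        return False
    return True
```

With this pattern, `can_instantiate(MissingImplSketch)` is false as soon as the object is created, instead of the error surfacing later when `impl` is first called.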

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Change py_perform_fn to convert inner ops via op.impl instead of wrapping
op.perform with storage allocation. This removes per-element storage
allocation overhead in Composite.impl and Composite.perform.
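
The idea of chaining inner ops through their scalar `impl` callables, rather than through `perform` with output storage cells, can be sketched roughly as follows. The function `compose_impls` and its step encoding are assumptions made for illustration only; they are not the actual `py_perform_fn` implementation.

```python
import math
import operator


def compose_impls(n_inputs, steps, output_indices):
    """Build one Python function from a sequence of scalar impl callables.

    steps: list of (impl, arg_indices); each result is appended to the
    list of known values, so later steps can refer to it by index.
    """

    def fused(*inputs):
        if len(inputs) != n_inputs:
            raise TypeError(f"expected {n_inputs} inputs, got {len(inputs)}")
        values = list(inputs)
        for impl, arg_indices in steps:
            # Call impl directly on scalars; no storage cells are allocated.
            values.append(impl(*(values[i] for i in arg_indices)))
        return tuple(values[i] for i in output_indices)

    return fused


# Hypothetical inner graph: exp(x + y) * x
fused = compose_impls(
    n_inputs=2,
    steps=[
        (operator.add, (0, 1)),  # values[2] = x + y
        (math.exp, (2,)),        # values[3] = exp(x + y)
        (operator.mul, (3, 0)),  # values[4] = exp(x + y) * x
    ],
    output_indices=(4,),
)
```

Calling `fused(1.0, 2.0)` evaluates the whole inner graph with plain function calls on scalars.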

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Rewrite Elemwise.perform to use a single _create_node_ufunc method that
builds a closure with inplace logic, dtype handling, and ufunc selection:

- nfunc_spec path: uses numpy/scipy ufuncs directly with out= for inplace
  and sig= for discrete->float dtype coercion
- frompyfunc path (<=32 operands): C iteration loop via np.frompyfunc,
  .astype() for object->correct dtype conversion. No inplace (destroy_map
  is a permission, not an obligation; frompyfunc already allocates).
- Blockwise vectorize fallback (>32 operands): _vectorize_node_perform
  with inplace_mapping support
- Scalar (0-d) outputs without nfunc_spec: calls impl directly with
  np.asarray wrapper

Removes stale self.ufunc/self.nfunc attributes and __getstate__/__setstate__.
Renames fake_node to dummy_node. Adds out= and inplace_mapping parameters
to _vectorize_node_perform in blockwise.
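
As a rough illustration of two of the paths listed above, the sketch below uses only public NumPy calls. `scalar_impl` is a hypothetical per-element function; the snippet does not reproduce `_create_node_ufunc`, only the NumPy behaviour it appears to rely on: `np.frompyfunc` returns object arrays that need an `.astype()` conversion, and a native ufunc can write into an existing buffer via `out=`.

```python
import numpy as np


def scalar_impl(x, y):
    # Hypothetical per-element Python implementation of a scalar op.
    return x * y + 1.0


# frompyfunc path: NumPy handles broadcasting and iteration,
# but the result always has dtype=object.
elementwise = np.frompyfunc(scalar_impl, 2, 1)

a = np.arange(6, dtype="float64").reshape(2, 3)
b = np.array([1.0, 2.0, 3.0])

raw = elementwise(a, b)          # object array, broadcast to shape (2, 3)
result = raw.astype("float64")   # convert to the intended output dtype

# Native ufunc path: out= lets the result overwrite an input buffer.
buf = a.copy()
np.add(buf, b, out=buf)
```

The object-to-float conversion already allocates a new array, which is consistent with the commit's note that the frompyfunc path does not attempt to work in place.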

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Test that Composite.impl works with scalar ops that only define impl
(no c_code). This exercises the py_perform_fn path with impl_convert.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

FusionOptimizer now takes a backend parameter ("c" or "numba") that
determines which ops are fuseable:
- C fusion: "cxx_only" tag, checks scalar ops have C implementations
  (cached, since supports_c_code is expensive)
- Numba fusion: "numba" tag, fuses unconditionally
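
The two rules above can be sketched as a small predicate. Everything here is hypothetical: `ToyScalarOp`, `toy_supports_c`, and `can_fuse` are illustrative names, and `functools.lru_cache` is just one possible way to cache an expensive check; the actual FusionOptimizer may cache differently.

```python
from dataclasses import dataclass
from functools import lru_cache


@dataclass(frozen=True)
class ToyScalarOp:
    """Hypothetical scalar op; frozen so instances are hashable."""
    name: str
    has_c_code: bool


c_checks = []  # records which ops were actually inspected


@lru_cache(maxsize=None)
def toy_supports_c(op):
    # Stand-in for an expensive "does this op have a C implementation" check.
    c_checks.append(op.name)
    return op.has_c_code


def can_fuse(backend, scalar_ops):
    """Decide whether a group of scalar ops may be fused for a backend."""
    if backend == "numba":
        return True  # fuses unconditionally
    if backend == "c":
        return all(toy_supports_c(op) for op in scalar_ops)
    raise ValueError(f"unsupported backend: {backend!r}")


add = ToyScalarOp("add", has_c_code=True)
py_only = ToyScalarOp("py_only", has_c_code=False)
```

In this sketch, `can_fuse("c", [add, py_only])` is false while `can_fuse("numba", [add, py_only])` is true, and repeated C checks on `add` hit the cache rather than re-running the inspection.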

Python mode does not benefit from fusion: frompyfunc's C iteration loop
is faster than fused Composite.impl per-element overhead.

Also adds py-mode perform benchmarks (nfunc_spec vs frompyfunc paths)
and dummy scalar ops for testing.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>