WIP fuse elemwise+careduce #2200

Draft
ricardoV94 wants to merge 4 commits into pymc-devs:main from ricardoV94:fuse_reduction

@ricardoV94

Member

No description provided.

The fused loop writes the elementwise result to the write buffer, never to
the inplaced input, but the inner fgraph still kept the inplace Elemwise.
The Python-mode fallback (OpFromGraph.perform) would therefore destroy that
input without the outer destroy map declaring it, which loses the ordering
constraint for other readers of the destroyed buffer. The JIT path was
unaffected, because write buffers shadow the inplace pattern in make_outputs.

Write-and-direct duplication now runs before the strip and preserves the
inplace pattern, so an inplace on an output that stays materialized (the
write consuming a duplicate) still survives the fusion.
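
The hazard described above can be illustrated with a minimal pure-Python sketch. It is not PyTensor code: `exp_inplace`, `exp_to_write_buffer` and `run` are hypothetical names, and plain lists stand in for buffers. It only assumes that, without a declared destroy map, nothing orders another reader of the input before the inplace op.

```python
import math


def exp_inplace(buf):
    """Elementwise exp that overwrites its input (the inplace pattern)."""
    for i, v in enumerate(buf):
        buf[i] = math.exp(v)
    return buf


def exp_to_write_buffer(buf):
    """Elementwise exp that writes to a separate buffer; the input is untouched."""
    return [math.exp(v) for v in buf]


def run(elemwise, x):
    # Hypothetical schedule: the elementwise op runs first because no
    # declared destroy map forces the other reader of `x` to run before it.
    y = elemwise(x)
    other_reader = sum(x)  # expects the original values of `x`
    return y, other_reader


# The inplace variant silently corrupts what the other reader sees.
_, seen_after_inplace = run(exp_inplace, [0.0, 0.0])
_, seen_after_write_buffer = run(exp_to_write_buffer, [0.0, 0.0])
```

With the inplace variant the second reader sums the already-exponentiated values, while the write-buffer variant leaves the input as it was.
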

An output consumed by several eligible CAReduces was previously disqualified
entirely, because the detection required exactly one reduce client. Now peel
one extra reduction per rewrite pass onto a duplicate output until each
reduction has its own copy, so that e.g. [sum(f), max(f), prod(f), f] becomes
a single FusedElemwise with three fused reductions.
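
As a rough illustration of what a single fused loop for [sum(f), max(f), prod(f), f] computes, here is a pure-Python sketch. It is not the generated kernel or the rewrite itself; `fused_elemwise_reductions` is a hypothetical name and `f` defaults to `math.exp` purely for the example.

```python
import math


def fused_elemwise_reductions(x, f=math.exp):
    """One pass over x computing f(x) plus sum, max and prod of f(x)."""
    out = []             # materialized elementwise output, f
    total = 0.0          # identity element for sum
    largest = -math.inf  # identity element for max
    product = 1.0        # identity element for prod
    for v in x:
        fv = f(v)        # evaluated once, shared by all three reductions
        out.append(fv)
        total += fv
        largest = max(largest, fv)
        product *= fv
    return total, largest, product, out


total, largest, product, out = fused_elemwise_reductions(
    [1.0, 2.0, 3.0], f=lambda v: 2.0 * v
)
```

The point of the sketch is that the elementwise value is evaluated once per element and feeds every accumulator, instead of each reduction re-reading a materialized intermediate.
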

sum(x[idx]) had no Elemwise for FuseElemwise to anchor on, so the gather was
materialized and the reduction stayed external. A new pre-rewrite wraps such
reductions in an identity Elemwise (covering the bare AdvancedSubtensor1,
axis-swap DimShuffle and flattened-ND-index Reshape forms), letting gather,
identity and reduction collapse into one fused loop.
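
The difference between the external reduction and the fused form of sum(x[idx]) can be sketched in pure Python. This is an illustration only, with hypothetical function names and lists in place of tensors; it covers just the 1D integer-index case.

```python
def unfused_gather_sum(x, idx):
    """Materialize the gather x[idx] first, then reduce it."""
    gathered = [x[i] for i in idx]  # intermediate buffer of len(idx)
    return sum(gathered)


def fused_gather_sum(x, idx):
    """Gather, identity and accumulate in one loop, with no intermediate."""
    total = 0.0
    for i in idx:
        value = x[i]    # gather
        total += value  # identity elementwise step, then reduction
    return total


x = [10.0, 20.0, 30.0]
idx = [2, 0, 2]
fused = fused_gather_sum(x, idx)
unfused = unfused_gather_sum(x, idx)
```

Both functions return the same value; the fused one simply never allocates the gathered intermediate.
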