Symmetry-aware einsum cost rewrite + de-vendor opt_einsum#91
Open
spMohanty wants to merge 135 commits into
Open
Symmetry-aware einsum cost rewrite + de-vendor opt_einsum#91spMohanty wants to merge 135 commits into
spMohanty wants to merge 135 commits into
Conversation
Adds empty stub modules for the new symmetry-aware einsum cost machinery that mirrors website/components/symmetry-aware-einsum-contractions/engine/. Subsequent tasks fill in each module.
H = Stab_G(V)|_V helpers and canonical tuple operations. Direct port of website/components/symmetry-aware-einsum-contractions/engine/outputOrbit.js with Python idioms (string key dedup, Sequence types, no JS Map/Set).
…perms The previous test passed two identical permutations, only proving hash dedup on identical inputs. Replace with two globally distinct permutations whose restrictions to V both collapse to the local identity (kernel of restriction is non-trivial). This actually validates the invariant claimed in restrict_stabilizer_to_positions's docstring: |H| <= |Stab_G(V)| because distinct g, g' in Stab_G(V) can yield the same g|_V.
Direct port of sizeAware/burnside.js. Validates cycle-size invariants (all labels in a cycle share a size) and Burnside-sum divisibility.
Direct port of shapeLayer.js. Four shapes: trivial / allVisible / allSummed / mixed. Diagnostic label that accompanies the regime ID.
…n.py Falling factorial, partition normalization, typed-partition enumeration with domain-class restriction (only same-sized positions can merge), labeling counts. Cached by sizes tuple via lru_cache(maxsize=256). Orbit-rep / induced-block-action utilities follow in the next commit.
Completes the port of partition/typedPartitions.js. Partition-orbit reps, induced-block-permutation (uses IMAGE on blocks, not raw stabilizer order), prefix-map dedup, output-side orbit count under H. These are the pieces the partitionCount regime calls per typed partition.
partition_budget (default 100_000): caps typed-partition enumeration in the partitionCount regime. Components exceeding the budget fall back to dense cost. dimino_budget (default 500_000): caps whole-expression G_pt closure to bound pathological declared-symmetry inputs. Also adds set_setting() as a thin public wrapper around configure() and a minimal _VALIDATORS map (used only for the two new budget keys) that rejects negative integers at set time.
RegimeContext, Verdict, RegimeOutput, Regime, RegimeStep, AccumulationResult. All frozen dataclasses; Literal types for regime_id and shape match the JS engine's string vocabulary.
Direct port of regimes/functionalProjection.js. Fires when every g preserves V as a set; computes alpha = M via size-aware Burnside. Covers JS appendix B.2 (allVisible), B.3 (allSummed), and B.4 (mixed-but-functional).
Direct port of regimes/singleton.js. Closed-form weighted Burnside + inclusion- exclusion over the visible label's G-orbit. Used for the |V|=1 case after functional-projection refuses (i.e. when projection branches on the single output coordinate's orbit).
Direct port of regimes/young.js. Closed-form multiset formula alpha = C(n+|V|-1, |V|) * C(n+|W|-1, |W|) when G is the full symmetric group on L_c, both V and W are nonempty, |V| >= 2, and all sizes agree.
Direct port of regimes/partitionCount.js. Iterates typed equality patterns up to G-equivalence; per-pattern contribution is (typed_labelings / |Ḡ_x̃|) * |A_x̃ / H|. Sub-trace records per-partition counts for diagnostic display + parity tests. This is the general fallback that handles mixed-shape cases the closed-form regimes (singleton, young, functionalProjection) refuse.
Direct port of accumulationCount.js. Three-stage dispatch: 1. trivial short-circuit (|G| <= 1) 2a. functionalProjection priority (covers allVisible/allSummed/mixed-functional) 2b. mixed ladder: singleton -> young -> partitionCount Fallthrough returns regime_id='unavailable' (brute-force disabled by policy). Trace captures every refused regime with its reason for debugging.
Direct port of algorithm.js#buildBipartite + buildIncidenceMatrix. One U-vertex per axis of each operand (no axis merging — per-operand symmetry handled by the wreath enumeration in the next task). Column fingerprints and fp_to_labels reverse map for derive_pi_canonical.
Direct port of wreath.js. Enumerates Pi_i (H_i wr S_{m_i}) where H_i is
operand i's declared axis symmetry and m_i is its multiplicity. Produces
row permutations on U-vertices for the sigma-loop. Supports None / symmetric
/ cyclic / dihedral / SymmetryGroup-typed declarations.
Direct port of algorithm.js#runSigmaLoop and derivePi. For each wreath element sigma, applies sigma to the incidence matrix, derives a label permutation pi via column-fingerprint matching, classifies pi as identity / v-only / w-only / cross-v-w. Cross-v-w actions are valid (a deliberate deviation from the deprecated partition-preserving rejection).
Direct port of fullGroup.js#buildFullGroup and algorithm.js#classifyGroupName.
Collects valid pi's, dedupes by array form, greedy minimal-generating-set
selection, dimino closure, classifies the resulting group name (S_n{...},
C_n{...}, D_n{...}, Z2{...}, S2{...}, or PermGroup<...>).
Direct port of componentDecomposition.js. Builds the label-interaction graph from G_pt's generators (labels coupled by any single generator), unions via union-find, restricts each generator to each component, runs dimino on the restriction, classifies the per-component group name. Each Component carries its own labels, va/wa split, sizes, visible_positions, generators, and elements ready for the regime ladder.
ComponentCost wraps a per-component AccumulationResult with M-side Burnside, dense_count, and group metadata. run_ladder_per_component is the pure transformation that both einsum and future reduction code paths reuse.
aggregate_einsum implements total = (k-1) * prod(M) + prod(alpha). When any component is unavailable, total falls back to k * dense_baseline (no-symmetry direct-event count) and a CostFallbackWarning fires. Gaming-resistant: exceeding partition budget never lowers the charge.
Wires the whole detection + decomposition + ladder + aggregation pipeline. Inputs: subscripts, shapes, per-operand symmetries, identity pattern, output subscript. Output: AccumulationCost with per-component breakdown and total.
Body raises NotImplementedError pointing to a future sprint. Locking the signature now lets us reuse run_ladder_per_component and decompose_into_components unchanged for ufunc.reduce when that sprint lands.
LaTeX strings for each regime + shape are stored as module-level dicts and looked up by describe(). Keeps dataclass instances small while preserving diagnostic strings for users / IDE completion.
Pure inspection function: extracts per-operand SymmetryGroup from SymmetricTensor inputs, builds an identity_pattern from id() groupings, delegates to compute_accumulation_cost. Re-exported from flopscope as einsum_accumulation_cost, AccumulationCost, ComponentCost, RegimeStep.
Adds an accumulation field plus a property-based optimized_cost that returns the accumulation total when attached. Falls back to the inner PathInfo's legacy optimized_cost otherwise. __getattr__ forwards all other field accesses.
Path search now uses stock opt_einsum behavior (no SubgraphSymmetryOracle). The path cache keys only on (subscripts, shapes, optimize). Symmetry-aware cost computation moves to a separate accumulation cache wired in the next task. Test churn for tests/test_einsum*.py is handled in Tasks 34-37.
Mirrors _path_cache but caches AccumulationCost objects keyed by (canonical_subscripts, shapes, sym_fingerprint, identity_pattern). Decision Q1 in the spec: two separate caches so path-cache misses don't trigger accumulation recomputation and vice versa.
…imized_cost Path search remains stock opt_einsum (Task 26). Now the BudgetContext deduction uses the new whole-expression direct-event count from _get_accumulation_cost. PathInfo wraps in FlopscopePathInfo so .accumulation surfaces to callers.
…used) Path search no longer threads a symmetry oracle (Task 26). The whole SubgraphSymmetryOracle module + its test file are dead code. Deletion- safety test asserts the module and its public name can no longer be imported.
Adds _walk_path_and_aggregate which decomposes k>=3 einsums into binary contractions via opt_einsum.contract_path, computing per-step AccumulationCost by calling compute_accumulation_cost recursively for each binary step (k=2 path). Fixes Wilson's bug where a 3-operand chain ij,jk,kl->il was charging 29900 (fictitious 3-way cost) instead of 3800 (two binary matmuls at 2*n^3 - n^2 each). Full-expression per_component is preserved in the returned AccumulationCost so JS-parity tests remain unaffected.
Route _walk_path_and_aggregate binary sub-steps through get_accumulation_cost_cached so shared steps (e.g. "ij,jk->ik") hit the LRU cache across different top-level expressions. Add test_per_step_cache_hits_across_expressions to assert >=1 hit when two 3-operand chains share a binary sub-step.
…ulation_cost Per-step costs in build_path_info now call get_accumulation_cost_cached via symmetric_flop_count, making info.steps[i].flop_cost == info.accumulation.per_step[i].total by construction. Updated test_build_path_info expected values (60→105, 120→105) to reflect the accumulation formula (fma_cost-independent) and added parity test confirming the two layers agree.
…term The Task 7 refactor exposed a latent bug: aggregate_einsum never applied fma_cost() to the mu = (k-1)·M multiplication term, so configure(fma_cost=2) had no effect on accumulation-based costs. Fix: multiply mu by _fma_cost() in aggregate_einsum. The alpha − num_output_orbits accumulation term is intentionally NOT multiplied — accumulation adds are 1 op regardless of FMA convention. Also add fma_cost to the _accumulation_cache key so that calls under different fma_cost settings produce distinct cache entries instead of returning stale results. Updated tests: - test_build_path_info_uses_fma_two_when_configured: 105 → 165 (correct for fma=2) - test_fma_cost_in_path_cache_key: 8/16 → 12/20 (correct with alpha term) - test_fma_cost_affects_multiplication_term_only: new regression test (12 and 20)
…k 7 cascade) Fix real bug in _walk_path_and_aggregate: m_total was computed as the product of per-step intermediate m_total values, which always exceeds dense_baseline and makes _has_savings() return False for all multi-operand expressions. Fix uses prod(c.m for c in full_expression_component_costs) instead, which correctly reflects the unique output count of the full k-ary expression. Update 4 test assertions that hardcoded pre-path-aware single-step formula values (speedup 5x→2.778x, savings 80%→64%, optimized_cost 6380→20000 for triple S3 case).
…kes it unnecessary) inner.optimized_cost == accumulation.total by construction after §6.4 reconciliation; simple delegation to fmt() is sufficient. Update test to remove the now-invalid assertion that naive_cost must not appear in the header (naive_cost is not reconciled, only optimized_cost is).
Compute dense_flop_cost (helpers.flop_count baseline) and symmetry_savings (clamped to [0,1]) in the build_path_info per-step loop; add test asserting all steps have non-zero dense_flop_cost and in-range savings.
Adds _try_named_group, _fmt_generators, _fmt_sym, _fmt_step_sym, and _fmt_unique_dense helpers; replaces the stripped format_table body with main's full version including dense_flops, savings %, and symmetry columns.
Add missing _RICH_SYMMETRY_STYLES module-level dict and the _rich_symmetry_token_text / _rich_step_sym_text helpers, then replace the branch's stripped _rich_step_table with main's symmetry-aware version (dense_flops, savings, unique/total, and symmetry columns). Smoke test added for info.print(verbose=False/True).
…m main Also restores three helper files required by _paths.py: _subgraph_symmetry.py (529 lines), _symmetry.py (134 lines), _typing.py (37 lines). Updates test_deletion_safety.py to reflect that these modules are now present again.
…edy) from main Update test_deletion_safety.py: flip is-gone assertion to is-importable for _path_random.
…rent branch
- Cherry-pick 625 lines from origin/main:tests/test_opt_einsum_paths.py
- Fix 3 API-drift import errors:
* _parser.get_symbol -> opt_einsum.parser.get_symbol (deleted in 9c44177)
* _testing.build_shapes/rand_equation -> opt_einsum.testing (deleted same)
* _typing.* unchanged (still present on branch)
- Add PEP-562 __getattr__ hook to _opt_einsum/__init__.py exposing
oe._helpers / oe._paths / oe._path_random without shadowing the local
_helpers submodule (shadowing broke naive_cost calculation in 3 unrelated tests)
- Adapt 10 stale-assertion failures:
* test_custom_dp_can_set_minimize: update 7 expected FMA-2 costs to FMA-1 values
* test_custom_random_greedy / test_custom_branchbound / test_parallel_random_greedy:
remove opt_cost == optimizer.best["flops"] assertions with TODO comments
(FMA convention mismatch; path correctness is still verified)
125 passed, 2 skipped — full suite 0 failed.
…ggregate - Build SubgraphSymmetryOracle once per k>=3 einsum from per_op_symmetries and identity_pattern - For each binary step, query the oracle per input subset to derive sym_fingerprint and step_identity_pattern - Propagate step_identity_pattern (restricted from original expression's identity groups) to per-step cache calls - Sync inner.steps[i].flop_cost to acc_step.total in FlopscopePathInfo.from_inner to maintain reconciliation invariant - Regression test: ij,jk,ki->ijk with S2 symmetric A gives cost.total=104 < 128 (was 128 with dense intermediates) - Update tests for Task 17b tighter values: ij,ik,il->jkl 20000→11000, ijk,ai,bj->abk sym<dense
Insert a regime column (between subscript and flops) into PathInfo.format_table and the matching FlopscopePathInfo.__str__ renderer. The column shows the per-step regime id (trivial, functionalProjection, singleton, young, partitionCount, unavailable) drawn from accumulation.per_step[i].per_component[0]. _fmt_step_regime returns '-' defensively when _regime is absent.
Mirror format_table's regime column in the Rich variant. The column is shown conditionally (only when any step carries _regime, mirroring the any_unique / any_regime pattern) so the verbose-detail layout is not compressed on narrow terminals.
Attaches _acc_step to each StepInfo in FlopscopePathInfo.__str__, then reads m_total/alpha/num_output_orbits from it in both the plain-text format_table verbose branch and the Rich _rich_verbose_detail_text helper.
…attern Symmetric and dense operands sharing the same subscripts/shapes now get distinct _path_cache entries, preventing a dense-optimal path from being silently reused for symmetric inputs once symmetry-aware path search is enabled. The cache key is extended with a per-operand symmetry fingerprint (tuple of SymmetryGroup-or-None) and the identity_pattern. Test added to verify the symmetric call is always a cache miss relative to the dense call.
…tributes All accessed via __getattr__ hook returning upstream opt_einsum modules; pyright resolves them as object. Also add explicit importlib.util import in test_path_info_renderer.py to satisfy reportAttributeAccessIssue.
86f7952 to
feecd54
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Rewrite the symmetry-aware cost model to mirror the JS explorer's α/M direct-event model. Cost is now path-independent:
(k−1)·∏Mₐ + ∏αₐsummed across connected components, with a 5-regime ladder (trivial → functionalProjection → singleton → young → partitionCount). Applies to:fnp.einsum/einsum_path(symmetry-agnostic path search; α/M total via the new model)np.ufunc.reducefamily (sum,prod,max,min, ...) - Tier-1 reductions charge(n_input − n_output_orbits)accumulations;meanadds an_output_orbitsdividenp.median/np.percentile/np.quantile- Tier-2 discount (orbit-count selection)De-vendor opt_einsum. After (1) the fork wasn't doing anything custom for path search, so
opt_einsum>=3.3.0,<4.0.0becomes a runtime dep.flopscope._opt_einsum/is now a 1.3k-line shim (was 4.5k) that adapts upstream'sPathInfoand recomputes per-step FLOPs under flopscope's FMA convention.Public surface
flopscope.einsum_accumulation_cost(...)returns anAccumulationCostwith per-component breakdown, regime trace, anddescribe()for LaTeX.flopscope.reduction_accumulation_cost(input_shape, axes_summed, symmetry=None)- same model applied to ufunc reductions.path_info.accumulationfield oneinsum_path()results.fma_cost(default 1; set to 2 for textbook convention - uniform across all flopscope cost surfaces),partition_budget(100k),dimino_budget(500k).Breaking
path_info.optimized_costchanges for expressions with declared symmetry - now the whole-expression α/M total, not the old per-stepdense · unique/totalsum.flop_countreverts to dense; symmetry-aware total lives onpath_info.accumulation.total.np.ufunc.reducefamily (sum,prod,max,min, ...) now charges(n_input − n_output_orbits)accumulations instead ofn_input. For unsymmetric inputs this is the issue-Off-by-one in sum/mean reductions #56 off-by-one fix; for symmetric inputs the orbit count gives further savings.np.meancharges sum-cost +n_output_orbitsdivides (one per unique output element).np.median/np.percentile/np.quantilecharge a Tier-2 discount:n_output_orbitsselections instead of dense._flops.analytical_reduction_costandflopscope.accounting.reduction_costroute through the new model - same call signatures, different (lower) numbers for symmetric cases.FMA_COSTconstant gone →fma_cost()function.use_inner_symmetrysetting removed._opt_einsum/_paths.py,_path_random.py,_parser.py,_blas.py,_testing.py,_typing.py,_subgraph_symmetry.py,_symmetry.py.Migration notes in
CHANGELOG.md.Issues fully addressed
Closes Symmetric einsum FLOP counting: only count multiplications + use symmetry group of unsummed tensor #32 - Symmetric einsum FLOP counting. The α/M direct-event model is the architectural answer. It uses the full pointwise symmetry group
G_pt(visible and summed side; declared + identical-operand-swap-induced) and counts both unique multiplications and accumulation events. The model is path-independent by construction.Numerical expectations in the original issue reflect an earlier framing ("only count multiplications, use symmetry group of unsummed tensor"). The new model is strictly more comprehensive - for example, the issue's
einsum('ij,k->ik', B_sym2, C)at n=10 expects 550, the new model gives 1,550 (because it counts accumulation events on top of the 550 unique multiplications, and tracks symmetry on the full pointwise group). The shift supersedes the issue's specific numerical examples.Closes Off-by-one in sum/mean reductions #56 - Off-by-one in sum/mean reductions.
sum(A)forA.shape = (10,)now charges 9 flops (then − 1accumulations the issue asked for), not 10.meancharges sum-cost + 1 divide. Falls out of applying the α/M direct-event model to the reduction path: the first input element is a free copy, only the remainingn − 1accumulations cost.Issues partially addressed
Short-circuit einsum pre-cache symmetry/identity work before path lookup #26 - Short-circuit einsum pre-cache work.
_path_cachekey is now(subscripts, shapes, optimize, fma_cost)only - no per-op symmetry fingerprint, no identity-pattern grouping in the key. The pre-cache work for path search has been eliminated as a side effect of the symmetry-oracle removal._accumulation_cachepath insideeinsum(). The "no SymmetricTensor operands present" fast path inside_get_accumulation_costcould still skip fingerprint materialization entirely.Repeated-operand outer should use symmetry-aware FLOP counting #65 - Repeated-operand outer FLOP counting.
fnp.einsum("i,j->ij", v, v), the new α/M model returns a symmetry-aware count via the operand-swap-induced S₂{i,j} on the output.fnp.outer(v, v)itself is unchanged. The remaining work is to aliasouterrepeated-operand cases to the einsum cost path.Explicitly out of scope
einsum("ik,jl->ij", A, B)→einsum("i,j->ij", A.sum(axis=1), B.sum(axis=1))) is orthogonal to the α/M model. The model rigorously counts direct events on the full group action; the lowering captures algebraic equivalence on single-tensor reductions. The two could compose in a future pass.symmetrizecost model. Different op, not touched by this rewrite.Test plan
uv run pytest tests/accumulation/ -q→ 376 reduction + earlier accumulation tests pass (verified locally in 5.9s)uv run pytest tests/accumulation/test_js_parity.py -v→ 22/22 JS preset parity (verified)uv run pytest tests/test_reduction_integration.py tests/test_method_tracking.py -q→ reduction integration green