[Draft] Fix off-by-one in reduction FLOP counts#57
Conversation
Per code review: the parenthetical '(g.axes ⊂ K)' could be read as the definition of inner-clean rather than the negation condition. Rephrase to state the rule (g.axes ⊆ R) and why it fails here.
Reductions now charge unique_out × (u_R − 1) instead of prod(input_shape). Symmetry-aware via propagate_symmetry_reduce (output) and inner-clean Burnside (input per output slice). Clamped to 1 for degenerate shapes. Closes #56.
- Move propagate_symmetry_reduce to top-level import (no circular dep; _symmetric.py does not import _flops.py). - Raise ValueError for scalar input with explicit axis instead of crashing with IndexError deeper in the call stack. - Tighten type hints to list[PermutationGroup] | None via TYPE_CHECKING. - Document the reduced_axes=None fallback in _compute_R_unique_count. - Add regression test for the scalar+axis ValueError.
All downstream tests that embed the old reduction cost formula (cost = numel) now expect the corrected cost (numel − 1 for full reduction, unique_out × (u_R − 1) with clamping otherwise).
Minor polish from code review: explain where literals like 14, 12, 2 come from so future readers don't have to reverse-engineer them. Also updates a stale "numel(input)" sub-bucket comment to match the new formula.
The notes previously claimed cost_multiplier=2, but the code does not pass cost_multiplier to the reduction cost path. The 2× factor actually comes from weight=2.0 in weights.csv. Update notes to reflect reality.
Three new tests in TestReductionSymmetry exercise the runtime path (we.sum on SymmetricTensor with axis/tuple-axis) for the three sym cases: split pair (no inner savings), preserving (sym survives on output), and inner-clean (sym entirely within reduced axes).
The attached cost-description label on reduction wrappers (sum, mean, std, etc.) now reflects the corrected off-by-one formula: "numel(input) - 1" instead of "numel(input)". User-facing docstrings now match the actual runtime behavior.
|
@wiwu2390 Reviewing more carefully — the "first element is a free copy" argument from #56 only structurally applies to single-fold ops. For multi-phase reductions (mean, std/var, ptp, …) the structural savings are either 0 or multiple, and uniform The argument's actual scope"Free first element" is a property of a single fold: Per-op structural check
Uniform Proposed narrowingKeep
Revert to
|
|
I think the formula
Example: Discussion here: https://alignmentresearch.slack.com/archives/C0AFESY004U/p1777064701467469?thread_ts=1776713328.350929&cid=C0AFESY004U |
|
I haven't checked this PR's behavior on non- |
|
Putting this on hold until the updated einsum-cost-calculation PR lands on main - same partition-counting logic, better to share than duplicate. Will need a careful re-evaluation of this PR on top of the shared helpers once the einsum-cost-calculation is in. |
Summary
A length-
nreduction chargesn − 1FLOPs, notn— the first input is afree copy, only the remaining
n − 1are accumulations.The fix applies to every reduction path:
SymmetricTensorFormula
Where:
unique_out— unique outputs afterpropagate_symmetry_reduce+ Burnside.u_R— unique inputs per output slice, via Burnside over inner-cleangroups only (
g.axes ⊆ R).g.axes ⊆ K): only shrinkunique_out.max(..., 1)— clamps scalar / size-1-axis / empty shape to 1 FLOP.Tests
uv run pytest tests/→ 3188 passed, 87 skipped, 0 failures.(4 pre-existing
scipyImportErrors unrelated.)Closes
Closes #56
Try it
Install from this branch:
Dense reductions —
n − 1per output slice:Symmetric reductions — full / preserving / split / inner-clean:
Targeted test runs: