Skip to content

Intern SymmetryGroup instances by canonical action so equivalent groups share one cache #73

@spMohanty

Description

@spMohanty

Identified in #67 / #71. Complementary to #71's closed-form-order() shortcut.

Background

SymmetryGroup.order() (and elements()) memoize their results on the instance via _order / _elements slots (src/whest/_perm_group.py:327-331). On the same instance, the second .order() call is ~1μs.

But the cache is per-instance, not per-equivalence-class. SymmetryGroup __eq__ and __hash__ are value-based on the canonical action (src/whest/_perm_group.py:407-411), so two independently constructed S_n instances compare equal but each carry their own empty cache. That's the path PR #67's c58454d97 (fixed in ed50c3853) hit: direct_product_groups / restrict_group_to_axes allocate fresh SymmetryGroup objects on every _counted_ufunc_outer / tensordot call, so the composition pays the cold O(|G|) walk every time even though all the cost-model paths really want is a single canonical group per equivalence class.

Reproducer

import time, whest as we

a = we.SymmetryGroup.symmetric(axes=tuple(range(8)))
b = we.SymmetryGroup.symmetric(axes=tuple(range(8)))
print(a == b)        # True (value equality)
print(a is b)        # False (separate instances)
print(hash(a) == hash(b))  # True

t = time.perf_counter()
a.order()
print(f"a cold: {(time.perf_counter() - t) * 1000:.0f}ms")   # ~1800 ms

# Independently constructed b — pays the cold cost again:
t = time.perf_counter()
b.order()
print(f"b cold: {(time.perf_counter() - t) * 1000:.0f}ms")   # ~1800 ms (no shared cache)

Suggested approach

Intern SymmetryGroup instances by canonical action: a process-wide WeakValueDictionary[CanonicalAction, SymmetryGroup] (or equivalent) so every equal-by-canonical-action group resolves to the same Python object — thereby sharing one _order / _elements cache.

Sketch:

# src/whest/_perm_group.py
import weakref

_GROUP_INTERN: weakref.WeakValueDictionary[
    tuple,                     # canonical action key
    "SymmetryGroup",
] = weakref.WeakValueDictionary()

class SymmetryGroup:
    def __new__(cls, *generators, axes=None):
        # Build the prospective instance, compute its canonical action key,
        # then return the interned singleton when one already exists.
        prospective = super().__new__(cls)
        prospective._init_state(generators, axes=axes)        # the body of today's __init__
        key = prospective._canonical_axis_action()
        existing = _GROUP_INTERN.get(key)
        if existing is not None:
            return existing
        _GROUP_INTERN[key] = prospective
        return prospective

__init__ becomes a no-op for re-used instances (or a guard via an _initialized flag) since __new__ already populated state. All factories (.symmetric, .cyclic, .dihedral, .young, .from_generators, .direct_product) and helpers in _symmetry_utils.py (intersect_groups, direct_product_groups, restrict_group_to_axes, remap_group_axes, embed_group, broadcast_group) inherit interning automatically.

Implications and design considerations

  • __eq__ becomes identity-equivalenta == b iff a is b for interned instances. The existing value-based __eq__ would still work and remain a correctness fallback for non-interned sources (e.g. unpickled / cross-process / future code paths that bypass interning).
  • Hash stability__hash__ is already canonical-action-based, so the existing dict lookup logic works directly.
  • MemoryWeakValueDictionary lets unused groups GC naturally; only currently-referenced groups stay interned.
  • Thread safety — needs a lock around the __new__ get-or-insert, or rely on the GIL for CPython (fine for now).
  • MutationSymmetryGroup is supposed to be immutable, but verify nothing inside _perm_group.py rebinds _order / _elements after construction in a way that would corrupt the shared cache. (Today _dimino-driven population is the only mutation; that's idempotent and safe to share.)
  • Equality with non-interned instances — until every construction path goes through __new__'s intern-check, value-equality must keep working. Don't replace __eq__ with id-equality.

Relationship to #71

Two complementary fixes for the same family of perf issues:

Fix Helps when Cost
Closed-form order() for known kinds (#71) Group is constructed via a known factory (symmetric, cyclic, dihedral, young, direct_product of known kinds). Even fresh instances are cheap. A _known_kind tag and ~10 lines of factorial / gcd / product math.
Interning (this issue) Group is reconstructed identically across calls (any kind, including from_generators). Caches are shared. A WeakValueDictionary and a __new__ override.

They stack: with both in place, known-kind groups are O(1) regardless of identity; unknown-kind groups are O(|G|) once per equivalence class per process; pathological calls that keep producing new equivalence classes still hit the degree-12 bail from PR #67.

#71's closed-form shortcut is preferable as the first fix because it doesn't change any visible semantics. Interning is a slightly heavier change (touches __new__, locking, weak references) but covers cases #71 doesn't (e.g. groups loaded from disk via from_payload, or composed via from_generators).

Acceptance criteria

  • SymmetryGroup.symmetric(axes=tuple(range(8))) constructed twice resolves to the same Python object (a is b).
  • The second instance's .order() is O(1) (test with time.perf_counter()).
  • All existing tests in tests/test_symmetry_group_api.py, tests/test_perm_group_coverage.py, and tests/test_symmetry_transport.py pass unchanged.
  • Round-trip via to_payload / from_payload returns the interned canonical instance.
  • A new test pinning the interning behaviour in tests/test_symmetry_group_api.py.

Related

Metadata

Metadata

Assignees

No one assigned

    Labels

    area:coreCore whest API, counting, ndarray, and dispatch/wrapping pathsperformancePerformance optimization and runtime-sensitive improvementspriority:p2Nice-to-have, scheduledtopic:symmetrySymmetry metadata, inference, propagation, groups, and symmetric tensors

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions