Compound NumPy / linalg functions over-count FLOPs via internal ufuncs on WhestArray inputs

Identified in #67 — flagged as a known limitation in the PR's "Explicitly out of scope" section.

### Symptom

A user-facing `np.linalg.cond(whest_array)` is correctly routed to `we.linalg.cond` by the `__array_function__` allowlist (added in #67). For compound functions where whest does **not** have its own dedicated wrapper, however, the protocol path strips the input to a plain `np.ndarray` and lets `_np.<func>(stripped)` run. That call internally fires multiple ufunc dispatches, none of which re-enter the protocol path, because the operands are now plain ndarrays. Whest's wrapper still charges its dense outer cost, but the internal ufunc ops run uncounted and add nothing. Net: the charged total can under-count or over-count relative to the ufunc-level decomposition, depending on how the wrapper estimates cost.

The risk surface today is small because #67's allowlist routes most compound ops to whest's own wrappers. But any path that ends in a raw `_np.<func>(stripped, ...)` call (e.g. fallbacks, ops we deliberately don't model, future numpy additions) can hit this.
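The stripping mechanism described above can be reproduced without whest. The sketch below uses a hypothetical `CountingArray` class (not part of whest) that counts ufunc dispatches and, like the strip-to-base pattern, unwraps its inputs before delegating an `__array_function__` call to NumPy. It only illustrates why internal ufuncs go unseen after stripping; it does not model whest's actual cost accounting.

```python
import numpy as np


class CountingArray:
    """Hypothetical stand-in for WhestArray; counts protocol dispatches."""

    ufunc_calls = 0      # times a ufunc reached __array_ufunc__
    function_calls = 0   # times a NumPy function reached __array_function__

    def __init__(self, data):
        self.data = np.asarray(data, dtype=float)

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        CountingArray.ufunc_calls += 1
        stripped = [x.data if isinstance(x, CountingArray) else x for x in inputs]
        return CountingArray(getattr(ufunc, method)(*stripped, **kwargs))

    def __array_function__(self, func, types, args, kwargs):
        # Assumed analogue of the strip-to-base pattern: unwrap, then call NumPy.
        CountingArray.function_calls += 1
        stripped = [x.data if isinstance(x, CountingArray) else x for x in args]
        return func(*stripped, **kwargs)


a = CountingArray(np.ones((3, 3)))

np.add(a, a)                      # direct ufunc: enters __array_ufunc__
ufuncs_after_add = CountingArray.ufunc_calls

norm_value = np.linalg.norm(a)    # compound op: runs on stripped ndarrays
ufuncs_after_norm = CountingArray.ufunc_calls

print(ufuncs_after_add, ufuncs_after_norm, CountingArray.function_calls)
```

The ufunc counter does not move during `np.linalg.norm`, even though NumPy performs arithmetic internally: once the operand is a plain ndarray, nothing routes back through the wrapper class. Whatever the wrapper charges at the `__array_function__` level is therefore the only cost recorded for that call.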

### Reproducer (illustrative — needs a real-world hit to confirm)

```python
import numpy as np
import whest as we
A = we.random.randn(50, 50)
with we.BudgetContext(flop_budget=int(1e10)) as bc:
    np.linalg.cond(A)             # routes via whest's wrapper — counted correctly
    # In contrast, anything that ends in `_np.<compound>(_to_base_ndarray(A))`
    # internally without a dedicated whest wrapper will double-charge or under-charge.
```

A concrete reproducer would help here — whoever picks this up should grep for `_to_base_ndarray(...)` followed by `_np.<func>(...)` in `linalg/`, `_pointwise.py`, etc., construct a per-call-stack model, and identify which call paths over- vs. under-charge.
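As a starting point for that grep, a small AST scan may be more reliable than a line-based search, since the strip and the `_np.<func>(...)` call often sit on different lines. The sketch below assumes the helper is called as a bare `_to_base_ndarray(...)` and that NumPy is imported as `_np`, as written in this issue; the function name `find_strip_then_np_calls` and the sample source are made up for illustration.

```python
import ast


def find_strip_then_np_calls(source):
    """Return (function_name, np_call) pairs for functions that call
    `_to_base_ndarray` and also call something under the `_np` namespace."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        strips = False
        np_calls = []
        for call in ast.walk(node):
            if not isinstance(call, ast.Call):
                continue
            name = ast.unparse(call.func)  # requires Python 3.9+
            if name == "_to_base_ndarray":
                strips = True
            elif name.startswith("_np."):
                np_calls.append(name)
        if strips:
            hits.extend((node.name, n) for n in np_calls)
    return hits


# Made-up sample source, not taken from the whest code base.
sample = '''
def cond(x):
    base = _to_base_ndarray(x)
    return _np.linalg.cond(base)

def add(x, y):
    return _np.add(x, y)
'''

candidates = find_strip_then_np_calls(sample)
print(candidates)
```

This only flags candidates: it does not check that the `_np` call actually receives the stripped array, and calls inside nested functions are reported for both the inner and the outer function. Each hit still needs a manual look to decide whether it over- or under-charges.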

### Suggested approach

1. Audit every site where whest charges a wrapper-level FLOP cost AND then calls a `_np.<compound>(stripped)` function. Compare the wrapper cost to the sum of ufunc costs the underlying numpy implementation would charge if every ufunc entered the protocol.
2. Where they don't match, either:
   - Match numpy's internal-ufunc count by adjusting the wrapper cost formula, **or**
   - Add a dedicated whest wrapper that drives the compound op via whest-counted primitives (preferred when feasible).
3. Add a regression test that pins wrapper-cost ≈ sum-of-internal-ufunc-cost for each audited op.
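The comparison in steps 1 and 3 could be expressed as a small classifier that a regression test asserts against. This is a sketch only: `classify_cost`, `audit_table`, the 5% tolerance, and the op names and FLOP numbers are all placeholders, and obtaining the two costs from whest is left to whoever does the audit.

```python
import math


def classify_cost(wrapper_cost, internal_ufunc_cost, rel_tol=0.05):
    """Compare a wrapper-level charge to the summed internal-ufunc cost."""
    if math.isclose(wrapper_cost, internal_ufunc_cost, rel_tol=rel_tol):
        return "match"
    return "over" if wrapper_cost > internal_ufunc_cost else "under"


def audit_table(costs, rel_tol=0.05):
    """`costs` maps op name -> (wrapper_cost, internal_ufunc_cost)."""
    return {op: classify_cost(w, u, rel_tol) for op, (w, u) in costs.items()}


# Placeholder numbers purely for illustration; not measured from whest.
example_costs = {
    "op_a": (1000, 1000),
    "op_b": (2000, 1000),
    "op_c": (500, 1000),
}

table = audit_table(example_costs)
for op, verdict in table.items():
    print(f"{op}: {verdict}")
```

A regression test for an audited op would then assert that its verdict is `"match"`, or pin the known verdict alongside a comment justifying it.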

### Acceptance criteria

- A documented audit table listing each compound op and whether its current cost formula over-counts, under-counts, or matches numpy's internal-ufunc decomposition.
- For each over-counter, a fix or an explicit comment justifying the over-count.

### Related

- PR #67 — Stage 4 (`__array_function__` allowlist) routes most compound ops correctly; this issue is the residual gap.
- PR #51 — established the strip-to-base pattern that creates this risk.
- Related Einsum-Cost work tracked in #32.

