OddSHAP approximator #522

Open
Sara-ne wants to merge 37 commits into mmschlk:main from FabianK-Dev:oddshap_approximator

@Sara-ne Sara-ne commented May 20, 2026

Motivation and Context

This PR adds a new OddSHAP approximator for estimating first-order Shapley values.
OddSHAP is based on the method by Fumagalli et al. (2026). The main idea is to estimate Shapley values through odd Fourier terms.


Implemented

  • OddSHAP-specific sampling weights
  • Regression weights for the weighted least-squares problem
  • Paired coalition sampling through the existing sampler
  • Used InterventionalTreeExplainer to select relevant higher-order odd interactions
  • Built the active odd Fourier support from:
    • the empty interaction,
    • all single-player interactions,
    • selected higher-order odd interactions
  • Solved the constrained odd Fourier regression problem
  • Converted the fitted odd Fourier coefficients into first-order Shapley values
  • Added a ValueError when the budget is too small

Notes

This implementation doesn't add a separate ProxySPEX-style adapter. Instead, it reuses the existing tree interaction code through InterventionalTreeExplainer.
The current implementation only supports odd_only=True, because the final Shapley value computation is based on odd-cardinality Fourier terms.


Public API Changes

  • No Public API changes
  • Yes, Public API changes (Details below)

How Has This Been Tested?

OddSHAP was verified at two levels: unit tests and a reproduction of the paper (arXiv:2602.01399). The reproduction checks our measured MSE against the values reported in the paper's Table 1.

  1. Unit tests:
    tests/shapiq/tests_unit/tests_approximators/test_approximator_oddshap.py (in this PR) — 59 tests, all passing, against ExactComputer ground truth:
  • Exact recovery of Shapley values at full budget
  • Monotone MSE convergence as the budget grows
  • Determinism under a fixed random_state
  • Low-budget ValueError path (no internal fallback approximator)
  • Approximator-interface conformance
  2. Paper reproduction: all four estimators (MSR, SVARM, PermutationSampling, OddSHAP) reproduce the paper's reported values within a factor of ~2–3 (the expected spread from RNG, hardware, and instance-set differences).
    Tabular value functions (Estate, Cancer, CG60, IL60, NHANES, Crime; d=15..101): OddSHAP runs and converges correctly across the full dimension range.

Checklist

  • The changes have been tested locally.
  • Documentation has been updated (if the public API or usage changes).
  • An entry has been added to CHANGELOG.md (if relevant for users).
  • The code follows the project's style guidelines.
  • I have considered the impact of these changes on the public API.

Sara-ne and others added 16 commits May 10, 2026 16:21
Two free-function helpers used inside OddSHAP.approximate:

- lgboost_to_fourier(model_dict): converts a fitted LightGBM model
  to its aggregated Fourier representation via per-tree DFS recursion
  (Gorji et al., arXiv:2410.06300).
- top_k_interactions(coeffs, k, odd=True): selects the top-k
  interactions by |coefficient|, optionally restricted to odd
  cardinality (per the OddSHAP Theorem 3.2 restriction).

Mirrors the interface that the OddSHAP paper code imports as
`from oddshap.proxyspex import lgboost_to_fourier, top_k_interactions`.

14 unit tests cover: top-k selection logic (odd filter, magnitude
sort, k-limit, edge cases); single-leaf / one-split / two-level-split
tree recursions with hand-computed expected coefficients; end-to-end
on fitted LightGBM (constant, linear, XOR targets); odd singletons
recovery via the full pipeline.
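
For illustration, the selection rule described in this commit message can be sketched as follows. This is a minimal sketch and not the PR's code: the function name `select_top_k_sketch` and the dict-of-tuples input format are assumptions made for the example.

```python
def select_top_k_sketch(coeffs, k, odd=True):
    """Keep the k interactions with the largest |coefficient|.

    coeffs maps a tuple of player indices to a Fourier coefficient
    (hypothetical input format chosen for this sketch).
    """
    candidates = [
        (interaction, value)
        for interaction, value in coeffs.items()
        # With odd=True, only odd-cardinality interactions are eligible.
        if not odd or len(interaction) % 2 == 1
    ]
    # Sort by magnitude, largest first, then truncate to k entries.
    candidates.sort(key=lambda item: abs(item[1]), reverse=True)
    return dict(candidates[: max(k, 0)])


coeffs = {(): 5.0, (0,): -3.0, (1,): 0.5, (0, 1): 4.0, (0, 1, 2): 2.0}
top_odd = select_top_k_sketch(coeffs, k=2)
top_any = select_top_k_sketch(coeffs, k=2, odd=False)
```

With the odd filter on, the empty interaction and the pair `(0, 1)` are skipped even though their coefficients are the largest in magnitude.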
Replaces Sara's TODO stub with a ProxySPEX-style screening pass:

  1. lgboost_to_fourier(surrogate.booster_.dump_model())
     converts the fitted LightGBM surrogate to its sparse Fourier
     representation (DFS recursion, one entry per encountered
     interaction).

  2. top_k_interactions(coeffs, k=n_candidate_interactions, odd=False)
     keeps the top-k entries by |coefficient|. Pre-filtered to
     cardinality >= 3 odd interactions so the budget is not spent on
     singletons (those are added unconditionally by _build_support).

Smoke test on SOUM(n in {6, 8, 10}) at full budget reaches the
regression branch and returns sensible odd higher-order interactions
(e.g. (1, 4, 5), (0, 3, 6), ...). All 14 existing adapter unit tests
still pass.
50 tests covering OddSHAP's algorithmic guarantees:

* Init contract — defaults, custom kwargs, attribute exposure
* Coalition-size sampling weights — shape, sum=1, zero boundaries,
  symmetry, paper formula 1/((n-1)*C(n-2,k-1))
* approximate() return-value contract — InteractionValues fields,
  baseline equals v(empty), estimation_budget recorded, estimated flag
* Constraint-system identities (exact, enforced by construction):
  - efficiency axiom: sum_i phi_i = v(N) - v(empty)
  - baseline: phi_empty = v(empty)
* Determinism — same seed -> bit-identical output; sub-budget seeds differ
* Branch routing via runtime_last_approximate_run keys
* ProxySPEX adapter integration — output is higher-order odd only,
  respects k limit, handles zero budget and missing surrogate
* _build_support invariants — empty + all singletons always present,
  even/singleton inputs dropped, unsorted tuples normalized
* Game-property tests on DummyGame — symmetry / efficiency on a
  closed-form game
* Convergence vs ExactComputer — xfail(strict=False) since OddSHAP
  is a sparse-recovery method (n=6 currently xpasses)
* Efficiency persists at sub-budget — by construction

Two xfails documented inline:
  - low-budget fallback path raises IndexError in shapiq.tree.explainer
    on constant LightGBM surrogates (tracked separately)
  - convergence on dense SOUM at full budget for n=8 (sparse-recovery
    method; tightens once SG-41 paired-sampling lands)

Results: 47 passed, 2 xfailed, 1 xpassed (3.3 s).
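
The `_build_support` invariants listed above (empty interaction and all singletons always present, even or singleton inputs dropped, unsorted tuples normalized) can be sketched as below. This is an illustrative re-implementation under those stated invariants only; the name `build_support_sketch` and the list-of-tuples representation are assumptions, not the PR's actual helper.

```python
def build_support_sketch(n_players, candidates=None):
    """Build an odd Fourier support: empty set, singletons, odd higher-order terms."""
    # The empty interaction and every singleton are added unconditionally.
    support = [()] + [(i,) for i in range(n_players)]
    seen = set(support)
    for interaction in candidates or []:
        normalized = tuple(sorted(interaction))  # normalize unsorted input
        # Drop singletons (already present) and even-cardinality interactions.
        if len(normalized) < 3 or len(normalized) % 2 == 0:
            continue
        if normalized not in seen:  # ignore duplicates
            seen.add(normalized)
            support.append(normalized)
    return support


support = build_support_sketch(3, [(2, 0, 1), (0, 1), (1,), (0, 1, 2)])
```

Here `(2, 0, 1)` and `(0, 1, 2)` collapse to the same normalized entry, so the support gains only one higher-order term.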
… changes

After Sara registered OddSHAP in shapiq.approximator.regression.__init__,
the top-level import of TreeExplainer in oddshap.py triggers a circular
import (regression -> oddshap -> tree.explainer -> explainer -> tree).
Moved that import inside _approximate_via_fallback where it is actually
used.

Test alignment with Sara's API changes:

  * odd_only=False is now explicitly rejected -- split that into
    test_init_rejects_odd_only_false and dropped the kwarg from
    test_init_custom_kwargs.

  * _select_odd_interactions now takes a budget keyword and bypasses
    the top-k truncation when budget >= 2**n. Updated the 4 existing
    call sites to pass budget=, and added
    test_select_odd_interactions_full_budget_returns_all_higher_order_odd
    to document the new full-budget short-circuit.

Result: 63 passed, 2 xfailed, 1 xpassed (no regressions; same 2 xfails
as before -- TreeExplainer fallback crash and n=8 dense SOUM
convergence).
  Cast the parity matrix to float before applying the Fourier sign transform.
  This prevents uint8 underflow where -1 became 255 and restores full-budget
  consistency against ExactComputer. Also removes obsolete xfail markers for
  the fallback and full-budget convergence tests.
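
The underflow described in this commit can be reproduced in isolation. The sketch below assumes the sign transform has the form `1 - 2 * parity`; the array names and the toy coalitions are made up for the example and are not taken from `oddshap.py`.

```python
import numpy as np

# Two coalitions and two support terms over three players, stored as uint8.
coalitions = np.array([[1, 0, 1], [1, 1, 1]], dtype=np.uint8)
support = np.array([[1, 0, 0], [1, 1, 1]], dtype=np.uint8)

# Parity of the overlap between each coalition and each support term.
parity = (coalitions @ support.T) % 2  # still uint8

# Unsigned arithmetic cannot represent -1, so it wraps around to 255.
wrapped = 1 - 2 * parity

# Casting to float first yields the intended +1 / -1 signs.
signs = 1.0 - 2.0 * parity.astype(float)
```

Every entry that should be -1 shows up as 255 in `wrapped`, which is large enough to distort any regression fitted on the resulting design matrix.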
  Correct the OddSHAP candidate interaction budget to follow the paper’s ceil(m / eta) rule and add coverage for the
  regression threshold boundary.

  Clean up OddSHAP implementation style and integration details, including stale comments, docstrings, unused
  compatibility kwargs, optional LightGBM import handling, and public approximator export ordering.
…r into oddshap.py, add method to separate sampling weights from kernel weights
Sara's latest commits on oddshap_approximator implement Max's feedback:

  - removes runtime_last_approximate_run measurement
  - replaces the low-budget fallback with an explicit ValueError
  - default interaction_detection switches from ProxySPEX to ProxySHAP
  - sampling weights are now uniform over non-boundary sizes
    (paper's 1/((n-1)C(n-2,k-1)) formula moved to the new
    _init_regression_kernel_weights_static and is used as the LSQ
    kernel weight, equivalent to KernelSHAP weights up to a global scale)

Test updates:

  * test_init_defaults: drop runtime_last_approximate_run assertion;
    expect interaction_detection == 'ProxySHAP'.

  * test_sampling_weights_match_paper_formula renamed to
    test_sampling_weights_uniform_over_non_boundary_sizes (new
    behaviour) plus a separate
    test_regression_kernel_weights_match_paper_formula that pins the
    paper formula on the LSQ kernel where it actually lives.

  * test_high_budget_takes_regression_path / test_low_budget_takes_fallback_path
    removed (runtime tracking gone, fallback path gone) and replaced by
    test_low_budget_raises_value_error.

  * Sara's added test_boundary_budget_takes_regression_path_... was
    still asserting on the deleted runtime dict — kept the n_candidate
    assertion, dropped the runtime check.

Result: 59 passed, 0 failed, 0 xfailed in 1.2 s. All adapter tests (14)
still pass. The convergence test that was previously xfailed on n=8
now passes cleanly thanks to Sara's Fourier sign fix.
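
The claim that the paper's kernel weight equals the KernelSHAP weight up to a global scale can be checked numerically. The sketch below uses the formula quoted in this commit message and the standard KernelSHAP kernel `(n-1) / (C(n,k) * k * (n-k))`; the function names are hypothetical and the uniform sampling weights are written as this message describes them, not copied from the implementation.

```python
from math import comb


def paper_kernel_weight(n, k):
    """Regression kernel weight from the commit message: 1/((n-1)*C(n-2,k-1))."""
    return 1.0 / ((n - 1) * comb(n - 2, k - 1))


def kernelshap_weight(n, k):
    """Standard KernelSHAP kernel for a coalition of size k."""
    return (n - 1) / (comb(n, k) * k * (n - k))


def uniform_sampling_weights(n):
    """Uniform over non-boundary sizes 1..n-1, zero for sizes 0 and n."""
    weights = [0.0] * (n + 1)
    for k in range(1, n):
        weights[k] = 1.0 / (n - 1)
    return weights


n = 8
ratios = [paper_kernel_weight(n, k) / kernelshap_weight(n, k) for k in range(1, n)]
sampling = uniform_sampling_weights(n)
```

The ratio is the same for every coalition size (it works out to `n / (n - 1)`), so both kernels lead to the same weighted least-squares solution.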
@Sara-ne Sara-ne marked this pull request as ready for review May 20, 2026 18:06
@mmschlk mmschlk marked this pull request as draft May 27, 2026 06:50
Sara-ne and others added 4 commits May 27, 2026 11:38
MSRBiased was listed in SV_APPROXIMATORS and __all__ but never defined or
imported (leftover from a main merge), so importing shapiq.approximator raised
NameError. Removing the two dangling references restores the import.
42logos commented Jun 8, 2026

Contributor

Reproduction & benchmark for OddSHAP (Task 3 + Task 4)

The paper reproduction and the cross-method benchmark for this approximator live on the companion branch wu/oddshap-repro, commit 317b77a8 (branched off oddshap_approximator). It adds:

  • examples/approximators/plot_oddshap_paper_reproduction.py — a gallery example reproducing the paper's Table 1/3 (Fumagalli et al., arXiv:2602.01399) on all six tabular value functions (Cancer, Estate, CG60, IL60, NHANES, Crime; d = 15..101).
  • notebooks/oddshap_reproduction_and_benchmark.ipynb — an executed notebook covering Task 3 (paper reproduction) and Task 4 (cross-method benchmark on synthetic SOUM games), driving benchmark/performance.py.
  • benchmark/ — the conformance/benchmark suite (run_sweep over the SV approximators against ExactComputer ground truth).

Key points:

  • No external shap dependency. The exact ground truth uses shapiq's own InterventionalTreeExplainer(index="SV", max_order=1). It was cross-checked against the reference SHAP interventional TreeExplainer and agrees to within 1e-8 per feature on every value function, including the missing-value-heavy NHANES (using native-NaN handling, no imputation).
  • Accuracy (Task 3). Over N = 30 local explanations per value function at a budget of ~100·d, OddSHAP is rank-1 on all 6 value functions (average rank 1.00) against MSR, SVARM, PermutationSampling, KernelSHAP, k-additive SHAP, and RegressionMSR. The four real-data value functions match the paper's reported error scale; the two synthetic ones (CG60/IL60) use shapiq's own generators, so absolute MSE differs while the ranking is unchanged.
  • Benchmark (Task 4). On SOUM games (n = 10, 3 seeds) with closed-form exact Shapley values, OddSHAP's MSE collapses to numerical zero at ~1/4 of the exact budget, ahead of all sampling baselines.

A 7-page external report with the full configuration, Table 1 (median + IQR), and all figures accompanies this (shared by email).

42logos and others added 4 commits June 10, 2026 04:02
fix(approximator): remove stale MSRBiased reference that breaks import
The interaction screening selected the candidate odd support by Shapley
Interaction Index magnitude (InterventionalTreeExplainer, index=SII), while
the downstream regression is solved in the Fourier basis. The paper
(Algorithm 1 / 'Controlling Higher-Order Terms') specifies extracting the
odd-sized Fourier interactions with the highest magnitudes from the fitted
GBT, following ProxySPEX.

Screening now converts the LightGBM surrogate via convert_tree_model and
reuses ProxySPEX's exact GBT-to-Fourier extraction (_sklearn_to_fourier)
instead of re-implementing it, so the support is selected in the same basis
the odd regression is solved in. The SII route also enumerated interactions
up to order n (combinatorial cost the Fourier extraction avoids).

Also removes parameters that were never read (regression_basis,
interaction_detection, proxy_max_order) and the now-unused arguments of
_select_odd_interactions. N=30 reproduction: OddSHAP stays rank-1 on all six
tabular value functions with uniformly lower median MSE (-17% to -34%).
fix(oddshap): restore the paper's Fourier-coefficient screening (reuse ProxySPEX extractor)
@Sara-ne Sara-ne marked this pull request as ready for review June 10, 2026 08:08
Sara-ne commented Jun 10, 2026

Author

Hi @mmschlk this PR is ready for review now

Copilot AI left a comment

Contributor

Pull request overview

This PR introduces a new OddSHAP regression-based approximator to estimate first-order Shapley values using odd-only Fourier regression with a LightGBM surrogate, and wires it into the public shapiq.approximator exports. It also adds a dedicated unit test suite and cites the corresponding 2026 paper in the documentation references.

Changes:

  • Added OddSHAP implementation (src/shapiq/approximator/regression/oddshap.py) including sampling weights, regression kernel weights, support construction, constrained regression, and Shapley transformation.
  • Added comprehensive unit tests for OddSHAP behavior and invariants.
  • Exposed OddSHAP via regression/approximator __init__.py and added the paper citation to references.bib.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 6 comments.

Show a summary per file
File | Description
tests/shapiq/tests_unit/tests_approximators/test_approximator_oddshap.py | Adds unit tests covering initialization, weighting, constraints, determinism, and budget handling.
src/shapiq/approximator/regression/oddshap.py | Implements the OddSHAP approximator (sampling, interaction screening, constrained regression, SV transform).
src/shapiq/approximator/regression/__init__.py | Re-exports OddSHAP from the regression approximators package.
src/shapiq/approximator/__init__.py | Exposes OddSHAP (and updates SV approximator registry exports).
docs/source/references.bib | Adds BibTeX entry for the OddSHAP paper (Fumagalli et al., 2026).

Comment thread src/shapiq/approximator/regression/oddshap.py
Comment thread src/shapiq/approximator/regression/oddshap.py
Comment thread src/shapiq/approximator/regression/oddshap.py Outdated
Comment thread docs/source/references.bib
codecov Bot commented Jun 10, 2026

Codecov Report

❌ Patch coverage is 89.69072% with 20 lines in your changes missing coverage. Please review.

Files with missing lines | Patch % | Lines
src/shapiq/approximator/regression/oddshap.py | 89.63% | 20 Missing ⚠️

- Validate the budget before sampling/evaluating the game so an invalid
  budget fails fast without paying for game evaluations.
- Restore the full-budget semantics: when the sampler enumerates all 2**n
  coalitions, the candidate odd support is no longer truncated to
  ceil(budget/eta).
- InteractionValues now carries index=self.approximation_index and
  target_index=self.index instead of hard-coded strings.
- Rewrite the stale test-module docstring (it still described the removed
  low-budget TreeExplainer fallback and a non-existent xfail marker).
- Mark the OddSHAP test module skip_if_no_lightgbm, matching the other
  LightGBM-dependent suites (lightgbm is an optional extra).
- Reindent the new references.bib entry with spaces (2-space style of the
  surrounding entries) instead of tabs.
42logos and others added 4 commits June 10, 2026 11:08
…test

- Mark the five defensive, structurally unreachable raises with
  'pragma: no cover' (empty/grand coalition presence, unbuilt support,
  empty constraint system, coefficient-shape mismatch), following the
  existing convention in approximator/base.py and regression/base.py.
- Add real tests for the previously uncovered reachable branches:
  degenerate-n weight initializers (ValueError), the tree_params branch of
  _fit_surrogate_model, _build_support(None), and
  _build_weighted_system(drop_boundary_rows=False).
- Replace the vacuous full-budget screening test (it asserted nothing about
  truncation) with an end-to-end check that approximate() passes the
  untruncated candidate count at budget=2**n, plus a direct check that a
  large candidate budget returns the full odd support.
- references.bib: align the '=' column of the new entry with the file's
  longest-key+1 convention (was only tab->space converted).
fix(oddshap): address all six Copilot review comments on PR mmschlk#522
…th tests

- keep only intent/reason comments; drop redundant step-by-step narration
- document in approximate() that raising below n*eta is a deliberate divergence
  from Algorithm 1's TreeSHAP fallback (an under-budgeted call never silently
  returns a different estimator's values)
- apply ruff format to the test module
- replace the pragma-excluded defensive raises with real guard tests; oddshap.py
  now reaches 100% line coverage without any `# pragma: no cover`
- minor: parenthesize the estimated flag, rename an unused unpacked variable,
  and assert the screening step actually returns interactions
… default

- raise ValueError for interaction_factor < 1 (guards the later
  ceil(budget/eta) division)
- keep the paper's max_depth=10 surrogate when tree_params omits it
  (previously fell back to LightGBM's unlimited depth)
- drop the unrelated ProxySHAP entry from SV_APPROXIMATORS (belongs in
  a separate change)
- note in the class docstring that the budget ValueError forgoes the paper's
  low-budget high-dimension regime of Figure 2
- regression tests for the eta guard and the tree_params depth default

The **kwargs in approximate()/__init__ stay: approximate's signature implements
the abstract Approximator.approximate(self, budget, game, **kwargs) contract.
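
The depth default described in this commit amounts to merging user-supplied `tree_params` over a set of defaults. A minimal sketch, assuming the surrogate is configured from a plain keyword dictionary; the helper name `surrogate_kwargs_sketch` is hypothetical and only `max_depth=10` and `random_state` are taken from the commit messages in this PR.

```python
def surrogate_kwargs_sketch(tree_params=None, random_state=None):
    """Merge user tree_params over the defaults used for the surrogate."""
    # max_depth=10 is kept unless the caller sets it explicitly.
    kwargs = {"max_depth": 10, "random_state": random_state}
    # User-supplied entries take precedence over the defaults.
    kwargs.update(tree_params or {})
    return kwargs


default_depth = surrogate_kwargs_sketch({"n_estimators": 50})
custom_depth = surrogate_kwargs_sketch({"max_depth": 4}, random_state=0)
```

Building one dictionary first, rather than passing both explicit keywords and `**tree_params` to the model constructor, also avoids a `TypeError` when the same key appears in both places.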
@42logos 42logos force-pushed the oddshap_approximator branch from d5f0824 to bdf352b Compare June 10, 2026 16:36
42logos and others added 5 commits June 10, 2026 19:22
…vel export

- accept budget >= 2**n even when it is below n * interaction_factor
  (previously small-n full enumeration was rejected with an unsatisfiable
  minimum: OddSHAP(n=4).approximate(16, game) demanded 40 evaluations)
- merge the surrogate-construction branches; tree_params entries now override
  the shared kwargs instead of raising TypeError on duplicates (random_state,
  n_jobs, verbose)
- re-export OddSHAP at the package top level like its sibling approximators
- align the interaction_factor error message with the actual check (>= 1)
- reuse the base class's interaction lookup instead of rebuilding it; compute
  the kernel weights once at construction
- regression tests for each fix (80 tests, 100% line and branch coverage)
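
The budget rule after this change can be summarized in a few lines. This is a hedged sketch rather than the code in `oddshap.py`: the function name is hypothetical, the error text is abbreviated, and the default `interaction_factor=10` is inferred from the `OddSHAP(n=4)` example above (16 evaluations requested, 40 demanded).

```python
def validate_budget_sketch(budget, n_players, interaction_factor=10):
    """Raise ValueError if the budget is too small for the regression."""
    if interaction_factor < 1:
        raise ValueError("interaction_factor must be >= 1.")
    # Full enumeration of all 2**n coalitions is always accepted.
    if budget >= 2**n_players:
        return
    minimum_budget = n_players * interaction_factor
    if budget < minimum_budget:
        msg = (
            f"Received budget={budget}, but at least {minimum_budget} "
            "evaluations are required."
        )
        raise ValueError(msg)


validate_budget_sketch(16, 4)  # 16 >= 2**4, accepted although 16 < 40
validate_budget_sketch(80, 8)  # exactly n * interaction_factor, accepted
try:
    validate_budget_sketch(39, 8)
    low_budget_rejected = False
except ValueError:
    low_budget_rejected = True
```

Checking the full-enumeration case before the minimum keeps small games usable, since for small `n` the value `n * interaction_factor` can exceed `2**n`.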
…tor host

- the missing-LightGBM ImportError now points to the optional extra
  (pip install 'shapiq[proxy]'), matching the other LightGBM-backed approximators
- the ProxySPEX instance hosting the Fourier extractor is constructed with
  max_order=1, avoiding an unused order-2 interaction lookup

@Advueu963 Advueu963 left a comment

Collaborator

Overall a very nice implementation! I would work on some redundant code regarding guarding against odd interactions. The way I see it, OddSHAP is designed to work with/extract only odd components, so I would not leave the option to also get non-odd ones. The same holds for activating/deactivating the pairing trick. I would also like to improve the robustness of the implementation by using the same tricks as in ProxySHAP and the implementations in _models.py. As part of that, it might be nice to move OddSHAP to the proxy module, as it basically uses a proxy model to extract the Fourier interactions.

Comment thread src/shapiq/approximator/regression/oddshap.py Outdated
Comment on lines +42 to +47
Note:
Where Algorithm 1 of the paper falls back to TreeSHAP for budgets below
``n * interaction_factor``, this implementation raises ``ValueError`` instead
(no silent downgrade to another estimator), unless the budget already covers
the full coalition space (``budget >= 2**n``). It therefore does not reproduce
the low-budget, high-dimension regime of the paper's Figure 2.

Collaborator

Why is that the case? You can use the implemented InterventionalTreeExplainer to extract the true values of the tree fitted on the binary data.

Contributor

Good point — re-reading the paper, Algorithm 1 (p. 6) does prescribe if m < d*eta: return phi via TreeSHAP on the fitted GBT, and the reference implementation uses InterventionalTreeExplainer for exactly this.

However, @mmschlk previously gave us the guidance that OddSHAP should not silently fall back to a different estimator when the budget is insufficient — raise an error instead, so the user knows exactly what happened. That's why the current implementation raises ValueError here and the docstring documents it as a deliberate deviation from Algorithm 1.

We're happy to implement the paper's TreeSHAP-on-surrogate fallback if that's the agreed direction — just want to make sure this aligns with Max's earlier guidance first. The two positions are:

  • Paper: low-budget → run TreeSHAP on the already-fitted GBT surrogate, return those Shapley values
  • Current: low-budget → raise ValueError, user must either increase the budget or explicitly use another estimator

@mmschlk could you weigh in on which you'd prefer for the library?

Author

I agree with @42logos here. This was also how I understood the situation.

I had actually implemented the fallback before, using the existing InterventionalTreeExplainer, but then changed it to raise a ValueError based on the earlier feedback that OddSHAP shouldn't silently switch to another estimator when the budget is too low.

So the current behavior is intentional, but it is a deliberate deviation from Algorithm 1 of the paper.

If the preferred direction is to follow the paper more closely, we can switch back to the older implementation for the low-budget case.

Collaborator

Ahh, I see. I was not aware of this internal discussion. ^^ Okay, then I'd say we leave the decision to @mmschlk. But we should be aware that reproducing the paper is quite difficult with this decision, as the approximator will error when the budget is not sufficient. Therefore, I would argue that it would be better to incorporate the fallback, as it enables 1:1 replication of the original paper's experiments. But of course you guys can decide otherwise :D.

Contributor

Ahh yes. Our reproduction actually does cover the full budget range, including budgets below n * eta: the budget curve script plot_oddshap_budget_curves.py (https://github.com/FabianK-Dev/shapiq/blob/wu/oddshap-repro/examples/approximators/plot_oddshap_budget_curves.py) sweeps m from d + 1 to min(2^d, 20000). When OddSHAP is called below the budget threshold it raises ValueError, which the script catches, simply skipping that data point. The OddSHAP curve therefore starts at n * eta while the baseline curves extend to lower budgets, which matches the paper's Figure 2 behavior. The executed reproduction notebook is here (https://github.com/FabianK-Dev/shapiq/blob/wu/oddshap-repro/notebooks/oddshap_reproduction_and_benchmark.ipynb) and the cluster results are in notebooks/cluster_results/ (https://github.com/FabianK-Dev/shapiq/tree/wu/oddshap-repro/notebooks/cluster_results).
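
The catch-and-skip pattern described in this reply can be sketched as follows. This is not the code of the reproduction script: `sweep_budgets_sketch` and `stub_estimate` are hypothetical names, and the stub only imitates an estimator that raises `ValueError` below a threshold of 100 evaluations.

```python
def sweep_budgets_sketch(estimate, budgets):
    """Run an estimator over several budgets, skipping those it rejects."""
    results = {}
    for budget in budgets:
        try:
            results[budget] = estimate(budget)
        except ValueError:
            # Budget below the estimator's minimum: leave this point out.
            continue
    return results


def stub_estimate(budget):
    """Stand-in estimator: rejects budgets below 100, else returns a fake error."""
    if budget < 100:
        raise ValueError("budget too small")
    return 1.0 / budget


results = sweep_budgets_sketch(stub_estimate, [16, 50, 100, 200])
```

The resulting curve simply has no points below the threshold, while estimators without a minimum budget would contribute a value for every entry in the sweep.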

Comment thread src/shapiq/approximator/regression/oddshap.py Outdated
Comment on lines +167 to +172
msg = (
    "The budget is too small for OddSHAP. "
    f"Received budget={budget}, but at least {minimum_budget} evaluations are required. "
    "Please increase the budget."
)
raise ValueError(msg)

Collaborator

Here you should then integrate the InterventionalTreeExplainer.

Contributor

This is the same point as comment #2: whether to implement Algorithm 1's if m < d*eta: return phi via TreeSHAP branch using InterventionalTreeExplainer(index="SV") on the surrogate.

Comment thread src/shapiq/approximator/regression/oddshap.py Outdated
Comment thread src/shapiq/approximator/regression/oddshap.py Outdated
Comment thread src/shapiq/approximator/regression/oddshap.py Outdated
Comment thread src/shapiq/approximator/regression/oddshap.py Outdated
Comment thread src/shapiq/approximator/regression/oddshap.py Outdated
Comment thread tests/shapiq/tests_unit/tests_approximators/test_approximator_oddshap.py Outdated
42logos and others added 3 commits June 14, 2026 01:48
…te count

Responds to 13 inline review comments from @Advueu963 on PR mmschlk#522:

- Remove pairing_trick and odd_only params (paired sampling is constitutive
  to OddSHAP; odd-only is the algorithm's definition — paper Sec 3.1/3.2)
- Vectorize _transform_to_shapley: loop -> masks.T @ (coeffs / sizes)
- Use index=self.index directly instead of approximation_index + target_index
  (OddSHAP computes SV directly, not via SII aggregation)
- Fall back to DecisionTreeRegressor with UserWarning when LightGBM is not
  installed, following the ProxySHAP/ProxySPEX resolution pattern; dropped
  tree_params are reported in the warning (only user-supplied keys, not
  internal defaults); dt_keys derived dynamically from DTR.get_params()
- Guard convert_tree_model result with isinstance list check (sklearn
  single-tree handler returns bare TreeModel despite list[TreeModel] annotation)
- Remove module-level skip_if_no_lightgbm; 3 LightGBM-specific param tests
  keep individual @skip_if_no_lightgbm decorators
- Drop redundant coalitions.astype(float) and design_matrix.astype(float)

Additionally fixes a paper deviation NOT covered by the review:

- n_candidate_interactions was ceil(m/eta), should be ceil(m/eta) - n per
  Algorithm 1 p.6 ("the total number of regression variables strictly scales
  with the available sampling budget m") and the reference implementation
  (FFmgll/oddshap). At minimum budget the old formula doubled the regression
  unknowns (2n instead of n singletons only).
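
The difference between the two candidate-count formulas is easiest to see at the minimum budget. A small sketch, with hypothetical function names; clamping the new count at zero is an assumption of this example, not something stated in the commit message.

```python
from math import ceil


def n_candidates_old(budget, interaction_factor):
    """Previous rule: ceil(m / eta) higher-order candidates."""
    return ceil(budget / interaction_factor)


def n_candidates_new(budget, n_players, interaction_factor):
    """Corrected rule: ceil(m / eta) - n, so singletons count toward the total."""
    return max(ceil(budget / interaction_factor) - n_players, 0)


n, eta = 8, 10
minimum_budget = n * eta  # 80 evaluations

old_unknowns = n + n_candidates_old(minimum_budget, eta)
new_unknowns = n + n_candidates_new(minimum_budget, n, eta)
```

At `m = n * eta` the old rule adds `n` higher-order candidates on top of the `n` singletons, while the corrected rule adds none.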
The Fourier extraction (_sklearn_tree_to_fourier) is a pure function of
a TreeModel — it never uses any ProxySPEX approximator state. OddSHAP
previously instantiated a full ProxySPEX (with CoalitionSampler, RNG,
interaction lookup) just to call this private method.

Inline the extraction as module-level _tree_to_fourier + _ensemble_to_fourier,
removing the ProxySPEX import and the throwaway instantiation overhead.
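
To make the idea of a pure tree-to-Fourier function concrete, here is a minimal sketch on binary inputs. It is not the PR's `_tree_to_fourier`: the nested-tuple tree format `(feature, left, right)` with the left branch taken when `x[feature] == 0`, the basis convention `chi_S(x) = (-1)**sum(x[i] for i in S)`, and all function names are assumptions made for this example.

```python
from itertools import product


def tree_to_fourier_sketch(node):
    """Return {frozenset of features: coefficient} for a tree on binary inputs."""
    if not isinstance(node, tuple):
        return {frozenset(): float(node)}  # a leaf is a constant function
    feature, left, right = node
    lo, hi = tree_to_fourier_sketch(left), tree_to_fourier_sketch(right)
    coeffs = {}
    for subset in set(lo) | set(hi):
        l, r = lo.get(subset, 0.0), hi.get(subset, 0.0)
        # f = (f_left + f_right)/2 + chi_feature * (f_left - f_right)/2
        flipped = subset ^ {feature}
        coeffs[subset] = coeffs.get(subset, 0.0) + (l + r) / 2
        coeffs[flipped] = coeffs.get(flipped, 0.0) + (l - r) / 2
    return coeffs


def ensemble_to_fourier_sketch(trees):
    """Sum the Fourier coefficients of all trees in an additive ensemble."""
    total = {}
    for tree in trees:
        for subset, value in tree_to_fourier_sketch(tree).items():
            total[subset] = total.get(subset, 0.0) + value
    return total


def evaluate_tree(node, x):
    while isinstance(node, tuple):
        feature, left, right = node
        node = right if x[feature] else left
    return float(node)


def evaluate_fourier(coeffs, x):
    return sum(c * (-1) ** sum(x[i] for i in s) for s, c in coeffs.items())


tree = (0, 1.0, (1, 2.0, 5.0))
coeffs = tree_to_fourier_sketch(tree)
max_error = max(
    abs(evaluate_tree(tree, x) - evaluate_fourier(coeffs, x))
    for x in product([0, 1], repeat=2)
)
```

Because the recursion needs nothing beyond the tree itself, it can live as a module-level function with no approximator state, which is the point this commit makes.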