
Avoid mutating sklearn estimators in DiD experiments#693

Open
drbenvincent wants to merge 5 commits into main from codex/issue-664-fit-intercept

@drbenvincent
Collaborator

@drbenvincent drbenvincent commented Feb 2, 2026

Summary

  • Avoid mutating user-supplied sklearn estimators by cloning when fit_intercept=False is required
  • Emit a UserWarning when the clone occurs so users are aware and can fix their code
  • Centralize fit_intercept handling in BaseExperiment._ensure_sklearn_fit_intercept_false()
  • Add integration coverage to assert original estimators are unchanged and the warning fires
  • Fix codecov/patch failure by ensuring full branch coverage of the new method
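
As a rough illustration of the clone-and-warn behaviour summarised above, a standalone helper might look like the sketch below. The function name `ensure_fit_intercept_false`, the warning text, and the handling of estimators that lack a `fit_intercept` parameter are assumptions made for this example; this is not the body of `BaseExperiment._ensure_sklearn_fit_intercept_false()`.

```python
import warnings

from sklearn.base import RegressorMixin, clone
from sklearn.linear_model import LinearRegression


def ensure_fit_intercept_false(model):
    """Hypothetical helper: return a model that is safe to fit without an intercept."""
    # Non-sklearn models (for example Bayesian models) pass through untouched.
    if not isinstance(model, RegressorMixin):
        return model
    # Assumption: estimators without the parameter, or with it already False,
    # need no change and are returned as-is.
    if not getattr(model, "fit_intercept", False):
        return model
    # clone() builds a new unfitted estimator with the same hyperparameters,
    # so the caller's object is never modified.
    cloned = clone(model)
    cloned.set_params(fit_intercept=False)
    warnings.warn(
        "fit_intercept=True was supplied, but the design matrix already has an "
        "intercept column; using a clone with fit_intercept=False.",
        UserWarning,
        stacklevel=2,
    )
    return cloned


user_model = LinearRegression()  # fit_intercept defaults to True
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    safe_model = ensure_fit_intercept_false(user_model)
```

After this runs, `user_model.fit_intercept` is still `True`, `safe_model` is a separate object with `fit_intercept=False`, and `caught` holds a single `UserWarning`.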

Details

DiD experiments require fit_intercept=False because the design matrix already contains an explicit intercept column (~ 1 + ...). Previously, the experiment constructor mutated the user's model in-place (model.fit_intercept = False), which was a hidden side effect. Now the method clones the estimator via sklearn.base.clone(), sets the parameter on the copy, and issues a warning explaining why.
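
To make the redundancy concrete, here is a small toy example (not taken from the CausalPy code base) that fits noise-free data with a design matrix containing an explicit column of ones. The data and variable names are invented for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Noise-free toy data: y = 2 + 3 * x
x = np.arange(10, dtype=float)
y = 2.0 + 3.0 * x

# Design matrix with an explicit intercept column, as a "~ 1 + x" formula would produce.
X = np.column_stack([np.ones_like(x), x])

# Estimator fits its own intercept on top of the explicit column.
with_own_intercept = LinearRegression(fit_intercept=True).fit(X, y)

# Estimator relies only on the explicit column.
without_own_intercept = LinearRegression(fit_intercept=False).fit(X, y)

print("fit_intercept=True :", with_own_intercept.coef_, with_own_intercept.intercept_)
print("fit_intercept=False:", without_own_intercept.coef_, without_own_intercept.intercept_)
```

With `fit_intercept=False`, the coefficient on the column of ones carries the intercept. With `fit_intercept=True`, the estimator absorbs the intercept into `intercept_` instead, so in this toy case the coefficient reported for the explicit intercept column no longer reflects it. Any code that reads coefficients by design-matrix column would then see a misleading value for the intercept term.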

Coverage fix

The _ensure_sklearn_fit_intercept_false() method in base.py has a defensive isinstance guard that returns early for non-sklearn models (line 70). Originally, the method was called from inside the elif isinstance(self.model, RegressorMixin): blocks in both diff_in_diff.py and staggered_did.py, which made the guard's early return unreachable: the caller had already confirmed the model was an sklearn regressor, so the non-sklearn branch could never be taken. The uncovered branch caused the codecov/patch check to fail.

The fix moves the _ensure_sklearn_fit_intercept_false() call to before the model-type dispatch (the if PyMCModel / elif RegressorMixin block) in both experiment classes. Since the method already has a built-in guard for non-sklearn models, calling it unconditionally is safe and semantically cleaner. Now any PyMC integration test naturally exercises the early-return path, providing full branch coverage without artificial unit tests.
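
The effect of moving the call can be sketched with plain stand-in classes. Everything below is hypothetical scaffolding (`PyMCModel` and `RegressorMixin` here are empty placeholders, not the real classes, and the `trace` list exists only to show which branches run). It is meant to show the control flow, not the actual constructors.

```python
class PyMCModel:
    """Placeholder for a Bayesian model base class (not the real one)."""


class RegressorMixin:
    """Placeholder for sklearn.base.RegressorMixin (not the real one)."""


def ensure_fit_intercept_false(model, trace):
    """Hypothetical helper that records which of its branches ran."""
    if not isinstance(model, RegressorMixin):
        trace.append("guard: early return")
        return model
    trace.append("guard: sklearn path")
    return model


def init_old(model):
    """Old placement: helper is called only inside the sklearn branch."""
    trace = []
    if isinstance(model, PyMCModel):
        trace.append("dispatch: pymc")
    elif isinstance(model, RegressorMixin):
        model = ensure_fit_intercept_false(model, trace)
        trace.append("dispatch: sklearn")
    else:
        raise ValueError("Model type not recognized")
    return trace


def init_new(model):
    """New placement: helper is called unconditionally before the dispatch."""
    trace = []
    model = ensure_fit_intercept_false(model, trace)
    if isinstance(model, PyMCModel):
        trace.append("dispatch: pymc")
    elif isinstance(model, RegressorMixin):
        trace.append("dispatch: sklearn")
    else:
        raise ValueError("Model type not recognized")
    return trace


old_pymc = init_old(PyMCModel())
new_pymc = init_new(PyMCModel())
```

With the old placement, no input can ever produce the "guard: early return" entry. With the new placement, a PyMC-style model produces it on every construction, which is why existing PyMC tests are enough to cover that branch.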

A new integration test test_did_sklearn_fit_intercept_false was also added to cover the "no warning needed" path (when the model already has fit_intercept=False).
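
One way to assert the "no warning needed" path is to turn warnings into errors for the duration of the call. The sketch below is not the test added in this PR: it uses a compact, hypothetical stand-in for the helper rather than a full experiment, and the function names are invented for the example.

```python
import warnings

from sklearn.base import RegressorMixin, clone
from sklearn.linear_model import LinearRegression


def ensure_fit_intercept_false(model):
    """Compact hypothetical stand-in for the helper under test."""
    if not isinstance(model, RegressorMixin) or not getattr(model, "fit_intercept", False):
        return model
    warnings.warn("Cloning model with fit_intercept=False.", UserWarning, stacklevel=2)
    # set_params returns the estimator itself, so this yields the modified clone.
    return clone(model).set_params(fit_intercept=False)


def check_no_warning_when_already_false():
    """Return True if a compliant model passes through silently and unchanged."""
    model = LinearRegression(fit_intercept=False)
    with warnings.catch_warnings():
        warnings.simplefilter("error")  # any warning raised here becomes an exception
        result = ensure_fit_intercept_false(model)
    return result is model


def check_warning_when_true():
    """Return True if a non-compliant model triggers a UserWarning."""
    model = LinearRegression(fit_intercept=True)
    with warnings.catch_warnings():
        warnings.simplefilter("error")
        try:
            ensure_fit_intercept_false(model)
        except UserWarning:
            return True
    return False


silent_path_ok = check_no_warning_when_already_false()
warning_path_ok = check_warning_when_true()
```

In a pytest suite the second check would more commonly be written with `pytest.warns(UserWarning)`; the plain `warnings` version is used here only to keep the sketch free of test-framework dependencies.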

Testing

MPLCONFIGDIR=/tmp/mplconfig XDG_CACHE_HOME=/tmp conda run -n CausalPy python -m pytest -o addopts= \
  causalpy/tests/test_integration_skl_examples.py::test_did \
  causalpy/tests/test_integration_skl_examples.py::test_did_sklearn_fit_intercept_false \
  causalpy/tests/test_staggered_did.py::test_staggered_did_sklearn \
  causalpy/tests/test_staggered_did.py::test_staggered_did_sklearn_model_without_fit_intercept

Issues

@drbenvincent drbenvincent added the bug label Feb 2, 2026
@read-the-docs-community

read-the-docs-community bot commented Feb 2, 2026

@drbenvincent drbenvincent marked this pull request as draft February 2, 2026 21:29
@drbenvincent drbenvincent marked this pull request as ready for review February 9, 2026 13:06
@codecov

codecov bot commented Feb 9, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 94.34%. Comparing base (4769e80) to head (5945c82).

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #693      +/-   ##
==========================================
+ Coverage   94.32%   94.34%   +0.02%     
==========================================
  Files          78       78              
  Lines       12159    12186      +27     
  Branches      713      713              
==========================================
+ Hits        11469    11497      +28     
  Misses        497      497              
+ Partials      193      192       -1     


drbenvincent and others added 5 commits April 15, 2026 19:51
…tion tests

Move the _ensure_sklearn_fit_intercept_false() call from inside the
`elif isinstance(self.model, RegressorMixin)` block to before the
model-type dispatch in both DiD and StaggeredDiD. This makes the
isinstance early-return guard (line 70 in base.py) naturally reachable
by existing PyMC integration tests, fixing the codecov/patch failure.

Also add test_did_sklearn_fit_intercept_false integration test covering
the "no warning needed" path when fit_intercept is already False.

Co-authored-by: Cursor <cursoragent@cursor.com>
Include the ruff-format whitespace fix so the PR's prek check passes remotely.

Made-with: Cursor
Use the intended warning-and-clone flow to keep diff coverage above the Codecov patch threshold.

Made-with: Cursor
@drbenvincent drbenvincent force-pushed the codex/issue-664-fit-intercept branch from 97042ed to 5945c82 April 15, 2026 18:52

Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

DifferenceInDifferences mutates fit_intercept on user models

1 participant