Avoid mutating sklearn estimators in DiD experiments #693
Open
drbenvincent wants to merge 5 commits into main from
Conversation
Codecov Report ✅ All modified and coverable lines are covered by tests.

@@ Coverage Diff @@
##             main     #693      +/-   ##
==========================================
+ Coverage   94.32%   94.34%   +0.02%
==========================================
  Files          78       78
  Lines       12159    12186      +27
  Branches      713      713
==========================================
+ Hits        11469    11497      +28
  Misses        497      497
+ Partials      193      192       -1
Force-pushed from 83e476a to d6a9a00
…tion tests

Move the `_ensure_sklearn_fit_intercept_false()` call from inside the `elif isinstance(self.model, RegressorMixin)` block to before the model-type dispatch in both DiD and StaggeredDiD. This makes the isinstance early-return guard (line 70 in base.py) naturally reachable by existing PyMC integration tests, fixing the codecov/patch failure. Also add a `test_did_sklearn_fit_intercept_false` integration test covering the "no warning needed" path when `fit_intercept` is already False.

Co-authored-by: Cursor <cursoragent@cursor.com>
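The reordering this commit describes can be sketched as follows; `FakePyMCModel`, `ensure_fit_intercept_false`, and `fit_dispatch` are hypothetical stand-ins for illustration, not the actual CausalPy code:

```python
from sklearn.base import RegressorMixin
from sklearn.linear_model import LinearRegression


class FakePyMCModel:
    """Hypothetical stand-in for a PyMC model class."""


def ensure_fit_intercept_false(model):
    # Defensive guard: non-sklearn models return early. Because the caller
    # now invokes this *before* dispatching on model type, this branch is
    # reachable from the PyMC integration tests.
    if not isinstance(model, RegressorMixin):
        return model
    model.fit_intercept = False  # simplified; the actual PR clones instead
    return model


def fit_dispatch(model):
    # The guard now runs before the model-type dispatch (the PR's change),
    # instead of inside the RegressorMixin branch where the early return
    # could never fire.
    model = ensure_fit_intercept_false(model)
    if isinstance(model, FakePyMCModel):
        return "pymc path"
    if isinstance(model, RegressorMixin):
        return "sklearn path"
    raise TypeError("unsupported model type")
```

With this ordering, any call through the PyMC path exercises the early-return branch, which is why no artificial unit test is needed for coverage.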
Include the ruff-format whitespace fix so the PR's prek check passes remotely. Made-with: Cursor
Use the intended warning-and-clone flow to keep diff coverage above the Codecov patch threshold. Made-with: Cursor
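A minimal sketch of that warning-and-clone flow, assuming a free-function form; the actual method lives on `BaseExperiment` and its exact name, message, and body may differ:

```python
import warnings

from sklearn.base import RegressorMixin, clone


def ensure_sklearn_fit_intercept_false(model):
    """Return a model safe for a design matrix with an explicit intercept column."""
    if not isinstance(model, RegressorMixin):
        return model  # non-sklearn models (e.g. PyMC) pass through untouched
    if model.get_params().get("fit_intercept", False):
        warnings.warn(
            "DiD design matrices include an explicit intercept column, so "
            "fit_intercept must be False; fitting a clone of your model with "
            "fit_intercept=False instead of mutating the original.",
            UserWarning,
        )
        # clone() returns an unfitted copy with the same hyperparameters,
        # so the user's estimator is left unmodified
        return clone(model).set_params(fit_intercept=False)
    return model
```

Because the parameter is set on the clone, the hidden side effect on the user's original estimator is removed, which is the point of this PR.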
Force-pushed from 97042ed to 5945c82
Summary

- Clone the user's sklearn estimator instead of mutating it in place when `fit_intercept=False` is required
- Emit a `UserWarning` when the clone occurs so users are aware and can fix their code
- Centralise `fit_intercept` handling in `BaseExperiment._ensure_sklearn_fit_intercept_false()`
- Fix the `codecov/patch` failure by ensuring full branch coverage of the new method

Details
DiD experiments require `fit_intercept=False` because the design matrix already contains an explicit intercept column (`~ 1 + ...`). Previously, the experiment constructor mutated the user's model in place (`model.fit_intercept = False`), which was a hidden side effect. Now the method clones the estimator via `sklearn.base.clone()`, sets the parameter on the copy, and issues a warning explaining why.

Coverage fix

The `_ensure_sklearn_fit_intercept_false()` method in `base.py` has a defensive isinstance guard that returns early for non-sklearn models (line 70). Originally, the method was called from inside the `elif isinstance(self.model, RegressorMixin):` blocks in both `diff_in_diff.py` and `staggered_did.py`, which made the guard unreachable: the caller had already confirmed the model was sklearn, so the isinstance check could never be `True`. This caused the `codecov/patch` check to fail.

The fix moves the `_ensure_sklearn_fit_intercept_false()` call to before the model-type dispatch (the `if PyMCModel` / `elif RegressorMixin` block) in both experiment classes. Since the method already has a built-in guard for non-sklearn models, calling it unconditionally is safe and semantically cleaner. Now any PyMC integration test naturally exercises the early-return path, providing full branch coverage without artificial unit tests.

A new integration test `test_did_sklearn_fit_intercept_false` was also added to cover the "no warning needed" path (when the model already has `fit_intercept=False`).

Testing
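A hedged sketch of what such a test might look like; the guard below is a simplified local stand-in for the method in `base.py`, not the real implementation, and the actual integration test constructs a full DiD experiment rather than calling the guard directly:

```python
import warnings

from sklearn.base import RegressorMixin, clone
from sklearn.linear_model import LinearRegression


def _ensure_sklearn_fit_intercept_false(model):
    """Simplified local stand-in for the guard in base.py (illustration only)."""
    if not isinstance(model, RegressorMixin):
        return model
    if model.get_params().get("fit_intercept", False):
        warnings.warn("cloning model with fit_intercept=False", UserWarning)
        return clone(model).set_params(fit_intercept=False)
    return model


def test_did_sklearn_fit_intercept_false():
    # "No warning needed" path: fit_intercept is already False, so the
    # model passes through unchanged and no UserWarning is raised.
    model = LinearRegression(fit_intercept=False)
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        out = _ensure_sklearn_fit_intercept_false(model)
    assert out is model  # no clone was made
    assert not caught    # and no warning was emitted
```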
Issues