Add PanelRegression experiment class for fixed effects estimation #628
drbenvincent merged 38 commits into `main`
drbenvincent left a comment:
- The `summary` method gives only point estimates, not the HDIs.
- The `plot_coefficients` method does not include error bars for the HDIs.
Fixed in commit 4b67e10. Changes made:
The horizontal lines in the forest plot represent the HDIs, with the dot showing the posterior mean.
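The review asked for HDIs next to the point estimates. Below is a minimal sketch, not CausalPy's implementation, of getting both from posterior draws. It takes the shortest interval that contains a given share of the sorted draws, which is similar to what ArviZ's `hdi` does for unimodal posteriors. `hdi_interval` and `summarize_coefficients` are hypothetical names. The posterior is assumed to be a plain dict that maps each coefficient name to a 1D array of draws.

```python
import numpy as np


def hdi_interval(draws, hdi_prob=0.94):
    """Shortest interval containing `hdi_prob` of the draws (unimodal case)."""
    x = np.sort(np.asarray(draws, dtype=float).ravel())
    n = x.size
    # Number of sorted positions spanned by an interval holding hdi_prob of the mass
    width_idx = int(np.floor(hdi_prob * n))
    n_candidates = n - width_idx
    # Width of every candidate interval [x[i], x[i + width_idx]]
    widths = x[width_idx:] - x[:n_candidates]
    i = int(np.argmin(widths))
    return x[i], x[i + width_idx]


def summarize_coefficients(posterior, hdi_prob=0.94):
    """Map each coefficient name to (mean, hdi_lower, hdi_upper).

    `posterior` is assumed to be a dict of name -> 1D array of draws.
    """
    rows = {}
    for name, draws in posterior.items():
        lo, hi = hdi_interval(draws, hdi_prob)
        rows[name] = (float(np.mean(draws)), float(lo), float(hi))
    return rows
```

In a forest plot, each row of this summary becomes a horizontal line from `hdi_lower` to `hdi_upper` with a dot at the mean.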
bugbot review
PR Summary: Adds a new …
Written by Cursor Bugbot for commit 4b67e10.
Codecov Report: ❌ Patch coverage is …

Coverage diff:

| | main | #628 | +/- |
|---|---|---|---|
| Coverage | 93.44% | 93.61% | +0.16% |
| Files | 74 | 76 | +2 |
| Lines | 11199 | 11747 | +548 |
| Branches | 657 | 714 | +57 |
| Hits | 10465 | 10997 | +532 |
| Misses | 544 | 545 | +1 |
| Partials | 190 | 205 | +15 |
@drbenvincent do you wanna fix the conflicts or can bugbot do it?

I'm in the process of resolving conflicts for my open PRs. Will get to this one soon :)

TODO: check the changes to codespell
**Code Review: 9 issues found and fixed**

Commit: 83cc395

**Critical Bugs Fixed**

1. The sklearn …
2. Boolean treatment columns were silently NOT demeaned by …

**Moderate Issues Fixed**

3. Missing … All other experiment classes implement this abstract method from …
4. Two-way within transformation is only exact for balanced panels. The sequential single-pass demeaning (first by unit, then by time) is algebraically equivalent to the standard two-way within transformation only for balanced panels. For unbalanced panels, iterative alternating demeaning is needed. Added documentation in both the class docstring and the method docstring explaining this limitation.
5. When unit demeaning was applied first and then time demeaning, …

**Minor Issues Fixed**

6. …
7. …
8. …
9. …

**Tests Added**

7 new tests covering the fixed functionality. All 19 tests pass. All pre-commit checks pass.
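The balanced-panel limitation in item 4 can be checked numerically. The pandas sketch below uses hypothetical helper names and `unit`/`time` column names. It compares a single unit-then-time demeaning pass with alternating demeaning repeated until the unit means vanish, which is a simple form of the method of alternating projections. On a balanced panel both give the same result. On an unbalanced panel, only the iterative version leaves both unit and time means at zero.

```python
import numpy as np
import pandas as pd


def demean_once(df, col, unit="unit", time="time"):
    """Single pass: subtract unit means, then time means of the result."""
    x = df[col].astype(float)
    x = x - x.groupby(df[unit]).transform("mean")
    return x - x.groupby(df[time]).transform("mean")


def demean_iterative(df, col, unit="unit", time="time", tol=1e-10, max_iter=1000):
    """Alternate unit and time demeaning until the unit means vanish."""
    x = df[col].astype(float)
    for _ in range(max_iter):
        x = x - x.groupby(df[unit]).transform("mean")
        x = x - x.groupby(df[time]).transform("mean")
        # Time means are zero right after the time step; check unit means
        if x.groupby(df[unit]).mean().abs().max() < tol:
            break
    return x
```

On a balanced panel the loop exits after the first pass, so the two functions agree. Dropping even a couple of unit-period rows is enough for the single pass to leave non-zero unit means.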
Follow-up: Refactor …
**Convention alignment with other experiment classes**

Addressed non-conformances found by comparing …

**Changes**

All 19 tests pass and all pre-commit hooks are clean.
Fix codecov/patch failure: 11 uncovered statements in …
| Test | Lines Covered | What it tests |
|---|---|---|
| `test_plot_unit_effects_ols` | 664-671 (5 stmts) | OLS branch of `plot_unit_effects()` |
| `test_plot_residuals_ols` | 867 | OLS branch of `plot_residuals()` |
| `test_plot_trajectories_all_units` | 738 | `n_sample >= n_units` branch (all units shown) |
| `test_plot_trajectories_single_unit` | 766 | Single-unit subplot edge case |
| `test_get_plot_data_bayesian_raises_on_ols` | 539 | `ValueError` guard when called with OLS model |
| `test_get_plot_data_ols_raises_on_pymc` | 567 | `ValueError` guard when called with PyMC model |
| `test_plot_unit_effects_no_fe_labels` | 641 | `ValueError` when no `C(unit)` terms in formula |
Result: panel_regression.py statement coverage went from 92% (11 missing) → 97% (0 missing). The remaining 3% is partial branch coverage (11 partial branches), which does not affect the codecov/patch line-coverage check.
Reorganized code cells for clarity, added cell metadata to hide input/output in some cells, and improved section headings for better structure. Split code and output for the time-varying confounder example, and updated example numbering for consistency.
The QQ plot in PanelRegression now uses consistent colors for markers and lines to match other plots. The panel_fixed_effects notebook was reorganized and clarified, with improved explanations of panel data confounders, fixed effects, and identification assumptions, as well as updated code and output for data simulation.
Critical fixes:
- Fix `summary()` printing wrong OLS coefficients for `fe_method='dummies'` (positional zip mismatch with filtered labels)
- Fix boolean treatment columns silently skipped by `_within_transform` (`select_dtypes` excludes bool; now includes bool and casts to float)

Moderate fixes:
- Add `effect_summary()` stub with helpful `NotImplementedError` message
- Document balanced-panel limitation for two-way within transformation
- Fix `_group_means` to store means from original data, not demeaned data

Minor fixes:
- Fix `summary()` header to not say "excluding FE dummies" for within method
- Implement `plot_coefficients(var_names=...)` parameter (was ignored)
- Implement `plot_trajectories` `select='extreme'` and `'high_variance'` strategies
- Clarify `treated_units` coordinate placeholder in `y` DataArray

Also adds 7 new tests covering the fixed functionality.

Co-authored-by: Cursor <cursoragent@cursor.com>
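This commit fixes two data-handling problems: `select_dtypes` skipping boolean columns, and group means being read from already-demeaned data. The sketch below shows both fixes in one hypothetical one-way helper. `within_transform` is an illustrative name and is not CausalPy's private `_within_transform`. With pandas, selecting `"number"` alone leaves out `bool` columns, so the helper asks for `"bool"` explicitly and casts to float. It also records group means from the untouched data before any subtraction.

```python
import pandas as pd


def within_transform(df, group_col):
    """Demean numeric and boolean columns within `group_col`.

    Returns the demeaned frame and the group means, which are computed
    from the original (not previously demeaned) data.
    """
    # 'number' alone skips bool columns, so include 'bool' explicitly
    selected = df.select_dtypes(include=["number", "bool"]).columns
    cols = [c for c in selected if c != group_col]
    values = df[cols].astype(float)
    # Store means from the untouched data before any subtraction happens
    group_means = values.groupby(df[group_col]).mean()
    centered = values - values.groupby(df[group_col]).transform("mean")
    demeaned = df.copy()
    for c in cols:
        demeaned[c] = centered[c]  # replaces the column, so dtype becomes float
    return demeaned, group_means
```

After this transform, a boolean treatment indicator becomes a float deviation from its unit mean instead of passing through untouched.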
Break up the monolithic `__init__` into the canonical pipeline used by all
other experiment classes on main:
self.input_validation()
self._build_design_matrices()
self._prepare_data()
self.algorithm()
- Rename _validate_inputs() -> input_validation()
- Extract _build_design_matrices() (includes within transform + patsy)
- Extract _prepare_data() (numpy -> xarray conversion)
- Extract algorithm() (model fitting)
Co-authored-by: Cursor <cursoragent@cursor.com>
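The four-step constructor pipeline can be shown as a bare template. The class below is a hypothetical skeleton, not CausalPy's base class. It only records the order of calls, and each method body stands in for the real work listed in the commit message. The one deliberate choice is that validation runs first, so a bad formula fails before any design matrices are built.

```python
class PanelExperimentSketch:
    """Hypothetical skeleton of the four-step constructor pipeline."""

    def __init__(self, data, formula):
        self.data = data
        self.formula = formula
        self.steps = []  # records call order, for illustration only
        self.input_validation()
        self._build_design_matrices()
        self._prepare_data()
        self.algorithm()

    def input_validation(self):
        # Fail fast before any expensive work
        if "~" not in self.formula:
            raise ValueError("formula must contain '~'")
        self.steps.append("input_validation")

    def _build_design_matrices(self):
        # Within transform + patsy design matrices would go here
        self.steps.append("_build_design_matrices")

    def _prepare_data(self):
        # numpy -> xarray conversion would go here
        self.steps.append("_prepare_data")

    def algorithm(self):
        # Model fitting would go here
        self.steps.append("algorithm")
```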
- Move `expt_type` from class attribute to instance attribute in `__init__`
- Set `data.index.name` on original data before assignment (not on a copy)
- Use standard "unit_0" label for `treated_units` coordinate
- Pass xarray directly to sklearn `fit()` instead of `.values`/`.ravel()`
- Use `get_coeffs()` instead of direct `coef_` access (handles 2D arrays)
- Squeeze `predict()` output where 1D arrays are needed

Co-authored-by: Cursor <cursoragent@cursor.com>
Fixes codecov/patch failure by covering the 11 previously uncovered statements: OLS branches for plot_unit_effects and plot_residuals, edge cases in plot_trajectories (all-units and single-unit), and defensive ValueError guards in get_plot_data_bayesian, get_plot_data_ols, and plot_unit_effects. Co-authored-by: Cursor <cursoragent@cursor.com>
- Define balanced vs unbalanced panels with brief examples
- Note that unbalanced panels are common in practice
- Suggest sensitivity checks or iterative FE packages for heavily unbalanced data

Co-authored-by: Cursor <cursoragent@cursor.com>
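A panel is balanced when every unit is observed exactly once in every period. A typical unbalanced case is a firm that enters partway through the sample. Before relying on the single-pass two-way transform, a quick check can help. The sketch below is hypothetical, uses pandas, and assumes the columns are named `unit` and `time`. If there are no duplicate unit-period pairs and the row count equals units × periods, then every pair must occur exactly once.

```python
import pandas as pd


def is_balanced(df, unit="unit", time="time"):
    """True if every unit is observed exactly once in every time period."""
    n_units = df[unit].nunique()
    n_periods = df[time].nunique()
    no_duplicates = not df.duplicated(subset=[unit, time]).any()
    return no_duplicates and len(df) == n_units * n_periods
```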
…y demeaned formula
- Fix citation: goodmanbacon2021difference → goodman2021difference (matches references.bib)
- One-way demeaning: use \bar{y}_{i·} and \bar{u}_{i·} for clarity
- Add two-way FE demeaned formula in Fixed Effects Toolbox section (balanced panels)
Co-authored-by: Cursor <cursoragent@cursor.com>
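For reference, here are the one-way and two-way demeaned equations for a single regressor. The notation is chosen here and may differ from the notebook's: outcome $y_{it}$, regressor $x_{it}$, unit effects $\alpha_i$, time effects $\lambda_t$, and error $u_{it}$. A dot marks the index being averaged over. In the one-way model $y_{it} = \beta x_{it} + \alpha_i + u_{it}$, subtracting unit means removes $\alpha_i$:

$$
y_{it} - \bar{y}_{i\cdot} = \beta\,(x_{it} - \bar{x}_{i\cdot}) + (u_{it} - \bar{u}_{i\cdot})
$$

In the two-way model $y_{it} = \beta x_{it} + \alpha_i + \lambda_t + u_{it}$ on a balanced panel, the double-demeaned form removes both $\alpha_i$ and $\lambda_t$:

$$
y_{it} - \bar{y}_{i\cdot} - \bar{y}_{\cdot t} + \bar{y}_{\cdot\cdot} = \beta\,(x_{it} - \bar{x}_{i\cdot} - \bar{x}_{\cdot t} + \bar{x}_{\cdot\cdot}) + (u_{it} - \bar{u}_{i\cdot} - \bar{u}_{\cdot t} + \bar{u}_{\cdot\cdot})
$$

This needs balance because $\bar{y}_{\cdot t}$ must average $\alpha_i$ over the same set of units in every period. Only then does $\bar{\alpha}$ cancel against $\bar{y}_{\cdot\cdot}$.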
- Intro: add paragraph on two ways to implement FE (dummies vs demeaned)
- Toolbox: explain demeaned transformation conceptually before formulas
- Rename subsection to 'Same estimate via demeaned transformation' and tie to Toolbox

Co-authored-by: Cursor <cursoragent@cursor.com>
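Unit dummies and unit demeaning return the same slope estimate, as a consequence of the Frisch-Waugh-Lovell theorem. The numpy-only sketch below, with hypothetical function names, fits both versions on simulated data in which the unit effects are correlated with `x`. The dummy version puts one indicator column per unit in place of an intercept. The demeaned version regresses unit-demeaned `y` on unit-demeaned `x` without an intercept.

```python
import numpy as np


def fe_slope_dummies(y, x, unit):
    """OLS slope on x with one dummy column per unit (no separate intercept)."""
    units, idx = np.unique(unit, return_inverse=True)
    D = np.zeros((len(y), len(units)))
    D[np.arange(len(y)), idx] = 1.0
    X = np.column_stack([x, D])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(coef[0])


def fe_slope_within(y, x, unit):
    """OLS slope on unit-demeaned y and x (no intercept needed)."""
    _, idx = np.unique(unit, return_inverse=True)
    counts = np.bincount(idx)
    y_dm = y - (np.bincount(idx, weights=y) / counts)[idx]
    x_dm = x - (np.bincount(idx, weights=x) / counts)[idx]
    return float(x_dm @ y_dm / (x_dm @ x_dm))
```

Both routes give the same slope up to floating-point error. The within route never builds the wide dummy matrix.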
Keep PanelRegression and PiecewiseITS exports while applying ruff import ordering after conflict resolution so pre-commit remains green. Made-with: Cursor
Force-pushed from c233263 to c2b055f.
Made-with: Cursor

Conflicts: `causalpy/__init__.py`
Made-with: Cursor
Closes #627
Implementation Plan for PanelRegression - COMPLETE ✅
Phase 1: Core Implementation ✅
Phase 2: Specialized Plotting Methods ✅
Phase 3: Testing ✅
Phase 4: Documentation ✅
Phase 5: Final Integration ✅
Phase 6: Address Review Feedback ✅
Original prompt
This section details the original issue you should resolve.
<issue_title>Feature: Panel Fixed Effects (`PanelRegression` experiment class)</issue_title><issue_description>

## Summary
Add a `PanelRegression` experiment wrapper that enables panel-aware visualization and diagnostics, with support for both dummy-variable and within-transformation approaches to fixed effects.

## Motivation
Panel data methods are foundational in applied econometrics. Chapter 8 of Causal Inference: The Mixtape covers fixed effects estimation, which is a workhorse for causal inference when there are unobserved time-invariant confounders.
The Mixtape code repository contains Python and R implementations of these methods.
### Mixtape Coverage

- `sasp.py`
- `bail.py`

See also the R implementations: `sasp.R` and `bail_1.R`.

### Why Panel FE Matters
## Current State

Panel fixed effects already works with `LinearRegression` using patsy formula syntax:

What's missing is a dedicated experiment class that provides panel-aware visualization, diagnostics, and efficient handling of large panels.
## Proposed API

### Core Parameters

### Two Approaches: Dummies vs Within
#### 1. Dummy Variables (`fe_method="dummies"`)

User includes `C(unit)` in the formula explicitly:

Pros:

Cons:
#### 2. Within Transformation (`fe_method="within"`)

User does NOT include `C(unit)`; the experiment class demeans the data:

Pros:

Cons:
### Design Matrix Comparison

| Method | Formula |
|---|---|
| `dummies` | `y ~ C(unit) + X` |
| `within` | `y ~ X` (on demeaned data) |

For 10,000 units with 5 covariates:
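The size gap between the two design matrices can be estimated with simple arithmetic. The sketch below is hypothetical and rests on three assumptions: a balanced panel with T = 20 periods, dense float64 storage, and patsy's default treatment coding with an intercept. Under treatment coding, `C(unit)` contributes `n_units - 1` columns. The within formula `y ~ X` keeps its intercept column, even though that column is essentially zero after demeaning.

```python
def design_matrix_size(n_units, n_periods, n_covariates, method, bytes_per_value=8):
    """Return (n_columns, n_bytes) of a dense float64 design matrix.

    Assumes a balanced panel and patsy-style treatment coding with an intercept.
    """
    n_rows = n_units * n_periods
    if method == "dummies":
        # intercept + (n_units - 1) unit dummies + covariates
        n_cols = 1 + (n_units - 1) + n_covariates
    elif method == "within":
        # intercept + covariates on demeaned data
        n_cols = 1 + n_covariates
    else:
        raise ValueError(f"unknown method: {method}")
    return n_cols, n_rows * n_cols * bytes_per_value
```

Under these assumptions, `design_matrix_size(10_000, 20, 5, "dummies")` gives 10,005 columns and about 16 GB. The `"within"` case gives 6 columns and about 9.6 MB.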
## Implementation

### Main Class