dCDH by_path Wave 3 #8+#9: non-binary treatment + paths_of_interest by igerber · Pull Request #401 · igerber/diff-diff

igerber · 2026-05-09T14:26:17Z

Summary

ChaisemartinDHaultfoeuille.by_path now accepts non-binary integer treatment (D in Z, e.g. ordinal {0, 1, 2}). Path tuples become integer-state tuples like (0, 2, 2, 2). Continuous D (e.g. 1.5) raises ValueError at fit-time per the no-silent-failures contract.
New paths_of_interest kwarg for user-specified path subsets, alternative to by_path=k's top-k automatic ranking. Mutually exclusive with by_path; setting both raises ValueError. Composes with non-binary D and all downstream by_path surfaces. Python-only API (R has no list-based selection).
Gate at chaisemartin_dhaultfoeuille.py:1870 replaced by D-integer validation; the :1118 preconditions gate and 11 self.by_path is not None activation branches in fit() rerouted to fire under either selector. The 6 path-enumeration helper signatures gain a paths_of_interest parameter; 5 forward to _enumerate_treatment_paths.

Methodology references (required if estimator / math changes)

Method name(s): de Chaisemartin & D'Haultfœuille reversible-treatment event study + per-path disaggregation
Paper / source link(s): Same paper anchor as PR dCDH: add by_path per-path event-study disaggregation #357 (foundation); R reference DIDmultiplegtDYN 2.3.3 (per-path dispatcher in did_multiplegt_dyn, path-string construction in did_multiplegt_by_path)
Any intentional deviations from the source (and why):
- Continuous-D rejection: R's path-string encoding silently produces fractional path tuples for non-integer D; we raise to enforce integer-coded discrete treatment per feedback_no_silent_failures. Documented in REGISTRY.md.
- R substr(path, 1, 1) baseline-derivation bug for D >= 10: R's helper did_multiplegt_by_path captures only the first character of the comma-separated path string, mis-allocating R's per-path control-pool subset for multi-digit D values. Python's tuple-key matching is correct; the parity scenario stays in D in {0, 1, 2} to avoid the R bug. Documented in REGISTRY.md and CHANGELOG.md.
- paths_of_interest is Python-only: R's by_path accepts only positive int (top-k) or -1 (all). No list-based selection in R; no R-parity test for this kwarg.

Validation

Tests added/updated:
- tests/test_chaisemartin_dhaultfoeuille.py::TestByPathNonBinary (10 tests, 3 slow)
- tests/test_chaisemartin_dhaultfoeuille.py::TestPathsOfInterest (27 tests, 7 slow)
- tests/test_chaisemartin_dhaultfoeuille_parity.py::TestDCDHDynRParityByPathNonBinary (1 parity test)
- Removed stale test_forbids_non_binary_treatment gate-firing test
Backtest / simulation / notebook evidence: R parity scenario multi_path_reversible_by_path_non_binary (Scenario 19) added to benchmarks/R/generate_dcdh_dynr_test_values.R and regenerated benchmarks/data/dcdh_dynr_golden_values.json. Per-path point estimates match R bit-exactly (rtol ~1e-9 on event horizons; rtol+atol envelope for placebo near-zero values), SE within ~5% rtol envelope. 282 dCDH non-slow tests pass; 37 new test classes (slow + non-slow) green.

Security / privacy

Confirm no secrets/PII in this PR: Yes

Generated with Claude Code

github-actions · 2026-05-09T14:33:35Z

Overall Assessment

⚠️ Needs changes

Executive Summary

Methodology review did not find an unmitigated estimator/math/variance defect. The main methodology-sensitive deviations here are documented in docs/methodology/REGISTRY.md.
P1: the public convenience helper chaisemartin_dhaultfoeuille(...) does not propagate the new paths_of_interest constructor kwarg, so that new API is broken on a public entry point.
P2: under paths_of_interest, the code writes user-selection order into frequency_rank, so returned metadata no longer means “frequency rank” even though the field name and downstream consumers still imply that.
P3: primary docs/source docstrings/results docs are still partly by_path-only / binary-only, and there is no convenience-wrapper regression test for paths_of_interest.
I did not find a security or performance regression in the diff itself.

Methodology

Severity P3-informational. Impact: no unmitigated methodology mismatch found. The estimator-sensitive deviations I checked are documented: continuous-D rejection for path enumeration, Python-only paths_of_interest, and the R substr() bug for D >= 10. References: docs/methodology/REGISTRY.md:L641-L643, diff_diff/chaisemartin_dhaultfoeuille.py:L2010-L2020, diff_diff/chaisemartin_dhaultfoeuille.py:L5612-L5741. Concrete fix: none.

Code Quality

Severity P1. Impact: paths_of_interest is broken on the exported one-shot helper because chaisemartin_dhaultfoeuille() still hard-codes the constructor kwarg allowlist without the new parameter, so paths_of_interest falls through to fit() even though fit() does not accept it. Any caller using the public helper will get a runtime error instead of the new behavior. References: diff_diff/chaisemartin_dhaultfoeuille.py:L8139-L8153, diff_diff/chaisemartin_dhaultfoeuille.py:L841-L860, diff_diff/__init__.py:L205-L208. Concrete fix: add paths_of_interest to the init-key split or, better, derive the split from inspect.signature(ChaisemartinDHaultfoeuille.__init__) so future constructor params cannot drift out of sync.

Performance

No findings.

Maintainability

Severity P2. Impact: under paths_of_interest, frequency_rank is assigned from user-specified order rather than observed-path frequency, but the field name and results schema still describe it as a frequency rank. That silently mislabels metadata for downstream consumers of results.path_effects and to_dataframe(level="by_path"). References: diff_diff/chaisemartin_dhaultfoeuille.py:L5845-L5935, diff_diff/chaisemartin_dhaultfoeuille_results.py:L394-L452, diff_diff/chaisemartin_dhaultfoeuille_results.py:L1690-L1803. Concrete fix: keep user order in a separate selection_order/display_order field and preserve true frequency rank separately, or rename/document the field consistently everywhere it is surfaced.

Tech Debt

No findings. TODO.md does not already track the P1 wrapper regression, so it is not mitigated as deferred work.

Security

No findings.

Documentation/Tests

Severity P3. Impact: the new public behavior is only partially documented. The class summary still says per-path support requires binary treatment, the main API docs still describe only by_path=k, the results docs still say path surfaces populate only when by_path is set, and the REGISTRY checklist still says by_path is binary-only. There is also no convenience-wrapper regression covering paths_of_interest, which is why the P1 slipped through despite existing helper tests. References: diff_diff/chaisemartin_dhaultfoeuille.py:L367-L370, docs/api/chaisemartin_dhaultfoeuille.rst:L16-L22, diff_diff/chaisemartin_dhaultfoeuille_results.py:L394-L452, docs/methodology/REGISTRY.md:L671-L671, tests/test_chaisemartin_dhaultfoeuille.py:L204-L223. Concrete fix: update those docs to mention non-binary by_path and paths_of_interest, and add a helper-level regression that exercises chaisemartin_dhaultfoeuille(..., paths_of_interest=[...]).

Path to Approval

Fix paths_of_interest propagation in chaisemartin_dhaultfoeuille() at diff_diff/chaisemartin_dhaultfoeuille.py:L8139-L8153.
Add a regression next to tests/test_chaisemartin_dhaultfoeuille.py:L204-L223 that calls the convenience helper with paths_of_interest and checks it matches direct-class behavior.

P1 (blocker): the convenience helper `chaisemartin_dhaultfoeuille()` hard-coded the constructor kwarg allowlist without `paths_of_interest`, so the new kwarg fell through to `fit()` and raised TypeError. Replace the static set with a signature-derived split via `inspect.signature(ChaisemartinDHaultfoeuille.__init__)` so future constructor params cannot drift out of sync. Add helper-level regression `test_convenience_function_routes_paths_of_interest_to_init` parallel to the existing `test_convenience_function_matches_class`. P2: under `paths_of_interest`, `frequency_rank` was assigned from `enumerate(selected_paths)`, which produces user-selection order rather than true frequency rank. Decouple the iteration order from the rank field: keep `selected_paths` iteration in user order (insertion order preserved on `path_effects.keys()`), but compute `frequency_rank` as the within-selected-paths rank by descending group count (lex tiebreak on the path tuple). Under `by_path=k`, `selected_paths` is already sorted by descending frequency so the two coincide; under `paths_of_interest`, frequency_rank now reflects true observed-count rank regardless of user order. Add regression `test_paths_of_interest_frequency_rank_is_true_frequency`. P3: docs/source-docstring binary-only language on per-path disaggregation cleaned up: - class summary at chaisemartin_dhaultfoeuille.py:367-370 (drop "binary treatment", add `paths_of_interest` mention) - docs/api/chaisemartin_dhaultfoeuille.rst:14-22 (add `paths_of_interest` and explicit "binary or integer-coded discrete" language) - diff_diff/chaisemartin_dhaultfoeuille_results.py:394-452 (`path_effects`, `path_placebo_event_study`, `path_cumulated_event_study`, `path_sup_t_bands` activator clauses now read "by_path is a positive int OR paths_of_interest is set") - docs/methodology/REGISTRY.md:671 checklist line (add "or integer-coded discrete" + `paths_of_interest`) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

igerber · 2026-05-09T14:43:52Z

/ai-review

github-actions · 2026-05-09T14:49:32Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 7df7dc258b4d8d587997cb89f99875bbf9344f28

Overall Assessment

✅ Looks good

Executive Summary

Previous P1 resolved: the public helper now routes paths_of_interest through __init__, with regression coverage in tests/test_chaisemartin_dhaultfoeuille.py:L225-L260.
Previous P2 resolved: frequency_rank is now computed from observed path frequency rather than user-list order under paths_of_interest, with regression coverage in tests/test_chaisemartin_dhaultfoeuille.py:L8651-L8675.
Methodology review did not find an unmitigated estimator/math/variance defect. The key deviations from R are documented in docs/methodology/REGISTRY.md:L641-L643 and surfaced in code at diff_diff/chaisemartin_dhaultfoeuille.py:L2012-L2023.
Edge-case handling on the new paths looks adequate: empty selected-path sets return {} with warning, continuous non-integer D is rejected at fit time, and per-path inference remains safe_inference-based and NaN-consistent (diff_diff/chaisemartin_dhaultfoeuille.py:L5614-L6256, diff_diff/chaisemartin_dhaultfoeuille.py:L3248-L3431).
Remaining issues are P3 only: a small amount of reporting/docs text still behaves as if by_path were the only selector, and reporting surfaces re-sort paths_of_interest outputs by frequency_rank.

Methodology

Severity P3-informational. Impact: no unmitigated methodology mismatch found. The affected method is the dCDH reversible-treatment event study with per-path disaggregation; the PR keeps the existing per-path IF/control-pool/variance construction and documents the intentional deviations (continuous-D rejection, Python-only paths_of_interest, and the R substr() bug for D >= 10). References: docs/methodology/REGISTRY.md:L641-L643, diff_diff/chaisemartin_dhaultfoeuille.py:L2012-L2023, diff_diff/chaisemartin_dhaultfoeuille.py:L5614-L6256. Concrete fix: none.

Code Quality

Severity P3-informational. Impact: the prior convenience-wrapper regression is fixed. chaisemartin_dhaultfoeuille() now derives init kwargs from the constructor signature, so paths_of_interest no longer falls through to fit() and error. References: diff_diff/chaisemartin_dhaultfoeuille.py:L8120-L8171, tests/test_chaisemartin_dhaultfoeuille.py:L225-L260. Concrete fix: none.

Performance

No findings.

Maintainability

Severity P3. Impact: results.path_effects preserves user-specified order, but summary() and to_dataframe(level="by_path") immediately re-sort by frequency_rank, so the new selector’s list order is not carried through all reporting surfaces. That does not change estimates, but it leaves the display contract ambiguous. References: diff_diff/chaisemartin_dhaultfoeuille.py:L628-L630, diff_diff/chaisemartin_dhaultfoeuille_results.py:L1288-L1291, diff_diff/chaisemartin_dhaultfoeuille_results.py:L1726-L1728. Concrete fix: either preserve paths_of_interest order across reporting surfaces or document explicitly that only results.path_effects preserves user order.

Tech Debt

No findings. TODO.md does not contain a pre-existing entry that changes the assessment for this diff.

Security

No findings.

Documentation/Tests

Severity P3. Impact: most of the prior docs gaps are resolved, but some remaining user-facing text still describes per-path outputs as by_path-only, which is now inaccurate under paths_of_interest. Examples include the result docs’ empty-state wording and the to_dataframe(level="by_path") guidance. References: diff_diff/chaisemartin_dhaultfoeuille_results.py:L447-L459, diff_diff/chaisemartin_dhaultfoeuille_results.py:L1699-L1704, docs/methodology/REGISTRY.md:L643-L643. Concrete fix: update the remaining text to say “by_path or paths_of_interest”, and add a small regression around summary/to_dataframe(level="by_path") messaging for the new selector.
Note: this was a static review. I did not execute the added tests locally because the review environment is missing numpy.

R1 verdict was "Looks good". Two P3 housekeeping items addressed: P3 (maintainability): `summary()` and `to_dataframe(level="by_path")` re-sorted output by `frequency_rank`, dropping the user-specified order under `paths_of_interest`. Both reporting surfaces now iterate in `path_effects` insertion order so user-specified order is preserved end-to-end. Under `by_path=k`, insertion order matches descending frequency_rank so output is identical to the prior release. Adds regressions `test_paths_of_interest_order_preserved_in_to_dataframe` and `test_paths_of_interest_order_preserved_in_summary`. P3 (docs): remaining `by_path`-only language in result docs and REGISTRY: - `chaisemartin_dhaultfoeuille_results.py` `path_sup_t_bands` empty- state contract: "no bootstrap or `by_path is None`" -> "no bootstrap, or both `by_path` and `paths_of_interest` are `None`" - `chaisemartin_dhaultfoeuille_results.py` `to_dataframe(level= "by_path")` ValueError text: pass `paths_of_interest` as an alternative to `by_path=k` - `docs/methodology/REGISTRY.md:643` joint sup-t bands Note: empty-state contract updated for both selectors Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

igerber · 2026-05-09T15:08:17Z

/ai-review

github-actions · 2026-05-09T15:14:28Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 1d8e16bef9da0bd8183f809e0cde418d7f212822

Overall Assessment

✅ Looks good

Executive Summary

Re-review status: the prior behavior regressions are addressed. The convenience helper now routes paths_of_interest through __init__ (diff_diff/chaisemartin_dhaultfoeuille.py:L8154-L8163), frequency_rank is computed from observed path counts rather than user-list order (diff_diff/chaisemartin_dhaultfoeuille.py:L5847-L5858), and summary() / to_dataframe(level="by_path") now preserve user-specified path order (diff_diff/chaisemartin_dhaultfoeuille_results.py:L1289-L1293, L1730-L1734).
The affected methodology is the dCDH reversible-treatment event study’s per-path disaggregation, plus inherited per-path placebo, bootstrap, sup-t, and trends paths. I did not find an unmitigated estimator, identification, or variance/SE defect.
The PR’s substantive deviations from R are documented in the Methodology Registry, including continuous-D rejection, the R substr() bug for D >= 10, and the Python-only paths_of_interest API, so they are informational rather than blockers.
Remaining issue is P3 only: some reporting text still hard-codes by_path, so empty-state messaging can be inaccurate when the user actually requested paths_of_interest.
This was a static review. I could not run the targeted tests because pytest is not installed in the review environment.

Methodology

Severity P3-informational. Impact: the changed method is the dCDH reversible-treatment per-path event-study surface (including inherited per-path placebo/bootstrap/sup-t/trends behavior). I did not find an undocumented mismatch to the registry or an incorrect SE/inference change. The PR’s methodology deviations from R are explicitly documented: continuous non-integer D is rejected at fit time, the R baseline-derivation bug for D >= 10 is called out, and paths_of_interest is documented as a Python-only extension. References: docs/methodology/REGISTRY.md:L641-L643, diff_diff/chaisemartin_dhaultfoeuille.py:L1170-L1224, L2012-L2023, L5614-L5754. Concrete fix: none.

Code Quality

No findings.

Performance

No findings.

Maintainability

Severity P3. Impact: the remaining reporting text still labels the surface as by_path even when results were requested via paths_of_interest, and the generic empty-state warning can misdescribe “none of the requested paths were retained” as if the panel had no observed complete-window paths at all. This does not change estimates, but it leaves the public selector contract slightly inconsistent. References: diff_diff/chaisemartin_dhaultfoeuille.py:L5809-L5819, diff_diff/chaisemartin_dhaultfoeuille_results.py:L1273-L1285. Concrete fix: make the header/warnings selector-aware, e.g. “Treatment-Path Disaggregation” / “by_path or paths_of_interest”, and use a distinct empty-state message when paths_of_interest yields zero retained paths.

Tech Debt

No findings.

Security

No findings.

Documentation/Tests

No blocker-level findings. The added coverage spans wrapper routing, parameter validation, ordering semantics, non-binary paths, parity, and cross-feature interactions. Static review only; tests were not executed here because pytest is unavailable.

R2 verdict was "Looks good". One P3 housekeeping item addressed: P3 (maintainability): selector-aware messaging on the empty-state path: 1. `_compute_path_effects` empty-selection warning now branches on `paths_of_interest is not None`. Under `paths_of_interest`, the message reads "every user-specified path was either unobserved ... per-path 'zero observed groups' UserWarnings already issued" so the user gets a clear pointer to the per-path warnings rather than a generic "no observed treatment path" message that misdescribes the cause. Under `by_path=k`, the original message is preserved. 2. `summary()` empty-state header changed from "Treatment-Path Disaggregation (by_path)" to "Treatment-Path Disaggregation"; the trailing parenthetical now says "by_path / paths_of_interest was a no-op on this panel". Adds regression `test_paths_of_interest_all_unobserved_emits_distinct_warning` asserting the paths_of_interest-specific warning text fires when every requested path is unobserved. Also fixes 7 pre-existing F541 f-string-without-placeholder warnings in the warning blocks via `ruff --fix`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

igerber · 2026-05-09T15:30:42Z

/ai-review

github-actions · 2026-05-09T15:36:35Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 245f19498c5082b160ba070023bb420c0488ebeb

Overall Assessment

✅ Looks good

Executive Summary

Re-review status: the prior issues called out in the last AI pass are addressed. The convenience wrapper now routes paths_of_interest through __init__ (diff_diff/chaisemartin_dhaultfoeuille.py:L8174-L8181), frequency_rank is no longer tied to user-list order (diff_diff/chaisemartin_dhaultfoeuille.py:L5865-L5876), and summary() / to_dataframe(level="by_path") now preserve user-specified path order (diff_diff/chaisemartin_dhaultfoeuille_results.py:L1290-L1294, L1731-L1735).
The affected methodology is the dCDH reversible-treatment per-path event-study surface, including per-path placebos, bootstrap/sup-t inheritance, and per-path linear-trend cumulation.
I did not find an unmitigated estimand, control-pool, or variance/SE defect in the new non-binary integer-D or paths_of_interest paths.
The substantive deviations from R are documented in the Methodology Registry, so they are informational rather than defects: continuous-D rejection, the R substr() baseline bug for D >= 10, and the Python-only paths_of_interest API (docs/methodology/REGISTRY.md:L641-L643).
Residual issue is P3 only: some user-facing warnings/docs still hard-code by_path or generic “no observed paths” wording even when the active selector is paths_of_interest.
Static review only; I could not run the added tests because pytest is not installed in this environment.

Methodology

Severity P3-informational. Impact: the changed method is the dCDH per-path reversible-treatment event-study stack. The selector extension leaves the IF/control-pool methodology intact by zeroing only switcher-side contributions while preserving the full control pool and original cohort structure (diff_diff/chaisemartin_dhaultfoeuille.py:L5477-L5609, L6537-L6664); non-binary support is guarded to integer-coded discrete treatment at fit time (diff_diff/chaisemartin_dhaultfoeuille.py:L2012-L2023) and path enumeration canonicalizes integer states only (diff_diff/chaisemartin_dhaultfoeuille.py:L5614-L5754). The R deviations are documented in docs/methodology/REGISTRY.md:L641-L643. Concrete fix: none.

Code Quality

No findings.

Performance

No findings.

Maintainability

Severity P3. Impact: selector-aware behavior is still described in a few places as by_path only. The multi-baseline warnings for controls and trends_linear still say by_path + ... even when triggered by paths_of_interest (diff_diff/chaisemartin_dhaultfoeuille.py:L1725-L1735, L1829-L1879), the non-empty summary header still renders Treatment-Path Disaggregation (by_path), and the empty summary text can still misdescribe “all requested paths were unobserved” as “no observed paths have a complete window” (diff_diff/chaisemartin_dhaultfoeuille_results.py:L1269-L1287). to_dataframe()’s level docs also still describe the surface as by_path=k only (diff_diff/chaisemartin_dhaultfoeuille_results.py:L1445-L1452). Concrete fix: make warnings, summary headers, and to_dataframe(level="by_path") docs selector-aware, and distinguish “requested paths unobserved” from “panel has no complete-window paths.”

Tech Debt

No findings. I did not see a new correctness limitation that needs TODO.md tracking.

Security

No findings.

Documentation/Tests

No blocker-level findings. The added coverage materially improves validation, ordering, non-binary parity, and cross-feature composition checks (tests/test_chaisemartin_dhaultfoeuille.py:L8184-L8938, tests/test_chaisemartin_dhaultfoeuille_parity.py:L1191-L1321). Static review only; I could not execute them because pytest is unavailable.

R2 verdict was "Looks good". One residual P3 housekeeping class addressed (selector-aware messaging in additional locations): 1. Multi-baseline `UserWarning`s under `controls` and `trends_linear` now read "by_path / paths_of_interest + ..." instead of "by_path + ..." since both fire under either selector. 2. F_g=3 boundary `UserWarning` under `trends_linear` now reads "by_path / paths_of_interest + trends_linear: ..." for the same reason. 3. `summary()` non-empty header dropped the "(by_path)" suffix — matches the empty-state branch fixed in the prior round. 4. `to_dataframe(level="by_path")` docstring now documents that the surface is populated under either `by_path=k` or `paths_of_interest=[(...), ...]`. Updates 5 test assertions that matched the old "by_path + ..." warning substring; the new "+ trends_linear" / "+ controls" substring matches both selector forms. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

igerber · 2026-05-09T15:48:13Z

/ai-review

github-actions · 2026-05-09T15:52:29Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 5fc825df830053c1d2eaa444684c347529576682

Overall Assessment

✅ Looks good

Executive Summary

Re-review status: the prior substantive regressions appear fixed. The convenience wrapper now routes paths_of_interest through __init__, frequency_rank is no longer tied to user-list order, and summary() / to_dataframe(level="by_path") preserve user-selected path order: diff_diff/chaisemartin_dhaultfoeuille.py:L8174-L8190, diff_diff/chaisemartin_dhaultfoeuille.py:L5867-L5878, diff_diff/chaisemartin_dhaultfoeuille_results.py:L1290-L1294, diff_diff/chaisemartin_dhaultfoeuille_results.py:L1732-L1736.
Affected methodology: the dCDH reversible-treatment multi-horizon event study’s per-path disaggregation surface.
I did not find an unmitigated estimand, control-pool, or variance/SE defect in the new non-binary integer-D or paths_of_interest paths.
The substantive deviations from R are documented in the Methodology Registry, so they are informational rather than defects: continuous-D rejection, the R substr() baseline bug for D >= 10, and the Python-only paths_of_interest API: docs/methodology/REGISTRY.md:L641-L643.
Residual issue is only P3: the empty summary() renderer still uses generic “no observed paths” wording even when paths_of_interest merely selected unobserved paths: diff_diff/chaisemartin_dhaultfoeuille_results.py:L1269-L1278.
Static review only; pytest is not installed in this environment.

Methodology

Severity: P3-informational. Impact: the selector change remains a mechanical filter on the existing per-path IF construction. switcher_subset_mask zeroes only non-selected switcher contributions while leaving control-pool construction and full-cohort recentering unchanged, which matches the registry’s documented per-path estimand/SE contract: diff_diff/chaisemartin_dhaultfoeuille.py:L5479-L5608, diff_diff/chaisemartin_dhaultfoeuille.py:L5802-L5929, docs/methodology/REGISTRY.md:L641-L643. Non-binary support is also correctly gated to integer-coded D at fit time: diff_diff/chaisemartin_dhaultfoeuille.py:L2014-L2024. Concrete fix: none.

Code Quality

No findings.

Performance

No findings.

Maintainability

Severity: P3. Impact: fit() now distinguishes the paths_of_interest empty case in its warning text (diff_diff/chaisemartin_dhaultfoeuille.py:L5811-L5827), but summary() still renders every empty path_effects state as “No observed paths have a complete [F_g-1, F_g-1+L_max] window.” (diff_diff/chaisemartin_dhaultfoeuille_results.py:L1269-L1278). That is inaccurate when the panel has observed paths but none match the requested list, so the earlier wording issue is only partially resolved. Concrete fix: persist an empty-state reason or selector type on the results object and render selector-specific empty text in summary(); add a summary-level regression next to the existing warning test at tests/test_chaisemartin_dhaultfoeuille.py:L8735-L8764.

Tech Debt

No new findings. I did not identify a new deferred correctness limitation that needs TODO.md tracking.

Security

No findings.

Documentation/Tests

Severity: P3-informational. Impact: coverage is materially stronger for the changed surface, including wrapper routing, ordering, rank semantics, selector-specific warnings, and non-binary R parity: tests/test_chaisemartin_dhaultfoeuille.py:L8651-L8764, tests/test_chaisemartin_dhaultfoeuille_parity.py:L1191-L1321. Concrete fix: none from static review; execute the targeted new dCDH tests in CI/local validation.

R4 verdict was "Looks good". One P3 housekeeping item addressed: P3 (maintainability): the `summary()` empty-state renderer for `path_effects == {}` rendered the same generic "No observed paths have a complete [F_g-1, F_g-1+L_max] window" text under both selectors. Under `paths_of_interest`, that text is misleading because the panel may have observed paths but none match the user's list. Branch on `_estimator_ref.paths_of_interest`: when set, the empty-state block now reads "Every path in paths_of_interest was unobserved or had a window outside L_max+1" with a pointer to the per-path 'zero observed groups' UserWarnings; under `by_path=k`, the original wording is preserved. Adds regression `test_paths_of_interest_all_unobserved_summary_distinct_text` (next to the existing `_emits_distinct_warning` test) asserting the selector-specific summary text fires and the by_path-only wording does not. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

igerber · 2026-05-09T16:14:20Z

/ai-review

github-actions · 2026-05-09T16:20:33Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 552b66ff20e5e13012553229b5e828c328a63e55

Overall Assessment

✅ Looks good

Executive Summary

Methodology: I did not find an unmitigated estimand, control-pool, or inference/SE defect in the new non-binary by_path path or the new paths_of_interest selector. The registry now documents both the integer-coded-D requirement and the Python-only selector as a mechanical filter over the existing per-path IF/SE construction, and the implementation matches that contract. docs/methodology/REGISTRY.md:L641-L643 diff_diff/chaisemartin_dhaultfoeuille.py:L2014-L2024 diff_diff/chaisemartin_dhaultfoeuille.py:L2316-L2334
The prior re-review issue is fixed: summary() now distinguishes the all-unobserved paths_of_interest empty state, summary() / to_dataframe(level="by_path") preserve user order, and frequency_rank is no longer tied to iteration order. diff_diff/chaisemartin_dhaultfoeuille.py:L5867-L5878 diff_diff/chaisemartin_dhaultfoeuille_results.py:L1269-L1306 diff_diff/chaisemartin_dhaultfoeuille_results.py:L1744-L1748
Severity P2 code-quality issue: failed set_params() calls are not transactional, so a rejected selector update can leave the estimator in an invalid mixed-selector state that fit() will still execute. diff_diff/chaisemartin_dhaultfoeuille.py:L786-L825 diff_diff/chaisemartin_dhaultfoeuille.py:L1167-L1200 diff_diff/chaisemartin_dhaultfoeuille.py:L5702-L5728 tests/test_chaisemartin_dhaultfoeuille.py:L8522-L8533
Documentation and tests are materially improved, including targeted paths_of_interest coverage and a new R-parity fixture for non-binary by_path. tests/test_chaisemartin_dhaultfoeuille.py:L8451-L8563 tests/test_chaisemartin_dhaultfoeuille_parity.py:L1191-L1321
Static review only; pytest is not installed in this environment.

Methodology

No findings.

Code Quality

Severity: P2. Impact: set_params() writes directly onto self before it validates the new mutual-exclusion rule, so est.set_params(by_path=2) on an estimator already holding paths_of_interest raises but leaves both selectors populated. fit() does not re-check that mutex and _enumerate_treatment_paths() simply prefers paths_of_interest when both are present, so later fits can run with stale selector state unless the caller manually repairs the object. Concrete fix: validate a copied/merged param dict before mutating self, or add a fit-time hard guard for by_path plus paths_of_interest; then replace the current recovery test with an assertion that failed set_params() leaves the estimator unchanged. diff_diff/chaisemartin_dhaultfoeuille.py:L786-L825 diff_diff/chaisemartin_dhaultfoeuille.py:L1167-L1200 diff_diff/chaisemartin_dhaultfoeuille.py:L5702-L5728 tests/test_chaisemartin_dhaultfoeuille.py:L8522-L8533

Performance

No findings.

Maintainability

No findings.

Tech Debt

No new findings.

Security

No findings.

Documentation/Tests

No findings. The docs/tests added here are aligned with the implementation; I could not execute them locally because pytest is unavailable in this environment.

R5 verdict was "Looks good" with one P2 code-quality regression flagged. The previous round's `test_set_params_partial_mutation_recoverable` documented inconsistent intermediate state as expected behavior: on a mutex violation, `set_params()` mutated `self` before validation, so a failed call left both `by_path` and `paths_of_interest` populated. `fit()` did not re-check the mutex and `_enumerate_treatment_paths()` silently preferred `paths_of_interest` when both were set, so subsequent fits ran with stale selector state unless the caller manually repaired the object. Fix: make `set_params()` transactional. Snapshot the pre-call values for the keys being set, attempt mutation + validation, and roll back to the snapshot on any exception before re-raising. This mirrors sklearn-style atomic semantics. The validation block was extracted into a private `_validate_invariants()` helper so __init__ and set_params can share the rules without code duplication risking drift. Replaces `test_set_params_partial_mutation_recoverable` (which asserted the buggy partial-mutation behavior) with `test_set_params_failed_validation_is_transactional`, asserting that: 1. After a failed `set_params()`, `get_params()` returns bit-identical to the pre-call snapshot. 2. Subsequent valid `set_params()` succeeds against the rolled- back state. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

igerber · 2026-05-09T16:31:42Z

/ai-review

github-actions · 2026-05-09T16:39:19Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 9d3fff090375b3f3ebc0b1c03c396846e52fdb6d

Overall Assessment

⚠️ Needs changes

Executive Summary

The previous re-review issue is fixed: failed set_params() calls are now transactional and no longer leave a mixed-selector estimator state. diff_diff/chaisemartin_dhaultfoeuille.py:L778-L844 tests/test_chaisemartin_dhaultfoeuille.py:L8522-L8543
I did not find an unmitigated methodology, control-pool, or SE mismatch in the non-binary by_path lift or the documented Python-only paths_of_interest filter.
Severity P1: paths_of_interest is documented as composing with all downstream by-path surfaces, and the selector is now threaded through analytical, placebo, bootstrap, and post-bootstrap trends-linear branches, but the new tests do not cover the full supported interaction matrix. Under the project’s parameter-interaction rule, that blocks approval. diff_diff/chaisemartin_dhaultfoeuille.py:L622-L642 diff_diff/chaisemartin_dhaultfoeuille.py:L2337-L2354 diff_diff/chaisemartin_dhaultfoeuille.py:L2503-L2520 diff_diff/chaisemartin_dhaultfoeuille.py:L3050-L3096 diff_diff/chaisemartin_dhaultfoeuille.py:L3360-L3372 tests/test_chaisemartin_dhaultfoeuille.py:L8868-L8979
The current selector-specific sup-t check is especially weak: it only asserts res.path_sup_t_bands is not None, but the implementation intentionally returns {} for requested-but-empty bands, so that assertion does not actually verify the selector-specific sup-t path. diff_diff/chaisemartin_dhaultfoeuille.py:L4088-L4123 tests/test_chaisemartin_dhaultfoeuille.py:L8926-L8938
Static review only; I could not run the suite locally because numpy is not installed in this environment.

Methodology

No findings.

Code Quality

No findings. The prior mixed-state set_params() problem appears resolved. diff_diff/chaisemartin_dhaultfoeuille.py:L778-L844

Performance

No findings.

Maintainability

No findings.

Tech Debt

No findings. TODO.md does not currently track the P1 interaction-coverage gap above, so it remains unmitigated.

Security

No findings.

Documentation/Tests

Severity: P1. Impact: paths_of_interest is documented as compatible with bootstrap, per-path placebos, joint sup-t bands, trends_linear, and trends_nonparam, and the selector is forwarded into each of those branches in fit(). The new tests cover analytical trends_linear/trends_nonparam and generic bootstrap/placebo, but they do not exercise selector-specific paths_of_interest + trends_linear + n_bootstrap (post-bootstrap path_cumulated_event_study) or paths_of_interest + trends_nonparam + placebo/n_bootstrap, and the only sup-t assertion is vacuous because {} also satisfies is not None. Concrete fix: add selector-specific regressions for those intersections and assert non-empty/finitely-populated outputs, not just object existence. diff_diff/chaisemartin_dhaultfoeuille.py:L622-L642 diff_diff/chaisemartin_dhaultfoeuille.py:L2337-L2354 diff_diff/chaisemartin_dhaultfoeuille.py:L2503-L2520 diff_diff/chaisemartin_dhaultfoeuille.py:L3050-L3096 diff_diff/chaisemartin_dhaultfoeuille.py:L3360-L3372 diff_diff/chaisemartin_dhaultfoeuille.py:L4088-L4123 tests/test_chaisemartin_dhaultfoeuille.py:L8868-L8979

Path to Approval

Add a paths_of_interest + trends_linear=True + n_bootstrap>0 regression, ideally with placebo=True, that asserts selected-path filtering, finite per-path bootstrap SE propagation, and populated post-bootstrap path_cumulated_event_study.
Add a paths_of_interest + trends_nonparam="state" + placebo=True + n_bootstrap>0 regression that asserts the selector and set_ids flow through _compute_path_placebos, _collect_path_bootstrap_inputs, _collect_path_placebo_bootstrap_inputs, and the resulting public surfaces.
Replace the current selector-specific sup-t smoke check with one that requires at least one selected path to have a finite crit_value and non-NaN cband_conf_int entries on results.path_effects.

P1 (Documentation/Tests): the docstring claimed paths_of_interest composed with bootstrap, per-path placebos, joint sup-t bands, trends_linear, and trends_nonparam, but the new tests covered only the analytical trends_* paths and a triple-combo (paths_of_interest + non-binary D + bootstrap + placebo). The R6 reviewer flagged two specific gaps: 1. paths_of_interest + trends_linear=True + n_bootstrap > 0 + placebo=True: post-bootstrap path_cumulated_event_study path was untested. 2. paths_of_interest + trends_nonparam="state" + placebo=True + n_bootstrap > 0: the set_ids flow through all four per-path collectors (analytical, placebo, bootstrap, placebo-bootstrap) was untested. Plus the existing non-binary triple-combo's sup-t check was vacuous (`assert res.path_sup_t_bands is not None` is satisfied by `{}` when the strict-majority gate excludes every path). Fixes: - Add test_paths_of_interest_trends_linear_bootstrap_placebo asserting path-set restriction, finite per-path bootstrap SE on path_effects, populated post-bootstrap path_cumulated_event_study with finite (effect, SE) per (path, horizon), and populated per-path placebo for the same paths. - Add test_paths_of_interest_trends_nonparam_bootstrap_placebo asserting selector + set_ids flow through all four collectors: finite analytical/bootstrap (effect, SE) on path_effects, finite per-path placebo (effect, SE). - Tighten test_paths_of_interest_non_binary_bootstrap_placebo: bump to 2 paths and n_bootstrap=400 so the sup-t strict-majority gate fires, and assert >=1 finite crit_value AND non-NaN cband_conf_int per (path, horizon) on path_effects (replacing the vacuous "is not None" check). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

igerber · 2026-05-09T16:42:45Z

/ai-review

github-actions · 2026-05-09T16:50:10Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 25b1f62c7f4821e8267db7ac1fa4c976cedefc70

Overall Assessment

✅ Looks good

Executive Summary

The previous re-review blocker appears addressed: the new tests now cover paths_of_interest + trends_linear + n_bootstrap + placebo, paths_of_interest + trends_nonparam + n_bootstrap + placebo, and a non-vacuous per-path sup-t/cband assertion. tests/test_chaisemartin_dhaultfoeuille.py:L8909-L9065
I did not find an unmitigated methodology mismatch in the affected method, the dCDH per-path event-study disaggregation. The new selector and non-binary lift reuse the existing per-path IF/SE/bootstrap machinery rather than changing the estimand. diff_diff/chaisemartin_dhaultfoeuille.py:L2029-L2040 diff_diff/chaisemartin_dhaultfoeuille.py:L2331-L2354 diff_diff/chaisemartin_dhaultfoeuille.py:L5631-L5771
P3 informational: the continuous-D rejection for per-path selection, the D>=10 R path-string bug, and the Python-only paths_of_interest extension are explicitly documented in the Methodology Registry, so they are not defects. docs/methodology/REGISTRY.md:L641-L643
The prior mixed-selector set_params() problem is fixed with transactional rollback and regression coverage. diff_diff/chaisemartin_dhaultfoeuille.py:L778-L852 tests/test_chaisemartin_dhaultfoeuille.py:L8511-L8543
Static review only; I could not run the suite locally because numpy and pytest are not installed in this environment.

Methodology

Severity: P3 informational. Impact: by_path/paths_of_interest now reject continuous treatment at fit-time instead of silently rounding, and the known R substr(path, 1, 1) bug for D>=10 is documented as an intentional Python/R divergence. I did not find an undocumented change to the dCDH estimand, weighting, or variance formula. Concrete fix: None. diff_diff/chaisemartin_dhaultfoeuille.py:L2029-L2040 docs/methodology/REGISTRY.md:L641-L643
Severity: P3 informational. Impact: paths_of_interest is a Python-only selector, documented as a mechanical filter over observed paths rather than a new methodology branch; the implementation threads it through the same event-study, placebo, bootstrap, sup-t, and cumulation surfaces. Concrete fix: None. diff_diff/chaisemartin_dhaultfoeuille.py:L2331-L2354 diff_diff/chaisemartin_dhaultfoeuille.py:L2499-L2520 diff_diff/chaisemartin_dhaultfoeuille.py:L3043-L3372 docs/methodology/REGISTRY.md:L641-L643

Code Quality

No findings. The transactional rollback in set_params() resolves the prior mixed-state bug and is regression-tested. diff_diff/chaisemartin_dhaultfoeuille.py:L778-L852 tests/test_chaisemartin_dhaultfoeuille.py:L8522-L8543

Performance

No findings.

Maintainability

No findings.

Tech Debt

No findings. I did not see newly introduced deferred work that should have been tracked in TODO.md.

Security

No findings.

Documentation/Tests

No findings. The prior P1 interaction-coverage gap appears closed by the new selector-specific regressions, and the sup-t check now requires finite crit_value and finite cband_conf_int endpoints rather than is not None. tests/test_chaisemartin_dhaultfoeuille.py:L8909-L9065
R-parity coverage was extended for the non-binary by_path lift. tests/test_chaisemartin_dhaultfoeuille_parity.py:L1191-L1320
Static review only; local execution was not possible because numpy and pytest are unavailable here.

igerber · 2026-05-09T17:56:45Z

/ai-review

github-actions · 2026-05-09T18:04:17Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 6c467744c91782316c8736e25b125ba87a2af040

Overall Assessment

✅ Looks good. Highest unmitigated finding is P2.

Executive Summary

Affected method: ChaisemartinDHaultfoeuille dCDH reversible-treatment event-study per-path disaggregation.
No P0/P1 methodology, variance/SE, inference, control-pool, or selector-propagation issues found.
paths_of_interest is threaded through event-study, placebo, bootstrap, cumulated trends-linear, sup-t, summary, and dataframe surfaces.
Prior re-review concerns appear addressed: reciprocal set_params() mutex tests, transactional rollback, and bootstrap/placebo/trends interaction tests are present.
Static review only: I could not run tests because numpy is not installed in this environment.

Methodology

Severity: P2
Impact: The registry documents the R substr(path, 1, 1) baseline bug for D >= 10, but the new tested support includes negative integer treatment paths. The same R single-character baseline parsing issue would also affect negative path strings such as "-1,...", while Python tuple matching handles them correctly. This is not a Python correctness issue, but it leaves an R-parity caveat incomplete for a behavior now explicitly tested.
Concrete fix: Amend the REGISTRY.md non-binary by_path note to say the R parser caveat applies to multi-character baseline states, including D >= 10 and negative integer states, or narrow the public contract to nonnegative integer-coded states.
Locations: diff_diff/chaisemartin_dhaultfoeuille.py:L457-L460, diff_diff/chaisemartin_dhaultfoeuille.py:L5708-L5711, tests/test_chaisemartin_dhaultfoeuille.py:L8230-L8270, docs/methodology/REGISTRY.md:L641
Severity: P3 informational
Impact: Continuous-D rejection under per-path selection, the documented R D >= 10 parser divergence, and Python-only paths_of_interest are explicitly documented deviations/extensions, so they are not defects.
Concrete fix: None.
Locations: diff_diff/chaisemartin_dhaultfoeuille.py:L2029-L2040, docs/methodology/REGISTRY.md:L641

Code Quality

No findings. paths_of_interest validation, get_params(), reciprocal set_params() mutex validation, and transactional rollback are present. Locations: diff_diff/chaisemartin_dhaultfoeuille.py:L285-L328, diff_diff/chaisemartin_dhaultfoeuille.py:L678-L852, tests/test_chaisemartin_dhaultfoeuille.py:L8510-L8548

Performance

No findings. The selector reuses existing per-path helper paths and bootstrap input collectors; I did not see a new avoidable asymptotic regression in the changed code.

Maintainability

No findings. Sibling-surface mirror audit covered event-study, placebo, bootstrap collectors, trends-linear cumulation, sup-t bands, summary(), and to_dataframe(level="by_path"). Locations: diff_diff/chaisemartin_dhaultfoeuille.py:L2330-L2354, diff_diff/chaisemartin_dhaultfoeuille.py:L2498-L2520, diff_diff/chaisemartin_dhaultfoeuille.py:L3042-L3097, diff_diff/chaisemartin_dhaultfoeuille.py:L3351-L3373, diff_diff/chaisemartin_dhaultfoeuille_results.py:L1261-L1385, diff_diff/chaisemartin_dhaultfoeuille_results.py:L1711-L1827

Tech Debt

No findings. I did not find a new deferred correctness issue relying on TODO.md mitigation.

Security

No findings.

Documentation/Tests

No P1 test gap found. The new tests cover non-binary by_path, continuous-D rejection under per-path selection, empty paths_of_interest, reciprocal selector mutexes, dataframe/summary ordering, controls, trends_linear, trends_nonparam, bootstrap, placebo, and sup-t bands. Pattern-wide grep found no new inline t_stat = effect / se anti-pattern in the affected dCDH files; inference routes through safe_inference(). Locations: tests/test_chaisemartin_dhaultfoeuille.py:L8184-L8448, tests/test_chaisemartin_dhaultfoeuille.py:L8456-L9073, tests/test_chaisemartin_dhaultfoeuille_parity.py:L1191-L1321

Bundles two adjacent extensions to ChaisemartinDHaultfoeuille.by_path: 1. Non-binary integer treatment (Wave 3 #8): replace the `NotImplementedError` gate at chaisemartin_dhaultfoeuille.py:1870 with a `ValueError` for continuous D (e.g. 1.5) per the no-silent- failures contract; integer-coded D in Z (e.g. ordinal {0, 1, 2}) is now supported and produces integer-state path tuples (e.g. (0, 2, 2, 2)). Validated against R `did_multiplegt_dyn(..., by_path)` for D in {0, 1, 2} via the new `multi_path_reversible_by_path_non_binary` golden-value scenario: per-path point estimates match R bit-exactly (rtol ~1e-9 events, rtol+atol envelope for placebo near-zero values), per-path SE inherits the documented cross-path cohort-sharing deviation (~5% rtol; SE_RTOL=0.15 envelope). 2. paths_of_interest kwarg (Wave 3 #9): new estimator parameter accepting a list of int tuples for explicit user-specified path selection. Mutually exclusive with by_path=k (raises ValueError at __init__ and set_params time). Each tuple length validated at __init__ for uniformity, vs L_max+1 at fit-time. Bool / np.bool_ rejected; np.integer accepted and canonicalized to Python int. Duplicates emit a UserWarning and dedupe; unobserved paths emit a UserWarning and are omitted from path_effects. Composes with non-binary D and all downstream by_path surfaces (bootstrap, placebos, sup-t bands, controls, trends_linear, trends_nonparam). Python-only API (R has no list-based selection in by_path). Threading: paths_of_interest added to 6 path-enumeration helper signatures (`_enumerate_treatment_paths`, `_compute_path_effects`, `_compute_path_cumulated_event_study`, `_collect_path_bootstrap_inputs`, `_compute_path_placebos`, `_collect_path_placebo_bootstrap_inputs`). The `:1118` preconditions gate (drop_larger_lower / L_max / mutex with heterogeneity / design2 / honest_did / survey_design) and the 11 `self.by_path is not None` activation branches in fit() rerouted to fire under either selector. For D >= 10, R's `did_multiplegt_by_path` derives the per-path baseline via `substr(path_index$path, 1, 1)` which captures only the first character of the comma-separated path string (e.g. captures "1" instead of "12" for path = "12,12,..."), mis-allocating R's per-path control-pool subset. Python's tuple-key matching is correct; the parity scenario stays in D in {0, 1, 2} to avoid the R bug. Adds: - `_validate_paths_of_interest` module-level helper for canonicalization - `TestByPathNonBinary` (10 tests, 3 slow) - `TestPathsOfInterest` (27 tests, 7 slow) - `TestDCDHDynRParityByPathNonBinary` (1 parity test) - R script Scenario 19 + regenerated dcdh_dynr_golden_values.json - REGISTRY.md sub-paragraphs for non-binary + paths_of_interest - CHANGELOG.md Unreleased > Added entry - llms-full.txt kwarg listing for paths_of_interest Removes: the now-stale `test_forbids_non_binary_treatment` gate- firing test from TestByPathCoverageFitGates. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

P1 (blocker): the convenience helper `chaisemartin_dhaultfoeuille()` hard-coded the constructor kwarg allowlist without `paths_of_interest`, so the new kwarg fell through to `fit()` and raised TypeError. Replace the static set with a signature-derived split via `inspect.signature(ChaisemartinDHaultfoeuille.__init__)` so future constructor params cannot drift out of sync. Add helper-level regression `test_convenience_function_routes_paths_of_interest_to_init` parallel to the existing `test_convenience_function_matches_class`. P2: under `paths_of_interest`, `frequency_rank` was assigned from `enumerate(selected_paths)`, which produces user-selection order rather than true frequency rank. Decouple the iteration order from the rank field: keep `selected_paths` iteration in user order (insertion order preserved on `path_effects.keys()`), but compute `frequency_rank` as the within-selected-paths rank by descending group count (lex tiebreak on the path tuple). Under `by_path=k`, `selected_paths` is already sorted by descending frequency so the two coincide; under `paths_of_interest`, frequency_rank now reflects true observed-count rank regardless of user order. Add regression `test_paths_of_interest_frequency_rank_is_true_frequency`. P3: docs/source-docstring binary-only language on per-path disaggregation cleaned up: - class summary at chaisemartin_dhaultfoeuille.py:367-370 (drop "binary treatment", add `paths_of_interest` mention) - docs/api/chaisemartin_dhaultfoeuille.rst:14-22 (add `paths_of_interest` and explicit "binary or integer-coded discrete" language) - diff_diff/chaisemartin_dhaultfoeuille_results.py:394-452 (`path_effects`, `path_placebo_event_study`, `path_cumulated_event_study`, `path_sup_t_bands` activator clauses now read "by_path is a positive int OR paths_of_interest is set") - docs/methodology/REGISTRY.md:671 checklist line (add "or integer-coded discrete" + `paths_of_interest`) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

R1 verdict was "Looks good". Two P3 housekeeping items addressed: P3 (maintainability): `summary()` and `to_dataframe(level="by_path")` re-sorted output by `frequency_rank`, dropping the user-specified order under `paths_of_interest`. Both reporting surfaces now iterate in `path_effects` insertion order so user-specified order is preserved end-to-end. Under `by_path=k`, insertion order matches descending frequency_rank so output is identical to the prior release. Adds regressions `test_paths_of_interest_order_preserved_in_to_dataframe` and `test_paths_of_interest_order_preserved_in_summary`. P3 (docs): remaining `by_path`-only language in result docs and REGISTRY: - `chaisemartin_dhaultfoeuille_results.py` `path_sup_t_bands` empty- state contract: "no bootstrap or `by_path is None`" -> "no bootstrap, or both `by_path` and `paths_of_interest` are `None`" - `chaisemartin_dhaultfoeuille_results.py` `to_dataframe(level= "by_path")` ValueError text: pass `paths_of_interest` as an alternative to `by_path=k` - `docs/methodology/REGISTRY.md:643` joint sup-t bands Note: empty-state contract updated for both selectors Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

R2 verdict was "Looks good". One P3 housekeeping item addressed: P3 (maintainability): selector-aware messaging on the empty-state path: 1. `_compute_path_effects` empty-selection warning now branches on `paths_of_interest is not None`. Under `paths_of_interest`, the message reads "every user-specified path was either unobserved ... per-path 'zero observed groups' UserWarnings already issued" so the user gets a clear pointer to the per-path warnings rather than a generic "no observed treatment path" message that misdescribes the cause. Under `by_path=k`, the original message is preserved. 2. `summary()` empty-state header changed from "Treatment-Path Disaggregation (by_path)" to "Treatment-Path Disaggregation"; the trailing parenthetical now says "by_path / paths_of_interest was a no-op on this panel". Adds regression `test_paths_of_interest_all_unobserved_emits_distinct_warning` asserting the paths_of_interest-specific warning text fires when every requested path is unobserved. Also fixes 7 pre-existing F541 f-string-without-placeholder warnings in the warning blocks via `ruff --fix`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

R2 verdict was "Looks good". One residual P3 housekeeping class addressed (selector-aware messaging in additional locations): 1. Multi-baseline `UserWarning`s under `controls` and `trends_linear` now read "by_path / paths_of_interest + ..." instead of "by_path + ..." since both fire under either selector. 2. F_g=3 boundary `UserWarning` under `trends_linear` now reads "by_path / paths_of_interest + trends_linear: ..." for the same reason. 3. `summary()` non-empty header dropped the "(by_path)" suffix — matches the empty-state branch fixed in the prior round. 4. `to_dataframe(level="by_path")` docstring now documents that the surface is populated under either `by_path=k` or `paths_of_interest=[(...), ...]`. Updates 5 test assertions that matched the old "by_path + ..." warning substring; the new "+ trends_linear" / "+ controls" substring matches both selector forms. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

R4 verdict was "Looks good". One P3 housekeeping item addressed: P3 (maintainability): the `summary()` empty-state renderer for `path_effects == {}` rendered the same generic "No observed paths have a complete [F_g-1, F_g-1+L_max] window" text under both selectors. Under `paths_of_interest`, that text is misleading because the panel may have observed paths but none match the user's list. Branch on `_estimator_ref.paths_of_interest`: when set, the empty-state block now reads "Every path in paths_of_interest was unobserved or had a window outside L_max+1" with a pointer to the per-path 'zero observed groups' UserWarnings; under `by_path=k`, the original wording is preserved. Adds regression `test_paths_of_interest_all_unobserved_summary_distinct_text` (next to the existing `_emits_distinct_warning` test) asserting the selector-specific summary text fires and the by_path-only wording does not. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

R5 verdict was "Looks good" with one P2 code-quality regression flagged. The previous round's `test_set_params_partial_mutation_recoverable` documented inconsistent intermediate state as expected behavior: on a mutex violation, `set_params()` mutated `self` before validation, so a failed call left both `by_path` and `paths_of_interest` populated. `fit()` did not re-check the mutex and `_enumerate_treatment_paths()` silently preferred `paths_of_interest` when both were set, so subsequent fits ran with stale selector state unless the caller manually repaired the object. Fix: make `set_params()` transactional. Snapshot the pre-call values for the keys being set, attempt mutation + validation, and roll back to the snapshot on any exception before re-raising. This mirrors sklearn-style atomic semantics. The validation block was extracted into a private `_validate_invariants()` helper so __init__ and set_params can share the rules without code duplication risking drift. Replaces `test_set_params_partial_mutation_recoverable` (which asserted the buggy partial-mutation behavior) with `test_set_params_failed_validation_is_transactional`, asserting that: 1. After a failed `set_params()`, `get_params()` returns bit-identical to the pre-call snapshot. 2. Subsequent valid `set_params()` succeeds against the rolled- back state. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

P1 (Documentation/Tests): the docstring claimed paths_of_interest composed with bootstrap, per-path placebos, joint sup-t bands, trends_linear, and trends_nonparam, but the new tests covered only the analytical trends_* paths and a triple-combo (paths_of_interest + non-binary D + bootstrap + placebo). The R6 reviewer flagged two specific gaps: 1. paths_of_interest + trends_linear=True + n_bootstrap > 0 + placebo=True: post-bootstrap path_cumulated_event_study path was untested. 2. paths_of_interest + trends_nonparam="state" + placebo=True + n_bootstrap > 0: the set_ids flow through all four per-path collectors (analytical, placebo, bootstrap, placebo-bootstrap) was untested. Plus the existing non-binary triple-combo's sup-t check was vacuous (`assert res.path_sup_t_bands is not None` is satisfied by `{}` when the strict-majority gate excludes every path). Fixes: - Add test_paths_of_interest_trends_linear_bootstrap_placebo asserting path-set restriction, finite per-path bootstrap SE on path_effects, populated post-bootstrap path_cumulated_event_study with finite (effect, SE) per (path, horizon), and populated per-path placebo for the same paths. - Add test_paths_of_interest_trends_nonparam_bootstrap_placebo asserting selector + set_ids flow through all four collectors: finite analytical/bootstrap (effect, SE) on path_effects, finite per-path placebo (effect, SE). - Tighten test_paths_of_interest_non_binary_bootstrap_placebo: bump to 2 paths and n_bootstrap=400 so the sup-t strict-majority gate fires, and assert >=1 finite crit_value AND non-NaN cband_conf_int per (path, horizon) on path_effects (replacing the vacuous "is not None" check). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

…D in test `tests/test_chaisemartin_dhaultfoeuille.py::TestByPathNonBinary::test_non_integer_D_raises` failed across all 7 CI runners (macos py3.11/3.14, ubuntu-24.04-arm py3.11/3.13/3.14, ubuntu-latest py3.14, windows-latest py3.11) with: TypeError: Invalid value '1.5' for dtype 'int64' The fixture builds `df["treatment"]` as integers in {0, 1, 2}, so the column dtype is int64. On pandas >= 2.x (the version pinned in CI for py3.11+ runners), `df.loc[mask, "treatment"] = 1.5` raises TypeError outright; on older pandas (the local dev env) it emitted a FutureWarning and proceeded. The test path was supposed to exercise the estimator's D-integer guard at fit-time, not pandas's dtype coercion behavior. Cast `df["treatment"]` to float before injecting 1.5 so the test behaves identically across pandas versions and reaches the ValueError raise it was written to verify. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

igerber · 2026-05-09T18:24:54Z

/ai-review

github-actions · 2026-05-09T18:30:54Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 4e24cac05fa5e80ca1a1f1ad4a3ed45e478c276d

Overall Assessment

✅ Looks good. Highest unmitigated finding is P2.

Executive Summary

Affected method: ChaisemartinDHaultfoeuille dCDH reversible-treatment event-study per-path disaggregation.
No P0/P1 methodology, variance/SE, inference, control-pool, or selector-propagation issues found.
paths_of_interest is propagated through event-study, placebo, bootstrap, cumulated trends-linear, sup-t bands, summary, dataframe, get_params(), set_params(), and the convenience wrapper.
Prior P2 documentation caveat remains: negative integer paths are supported/tested but the R parser caveat only mentions D >= 10.
I attempted targeted pytest coverage, but pytest is not installed in this environment.

Methodology

Severity: P2
Impact: Prior documentation caveat remains incomplete. The registry documents R’s substr(path, 1, 1) baseline parsing bug only for D >= 10, but this PR also explicitly supports negative integer states. R would parse a path like "-1,..." as baseline "-", while Python tuple matching handles it correctly. This is not a Python correctness defect, but it is an incomplete R-parity caveat for a now-tested public behavior.
Concrete fix: Amend the non-binary by_path note to say the R parser caveat applies to multi-character baseline states, including D >= 10 and negative integers, or narrow public docs to nonnegative integer-coded states.
Locations: docs/methodology/REGISTRY.md:L641, tests/test_chaisemartin_dhaultfoeuille.py:L8230-L8270
Severity: P3 informational
Impact: Continuous-D rejection, Python-only paths_of_interest, cross-path cohort-sharing SE, bootstrap percentile CI, and per-path sup-t asymmetry are documented under labeled registry notes/deviations, so they are not methodology defects.
Concrete fix: None.
Locations: docs/methodology/REGISTRY.md:L641-L643

Code Quality

No findings. The new parameter is validated, canonicalized, included in get_params(), guarded transactionally in set_params(), and routed by the convenience wrapper. Locations: diff_diff/chaisemartin_dhaultfoeuille.py:L285-L328, diff_diff/chaisemartin_dhaultfoeuille.py:L680-L853, diff_diff/chaisemartin_dhaultfoeuille.py:L8174-L8206

Performance

No findings. The explicit selector reuses the existing path enumeration and per-path IF/bootstrap collectors; no new avoidable asymptotic issue was apparent in the changed paths.

Maintainability

No findings. Sibling-surface mirror audit covered path effects, placebos, bootstrap inputs, placebo bootstrap inputs, trends-linear cumulation, sup-t bands, summary(), and to_dataframe(level="by_path"). Locations: diff_diff/chaisemartin_dhaultfoeuille.py:L2330-L2354, diff_diff/chaisemartin_dhaultfoeuille.py:L2499-L2514, diff_diff/chaisemartin_dhaultfoeuille.py:L3043-L3097, diff_diff/chaisemartin_dhaultfoeuille.py:L3351-L3373, diff_diff/chaisemartin_dhaultfoeuille_results.py:L1261-L1385, diff_diff/chaisemartin_dhaultfoeuille_results.py:L1711-L1827

Tech Debt

No findings. I did not find a new silent correctness issue relying on TODO.md mitigation.

Security

No findings. Secret-pattern grep over changed files found no credential-like strings.

Documentation/Tests

Severity: P3 informational
Impact: Negative integer R-caveat docs remain incomplete as noted under Methodology. Test coverage otherwise covers validation, reciprocal mutex behavior, transactional rollback, ordering, empty result sets, non-binary D, controls, trends, bootstrap, placebo, and sup-t interactions.
Concrete fix: Same registry wording update as above.
Locations: docs/methodology/REGISTRY.md:L641, tests/test_chaisemartin_dhaultfoeuille.py:L8456-L9108

Verification notes: pattern-wide grep found no new inline dCDH inference anti-pattern; dCDH inference routes through safe_inference(). I attempted targeted tests, but the environment returned pytest: command not found.

igerber added the ready-for-ci Triggers CI test workflows label May 9, 2026

igerber and others added 5 commits May 9, 2026 14:23

igerber and others added 4 commits May 9, 2026 14:23

igerber force-pushed the dcdh-by-path-non-binary-poi branch from 6c46774 to 4e24cac Compare May 9, 2026 18:24

igerber merged commit 4469f54 into main May 9, 2026
23 checks passed

igerber deleted the dcdh-by-path-non-binary-poi branch May 9, 2026 19:29

Conversation

igerber commented May 9, 2026

Summary

Methodology references (required if estimator / math changes)

Validation

Security / privacy

Uh oh!

github-actions Bot commented May 9, 2026

Uh oh!

igerber commented May 9, 2026

Uh oh!

github-actions Bot commented May 9, 2026

Uh oh!

igerber commented May 9, 2026

Uh oh!

github-actions Bot commented May 9, 2026

Uh oh!

igerber commented May 9, 2026

Uh oh!

github-actions Bot commented May 9, 2026

Uh oh!

igerber commented May 9, 2026

Uh oh!

github-actions Bot commented May 9, 2026

Uh oh!

igerber commented May 9, 2026

Uh oh!

github-actions Bot commented May 9, 2026

Uh oh!

igerber commented May 9, 2026

Uh oh!

github-actions Bot commented May 9, 2026

Uh oh!

igerber commented May 9, 2026

Uh oh!

github-actions Bot commented May 9, 2026

Uh oh!

igerber commented May 9, 2026

Uh oh!

github-actions Bot commented May 9, 2026

Uh oh!

igerber commented May 9, 2026

Uh oh!

github-actions Bot commented May 9, 2026

Overall Assessment

Executive Summary

Methodology

Code Quality

Performance

Maintainability

Tech Debt

Security

Documentation/Tests

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant