Skip to content

dCDH by_path Wave 3 #8+#9: non-binary treatment + paths_of_interest#401

Merged
igerber merged 9 commits intomainfrom
dcdh-by-path-non-binary-poi
May 9, 2026
Merged

dCDH by_path Wave 3 #8+#9: non-binary treatment + paths_of_interest#401
igerber merged 9 commits intomainfrom
dcdh-by-path-non-binary-poi

Conversation

@igerber
Copy link
Copy Markdown
Owner

@igerber igerber commented May 9, 2026

Summary

  • ChaisemartinDHaultfoeuille.by_path now accepts non-binary integer treatment (D in Z, e.g. ordinal {0, 1, 2}). Path tuples become integer-state tuples like (0, 2, 2, 2). Continuous D (e.g. 1.5) raises ValueError at fit-time per the no-silent-failures contract.
  • New paths_of_interest kwarg for user-specified path subsets, alternative to by_path=k's top-k automatic ranking. Mutually exclusive with by_path; setting both raises ValueError. Composes with non-binary D and all downstream by_path surfaces. Python-only API (R has no list-based selection).
  • Gate at chaisemartin_dhaultfoeuille.py:1870 replaced by D-integer validation; the :1118 preconditions gate and 11 self.by_path is not None activation branches in fit() rerouted to fire under either selector. The 6 path-enumeration helper signatures gain a paths_of_interest parameter; 5 forward to _enumerate_treatment_paths.

Methodology references (required if estimator / math changes)

  • Method name(s): de Chaisemartin & D'Haultfœuille reversible-treatment event study + per-path disaggregation
  • Paper / source link(s): Same paper anchor as PR dCDH: add by_path per-path event-study disaggregation #357 (foundation); R reference DIDmultiplegtDYN 2.3.3 (per-path dispatcher in did_multiplegt_dyn, path-string construction in did_multiplegt_by_path)
  • Any intentional deviations from the source (and why):
    • Continuous-D rejection: R's path-string encoding silently produces fractional path tuples for non-integer D; we raise to enforce integer-coded discrete treatment per feedback_no_silent_failures. Documented in REGISTRY.md.
    • R substr(path, 1, 1) baseline-derivation bug for D >= 10: R's helper did_multiplegt_by_path captures only the first character of the comma-separated path string, mis-allocating R's per-path control-pool subset for multi-digit D values. Python's tuple-key matching is correct; the parity scenario stays in D in {0, 1, 2} to avoid the R bug. Documented in REGISTRY.md and CHANGELOG.md.
    • paths_of_interest is Python-only: R's by_path accepts only positive int (top-k) or -1 (all). No list-based selection in R; no R-parity test for this kwarg.

Validation

  • Tests added/updated:
    • tests/test_chaisemartin_dhaultfoeuille.py::TestByPathNonBinary (10 tests, 3 slow)
    • tests/test_chaisemartin_dhaultfoeuille.py::TestPathsOfInterest (27 tests, 7 slow)
    • tests/test_chaisemartin_dhaultfoeuille_parity.py::TestDCDHDynRParityByPathNonBinary (1 parity test)
    • Removed stale test_forbids_non_binary_treatment gate-firing test
  • Backtest / simulation / notebook evidence: R parity scenario multi_path_reversible_by_path_non_binary (Scenario 19) added to benchmarks/R/generate_dcdh_dynr_test_values.R and regenerated benchmarks/data/dcdh_dynr_golden_values.json. Per-path point estimates match R bit-exactly (rtol ~1e-9 on event horizons; rtol+atol envelope for placebo near-zero values), SE within ~5% rtol envelope. 282 dCDH non-slow tests pass; 37 new test classes (slow + non-slow) green.

Security / privacy

  • Confirm no secrets/PII in this PR: Yes

Generated with Claude Code

@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 9, 2026

Overall Assessment

⚠️ Needs changes

Executive Summary

  • Methodology review did not find an unmitigated estimator/math/variance defect. The main methodology-sensitive deviations here are documented in docs/methodology/REGISTRY.md.
  • P1: the public convenience helper chaisemartin_dhaultfoeuille(...) does not propagate the new paths_of_interest constructor kwarg, so that new API is broken on a public entry point.
  • P2: under paths_of_interest, the code writes user-selection order into frequency_rank, so returned metadata no longer means “frequency rank” even though the field name and downstream consumers still imply that.
  • P3: primary docs/source docstrings/results docs are still partly by_path-only / binary-only, and there is no convenience-wrapper regression test for paths_of_interest.
  • I did not find a security or performance regression in the diff itself.

Methodology

  • Severity P3-informational. Impact: no unmitigated methodology mismatch found. The estimator-sensitive deviations I checked are documented: continuous-D rejection for path enumeration, Python-only paths_of_interest, and the R substr() bug for D >= 10. References: docs/methodology/REGISTRY.md:L641-L643, diff_diff/chaisemartin_dhaultfoeuille.py:L2010-L2020, diff_diff/chaisemartin_dhaultfoeuille.py:L5612-L5741. Concrete fix: none.

Code Quality

  • Severity P1. Impact: paths_of_interest is broken on the exported one-shot helper because chaisemartin_dhaultfoeuille() still hard-codes the constructor kwarg allowlist without the new parameter, so paths_of_interest falls through to fit() even though fit() does not accept it. Any caller using the public helper will get a runtime error instead of the new behavior. References: diff_diff/chaisemartin_dhaultfoeuille.py:L8139-L8153, diff_diff/chaisemartin_dhaultfoeuille.py:L841-L860, diff_diff/__init__.py:L205-L208. Concrete fix: add paths_of_interest to the init-key split or, better, derive the split from inspect.signature(ChaisemartinDHaultfoeuille.__init__) so future constructor params cannot drift out of sync.

Performance

  • No findings.

Maintainability

  • Severity P2. Impact: under paths_of_interest, frequency_rank is assigned from user-specified order rather than observed-path frequency, but the field name and results schema still describe it as a frequency rank. That silently mislabels metadata for downstream consumers of results.path_effects and to_dataframe(level="by_path"). References: diff_diff/chaisemartin_dhaultfoeuille.py:L5845-L5935, diff_diff/chaisemartin_dhaultfoeuille_results.py:L394-L452, diff_diff/chaisemartin_dhaultfoeuille_results.py:L1690-L1803. Concrete fix: keep user order in a separate selection_order/display_order field and preserve true frequency rank separately, or rename/document the field consistently everywhere it is surfaced.

Tech Debt

  • No findings. TODO.md does not already track the P1 wrapper regression, so it is not mitigated as deferred work.

Security

  • No findings.

Documentation/Tests

  • Severity P3. Impact: the new public behavior is only partially documented. The class summary still says per-path support requires binary treatment, the main API docs still describe only by_path=k, the results docs still say path surfaces populate only when by_path is set, and the REGISTRY checklist still says by_path is binary-only. There is also no convenience-wrapper regression covering paths_of_interest, which is why the P1 slipped through despite existing helper tests. References: diff_diff/chaisemartin_dhaultfoeuille.py:L367-L370, docs/api/chaisemartin_dhaultfoeuille.rst:L16-L22, diff_diff/chaisemartin_dhaultfoeuille_results.py:L394-L452, docs/methodology/REGISTRY.md:L671-L671, tests/test_chaisemartin_dhaultfoeuille.py:L204-L223. Concrete fix: update those docs to mention non-binary by_path and paths_of_interest, and add a helper-level regression that exercises chaisemartin_dhaultfoeuille(..., paths_of_interest=[...]).

Path to Approval

  1. Fix paths_of_interest propagation in chaisemartin_dhaultfoeuille() at diff_diff/chaisemartin_dhaultfoeuille.py:L8139-L8153.
  2. Add a regression next to tests/test_chaisemartin_dhaultfoeuille.py:L204-L223 that calls the convenience helper with paths_of_interest and checks it matches direct-class behavior.

igerber added a commit that referenced this pull request May 9, 2026
P1 (blocker): the convenience helper `chaisemartin_dhaultfoeuille()`
hard-coded the constructor kwarg allowlist without `paths_of_interest`,
so the new kwarg fell through to `fit()` and raised TypeError. Replace
the static set with a signature-derived split via
`inspect.signature(ChaisemartinDHaultfoeuille.__init__)` so future
constructor params cannot drift out of sync. Add helper-level
regression `test_convenience_function_routes_paths_of_interest_to_init`
parallel to the existing `test_convenience_function_matches_class`.

P2: under `paths_of_interest`, `frequency_rank` was assigned from
`enumerate(selected_paths)`, which produces user-selection order
rather than true frequency rank. Decouple the iteration order from
the rank field: keep `selected_paths` iteration in user order
(insertion order preserved on `path_effects.keys()`), but compute
`frequency_rank` as the within-selected-paths rank by descending
group count (lex tiebreak on the path tuple). Under `by_path=k`,
`selected_paths` is already sorted by descending frequency so the
two coincide; under `paths_of_interest`, frequency_rank now reflects
true observed-count rank regardless of user order. Add regression
`test_paths_of_interest_frequency_rank_is_true_frequency`.

P3: docs/source-docstring binary-only language on per-path
disaggregation cleaned up:
- class summary at chaisemartin_dhaultfoeuille.py:367-370 (drop
  "binary treatment", add `paths_of_interest` mention)
- docs/api/chaisemartin_dhaultfoeuille.rst:14-22 (add
  `paths_of_interest` and explicit "binary or integer-coded discrete"
  language)
- diff_diff/chaisemartin_dhaultfoeuille_results.py:394-452
  (`path_effects`, `path_placebo_event_study`,
  `path_cumulated_event_study`, `path_sup_t_bands` activator clauses
  now read "by_path is a positive int OR paths_of_interest is set")
- docs/methodology/REGISTRY.md:671 checklist line (add
  "or integer-coded discrete" + `paths_of_interest`)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber
Copy link
Copy Markdown
Owner Author

igerber commented May 9, 2026

/ai-review

@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 9, 2026

🔁 AI review rerun (requested by @igerber)

Head SHA: 7df7dc258b4d8d587997cb89f99875bbf9344f28


Overall Assessment

✅ Looks good

Executive Summary

  • Previous P1 resolved: the public helper now routes paths_of_interest through __init__, with regression coverage in tests/test_chaisemartin_dhaultfoeuille.py:L225-L260.
  • Previous P2 resolved: frequency_rank is now computed from observed path frequency rather than user-list order under paths_of_interest, with regression coverage in tests/test_chaisemartin_dhaultfoeuille.py:L8651-L8675.
  • Methodology review did not find an unmitigated estimator/math/variance defect. The key deviations from R are documented in docs/methodology/REGISTRY.md:L641-L643 and surfaced in code at diff_diff/chaisemartin_dhaultfoeuille.py:L2012-L2023.
  • Edge-case handling on the new paths looks adequate: empty selected-path sets return {} with warning, continuous non-integer D is rejected at fit time, and per-path inference remains safe_inference-based and NaN-consistent (diff_diff/chaisemartin_dhaultfoeuille.py:L5614-L6256, diff_diff/chaisemartin_dhaultfoeuille.py:L3248-L3431).
  • Remaining issues are P3 only: a small amount of reporting/docs text still behaves as if by_path were the only selector, and reporting surfaces re-sort paths_of_interest outputs by frequency_rank.

Methodology

  • Severity P3-informational. Impact: no unmitigated methodology mismatch found. The affected method is the dCDH reversible-treatment event study with per-path disaggregation; the PR keeps the existing per-path IF/control-pool/variance construction and documents the intentional deviations (continuous-D rejection, Python-only paths_of_interest, and the R substr() bug for D >= 10). References: docs/methodology/REGISTRY.md:L641-L643, diff_diff/chaisemartin_dhaultfoeuille.py:L2012-L2023, diff_diff/chaisemartin_dhaultfoeuille.py:L5614-L6256. Concrete fix: none.

Code Quality

  • Severity P3-informational. Impact: the prior convenience-wrapper regression is fixed. chaisemartin_dhaultfoeuille() now derives init kwargs from the constructor signature, so paths_of_interest no longer falls through to fit() and error. References: diff_diff/chaisemartin_dhaultfoeuille.py:L8120-L8171, tests/test_chaisemartin_dhaultfoeuille.py:L225-L260. Concrete fix: none.

Performance

  • No findings.

Maintainability

  • Severity P3. Impact: results.path_effects preserves user-specified order, but summary() and to_dataframe(level="by_path") immediately re-sort by frequency_rank, so the new selector’s list order is not carried through all reporting surfaces. That does not change estimates, but it leaves the display contract ambiguous. References: diff_diff/chaisemartin_dhaultfoeuille.py:L628-L630, diff_diff/chaisemartin_dhaultfoeuille_results.py:L1288-L1291, diff_diff/chaisemartin_dhaultfoeuille_results.py:L1726-L1728. Concrete fix: either preserve paths_of_interest order across reporting surfaces or document explicitly that only results.path_effects preserves user order.

Tech Debt

  • No findings. TODO.md does not contain a pre-existing entry that changes the assessment for this diff.

Security

  • No findings.

Documentation/Tests

  • Severity P3. Impact: most of the prior docs gaps are resolved, but some remaining user-facing text still describes per-path outputs as by_path-only, which is now inaccurate under paths_of_interest. Examples include the result docs’ empty-state wording and the to_dataframe(level="by_path") guidance. References: diff_diff/chaisemartin_dhaultfoeuille_results.py:L447-L459, diff_diff/chaisemartin_dhaultfoeuille_results.py:L1699-L1704, docs/methodology/REGISTRY.md:L643-L643. Concrete fix: update the remaining text to say “by_path or paths_of_interest”, and add a small regression around summary/to_dataframe(level="by_path") messaging for the new selector.
  • Note: this was a static review. I did not execute the added tests locally because the review environment is missing numpy.

igerber added a commit that referenced this pull request May 9, 2026
R1 verdict was "Looks good". Two P3 housekeeping items addressed:

P3 (maintainability): `summary()` and `to_dataframe(level="by_path")`
re-sorted output by `frequency_rank`, dropping the user-specified
order under `paths_of_interest`. Both reporting surfaces now iterate
in `path_effects` insertion order so user-specified order is
preserved end-to-end. Under `by_path=k`, insertion order matches
descending frequency_rank so output is identical to the prior
release. Adds regressions
`test_paths_of_interest_order_preserved_in_to_dataframe` and
`test_paths_of_interest_order_preserved_in_summary`.

P3 (docs): remaining `by_path`-only language in result docs and
REGISTRY:
- `chaisemartin_dhaultfoeuille_results.py` `path_sup_t_bands` empty-
  state contract: "no bootstrap or `by_path is None`" -> "no
  bootstrap, or both `by_path` and `paths_of_interest` are `None`"
- `chaisemartin_dhaultfoeuille_results.py` `to_dataframe(level=
  "by_path")` ValueError text: pass `paths_of_interest` as an
  alternative to `by_path=k`
- `docs/methodology/REGISTRY.md:643` joint sup-t bands Note:
  empty-state contract updated for both selectors

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber
Copy link
Copy Markdown
Owner Author

igerber commented May 9, 2026

/ai-review

@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 9, 2026

🔁 AI review rerun (requested by @igerber)

Head SHA: 1d8e16bef9da0bd8183f809e0cde418d7f212822


Overall Assessment

✅ Looks good

Executive Summary

  • Re-review status: the prior behavior regressions are addressed. The convenience helper now routes paths_of_interest through __init__ (diff_diff/chaisemartin_dhaultfoeuille.py:L8154-L8163), frequency_rank is computed from observed path counts rather than user-list order (diff_diff/chaisemartin_dhaultfoeuille.py:L5847-L5858), and summary() / to_dataframe(level="by_path") now preserve user-specified path order (diff_diff/chaisemartin_dhaultfoeuille_results.py:L1289-L1293, L1730-L1734).
  • The affected methodology is the dCDH reversible-treatment event study’s per-path disaggregation, plus inherited per-path placebo, bootstrap, sup-t, and trends paths. I did not find an unmitigated estimator, identification, or variance/SE defect.
  • The PR’s substantive deviations from R are documented in the Methodology Registry, including continuous-D rejection, the R substr() bug for D >= 10, and the Python-only paths_of_interest API, so they are informational rather than blockers.
  • Remaining issue is P3 only: some reporting text still hard-codes by_path, so empty-state messaging can be inaccurate when the user actually requested paths_of_interest.
  • This was a static review. I could not run the targeted tests because pytest is not installed in the review environment.

Methodology

  • Severity P3-informational. Impact: the changed method is the dCDH reversible-treatment per-path event-study surface (including inherited per-path placebo/bootstrap/sup-t/trends behavior). I did not find an undocumented mismatch to the registry or an incorrect SE/inference change. The PR’s methodology deviations from R are explicitly documented: continuous non-integer D is rejected at fit time, the R baseline-derivation bug for D >= 10 is called out, and paths_of_interest is documented as a Python-only extension. References: docs/methodology/REGISTRY.md:L641-L643, diff_diff/chaisemartin_dhaultfoeuille.py:L1170-L1224, L2012-L2023, L5614-L5754. Concrete fix: none.

Code Quality

  • No findings.

Performance

  • No findings.

Maintainability

  • Severity P3. Impact: the remaining reporting text still labels the surface as by_path even when results were requested via paths_of_interest, and the generic empty-state warning can misdescribe “none of the requested paths were retained” as if the panel had no observed complete-window paths at all. This does not change estimates, but it leaves the public selector contract slightly inconsistent. References: diff_diff/chaisemartin_dhaultfoeuille.py:L5809-L5819, diff_diff/chaisemartin_dhaultfoeuille_results.py:L1273-L1285. Concrete fix: make the header/warnings selector-aware, e.g. “Treatment-Path Disaggregation” / “by_path or paths_of_interest”, and use a distinct empty-state message when paths_of_interest yields zero retained paths.

Tech Debt

  • No findings.

Security

  • No findings.

Documentation/Tests

  • No blocker-level findings. The added coverage spans wrapper routing, parameter validation, ordering semantics, non-binary paths, parity, and cross-feature interactions. Static review only; tests were not executed here because pytest is unavailable.

igerber added a commit that referenced this pull request May 9, 2026
R2 verdict was "Looks good". One P3 housekeeping item addressed:

P3 (maintainability): selector-aware messaging on the empty-state
path:

1. `_compute_path_effects` empty-selection warning now branches on
   `paths_of_interest is not None`. Under `paths_of_interest`, the
   message reads "every user-specified path was either unobserved
   ... per-path 'zero observed groups' UserWarnings already issued"
   so the user gets a clear pointer to the per-path warnings rather
   than a generic "no observed treatment path" message that
   misdescribes the cause. Under `by_path=k`, the original message
   is preserved.

2. `summary()` empty-state header changed from
   "Treatment-Path Disaggregation (by_path)" to
   "Treatment-Path Disaggregation"; the trailing parenthetical now
   says "by_path / paths_of_interest was a no-op on this panel".

Adds regression `test_paths_of_interest_all_unobserved_emits_distinct_warning`
asserting the paths_of_interest-specific warning text fires when
every requested path is unobserved.

Also fixes 7 pre-existing F541 f-string-without-placeholder warnings
in the warning blocks via `ruff --fix`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber
Copy link
Copy Markdown
Owner Author

igerber commented May 9, 2026

/ai-review

@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 9, 2026

🔁 AI review rerun (requested by @igerber)

Head SHA: 245f19498c5082b160ba070023bb420c0488ebeb


Overall Assessment

✅ Looks good

Executive Summary

  • Re-review status: the prior issues called out in the last AI pass are addressed. The convenience wrapper now routes paths_of_interest through __init__ (diff_diff/chaisemartin_dhaultfoeuille.py:L8174-L8181), frequency_rank is no longer tied to user-list order (diff_diff/chaisemartin_dhaultfoeuille.py:L5865-L5876), and summary() / to_dataframe(level="by_path") now preserve user-specified path order (diff_diff/chaisemartin_dhaultfoeuille_results.py:L1290-L1294, L1731-L1735).
  • The affected methodology is the dCDH reversible-treatment per-path event-study surface, including per-path placebos, bootstrap/sup-t inheritance, and per-path linear-trend cumulation.
  • I did not find an unmitigated estimand, control-pool, or variance/SE defect in the new non-binary integer-D or paths_of_interest paths.
  • The substantive deviations from R are documented in the Methodology Registry, so they are informational rather than defects: continuous-D rejection, the R substr() baseline bug for D >= 10, and the Python-only paths_of_interest API (docs/methodology/REGISTRY.md:L641-L643).
  • Residual issue is P3 only: some user-facing warnings/docs still hard-code by_path or generic “no observed paths” wording even when the active selector is paths_of_interest.
  • Static review only; I could not run the added tests because pytest is not installed in this environment.

Methodology

  • Severity P3-informational. Impact: the changed method is the dCDH per-path reversible-treatment event-study stack. The selector extension leaves the IF/control-pool methodology intact by zeroing only switcher-side contributions while preserving the full control pool and original cohort structure (diff_diff/chaisemartin_dhaultfoeuille.py:L5477-L5609, L6537-L6664); non-binary support is guarded to integer-coded discrete treatment at fit time (diff_diff/chaisemartin_dhaultfoeuille.py:L2012-L2023) and path enumeration canonicalizes integer states only (diff_diff/chaisemartin_dhaultfoeuille.py:L5614-L5754). The R deviations are documented in docs/methodology/REGISTRY.md:L641-L643. Concrete fix: none.

Code Quality

  • No findings.

Performance

  • No findings.

Maintainability

  • Severity P3. Impact: selector-aware behavior is still described in a few places as by_path only. The multi-baseline warnings for controls and trends_linear still say by_path + ... even when triggered by paths_of_interest (diff_diff/chaisemartin_dhaultfoeuille.py:L1725-L1735, L1829-L1879), the non-empty summary header still renders Treatment-Path Disaggregation (by_path), and the empty summary text can still misdescribe “all requested paths were unobserved” as “no observed paths have a complete window” (diff_diff/chaisemartin_dhaultfoeuille_results.py:L1269-L1287). to_dataframe()’s level docs also still describe the surface as by_path=k only (diff_diff/chaisemartin_dhaultfoeuille_results.py:L1445-L1452). Concrete fix: make warnings, summary headers, and to_dataframe(level="by_path") docs selector-aware, and distinguish “requested paths unobserved” from “panel has no complete-window paths.”

Tech Debt

  • No findings. I did not see a new correctness limitation that needs TODO.md tracking.

Security

  • No findings.

Documentation/Tests

  • No blocker-level findings. The added coverage materially improves validation, ordering, non-binary parity, and cross-feature composition checks (tests/test_chaisemartin_dhaultfoeuille.py:L8184-L8938, tests/test_chaisemartin_dhaultfoeuille_parity.py:L1191-L1321). Static review only; I could not execute them because pytest is unavailable.

igerber added a commit that referenced this pull request May 9, 2026
R2 verdict was "Looks good". One residual P3 housekeeping class
addressed (selector-aware messaging in additional locations):

1. Multi-baseline `UserWarning`s under `controls` and `trends_linear`
   now read "by_path / paths_of_interest + ..." instead of "by_path
   + ..." since both fire under either selector.

2. F_g=3 boundary `UserWarning` under `trends_linear` now reads
   "by_path / paths_of_interest + trends_linear: ..." for the same
   reason.

3. `summary()` non-empty header dropped the "(by_path)" suffix —
   matches the empty-state branch fixed in the prior round.

4. `to_dataframe(level="by_path")` docstring now documents that the
   surface is populated under either `by_path=k` or
   `paths_of_interest=[(...), ...]`.

Updates 5 test assertions that matched the old "by_path + ..."
warning substring; the new "+ trends_linear" / "+ controls"
substring matches both selector forms.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber
Copy link
Copy Markdown
Owner Author

igerber commented May 9, 2026

/ai-review

@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 9, 2026

🔁 AI review rerun (requested by @igerber)

Head SHA: 5fc825df830053c1d2eaa444684c347529576682


Overall Assessment

✅ Looks good

Executive Summary

  • Re-review status: the prior substantive regressions appear fixed. The convenience wrapper now routes paths_of_interest through __init__, frequency_rank is no longer tied to user-list order, and summary() / to_dataframe(level="by_path") preserve user-selected path order: diff_diff/chaisemartin_dhaultfoeuille.py:L8174-L8190, diff_diff/chaisemartin_dhaultfoeuille.py:L5867-L5878, diff_diff/chaisemartin_dhaultfoeuille_results.py:L1290-L1294, diff_diff/chaisemartin_dhaultfoeuille_results.py:L1732-L1736.
  • Affected methodology: the dCDH reversible-treatment multi-horizon event study’s per-path disaggregation surface.
  • I did not find an unmitigated estimand, control-pool, or variance/SE defect in the new non-binary integer-D or paths_of_interest paths.
  • The substantive deviations from R are documented in the Methodology Registry, so they are informational rather than defects: continuous-D rejection, the R substr() baseline bug for D >= 10, and the Python-only paths_of_interest API: docs/methodology/REGISTRY.md:L641-L643.
  • Residual issue is only P3: the empty summary() renderer still uses generic “no observed paths” wording even when paths_of_interest merely selected unobserved paths: diff_diff/chaisemartin_dhaultfoeuille_results.py:L1269-L1278.
  • Static review only; pytest is not installed in this environment.

Methodology

  • Severity: P3-informational. Impact: the selector change remains a mechanical filter on the existing per-path IF construction. switcher_subset_mask zeroes only non-selected switcher contributions while leaving control-pool construction and full-cohort recentering unchanged, which matches the registry’s documented per-path estimand/SE contract: diff_diff/chaisemartin_dhaultfoeuille.py:L5479-L5608, diff_diff/chaisemartin_dhaultfoeuille.py:L5802-L5929, docs/methodology/REGISTRY.md:L641-L643. Non-binary support is also correctly gated to integer-coded D at fit time: diff_diff/chaisemartin_dhaultfoeuille.py:L2014-L2024. Concrete fix: none.

Code Quality

  • No findings.

Performance

  • No findings.

Maintainability

  • Severity: P3. Impact: fit() now distinguishes the paths_of_interest empty case in its warning text (diff_diff/chaisemartin_dhaultfoeuille.py:L5811-L5827), but summary() still renders every empty path_effects state as “No observed paths have a complete [F_g-1, F_g-1+L_max] window.” (diff_diff/chaisemartin_dhaultfoeuille_results.py:L1269-L1278). That is inaccurate when the panel has observed paths but none match the requested list, so the earlier wording issue is only partially resolved. Concrete fix: persist an empty-state reason or selector type on the results object and render selector-specific empty text in summary(); add a summary-level regression next to the existing warning test at tests/test_chaisemartin_dhaultfoeuille.py:L8735-L8764.

Tech Debt

  • No new findings. I did not identify a new deferred correctness limitation that needs TODO.md tracking.

Security

  • No findings.

Documentation/Tests

  • Severity: P3-informational. Impact: coverage is materially stronger for the changed surface, including wrapper routing, ordering, rank semantics, selector-specific warnings, and non-binary R parity: tests/test_chaisemartin_dhaultfoeuille.py:L8651-L8764, tests/test_chaisemartin_dhaultfoeuille_parity.py:L1191-L1321. Concrete fix: none from static review; execute the targeted new dCDH tests in CI/local validation.

igerber added a commit that referenced this pull request May 9, 2026
R4 verdict was "Looks good". One P3 housekeeping item addressed:

P3 (maintainability): the `summary()` empty-state renderer for
`path_effects == {}` rendered the same generic "No observed paths
have a complete [F_g-1, F_g-1+L_max] window" text under both
selectors. Under `paths_of_interest`, that text is misleading
because the panel may have observed paths but none match the user's
list. Branch on `_estimator_ref.paths_of_interest`: when set, the
empty-state block now reads "Every path in paths_of_interest was
unobserved or had a window outside L_max+1" with a pointer to the
per-path 'zero observed groups' UserWarnings; under `by_path=k`,
the original wording is preserved.

Adds regression
`test_paths_of_interest_all_unobserved_summary_distinct_text` (next
to the existing `_emits_distinct_warning` test) asserting the
selector-specific summary text fires and the by_path-only wording
does not.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber
Copy link
Copy Markdown
Owner Author

igerber commented May 9, 2026

/ai-review

@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 9, 2026

🔁 AI review rerun (requested by @igerber)

Head SHA: 552b66ff20e5e13012553229b5e828c328a63e55


Overall Assessment

✅ Looks good

Executive Summary

Methodology

  • No findings.

Code Quality

Performance

  • No findings.

Maintainability

  • No findings.

Tech Debt

  • No new findings.

Security

  • No findings.

Documentation/Tests

  • No findings. The docs/tests added here are aligned with the implementation; I could not execute them locally because pytest is unavailable in this environment.

igerber added a commit that referenced this pull request May 9, 2026
R5 verdict was "Looks good" with one P2 code-quality regression
flagged. The previous round's `test_set_params_partial_mutation_recoverable`
documented inconsistent intermediate state as expected behavior:
on a mutex violation, `set_params()` mutated `self` before
validation, so a failed call left both `by_path` and
`paths_of_interest` populated. `fit()` did not re-check the mutex
and `_enumerate_treatment_paths()` silently preferred
`paths_of_interest` when both were set, so subsequent fits ran
with stale selector state unless the caller manually repaired the
object.

Fix: make `set_params()` transactional. Snapshot the pre-call
values for the keys being set, attempt mutation + validation, and
roll back to the snapshot on any exception before re-raising. This
mirrors sklearn-style atomic semantics. The validation block was
extracted into a private `_validate_invariants()` helper so
__init__ and set_params can share the rules without code
duplication risking drift.

Replaces `test_set_params_partial_mutation_recoverable` (which
asserted the buggy partial-mutation behavior) with
`test_set_params_failed_validation_is_transactional`, asserting
that:
1. After a failed `set_params()`, `get_params()` returns
   bit-identical to the pre-call snapshot.
2. Subsequent valid `set_params()` succeeds against the rolled-
   back state.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber
Copy link
Copy Markdown
Owner Author

igerber commented May 9, 2026

/ai-review

@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 9, 2026

🔁 AI review rerun (requested by @igerber)

Head SHA: 9d3fff090375b3f3ebc0b1c03c396846e52fdb6d


Overall Assessment

⚠️ Needs changes

Executive Summary

Methodology

No findings.

Code Quality

No findings. The prior mixed-state set_params() problem appears resolved. diff_diff/chaisemartin_dhaultfoeuille.py:L778-L844

Performance

No findings.

Maintainability

No findings.

Tech Debt

No findings. TODO.md does not currently track the P1 interaction-coverage gap above, so it remains unmitigated.

Security

No findings.

Documentation/Tests

Path to Approval

  1. Add a paths_of_interest + trends_linear=True + n_bootstrap>0 regression, ideally with placebo=True, that asserts selected-path filtering, finite per-path bootstrap SE propagation, and populated post-bootstrap path_cumulated_event_study.
  2. Add a paths_of_interest + trends_nonparam="state" + placebo=True + n_bootstrap>0 regression that asserts the selector and set_ids flow through _compute_path_placebos, _collect_path_bootstrap_inputs, _collect_path_placebo_bootstrap_inputs, and the resulting public surfaces.
  3. Replace the current selector-specific sup-t smoke check with one that requires at least one selected path to have a finite crit_value and non-NaN cband_conf_int entries on results.path_effects.

igerber added a commit that referenced this pull request May 9, 2026
P1 (Documentation/Tests): the docstring claimed paths_of_interest
composed with bootstrap, per-path placebos, joint sup-t bands,
trends_linear, and trends_nonparam, but the new tests covered only
the analytical trends_* paths and a triple-combo (paths_of_interest
+ non-binary D + bootstrap + placebo). The R6 reviewer flagged two
specific gaps:

1. paths_of_interest + trends_linear=True + n_bootstrap > 0 +
   placebo=True: post-bootstrap path_cumulated_event_study path
   was untested.
2. paths_of_interest + trends_nonparam="state" + placebo=True +
   n_bootstrap > 0: the set_ids flow through all four per-path
   collectors (analytical, placebo, bootstrap, placebo-bootstrap)
   was untested.

Plus the existing non-binary triple-combo's sup-t check was
vacuous (`assert res.path_sup_t_bands is not None` is satisfied by
`{}` when the strict-majority gate excludes every path).

Fixes:
- Add test_paths_of_interest_trends_linear_bootstrap_placebo
  asserting path-set restriction, finite per-path bootstrap SE on
  path_effects, populated post-bootstrap path_cumulated_event_study
  with finite (effect, SE) per (path, horizon), and populated
  per-path placebo for the same paths.
- Add test_paths_of_interest_trends_nonparam_bootstrap_placebo
  asserting selector + set_ids flow through all four collectors:
  finite analytical/bootstrap (effect, SE) on path_effects, finite
  per-path placebo (effect, SE).
- Tighten test_paths_of_interest_non_binary_bootstrap_placebo:
  bump to 2 paths and n_bootstrap=400 so the sup-t strict-majority
  gate fires, and assert >=1 finite crit_value AND non-NaN
  cband_conf_int per (path, horizon) on path_effects (replacing
  the vacuous "is not None" check).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber
Copy link
Copy Markdown
Owner Author

igerber commented May 9, 2026

/ai-review

@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 9, 2026

🔁 AI review rerun (requested by @igerber)

Head SHA: 25b1f62c7f4821e8267db7ac1fa4c976cedefc70


Overall Assessment

✅ Looks good

Executive Summary

  • The previous re-review blocker appears addressed: the new tests now cover paths_of_interest + trends_linear + n_bootstrap + placebo, paths_of_interest + trends_nonparam + n_bootstrap + placebo, and a non-vacuous per-path sup-t/cband assertion. tests/test_chaisemartin_dhaultfoeuille.py:L8909-L9065
  • I did not find an unmitigated methodology mismatch in the affected method, the dCDH per-path event-study disaggregation. The new selector and non-binary lift reuse the existing per-path IF/SE/bootstrap machinery rather than changing the estimand. diff_diff/chaisemartin_dhaultfoeuille.py:L2029-L2040 diff_diff/chaisemartin_dhaultfoeuille.py:L2331-L2354 diff_diff/chaisemartin_dhaultfoeuille.py:L5631-L5771
  • P3 informational: the continuous-D rejection for per-path selection, the D>=10 R path-string bug, and the Python-only paths_of_interest extension are explicitly documented in the Methodology Registry, so they are not defects. docs/methodology/REGISTRY.md:L641-L643
  • The prior mixed-selector set_params() problem is fixed with transactional rollback and regression coverage. diff_diff/chaisemartin_dhaultfoeuille.py:L778-L852 tests/test_chaisemartin_dhaultfoeuille.py:L8511-L8543
  • Static review only; I could not run the suite locally because numpy and pytest are not installed in this environment.

Methodology

  • Severity: P3 informational. Impact: by_path/paths_of_interest now reject continuous treatment at fit-time instead of silently rounding, and the known R substr(path, 1, 1) bug for D>=10 is documented as an intentional Python/R divergence. I did not find an undocumented change to the dCDH estimand, weighting, or variance formula. Concrete fix: None. diff_diff/chaisemartin_dhaultfoeuille.py:L2029-L2040 docs/methodology/REGISTRY.md:L641-L643
  • Severity: P3 informational. Impact: paths_of_interest is a Python-only selector, documented as a mechanical filter over observed paths rather than a new methodology branch; the implementation threads it through the same event-study, placebo, bootstrap, sup-t, and cumulation surfaces. Concrete fix: None. diff_diff/chaisemartin_dhaultfoeuille.py:L2331-L2354 diff_diff/chaisemartin_dhaultfoeuille.py:L2499-L2520 diff_diff/chaisemartin_dhaultfoeuille.py:L3043-L3372 docs/methodology/REGISTRY.md:L641-L643

Code Quality

  • No findings. The transactional rollback in set_params() resolves the prior mixed-state bug and is regression-tested. diff_diff/chaisemartin_dhaultfoeuille.py:L778-L852 tests/test_chaisemartin_dhaultfoeuille.py:L8522-L8543

Performance

  • No findings.

Maintainability

  • No findings.

Tech Debt

  • No findings. I did not see newly introduced deferred work that should have been tracked in TODO.md.

Security

  • No findings.

Documentation/Tests

  • No findings. The prior P1 interaction-coverage gap appears closed by the new selector-specific regressions, and the sup-t check now requires finite crit_value and finite cband_conf_int endpoints rather than is not None. tests/test_chaisemartin_dhaultfoeuille.py:L8909-L9065
  • R-parity coverage was extended for the non-binary by_path lift. tests/test_chaisemartin_dhaultfoeuille_parity.py:L1191-L1320
  • Static review only; local execution was not possible because numpy and pytest are unavailable here.

@igerber igerber added the ready-for-ci Triggers CI test workflows label May 9, 2026
@igerber
Copy link
Copy Markdown
Owner Author

igerber commented May 9, 2026

/ai-review

@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 9, 2026

🔁 AI review rerun (requested by @igerber)

Head SHA: 6c467744c91782316c8736e25b125ba87a2af040


Overall Assessment

✅ Looks good. Highest unmitigated finding is P2.

Executive Summary

  • Affected method: ChaisemartinDHaultfoeuille dCDH reversible-treatment event-study per-path disaggregation.
  • No P0/P1 methodology, variance/SE, inference, control-pool, or selector-propagation issues found.
  • paths_of_interest is threaded through event-study, placebo, bootstrap, cumulated trends-linear, sup-t, summary, and dataframe surfaces.
  • Prior re-review concerns appear addressed: reciprocal set_params() mutex tests, transactional rollback, and bootstrap/placebo/trends interaction tests are present.
  • Static review only: I could not run tests because numpy is not installed in this environment.

Methodology

  • Severity: P2
    Impact: The registry documents the R substr(path, 1, 1) baseline bug for D >= 10, but the new tested support includes negative integer treatment paths. The same R single-character baseline parsing issue would also affect negative path strings such as "-1,...", while Python tuple matching handles them correctly. This is not a Python correctness issue, but it leaves an R-parity caveat incomplete for a behavior now explicitly tested.
    Concrete fix: Amend the REGISTRY.md non-binary by_path note to say the R parser caveat applies to multi-character baseline states, including D >= 10 and negative integer states, or narrow the public contract to nonnegative integer-coded states.
    Locations: diff_diff/chaisemartin_dhaultfoeuille.py:L457-L460, diff_diff/chaisemartin_dhaultfoeuille.py:L5708-L5711, tests/test_chaisemartin_dhaultfoeuille.py:L8230-L8270, docs/methodology/REGISTRY.md:L641

  • Severity: P3 informational
    Impact: Continuous-D rejection under per-path selection, the documented R D >= 10 parser divergence, and Python-only paths_of_interest are explicitly documented deviations/extensions, so they are not defects.
    Concrete fix: None.
    Locations: diff_diff/chaisemartin_dhaultfoeuille.py:L2029-L2040, docs/methodology/REGISTRY.md:L641

Code Quality

No findings. paths_of_interest validation, get_params(), reciprocal set_params() mutex validation, and transactional rollback are present. Locations: diff_diff/chaisemartin_dhaultfoeuille.py:L285-L328, diff_diff/chaisemartin_dhaultfoeuille.py:L678-L852, tests/test_chaisemartin_dhaultfoeuille.py:L8510-L8548

Performance

No findings. The selector reuses existing per-path helper paths and bootstrap input collectors; I did not see a new avoidable asymptotic regression in the changed code.

Maintainability

No findings. Sibling-surface mirror audit covered event-study, placebo, bootstrap collectors, trends-linear cumulation, sup-t bands, summary(), and to_dataframe(level="by_path"). Locations: diff_diff/chaisemartin_dhaultfoeuille.py:L2330-L2354, diff_diff/chaisemartin_dhaultfoeuille.py:L2498-L2520, diff_diff/chaisemartin_dhaultfoeuille.py:L3042-L3097, diff_diff/chaisemartin_dhaultfoeuille.py:L3351-L3373, diff_diff/chaisemartin_dhaultfoeuille_results.py:L1261-L1385, diff_diff/chaisemartin_dhaultfoeuille_results.py:L1711-L1827

Tech Debt

No findings. I did not find a new deferred correctness issue relying on TODO.md mitigation.

Security

No findings.

Documentation/Tests

No P1 test gap found. The new tests cover non-binary by_path, continuous-D rejection under per-path selection, empty paths_of_interest, reciprocal selector mutexes, dataframe/summary ordering, controls, trends_linear, trends_nonparam, bootstrap, placebo, and sup-t bands. Pattern-wide grep found no new inline t_stat = effect / se anti-pattern in the affected dCDH files; inference routes through safe_inference(). Locations: tests/test_chaisemartin_dhaultfoeuille.py:L8184-L8448, tests/test_chaisemartin_dhaultfoeuille.py:L8456-L9073, tests/test_chaisemartin_dhaultfoeuille_parity.py:L1191-L1321

igerber and others added 5 commits May 9, 2026 14:23
Bundles two adjacent extensions to ChaisemartinDHaultfoeuille.by_path:

1. Non-binary integer treatment (Wave 3 #8): replace the
   `NotImplementedError` gate at chaisemartin_dhaultfoeuille.py:1870
   with a `ValueError` for continuous D (e.g. 1.5) per the no-silent-
   failures contract; integer-coded D in Z (e.g. ordinal {0, 1, 2})
   is now supported and produces integer-state path tuples
   (e.g. (0, 2, 2, 2)). Validated against R `did_multiplegt_dyn(...,
   by_path)` for D in {0, 1, 2} via the new
   `multi_path_reversible_by_path_non_binary` golden-value scenario:
   per-path point estimates match R bit-exactly (rtol ~1e-9 events,
   rtol+atol envelope for placebo near-zero values), per-path SE
   inherits the documented cross-path cohort-sharing deviation
   (~5% rtol; SE_RTOL=0.15 envelope).

2. paths_of_interest kwarg (Wave 3 #9): new estimator parameter
   accepting a list of int tuples for explicit user-specified path
   selection. Mutually exclusive with by_path=k (raises ValueError
   at __init__ and set_params time). Each tuple length validated at
   __init__ for uniformity, vs L_max+1 at fit-time. Bool / np.bool_
   rejected; np.integer accepted and canonicalized to Python int.
   Duplicates emit a UserWarning and dedupe; unobserved paths emit
   a UserWarning and are omitted from path_effects. Composes with
   non-binary D and all downstream by_path surfaces (bootstrap,
   placebos, sup-t bands, controls, trends_linear, trends_nonparam).
   Python-only API (R has no list-based selection in by_path).

Threading: paths_of_interest added to 6 path-enumeration helper
signatures (`_enumerate_treatment_paths`, `_compute_path_effects`,
`_compute_path_cumulated_event_study`, `_collect_path_bootstrap_inputs`,
`_compute_path_placebos`, `_collect_path_placebo_bootstrap_inputs`).
The `:1118` preconditions gate (drop_larger_lower / L_max / mutex
with heterogeneity / design2 / honest_did / survey_design) and the
11 `self.by_path is not None` activation branches in fit() rerouted
to fire under either selector.

For D >= 10, R's `did_multiplegt_by_path` derives the per-path
baseline via `substr(path_index$path, 1, 1)` which captures only
the first character of the comma-separated path string (e.g.
captures "1" instead of "12" for path = "12,12,..."), mis-allocating
R's per-path control-pool subset. Python's tuple-key matching is
correct; the parity scenario stays in D in {0, 1, 2} to avoid the R
bug.

Adds:
- `_validate_paths_of_interest` module-level helper for canonicalization
- `TestByPathNonBinary` (10 tests, 3 slow)
- `TestPathsOfInterest` (27 tests, 7 slow)
- `TestDCDHDynRParityByPathNonBinary` (1 parity test)
- R script Scenario 19 + regenerated dcdh_dynr_golden_values.json
- REGISTRY.md sub-paragraphs for non-binary + paths_of_interest
- CHANGELOG.md Unreleased > Added entry
- llms-full.txt kwarg listing for paths_of_interest

Removes: the now-stale `test_forbids_non_binary_treatment` gate-
firing test from TestByPathCoverageFitGates.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
P1 (blocker): the convenience helper `chaisemartin_dhaultfoeuille()`
hard-coded the constructor kwarg allowlist without `paths_of_interest`,
so the new kwarg fell through to `fit()` and raised TypeError. Replace
the static set with a signature-derived split via
`inspect.signature(ChaisemartinDHaultfoeuille.__init__)` so future
constructor params cannot drift out of sync. Add helper-level
regression `test_convenience_function_routes_paths_of_interest_to_init`
parallel to the existing `test_convenience_function_matches_class`.

P2: under `paths_of_interest`, `frequency_rank` was assigned from
`enumerate(selected_paths)`, which produces user-selection order
rather than true frequency rank. Decouple the iteration order from
the rank field: keep `selected_paths` iteration in user order
(insertion order preserved on `path_effects.keys()`), but compute
`frequency_rank` as the within-selected-paths rank by descending
group count (lex tiebreak on the path tuple). Under `by_path=k`,
`selected_paths` is already sorted by descending frequency so the
two coincide; under `paths_of_interest`, frequency_rank now reflects
true observed-count rank regardless of user order. Add regression
`test_paths_of_interest_frequency_rank_is_true_frequency`.

P3: docs/source-docstring binary-only language on per-path
disaggregation cleaned up:
- class summary at chaisemartin_dhaultfoeuille.py:367-370 (drop
  "binary treatment", add `paths_of_interest` mention)
- docs/api/chaisemartin_dhaultfoeuille.rst:14-22 (add
  `paths_of_interest` and explicit "binary or integer-coded discrete"
  language)
- diff_diff/chaisemartin_dhaultfoeuille_results.py:394-452
  (`path_effects`, `path_placebo_event_study`,
  `path_cumulated_event_study`, `path_sup_t_bands` activator clauses
  now read "by_path is a positive int OR paths_of_interest is set")
- docs/methodology/REGISTRY.md:671 checklist line (add
  "or integer-coded discrete" + `paths_of_interest`)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
R1 verdict was "Looks good". Two P3 housekeeping items addressed:

P3 (maintainability): `summary()` and `to_dataframe(level="by_path")`
re-sorted output by `frequency_rank`, dropping the user-specified
order under `paths_of_interest`. Both reporting surfaces now iterate
in `path_effects` insertion order so user-specified order is
preserved end-to-end. Under `by_path=k`, insertion order matches
descending frequency_rank so output is identical to the prior
release. Adds regressions
`test_paths_of_interest_order_preserved_in_to_dataframe` and
`test_paths_of_interest_order_preserved_in_summary`.

P3 (docs): remaining `by_path`-only language in result docs and
REGISTRY:
- `chaisemartin_dhaultfoeuille_results.py` `path_sup_t_bands` empty-
  state contract: "no bootstrap or `by_path is None`" -> "no
  bootstrap, or both `by_path` and `paths_of_interest` are `None`"
- `chaisemartin_dhaultfoeuille_results.py` `to_dataframe(level=
  "by_path")` ValueError text: pass `paths_of_interest` as an
  alternative to `by_path=k`
- `docs/methodology/REGISTRY.md:643` joint sup-t bands Note:
  empty-state contract updated for both selectors

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
R2 verdict was "Looks good". One P3 housekeeping item addressed:

P3 (maintainability): selector-aware messaging on the empty-state
path:

1. `_compute_path_effects` empty-selection warning now branches on
   `paths_of_interest is not None`. Under `paths_of_interest`, the
   message reads "every user-specified path was either unobserved
   ... per-path 'zero observed groups' UserWarnings already issued"
   so the user gets a clear pointer to the per-path warnings rather
   than a generic "no observed treatment path" message that
   misdescribes the cause. Under `by_path=k`, the original message
   is preserved.

2. `summary()` empty-state header changed from
   "Treatment-Path Disaggregation (by_path)" to
   "Treatment-Path Disaggregation"; the trailing parenthetical now
   says "by_path / paths_of_interest was a no-op on this panel".

Adds regression `test_paths_of_interest_all_unobserved_emits_distinct_warning`
asserting the paths_of_interest-specific warning text fires when
every requested path is unobserved.

Also fixes 7 pre-existing F541 f-string-without-placeholder warnings
in the warning blocks via `ruff --fix`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
R2 verdict was "Looks good". One residual P3 housekeeping class
addressed (selector-aware messaging in additional locations):

1. Multi-baseline `UserWarning`s under `controls` and `trends_linear`
   now read "by_path / paths_of_interest + ..." instead of "by_path
   + ..." since both fire under either selector.

2. F_g=3 boundary `UserWarning` under `trends_linear` now reads
   "by_path / paths_of_interest + trends_linear: ..." for the same
   reason.

3. `summary()` non-empty header dropped the "(by_path)" suffix —
   matches the empty-state branch fixed in the prior round.

4. `to_dataframe(level="by_path")` docstring now documents that the
   surface is populated under either `by_path=k` or
   `paths_of_interest=[(...), ...]`.

Updates 5 test assertions that matched the old "by_path + ..."
warning substring; the new "+ trends_linear" / "+ controls"
substring matches both selector forms.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
igerber and others added 4 commits May 9, 2026 14:23
R4 verdict was "Looks good". One P3 housekeeping item addressed:

P3 (maintainability): the `summary()` empty-state renderer for
`path_effects == {}` rendered the same generic "No observed paths
have a complete [F_g-1, F_g-1+L_max] window" text under both
selectors. Under `paths_of_interest`, that text is misleading
because the panel may have observed paths but none match the user's
list. Branch on `_estimator_ref.paths_of_interest`: when set, the
empty-state block now reads "Every path in paths_of_interest was
unobserved or had a window outside L_max+1" with a pointer to the
per-path 'zero observed groups' UserWarnings; under `by_path=k`,
the original wording is preserved.

Adds regression
`test_paths_of_interest_all_unobserved_summary_distinct_text` (next
to the existing `_emits_distinct_warning` test) asserting the
selector-specific summary text fires and the by_path-only wording
does not.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
R5 verdict was "Looks good" with one P2 code-quality regression
flagged. The previous round's `test_set_params_partial_mutation_recoverable`
documented inconsistent intermediate state as expected behavior:
on a mutex violation, `set_params()` mutated `self` before
validation, so a failed call left both `by_path` and
`paths_of_interest` populated. `fit()` did not re-check the mutex
and `_enumerate_treatment_paths()` silently preferred
`paths_of_interest` when both were set, so subsequent fits ran
with stale selector state unless the caller manually repaired the
object.

Fix: make `set_params()` transactional. Snapshot the pre-call
values for the keys being set, attempt mutation + validation, and
roll back to the snapshot on any exception before re-raising. This
mirrors sklearn-style atomic semantics. The validation block was
extracted into a private `_validate_invariants()` helper so
__init__ and set_params can share the rules without code
duplication risking drift.

Replaces `test_set_params_partial_mutation_recoverable` (which
asserted the buggy partial-mutation behavior) with
`test_set_params_failed_validation_is_transactional`, asserting
that:
1. After a failed `set_params()`, `get_params()` returns
   bit-identical to the pre-call snapshot.
2. Subsequent valid `set_params()` succeeds against the rolled-
   back state.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
P1 (Documentation/Tests): the docstring claimed paths_of_interest
composed with bootstrap, per-path placebos, joint sup-t bands,
trends_linear, and trends_nonparam, but the new tests covered only
the analytical trends_* paths and a triple-combo (paths_of_interest
+ non-binary D + bootstrap + placebo). The R6 reviewer flagged two
specific gaps:

1. paths_of_interest + trends_linear=True + n_bootstrap > 0 +
   placebo=True: post-bootstrap path_cumulated_event_study path
   was untested.
2. paths_of_interest + trends_nonparam="state" + placebo=True +
   n_bootstrap > 0: the set_ids flow through all four per-path
   collectors (analytical, placebo, bootstrap, placebo-bootstrap)
   was untested.

Plus the existing non-binary triple-combo's sup-t check was
vacuous (`assert res.path_sup_t_bands is not None` is satisfied by
`{}` when the strict-majority gate excludes every path).

Fixes:
- Add test_paths_of_interest_trends_linear_bootstrap_placebo
  asserting path-set restriction, finite per-path bootstrap SE on
  path_effects, populated post-bootstrap path_cumulated_event_study
  with finite (effect, SE) per (path, horizon), and populated
  per-path placebo for the same paths.
- Add test_paths_of_interest_trends_nonparam_bootstrap_placebo
  asserting selector + set_ids flow through all four collectors:
  finite analytical/bootstrap (effect, SE) on path_effects, finite
  per-path placebo (effect, SE).
- Tighten test_paths_of_interest_non_binary_bootstrap_placebo:
  bump to 2 paths and n_bootstrap=400 so the sup-t strict-majority
  gate fires, and assert >=1 finite crit_value AND non-NaN
  cband_conf_int per (path, horizon) on path_effects (replacing
  the vacuous "is not None" check).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…D in test

`tests/test_chaisemartin_dhaultfoeuille.py::TestByPathNonBinary::test_non_integer_D_raises`
failed across all 7 CI runners (macos py3.11/3.14, ubuntu-24.04-arm
py3.11/3.13/3.14, ubuntu-latest py3.14, windows-latest py3.11) with:

    TypeError: Invalid value '1.5' for dtype 'int64'

The fixture builds `df["treatment"]` as integers in {0, 1, 2}, so the
column dtype is int64. On pandas >= 2.x (the version pinned in CI for
py3.11+ runners), `df.loc[mask, "treatment"] = 1.5` raises TypeError
outright; on older pandas (the local dev env) it emitted a
FutureWarning and proceeded. The test path was supposed to exercise
the estimator's D-integer guard at fit-time, not pandas's dtype
coercion behavior.

Cast `df["treatment"]` to float before injecting 1.5 so the test
behaves identically across pandas versions and reaches the
ValueError raise it was written to verify.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@igerber igerber force-pushed the dcdh-by-path-non-binary-poi branch from 6c46774 to 4e24cac Compare May 9, 2026 18:24
@igerber
Copy link
Copy Markdown
Owner Author

igerber commented May 9, 2026

/ai-review

@github-actions
Copy link
Copy Markdown

github-actions Bot commented May 9, 2026

🔁 AI review rerun (requested by @igerber)

Head SHA: 4e24cac05fa5e80ca1a1f1ad4a3ed45e478c276d


Overall Assessment

✅ Looks good. Highest unmitigated finding is P2.

Executive Summary

  • Affected method: ChaisemartinDHaultfoeuille dCDH reversible-treatment event-study per-path disaggregation.
  • No P0/P1 methodology, variance/SE, inference, control-pool, or selector-propagation issues found.
  • paths_of_interest is propagated through event-study, placebo, bootstrap, cumulated trends-linear, sup-t bands, summary, dataframe, get_params(), set_params(), and the convenience wrapper.
  • Prior P2 documentation caveat remains: negative integer paths are supported/tested but the R parser caveat only mentions D >= 10.
  • I attempted targeted pytest coverage, but pytest is not installed in this environment.

Methodology

  • Severity: P2
    Impact: Prior documentation caveat remains incomplete. The registry documents R’s substr(path, 1, 1) baseline parsing bug only for D >= 10, but this PR also explicitly supports negative integer states. R would parse a path like "-1,..." as baseline "-", while Python tuple matching handles it correctly. This is not a Python correctness defect, but it is an incomplete R-parity caveat for a now-tested public behavior.
    Concrete fix: Amend the non-binary by_path note to say the R parser caveat applies to multi-character baseline states, including D >= 10 and negative integers, or narrow public docs to nonnegative integer-coded states.
    Locations: docs/methodology/REGISTRY.md:L641, tests/test_chaisemartin_dhaultfoeuille.py:L8230-L8270

  • Severity: P3 informational
    Impact: Continuous-D rejection, Python-only paths_of_interest, cross-path cohort-sharing SE, bootstrap percentile CI, and per-path sup-t asymmetry are documented under labeled registry notes/deviations, so they are not methodology defects.
    Concrete fix: None.
    Locations: docs/methodology/REGISTRY.md:L641-L643

Code Quality

No findings. The new parameter is validated, canonicalized, included in get_params(), guarded transactionally in set_params(), and routed by the convenience wrapper. Locations: diff_diff/chaisemartin_dhaultfoeuille.py:L285-L328, diff_diff/chaisemartin_dhaultfoeuille.py:L680-L853, diff_diff/chaisemartin_dhaultfoeuille.py:L8174-L8206

Performance

No findings. The explicit selector reuses the existing path enumeration and per-path IF/bootstrap collectors; no new avoidable asymptotic issue was apparent in the changed paths.

Maintainability

No findings. Sibling-surface mirror audit covered path effects, placebos, bootstrap inputs, placebo bootstrap inputs, trends-linear cumulation, sup-t bands, summary(), and to_dataframe(level="by_path"). Locations: diff_diff/chaisemartin_dhaultfoeuille.py:L2330-L2354, diff_diff/chaisemartin_dhaultfoeuille.py:L2499-L2514, diff_diff/chaisemartin_dhaultfoeuille.py:L3043-L3097, diff_diff/chaisemartin_dhaultfoeuille.py:L3351-L3373, diff_diff/chaisemartin_dhaultfoeuille_results.py:L1261-L1385, diff_diff/chaisemartin_dhaultfoeuille_results.py:L1711-L1827

Tech Debt

No findings. I did not find a new silent correctness issue relying on TODO.md mitigation.

Security

No findings. Secret-pattern grep over changed files found no credential-like strings.

Documentation/Tests

  • Severity: P3 informational
    Impact: Negative integer R-caveat docs remain incomplete as noted under Methodology. Test coverage otherwise covers validation, reciprocal mutex behavior, transactional rollback, ordering, empty result sets, non-binary D, controls, trends, bootstrap, placebo, and sup-t interactions.
    Concrete fix: Same registry wording update as above.
    Locations: docs/methodology/REGISTRY.md:L641, tests/test_chaisemartin_dhaultfoeuille.py:L8456-L9108

Verification notes: pattern-wide grep found no new inline dCDH inference anti-pattern; dCDH inference routes through safe_inference(). I attempted targeted tests, but the environment returned pytest: command not found.

@igerber igerber merged commit 4469f54 into main May 9, 2026
23 checks passed
@igerber igerber deleted the dcdh-by-path-non-binary-poi branch May 9, 2026 19:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ready-for-ci Triggers CI test workflows

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant