dCDH by_path Wave 3 #8+#9: non-binary treatment + paths_of_interest#401
dCDH by_path Wave 3 #8+#9: non-binary treatment + paths_of_interest#401
Conversation
|
Overall Assessment Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
Path to Approval
|
P1 (blocker): the convenience helper `chaisemartin_dhaultfoeuille()` hard-coded the constructor kwarg allowlist without `paths_of_interest`, so the new kwarg fell through to `fit()` and raised TypeError. Replace the static set with a signature-derived split via `inspect.signature(ChaisemartinDHaultfoeuille.__init__)` so future constructor params cannot drift out of sync. Add helper-level regression `test_convenience_function_routes_paths_of_interest_to_init` parallel to the existing `test_convenience_function_matches_class`. P2: under `paths_of_interest`, `frequency_rank` was assigned from `enumerate(selected_paths)`, which produces user-selection order rather than true frequency rank. Decouple the iteration order from the rank field: keep `selected_paths` iteration in user order (insertion order preserved on `path_effects.keys()`), but compute `frequency_rank` as the within-selected-paths rank by descending group count (lex tiebreak on the path tuple). Under `by_path=k`, `selected_paths` is already sorted by descending frequency so the two coincide; under `paths_of_interest`, frequency_rank now reflects true observed-count rank regardless of user order. Add regression `test_paths_of_interest_frequency_rank_is_true_frequency`. P3: docs/source-docstring binary-only language on per-path disaggregation cleaned up: - class summary at chaisemartin_dhaultfoeuille.py:367-370 (drop "binary treatment", add `paths_of_interest` mention) - docs/api/chaisemartin_dhaultfoeuille.rst:14-22 (add `paths_of_interest` and explicit "binary or integer-coded discrete" language) - diff_diff/chaisemartin_dhaultfoeuille_results.py:394-452 (`path_effects`, `path_placebo_event_study`, `path_cumulated_event_study`, `path_sup_t_bands` activator clauses now read "by_path is a positive int OR paths_of_interest is set") - docs/methodology/REGISTRY.md:671 checklist line (add "or integer-coded discrete" + `paths_of_interest`) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment ✅ Looks good Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
|
R1 verdict was "Looks good". Two P3 housekeeping items addressed: P3 (maintainability): `summary()` and `to_dataframe(level="by_path")` re-sorted output by `frequency_rank`, dropping the user-specified order under `paths_of_interest`. Both reporting surfaces now iterate in `path_effects` insertion order so user-specified order is preserved end-to-end. Under `by_path=k`, insertion order matches descending frequency_rank so output is identical to the prior release. Adds regressions `test_paths_of_interest_order_preserved_in_to_dataframe` and `test_paths_of_interest_order_preserved_in_summary`. P3 (docs): remaining `by_path`-only language in result docs and REGISTRY: - `chaisemartin_dhaultfoeuille_results.py` `path_sup_t_bands` empty- state contract: "no bootstrap or `by_path is None`" -> "no bootstrap, or both `by_path` and `paths_of_interest` are `None`" - `chaisemartin_dhaultfoeuille_results.py` `to_dataframe(level= "by_path")` ValueError text: pass `paths_of_interest` as an alternative to `by_path=k` - `docs/methodology/REGISTRY.md:643` joint sup-t bands Note: empty-state contract updated for both selectors Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment ✅ Looks good Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
|
R2 verdict was "Looks good". One P3 housekeeping item addressed: P3 (maintainability): selector-aware messaging on the empty-state path: 1. `_compute_path_effects` empty-selection warning now branches on `paths_of_interest is not None`. Under `paths_of_interest`, the message reads "every user-specified path was either unobserved ... per-path 'zero observed groups' UserWarnings already issued" so the user gets a clear pointer to the per-path warnings rather than a generic "no observed treatment path" message that misdescribes the cause. Under `by_path=k`, the original message is preserved. 2. `summary()` empty-state header changed from "Treatment-Path Disaggregation (by_path)" to "Treatment-Path Disaggregation"; the trailing parenthetical now says "by_path / paths_of_interest was a no-op on this panel". Adds regression `test_paths_of_interest_all_unobserved_emits_distinct_warning` asserting the paths_of_interest-specific warning text fires when every requested path is unobserved. Also fixes 7 pre-existing F541 f-string-without-placeholder warnings in the warning blocks via `ruff --fix`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment ✅ Looks good Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
|
R2 verdict was "Looks good". One residual P3 housekeeping class addressed (selector-aware messaging in additional locations): 1. Multi-baseline `UserWarning`s under `controls` and `trends_linear` now read "by_path / paths_of_interest + ..." instead of "by_path + ..." since both fire under either selector. 2. F_g=3 boundary `UserWarning` under `trends_linear` now reads "by_path / paths_of_interest + trends_linear: ..." for the same reason. 3. `summary()` non-empty header dropped the "(by_path)" suffix — matches the empty-state branch fixed in the prior round. 4. `to_dataframe(level="by_path")` docstring now documents that the surface is populated under either `by_path=k` or `paths_of_interest=[(...), ...]`. Updates 5 test assertions that matched the old "by_path + ..." warning substring; the new "+ trends_linear" / "+ controls" substring matches both selector forms. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment ✅ Looks good Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
|
R4 verdict was "Looks good". One P3 housekeeping item addressed:
P3 (maintainability): the `summary()` empty-state renderer for
`path_effects == {}` rendered the same generic "No observed paths
have a complete [F_g-1, F_g-1+L_max] window" text under both
selectors. Under `paths_of_interest`, that text is misleading
because the panel may have observed paths but none match the user's
list. Branch on `_estimator_ref.paths_of_interest`: when set, the
empty-state block now reads "Every path in paths_of_interest was
unobserved or had a window outside L_max+1" with a pointer to the
per-path 'zero observed groups' UserWarnings; under `by_path=k`,
the original wording is preserved.
Adds regression
`test_paths_of_interest_all_unobserved_summary_distinct_text` (next
to the existing `_emits_distinct_warning` test) asserting the
selector-specific summary text fires and the by_path-only wording
does not.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment ✅ Looks good Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
|
R5 verdict was "Looks good" with one P2 code-quality regression flagged. The previous round's `test_set_params_partial_mutation_recoverable` documented inconsistent intermediate state as expected behavior: on a mutex violation, `set_params()` mutated `self` before validation, so a failed call left both `by_path` and `paths_of_interest` populated. `fit()` did not re-check the mutex and `_enumerate_treatment_paths()` silently preferred `paths_of_interest` when both were set, so subsequent fits ran with stale selector state unless the caller manually repaired the object. Fix: make `set_params()` transactional. Snapshot the pre-call values for the keys being set, attempt mutation + validation, and roll back to the snapshot on any exception before re-raising. This mirrors sklearn-style atomic semantics. The validation block was extracted into a private `_validate_invariants()` helper so __init__ and set_params can share the rules without code duplication risking drift. Replaces `test_set_params_partial_mutation_recoverable` (which asserted the buggy partial-mutation behavior) with `test_set_params_failed_validation_is_transactional`, asserting that: 1. After a failed `set_params()`, `get_params()` returns bit-identical to the pre-call snapshot. 2. Subsequent valid `set_params()` succeeds against the rolled- back state. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment Executive Summary
Methodology No findings. Code Quality No findings. The prior mixed-state Performance No findings. Maintainability No findings. Tech Debt No findings. Security No findings. Documentation/Tests
Path to Approval
|
P1 (Documentation/Tests): the docstring claimed paths_of_interest
composed with bootstrap, per-path placebos, joint sup-t bands,
trends_linear, and trends_nonparam, but the new tests covered only
the analytical trends_* paths and a triple-combo (paths_of_interest
+ non-binary D + bootstrap + placebo). The R6 reviewer flagged two
specific gaps:
1. paths_of_interest + trends_linear=True + n_bootstrap > 0 +
placebo=True: post-bootstrap path_cumulated_event_study path
was untested.
2. paths_of_interest + trends_nonparam="state" + placebo=True +
n_bootstrap > 0: the set_ids flow through all four per-path
collectors (analytical, placebo, bootstrap, placebo-bootstrap)
was untested.
Plus the existing non-binary triple-combo's sup-t check was
vacuous (`assert res.path_sup_t_bands is not None` is satisfied by
`{}` when the strict-majority gate excludes every path).
Fixes:
- Add test_paths_of_interest_trends_linear_bootstrap_placebo
asserting path-set restriction, finite per-path bootstrap SE on
path_effects, populated post-bootstrap path_cumulated_event_study
with finite (effect, SE) per (path, horizon), and populated
per-path placebo for the same paths.
- Add test_paths_of_interest_trends_nonparam_bootstrap_placebo
asserting selector + set_ids flow through all four collectors:
finite analytical/bootstrap (effect, SE) on path_effects, finite
per-path placebo (effect, SE).
- Tighten test_paths_of_interest_non_binary_bootstrap_placebo:
bump to 2 paths and n_bootstrap=400 so the sup-t strict-majority
gate fires, and assert >=1 finite crit_value AND non-NaN
cband_conf_int per (path, horizon) on path_effects (replacing
the vacuous "is not None" check).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment ✅ Looks good Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
|
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment ✅ Looks good. Highest unmitigated finding is P2. Executive Summary
Methodology
Code Quality No findings. Performance No findings. The selector reuses existing per-path helper paths and bootstrap input collectors; I did not see a new avoidable asymptotic regression in the changed code. Maintainability No findings. Sibling-surface mirror audit covered event-study, placebo, bootstrap collectors, trends-linear cumulation, sup-t bands, Tech Debt No findings. I did not find a new deferred correctness issue relying on Security No findings. Documentation/Tests No P1 test gap found. The new tests cover non-binary |
Bundles two adjacent extensions to ChaisemartinDHaultfoeuille.by_path: 1. Non-binary integer treatment (Wave 3 #8): replace the `NotImplementedError` gate at chaisemartin_dhaultfoeuille.py:1870 with a `ValueError` for continuous D (e.g. 1.5) per the no-silent- failures contract; integer-coded D in Z (e.g. ordinal {0, 1, 2}) is now supported and produces integer-state path tuples (e.g. (0, 2, 2, 2)). Validated against R `did_multiplegt_dyn(..., by_path)` for D in {0, 1, 2} via the new `multi_path_reversible_by_path_non_binary` golden-value scenario: per-path point estimates match R bit-exactly (rtol ~1e-9 events, rtol+atol envelope for placebo near-zero values), per-path SE inherits the documented cross-path cohort-sharing deviation (~5% rtol; SE_RTOL=0.15 envelope). 2. paths_of_interest kwarg (Wave 3 #9): new estimator parameter accepting a list of int tuples for explicit user-specified path selection. Mutually exclusive with by_path=k (raises ValueError at __init__ and set_params time). Each tuple length validated at __init__ for uniformity, vs L_max+1 at fit-time. Bool / np.bool_ rejected; np.integer accepted and canonicalized to Python int. Duplicates emit a UserWarning and dedupe; unobserved paths emit a UserWarning and are omitted from path_effects. Composes with non-binary D and all downstream by_path surfaces (bootstrap, placebos, sup-t bands, controls, trends_linear, trends_nonparam). Python-only API (R has no list-based selection in by_path). Threading: paths_of_interest added to 6 path-enumeration helper signatures (`_enumerate_treatment_paths`, `_compute_path_effects`, `_compute_path_cumulated_event_study`, `_collect_path_bootstrap_inputs`, `_compute_path_placebos`, `_collect_path_placebo_bootstrap_inputs`). The `:1118` preconditions gate (drop_larger_lower / L_max / mutex with heterogeneity / design2 / honest_did / survey_design) and the 11 `self.by_path is not None` activation branches in fit() rerouted to fire under either selector. For D >= 10, R's `did_multiplegt_by_path` derives the per-path baseline via `substr(path_index$path, 1, 1)` which captures only the first character of the comma-separated path string (e.g. captures "1" instead of "12" for path = "12,12,..."), mis-allocating R's per-path control-pool subset. Python's tuple-key matching is correct; the parity scenario stays in D in {0, 1, 2} to avoid the R bug. Adds: - `_validate_paths_of_interest` module-level helper for canonicalization - `TestByPathNonBinary` (10 tests, 3 slow) - `TestPathsOfInterest` (27 tests, 7 slow) - `TestDCDHDynRParityByPathNonBinary` (1 parity test) - R script Scenario 19 + regenerated dcdh_dynr_golden_values.json - REGISTRY.md sub-paragraphs for non-binary + paths_of_interest - CHANGELOG.md Unreleased > Added entry - llms-full.txt kwarg listing for paths_of_interest Removes: the now-stale `test_forbids_non_binary_treatment` gate- firing test from TestByPathCoverageFitGates. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
P1 (blocker): the convenience helper `chaisemartin_dhaultfoeuille()` hard-coded the constructor kwarg allowlist without `paths_of_interest`, so the new kwarg fell through to `fit()` and raised TypeError. Replace the static set with a signature-derived split via `inspect.signature(ChaisemartinDHaultfoeuille.__init__)` so future constructor params cannot drift out of sync. Add helper-level regression `test_convenience_function_routes_paths_of_interest_to_init` parallel to the existing `test_convenience_function_matches_class`. P2: under `paths_of_interest`, `frequency_rank` was assigned from `enumerate(selected_paths)`, which produces user-selection order rather than true frequency rank. Decouple the iteration order from the rank field: keep `selected_paths` iteration in user order (insertion order preserved on `path_effects.keys()`), but compute `frequency_rank` as the within-selected-paths rank by descending group count (lex tiebreak on the path tuple). Under `by_path=k`, `selected_paths` is already sorted by descending frequency so the two coincide; under `paths_of_interest`, frequency_rank now reflects true observed-count rank regardless of user order. Add regression `test_paths_of_interest_frequency_rank_is_true_frequency`. P3: docs/source-docstring binary-only language on per-path disaggregation cleaned up: - class summary at chaisemartin_dhaultfoeuille.py:367-370 (drop "binary treatment", add `paths_of_interest` mention) - docs/api/chaisemartin_dhaultfoeuille.rst:14-22 (add `paths_of_interest` and explicit "binary or integer-coded discrete" language) - diff_diff/chaisemartin_dhaultfoeuille_results.py:394-452 (`path_effects`, `path_placebo_event_study`, `path_cumulated_event_study`, `path_sup_t_bands` activator clauses now read "by_path is a positive int OR paths_of_interest is set") - docs/methodology/REGISTRY.md:671 checklist line (add "or integer-coded discrete" + `paths_of_interest`) Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
R1 verdict was "Looks good". Two P3 housekeeping items addressed: P3 (maintainability): `summary()` and `to_dataframe(level="by_path")` re-sorted output by `frequency_rank`, dropping the user-specified order under `paths_of_interest`. Both reporting surfaces now iterate in `path_effects` insertion order so user-specified order is preserved end-to-end. Under `by_path=k`, insertion order matches descending frequency_rank so output is identical to the prior release. Adds regressions `test_paths_of_interest_order_preserved_in_to_dataframe` and `test_paths_of_interest_order_preserved_in_summary`. P3 (docs): remaining `by_path`-only language in result docs and REGISTRY: - `chaisemartin_dhaultfoeuille_results.py` `path_sup_t_bands` empty- state contract: "no bootstrap or `by_path is None`" -> "no bootstrap, or both `by_path` and `paths_of_interest` are `None`" - `chaisemartin_dhaultfoeuille_results.py` `to_dataframe(level= "by_path")` ValueError text: pass `paths_of_interest` as an alternative to `by_path=k` - `docs/methodology/REGISTRY.md:643` joint sup-t bands Note: empty-state contract updated for both selectors Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
R2 verdict was "Looks good". One P3 housekeeping item addressed: P3 (maintainability): selector-aware messaging on the empty-state path: 1. `_compute_path_effects` empty-selection warning now branches on `paths_of_interest is not None`. Under `paths_of_interest`, the message reads "every user-specified path was either unobserved ... per-path 'zero observed groups' UserWarnings already issued" so the user gets a clear pointer to the per-path warnings rather than a generic "no observed treatment path" message that misdescribes the cause. Under `by_path=k`, the original message is preserved. 2. `summary()` empty-state header changed from "Treatment-Path Disaggregation (by_path)" to "Treatment-Path Disaggregation"; the trailing parenthetical now says "by_path / paths_of_interest was a no-op on this panel". Adds regression `test_paths_of_interest_all_unobserved_emits_distinct_warning` asserting the paths_of_interest-specific warning text fires when every requested path is unobserved. Also fixes 7 pre-existing F541 f-string-without-placeholder warnings in the warning blocks via `ruff --fix`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
R2 verdict was "Looks good". One residual P3 housekeeping class addressed (selector-aware messaging in additional locations): 1. Multi-baseline `UserWarning`s under `controls` and `trends_linear` now read "by_path / paths_of_interest + ..." instead of "by_path + ..." since both fire under either selector. 2. F_g=3 boundary `UserWarning` under `trends_linear` now reads "by_path / paths_of_interest + trends_linear: ..." for the same reason. 3. `summary()` non-empty header dropped the "(by_path)" suffix — matches the empty-state branch fixed in the prior round. 4. `to_dataframe(level="by_path")` docstring now documents that the surface is populated under either `by_path=k` or `paths_of_interest=[(...), ...]`. Updates 5 test assertions that matched the old "by_path + ..." warning substring; the new "+ trends_linear" / "+ controls" substring matches both selector forms. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
R4 verdict was "Looks good". One P3 housekeeping item addressed:
P3 (maintainability): the `summary()` empty-state renderer for
`path_effects == {}` rendered the same generic "No observed paths
have a complete [F_g-1, F_g-1+L_max] window" text under both
selectors. Under `paths_of_interest`, that text is misleading
because the panel may have observed paths but none match the user's
list. Branch on `_estimator_ref.paths_of_interest`: when set, the
empty-state block now reads "Every path in paths_of_interest was
unobserved or had a window outside L_max+1" with a pointer to the
per-path 'zero observed groups' UserWarnings; under `by_path=k`,
the original wording is preserved.
Adds regression
`test_paths_of_interest_all_unobserved_summary_distinct_text` (next
to the existing `_emits_distinct_warning` test) asserting the
selector-specific summary text fires and the by_path-only wording
does not.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
R5 verdict was "Looks good" with one P2 code-quality regression flagged. The previous round's `test_set_params_partial_mutation_recoverable` documented inconsistent intermediate state as expected behavior: on a mutex violation, `set_params()` mutated `self` before validation, so a failed call left both `by_path` and `paths_of_interest` populated. `fit()` did not re-check the mutex and `_enumerate_treatment_paths()` silently preferred `paths_of_interest` when both were set, so subsequent fits ran with stale selector state unless the caller manually repaired the object. Fix: make `set_params()` transactional. Snapshot the pre-call values for the keys being set, attempt mutation + validation, and roll back to the snapshot on any exception before re-raising. This mirrors sklearn-style atomic semantics. The validation block was extracted into a private `_validate_invariants()` helper so __init__ and set_params can share the rules without code duplication risking drift. Replaces `test_set_params_partial_mutation_recoverable` (which asserted the buggy partial-mutation behavior) with `test_set_params_failed_validation_is_transactional`, asserting that: 1. After a failed `set_params()`, `get_params()` returns bit-identical to the pre-call snapshot. 2. Subsequent valid `set_params()` succeeds against the rolled- back state. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
P1 (Documentation/Tests): the docstring claimed paths_of_interest
composed with bootstrap, per-path placebos, joint sup-t bands,
trends_linear, and trends_nonparam, but the new tests covered only
the analytical trends_* paths and a triple-combo (paths_of_interest
+ non-binary D + bootstrap + placebo). The R6 reviewer flagged two
specific gaps:
1. paths_of_interest + trends_linear=True + n_bootstrap > 0 +
placebo=True: post-bootstrap path_cumulated_event_study path
was untested.
2. paths_of_interest + trends_nonparam="state" + placebo=True +
n_bootstrap > 0: the set_ids flow through all four per-path
collectors (analytical, placebo, bootstrap, placebo-bootstrap)
was untested.
Plus the existing non-binary triple-combo's sup-t check was
vacuous (`assert res.path_sup_t_bands is not None` is satisfied by
`{}` when the strict-majority gate excludes every path).
Fixes:
- Add test_paths_of_interest_trends_linear_bootstrap_placebo
asserting path-set restriction, finite per-path bootstrap SE on
path_effects, populated post-bootstrap path_cumulated_event_study
with finite (effect, SE) per (path, horizon), and populated
per-path placebo for the same paths.
- Add test_paths_of_interest_trends_nonparam_bootstrap_placebo
asserting selector + set_ids flow through all four collectors:
finite analytical/bootstrap (effect, SE) on path_effects, finite
per-path placebo (effect, SE).
- Tighten test_paths_of_interest_non_binary_bootstrap_placebo:
bump to 2 paths and n_bootstrap=400 so the sup-t strict-majority
gate fires, and assert >=1 finite crit_value AND non-NaN
cband_conf_int per (path, horizon) on path_effects (replacing
the vacuous "is not None" check).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…D in test
`tests/test_chaisemartin_dhaultfoeuille.py::TestByPathNonBinary::test_non_integer_D_raises`
failed across all 7 CI runners (macos py3.11/3.14, ubuntu-24.04-arm
py3.11/3.13/3.14, ubuntu-latest py3.14, windows-latest py3.11) with:
TypeError: Invalid value '1.5' for dtype 'int64'
The fixture builds `df["treatment"]` as integers in {0, 1, 2}, so the
column dtype is int64. On pandas >= 2.x (the version pinned in CI for
py3.11+ runners), `df.loc[mask, "treatment"] = 1.5` raises TypeError
outright; on older pandas (the local dev env) it emitted a
FutureWarning and proceeded. The test path was supposed to exercise
the estimator's D-integer guard at fit-time, not pandas's dtype
coercion behavior.
Cast `df["treatment"]` to float before injecting 1.5 so the test
behaves identically across pandas versions and reaches the
ValueError raise it was written to verify.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
6c46774 to
4e24cac
Compare
|
/ai-review |
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment✅ Looks good. Highest unmitigated finding is P2. Executive Summary
Methodology
Code QualityNo findings. The new parameter is validated, canonicalized, included in PerformanceNo findings. The explicit selector reuses the existing path enumeration and per-path IF/bootstrap collectors; no new avoidable asymptotic issue was apparent in the changed paths. MaintainabilityNo findings. Sibling-surface mirror audit covered path effects, placebos, bootstrap inputs, placebo bootstrap inputs, trends-linear cumulation, sup-t bands, Tech DebtNo findings. I did not find a new silent correctness issue relying on SecurityNo findings. Secret-pattern grep over changed files found no credential-like strings. Documentation/Tests
Verification notes: pattern-wide grep found no new inline dCDH inference anti-pattern; dCDH inference routes through |
Summary
ChaisemartinDHaultfoeuille.by_pathnow accepts non-binary integer treatment (D in Z, e.g. ordinal{0, 1, 2}). Path tuples become integer-state tuples like(0, 2, 2, 2). Continuous D (e.g.1.5) raisesValueErrorat fit-time per the no-silent-failures contract.paths_of_interestkwarg for user-specified path subsets, alternative toby_path=k's top-k automatic ranking. Mutually exclusive withby_path; setting both raisesValueError. Composes with non-binary D and all downstreamby_pathsurfaces. Python-only API (R has no list-based selection).chaisemartin_dhaultfoeuille.py:1870replaced by D-integer validation; the:1118preconditions gate and 11self.by_path is not Noneactivation branches infit()rerouted to fire under either selector. The 6 path-enumeration helper signatures gain apaths_of_interestparameter; 5 forward to_enumerate_treatment_paths.Methodology references (required if estimator / math changes)
by_pathper-path event-study disaggregation #357 (foundation); R referenceDIDmultiplegtDYN 2.3.3(per-path dispatcher indid_multiplegt_dyn, path-string construction indid_multiplegt_by_path)feedback_no_silent_failures. Documented in REGISTRY.md.substr(path, 1, 1)baseline-derivation bug for D >= 10: R's helperdid_multiplegt_by_pathcaptures only the first character of the comma-separated path string, mis-allocating R's per-path control-pool subset for multi-digit D values. Python's tuple-key matching is correct; the parity scenario stays inD in {0, 1, 2}to avoid the R bug. Documented in REGISTRY.md and CHANGELOG.md.paths_of_interestis Python-only: R'sby_pathaccepts only positive int (top-k) or-1(all). No list-based selection in R; no R-parity test for this kwarg.Validation
tests/test_chaisemartin_dhaultfoeuille.py::TestByPathNonBinary(10 tests, 3 slow)tests/test_chaisemartin_dhaultfoeuille.py::TestPathsOfInterest(27 tests, 7 slow)tests/test_chaisemartin_dhaultfoeuille_parity.py::TestDCDHDynRParityByPathNonBinary(1 parity test)test_forbids_non_binary_treatmentgate-firing testmulti_path_reversible_by_path_non_binary(Scenario 19) added tobenchmarks/R/generate_dcdh_dynr_test_values.Rand regeneratedbenchmarks/data/dcdh_dynr_golden_values.json. Per-path point estimates match R bit-exactly (rtol ~1e-9 on event horizons; rtol+atol envelope for placebo near-zero values), SE within ~5% rtol envelope. 282 dCDH non-slow tests pass; 37 new test classes (slow + non-slow) green.Security / privacy
Generated with Claude Code