Comparison notebook: Why flexibility matters? #703

Open
cetagostini wants to merge 6 commits into main from cetagostini/causalimpact_v_causalpy

Conversation

@cetagostini
Contributor

Add a new Jupyter notebook to docs demonstrating a comparison between Interrupted Time Series (ITS) approaches and Synthetic Control methods. The notebook loads CausalPy's built-in `sc` dataset, sets up plotting and RNG seed, and walks through applying CausalImpact and CausalPy to the same data (treated unit 'actual', controls a–g, treatment at time 73) to highlight when synthetic control is the more appropriate method.
@review-notebook-app

Check out this pull request on ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.


Powered by ReviewNB

@cursor

cursor bot commented Feb 9, 2026

PR Summary

Low Risk
Docs-only change that adds a toctree entry; no runtime or library behavior is affected.

Overview
Adds a new documentation example notebook, its_causalpy_vs_causalimpact.ipynb, to the Comparative Interrupted Time Series section by linking it in docs/source/notebooks/index.md.

Written by Cursor Bugbot for commit a1d4a88. This will update automatically on new commits.

@read-the-docs-community

read-the-docs-community bot commented Feb 9, 2026

Documentation build overview

📚 causalpy | 🛠️ Build #32296776 | 📁 Comparing 53033ca against latest (78ec91d)

  🔍 Preview build  

333 files changed · + 24 added · ± 121 modified · - 188 deleted


@codecov

codecov bot commented Feb 9, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 94.35%. Comparing base (278e947) to head (a1d4a88).
⚠️ Report is 84 commits behind head on main.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #703   +/-   ##
=======================================
  Coverage   94.35%   94.35%           
=======================================
  Files          44       44           
  Lines        7517     7517           
  Branches      456      456           
=======================================
  Hits         7093     7093           
  Misses        262      262           
  Partials      162      162           

☔ View full report in Codecov by Sentry.

@drbenvincent drbenvincent self-requested a review February 21, 2026 17:13
@drbenvincent drbenvincent added the documentation Improvements or additions to documentation label Feb 28, 2026
@drbenvincent
Collaborator

Reviewed by claude-opus-4.7-xhigh using custom pedagogy and figure-quality skills. Many of the points below are qualitative judgements — feel free to push back on specific ones, but ideally most of the issues get addressed in some form.

Thanks Carlos — this is a useful notebook and the core demonstration (same data, three methods, one ground truth) is exactly the kind of side-by-side comparison that teaches people what to reach for. The numbers work out, the data is well chosen, and the final three-panel summary is a strong close.

The comments below are a mix of pedagogy and figure-quality notes. I've tried to prioritise: the first group changes how the notebook feels as teaching material, the second tightens technical correctness and Sphinx/MyST rendering, and the third is polish.


1. Reframe the narrative: "match method to data" rather than "tool X is bad"

The current framing reads like a head-to-head between libraries ("CausalImpact gets it wrong", "CausalPy wins"). The more durable lesson is the one already implicit in the notebook: when you have a donor pool of unaffected controls, modelling cross-unit structure beats extrapolating the treated unit's own history. CausalImpact isn't "wrong" here so much as being asked to answer a question with one-seventh of the available information.

Reframing this way lands better pedagogically — acknowledge the surprising result, reassure the reader, then connect it to the theory that predicts it — and it's also more defensible, since CausalImpact-with-controls is a legitimate CITS analysis, just not the one we run here.

Concrete suggestions:

  • Title. "Why Flexibility Matters in Causal Inference?" is ungrammatical as a question. Either drop the ? or rephrase as "Why does flexibility matter in causal inference?" or — if you want to lean into the real lesson — "Choosing the right counterfactual: ITS vs. synthetic control on the same data."
  • Opening hook (cell 0). The "someone on your team says 'just use CausalImpact'" framing sets up a strawman. An examples-first opening could be: "Here is one dataset. It has a clearly treated unit and seven unaffected controls. We will estimate the causal effect three ways, and the answers will differ by more than an order of magnitude. Understanding why tells us how to choose between ITS and synthetic control in general."
  • Cell 12 and 16 ("+9.3 is the wrong sign", "-5.2 overshoots by 3x"). These are correct but land as gotchas. A softer, more instructive version: "With no access to the controls, both methods must build a counterfactual from the treated series alone. Neither temporal model captures that the level of the treated unit is largely determined by a combination of the controls — so both extrapolate the pre-period level and trend, and both mis-attribute the gap to the intervention. This is the expected failure mode of ITS when cross-sectional information is available but unused."
  • Cell 32 ("CausalPy dominates ... 14× lower MAE"). Most of that gap is the method (SC vs. ITS), not the library. Consider separating the two claims: (a) SC is the right method for this data structure, and (b) CausalPy happens to let you switch methods without leaving the framework. Right now they read as one claim.

2. Add a reflection prompt at the end

Most of CausalPy's tutorial notebooks close with a takeaway that ties back to the reader's own work. The current close is a marketing bullet list for CausalPy. Suggest adding, after "What We've Learned", something like:

Before reaching for an ITS tool on your own problem, ask:

  • Do I have other units that plausibly did not receive the intervention and whose pre-period series are a reasonable mixture basis for my treated unit?
  • Is my treated unit's pre-period behaviour driven more by shared time trends across units (→ SC / CITS) or by its own internal structure, e.g. seasonality and level (→ ITS)?
  • Could an intervention spill over into the candidate controls? If so, neither CITS nor SC is clean.

Example: A retailer launches a new loyalty scheme in one region. The other regions run the old scheme and are observed weekly. Which method fits?

3. Fix the index.md placement

index.md adds this notebook under the "Comparative Interrupted Time Series" toctree, but the notebook's thesis is that CITS is not the right method here and argues for synthetic control. Recommend either:

  • promoting it to a new top-level "Method selection / comparison" toctree section, or
  • moving it under "Synthetic Control" (since that is the method the notebook ends up endorsing).

Leaving it under CITS will confuse readers who navigate the sidebar expecting CITS examples.


4. Technical-accuracy and rendering items

These are small but worth fixing because they appear in teaching material and in a Sphinx-built docs page.

  • Cell 18 has an orphan ::: with no opening directive — the closing fence on the line after "ITS ignores it." will render as literal text in the Sphinx/MyST build. Either delete it or wrap cell 18's intro as a proper MyST callout. Given the content ("The key difference"), :::{important} fits for a crucial conceptual distinction.
  • "95% HDI" label is mis-stated in the comparison plot (cell 29). The code computes quantile(0.025) and quantile(0.975) — that's an equal-tailed 95% CI, not a highest-density interval. Either switch the label to "95% CI" or compute the HDI (e.g. az.hdi(..., hdi_prob=0.95), or az.plot_hdi directly) to match the label.
  • Cell 29 middle panel title and cell 31 table row label both say "CausalPy (BSTS State Space)", but StateSpaceTimeSeries is a structural state-space model — BSTS is the specific Scott & Varian framework that CausalImpact uses. Relabel the CausalPy row as "CausalPy ITS (structural state space)" or just "CausalPy ITS" to avoid conflating them. This also removes the odd implication that CausalPy and CausalImpact differ on implementation more than on method.
  • Calendar axis in cell 13 is arbitrary. The dataset has an integer time index; pd.date_range("2020-01-01", periods=len(df), freq="ME") invents a monthly calendar that is never used for anything (and the seasonal_length=12 in cell 14 only matters if that calendar is real). If it's only there because StateSpaceTimeSeries requires a DatetimeIndex, add a one-sentence caveat ("We attach an arbitrary monthly calendar because the state-space model expects a DatetimeIndex; the seasonal component is not meaningful for this dataset"). Otherwise the seasonality claim in the narrative (cell 17, "level, trend, and seasonal components") is misleading.
  • Sampler portability (cell 14). nutpie + jax backend + FAST_COMPILE isn't in the default install and will fail for most readers running the notebook locally. For a docs notebook I'd use the default sampler with modest sample_kwargs (draws=500, tune=500, target_accept=0.95), consistent with AGENTS.md asking docs/tests to minimise MCMC load. Same comment applies to progressbar=True in cell 23 — docs notebooks usually keep that off.
  • Ground truth framing (cell 7 and cell 30). The "-1.85" is the average of the causal effect column over the post-period — that's the simulation's ATT over this window, not a fundamental property. Worth saying once: "This dataset is simulated with a known causal effect, so we can compare each method's estimate to the true ATT over the post-intervention period (≈ -1.85)."
  • # | warning: false pragmas (cells 9, 13, 14, 23) are Quarto directives and do nothing in a Sphinx/nbsphinx build — they just appear as comments in the rendered code blocks. Suggest removing them and silencing warnings the standard way if needed (e.g. the existing top-level warnings.filterwarnings("ignore")).
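On the HDI point above, a minimal self-contained sketch may help the author see why the label matters. It uses numpy only; the hand-rolled `hdi_95` mimics what `az.hdi(..., hdi_prob=0.95)` computes (shortest interval at a given mass), and the function names and the gamma example are illustrative, not notebook code. For a skewed posterior the two intervals genuinely differ:

```python
import numpy as np

def eti_95(samples):
    """Equal-tailed 95% interval: cut 2.5% off each tail (what the
    notebook's quantile(0.025)/quantile(0.975) actually computes)."""
    return np.quantile(samples, 0.025), np.quantile(samples, 0.975)

def hdi_95(samples):
    """Shortest interval containing 95% of the samples — the highest-
    density interval, i.e. what az.hdi would return."""
    s = np.sort(samples)
    n = len(s)
    k = int(np.ceil(0.95 * n))          # samples each candidate interval must hold
    widths = s[k - 1:] - s[: n - k + 1] # width of every window of k sorted samples
    i = int(np.argmin(widths))          # shortest window wins
    return s[i], s[i + k - 1]

rng = np.random.default_rng(0)
skewed = rng.gamma(shape=2.0, scale=1.0, size=100_000)  # right-skewed "posterior"
lo_eti, hi_eti = eti_95(skewed)
lo_hdi, hi_hdi = hdi_95(skewed)
# For a right-skewed distribution the HDI sits lower and is narrower
# than the equal-tailed interval; for a symmetric posterior the two
# coincide and the mislabel is harmless — but the label should still match.
```

For roughly symmetric posteriors the numeric difference is small, which is presumably why the mislabel went unnoticed; switching the label to "95% CI" is the one-character fix, computing the HDI is the one-call fix.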

5. Figures

The three-panel summary (cell 29) is the centrepiece figure and mostly works, but a few things would tighten it.

  • Width. figsize=(16, 4.5) is ~1.6× the notebook's global width set in the setup cell (plt.rcParams["figure.figsize"] = [10, 5]) and won't align with the text column in the docs build. Either keep the width at 10 and use figsize=(10, FIG_HEIGHT) with three columns (tight), or — the more readable option at a single-column docs width — stack the panels vertically (3 rows × 1 col). Width should match the other figures in the notebook so everything aligns in Sphinx's rendered column.
  • Shared y-axis. The three panels don't share a y-range, which makes the "CausalImpact goes the wrong way" point visually harder to see. Use sharey=True (or explicit matching ax.set_ylim(...)) so the reader can eyeball magnitude differences directly — this is probably the single highest-value tweak to that figure.
  • Semantic colours. The three methods currently use C0 / C2 / C1 in a non-obvious order (CausalImpact = blue, CausalPy ITS = green, SC = orange). Defining named constants in the setup cell (e.g. COLOR_CI, COLOR_ITS, COLOR_SC) and using them everywhere makes the relationship between panels explicit. Consider tying colour to method family rather than library (e.g. both ITS-family methods in one hue, SC in a distinct one) to visually reinforce the pedagogical point that the key distinction is ITS-family vs. SC.
  • True-effect reference line. Good that it's black dashed. Set zorder=10 on it so it sits on top of the shaded band.
  • Figure captions. CausalPy's existing notebooks don't use a standard caption convention, so I wouldn't invent one here. But at minimum, the prose markdown cell immediately before the three-panel figure currently reads as narrative ("Now let's put it all together..."); one technical sentence describing what is actually plotted would help — e.g. "Per-period causal effect estimates in the post-intervention window (n=27) for each method. Shaded bands are 95% equal-tailed intervals; the dashed black line is the known simulated effect. Axes share y-limits across panels." This can live in the existing markdown cell — no new syntax needed.
  • Overview plot (cell 6). Consider labelling the controls collectively in the legend ("Controls (a–g)") rather than leaving them unlabelled — small thing, but explicit labelling helps on first read.
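The stacking, shared-y, and semantic-colour suggestions above can be sketched together. The effect series below are synthetic placeholder noise around the point estimates quoted in this review (+9.3, -5.2, and the true -1.85), and the `COLOR_*` constant names are illustrative conventions, not CausalPy API:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for the sketch
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical colour constants tied to method family, not library:
COLOR_ITS_FAMILY = "tab:blue"   # CausalImpact and CausalPy ITS share a hue
COLOR_SC = "tab:orange"         # synthetic control stands apart

rng = np.random.default_rng(0)
t = np.arange(27)  # post-intervention periods (n=27)
panels = [
    ("CausalImpact (ITS)", 9.3 + rng.normal(0, 1.0, 27), COLOR_ITS_FAMILY),
    ("CausalPy ITS", -5.2 + rng.normal(0, 1.0, 27), COLOR_ITS_FAMILY),
    ("CausalPy SC", -1.9 + rng.normal(0, 0.3, 27), COLOR_SC),
]

# 3 rows x 1 col at width 10 matches the notebook's global figure width,
# and sharey=True lets the reader eyeball magnitude differences directly.
fig, axes = plt.subplots(3, 1, figsize=(10, 10), sharey=True)
for ax, (label, y, color) in zip(axes, panels):
    ax.plot(t, y, color=color, label=label)
    # True simulated effect drawn on top of everything via zorder=10
    ax.axhline(-1.85, color="black", linestyle="--", zorder=10,
               label="True effect")
    ax.set_title(label)
    ax.legend(loc="upper right")
axes[-1].set_xlabel("Post-intervention period")
```

With `sharey=True`, the wrong-sign CausalImpact panel is immediately visible against the other two without the reader having to compare axis ticks.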

6. Glossary and citations (house style)

Per AGENTS.md:

  • Link glossary terms on first mention. docs/source/knowledgebase/glossary.rst already defines Interrupted time series design, Comparative interrupted time-series, Synthetic control, Donor pool, Counterfactual, Potential outcomes. The notebook currently uses all of these without linking. In a .ipynb with MyST, first-mention pattern is {term}`Interrupted Time Series <Interrupted time series design>`. Sibling notebooks like its_pymc_comparative.ipynb already follow this pattern.
  • Citations. Add entries to docs/source/references.bib and cite them inline with {cite:t}`key`. At minimum, consider citing:
    • Brodersen et al. (2015) for CausalImpact / BSTS
    • Abadie, Diamond & Hainmueller (2010) for synthetic control
    • Bouttell et al. (2018) or similar for CITS
  • Do not add a :::{bibliography} block at the end of the notebook. PR #834 (merged 2026-04-15) consolidated the 22 per-notebook bibliography blocks into a single central docs/source/references.rst to fix duplicate-citation Sphinx warnings. Inline {cite:t} roles still work as before — they now render citations that all point to the central references page.

Summary — what I'd treat as blocking vs. nice-to-have

Should fix before merge:

  • Orphan ::: in cell 18 (breaks MyST rendering).
  • "95% HDI" label that is actually a quantile CI.
  • Mis-categorisation in index.md (under CITS, but argues against CITS).
  • Non-portable sampler kwargs (nutpie + jax + FAST_COMPILE) in a docs notebook.
  • Three-panel figure width (cell 29) mismatches the rest of the notebook.
  • BSTS vs. structural-state-space labelling.

Strongly recommended (pedagogy):

  • Reframe from "tool X vs tool Y" to "method selection on one dataset".
  • Add a reflection prompt at the end.
  • Fix the title's question-mark grammar.

Polish:

  • Glossary links on first mention.
  • References / citations.
  • Shared y-axis in the three-panel comparison.
  • Named semantic colour constants.
  • Remove # | warning: false Quarto pragmas.
  • Caveat the arbitrary monthly calendar and the seasonal_length=12 that flows from it.

cetagostini and others added 2 commits April 16, 2026 21:15
Reframes the notebook from a library head-to-head to a method-selection
walkthrough, fixes several correctness and rendering issues flagged in
review, and aligns with CausalPy's glossary/citation conventions.

Correctness and rendering (blocking):
- Cell 18: wrap "The key difference" in :::{important} so the orphan
  closing fence no longer renders literally in the Sphinx build.
- Cell 29: compute true 95% HDIs via az.hdi instead of quantile(0.025/0.975)
  so the label matches the computation; stack panels 3x1 at figsize=(10, 10)
  with sharey=True so magnitudes compare directly; add zorder=10 on the
  true-effect reference line; use named semantic colour constants.
- Cells 14, 23: drop non-portable nutpie/jax/FAST_COMPILE sampler kwargs
  and progressbar=True; use default sampler with draws=500, tune=500,
  target_accept=0.95 for a docs-friendly notebook.
- Cells 9, 13, 14, 23: remove Quarto `# | warning: false` pragmas that
  don't do anything in the nbsphinx build.
- Cells 29, 31: relabel "CausalPy (BSTS State Space)" as "CausalPy ITS
  (structural state space)" to stop conflating with CausalImpact's BSTS.
- index.md: move the notebook out of "Comparative Interrupted Time Series"
  into "Synthetic Control" since SC is the method the notebook endorses.

Pedagogy:
- Retitle to "Choosing the right counterfactual: ITS vs. synthetic control
  on the same data" and replace the strawman opening with an examples-first
  framing.
- Soften cells 12 and 16 so the CausalImpact and CausalPy-ITS results are
  presented as the expected failure mode of ITS when cross-sectional
  information is unused, rather than as gotchas.
- In cell 32, disentangle "method gap" from "library gap" and add a
  reflection prompt with three diagnostic questions and a worked example.

Polish:
- Cell 2: define COLOR_CI/COLOR_ITS/COLOR_SC tied to method family.
- Cell 6: label the control lines collectively as "Controls (a–g)".
- Cell 7: note that -1.85 is the simulation's ATT over this post-period,
  not a fundamental property.
- Cell 12: caveat the arbitrary monthly calendar and seasonal_length=12.
- Cell 28: one-sentence technical caption before the three-panel figure.

Glossary and citations:
- Link Interrupted Time Series, Comparative ITS, Synthetic Control,
  Counterfactual, and ATT on first mention via {term} roles.
- Cite Brodersen et al. (2015) for CausalImpact/BSTS (new bib entry),
  Abadie, Diamond & Hainmueller (2010) for synthetic control, and
  Lopez Bernal et al. (2018) for CITS, using {cite:p} roles — no
  per-notebook bibliography block (PR #834 central references).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Regenerates outputs (MAE table, three-panel figure, sampling logs) under
the portable default sampler so the rendered docs page matches the
updated code.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@cetagostini
Contributor Author

cetagostini commented Apr 16, 2026

@drbenvincent Applied the recommendations. The remaining issues will be resolved once #826 is merged.
