Release train: school-holiday features, collect mode (#363), snapshot store, operational guards #368

Merged
bartzbeielstein merged 22 commits into main from develop on Jun 12, 2026

Conversation

@bartzbeielstein

Collaborator

Promotes develop → main for the 22.1.0 release:

All mechanism, no policy: substitution/fallback decisions stay operator-side per the fail-safe boundary. Union verified locally: 2545 tests, ruff, REUSE, full quarto render green; freeze outputs tracked.

🤖 Generated with Claude Code

github-actions Bot and others added 22 commits June 10, 2026 23:58
_freeze/ is intentionally committed (.gitignore policy) so doc renders are
deterministic and network-free, but the execution outputs for the reference
pages added in 19.2.0-19.4.0 (weather.derived.*, weather.locations.*,
calendar ephemeris/day-type, multitask quantile factories) were never added
after the local render. Cache-only change; no source or docs-source touched.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…track-freeze-19x-reference-pages

chore(docs): track freeze outputs for 19.2-19.4 reference pages
Adds school-holiday calendar features for all 16 German Bundesländer,
sourced from the OpenHolidays API (ODbL-1.0), covering 2022-01-01 to
2027-12-31.  Requests outside the covered range raise ValueError (fail-safe;
no fill/extrapolation).  Only country_code="DE" is supported.
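
The fail-safe range check described above can be sketched as follows. This is a minimal illustration, not the code in calendar/holiday.py: `school_holiday_flags`, the `PERIODS` table, and its two rows are assumptions made for the example. Only the covered range and the `is_school_holiday` column name come from this PR.

```python
import pandas as pd

# Covered range as stated in the commit message.
COVERED_START = pd.Timestamp("2022-01-01")
COVERED_END = pd.Timestamp("2027-12-31")

# Illustrative rows only (assumed values), not a copy of the bundled CSV.
PERIODS = pd.DataFrame(
    {
        "state": ["NW", "BY"],
        "start_date": pd.to_datetime(["2024-07-08", "2024-07-29"]),
        "end_date": pd.to_datetime(["2024-08-20", "2024-09-09"]),
    }
)


def school_holiday_flags(start, end, state, periods=PERIODS):
    """Return a daily 0/1 series; hypothetical helper, not the library API."""
    start, end = pd.Timestamp(start), pd.Timestamp(end)
    # Fail-safe: refuse to fill or extrapolate outside the covered range.
    if start < COVERED_START or end > COVERED_END:
        raise ValueError(
            f"requested range {start.date()}..{end.date()} is outside "
            f"{COVERED_START.date()}..{COVERED_END.date()}"
        )
    days = pd.date_range(start, end, freq="D")
    flags = pd.Series(0, index=days, name="is_school_holiday", dtype="int64")
    for row in periods.loc[periods["state"] == state].itertuples(index=False):
        # Label slicing on a DatetimeIndex includes both end points.
        flags.loc[row.start_date : row.end_date] = 1
    return flags
```

With these illustrative rows, 2024-08-21 is flagged for BY but not for NW, which mirrors the state-isolation test listed below.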

Data:
- datasets/csv/school_holidays_de.csv (648 rows, 16 states)
- datasets/csv/school_holidays_de_meta.csv (validity range metadata)
- LICENSES/ODbL-1.0.txt
- REUSE sidecar .license files for both CSVs

Code:
- calendar/holiday.py: create_school_holiday_df() + get_school_holiday_features()
- calendar/__init__.py: export both symbols
- data/fetch_data.py: load_school_holidays_de() loader
- manager/features.py: select_exogenous_features() gains include_school_holiday_features kwarg
- configurator/config_multi.py: include_school_holiday_features: bool = False field
- multitask/base.py: wires include_school_holiday_features into the concat + select pipeline

Docs:
- _quarto.yml: adds both new symbols to autosummary and sidebar; also fixes
  pre-existing sidebar drift by adding create_day_type_df/get_day_type_features
- quartodoc reference pages generated for create_school_holiday_df and
  get_school_holiday_features

Tests (tests/test_calendar_school_holiday.py, 30 tests):
- determinism, dtype/no-NaN/binary, known NW 2024 vacations (Osterferien,
  Sommerferien, Herbstferien), state isolation (BY vs NW on 2024-08-21),
  inclusive edges, hourly broadcast, fail-safe both edges, country_code
  validation, selector toggle, bundled-data integrity (16 states, schema,
  row count in [500,650], meta)

Full-repo follow-up (spotforecast2, not this repo):
- None needed. include_school_holiday_features is fully wired in sf2-safe's
  multitask/base.py concat + select blocks, and ConfigEntsoe inherits the
  field from ConfigMulti (dataclass inheritance).  The full repo reads
  sf2-safe, so it requires no additional wiring or re-exports.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
FIX 1 — datasets/csv/school_holidays_de.csv: add missing HE Weihnachtsferien
2027-12-23/2028-01-11 record; de-duplicate all 23 truncated-duplicate pairs in
MV and SH (keep natural/longer end_date per DATA POLICY); re-sort by (state,
start_date).  Final row count: 626, 16 states, 0 duplicate keys.

FIX 2 — calendar/holiday.py create_school_holiday_df: normalize tz-aware
timestamps in range check via .tz_convert(None).normalize() so a boundary
timestamp such as 2027-12-31 23:00 UTC no longer falsely raises ValueError.
Also fix pd.date_range call to localize string start/end to inferred_tz before
mixing with a tz-aware counterpart.

FIX 3 — tests: pin HE Weihnachtsferien 2027 (test_he_weihnachtsferien_2027):
2027-12-22 == 0, 2027-12-23 through 2027-12-31 all == 1.

FIX 4 — tests: tz-aware boundary tests: end=2027-12-31 23:00 UTC does not
raise; end=2028-01-01 00:00 UTC raises ValueError.

FIX 5 — data/fetch_data.py load_school_holidays_de: sort returned DataFrame
by (state, start_date) as docstring claims.

FIX 6 — same docstring: reword datetime64 dtype claim to "resolution depends
on the pandas version".

FIX 7 — same regeneration note: replace endDate-truncation rule with the
natural-form DATA POLICY (keep startDate in range, verbatim endDate).

FIX 8 — configurator/config_multi.py: add include_ephemeris_features and
include_day_type_features entries to Args and Attributes docblocks; place
include_school_holiday_features directly after include_day_type_features to
match field order.

FIX 9 — calendar/holiday.py create_school_holiday_df: rename params
start_date/end_date → start/end, mirroring create_day_type_df; update call
site in get_school_holiday_features and docstring.

FIX 10 — data/fetch_data.py load_school_holidays_de: remove unused data_home
parameter entirely; update signature and docstring.

FIX 11 — manager/features.py: inline single-element allowlist for
is_school_holiday, eliminating intermediate variable.

FIX 12 — multitask/base.py: replace both getattr fallbacks with direct
self.config.include_school_holiday_features (field is declared on ConfigMulti).
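
The tz-aware boundary behaviour from FIX 2 and FIX 4 can be illustrated with a small sketch. The helper names `to_naive_utc_day` and `check_end` are hypothetical; only the `.tz_convert(None).normalize()` idiom and the 2027-12-31 boundary are taken from the commit message.

```python
import pandas as pd

COVERED_END = pd.Timestamp("2027-12-31")  # upper bound from the commit message


def to_naive_utc_day(ts):
    """Drop the timezone (via UTC) and truncate to midnight."""
    ts = pd.Timestamp(ts)
    if ts.tzinfo is not None:
        # tz_convert(None) converts to UTC and returns a tz-naive timestamp.
        ts = ts.tz_convert(None)
    return ts.normalize()


def check_end(end):
    """Raise ValueError if the calendar day of `end` lies past the covered range."""
    day = to_naive_utc_day(end)
    if day > COVERED_END:
        raise ValueError(f"end {day.date()} is after {COVERED_END.date()}")
    return day
```

Comparing whole days rather than raw timestamps is what lets 2027-12-31 23:00 UTC pass while 2028-01-01 00:00 UTC still raises.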

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…tom-up zone QC frame

Implements #363: ZoneResult structured reporting; collect mode never substitutes or retries — fallback policy stays operator-side. Adds build_zone_qc_frame (min_count totals, fail-safe on missing zone files).
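
A rough sketch of the collect-versus-raise pattern, assuming a per-zone callable. `ZoneOutcome`, `download_zones`, and the `fetch` parameter are hypothetical stand-ins and not the signatures in entsoe.py; the field names `area`, `ok`, and `error` follow the wording used elsewhere in this PR.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass(frozen=True)
class ZoneOutcome:
    """Hypothetical stand-in for a structured per-zone result."""

    area: str  # string area code, e.g. "DE_TENNET"
    ok: bool
    error: Optional[str] = None


def download_zones(
    areas: Iterable[str],
    fetch: Callable[[str], None],
    on_zone_failure: str = "raise",
) -> List[ZoneOutcome]:
    # Validate arguments before the mode branch so both modes reject bad input.
    if on_zone_failure not in ("raise", "collect"):
        raise ValueError(f"unknown on_zone_failure: {on_zone_failure!r}")
    results = []
    for area in areas:
        try:
            fetch(area)
        except Exception as exc:
            if on_zone_failure == "raise":
                raise
            # Collect mode only reports; it never retries or substitutes.
            results.append(ZoneOutcome(area, ok=False, error=f"{type(exc).__name__}: {exc}"))
        else:
            results.append(ZoneOutcome(area, ok=True))
    return results
```

The caller receives one record per zone and decides what to do with failures, which keeps fallback policy on the operator side.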

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Add SnapshotStore (frozen dataclass) and parse_snapshot_timestamp to
spotforecast2_safe.utils.snapshot_store.  The store is a pure mechanism:
write/newest_valid/restore/prune/seed_from_file/age_of — it knows nothing
about interim files, zones, downloads, or substitution policy, which
remains operator-side per the KB 2026-06-11 ADR (safe-full-split /
no-reexport-shims invariant).

Generalized from bart26k-lecture/scripts/_team4_resilience.py; all
domain knowledge stripped.  Key mechanics ported faithfully:
- Atomic tmp+os.replace writes (crash-safe)
- pd.Timestamp(stem, tz="UTC") parsing (ISO-8601-basic, filename-safe)
- mtime-as-UTC seeding in seed_from_file
- TTL-scoped newest_valid + prune (also clears stale .tmp)
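
The atomic-write and timestamp-parsing mechanics listed above could look roughly like this. It is a stdlib-only sketch: `atomic_write_text` and `parse_stamp` are hypothetical names, `datetime.strptime` is swapped in for the `pd.Timestamp` parsing the commit mentions, and the `%Y%m%dT%H%M%SZ` format is the one documented in the later write-docstring fix.

```python
import os
from datetime import datetime, timezone
from pathlib import Path

STAMP_FORMAT = "%Y%m%dT%H%M%SZ"  # ISO-8601 basic, safe in filenames


def atomic_write_text(root: Path, when: datetime, payload: str) -> Path:
    """Write payload to <root>/<stamp>.csv via a temp file and os.replace.

    `when` must be timezone-aware; sub-second components are dropped.
    """
    stamp = when.astimezone(timezone.utc).strftime(STAMP_FORMAT)
    final = root / f"{stamp}.csv"
    tmp = final.with_suffix(".tmp")
    tmp.write_text(payload, encoding="utf-8")
    # os.replace is atomic when source and target are on the same filesystem,
    # so readers see either the old file or the complete new one.
    os.replace(tmp, final)
    return final


def parse_stamp(path: Path) -> datetime:
    """Recover the UTC timestamp encoded in a snapshot filename."""
    return datetime.strptime(path.stem, STAMP_FORMAT).replace(tzinfo=timezone.utc)
```

If the process dies before `os.replace`, only a stale `.tmp` file is left behind, which is why a prune step that also clears `.tmp` files pairs well with this pattern.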

30 new tests (tests/test_snapshot_store.py): atomicity with simulated
os.replace crash, TTL boundary, restore round-trip, seed_from_file
mtime/skip/dedup, age_of, unparseable-filename warnings.

Docs: registered SnapshotStore and parse_snapshot_timestamp in the
existing Utils sidebar section and quartodoc Utils section of _quarto.yml;
two .qmd stubs generated by quartodoc build.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…guard

Upstreams the team-4 operational guards from
scripts/team4_4zones_submit.py as parameterised, raising library functions.
Thresholds and policy (Abort exit codes, mode-specific forbidden sets,
reference-selection fallback chain) remain in the operator layer.

New modules
-----------
- spotforecast2_safe/preprocessing/coverage.py
  assert_frontier_fresh, assert_actual_lag_within,
  assert_no_interior_gaps (guards 1-3 of assert_coverage; all raise
  CoverageError), last_complete_hour (pure frontier-completeness helper;
  raises ValueError on empty/all-NaN input).
  Value-sanity guard (guard 5 / intra-hour range / step / deviation) is
  already apply_target_corruption_policy — not duplicated.

- spotforecast2_safe/processing/shape_check.py
  ShapeCheckReport (frozen dataclass) + check_forecast_shape (pure
  Pearson-corr + range-ratio measurement; zero-range guard -> NaN ratio;
  raises only on invalid inputs).

- spotforecast2_safe/multitask/guards.py
  assert_no_leakage (checks training frame, selected exog names, fitted
  feature_name_; raises LeakageError per surface; unreadable fitted
  features raise RuntimeError — verifiability invariant).
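
As an illustration of the shape measurement described for check_forecast_shape, a minimal sketch follows. `shape_metrics` is a hypothetical function, and the direction of the ratio (forecast range divided by reference range) is an assumption; the source only states that a zero range yields a NaN ratio and that the function raises only on invalid inputs.

```python
import numpy as np


def shape_metrics(forecast, reference):
    """Return (pearson_corr, range_ratio); raise only on invalid inputs."""
    f = np.asarray(forecast, dtype=float)
    r = np.asarray(reference, dtype=float)
    if f.ndim != 1 or f.shape != r.shape or f.size < 2:
        raise ValueError("inputs must be 1-D, equally long, with at least 2 points")
    f_range = float(f.max() - f.min())
    r_range = float(r.max() - r.min())
    # Zero-range guard: report NaN instead of dividing by zero.
    ratio = float("nan") if r_range == 0.0 else f_range / r_range
    # Pearson correlation is undefined when either series is constant.
    if f_range == 0.0 or r_range == 0.0:
        corr = float("nan")
    else:
        corr = float(np.corrcoef(f, r)[0, 1])
    return corr, ratio
```

Returning measurements instead of raising leaves the thresholds, and the decision to abort, with the caller.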

New exceptions
--------------
- CoverageError (RuntimeError subclass) in exceptions.py
- LeakageError (RuntimeError subclass) in exceptions.py

Semantic choices
----------------
- Boundary operators: strict < throughout (== boundary passes) — faithful
  to the script's comparison operators.
- Unreadable fitted features: raise RuntimeError rather than warn-and-skip
  — an unreadable feature list violates verifiability, not just leakage.
- last_complete_hour uses modal index diff for cadence inference (matches
  the script's .diff().mode().iloc[0] idiom).
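
The modal-diff cadence inference can be sketched as below. `infer_cadence` is a hypothetical helper built around the `.diff().mode().iloc[0]` idiom named above; the explicit guard for fewer than two timestamps is an addition of this sketch.

```python
import pandas as pd


def infer_cadence(index: pd.DatetimeIndex) -> pd.Timedelta:
    """Return the most common spacing between consecutive timestamps."""
    diffs = index.to_series().diff().dropna()
    if diffs.empty:
        # With fewer than two timestamps there is no diff, so mode() would be
        # empty and .iloc[0] would raise IndexError.
        raise ValueError("need at least two timestamps to infer cadence")
    return diffs.mode().iloc[0]
```

Using the mode rather than the minimum or the first diff keeps a single gap or duplicate from skewing the inferred cadence.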

Tests: 56 new assertions across three test files; full suite 2462 passed.
Docs: quartodoc QMDs generated for all 9 new public symbols; _quarto.yml
sidebar and contents sections updated under Preprocessing, Processing,
Multitask, and Exceptions.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
FIX 1 (tests): pin exact-boundary TTL semantics — three new tests confirm
that a snapshot/file stamped exactly NOW-TTL is treated as valid across
newest_valid, prune, and seed_from_file.

FIX 2 (write docstring): document second-precision truncation — timestamps
use %Y%m%dT%H%M%SZ; sub-second components are silently truncated; two
writes within the same UTC second overwrite via last-writer-wins os.replace.

FIX 3 (prune Returns): clarify "Number of files deleted (expired .csv
snapshots plus any stale .tmp files)."

FIX 4 (__post_init__ validation): SnapshotStore now raises ValueError for
non-positive ttl or non-absolute root; two tests added.

FIX 5 (dead guard): remove redundant `p.name.endswith(".tmp")` check in
newest_valid and seed_from_file — suffix != ".csv" already excludes .tmp
files (.tmp has suffix .tmp, not .csv).

FIX 6 (SPDX header): copyright year updated to 2024-2026 in both files.
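
The constructor validation (FIX 4) and the inclusive TTL boundary (FIX 1) can be sketched together. `StoreSketch` and `is_valid` are hypothetical, and representing the TTL as a `datetime.timedelta` is an assumption; the real SnapshotStore may use a different type.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from pathlib import Path


@dataclass(frozen=True)
class StoreSketch:
    """Hypothetical frozen store config with fail-fast validation."""

    root: Path
    ttl: timedelta

    def __post_init__(self):
        # Frozen dataclasses can still validate here, since nothing is assigned.
        if not Path(self.root).is_absolute():
            raise ValueError(f"root must be an absolute path, got {self.root!r}")
        if self.ttl <= timedelta(0):
            raise ValueError(f"ttl must be positive, got {self.ttl!r}")

    def is_valid(self, stamp: datetime, now: datetime) -> bool:
        # Inclusive boundary: a snapshot stamped exactly now - ttl is valid.
        return now - stamp <= self.ttl
```

Rejecting a relative root at construction time avoids snapshots silently landing in whatever directory the process happens to run from.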

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
FIX 1 (_quarto.yml): add ZoneResult and build_zone_qc_frame to both
the website sidebar Downloader section and the quartodoc sections list,
alongside download_zone_loads entries.

FIX 2 (downloader/__init__.py): remove ZoneResult and build_zone_qc_frame
re-exports, restoring __init__ to develop convention (download_new_data
and merge_build_manual only); full-path imports via
spotforecast2_safe.downloader.entsoe are the project standard.

FIX 3 (entsoe.py): validate start/end before the mode branch so a bogus
format raises in both raise and collect mode; update noqa comment to
state the catch is limited to download-layer failures after argument
validation.

FIX 4 (entsoe.py): resolve interim_path fresh via get_data_home() after
each successful zone download and assert the file exists before recording
ok=True; record ok=False with a descriptive error if missing.

FIX 5 (entsoe.py build_zone_qc_frame): replace redundant
`all_have_forecast and len(forecasts) == n` guard with the
reference-faithful `len(forecasts) == n`; remove unused flag.

FIX 6 (entsoe.py build_zone_qc_frame): annotate
`data_home: Optional[Union[Path, str]] = None`.

FIX 7 (entsoe.py download_zone_loads): merge the two separate
raise/collect per-zone for-loops into one; query closure is defined once
per zone before the mode branch.

FIX 8 (docs/reference/*.qmd): add missing trailing newlines to
ZoneResult.qmd, build_zone_qc_frame.qmd, download_zone_loads.qmd.

FIX 9 (tests): add test_build_zone_qc_frame_explicit_data_home_no_env_var
which calls build_zone_qc_frame(data_home=<explicit path>) without
SPOTFORECAST2_DATA being set.

FIX 10: no conftest.py exists; _FakeClient/_FailingClient left in place.

FIX 11 (entsoe.py ZoneResult): reword `area` docstring to say it is the
string area code (e.g. "DE_TENNET"), not an entsoe-py Area enum instance.
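
The `len(forecasts) == n` rule from FIX 5 amounts to producing a total only when every zone contributes. One way to express that, assuming each zone is a pandas Series on an hourly index, is `min_count` on the row sum. `total_with_min_count` is a hypothetical helper, not the body of build_zone_qc_frame.

```python
import pandas as pd


def total_with_min_count(zone_series: dict) -> pd.Series:
    """Sum zones per timestamp; the total is NaN unless all zones have a value."""
    wide = pd.concat(zone_series, axis=1)  # one column per zone, outer-joined index
    # min_count=len(...) requires a non-NaN value from every zone in the row.
    return wide.sum(axis=1, min_count=len(zone_series))
```

Without `min_count`, pandas would skip the missing zone and report a partial sum that looks like a valid total.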

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The collect-mode branch's _quarto.yml accidentally carried the Utils
registrations of the parallel snapshot-store branch, whose module does
not exist here, breaking quartodoc build.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
FIX 1 (blocker): guard IndexError in last_complete_hour when actual has
only one non-NaN row — mode() of a single diff is empty; now raises
ValueError with a clear message. Add test.

FIX 2 (blocker): remove stale "raises on the first surface" sentence
from assert_no_leakage docstring; implementation collects all violations
and raises once.

FIX 3 (should-fix): replace Sphinx-style tilde cross-references with
plain Markdown backticks in coverage.py (module docstring + 3 function
docstrings) and multitask/guards.py (function docstring). Only
branch-modified files touched; pre-existing refs in model.py and
target_corruption.py left unchanged.

FIX 4 (should-fix): revert out-of-scope cosmetic reformats from
config_multi.py and test_config_multi.py to avoid merge conflicts with
an in-flight branch.

FIX 5 (nit): add invariant note to ShapeCheckReport.skipped property
docstring: check_forecast_shape always stores n_overlap=0 when skipping;
manually-constructed reports with n_overlap > 0 and NaN metrics report
skipped=False.

FIX 6 (nit): tighten test assertion in test_multitask_guards.py from
surfaces_mentioned >= 2 to == 3 (fixture violates all three surfaces).
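
The collect-all-then-raise-once behaviour from FIX 2 can be sketched as follows. `LeakageErrorSketch` and `assert_no_forbidden` are hypothetical names, and the surface labels are illustrative; the real assert_no_leakage signature is not shown in this PR.

```python
class LeakageErrorSketch(RuntimeError):
    """Hypothetical stand-in for the library's LeakageError."""


def assert_no_forbidden(forbidden, training_columns, selected_exog, fitted_features):
    """Check three surfaces and raise once, listing every violation found."""
    surfaces = {
        "training frame": training_columns,
        "selected exog": selected_exog,
        "fitted features": fitted_features,
    }
    violations = []
    for name, columns in surfaces.items():
        leaked = sorted(set(forbidden) & set(columns))
        if leaked:
            violations.append(f"{name}: {leaked}")
    if violations:
        # One exception reports all surfaces instead of stopping at the first.
        raise LeakageErrorSketch("; ".join(violations))
```

Reporting every surface in one message is what makes an exact `== 3` assertion possible for a fixture that violates all three.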

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…chool-holiday-bundesland

feat(calendar): per-Bundesland school-holiday features (roadmap #6)
## [22.1.0-rc.1](v22.0.0...v22.1.0-rc.1) (2026-06-12)

### Features

* **calendar:** add per-Bundesland school-holiday features (roadmap [#6](#6)) ([3a3deb3](3a3deb3))

### Bug Fixes

* **calendar:** apply code-review fixes for school-holiday feature branch ([5e46c40](5e46c40))
…one-collect-mode

feat(downloader): per-zone collect mode for download_zone_loads + bottom-up zone QC frame
…napshot-store-2

feat(utils): generic TTL'd atomic snapshot store
…perational-guards

feat(preprocessing): coverage guards, forecast shape report, leakage guard
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
## [22.1.0-rc.2](v22.1.0-rc.1...v22.1.0-rc.2) (2026-06-12)

### Features

* **downloader:** per-zone collect mode for download_zone_loads + bottom-up zone QC frame ([48b96a7](48b96a7)), closes [#363](#363)
* **preprocessing:** coverage guards, forecast shape report, leakage guard ([34a7a0c](34a7a0c))
* **utils:** generic TTL'd atomic snapshot store ([cfeeda4](cfeeda4))

### Bug Fixes

* **docs:** drop cross-branch snapshot-store entries from _quarto.yml ([de7f3ca](de7f3ca))
* **downloader:** apply code-review fixes to zone-collect-mode ([f40cf41](f40cf41))
* **preprocessing,guards:** six surgical code-review fixes ([1d7d0ab](1d7d0ab))
* **utils:** code-review fixes for SnapshotStore ([9e2710a](9e2710a))

### Documentation

* **downloader:** regenerate reference pages after docstring fixes ([6e22a12](6e22a12))
* **utils:** regenerate reference pages after docstring fixes ([2d7f2d6](2d7f2d6))
@bartzbeielstein merged commit a9913a5 into main on Jun 12, 2026
1 check passed

Development

Successfully merging this pull request may close these issues.

downloader: optional structured per-zone result for download_zone_loads (on_zone_failure="collect")

2 participants