Release train: school-holiday features, collect mode (#363), snapshot store, operational guards #368
Merged
_freeze/ is intentionally committed (.gitignore policy) so doc renders are deterministic and network-free, but the execution outputs for the reference pages added in 19.2.0-19.4.0 (weather.derived.*, weather.locations.*, calendar ephemeris/day-type, multitask quantile factories) were never added after the local render. Cache-only change; no source or docs-source touched. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…track-freeze-19x-reference-pages chore(docs): track freeze outputs for 19.2-19.4 reference pages
Adds school-holiday calendar features for all 16 German Bundesländer, sourced from the OpenHolidays API (ODbL-1.0), covering 2022-01-01 to 2027-12-31. Requests outside the covered range raise ValueError (fail-safe; no fill/extrapolation). Only country_code="DE" is supported.

Data:
- datasets/csv/school_holidays_de.csv (648 rows, 16 states)
- datasets/csv/school_holidays_de_meta.csv (validity range metadata)
- LICENSES/ODbL-1.0.txt
- REUSE sidecar .license files for both CSVs

Code:
- calendar/holiday.py: create_school_holiday_df() + get_school_holiday_features()
- calendar/__init__.py: export both symbols
- data/fetch_data.py: load_school_holidays_de() loader
- manager/features.py: select_exogenous_features() gains include_school_holiday_features kwarg
- configurator/config_multi.py: include_school_holiday_features: bool = False field
- multitask/base.py: wires include_school_holiday_features into the concat + select pipeline

Docs:
- _quarto.yml: adds both new symbols to autosummary and sidebar; also fixes pre-existing sidebar drift by adding create_day_type_df/get_day_type_features
- quartodoc reference pages generated for create_school_holiday_df and get_school_holiday_features

Tests (tests/test_calendar_school_holiday.py, 30 tests):
- determinism, dtype/no-NaN/binary, known NW 2024 vacations (Osterferien, Sommerferien, Herbstferien), state isolation (BY vs NW on 2024-08-21), inclusive edges, hourly broadcast, fail-safe both edges, country_code validation, selector toggle, bundled-data integrity (16 states, schema, row count in [500,650], meta)

Full-repo follow-up (spotforecast2, not this repo):
- No wiring changes needed in the full repo: include_school_holiday_features is fully wired in sf2-safe's multitask/base.py concat + select blocks. ConfigEntsoe inherits include_school_holiday_features via ConfigMulti (dataclass inheritance). The full repo reads sf2-safe; no additional wiring or re-exports required there.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
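The inclusive-edge and fail-safe behaviour described in this commit can be illustrated with a minimal standard-library sketch. The function name `is_school_holiday`, the tuple layout, and the in-memory record list are hypothetical stand-ins; the actual code lives in calendar/holiday.py and reads the bundled CSV. The coverage bounds are the ones stated above, and the single HE record is the one named in the follow-up fix commit.

```python
from datetime import date

# Covered range stated in the commit message.
COVERAGE_START = date(2022, 1, 1)
COVERAGE_END = date(2027, 12, 31)

# Hypothetical in-memory stand-in for the bundled CSV: (state, start, end).
RECORDS = [("HE", date(2027, 12, 23), date(2028, 1, 11))]


def is_school_holiday(day: date, state: str) -> int:
    """Return 1 if `day` lies in a holiday interval for `state`, else 0."""
    if not (COVERAGE_START <= day <= COVERAGE_END):
        # Fail-safe: no fill or extrapolation outside the covered range.
        raise ValueError(f"{day} is outside {COVERAGE_START}..{COVERAGE_END}")
    # Both interval edges are inclusive; other states never match.
    return int(any(s == state and start <= day <= end for s, start, end in RECORDS))
```

A record may end after the covered range (as the HE interval does); the range check applies to the requested day, not to the record.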
FIX 1 — datasets/csv/school_holidays_de.csv: add missing HE Weihnachtsferien 2027-12-23/2028-01-11 record; de-duplicate all 23 truncated-duplicate pairs in MV and SH (keep natural/longer end_date per DATA POLICY); re-sort by (state, start_date). Final row count: 626, 16 states, 0 duplicate keys.
FIX 2 — calendar/holiday.py create_school_holiday_df: normalize tz-aware timestamps in range check via .tz_convert(None).normalize() so a boundary timestamp such as 2027-12-31 23:00 UTC no longer falsely raises ValueError. Also fix pd.date_range call to localize string start/end to inferred_tz before mixing with a tz-aware counterpart.
FIX 3 — tests: pin HE Weihnachtsferien 2027 (test_he_weihnachtsferien_2027): 2027-12-22 == 0, 2027-12-23 through 2027-12-31 all == 1.
FIX 4 — tests: tz-aware boundary tests: end=2027-12-31 23:00 UTC does not raise; end=2028-01-01 00:00 UTC raises ValueError.
FIX 5 — data/fetch_data.py load_school_holidays_de: sort returned DataFrame by (state, start_date) as docstring claims.
FIX 6 — same docstring: reword datetime64 dtype claim to "resolution depends on the pandas version".
FIX 7 — same regeneration note: replace endDate-truncation rule with the natural-form DATA POLICY (keep startDate in range, verbatim endDate).
FIX 8 — configurator/config_multi.py: add include_ephemeris_features and include_day_type_features entries to Args and Attributes docblocks; place include_school_holiday_features directly after include_day_type_features to match field order.
FIX 9 — calendar/holiday.py create_school_holiday_df: rename params start_date/end_date → start/end, mirroring create_day_type_df; update call site in get_school_holiday_features and docstring.
FIX 10 — data/fetch_data.py load_school_holidays_de: remove unused data_home parameter entirely; update signature and docstring.
FIX 11 — manager/features.py: inline single-element allowlist for is_school_holiday, eliminating intermediate variable.
FIX 12 — multitask/base.py: replace both getattr fallbacks with direct self.config.include_school_holiday_features (field is declared on ConfigMulti).
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
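The boundary behaviour pinned by FIX 2 and FIX 4 can be sketched without pandas. This is a standard-library analogue, not the library code: `check_end` is a hypothetical name, `astimezone(timezone.utc)` stands in for `.tz_convert(None)`, and `.date()` stands in for `.normalize()`.

```python
from datetime import date, datetime, timezone

COVERAGE_END = date(2027, 12, 31)  # end of the covered range per the commit messages


def check_end(ts: datetime) -> date:
    """Reduce a timestamp to a calendar day (UTC if tz-aware), then range-check it."""
    if ts.tzinfo is not None:
        # Analogue of .tz_convert(None): express the instant in UTC first.
        ts = ts.astimezone(timezone.utc)
    day = ts.date()  # analogue of .normalize(): drop the time of day
    if day > COVERAGE_END:
        raise ValueError(f"end {day} is after covered range end {COVERAGE_END}")
    return day
```

Comparing whole days rather than raw timestamps is what lets 2027-12-31 23:00 UTC pass while 2028-01-01 00:00 UTC still raises.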
…tom-up zone QC frame
Implements #363: ZoneResult structured reporting; collect mode never substitutes or retries — fallback policy stays operator-side. Adds build_zone_qc_frame (min_count totals, fail-safe on missing zone files).
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
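A minimal sketch of the collect-mode idea, assuming a per-zone callable that raises on failure. `ZoneOutcome`, `run_zones`, and the `fetch` parameter are hypothetical names for illustration; the real `ZoneResult` and `download_zone_loads` in spotforecast2_safe.downloader.entsoe may carry different fields and signatures. Only the `on_zone_failure="collect"` spelling comes from the linked issue title.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass(frozen=True)
class ZoneOutcome:  # hypothetical stand-in for ZoneResult
    area: str  # string area code, e.g. "DE_TENNET"
    ok: bool
    error: Optional[str] = None


def run_zones(
    areas: Iterable[str],
    fetch: Callable[[str], None],
    on_zone_failure: str = "raise",
) -> List[ZoneOutcome]:
    """Run `fetch` once per zone; in collect mode, record failures instead of raising."""
    # Validate arguments before the per-zone loop so bad input raises in both modes.
    if on_zone_failure not in ("raise", "collect"):
        raise ValueError(f"unknown on_zone_failure: {on_zone_failure!r}")
    results: List[ZoneOutcome] = []
    for area in areas:
        try:
            fetch(area)
        except Exception as exc:
            if on_zone_failure == "raise":
                raise
            # Collect mode: no retry, no substitution; just report what happened.
            results.append(ZoneOutcome(area, ok=False, error=str(exc)))
        else:
            results.append(ZoneOutcome(area, ok=True))
    return results
```

The caller receives one outcome per zone and decides on any fallback itself, which keeps policy on the operator side.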
Add SnapshotStore (frozen dataclass) and parse_snapshot_timestamp to spotforecast2_safe.utils.snapshot_store. The store is a pure mechanism: write/newest_valid/restore/prune/seed_from_file/age_of — it knows nothing about interim files, zones, downloads, or substitution policy, which remains operator-side per the KB 2026-06-11 ADR (safe-full-split / no-reexport-shims invariant). Generalized from bart26k-lecture/scripts/_team4_resilience.py; all domain knowledge stripped.

Key mechanics ported faithfully:
- Atomic tmp+os.replace writes (crash-safe)
- pd.Timestamp(stem, tz="UTC") parsing (ISO-8601-basic, filename-safe)
- mtime-as-UTC seeding in seed_from_file
- TTL-scoped newest_valid + prune (also clears stale .tmp)

30 new tests (tests/test_snapshot_store.py): atomicity with simulated os.replace crash, TTL boundary, restore round-trip, seed_from_file mtime/skip/dedup, age_of, unparseable-filename warnings.

Docs: registered SnapshotStore and parse_snapshot_timestamp in the existing Utils sidebar section and quartodoc Utils section of _quarto.yml; two .qmd stubs generated by quartodoc build.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
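The atomic tmp+os.replace write can be sketched as follows. This is an illustration under assumptions, not the SnapshotStore implementation: `atomic_write_snapshot` and its parameters are hypothetical, and unlike the store described above (which leaves stale .tmp files for prune to clear) this sketch removes its temporary file on failure.

```python
import os
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def atomic_write_snapshot(root: Path, payload: str, now: datetime) -> Path:
    """Write `payload` to <root>/<UTC stamp>.csv via a temp file and os.replace."""
    if now.tzinfo is None:
        raise ValueError("now must be timezone-aware")
    # Filename-safe ISO-8601-basic stamp with second precision.
    stamp = now.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    final = root / f"{stamp}.csv"
    # The temp file lives in the same directory so os.replace stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=root, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            handle.write(payload)
        # Readers see either the old file or the complete new one, never a partial write.
        os.replace(tmp_name, final)
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
    return final
```

Because the final name only appears through `os.replace`, a crash before that call leaves at most a `.tmp` file behind and never a truncated snapshot.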
…guard
Upstreams the team-4 operational guards from scripts/team4_4zones_submit.py as parameterised, raising library functions. Thresholds and policy (Abort exit codes, mode-specific forbidden sets, reference-selection fallback chain) remain in the operator layer.

New modules
-----------
- spotforecast2_safe/preprocessing/coverage.py
  assert_frontier_fresh, assert_actual_lag_within, assert_no_interior_gaps (guards 1-3 of assert_coverage; all raise CoverageError), last_complete_hour (pure frontier-completeness helper; raises ValueError on empty/all-NaN input). Value-sanity guard (guard 5 / intra-hour range / step / deviation) is already apply_target_corruption_policy — not duplicated.
- spotforecast2_safe/processing/shape_check.py
  ShapeCheckReport (frozen dataclass) + check_forecast_shape (pure Pearson-corr + range-ratio measurement; zero-range guard -> NaN ratio; raises only on invalid inputs).
- spotforecast2_safe/multitask/guards.py
  assert_no_leakage (checks training frame, selected exog names, fitted feature_name_; raises LeakageError per surface; unreadable fitted features raise RuntimeError — verifiability invariant).

New exceptions
--------------
- CoverageError (RuntimeError subclass) in exceptions.py
- LeakageError (RuntimeError subclass) in exceptions.py

Semantic choices
----------------
- Boundary operators: strict < throughout (== boundary passes) — faithful to the script's comparison operators.
- Unreadable fitted features: raise RuntimeError rather than warn-and-skip — an unreadable feature list violates verifiability, not just leakage.
- last_complete_hour uses modal index diff for cadence inference (matches the script's .diff().mode().iloc[0] idiom).

Tests: 56 new assertions across three test files; full suite 2462 passed.

Docs: quartodoc QMDs generated for all 9 new public symbols; _quarto.yml sidebar and contents sections updated under Preprocessing, Processing, Multitask, and Exceptions.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
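The "pure measurement" character of the shape check (Pearson correlation plus a range ratio, with a zero-range guard yielding NaN) might look roughly like this in plain Python. `ShapeReport` and `measure_shape` are hypothetical names, and taking the reference range as the denominator is an assumption; the actual `check_forecast_shape` signature and fields are not shown in this PR.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class ShapeReport:  # hypothetical stand-in for ShapeCheckReport
    corr: float
    range_ratio: float


def measure_shape(forecast: Sequence[float], reference: Sequence[float]) -> ShapeReport:
    """Measure shape agreement; raise only on invalid inputs, never on 'bad' shapes."""
    if len(forecast) != len(reference) or len(forecast) < 2:
        raise ValueError("need two equal-length series with at least two points")
    n = len(forecast)
    mean_f = sum(forecast) / n
    mean_r = sum(reference) / n
    cov = sum((f - mean_f) * (r - mean_r) for f, r in zip(forecast, reference))
    var_f = sum((f - mean_f) ** 2 for f in forecast)
    var_r = sum((r - mean_r) ** 2 for r in reference)
    denom = math.sqrt(var_f * var_r)
    # A constant series has no defined correlation: report NaN instead of raising.
    corr = cov / denom if denom > 0 else math.nan
    ref_range = max(reference) - min(reference)
    # Zero-range guard: a flat reference yields a NaN ratio, not a ZeroDivisionError.
    ratio = (max(forecast) - min(forecast)) / ref_range if ref_range > 0 else math.nan
    return ShapeReport(corr=corr, range_ratio=ratio)
```

The function only reports numbers; whether a given correlation or ratio is acceptable is a threshold decision left to the caller.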
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
FIX 1 (tests): pin exact-boundary TTL semantics — three new tests confirm
that a snapshot/file stamped exactly NOW-TTL is treated as valid across
newest_valid, prune, and seed_from_file.
FIX 2 (write docstring): document second-precision truncation — timestamps
use %Y%m%dT%H%M%SZ; sub-second components are silently truncated; two
writes within the same UTC second overwrite via last-writer-wins os.replace.
FIX 3 (prune Returns): clarify "Number of files deleted (expired .csv
snapshots plus any stale .tmp files)."
FIX 4 (__post_init__ validation): SnapshotStore now raises ValueError for
non-positive ttl or non-absolute root; two tests added.
FIX 5 (dead guard): remove redundant `p.name.endswith(".tmp")` check in
newest_valid and seed_from_file — suffix != ".csv" already excludes .tmp
files (.tmp has suffix .tmp, not .csv).
FIX 6 (SPDX header): copyright year updated to 2024-2026 in both files.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
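The second-precision stamps (FIX 2) and the inclusive TTL boundary (FIX 1) can be illustrated with a short sketch. The helper names are hypothetical, and `datetime.strptime` is used here as a standard-library stand-in for the `pd.Timestamp(stem, tz="UTC")` parsing that the store itself uses.

```python
from datetime import datetime, timedelta, timezone

STAMP_FORMAT = "%Y%m%dT%H%M%SZ"  # format named in FIX 2


def to_stem(ts: datetime) -> str:
    """Format a tz-aware timestamp as a filename stem; sub-seconds are dropped."""
    return ts.astimezone(timezone.utc).strftime(STAMP_FORMAT)


def from_stem(stem: str) -> datetime:
    """Parse a stem back into a UTC timestamp (stdlib stand-in for pd.Timestamp)."""
    return datetime.strptime(stem, STAMP_FORMAT).replace(tzinfo=timezone.utc)


def is_valid(stem: str, now: datetime, ttl: timedelta) -> bool:
    """A snapshot stamped exactly NOW-TTL still counts as valid (inclusive boundary)."""
    return now - from_stem(stem) <= ttl
```

Two timestamps that differ only in their microseconds map to the same stem, which is why two writes within the same UTC second target the same file.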
FIX 1 (_quarto.yml): add ZoneResult and build_zone_qc_frame to both the website sidebar Downloader section and the quartodoc sections list, alongside download_zone_loads entries.
FIX 2 (downloader/__init__.py): remove ZoneResult and build_zone_qc_frame re-exports, restoring __init__ to develop convention (download_new_data and merge_build_manual only); full-path imports via spotforecast2_safe.downloader.entsoe are the project standard.
FIX 3 (entsoe.py): validate start/end before the mode branch so a bogus format raises in both raise and collect mode; update noqa comment to state the catch is limited to download-layer failures after argument validation.
FIX 4 (entsoe.py): resolve interim_path fresh via get_data_home() after each successful zone download and assert the file exists before recording ok=True; record ok=False with a descriptive error if missing.
FIX 5 (entsoe.py build_zone_qc_frame): replace redundant `all_have_forecast and len(forecasts) == n` guard with the reference-faithful `len(forecasts) == n`; remove unused flag.
FIX 6 (entsoe.py build_zone_qc_frame): annotate `data_home: Optional[Union[Path, str]] = None`.
FIX 7 (entsoe.py download_zone_loads): merge the two separate raise/collect per-zone for-loops into one; query closure is defined once per zone before the mode branch.
FIX 8 (docs/reference/*.qmd): add missing trailing newlines to ZoneResult.qmd, build_zone_qc_frame.qmd, download_zone_loads.qmd.
FIX 9 (tests): add test_build_zone_qc_frame_explicit_data_home_no_env_var which calls build_zone_qc_frame(data_home=<explicit path>) without SPOTFORECAST2_DATA being set.
FIX 10: no conftest.py exists; _FakeClient/_FailingClient left in place.
FIX 11 (entsoe.py ZoneResult): reword `area` docstring to say it is the string area code (e.g. "DE_TENNET"), not an entsoe-py Area enum instance.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The collect-mode branch's _quarto.yml accidentally carried the Utils registrations of the parallel snapshot-store branch, whose module does not exist here, breaking quartodoc build. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
FIX 1 (blocker): guard IndexError in last_complete_hour when actual has only one non-NaN row — mode() of a single diff is empty; now raises ValueError with a clear message. Add test.
FIX 2 (blocker): remove stale "raises on the first surface" sentence from assert_no_leakage docstring; implementation collects all violations and raises once.
FIX 3 (should-fix): replace Sphinx-style tilde cross-references with plain Markdown backticks in coverage.py (module docstring + 3 function docstrings) and multitask/guards.py (function docstring). Only branch-modified files touched; pre-existing refs in model.py and target_corruption.py left unchanged.
FIX 4 (should-fix): revert out-of-scope cosmetic reformats from config_multi.py and test_config_multi.py to avoid merge conflicts with an in-flight branch.
FIX 5 (nit): add invariant note to ShapeCheckReport.skipped property docstring: check_forecast_shape always stores n_overlap=0 when skipping; manually-constructed reports with n_overlap > 0 and NaN metrics report skipped=False.
FIX 6 (nit): tighten test assertion in test_multitask_guards.py from surfaces_mentioned >= 2 to == 3 (fixture violates all three surfaces).
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
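The cadence inference behind FIX 1 (modal difference of the index, with an explicit error when there is nothing to difference) can be sketched in plain Python. `infer_cadence` is a hypothetical helper, not the library's `last_complete_hour`; ties are resolved toward the smallest difference, on the assumption that this mirrors the sorted result of the pandas `.mode().iloc[0]` idiom.

```python
from collections import Counter
from datetime import datetime, timedelta
from typing import Sequence


def infer_cadence(index: Sequence[datetime]) -> timedelta:
    """Return the most common step between consecutive timestamps."""
    if len(index) < 2:
        # With fewer than two rows there is no difference to take a mode of.
        raise ValueError("need at least two timestamps to infer a cadence")
    diffs = [later - earlier for earlier, later in zip(index, index[1:])]
    counts = Counter(diffs)
    top = max(counts.values())
    # Among equally frequent steps, pick the smallest one.
    return min(step for step, count in counts.items() if count == top)
```

Raising a `ValueError` with a clear message replaces the opaque `IndexError` that indexing into an empty mode result would otherwise produce.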
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…chool-holiday-bundesland feat(calendar): per-Bundesland school-holiday features (roadmap #6)
…one-collect-mode feat(downloader): per-zone collect mode for download_zone_loads + bottom-up zone QC frame
…napshot-store-2 feat(utils): generic TTL'd atomic snapshot store
…perational-guards feat(preprocessing): coverage guards, forecast shape report, leakage guard
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
## [22.1.0-rc.2](v22.1.0-rc.1...v22.1.0-rc.2) (2026-06-12)

### Features

* **downloader:** per-zone collect mode for download_zone_loads + bottom-up zone QC frame ([48b96a7](48b96a7)), closes [#363](#363)
* **preprocessing:** coverage guards, forecast shape report, leakage guard ([34a7a0c](34a7a0c))
* **utils:** generic TTL'd atomic snapshot store ([cfeeda4](cfeeda4))

### Bug Fixes

* **docs:** drop cross-branch snapshot-store entries from _quarto.yml ([de7f3ca](de7f3ca))
* **downloader:** apply code-review fixes to zone-collect-mode ([f40cf41](f40cf41))
* **preprocessing,guards:** six surgical code-review fixes ([1d7d0ab](1d7d0ab))
* **utils:** code-review fixes for SnapshotStore ([9e2710a](9e2710a))

### Documentation

* **downloader:** regenerate reference pages after docstring fixes ([6e22a12](6e22a12))
* **utils:** regenerate reference pages after docstring fixes ([2d7f2d6](2d7f2d6))
Promotes develop → main for the 22.1.0 release:
- School-holiday features behind the `include_school_holiday_features` flag
- Per-zone collect mode for `download_zone_loads` (closes #363, "downloader: optional structured per-zone result for download_zone_loads (on_zone_failure="collect")") + `build_zone_qc_frame`
- Generic TTL'd atomic `SnapshotStore`
- Coverage guards (+ `CoverageError`), pure forecast shape report, leakage guard (+ `LeakageError`)

All mechanism, no policy: substitution/fallback decisions stay operator-side per the fail-safe boundary. Union verified locally: 2545 tests, ruff, REUSE, full quarto render green; freeze outputs tracked.
🤖 Generated with Claude Code