From 260a08432177181f8cec78a7d1a679379888a16f Mon Sep 17 00:00:00 2001
From: bartzbeielstein <32470350+bartzbeielstein@users.noreply.github.com>
Date: Sat, 13 Jun 2026 18:52:29 +0200
Subject: [PATCH 1/2] feat(weather): per-zone weather for the ENTSO-E four-zone pipeline

When `ConfigMulti.per_zone_weather=True`, each of the four German TSO
control zones (load_50hertz/load_amprion/load_tennet/load_transnetbw) is
driven by weather from its own region instead of one shared national
index. Calendar, holiday and event-window features stay shared (they are
national); only weather becomes regional, under the same un-prefixed
column names so no estimator, factory or column matcher downstream
changes. Opt-in, default OFF -> byte-identical to the prior
shared-weather baseline.

Mechanism: the existing single global weather build stays as the shared
schema/baseline; build_exogenous_features additionally fetches one
weather frame per zone (reusing the multi-city get_weather_features path
with each zone's city list) into self.zone_weather_aligned. At the single
per-target seam (get_target_data, new keyword-only `zone_weather`) the
zone's weather columns overwrite the shared values in-place; column order
and shape are preserved. The per-zone frame keeps its native
[start, cov_end] index so it spans the forecast horizon
(regression-tested). On all-zones-success the shared schema is seeded
from the first zone, so per-zone weather survives a global fetch that
failed under on_weather_failure="skip".

- weather/locations.py: registry 13->15 (Mannheim, Karlsruhe for
  TransnetBW); frozen GERMAN_TSO_ZONE_CITIES partition (provenance: SMARD
  control-area map + TSO wiki; Hamburg in 50Hertz, Bremen in TenneT);
  locations_for_zone resolver (fail-safe ValueError on unknown zone).
  Pre-existing Sphinx roles swept.
- configurator: per_zone_weather + zone_weather_locations fields;
  validate_config guards reject combination with
  use_population_weighted_weather (mutual exclusivity),
  use_exogenous_features=False, and poly_features_degree>=2.
- Fail-safe: under "skip" any zone fetch failure degrades the whole
  pipeline to no-weather (never substitutes the global index); under
  "raise" aborts naming the zone.
- tasks/task_safe_zone_load_demo.py: --per_zone_weather flag (default
  off, offline-safe via on_weather_failure="skip").
- 36 network-free tests incl. mapping/partition, config guards, the seam
  overwrite, forecast-horizon coverage, skip-degradation, and
  non-zone-target fail-safe.

quartodoc reference + freeze regenerated.

Co-Authored-By: Claude Opus 4.8 (1M context)
---
 .../execute-results/html.json                 |   4 +-
 .../execute-results/html.json                 |   4 +-
 .../execute-results/html.json                 |   4 +-
 .../execute-results/html.json                 |   4 +-
 .../execute-results/html.json                 |   6 +-
 .../execute-results/html.json                 |   6 +-
 .../execute-results/html.json                 |   6 +-
 ...onfigurator.config_entsoe.ConfigEntsoe.qmd |   2 +
 .../configurator.config_multi.ConfigMulti.qmd | 120 ++--
 .../manager.features.get_target_data.qmd      |  22 +-
 docs/reference/multitask.base.BaseTask.qmd    |  15 +-
 .../tasks.task_safe_zone_load_demo.qmd        |  28 +-
 ...her.locations.default_german_locations.qmd |   8 +-
 docs/reference/weather.locations.weights.qmd  |  10 +-
 .../configurator/_base_config.py              |  27 +
 .../configurator/config_multi.py              |  32 +
 src/spotforecast2_safe/manager/features.py    |  21 +
 src/spotforecast2_safe/multitask/base.py      |  93 +++
 .../tasks/task_safe_zone_load_demo.py         |  63 +-
 src/spotforecast2_safe/weather/locations.py   | 112 +++-
 tests/test_per_zone_weather.py                | 609 ++++++++++++++++++
 tests/test_weather_locations.py               |   2 +-
 22 files changed, 1069 insertions(+), 129 deletions(-)
 create mode 100644 tests/test_per_zone_weather.py

diff --git a/_freeze/docs/reference/configurator.config_entsoe.ConfigEntsoe/execute-results/html.json
b/_freeze/docs/reference/configurator.config_entsoe.ConfigEntsoe/execute-results/html.json index f9787527..cf33de45 100644 --- a/_freeze/docs/reference/configurator.config_entsoe.ConfigEntsoe/execute-results/html.json +++ b/_freeze/docs/reference/configurator.config_entsoe.ConfigEntsoe/execute-results/html.json @@ -1,8 +1,8 @@ { - "hash": "df358d89e8796124284183fc6efecd0c", + "hash": "c2f76276942f6735a97f7843afa6912d", "result": { "engine": "jupyter", - "markdown": "---\ntitle: configurator.config_entsoe.ConfigEntsoe\n---\n\n\n\n```python\nconfigurator.config_entsoe.ConfigEntsoe(\n country_code='DE',\n periods=default_periods(),\n lags_consider=(lambda: list(range(1, 24)))(),\n train_size=(lambda: pd.Timedelta(days=(3 * 365)))(),\n end_train_default='2025-12-31 00:00+00:00',\n delta_val=(lambda: pd.Timedelta(hours=(24 * 7 * 10)))(),\n predict_size=24,\n cv_block_size=None,\n refit_size=7,\n random_state=314159,\n n_hyperparameters_trials=20,\n data_filename='interim/energy_load.csv',\n targets=None,\n use_outlier_detection=True,\n contamination=0.01,\n imputation_method='weighted',\n window_size=72,\n imputation_window_size=None,\n use_exogenous_features=True,\n latitude=51.5136,\n longitude=7.4653,\n timezone='UTC',\n state='NW',\n include_weather_windows=False,\n include_holiday_features=False,\n include_holiday_adjacency_features=False,\n use_population_weighted_weather=False,\n include_degree_hours=False,\n include_apparent_temperature=False,\n degree_hours_base_heating=15.0,\n degree_hours_base_cooling=22.0,\n include_ephemeris_features=False,\n include_day_type_features=False,\n include_school_holiday_features=False,\n poly_features_degree=1,\n max_poly_features=10,\n poly_mi_n_jobs=-1,\n poly_mi_sample_size=4000,\n include_covid_infection_rate=False,\n include_entsoe_forecast_load=False,\n include_entsoe_renewable_forecast=False,\n include_entsoe_net_load=False,\n include_entsoe_day_ahead_price=False,\n include_football_match_window=False,\n 
include_energy_saving_window=False,\n index_name='Time (UTC)',\n bounds=None,\n verbose=False,\n cache_home=None,\n n_trials_optuna=15,\n n_trials_spotoptim=10,\n n_initial_spotoptim=5,\n max_time_spotoptim=None,\n warm_start_lags=(lambda: list(DEFAULT_WARM_START_LAGS))(),\n task='lazy',\n agg_weights=None,\n forecaster_factory=None,\n data_loader=None,\n test_data_loader=None,\n auto_save_models=True,\n data_frame_name='default',\n number_folds=10,\n on_weather_failure='raise',\n on_exog_provider_failure='raise',\n exog_max_gap_hours=0,\n exog_max_tail_gap_hours=0,\n exog_provider_window='full',\n target_qc_range_mw=None,\n target_qc_step_mw=None,\n target_qc_window_days=None,\n target_corruption_policy='abort',\n target_max_heal_hours=0,\n target_anchor_zone_hours=168,\n target_qc_deviation_mw=None,\n target_qc_deviation_ref=None,\n target_qc_deviation_slots=2,\n retrain_max_age=(lambda: pd.Timedelta(days=7))(),\n)\n```\n\nConfiguration for the ENTSO-E forecasting pipeline.\n\nSingle-target counterpart to `ConfigMulti`, used by the ENTSO-E CLI\n(``spotforecast2.tasks.task_entsoe``) and any single-target pipeline routed\nthrough ``spotforecast2.multitask.runner.run(config_cls=ConfigEntsoe)``.\n\n``ConfigEntsoe`` **inherits every field and method of `ConfigMulti`** — so any\nfeature flag added to ``ConfigMulti`` is available here automatically (this\nis what closes the historical feature-flag parity gap structurally, rather\nthan via a hand-maintained mirror). 
It differs from ``ConfigMulti`` in\nexactly two ways:\n\n- ``index_name`` defaults to ``\"Time (UTC)\"`` (the ENTSO-E CSV time column)\n instead of ``\"DateTime\"``.\n- it adds ``retrain_max_age`` — the maximum age of a previously trained model\n before retraining is required (consumed by\n `spotforecast2_safe.manager.trainer.should_retrain`).\n\nSee `ConfigMulti` for the full field reference (training/validation windows,\nfeature toggles, exogenous-provider flags, target-corruption knobs, …).\n\n## Parameters {.doc-section .doc-section-parameters}\n\n| Name | Type | Description | Default |\n|-----------------|------------------------------------------------|--------------------------------------------------------------------------------------------------|------------------------------------|\n| index_name | [str](`str`) | Datetime column name used when resetting the index. Defaults to ``\"Time (UTC)\"``. | `'Time (UTC)'` |\n| retrain_max_age | [pd](`pandas`).[Timedelta](`pandas.Timedelta`) | Maximum age of a trained model before a retrain is forced. Defaults to ``pd.Timedelta(days=7)``. | `(lambda: pd.Timedelta(days=7))()` |\n\n## Examples {.doc-section .doc-section-examples}\n\n\n::: {#fdcc96fe .cell execution_count=1}\n``` {.python .cell-code}\nimport pandas as pd\nfrom spotforecast2_safe.configurator.config_entsoe import ConfigEntsoe\nfrom spotforecast2_safe.configurator.config_multi import ConfigMulti\n\nconfig = ConfigEntsoe(country_code=\"DE\")\n# ENTSO-E-specific defaults:\nprint(\"index_name:\", config.index_name)\nprint(\"retrain_max_age:\", config.retrain_max_age)\nassert config.index_name == \"Time (UTC)\"\nassert config.retrain_max_age == pd.Timedelta(days=7)\n\n# Inherits the full ConfigMulti surface, incl. 
the opt-in feature flags:\nassert isinstance(config, ConfigMulti)\nconfig = ConfigEntsoe(\n include_ephemeris_features=True,\n include_day_type_features=True,\n include_degree_hours=True,\n)\nprint(\"ephemeris:\", config.include_ephemeris_features)\nprint(\"predict_size:\", config.predict_size)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nindex_name: Time (UTC)\nretrain_max_age: 7 days 00:00:00\nephemeris: True\npredict_size: 24\n```\n:::\n:::\n\n\n", + "markdown": "---\ntitle: configurator.config_entsoe.ConfigEntsoe\n---\n\n\n\n```python\nconfigurator.config_entsoe.ConfigEntsoe(\n country_code='DE',\n periods=default_periods(),\n lags_consider=(lambda: list(range(1, 24)))(),\n train_size=(lambda: pd.Timedelta(days=(3 * 365)))(),\n end_train_default='2025-12-31 00:00+00:00',\n delta_val=(lambda: pd.Timedelta(hours=(24 * 7 * 10)))(),\n predict_size=24,\n cv_block_size=None,\n refit_size=7,\n random_state=314159,\n n_hyperparameters_trials=20,\n data_filename='interim/energy_load.csv',\n targets=None,\n use_outlier_detection=True,\n contamination=0.01,\n imputation_method='weighted',\n window_size=72,\n imputation_window_size=None,\n use_exogenous_features=True,\n latitude=51.5136,\n longitude=7.4653,\n timezone='UTC',\n state='NW',\n include_weather_windows=False,\n include_holiday_features=False,\n include_holiday_adjacency_features=False,\n use_population_weighted_weather=False,\n per_zone_weather=False,\n zone_weather_locations=None,\n include_degree_hours=False,\n include_apparent_temperature=False,\n degree_hours_base_heating=15.0,\n degree_hours_base_cooling=22.0,\n include_ephemeris_features=False,\n include_day_type_features=False,\n include_school_holiday_features=False,\n poly_features_degree=1,\n max_poly_features=10,\n poly_mi_n_jobs=-1,\n poly_mi_sample_size=4000,\n include_covid_infection_rate=False,\n include_entsoe_forecast_load=False,\n include_entsoe_renewable_forecast=False,\n include_entsoe_net_load=False,\n 
include_entsoe_day_ahead_price=False,\n include_football_match_window=False,\n include_energy_saving_window=False,\n index_name='Time (UTC)',\n bounds=None,\n verbose=False,\n cache_home=None,\n n_trials_optuna=15,\n n_trials_spotoptim=10,\n n_initial_spotoptim=5,\n max_time_spotoptim=None,\n warm_start_lags=(lambda: list(DEFAULT_WARM_START_LAGS))(),\n task='lazy',\n agg_weights=None,\n forecaster_factory=None,\n data_loader=None,\n test_data_loader=None,\n auto_save_models=True,\n data_frame_name='default',\n number_folds=10,\n on_weather_failure='raise',\n on_exog_provider_failure='raise',\n exog_max_gap_hours=0,\n exog_max_tail_gap_hours=0,\n exog_provider_window='full',\n target_qc_range_mw=None,\n target_qc_step_mw=None,\n target_qc_window_days=None,\n target_corruption_policy='abort',\n target_max_heal_hours=0,\n target_anchor_zone_hours=168,\n target_qc_deviation_mw=None,\n target_qc_deviation_ref=None,\n target_qc_deviation_slots=2,\n retrain_max_age=(lambda: pd.Timedelta(days=7))(),\n)\n```\n\nConfiguration for the ENTSO-E forecasting pipeline.\n\nSingle-target counterpart to `ConfigMulti`, used by the ENTSO-E CLI\n(``spotforecast2.tasks.task_entsoe``) and any single-target pipeline routed\nthrough ``spotforecast2.multitask.runner.run(config_cls=ConfigEntsoe)``.\n\n``ConfigEntsoe`` **inherits every field and method of `ConfigMulti`** — so any\nfeature flag added to ``ConfigMulti`` is available here automatically (this\nis what closes the historical feature-flag parity gap structurally, rather\nthan via a hand-maintained mirror). 
It differs from ``ConfigMulti`` in\nexactly two ways:\n\n- ``index_name`` defaults to ``\"Time (UTC)\"`` (the ENTSO-E CSV time column)\n instead of ``\"DateTime\"``.\n- it adds ``retrain_max_age`` — the maximum age of a previously trained model\n before retraining is required (consumed by\n `spotforecast2_safe.manager.trainer.should_retrain`).\n\nSee `ConfigMulti` for the full field reference (training/validation windows,\nfeature toggles, exogenous-provider flags, target-corruption knobs, …).\n\n## Parameters {.doc-section .doc-section-parameters}\n\n| Name | Type | Description | Default |\n|-----------------|------------------------------------------------|--------------------------------------------------------------------------------------------------|------------------------------------|\n| index_name | [str](`str`) | Datetime column name used when resetting the index. Defaults to ``\"Time (UTC)\"``. | `'Time (UTC)'` |\n| retrain_max_age | [pd](`pandas`).[Timedelta](`pandas.Timedelta`) | Maximum age of a trained model before a retrain is forced. Defaults to ``pd.Timedelta(days=7)``. | `(lambda: pd.Timedelta(days=7))()` |\n\n## Examples {.doc-section .doc-section-examples}\n\n\n::: {#2f34fa00 .cell execution_count=1}\n``` {.python .cell-code}\nimport pandas as pd\nfrom spotforecast2_safe.configurator.config_entsoe import ConfigEntsoe\nfrom spotforecast2_safe.configurator.config_multi import ConfigMulti\n\nconfig = ConfigEntsoe(country_code=\"DE\")\n# ENTSO-E-specific defaults:\nprint(\"index_name:\", config.index_name)\nprint(\"retrain_max_age:\", config.retrain_max_age)\nassert config.index_name == \"Time (UTC)\"\nassert config.retrain_max_age == pd.Timedelta(days=7)\n\n# Inherits the full ConfigMulti surface, incl. 
the opt-in feature flags:\nassert isinstance(config, ConfigMulti)\nconfig = ConfigEntsoe(\n include_ephemeris_features=True,\n include_day_type_features=True,\n include_degree_hours=True,\n)\nprint(\"ephemeris:\", config.include_ephemeris_features)\nprint(\"predict_size:\", config.predict_size)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nindex_name: Time (UTC)\nretrain_max_age: 7 days 00:00:00\nephemeris: True\npredict_size: 24\n```\n:::\n:::\n\n\n", "supporting": [ "configurator.config_entsoe.ConfigEntsoe_files/figure-html" ], diff --git a/_freeze/docs/reference/configurator.config_multi.ConfigMulti/execute-results/html.json b/_freeze/docs/reference/configurator.config_multi.ConfigMulti/execute-results/html.json index 264b4cee..d0e0c504 100644 --- a/_freeze/docs/reference/configurator.config_multi.ConfigMulti/execute-results/html.json +++ b/_freeze/docs/reference/configurator.config_multi.ConfigMulti/execute-results/html.json @@ -1,8 +1,8 @@ { - "hash": "3c90dcaa58925711e795a60681afd75a", + "hash": "1fcf9364d3e4a4f558b3d8a2170def6a", "result": { "engine": "jupyter", - "markdown": "---\ntitle: configurator.config_multi.ConfigMulti\n---\n\n\n\n```python\nconfigurator.config_multi.ConfigMulti(\n country_code='DE',\n periods=default_periods(),\n lags_consider=(lambda: list(range(1, 24)))(),\n train_size=(lambda: pd.Timedelta(days=(3 * 365)))(),\n end_train_default='2025-12-31 00:00+00:00',\n delta_val=(lambda: pd.Timedelta(hours=(24 * 7 * 10)))(),\n predict_size=24,\n cv_block_size=None,\n refit_size=7,\n random_state=314159,\n n_hyperparameters_trials=20,\n data_filename='interim/energy_load.csv',\n targets=None,\n use_outlier_detection=True,\n contamination=0.01,\n imputation_method='weighted',\n window_size=72,\n imputation_window_size=None,\n use_exogenous_features=True,\n latitude=51.5136,\n longitude=7.4653,\n timezone='UTC',\n state='NW',\n include_weather_windows=False,\n include_holiday_features=False,\n include_holiday_adjacency_features=False,\n 
use_population_weighted_weather=False,\n include_degree_hours=False,\n include_apparent_temperature=False,\n degree_hours_base_heating=15.0,\n degree_hours_base_cooling=22.0,\n include_ephemeris_features=False,\n include_day_type_features=False,\n include_school_holiday_features=False,\n poly_features_degree=1,\n max_poly_features=10,\n poly_mi_n_jobs=-1,\n poly_mi_sample_size=4000,\n include_covid_infection_rate=False,\n include_entsoe_forecast_load=False,\n include_entsoe_renewable_forecast=False,\n include_entsoe_net_load=False,\n include_entsoe_day_ahead_price=False,\n include_football_match_window=False,\n include_energy_saving_window=False,\n index_name='DateTime',\n bounds=None,\n verbose=False,\n cache_home=None,\n n_trials_optuna=15,\n n_trials_spotoptim=10,\n n_initial_spotoptim=5,\n max_time_spotoptim=None,\n warm_start_lags=(lambda: list(DEFAULT_WARM_START_LAGS))(),\n task='lazy',\n agg_weights=None,\n forecaster_factory=None,\n data_loader=None,\n test_data_loader=None,\n auto_save_models=True,\n data_frame_name='default',\n number_folds=10,\n on_weather_failure='raise',\n on_exog_provider_failure='raise',\n exog_max_gap_hours=0,\n exog_max_tail_gap_hours=0,\n exog_provider_window='full',\n target_qc_range_mw=None,\n target_qc_step_mw=None,\n target_qc_window_days=None,\n target_corruption_policy='abort',\n target_max_heal_hours=0,\n target_anchor_zone_hours=168,\n target_qc_deviation_mw=None,\n target_qc_deviation_ref=None,\n target_qc_deviation_slots=2,\n)\n```\n\nConfiguration for the multi-input forecasting pipeline.\n\nThis class manages all configuration parameters for the multi-input task,\nincluding training/prediction intervals, data sources, and feature\nengineering specifications. 
All parameters can be customized during\ninitialization or used with sensible defaults.\n\n``country_code`` serves as the single ISO country code used for both\nAPI queries and holiday feature generation.\n\n## Parameters {.doc-section .doc-section-parameters}\n\n| Name | Type | Description | Default |\n|------------------------------------|------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------|\n| country_code | [str](`str`) | ISO 3166-1 alpha-2 country code (e.g. ``\"DE\"``). Used for both API queries and holiday feature generation. | `'DE'` |\n| periods | [Optional](`typing.Optional`)\\[[List](`typing.List`)\\[[Period](`spotforecast2_safe.data.Period`)\\]\\] | List of Period objects defining cyclical feature encodings. | `default_periods()` |\n| lags_consider | [Optional](`typing.Optional`)\\[[List](`typing.List`)\\[[int](`int`)\\]\\] | List of lag values to consider for feature selection. | `(lambda: list(range(1, 24)))()` |\n| train_size | [Optional](`typing.Optional`)\\[[pd](`pandas`).[Timedelta](`pandas.Timedelta`)\\] | Time window for training data. | `(lambda: pd.Timedelta(days=(3 * 365)))()` |\n| end_train_default | [str](`str`) | Default end date for training period (ISO format with timezone). 
| `'2025-12-31 00:00+00:00'` |\n| delta_val | [Optional](`typing.Optional`)\\[[pd](`pandas`).[Timedelta](`pandas.Timedelta`)\\] | Validation window size. | `(lambda: pd.Timedelta(hours=(24 * 7 * 10)))()` |\n| predict_size | [int](`int`) | Number of hours to predict ahead. | `24` |\n| cv_block_size | [int](`int`) \\| None | Cross-validation test-block width in hours. Defaults to ``None``, meaning the CV uses ``predict_size``. Set to a fixed value (e.g. ``24``) to decouple the cross-validation horizon from a render-dependent live ``predict_size``. | `None` |\n| refit_size | [int](`int`) | Number of days between model refits. | `7` |\n| random_state | [int](`int`) | Random seed for reproducibility. | `314159` |\n| n_hyperparameters_trials | [int](`int`) | Number of trials for hyperparameter optimization. | `20` |\n| data_filename | [str](`str`) | Path to the interim merged data file. | `'interim/energy_load.csv'` |\n| targets | [Optional](`typing.Optional`)\\[[List](`typing.List`)\\[[str](`str`)\\]\\] | List of target column names to train models for. When ``None`` (default), no targets are pre-selected; set this attribute after loading the dataset (e.g. ``config.targets = df.columns.tolist()``). Replaces standalone ``TARGETS`` and ``target_columns`` variables in pipeline scripts, providing a single source of truth for the active target set. | `None` |\n| use_outlier_detection | [bool](`bool`) | If True, apply IsolationForest-based outlier removal. | `True` |\n| contamination | [float](`float`) | Proportion of outliers for IsolationForest (0 < contamination < 0.5). | `0.01` |\n| imputation_method | [str](`str`) | Gap-filling strategy — ``\"weighted\"`` (n2n-style rolling weights) or ``\"linear\"`` (linear interpolation). | `'weighted'` |\n| window_size | [int](`int`) | Rolling window size in hours for gap detection (weighted imputation). | `72` |\n| use_exogenous_features | [bool](`bool`) | If True, build weather/calendar/day-night/holiday features. 
| `True` |\n| latitude | [float](`float`) | Latitude of the target location in decimal degrees. | `51.5136` |\n| longitude | [float](`float`) | Longitude of the target location in decimal degrees. | `7.4653` |\n| timezone | [str](`str`) | IANA timezone string for the target location (e.g. ``\"Europe/Berlin\"``). | `'UTC'` |\n| state | [str](`str`) | ISO 3166-2 subdivision code for regional holidays (e.g. ``\"NW\"``). | `'NW'` |\n| include_weather_windows | [bool](`bool`) | If True, include rolling weather-window features. | `False` |\n| include_holiday_features | [bool](`bool`) | If True, include public-holiday indicator features. | `False` |\n| include_holiday_adjacency_features | [bool](`bool`) | If True, include Brückentag and before/after-holiday indicators (``is_brueckentag``, ``is_before_holiday``, ``is_after_holiday``). Defaults to ``False``. | `False` |\n| include_ephemeris_features | [bool](`bool`) | If True, include solar-elevation and daylight-duration features. Defaults to ``False``. | `False` |\n| include_day_type_features | [bool](`bool`) | If True, include working-day and day-type class features (``is_workday``, ``day_type``). Defaults to ``False``. | `False` |\n| include_school_holiday_features | [bool](`bool`) | Append the ``is_school_holiday`` binary indicator from the bundled OpenHolidays API dataset (ODbL-1.0). Coverage 2022-01-01 to 2027-12-31 for all 16 German Bundesländer. Only ``country_code=\"DE\"`` is supported. Defaults to ``False``. | `False` |\n| poly_features_degree | [int](`int`) | Polynomial-interaction degree. ``1`` (default) generates no interactions; ``2`` adds pairwise bilinear terms; ``3+`` higher order. | `1` |\n| max_poly_features | [int](`int`) | Cap on polynomial interaction columns; only the top ``max_poly_features`` ranked by mutual information with the target are kept (``<= 0`` disables). Defaults to ``10``. 
| `10` |\n| poly_mi_n_jobs | [Optional](`typing.Optional`)\\[[int](`int`)\\] | Parallel jobs for the mutual-information ranking that enforces ``max_poly_features``. ``-1`` (default) uses all cores; ``None`` runs single-threaded. Parallelism does not change the selection. | `-1` |\n| poly_mi_sample_size | [Optional](`typing.Optional`)\\[[int](`int`)\\] | Row cap for that ranking; longer series are scored on a reproducible random subsample of this size (seeded by ``random_state``), which can change which borderline columns make the top K. ``None`` scores every row (the pre-15.8 behaviour). Defaults to ``4000``. | `4000` |\n| index_name | [str](`str`) | Name assigned to the datetime column when the index is reset. Defaults to ``\"DateTime\"``. | `'DateTime'` |\n| bounds | [Optional](`typing.Optional`)\\[[List](`typing.List`)\\[[tuple](`tuple`)\\]\\] | Per-column outlier bounds as a list of ``(lower, upper)`` tuples, one entry per target column. ``None`` until set. | `None` |\n| verbose | [bool](`bool`) | If ``True``, enable verbose output for pipeline steps. Defaults to ``False``. | `False` |\n| cache_home | [Optional](`typing.Optional`)\\[[Any](`typing.Any`)\\] | Path to the cache directory. ``None`` means the library default (``~/spotforecast2_cache/``) is used. | `None` |\n| n_trials_optuna | [int](`int`) | Number of Optuna Bayesian-search trials for hyperparameter optimization (task 3). Defaults to ``15``. | `15` |\n| n_trials_spotoptim | [int](`int`) | Number of SpotOptim surrogate-search trials (task 4). Defaults to ``10``. | `10` |\n| n_initial_spotoptim | [int](`int`) | Number of initial random evaluations for SpotOptim (task 4). Defaults to ``5``. | `5` |\n| max_time_spotoptim | [Optional](`typing.Optional`)\\[[float](`float`)\\] | Wall-clock budget for the SpotOptim search in minutes (task 4). The search stops when either ``n_trials_spotoptim`` evaluations or this time limit is reached, whichever comes first. ``None`` (the default) disables the limit. 
| `None` |\n| warm_start_lags | [Optional](`typing.Optional`)\\[[List](`typing.List`)\\[[int](`int`)\\]\\] | Lag set the SpotOptim task injects as a search-space candidate and uses to seed the optimizer's first evaluation. Defaults to ``DEFAULT_WARM_START_LAGS`` (``[1, 2, 3, 23, 24, 25, 47, 48, 167, 168, 169, 336]``). ``None`` or an empty list disables the warm start. | `(lambda: list(DEFAULT_WARM_START_LAGS))()` |\n| task | [str](`str`) | Active prediction task — one of ``\"lazy\"``, ``\"training\"``, ``\"optuna\"``, or ``\"spotoptim\"``. Defaults to ``\"lazy\"``. | `'lazy'` |\n| agg_weights | [Optional](`typing.Optional`)\\[[List](`typing.List`)\\[[float](`float`)\\]\\] | Per-target aggregation weights used when combining individual target forecasts into a single weighted sum. The list must contain one weight per entry in ``targets`` (in the same order). Positive values add the target's contribution; negative values invert it. Slice the list to ``agg_weights[:len(targets)]`` when only a subset of targets is active. Defaults to ``None`` (no weights pre-defined; set after loading the dataset). | `None` |\n| auto_save_models | [bool](`bool`) | Whether ``BaseTask._run_strategy`` should persist fitted forecasters to ``/models/`` after every training run. Defaults to ``True`` so that saved models are immediately available for ``PredictTask`` without an explicit ``save_models()`` call. | `True` |\n| data_frame_name | [str](`str`) | Identifier for the active dataset. Used by ``BaseTask`` to name cache subdirectories, model files, and the per-dataset log file. Defaults to ``\"default\"``. | `'default'` |\n| on_weather_failure | [Literal](`typing.Literal`)\\[\\'raise\\', \\'skip\\'\\] | Policy for handling Open-Meteo fetch failures inside ``BaseTask.build_exogenous_features``. ``\"raise\"`` (default) aborts the pipeline with a ``WeatherFetchError`` and preserves the safety-critical fail-safe semantics. 
``\"skip\"`` logs a warning and continues with empty weather features so the rest of the pipeline can run without the Open-Meteo dependency. | `'raise'` |\n| exog_max_gap_hours | [int](`int`) | Maximum length, in hours, of a contiguous run of missing exogenous-provider values healed before the provider is rejected. Interior gaps are time-interpolated; leading/trailing edge gaps are back-/forward-filled. ``0`` (default) keeps the strict fail-safe (any gap raises). Healed runs are logged with count and span. Only already-published day-ahead vintages are involved, so healing is leakage-clean (CR-3). | `0` |\n| exog_max_tail_gap_hours | [int](`int`) | Extended healing budget, in hours, applied exclusively to the trailing-edge NaN run (the run containing the last index timestamp). The effective tail budget is ``max(exog_max_gap_hours, exog_max_tail_gap_hours)``. The canonical use case is the ENTSO-E day-ahead publication frontier: the last published vintage is zero-order-held forward to the forecast horizon without touching interior gaps (CR-3-clean). When ``exog_max_tail_gap_hours <= exog_max_gap_hours`` the parameter is inert (the interior budget already covers the tail) and a warning is logged. Defaults to ``0``. | `0` |\n| exog_provider_window | [Literal](`typing.Literal`)\\[\\'full\\', \\'train\\'\\] | Span the exogenous providers are validated against. ``\"full\"`` (default) requires coverage of the entire ``data_start``→``cov_end`` request, matching prior behaviour. ``\"train\"`` validates only the consumed window ``[start_train_ts, cov_end]``, tolerating missing values before the training window. Honoured by the MultiTask pipeline; the forecaster-wrapper path currently always validates the full span. 
| `'full'` |\n\n## Attributes {.doc-section .doc-section-attributes}\n\n| Name | Type | Description |\n|------------------------------------|----------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|\n| country_code | [str](`str`) | ISO country code for API queries and holiday generation. |\n| periods | [List](`typing.List`)\\[[Period](`spotforecast2_safe.data.Period`)\\] | Cyclical feature encoding specifications. |\n| lags_consider | [List](`typing.List`)\\[[int](`int`)\\] | Lag values for autoregressive features. |\n| train_size | [pd](`pandas`).[Timedelta](`pandas.Timedelta`) | Training data window. |\n| end_train_default | [str](`str`) | Default training end date. |\n| delta_val | [pd](`pandas`).[Timedelta](`pandas.Timedelta`) | Validation window. |\n| predict_size | [int](`int`) | Prediction horizon in hours. |\n| refit_size | [int](`int`) | Refit interval in days. |\n| random_state | [int](`int`) | Random seed. |\n| n_hyperparameters_trials | [int](`int`) | Hyperparameter tuning trials. |\n| targets | [Optional](`typing.Optional`)\\[[List](`typing.List`)\\[[str](`str`)\\]\\] | Active target column names. ``None`` until explicitly set from the loaded dataset. |\n| use_outlier_detection | [bool](`bool`) | IsolationForest outlier removal toggle. |\n| contamination | [float](`float`) | IsolationForest contamination fraction. |\n| imputation_method | [str](`str`) | Gap-filling strategy (``\"weighted\"`` or ``\"linear\"``). |\n| window_size | [int](`int`) | Rolling window size for weighted imputation. |\n| use_exogenous_features | [bool](`bool`) | Exogenous feature construction toggle. |\n| latitude | [float](`float`) | Location latitude. |\n| longitude | [float](`float`) | Location longitude. 
|\n| timezone | [str](`str`) | IANA timezone string. |\n| state | [str](`str`) | Subdivision code for regional holidays. |\n| include_weather_windows | [bool](`bool`) | Weather-window feature toggle. |\n| include_holiday_features | [bool](`bool`) | Holiday feature toggle. |\n| include_holiday_adjacency_features | [bool](`bool`) | Brückentag and before/after-holiday indicator toggle. Defaults to ``False``. |\n| include_ephemeris_features | [bool](`bool`) | Solar-elevation and daylight-duration feature toggle. Defaults to ``False``. |\n| include_day_type_features | [bool](`bool`) | Working-day / day-type class feature toggle. Defaults to ``False``. |\n| include_school_holiday_features | [bool](`bool`) | Per-Bundesland school-holiday indicator toggle. Defaults to ``False``. |\n| poly_features_degree | [int](`int`) | Polynomial-interaction degree (1 = off). |\n| max_poly_features | [int](`int`) | Cap on kept ``poly_*`` columns (top-K by MI). |\n| poly_mi_n_jobs | [Optional](`typing.Optional`)\\[[int](`int`)\\] | Parallel jobs for the MI ranking (``-1`` = all cores; selection-invariant). |\n| poly_mi_sample_size | [Optional](`typing.Optional`)\\[[int](`int`)\\] | Row cap for the MI ranking (``None`` = score every row). |\n| include_covid_infection_rate | [bool](`bool`) | Append the bundled RKI German national COVID-19 7-day incidence as an exogenous regressor. |\n| include_entsoe_forecast_load | [bool](`bool`) | Append the ENTSO-E day-ahead Forecasted Load as a near-oracle exogenous prior. |\n| include_entsoe_renewable_forecast | [bool](`bool`) | Append the ENTSO-E day-ahead wind/solar generation forecast. |\n| include_entsoe_net_load | [bool](`bool`) | Append the ENTSO-E day-ahead net load (Forecasted Load minus wind/solar forecast). |\n| include_entsoe_day_ahead_price | [bool](`bool`) | Append the ENTSO-E day-ahead spot price (DE/LU). 
|\n| include_football_match_window | [bool](`bool`) | Append the bundled German football-match event-window feature (1.0 during configured match windows, 0.0 otherwise). Covers German national-team matches and tournament finals from UEFA Euro 2016 through FIFA World Cup 2026. |\n| include_energy_saving_window | [bool](`bool`) | Append the bundled German energy-saving regulatory window feature (1.0 during the EnSikuMaV and EU Regulation 2022/1854 periods, 0.0 otherwise). |\n| index_name | [str](`str`) | Datetime column name used when resetting the index. |\n| bounds | [Optional](`typing.Optional`)\\[[List](`typing.List`)\\[[tuple](`tuple`)\\]\\] | Per-column outlier bounds ``(lower, upper)``. |\n| verbose | [bool](`bool`) | Verbose output toggle. |\n| cache_home | [Optional](`typing.Optional`)\\[[Any](`typing.Any`)\\] | Path to the cache directory. |\n| n_trials_optuna | [int](`int`) | Number of Optuna hyperparameter-search trials. |\n| n_trials_spotoptim | [int](`int`) | Number of SpotOptim search trials. |\n| n_initial_spotoptim | [int](`int`) | Number of initial SpotOptim evaluations. |\n| max_time_spotoptim | [Optional](`typing.Optional`)\\[[float](`float`)\\] | Wall-clock budget for the SpotOptim search in minutes; ``None`` disables the limit. |\n| warm_start_lags | [Optional](`typing.Optional`)\\[[List](`typing.List`)\\[[int](`int`)\\]\\] | Seed lag set for the SpotOptim search; ``None`` or empty disables the warm start. |\n| task | [str](`str`) | Active prediction task (``\"lazy\"``, ``\"training\"``, ``\"optuna\"``, or ``\"spotoptim\"``). |\n| agg_weights | [Optional](`typing.Optional`)\\[[List](`typing.List`)\\[[float](`float`)\\]\\] | Per-target aggregation weights. One weight per entry in ``targets``; positive values add, negative values invert the target's contribution. ``None`` until set. |\n| auto_save_models | [bool](`bool`) | Whether to auto-persist fitted forecasters after each training run. 
|\n| data_frame_name | [str](`str`) | Active-dataset identifier used for cache and log-file naming. |\n| number_folds | [int](`int`) | Cross-validation fold count for tuning tasks. |\n| on_weather_failure | [Literal](`typing.Literal`)\\[\\'raise\\', \\'skip\\'\\] | Open-Meteo fetch-failure policy: ``\"raise\"`` aborts, ``\"skip\"`` continues without weather. |\n| on_exog_provider_failure | [Literal](`typing.Literal`)\\[\\'raise\\', \\'skip\\'\\] | Exog-provider failure policy in ``ExogBuilder.build``: ``\"raise\"`` (default) propagates the ``ExogProviderError``; ``\"skip\"`` logs and omits the failing provider's columns. |\n| exog_max_gap_hours | [int](`int`) | Maximum contiguous gap in hours that providers will heal before raising (0 = strict fail-safe). |\n| exog_provider_window | [Literal](`typing.Literal`)\\[\\'full\\', \\'train\\'\\] | Validation window for exog providers: ``\"full\"`` (default) or ``\"train\"``. |\n\n## Notes {.doc-section .doc-section-notes}\n\nThe default period configurations use specific `n_periods` to balance resolution and smoothing:\n- **Daily**: `n_periods=12` (24h) provides ~2h resolution, smoothing hourly noise and halving dimensionality.\n- **Weekly**: `n_periods` typically matches range (1:1) to distinguish day-of-week patterns.\n- **Yearly**: `n_periods=12` (365d) provides ~1 month resolution, capturing broad seasonal trends without overfitting.\n\nSee `docs/PERIOD_CONFIGURATION_RATIONALE.md` for a detailed analysis.\n\n## Examples {.doc-section .doc-section-examples}\n\n\n::: {#4c595f5a .cell execution_count=1}\n``` {.python .cell-code}\nimport pandas as pd\nfrom spotforecast2_safe.configurator.config_multi import ConfigMulti\nconfig = ConfigMulti()\nprint(f\"country_code: {config.country_code}\")\nprint(f\"Predict size: {config.predict_size}\")\nprint(f\"Random state: {config.random_state}\")\nprint(f\"Targets (default): {config.targets}\")\nprint(f\"agg_weights (default): {config.agg_weights}\")\nprint(f\"index_name: 
{config.index_name}\")\nprint(f\"bounds: {config.bounds}\")\n\n# Set targets and bounds (user input that stays on the config)\nconfig.targets = [\"A\", \"B\", \"C\"]\nconfig.bounds = [(-2500, 4500), (-10, 3000)]\nprint(f\"Targets (after setting): {config.targets}\")\nprint(f\"bounds: {config.bounds}\")\n\n# Create custom configuration — country_code serves both API and holiday purposes\ncustom_config = ConfigMulti(\n country_code='FR',\n predict_size=48,\n random_state=42,\n targets=[\"A\", \"B\"],\n index_name=\"DateTime\",\n)\nprint(f\"country_code: {custom_config.country_code}\")\nprint(f\"Predict size: {custom_config.predict_size}\")\nprint(f\"Random state: {custom_config.random_state}\")\nprint(f\"Targets: {custom_config.targets}\")\n\n# Verify training window\nprint(f\"Training window: {config.train_size == pd.Timedelta(days=3 * 365)}\")\n\n# Check default periods\nprint(f\"Number of periods: {len(config.periods)}\")\nprint(f\"First period name: {config.periods[0].name}\")\n```\n\n::: {.cell-output .cell-output-stdout}\n```\ncountry_code: DE\nPredict size: 24\nRandom state: 314159\nTargets (default): None\nagg_weights (default): None\nindex_name: DateTime\nbounds: None\nTargets (after setting): ['A', 'B', 'C']\nbounds: [(-2500, 4500), (-10, 3000)]\ncountry_code: FR\nPredict size: 48\nRandom state: 42\nTargets: ['A', 'B']\nTraining window: True\nNumber of periods: 5\nFirst period name: daily\n```\n:::\n:::\n\n\n## Methods\n\n| Name | Description |\n| --- | --- |\n| [get_params](#spotforecast2_safe.configurator.config_multi.ConfigMulti.get_params) | Get parameters for this configuration object. |\n| [set_params](#spotforecast2_safe.configurator.config_multi.ConfigMulti.set_params) | Set the parameters of this configuration object. 
|\n\n### get_params { #spotforecast2_safe.configurator.config_multi.ConfigMulti.get_params }\n\n```python\nconfigurator.config_multi.ConfigMulti.get_params(deep=True)\n```\n\nGet parameters for this configuration object.\n\n#### Parameters {.doc-section .doc-section-parameters}\n\n| Name | Type | Description | Default |\n|--------|----------------|-----------------------------------------------------------------------------------------------------------|-----------|\n| deep | [bool](`bool`) | If True, will return the parameters for this configuration and contained sub-objects that are estimators. | `True` |\n\n#### Returns {.doc-section .doc-section-returns}\n\n| Name | Type | Description |\n|--------|-----------------------------------------------------------|-------------------------------------------------------|\n| params | [Dict](`typing.Dict`)\\[[str](`str`), [object](`object`)\\] | Dictionary of parameter names mapped to their values. |\n\n#### Examples {.doc-section .doc-section-examples}\n\n::: {#d9d08e44 .cell execution_count=2}\n``` {.python .cell-code}\nfrom spotforecast2_safe.configurator.config_multi import ConfigMulti\nconfig = ConfigMulti(country_code=\"FR\")\np = config.get_params()\nprint(f\"country_code: {p['country_code']}\")\nprint(f\"Predict size: {p['predict_size']}\")\nprint(f\"Random state: {p['random_state']}\")\nprint(f\"index_name: {p['index_name']}\")\nprint(f\"bounds: {p['bounds']}\")\nprint(f\"agg_weights: {p['agg_weights']}\")\n```\n\n::: {.cell-output .cell-output-stdout}\n```\ncountry_code: FR\nPredict size: 24\nRandom state: 314159\nindex_name: DateTime\nbounds: None\nagg_weights: None\n```\n:::\n:::\n\n\n### set_params { #spotforecast2_safe.configurator.config_multi.ConfigMulti.set_params }\n\n```python\nconfigurator.config_multi.ConfigMulti.set_params(params=None, **kwargs)\n```\n\nSet the parameters of this configuration object.\n\n#### Parameters {.doc-section .doc-section-parameters}\n\n| Name | Type | Description | Default 
|\n|----------|-----------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------|-----------|\n| params | [Dict](`typing.Dict`)\\[[str](`str`), [object](`object`)\\] | Optional dictionary of parameter names mapped to their new values. | `None` |\n| **kwargs | [object](`object`) | Additional parameter names mapped to their new values. It supports configuring nested 'Period' objects using the `periods____` notation. | `{}` |\n\n#### Returns {.doc-section .doc-section-returns}\n\n| Name | Type | Description |\n|-------------|---------------------------------------------------------------------------|--------------------------------------------------------------------------------|\n| ConfigMulti | [ConfigMulti](`spotforecast2_safe.configurator.config_multi.ConfigMulti`) | The configuration instance with updated parameters (supports method chaining). 
|\n\n#### Examples {.doc-section .doc-section-examples}\n\n::: {#4ba40139 .cell execution_count=3}\n``` {.python .cell-code}\nfrom spotforecast2_safe.configurator.config_multi import ConfigMulti\nconfig = ConfigMulti()\n_ = config.set_params(country_code=\"FR\", predict_size=48)\nprint(f\"country_code: {config.country_code}\")\nprint(f\"Predict size: {config.predict_size}\")\nprint(f\"Random state: {config.random_state}\")\n\n# Deep parameter setting\n_ = config.set_params(periods__daily__n_periods=24)\nprint(next(p.n_periods for p in config.periods if p.name == \"daily\"))\n```\n\n::: {.cell-output .cell-output-stdout}\n```\ncountry_code: FR\nPredict size: 48\nRandom state: 314159\n24\n```\n:::\n:::\n\n\n", + "markdown": "---\ntitle: configurator.config_multi.ConfigMulti\n---\n\n\n\n```python\nconfigurator.config_multi.ConfigMulti(\n country_code='DE',\n periods=default_periods(),\n lags_consider=(lambda: list(range(1, 24)))(),\n train_size=(lambda: pd.Timedelta(days=(3 * 365)))(),\n end_train_default='2025-12-31 00:00+00:00',\n delta_val=(lambda: pd.Timedelta(hours=(24 * 7 * 10)))(),\n predict_size=24,\n cv_block_size=None,\n refit_size=7,\n random_state=314159,\n n_hyperparameters_trials=20,\n data_filename='interim/energy_load.csv',\n targets=None,\n use_outlier_detection=True,\n contamination=0.01,\n imputation_method='weighted',\n window_size=72,\n imputation_window_size=None,\n use_exogenous_features=True,\n latitude=51.5136,\n longitude=7.4653,\n timezone='UTC',\n state='NW',\n include_weather_windows=False,\n include_holiday_features=False,\n include_holiday_adjacency_features=False,\n use_population_weighted_weather=False,\n per_zone_weather=False,\n zone_weather_locations=None,\n include_degree_hours=False,\n include_apparent_temperature=False,\n degree_hours_base_heating=15.0,\n degree_hours_base_cooling=22.0,\n include_ephemeris_features=False,\n include_day_type_features=False,\n include_school_holiday_features=False,\n poly_features_degree=1,\n 
max_poly_features=10,\n poly_mi_n_jobs=-1,\n poly_mi_sample_size=4000,\n include_covid_infection_rate=False,\n include_entsoe_forecast_load=False,\n include_entsoe_renewable_forecast=False,\n include_entsoe_net_load=False,\n include_entsoe_day_ahead_price=False,\n include_football_match_window=False,\n include_energy_saving_window=False,\n index_name='DateTime',\n bounds=None,\n verbose=False,\n cache_home=None,\n n_trials_optuna=15,\n n_trials_spotoptim=10,\n n_initial_spotoptim=5,\n max_time_spotoptim=None,\n warm_start_lags=(lambda: list(DEFAULT_WARM_START_LAGS))(),\n task='lazy',\n agg_weights=None,\n forecaster_factory=None,\n data_loader=None,\n test_data_loader=None,\n auto_save_models=True,\n data_frame_name='default',\n number_folds=10,\n on_weather_failure='raise',\n on_exog_provider_failure='raise',\n exog_max_gap_hours=0,\n exog_max_tail_gap_hours=0,\n exog_provider_window='full',\n target_qc_range_mw=None,\n target_qc_step_mw=None,\n target_qc_window_days=None,\n target_corruption_policy='abort',\n target_max_heal_hours=0,\n target_anchor_zone_hours=168,\n target_qc_deviation_mw=None,\n target_qc_deviation_ref=None,\n target_qc_deviation_slots=2,\n)\n```\n\nConfiguration for the multi-input forecasting pipeline.\n\nThis class manages all configuration parameters for the multi-input task,\nincluding training/prediction intervals, data sources, and feature\nengineering specifications. 
All parameters can be customized during\ninitialization or used with sensible defaults.\n\n``country_code`` serves as the single ISO country code used for both\nAPI queries and holiday feature generation.\n\n## Parameters {.doc-section .doc-section-parameters}\n\n| Name | Type | Description | Default |\n|------------------------------------|------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------|\n| country_code | [str](`str`) | ISO 3166-1 alpha-2 country code (e.g. ``\"DE\"``). Used for both API queries and holiday feature generation. | `'DE'` |\n| periods | [Optional](`typing.Optional`)\\[[List](`typing.List`)\\[[Period](`spotforecast2_safe.data.Period`)\\]\\] | List of Period objects defining cyclical feature encodings. | `default_periods()` |\n| lags_consider | [Optional](`typing.Optional`)\\[[List](`typing.List`)\\[[int](`int`)\\]\\] | List of lag values to consider for feature selection. | `(lambda: list(range(1, 24)))()` |\n| train_size | [Optional](`typing.Optional`)\\[[pd](`pandas`).[Timedelta](`pandas.Timedelta`)\\] | Time window for training data. | `(lambda: pd.Timedelta(days=(3 * 365)))()` |\n| end_train_default | [str](`str`) | Default end date for training period (ISO format with timezone). 
| `'2025-12-31 00:00+00:00'` |\n| delta_val | [Optional](`typing.Optional`)\\[[pd](`pandas`).[Timedelta](`pandas.Timedelta`)\\] | Validation window size. | `(lambda: pd.Timedelta(hours=(24 * 7 * 10)))()` |\n| predict_size | [int](`int`) | Number of hours to predict ahead. | `24` |\n| cv_block_size | [int](`int`) \\| None | Cross-validation test-block width in hours. Defaults to ``None``, meaning the CV uses ``predict_size``. Set to a fixed value (e.g. ``24``) to decouple the cross-validation horizon from a render-dependent live ``predict_size``. | `None` |\n| refit_size | [int](`int`) | Number of days between model refits. | `7` |\n| random_state | [int](`int`) | Random seed for reproducibility. | `314159` |\n| n_hyperparameters_trials | [int](`int`) | Number of trials for hyperparameter optimization. | `20` |\n| data_filename | [str](`str`) | Path to the interim merged data file. | `'interim/energy_load.csv'` |\n| targets | [Optional](`typing.Optional`)\\[[List](`typing.List`)\\[[str](`str`)\\]\\] | List of target column names to train models for. When ``None`` (default), no targets are pre-selected; set this attribute after loading the dataset (e.g. ``config.targets = df.columns.tolist()``). Replaces standalone ``TARGETS`` and ``target_columns`` variables in pipeline scripts, providing a single source of truth for the active target set. | `None` |\n| use_outlier_detection | [bool](`bool`) | If True, apply IsolationForest-based outlier removal. | `True` |\n| contamination | [float](`float`) | Proportion of outliers for IsolationForest (0 < contamination < 0.5). | `0.01` |\n| imputation_method | [str](`str`) | Gap-filling strategy — ``\"weighted\"`` (n2n-style rolling weights) or ``\"linear\"`` (linear interpolation). | `'weighted'` |\n| window_size | [int](`int`) | Rolling window size in hours for gap detection (weighted imputation). | `72` |\n| use_exogenous_features | [bool](`bool`) | If True, build weather/calendar/day-night/holiday features. 
| `True` |\n| latitude | [float](`float`) | Latitude of the target location in decimal degrees. | `51.5136` |\n| longitude | [float](`float`) | Longitude of the target location in decimal degrees. | `7.4653` |\n| timezone | [str](`str`) | IANA timezone string for the target location (e.g. ``\"Europe/Berlin\"``). | `'UTC'` |\n| state | [str](`str`) | ISO 3166-2 subdivision code for regional holidays (e.g. ``\"NW\"``). | `'NW'` |\n| include_weather_windows | [bool](`bool`) | If True, include rolling weather-window features. | `False` |\n| include_holiday_features | [bool](`bool`) | If True, include public-holiday indicator features. | `False` |\n| include_holiday_adjacency_features | [bool](`bool`) | If True, include Brückentag and before/after-holiday indicators (``is_brueckentag``, ``is_before_holiday``, ``is_after_holiday``). Defaults to ``False``. | `False` |\n| include_ephemeris_features | [bool](`bool`) | If True, include solar-elevation and daylight-duration features. Defaults to ``False``. | `False` |\n| include_day_type_features | [bool](`bool`) | If True, include working-day and day-type class features (``is_workday``, ``day_type``). Defaults to ``False``. | `False` |\n| include_school_holiday_features | [bool](`bool`) | Append the ``is_school_holiday`` binary indicator from the bundled OpenHolidays API dataset (ODbL-1.0). Coverage 2022-01-01 to 2027-12-31 for all 16 German Bundesländer. Only ``country_code=\"DE\"`` is supported. Defaults to ``False``. | `False` |\n| per_zone_weather | [bool](`bool`) | When True, each target is treated as a German TSO control zone and receives weather from its own regional cities via ``weather.locations.locations_for_zone``. Mutually exclusive with ``use_population_weighted_weather``; requires ``use_exogenous_features=True``; not compatible with ``poly_features_degree >= 2``. Default ``False`` → byte-identical to the shared-weather baseline. 
| `False` |\n| zone_weather_locations | [Optional](`typing.Optional`)\\[[Dict](`typing.Dict`)\\[[str](`str`), [Any](`typing.Any`)\\]\\] | Optional override mapping from zone key (e.g. ``\"load_50hertz\"``) to a list of ``WeatherLocation`` objects. ``None`` (default) uses the built-in registry partition from ``GERMAN_TSO_ZONE_CITIES``. | `None` |\n| poly_features_degree | [int](`int`) | Polynomial-interaction degree. ``1`` (default) generates no interactions; ``2`` adds pairwise bilinear terms; ``3+`` higher order. | `1` |\n| max_poly_features | [int](`int`) | Cap on polynomial interaction columns; only the top ``max_poly_features`` ranked by mutual information with the target are kept (``<= 0`` disables). Defaults to ``10``. | `10` |\n| poly_mi_n_jobs | [Optional](`typing.Optional`)\\[[int](`int`)\\] | Parallel jobs for the mutual-information ranking that enforces ``max_poly_features``. ``-1`` (default) uses all cores; ``None`` runs single-threaded. Parallelism does not change the selection. | `-1` |\n| poly_mi_sample_size | [Optional](`typing.Optional`)\\[[int](`int`)\\] | Row cap for that ranking; longer series are scored on a reproducible random subsample of this size (seeded by ``random_state``), which can change which borderline columns make the top K. ``None`` scores every row (the pre-15.8 behaviour). Defaults to ``4000``. | `4000` |\n| index_name | [str](`str`) | Name assigned to the datetime column when the index is reset. Defaults to ``\"DateTime\"``. | `'DateTime'` |\n| bounds | [Optional](`typing.Optional`)\\[[List](`typing.List`)\\[[tuple](`tuple`)\\]\\] | Per-column outlier bounds as a list of ``(lower, upper)`` tuples, one entry per target column. ``None`` until set. | `None` |\n| verbose | [bool](`bool`) | If ``True``, enable verbose output for pipeline steps. Defaults to ``False``. | `False` |\n| cache_home | [Optional](`typing.Optional`)\\[[Any](`typing.Any`)\\] | Path to the cache directory. 
``None`` means the library default (``~/spotforecast2_cache/``) is used. | `None` |\n| n_trials_optuna | [int](`int`) | Number of Optuna Bayesian-search trials for hyperparameter optimization (task 3). Defaults to ``15``. | `15` |\n| n_trials_spotoptim | [int](`int`) | Number of SpotOptim surrogate-search trials (task 4). Defaults to ``10``. | `10` |\n| n_initial_spotoptim | [int](`int`) | Number of initial random evaluations for SpotOptim (task 4). Defaults to ``5``. | `5` |\n| max_time_spotoptim | [Optional](`typing.Optional`)\\[[float](`float`)\\] | Wall-clock budget for the SpotOptim search in minutes (task 4). The search stops when either ``n_trials_spotoptim`` evaluations or this time limit is reached, whichever comes first. ``None`` (the default) disables the limit. | `None` |\n| warm_start_lags | [Optional](`typing.Optional`)\\[[List](`typing.List`)\\[[int](`int`)\\]\\] | Lag set the SpotOptim task injects as a search-space candidate and uses to seed the optimizer's first evaluation. Defaults to ``DEFAULT_WARM_START_LAGS`` (``[1, 2, 3, 23, 24, 25, 47, 48, 167, 168, 169, 336]``). ``None`` or an empty list disables the warm start. | `(lambda: list(DEFAULT_WARM_START_LAGS))()` |\n| task | [str](`str`) | Active prediction task — one of ``\"lazy\"``, ``\"training\"``, ``\"optuna\"``, or ``\"spotoptim\"``. Defaults to ``\"lazy\"``. | `'lazy'` |\n| agg_weights | [Optional](`typing.Optional`)\\[[List](`typing.List`)\\[[float](`float`)\\]\\] | Per-target aggregation weights used when combining individual target forecasts into a single weighted sum. The list must contain one weight per entry in ``targets`` (in the same order). Positive values add the target's contribution; negative values invert it. Slice the list to ``agg_weights[:len(targets)]`` when only a subset of targets is active. Defaults to ``None`` (no weights pre-defined; set after loading the dataset). 
| `None` |\n| auto_save_models | [bool](`bool`) | Whether ``BaseTask._run_strategy`` should persist fitted forecasters to ``/models/`` after every training run. Defaults to ``True`` so that saved models are immediately available for ``PredictTask`` without an explicit ``save_models()`` call. | `True` |\n| data_frame_name | [str](`str`) | Identifier for the active dataset. Used by ``BaseTask`` to name cache subdirectories, model files, and the per-dataset log file. Defaults to ``\"default\"``. | `'default'` |\n| on_weather_failure | [Literal](`typing.Literal`)\\[\\'raise\\', \\'skip\\'\\] | Policy for handling Open-Meteo fetch failures inside ``BaseTask.build_exogenous_features``. ``\"raise\"`` (default) aborts the pipeline with a ``WeatherFetchError`` and preserves the safety-critical fail-safe semantics. ``\"skip\"`` logs a warning and continues with empty weather features so the rest of the pipeline can run without the Open-Meteo dependency. | `'raise'` |\n| exog_max_gap_hours | [int](`int`) | Maximum length, in hours, of a contiguous run of missing exogenous-provider values healed before the provider is rejected. Interior gaps are time-interpolated; leading/trailing edge gaps are back-/forward-filled. ``0`` (default) keeps the strict fail-safe (any gap raises). Healed runs are logged with count and span. Only already-published day-ahead vintages are involved, so healing is leakage-clean (CR-3). | `0` |\n| exog_max_tail_gap_hours | [int](`int`) | Extended healing budget, in hours, applied exclusively to the trailing-edge NaN run (the run containing the last index timestamp). The effective tail budget is ``max(exog_max_gap_hours, exog_max_tail_gap_hours)``. The canonical use case is the ENTSO-E day-ahead publication frontier: the last published vintage is zero-order-held forward to the forecast horizon without touching interior gaps (CR-3-clean). 
When ``exog_max_tail_gap_hours <= exog_max_gap_hours`` the parameter is inert (the interior budget already covers the tail) and a warning is logged. Defaults to ``0``. | `0` |\n| exog_provider_window | [Literal](`typing.Literal`)\\[\\'full\\', \\'train\\'\\] | Span the exogenous providers are validated against. ``\"full\"`` (default) requires coverage of the entire ``data_start``→``cov_end`` request, matching prior behaviour. ``\"train\"`` validates only the consumed window ``[start_train_ts, cov_end]``, tolerating missing values before the training window. Honoured by the MultiTask pipeline; the forecaster-wrapper path currently always validates the full span. | `'full'` |\n\n## Attributes {.doc-section .doc-section-attributes}\n\n| Name | Type | Description |\n|------------------------------------|---------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|\n| country_code | [str](`str`) | ISO country code for API queries and holiday generation. |\n| periods | [List](`typing.List`)\\[[Period](`spotforecast2_safe.data.Period`)\\] | Cyclical feature encoding specifications. |\n| lags_consider | [List](`typing.List`)\\[[int](`int`)\\] | Lag values for autoregressive features. |\n| train_size | [pd](`pandas`).[Timedelta](`pandas.Timedelta`) | Training data window. |\n| end_train_default | [str](`str`) | Default training end date. |\n| delta_val | [pd](`pandas`).[Timedelta](`pandas.Timedelta`) | Validation window. |\n| predict_size | [int](`int`) | Prediction horizon in hours. |\n| refit_size | [int](`int`) | Refit interval in days. |\n| random_state | [int](`int`) | Random seed. 
|\n| n_hyperparameters_trials | [int](`int`) | Hyperparameter tuning trials. |\n| targets | [Optional](`typing.Optional`)\\[[List](`typing.List`)\\[[str](`str`)\\]\\] | Active target column names. ``None`` until explicitly set from the loaded dataset. |\n| use_outlier_detection | [bool](`bool`) | IsolationForest outlier removal toggle. |\n| contamination | [float](`float`) | IsolationForest contamination fraction. |\n| imputation_method | [str](`str`) | Gap-filling strategy (``\"weighted\"`` or ``\"linear\"``). |\n| window_size | [int](`int`) | Rolling window size for weighted imputation. |\n| use_exogenous_features | [bool](`bool`) | Exogenous feature construction toggle. |\n| latitude | [float](`float`) | Location latitude. |\n| longitude | [float](`float`) | Location longitude. |\n| timezone | [str](`str`) | IANA timezone string. |\n| state | [str](`str`) | Subdivision code for regional holidays. |\n| include_weather_windows | [bool](`bool`) | Weather-window feature toggle. |\n| include_holiday_features | [bool](`bool`) | Holiday feature toggle. |\n| include_holiday_adjacency_features | [bool](`bool`) | Brückentag and before/after-holiday indicator toggle. Defaults to ``False``. |\n| include_ephemeris_features | [bool](`bool`) | Solar-elevation and daylight-duration feature toggle. Defaults to ``False``. |\n| include_day_type_features | [bool](`bool`) | Working-day / day-type class feature toggle. Defaults to ``False``. |\n| include_school_holiday_features | [bool](`bool`) | Per-Bundesland school-holiday indicator toggle. Defaults to ``False``. |\n| per_zone_weather | [bool](`bool`) | When True, each target is a TSO control zone that fetches its own regional weather via ``weather.locations.locations_for_zone``. Mutually exclusive with ``use_population_weighted_weather``; requires ``use_exogenous_features=True``; not compatible with ``poly_features_degree >= 2``. Default ``False``. 
|\n| zone_weather_locations | [Optional](`typing.Optional`)\\[[Dict](`typing.Dict`)\\[[str](`str`), [Any](`typing.Any`)\\]\\] | Override mapping from zone key to a list of ``WeatherLocation`` objects. ``None`` uses the built-in ``GERMAN_TSO_ZONE_CITIES`` partition. |\n| poly_features_degree | [int](`int`) | Polynomial-interaction degree (1 = off). |\n| max_poly_features | [int](`int`) | Cap on kept ``poly_*`` columns (top-K by MI). |\n| poly_mi_n_jobs | [Optional](`typing.Optional`)\\[[int](`int`)\\] | Parallel jobs for the MI ranking (``-1`` = all cores; selection-invariant). |\n| poly_mi_sample_size | [Optional](`typing.Optional`)\\[[int](`int`)\\] | Row cap for the MI ranking (``None`` = score every row). |\n| include_covid_infection_rate | [bool](`bool`) | Append the bundled RKI German national COVID-19 7-day incidence as an exogenous regressor. |\n| include_entsoe_forecast_load | [bool](`bool`) | Append the ENTSO-E day-ahead Forecasted Load as a near-oracle exogenous prior. |\n| include_entsoe_renewable_forecast | [bool](`bool`) | Append the ENTSO-E day-ahead wind/solar generation forecast. |\n| include_entsoe_net_load | [bool](`bool`) | Append the ENTSO-E day-ahead net load (Forecasted Load minus wind/solar forecast). |\n| include_entsoe_day_ahead_price | [bool](`bool`) | Append the ENTSO-E day-ahead spot price (DE/LU). |\n| include_football_match_window | [bool](`bool`) | Append the bundled German football-match event-window feature (1.0 during configured match windows, 0.0 otherwise). Covers German national-team matches and tournament finals from UEFA Euro 2016 through FIFA World Cup 2026. |\n| include_energy_saving_window | [bool](`bool`) | Append the bundled German energy-saving regulatory window feature (1.0 during the EnSikuMaV and EU Regulation 2022/1854 periods, 0.0 otherwise). |\n| index_name | [str](`str`) | Datetime column name used when resetting the index. 
|\n| bounds | [Optional](`typing.Optional`)\\[[List](`typing.List`)\\[[tuple](`tuple`)\\]\\] | Per-column outlier bounds ``(lower, upper)``. |\n| verbose | [bool](`bool`) | Verbose output toggle. |\n| cache_home | [Optional](`typing.Optional`)\\[[Any](`typing.Any`)\\] | Path to the cache directory. |\n| n_trials_optuna | [int](`int`) | Number of Optuna hyperparameter-search trials. |\n| n_trials_spotoptim | [int](`int`) | Number of SpotOptim search trials. |\n| n_initial_spotoptim | [int](`int`) | Number of initial SpotOptim evaluations. |\n| max_time_spotoptim | [Optional](`typing.Optional`)\\[[float](`float`)\\] | Wall-clock budget for the SpotOptim search in minutes; ``None`` disables the limit. |\n| warm_start_lags | [Optional](`typing.Optional`)\\[[List](`typing.List`)\\[[int](`int`)\\]\\] | Seed lag set for the SpotOptim search; ``None`` or empty disables the warm start. |\n| task | [str](`str`) | Active prediction task (``\"lazy\"``, ``\"training\"``, ``\"optuna\"``, or ``\"spotoptim\"``). |\n| agg_weights | [Optional](`typing.Optional`)\\[[List](`typing.List`)\\[[float](`float`)\\]\\] | Per-target aggregation weights. One weight per entry in ``targets``; positive values add, negative values invert the target's contribution. ``None`` until set. |\n| auto_save_models | [bool](`bool`) | Whether to auto-persist fitted forecasters after each training run. |\n| data_frame_name | [str](`str`) | Active-dataset identifier used for cache and log-file naming. |\n| number_folds | [int](`int`) | Cross-validation fold count for tuning tasks. |\n| on_weather_failure | [Literal](`typing.Literal`)\\[\\'raise\\', \\'skip\\'\\] | Open-Meteo fetch-failure policy: ``\"raise\"`` aborts, ``\"skip\"`` continues without weather. 
|\n| on_exog_provider_failure | [Literal](`typing.Literal`)\\[\\'raise\\', \\'skip\\'\\] | Exog-provider failure policy in ``ExogBuilder.build``: ``\"raise\"`` (default) propagates the ``ExogProviderError``; ``\"skip\"`` logs and omits the failing provider's columns. |\n| exog_max_gap_hours | [int](`int`) | Maximum contiguous gap in hours that providers will heal before raising (0 = strict fail-safe). |\n| exog_provider_window | [Literal](`typing.Literal`)\\[\\'full\\', \\'train\\'\\] | Validation window for exog providers: ``\"full\"`` (default) or ``\"train\"``. |\n\n## Notes {.doc-section .doc-section-notes}\n\nThe default period configurations use specific `n_periods` to balance resolution and smoothing:\n- **Daily**: `n_periods=12` (24h) provides ~2h resolution, smoothing hourly noise and halving dimensionality.\n- **Weekly**: `n_periods` typically matches range (1:1) to distinguish day-of-week patterns.\n- **Yearly**: `n_periods=12` (365d) provides ~1 month resolution, capturing broad seasonal trends without overfitting.\n\nSee `docs/PERIOD_CONFIGURATION_RATIONALE.md` for a detailed analysis.\n\n## Examples {.doc-section .doc-section-examples}\n\n\n::: {#5ae8ee5f .cell execution_count=1}\n``` {.python .cell-code}\nimport pandas as pd\nfrom spotforecast2_safe.configurator.config_multi import ConfigMulti\nconfig = ConfigMulti()\nprint(f\"country_code: {config.country_code}\")\nprint(f\"Predict size: {config.predict_size}\")\nprint(f\"Random state: {config.random_state}\")\nprint(f\"Targets (default): {config.targets}\")\nprint(f\"agg_weights (default): {config.agg_weights}\")\nprint(f\"index_name: {config.index_name}\")\nprint(f\"bounds: {config.bounds}\")\n\n# Set targets and bounds (user input that stays on the config)\nconfig.targets = [\"A\", \"B\", \"C\"]\nconfig.bounds = [(-2500, 4500), (-10, 3000)]\nprint(f\"Targets (after setting): {config.targets}\")\nprint(f\"bounds: {config.bounds}\")\n\n# Create custom configuration — country_code serves both API 
and holiday purposes\ncustom_config = ConfigMulti(\n country_code='FR',\n predict_size=48,\n random_state=42,\n targets=[\"A\", \"B\"],\n index_name=\"DateTime\",\n)\nprint(f\"country_code: {custom_config.country_code}\")\nprint(f\"Predict size: {custom_config.predict_size}\")\nprint(f\"Random state: {custom_config.random_state}\")\nprint(f\"Targets: {custom_config.targets}\")\n\n# Verify training window\nprint(f\"Training window: {config.train_size == pd.Timedelta(days=3 * 365)}\")\n\n# Check default periods\nprint(f\"Number of periods: {len(config.periods)}\")\nprint(f\"First period name: {config.periods[0].name}\")\n```\n\n::: {.cell-output .cell-output-stdout}\n```\ncountry_code: DE\nPredict size: 24\nRandom state: 314159\nTargets (default): None\nagg_weights (default): None\nindex_name: DateTime\nbounds: None\nTargets (after setting): ['A', 'B', 'C']\nbounds: [(-2500, 4500), (-10, 3000)]\ncountry_code: FR\nPredict size: 48\nRandom state: 42\nTargets: ['A', 'B']\nTraining window: True\nNumber of periods: 5\nFirst period name: daily\n```\n:::\n:::\n\n\n## Methods\n\n| Name | Description |\n| --- | --- |\n| [get_params](#spotforecast2_safe.configurator.config_multi.ConfigMulti.get_params) | Get parameters for this configuration object. |\n| [set_params](#spotforecast2_safe.configurator.config_multi.ConfigMulti.set_params) | Set the parameters of this configuration object. 
|\n\n### get_params { #spotforecast2_safe.configurator.config_multi.ConfigMulti.get_params }\n\n```python\nconfigurator.config_multi.ConfigMulti.get_params(deep=True)\n```\n\nGet parameters for this configuration object.\n\n#### Parameters {.doc-section .doc-section-parameters}\n\n| Name | Type | Description | Default |\n|--------|----------------|-----------------------------------------------------------------------------------------------------------|-----------|\n| deep | [bool](`bool`) | If True, will return the parameters for this configuration and contained sub-objects that are estimators. | `True` |\n\n#### Returns {.doc-section .doc-section-returns}\n\n| Name | Type | Description |\n|--------|-----------------------------------------------------------|-------------------------------------------------------|\n| params | [Dict](`typing.Dict`)\\[[str](`str`), [object](`object`)\\] | Dictionary of parameter names mapped to their values. |\n\n#### Examples {.doc-section .doc-section-examples}\n\n::: {#d2bd7221 .cell execution_count=2}\n``` {.python .cell-code}\nfrom spotforecast2_safe.configurator.config_multi import ConfigMulti\nconfig = ConfigMulti(country_code=\"FR\")\np = config.get_params()\nprint(f\"country_code: {p['country_code']}\")\nprint(f\"Predict size: {p['predict_size']}\")\nprint(f\"Random state: {p['random_state']}\")\nprint(f\"index_name: {p['index_name']}\")\nprint(f\"bounds: {p['bounds']}\")\nprint(f\"agg_weights: {p['agg_weights']}\")\n```\n\n::: {.cell-output .cell-output-stdout}\n```\ncountry_code: FR\nPredict size: 24\nRandom state: 314159\nindex_name: DateTime\nbounds: None\nagg_weights: None\n```\n:::\n:::\n\n\n### set_params { #spotforecast2_safe.configurator.config_multi.ConfigMulti.set_params }\n\n```python\nconfigurator.config_multi.ConfigMulti.set_params(params=None, **kwargs)\n```\n\nSet the parameters of this configuration object.\n\n#### Parameters {.doc-section .doc-section-parameters}\n\n| Name | Type | Description | Default 
|\n|----------|-----------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------|-----------|\n| params | [Dict](`typing.Dict`)\\[[str](`str`), [object](`object`)\\] | Optional dictionary of parameter names mapped to their new values. | `None` |\n| **kwargs | [object](`object`) | Additional parameter names mapped to their new values. It supports configuring nested 'Period' objects using the `periods____` notation. | `{}` |\n\n#### Returns {.doc-section .doc-section-returns}\n\n| Name | Type | Description |\n|-------------|---------------------------------------------------------------------------|--------------------------------------------------------------------------------|\n| ConfigMulti | [ConfigMulti](`spotforecast2_safe.configurator.config_multi.ConfigMulti`) | The configuration instance with updated parameters (supports method chaining). 
|\n\n#### Examples {.doc-section .doc-section-examples}\n\n::: {#f9d5b7f4 .cell execution_count=3}\n``` {.python .cell-code}\nfrom spotforecast2_safe.configurator.config_multi import ConfigMulti\nconfig = ConfigMulti()\n_ = config.set_params(country_code=\"FR\", predict_size=48)\nprint(f\"country_code: {config.country_code}\")\nprint(f\"Predict size: {config.predict_size}\")\nprint(f\"Random state: {config.random_state}\")\n\n# Deep parameter setting\n_ = config.set_params(periods__daily__n_periods=24)\nprint(next(p.n_periods for p in config.periods if p.name == \"daily\"))\n```\n\n::: {.cell-output .cell-output-stdout}\n```\ncountry_code: FR\nPredict size: 48\nRandom state: 314159\n24\n```\n:::\n:::\n\n\n", "supporting": [ "configurator.config_multi.ConfigMulti_files/figure-html" ], diff --git a/_freeze/docs/reference/manager.features.get_target_data/execute-results/html.json b/_freeze/docs/reference/manager.features.get_target_data/execute-results/html.json index d308afe2..2a0e9543 100644 --- a/_freeze/docs/reference/manager.features.get_target_data/execute-results/html.json +++ b/_freeze/docs/reference/manager.features.get_target_data/execute-results/html.json @@ -1,8 +1,8 @@ { - "hash": "21aeedf16b5a2b156b9649bf82ad5b19", + "hash": "68c696314df7eabc904a56c80e041607", "result": { "engine": "jupyter", - "markdown": "---\ntitle: manager.features.get_target_data\n---\n\n\n\n```python\nmanager.features.get_target_data(\n target,\n df_pipeline,\n config,\n *,\n data_with_exog=None,\n exog_feature_names=None,\n exo_pred=None,\n start_train_ts,\n end_train_ts,\n)\n```\n\nExtract the training series and exogenous slices for one target column.\n\nClips the target column of *df_pipeline* to the training window defined by\n*start_train_ts* and *end_train_ts*. 
When exogenous features are enabled\n(``config.use_exogenous_features is True``) and *data_with_exog* is\nprovided, the matching exogenous training slice and forecast-horizon slice\nare also returned; otherwise both are ``None``.\n\nThis function is the canonical way to extract per-target data from the\nshared pipeline state so that outlier removal, imputation, and feature\nengineering are applied consistently across all forecasting tasks.\n\nThe training-window timestamps are supplied as explicit parameters so that\nthis helper stays decoupled from ``RunState`` (ADR\n``adr-multitask-configmulti-merge``, step 5). Both parameters are\nrequired; passing ``None`` raises ``ValueError``.\n\n## Parameters {.doc-section .doc-section-parameters}\n\n| Name | Type | Description | Default |\n|--------------------|---------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------|\n| target | [str](`str`) | Name of the target column to extract from *df_pipeline*. | _required_ |\n| df_pipeline | [pd](`pandas`).[DataFrame](`pandas.DataFrame`) | DataFrame with a tz-aware `DatetimeIndex` containing all target columns produced by the preprocessing pipeline. | _required_ |\n| config | \\'ConfigMulti\\' | Pipeline configuration object. ``use_exogenous_features`` must be set. | _required_ |\n| data_with_exog | [Optional](`typing.Optional`)\\[[pd](`pandas`).[DataFrame](`pandas.DataFrame`)\\] | Merged DataFrame of target and exogenous columns covering at least the training window. Required when ``config.use_exogenous_features`` is ``True``. Pass ``None`` (default) to skip exogenous slicing. 
| `None` |\n| exog_feature_names | [Optional](`typing.Optional`)\\[[List](`typing.List`)\\[[str](`str`)\\]\\] | Column names to select from *data_with_exog* and *exo_pred*. Required when *data_with_exog* is not ``None``. Pass ``None`` (default) when exogenous features are disabled. | `None` |\n| exo_pred | [Optional](`typing.Optional`)\\[[pd](`pandas`).[DataFrame](`pandas.DataFrame`)\\] | Exogenous feature DataFrame covering the forecast horizon. Required when *data_with_exog* is not ``None``. Pass ``None`` (default) when exogenous features are disabled. | `None` |\n| start_train_ts | [pd](`pandas`).[Timestamp](`pandas.Timestamp`) | Inclusive start of the training window (tz-aware ``pd.Timestamp``). **Keyword-only, required** — pass ``task.run_state.start_train_ts`` after the pipeline has been prepared. Passing ``None`` raises ``ValueError``. | _required_ |\n| end_train_ts | [pd](`pandas`).[Timestamp](`pandas.Timestamp`) | Inclusive end of the training window (tz-aware ``pd.Timestamp``). **Keyword-only, required** — pass ``task.run_state.end_train_ts`` after the pipeline has been prepared. Passing ``None`` raises ``ValueError``. 
| _required_ |\n\n## Returns {.doc-section .doc-section-returns}\n\n| Name | Type | Description |\n|--------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|\n| | [pd](`pandas`).[Series](`pandas.Series`) | Tuple[pd.Series, Optional[pd.DataFrame], Optional[pd.DataFrame]]: |\n| | [Optional](`typing.Optional`)\\[[pd](`pandas`).[DataFrame](`pandas.DataFrame`)\\] | A three-tuple ``(y_train, exog_train, exog_future)`` where: |\n| | [Optional](`typing.Optional`)\\[[pd](`pandas`).[DataFrame](`pandas.DataFrame`)\\] | - **y_train** — 1-D Series with the target values over the training window, squeezed to a plain `Series`. |\n| | [Tuple](`typing.Tuple`)\\[[pd](`pandas`).[Series](`pandas.Series`), [Optional](`typing.Optional`)\\[[pd](`pandas`).[DataFrame](`pandas.DataFrame`)\\], [Optional](`typing.Optional`)\\[[pd](`pandas`).[DataFrame](`pandas.DataFrame`)\\]\\] | - **exog_train** — DataFrame of selected exogenous features over the training window, cast to ``float32``. ``None`` when exogenous features are disabled or *data_with_exog* is ``None``. |\n| | [Tuple](`typing.Tuple`)\\[[pd](`pandas`).[Series](`pandas.Series`), [Optional](`typing.Optional`)\\[[pd](`pandas`).[DataFrame](`pandas.DataFrame`)\\], [Optional](`typing.Optional`)\\[[pd](`pandas`).[DataFrame](`pandas.DataFrame`)\\]\\] | - **exog_future** — DataFrame of selected exogenous features covering the forecast horizon, cast to ``float32``. ``None`` when exogenous features are disabled or *exo_pred* is ``None``. 
|\n\n## Examples {.doc-section .doc-section-examples}\n\nExtract training data for a single target without exogenous features:\n\n\n::: {#e497f432 .cell execution_count=1}\n``` {.python .cell-code}\nimport pandas as pd\nimport numpy as np\nfrom spotforecast2_safe.manager.features import get_target_data\nfrom spotforecast2_safe.configurator.config_multi import ConfigMulti\n\nidx = pd.date_range(\"2024-01-01\", periods=168, freq=\"h\", tz=\"UTC\")\ndf_pipeline = pd.DataFrame({\"load\": np.random.default_rng(0).normal(100, 10, 168)}, index=idx)\n\nconfig = ConfigMulti(\n targets=[\"load\"],\n use_exogenous_features=False,\n)\nstart_ts = pd.Timestamp(\"2024-01-01 00:00\", tz=\"UTC\")\nend_ts = pd.Timestamp(\"2024-01-07 23:00\", tz=\"UTC\")\n\ny_train, exog_train, exog_future = get_target_data(\n target=\"load\",\n df_pipeline=df_pipeline,\n config=config,\n start_train_ts=start_ts,\n end_train_ts=end_ts,\n)\nprint(f\"y_train length: {len(y_train)}\")\nprint(f\"exog_train: {exog_train}\")\nprint(f\"exog_future: {exog_future}\")\n```\n\n::: {.cell-output .cell-output-stdout}\n```\ny_train length: 168\nexog_train: None\nexog_future: None\n```\n:::\n:::\n\n\nExtract training data with exogenous features enabled:\n\n::: {#9872f650 .cell execution_count=2}\n``` {.python .cell-code}\nimport pandas as pd\nimport numpy as np\nfrom spotforecast2_safe.manager.features import get_target_data\nfrom spotforecast2_safe.configurator.config_multi import ConfigMulti\n\nrng = np.random.default_rng(1)\nidx_train = pd.date_range(\"2024-01-01\", periods=168, freq=\"h\", tz=\"UTC\")\nidx_future = pd.date_range(\"2024-01-08\", periods=24, freq=\"h\", tz=\"UTC\")\n\ndf_pipeline = pd.DataFrame({\"load\": rng.normal(100, 10, 168)}, index=idx_train)\n\ndata_with_exog = pd.DataFrame(\n {\n \"load\": df_pipeline[\"load\"],\n \"hour_sin\": np.sin(2 * np.pi * idx_train.hour / 24),\n \"hour_cos\": np.cos(2 * np.pi * idx_train.hour / 24),\n },\n index=idx_train,\n)\nexo_pred = pd.DataFrame(\n {\n 
\"hour_sin\": np.sin(2 * np.pi * idx_future.hour / 24),\n \"hour_cos\": np.cos(2 * np.pi * idx_future.hour / 24),\n },\n index=idx_future,\n)\n\nstart_ts = pd.Timestamp(\"2024-01-01 00:00\", tz=\"UTC\")\nend_ts = pd.Timestamp(\"2024-01-07 23:00\", tz=\"UTC\")\nconfig = ConfigMulti(targets=[\"load\"], use_exogenous_features=True)\n\ny_train, exog_train, exog_future = get_target_data(\n target=\"load\",\n df_pipeline=df_pipeline,\n config=config,\n data_with_exog=data_with_exog,\n exog_feature_names=[\"hour_sin\", \"hour_cos\"],\n exo_pred=exo_pred,\n start_train_ts=start_ts,\n end_train_ts=end_ts,\n)\nprint(f\"y_train length: {len(y_train)}\")\nprint(f\"exog_train shape: {exog_train.shape}\")\nprint(f\"exog_future shape: {exog_future.shape}\")\nprint(f\"exog_train dtype: {exog_train.dtypes.iloc[0]}\")\n```\n\n::: {.cell-output .cell-output-stdout}\n```\ny_train length: 168\nexog_train shape: (168, 2)\nexog_future shape: (24, 2)\nexog_train dtype: float32\n```\n:::\n:::\n\n\n", + "markdown": "---\ntitle: manager.features.get_target_data\n---\n\n\n\n```python\nmanager.features.get_target_data(\n target,\n df_pipeline,\n config,\n *,\n data_with_exog=None,\n exog_feature_names=None,\n exo_pred=None,\n zone_weather=None,\n start_train_ts,\n end_train_ts,\n)\n```\n\nExtract the training series and exogenous slices for one target column.\n\nClips the target column of *df_pipeline* to the training window defined by\n*start_train_ts* and *end_train_ts*. 
When exogenous features are enabled\n(``config.use_exogenous_features is True``) and *data_with_exog* is\nprovided, the matching exogenous training slice and forecast-horizon slice\nare also returned; otherwise both are ``None``.\n\nThis function is the canonical way to extract per-target data from the\nshared pipeline state so that outlier removal, imputation, and feature\nengineering are applied consistently across all forecasting tasks.\n\nThe training-window timestamps are supplied as explicit parameters so that\nthis helper stays decoupled from ``RunState`` (ADR\n``adr-multitask-configmulti-merge``, step 5). Both parameters are\nrequired; passing ``None`` raises ``ValueError``.\n\n## Parameters {.doc-section .doc-section-parameters}\n\n| Name | Type | Description | Default |\n|--------------------|---------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------|\n| target | [str](`str`) | Name of the target column to extract from *df_pipeline*. | _required_ |\n| df_pipeline | [pd](`pandas`).[DataFrame](`pandas.DataFrame`) | DataFrame with a tz-aware `DatetimeIndex` containing all target columns produced by the preprocessing pipeline. | _required_ |\n| config | \\'ConfigMulti\\' | Pipeline configuration object. ``use_exogenous_features`` must be set. | _required_ |\n| data_with_exog | [Optional](`typing.Optional`)\\[[pd](`pandas`).[DataFrame](`pandas.DataFrame`)\\] | Merged DataFrame of target and exogenous columns covering at least the training window. Required when ``config.use_exogenous_features`` is ``True``. Pass ``None`` (default) to skip exogenous slicing. 
| `None` |\n| exog_feature_names | [Optional](`typing.Optional`)\\[[List](`typing.List`)\\[[str](`str`)\\]\\] | Column names to select from *data_with_exog* and *exo_pred*. Required when *data_with_exog* is not ``None``. Pass ``None`` (default) when exogenous features are disabled. | `None` |\n| exo_pred | [Optional](`typing.Optional`)\\[[pd](`pandas`).[DataFrame](`pandas.DataFrame`)\\] | Exogenous feature DataFrame covering the forecast horizon. Required when *data_with_exog* is not ``None``. Pass ``None`` (default) when exogenous features are disabled. | `None` |\n| zone_weather | [Optional](`typing.Optional`)\\[[pd](`pandas`).[DataFrame](`pandas.DataFrame`)\\] | Per-zone weather frame whose columns, where present in ``exog_feature_names``, overwrite the shared weather values for this target. Used by the per-zone weather feature (``config.per_zone_weather=True``). ``None`` (default) means the shared weather columns are used unchanged. The overwrite is in-place on the sliced copies; column order and shape are preserved. | `None` |\n| start_train_ts | [pd](`pandas`).[Timestamp](`pandas.Timestamp`) | Inclusive start of the training window (tz-aware ``pd.Timestamp``). **Keyword-only, required** — pass ``task.run_state.start_train_ts`` after the pipeline has been prepared. Passing ``None`` raises ``ValueError``. | _required_ |\n| end_train_ts | [pd](`pandas`).[Timestamp](`pandas.Timestamp`) | Inclusive end of the training window (tz-aware ``pd.Timestamp``). **Keyword-only, required** — pass ``task.run_state.end_train_ts`` after the pipeline has been prepared. Passing ``None`` raises ``ValueError``. 
| _required_ |\n\n## Returns {.doc-section .doc-section-returns}\n\n| Name | Type | Description |\n|--------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|\n| | [pd](`pandas`).[Series](`pandas.Series`) | Tuple[pd.Series, Optional[pd.DataFrame], Optional[pd.DataFrame]]: |\n| | [Optional](`typing.Optional`)\\[[pd](`pandas`).[DataFrame](`pandas.DataFrame`)\\] | A three-tuple ``(y_train, exog_train, exog_future)`` where: |\n| | [Optional](`typing.Optional`)\\[[pd](`pandas`).[DataFrame](`pandas.DataFrame`)\\] | - **y_train** — 1-D Series with the target values over the training window, squeezed to a plain `Series`. |\n| | [Tuple](`typing.Tuple`)\\[[pd](`pandas`).[Series](`pandas.Series`), [Optional](`typing.Optional`)\\[[pd](`pandas`).[DataFrame](`pandas.DataFrame`)\\], [Optional](`typing.Optional`)\\[[pd](`pandas`).[DataFrame](`pandas.DataFrame`)\\]\\] | - **exog_train** — DataFrame of selected exogenous features over the training window, cast to ``float32``. ``None`` when exogenous features are disabled or *data_with_exog* is ``None``. |\n| | [Tuple](`typing.Tuple`)\\[[pd](`pandas`).[Series](`pandas.Series`), [Optional](`typing.Optional`)\\[[pd](`pandas`).[DataFrame](`pandas.DataFrame`)\\], [Optional](`typing.Optional`)\\[[pd](`pandas`).[DataFrame](`pandas.DataFrame`)\\]\\] | - **exog_future** — DataFrame of selected exogenous features covering the forecast horizon, cast to ``float32``. ``None`` when exogenous features are disabled or *exo_pred* is ``None``. 
|\n\n## Examples {.doc-section .doc-section-examples}\n\nExtract training data for a single target without exogenous features:\n\n\n::: {#e88fd84e .cell execution_count=1}\n``` {.python .cell-code}\nimport pandas as pd\nimport numpy as np\nfrom spotforecast2_safe.manager.features import get_target_data\nfrom spotforecast2_safe.configurator.config_multi import ConfigMulti\n\nidx = pd.date_range(\"2024-01-01\", periods=168, freq=\"h\", tz=\"UTC\")\ndf_pipeline = pd.DataFrame({\"load\": np.random.default_rng(0).normal(100, 10, 168)}, index=idx)\n\nconfig = ConfigMulti(\n targets=[\"load\"],\n use_exogenous_features=False,\n)\nstart_ts = pd.Timestamp(\"2024-01-01 00:00\", tz=\"UTC\")\nend_ts = pd.Timestamp(\"2024-01-07 23:00\", tz=\"UTC\")\n\ny_train, exog_train, exog_future = get_target_data(\n target=\"load\",\n df_pipeline=df_pipeline,\n config=config,\n start_train_ts=start_ts,\n end_train_ts=end_ts,\n)\nprint(f\"y_train length: {len(y_train)}\")\nprint(f\"exog_train: {exog_train}\")\nprint(f\"exog_future: {exog_future}\")\n```\n\n::: {.cell-output .cell-output-stdout}\n```\ny_train length: 168\nexog_train: None\nexog_future: None\n```\n:::\n:::\n\n\nExtract training data with exogenous features enabled:\n\n::: {#0fee6abc .cell execution_count=2}\n``` {.python .cell-code}\nimport pandas as pd\nimport numpy as np\nfrom spotforecast2_safe.manager.features import get_target_data\nfrom spotforecast2_safe.configurator.config_multi import ConfigMulti\n\nrng = np.random.default_rng(1)\nidx_train = pd.date_range(\"2024-01-01\", periods=168, freq=\"h\", tz=\"UTC\")\nidx_future = pd.date_range(\"2024-01-08\", periods=24, freq=\"h\", tz=\"UTC\")\n\ndf_pipeline = pd.DataFrame({\"load\": rng.normal(100, 10, 168)}, index=idx_train)\n\ndata_with_exog = pd.DataFrame(\n {\n \"load\": df_pipeline[\"load\"],\n \"hour_sin\": np.sin(2 * np.pi * idx_train.hour / 24),\n \"hour_cos\": np.cos(2 * np.pi * idx_train.hour / 24),\n },\n index=idx_train,\n)\nexo_pred = pd.DataFrame(\n {\n 
\"hour_sin\": np.sin(2 * np.pi * idx_future.hour / 24),\n \"hour_cos\": np.cos(2 * np.pi * idx_future.hour / 24),\n },\n index=idx_future,\n)\n\nstart_ts = pd.Timestamp(\"2024-01-01 00:00\", tz=\"UTC\")\nend_ts = pd.Timestamp(\"2024-01-07 23:00\", tz=\"UTC\")\nconfig = ConfigMulti(targets=[\"load\"], use_exogenous_features=True)\n\ny_train, exog_train, exog_future = get_target_data(\n target=\"load\",\n df_pipeline=df_pipeline,\n config=config,\n data_with_exog=data_with_exog,\n exog_feature_names=[\"hour_sin\", \"hour_cos\"],\n exo_pred=exo_pred,\n start_train_ts=start_ts,\n end_train_ts=end_ts,\n)\nprint(f\"y_train length: {len(y_train)}\")\nprint(f\"exog_train shape: {exog_train.shape}\")\nprint(f\"exog_future shape: {exog_future.shape}\")\nprint(f\"exog_train dtype: {exog_train.dtypes.iloc[0]}\")\n```\n\n::: {.cell-output .cell-output-stdout}\n```\ny_train length: 168\nexog_train shape: (168, 2)\nexog_future shape: (24, 2)\nexog_train dtype: float32\n```\n:::\n:::\n\n\n", "supporting": [ "manager.features.get_target_data_files/figure-html" ], diff --git a/_freeze/docs/reference/multitask.base.BaseTask/execute-results/html.json b/_freeze/docs/reference/multitask.base.BaseTask/execute-results/html.json index 4c5e5ab8..89fac421 100644 --- a/_freeze/docs/reference/multitask.base.BaseTask/execute-results/html.json +++ b/_freeze/docs/reference/multitask.base.BaseTask/execute-results/html.json @@ -1,8 +1,8 @@ { - "hash": "25d10dc1682aae1b5342eabd5adefdba", + "hash": "f47f584444e30cd7debc56acc5bd4ea4", "result": { "engine": "jupyter", - "markdown": "---\ntitle: multitask.base.BaseTask\n---\n\n\n\n```python\nmultitask.base.BaseTask(\n config=None,\n *,\n dataframe=None,\n data_test=None,\n cache_home=None,\n log_level=logging.INFO,\n **overrides,\n)\n```\n\nShared base for all multi-target forecasting pipeline tasks.\n\n``BaseTask`` encapsulates the data-preparation pipeline (steps 1-7)\nand all helper methods shared across the task modes (lazy, defaults,\npredict, 
clean). Subclasses implement the run method with task-specific\ntraining or prediction logic.\n\nThe constructor takes a single ``config`` object satisfying the\n``PipelineConfig`` protocol — typically a ``ConfigMulti``. All pipeline\nparameters (forecast horizon, training window, outlier policy, weather/\nholiday hooks, cross-validation fold count, persistence policy, ...) live\non that object. Only genuinely call-time state (the dataframes, the cache\ndirectory override, the logging level) is passed as separate kwargs.\nExtra ``**overrides`` are forwarded to ``config.set_params`` and mutate\nthe passed-in config in place.\n\nPlotting is not available in ``spotforecast2-safe``. The\n``_show_prediction_figure`` and ``_show_prediction_figure_agg`` hook\nmethods are no-ops; override them in a subclass or use the\n``spotforecast2`` sibling package for interactive visualisation.\n\n## Parameters {.doc-section .doc-section-parameters}\n\n| Name | Type | Description | Default |\n|-------------|-------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------|----------------|\n| config | [Optional](`typing.Optional`)\\[[PipelineConfig](`spotforecast2_safe.multitask.base.PipelineConfig`)\\] | A ``PipelineConfig``-conforming object owning every pipeline parameter. ``ConfigMulti`` satisfies the protocol. | `None` |\n| dataframe | [Optional](`typing.Optional`)\\[[pd](`pandas`).[DataFrame](`pandas.DataFrame`)\\] | Pre-loaded input DataFrame with training data. Must contain a datetime column matching ``config.index_name`` plus at least one numeric target column. | `None` |\n| data_test | [Optional](`typing.Optional`)\\[[pd](`pandas`).[DataFrame](`pandas.DataFrame`)\\] | Pre-loaded test DataFrame (ground truth for the forecast horizon). Optional. 
| `None` |\n| cache_home | [Optional](`typing.Optional`)\\[[Path](`pathlib.Path`)\\] | Cache directory override. When not ``None``, replaces ``config.cache_home`` for this task instance. | `None` |\n| log_level | [int](`int`) | Logging level for the pipeline logger. | `logging.INFO` |\n| **overrides | [Any](`typing.Any`) | Forwarded to ``config.set_params(**overrides)`` — a convenience for one-line tweaks without building a fresh config. Mutates the caller's config object. | `{}` |\n\n## Attributes {.doc-section .doc-section-attributes}\n\n| Name | Type | Description |\n|--------------------|----------------------------------------------------------------------|--------------------------------------------------------|\n| config | [PipelineConfig](`spotforecast2_safe.multitask.base.PipelineConfig`) | Centralised pipeline configuration. |\n| df_pipeline | [pd](`pandas`).[DataFrame](`pandas.DataFrame`) | Pipeline DataFrame after preparation. |\n| df_test | [pd](`pandas`).[DataFrame](`pandas.DataFrame`) | Test DataFrame (ground truth). |\n| weight_func | [Optional](`typing.Optional`)\\[[Any](`typing.Any`)\\] | Sample-weight function from imputation. |\n| exogenous_features | [pd](`pandas`).[DataFrame](`pandas.DataFrame`) | Combined exogenous feature matrix. |\n| exog_feature_names | [List](`typing.List`)\\[[str](`str`)\\] | Selected exogenous feature names. |\n| data_with_exog | [pd](`pandas`).[DataFrame](`pandas.DataFrame`) | Merged target + exogenous data. |\n| exo_pred | [pd](`pandas`).[DataFrame](`pandas.DataFrame`) | Exogenous covariates for the forecast horizon. |\n| results | [Dict](`typing.Dict`)\\[[str](`str`), [Dict](`typing.Dict`)\\] | Per-task mapping of target name to prediction package. |\n| agg_results | [Dict](`typing.Dict`) | Mapping of task name to aggregated prediction package. 
|\n\n## Examples {.doc-section .doc-section-examples}\n\n\n::: {#e686c3d0 .cell execution_count=1}\n``` {.python .cell-code}\nimport tempfile\nimport numpy as np\nimport pandas as pd\nfrom spotforecast2_safe.multitask.base import BaseTask\nfrom spotforecast2_safe.configurator.config_multi import ConfigMulti\n\nrng = np.random.default_rng(0)\nidx = pd.date_range(\"2023-01-01\", periods=24 * 7, freq=\"h\", tz=\"UTC\")\ndf = pd.DataFrame({\"load\": rng.normal(500, 30, len(idx))}, index=idx)\ndf.index.name = \"DateTime\"\n\nwith tempfile.TemporaryDirectory() as tmp:\n cfg = ConfigMulti(\n predict_size=6,\n use_exogenous_features=False,\n use_outlier_detection=False,\n cache_home=tmp,\n auto_save_models=False,\n verbose=False,\n )\n task = BaseTask(cfg, dataframe=df)\n print(f\"Task mode: {task.TASK}\")\n print(f\"Config predict_size: {task.config.predict_size}\")\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nTask mode: lazy\nConfig predict_size: 6\n```\n:::\n:::\n\n\n## Methods\n\n| Name | Description |\n| --- | --- |\n| [agg_predictor](#spotforecast2_safe.multitask.base.BaseTask.agg_predictor) | Aggregate per-target prediction packages into a weighted forecast. |\n| [build_exogenous_features](#spotforecast2_safe.multitask.base.BaseTask.build_exogenous_features) | Build, combine, encode, and merge exogenous feature covariates. |\n| [create_forecaster](#spotforecast2_safe.multitask.base.BaseTask.create_forecaster) | Create a fresh forecaster for the given target. |\n| [cv_ts](#spotforecast2_safe.multitask.base.BaseTask.cv_ts) | Build a ``TimeSeriesFold`` for cross-validation. |\n| [detect_outliers](#spotforecast2_safe.multitask.base.BaseTask.detect_outliers) | Apply hard-bound filtering and IsolationForest outlier detection. |\n| [impute](#spotforecast2_safe.multitask.base.BaseTask.impute) | Fill missing values using the configured imputation strategy. 
|\n| [load_models](#spotforecast2_safe.multitask.base.BaseTask.load_models) | Load the most recent fitted models from the cache directory. |\n| [load_tuning_results](#spotforecast2_safe.multitask.base.BaseTask.load_tuning_results) | Load the most recent tuning results for a target from cache. |\n| [log_summary](#spotforecast2_safe.multitask.base.BaseTask.log_summary) | Log a summary of the current pipeline configuration. |\n| [plot_with_outliers](#spotforecast2_safe.multitask.base.BaseTask.plot_with_outliers) | Visualise original vs. cleaned data with outlier markers. |\n| [prepare_data](#spotforecast2_safe.multitask.base.BaseTask.prepare_data) | Load, resample, validate, and configure the pipeline data. |\n| [run](#spotforecast2_safe.multitask.base.BaseTask.run) | Execute the task-specific training / prediction pipeline. |\n| [save_models](#spotforecast2_safe.multitask.base.BaseTask.save_models) | Save fitted forecaster models to the cache directory. |\n| [save_tuning_results](#spotforecast2_safe.multitask.base.BaseTask.save_tuning_results) | Save tuning results (best parameters and lags) to a JSON file. 
|\n\n### agg_predictor { #spotforecast2_safe.multitask.base.BaseTask.agg_predictor }\n\n```python\nmultitask.base.BaseTask.agg_predictor(results, targets, weights)\n```\n\nAggregate per-target prediction packages into a weighted forecast.\n\nDelegates to the module-level ``agg_predictor`` function.\nAvailable as an instance method so that subclasses can override the\naggregation strategy when needed.\n\n#### Parameters {.doc-section .doc-section-parameters}\n\n| Name | Type | Description | Default |\n|---------|---------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------|------------|\n| results | [Dict](`typing.Dict`)\\[[str](`str`), [Dict](`typing.Dict`)\\[[str](`str`), [Any](`typing.Any`)\\]\\] | Mapping of target name to prediction package (as returned by ``build_prediction_package``). | _required_ |\n| targets | [List](`typing.List`)\\[[str](`str`)\\] | Ordered list of target names to include. | _required_ |\n| weights | [List](`typing.List`)\\[[float](`float`)\\] | Per-target aggregation weights aligned with ``targets``. | _required_ |\n\n#### Returns {.doc-section .doc-section-returns}\n\n| Name | Type | Description |\n|--------|------------------------------------------------------------|-------------------------------------|\n| | [Dict](`typing.Dict`)\\[[str](`str`), [Any](`typing.Any`)\\] | Aggregated prediction package dict. 
|\n\n#### Examples {.doc-section .doc-section-examples}\n\n::: {#5a996f74 .cell execution_count=2}\n``` {.python .cell-code}\nimport tempfile\nimport numpy as np\nimport pandas as pd\nfrom spotforecast2_safe.multitask import LazyTask\nfrom spotforecast2_safe.configurator.config_multi import ConfigMulti\n\nrng = np.random.default_rng(0)\nidx_train = pd.date_range(\"2023-01-01\", periods=48, freq=\"h\", tz=\"UTC\")\nidx_future = pd.date_range(\"2023-01-03\", periods=6, freq=\"h\", tz=\"UTC\")\n\ndef _pkg(train_val, future_val):\n return {\n \"train_actual\": pd.Series(np.full(48, train_val), index=idx_train),\n \"train_pred\": pd.Series(np.full(48, train_val * 0.99), index=idx_train),\n \"future_pred\": pd.Series(np.full(6, future_val), index=idx_future),\n \"future_actual\": pd.Series(dtype=\"float64\"),\n }\n\nwith tempfile.TemporaryDirectory() as tmp:\n cfg = ConfigMulti(cache_home=tmp, verbose=False)\n task = LazyTask(cfg)\n results = {\"wind\": _pkg(100.0, 110.0), \"solar\": _pkg(200.0, 210.0)}\n agg = task.agg_predictor(results, [\"wind\", \"solar\"], [0.4, 0.6])\n print(f\"Weighted future_pred: {agg['future_pred'].iloc[0]:.1f}\")\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nWeighted future_pred: 170.0\n```\n:::\n:::\n\n\n### build_exogenous_features { #spotforecast2_safe.multitask.base.BaseTask.build_exogenous_features }\n\n```python\nmultitask.base.BaseTask.build_exogenous_features()\n```\n\nBuild, combine, encode, and merge exogenous feature covariates.\n\nThis is step 4-7 of the pipeline (run after ``prepare_data``,\n``detect_outliers``, and ``impute``). It assembles the full\nexogenous-covariate matrix that the forecaster consumes, then merges\nit onto the target data. The orchestration proceeds in order:\n\n* 4a — Weather, via ``get_weather_features`` (Open-Meteo). 
The\n response is parquet-cached only when ``config.cache_home`` is set.\n Fetch failures are handled per ``config.on_weather_failure``:\n ``\"raise\"`` re-raises ``WeatherFetchError``; ``\"skip\"`` logs a\n warning and continues with an empty weather frame (fail-safe).\n* 4b — Calendar features, via ``get_calendar_features``.\n* 4c — Day/night (solar) features, via ``get_day_night_features``\n (computed with ``astral`` from ``config.latitude`` /\n ``config.longitude``).\n* 4d — Holiday features, via ``get_holiday_features`` for\n ``config.country_code`` / ``config.state``.\n* 5 — The four frames are concatenated along the columns and any\n residual gaps are back- then forward-filled. Provider-based\n exogenous columns are then appended via\n ``build_providers_from_config`` (requires ``spotforecast2-safe``\n >= 15.7.0). The active providers are governed by the config flags\n ``include_covid_infection_rate``,\n ``include_entsoe_forecast_load``,\n ``include_entsoe_renewable_forecast``,\n ``include_entsoe_net_load``, and\n ``include_entsoe_day_ahead_price``. Cyclical (sine/cosine)\n encoding is then applied via ``apply_cyclical_encoding``, and\n degree-``config.poly_features_degree`` interaction terms are added\n via ``create_interaction_features``. 
When the degree is at least\n 2, the polynomial columns are ranked by mutual information with the\n primary target and capped to ``config.max_poly_features`` via\n ``select_top_poly_features``.\n* 6 — The training feature set is chosen via\n ``select_exogenous_features``, with provider columns appended\n (order-preserving, de-duplicated).\n* 7 — Targets and covariates are merged via\n ``merge_data_and_covariates`` into ``self.data_with_exog`` and the\n forecast-horizon covariates ``self.exo_pred``.\n\nWhen ``config.use_exogenous_features`` is ``False`` the method is a\nno-op and returns ``self`` immediately, leaving the pipeline\ntarget-only.\n\n#### Attributes {.doc-section .doc-section-attributes}\n\n| Name | Type | Description |\n|--------------------|------------------------------------------------|-------------------------------------------------------------------------------------------------|\n| weather_aligned | [pd](`pandas`).[DataFrame](`pandas.DataFrame`) | Weather frame aligned to the pipeline index, reused by the interaction and selection steps. |\n| exogenous_features | [pd](`pandas`).[DataFrame](`pandas.DataFrame`) | Full combined, encoded, and capped exogenous feature matrix. |\n| exog_feature_names | [List](`typing.List`)\\[[str](`str`)\\] | Names of the exogenous features selected for training (including provider columns). |\n| data_with_exog | [pd](`pandas`).[DataFrame](`pandas.DataFrame`) | Target data merged with the selected exogenous covariates. |\n| exo_pred | [pd](`pandas`).[DataFrame](`pandas.DataFrame`) | Exogenous covariates spanning the forecast horizon, supplied to the forecaster at predict time. |\n\n#### Returns {.doc-section .doc-section-returns}\n\n| Name | Type | Description |\n|--------|----------------------------------------------------------|---------------------------------|\n| | [BaseTask](`spotforecast2_safe.multitask.base.BaseTask`) | ``self`` (for method chaining). 
|\n\n#### Raises {.doc-section .doc-section-raises}\n\n| Name | Type | Description |\n|--------|---------------------------------------------------------------------|-----------------------------------------------------------------------------|\n| | [RuntimeError](`RuntimeError`) | If ``prepare_data`` has not been called. |\n| | [WeatherFetchError](`spotforecast2_safe.weather.WeatherFetchError`) | If the Open-Meteo fetch fails and ``config.on_weather_failure == \"raise\"``. |\n\n#### Examples {.doc-section .doc-section-examples}\n\nWith exogenous features disabled the method is a no-op, so the\nexample below runs without any network access and leaves the\npipeline target-only.\n\n::: {#11e8ae02 .cell execution_count=3}\n``` {.python .cell-code}\nimport tempfile\nimport pandas as pd\nimport numpy as np\nfrom spotforecast2_safe.multitask import MultiTask\nfrom spotforecast2_safe.configurator.config_multi import ConfigMulti\n\nrng = np.random.default_rng(0)\nidx = pd.date_range(\"2023-01-01\", periods=24 * 14, freq=\"h\", tz=\"UTC\")\ndf = pd.DataFrame({\"a\": rng.normal(100, 10, len(idx))}, index=idx)\ndf.index.name = \"DateTime\"\n\nwith tempfile.TemporaryDirectory() as tmp:\n cfg = ConfigMulti(\n predict_size=6,\n use_exogenous_features=False,\n use_outlier_detection=False,\n cache_home=tmp,\n )\n mt = MultiTask(cfg, dataframe=df)\n mt.prepare_data().detect_outliers().impute().build_exogenous_features()\n print(f\"Exogenous features used: {mt.config.use_exogenous_features}\")\n print(f\"Selected exog feature names: {mt.exog_feature_names}\")\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nExogenous features used: False\nSelected exog feature names: []\n```\n:::\n:::\n\n\n### create_forecaster { #spotforecast2_safe.multitask.base.BaseTask.create_forecaster }\n\n```python\nmultitask.base.BaseTask.create_forecaster(target=None)\n```\n\nCreate a fresh forecaster for the given target.\n\nDelegates to ``config.forecaster_factory`` when set; otherwise falls\nback to 
``default_lgbm_forecaster_factory``. This factory hook lets\ncallers swap the estimator without subclassing ``BaseTask``.\n\n#### Parameters {.doc-section .doc-section-parameters}\n\n| Name | Type | Description | Default |\n|--------|-----------------------------------------------|------------------------------------------------------------------------------------------------------------|-----------|\n| target | [Optional](`typing.Optional`)\\[[str](`str`)\\] | Optional target column name. Forwarded to the factory so that custom factories can specialise per target. | `None` |\n\n#### Returns {.doc-section .doc-section-returns}\n\n| Name | Type | Description |\n|--------|---------------------|--------------------------------------|\n| | [Any](`typing.Any`) | A new, unfitted forecaster instance. |\n\n#### Examples {.doc-section .doc-section-examples}\n\n::: {#f3df1cc5 .cell execution_count=4}\n``` {.python .cell-code}\nimport tempfile\nfrom pathlib import Path\nfrom spotforecast2_safe.multitask import LazyTask\nfrom spotforecast2_safe.configurator.config_multi import ConfigMulti\n\nwith tempfile.TemporaryDirectory() as tmp:\n cfg = ConfigMulti(\n predict_size=6,\n use_exogenous_features=False,\n cache_home=Path(tmp),\n )\n task = LazyTask(cfg)\n forecaster = task.create_forecaster()\nprint(f\"Type: {type(forecaster).__name__}\")\nprint(f\"Lags: {forecaster.lags}\")\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nType: ForecasterRecursive\nLags: [ 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23]\n```\n:::\n:::\n\n\n### cv_ts { #spotforecast2_safe.multitask.base.BaseTask.cv_ts }\n\n```python\nmultitask.base.BaseTask.cv_ts(y_train)\n```\n\nBuild a ``TimeSeriesFold`` for cross-validation.\n\nConstructs the cross-validation splitter used by all tuning tasks.\nInternally uses ``sklearn.model_selection.TimeSeriesSplit`` to\ncompute split boundaries that respect temporal ordering and avoid\ndata leakage between folds.\n\nThe validation boundary is determined 
by ``run_state.end_train_ts`` minus\n``config.delta_val``. When ``config.train_size`` is set, the sklearn\nsplitter uses a sliding fixed-size training window\n(``max_train_size``); otherwise an expanding window is used.\n\n#### Parameters {.doc-section .doc-section-parameters}\n\n| Name | Type | Description | Default |\n|---------|------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------|\n| y_train | [pd](`pandas`).[Series](`pandas.Series`) | Training time series for the current target. Used both to determine the validation boundary and as the sequence passed to ``TimeSeriesSplit.split`` to derive ``initial_train_size``. | _required_ |\n\n#### Returns {.doc-section .doc-section-returns}\n\n| Name | Type | Description |\n|--------|----------------------------------------------------------------------------|----------------------------------------------------------------|\n| | [TimeSeriesFold](`spotforecast2_safe.splitter.split_ts_cv.TimeSeriesFold`) | A configured ``TimeSeriesFold`` instance ready to be passed to |\n| | [TimeSeriesFold](`spotforecast2_safe.splitter.split_ts_cv.TimeSeriesFold`) | a model-selection function. 
|\n\n#### Examples {.doc-section .doc-section-examples}\n\n::: {#3e08117f .cell execution_count=5}\n``` {.python .cell-code}\nimport tempfile\nimport numpy as np\nimport pandas as pd\nfrom spotforecast2_safe.multitask import MultiTask\nfrom spotforecast2_safe.configurator.config_multi import ConfigMulti\n\nrng = np.random.default_rng(0)\nidx = pd.date_range(\"2023-01-01\", periods=24 * 14, freq=\"h\", tz=\"UTC\")\ndf = pd.DataFrame({\"a\": rng.normal(100, 10, len(idx))}, index=idx)\ndf.index.name = \"DateTime\"\n\nwith tempfile.TemporaryDirectory() as tmp:\n cfg = ConfigMulti(\n predict_size=6,\n use_exogenous_features=False,\n use_outlier_detection=False,\n cache_home=tmp,\n number_folds=2,\n auto_save_models=False,\n verbose=False,\n )\n mt = MultiTask(cfg, dataframe=df)\n mt.prepare_data().detect_outliers().impute().build_exogenous_features()\n y_train = mt.df_pipeline[\"a\"]\n cv = mt.cv_ts(y_train)\n print(f\"TimeSeriesFold steps: {cv.steps}\")\n print(f\"initial_train_size: {cv.initial_train_size}\")\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nTimeSeriesFold steps: 6\ninitial_train_size: 324\n```\n:::\n:::\n\n\n### detect_outliers { #spotforecast2_safe.multitask.base.BaseTask.detect_outliers }\n\n```python\nmultitask.base.BaseTask.detect_outliers()\n```\n\nApply hard-bound filtering and IsolationForest outlier detection.\n\nHard bounds from ``config.bounds`` are applied to the pipeline data\n(out-of-bound values are removed and later filled by ``impute()``).\nIsolationForest detection (``config.use_outlier_detection``) is\nadvisory: detected outliers are logged per column but not removed.\n\n#### Returns {.doc-section .doc-section-returns}\n\n| Name | Type | Description |\n|--------|----------------------------------------------------------|---------------------------------|\n| | [BaseTask](`spotforecast2_safe.multitask.base.BaseTask`) | ``self`` (for method chaining). 
|\n\n#### Raises {.doc-section .doc-section-raises}\n\n| Name | Type | Description |\n|--------|--------------------------------|-------------------------------------------------|\n| | [RuntimeError](`RuntimeError`) | If method ``prepare_data`` has not been called. |\n\n#### Examples {.doc-section .doc-section-examples}\n\n::: {#168e8951 .cell execution_count=6}\n``` {.python .cell-code}\nimport tempfile\nimport numpy as np\nimport pandas as pd\nfrom spotforecast2_safe.multitask import MultiTask\nfrom spotforecast2_safe.configurator.config_multi import ConfigMulti\n\nrng = np.random.default_rng(0)\nidx = pd.date_range(\"2023-01-01\", periods=24 * 14, freq=\"h\", tz=\"UTC\")\ndf = pd.DataFrame({\"a\": rng.normal(100, 10, len(idx))}, index=idx)\ndf.index.name = \"DateTime\"\n\nwith tempfile.TemporaryDirectory() as tmp:\n cfg = ConfigMulti(\n predict_size=6,\n use_exogenous_features=False,\n use_outlier_detection=False,\n cache_home=tmp,\n auto_save_models=False,\n verbose=False,\n )\n mt = MultiTask(cfg, dataframe=df)\n mt.prepare_data()\n mt.detect_outliers()\n print(f\"Pipeline shape: {mt.df_pipeline.shape}\")\n assert mt.df_pipeline_original is not None\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nPipeline shape: (336, 1)\n```\n:::\n:::\n\n\n### impute { #spotforecast2_safe.multitask.base.BaseTask.impute }\n\n```python\nmultitask.base.BaseTask.impute()\n```\n\nFill missing values using the configured imputation strategy.\n\n#### Returns {.doc-section .doc-section-returns}\n\n| Name | Type | Description |\n|--------|----------------------------------------------------------|---------------------------------|\n| | [BaseTask](`spotforecast2_safe.multitask.base.BaseTask`) | ``self`` (for method chaining). 
|\n\n#### Raises {.doc-section .doc-section-raises}\n\n| Name | Type | Description |\n|--------|--------------------------------|-------------------------------------------------|\n| | [RuntimeError](`RuntimeError`) | If method ``prepare_data`` has not been called. |\n\n#### Examples {.doc-section .doc-section-examples}\n\n::: {#559a6dd2 .cell execution_count=7}\n``` {.python .cell-code}\nimport tempfile\nimport numpy as np\nimport pandas as pd\nfrom spotforecast2_safe.multitask import MultiTask\nfrom spotforecast2_safe.configurator.config_multi import ConfigMulti\n\nrng = np.random.default_rng(0)\nidx = pd.date_range(\"2023-01-01\", periods=24 * 14, freq=\"h\", tz=\"UTC\")\nvalues = rng.normal(100, 10, len(idx))\nvalues[10:13] = float(\"nan\") # inject a few gaps\ndf = pd.DataFrame({\"a\": values}, index=idx)\ndf.index.name = \"DateTime\"\n\nwith tempfile.TemporaryDirectory() as tmp:\n cfg = ConfigMulti(\n predict_size=6,\n use_exogenous_features=False,\n use_outlier_detection=False,\n cache_home=tmp,\n auto_save_models=False,\n verbose=False,\n )\n mt = MultiTask(cfg, dataframe=df)\n mt.prepare_data().detect_outliers().impute()\n missing = mt.df_pipeline[\"a\"].isna().sum()\n print(f\"Missing values after imputation: {missing}\")\n assert missing == 0\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nMissing values after imputation: 0\n```\n:::\n:::\n\n\n### load_models { #spotforecast2_safe.multitask.base.BaseTask.load_models }\n\n```python\nmultitask.base.BaseTask.load_models(\n task_name=None,\n target=None,\n max_age_days=None,\n)\n```\n\nLoad the most recent fitted models from the cache directory.\n\nScans ``/models//`` for ``.joblib``\nfiles matching the current ``data_frame_name``. 
Optionally\nfilters by ``task_name``, ``target``, and ``max_age_days``.\n\n#### Parameters {.doc-section .doc-section-parameters}\n\n| Name | Type | Description | Default |\n|--------------|---------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------|-----------|\n| task_name | [Optional](`typing.Optional`)\\[[str](`str`)\\] | If given, only load models from this task (``\"lazy\"``, ``\"defaults\"``, ``\"optuna\"``, or ``\"spotoptim\"``). ``None`` accepts any task. | `None` |\n| target | [Optional](`typing.Optional`)\\[[str](`str`)\\] | If given, only load the model for this target column. ``None`` loads the most recent model for every target found. | `None` |\n| max_age_days | [Optional](`typing.Optional`)\\[[float](`float`)\\] | Maximum age in days. Models older than this are ignored. ``None`` accepts any age. | `None` |\n\n#### Returns {.doc-section .doc-section-returns}\n\n| Name | Type | Description |\n|--------|------------------------------------------------------------|-----------------------------------------------------------|\n| | [Dict](`typing.Dict`)\\[[str](`str`), [Any](`typing.Any`)\\] | Mapping ``{target: forecaster}`` of loaded model objects. |\n| | [Dict](`typing.Dict`)\\[[str](`str`), [Any](`typing.Any`)\\] | Empty dict if no matching models were found. 
|\n\n#### Examples {.doc-section .doc-section-examples}\n\n::: {#97daebbb .cell execution_count=8}\n``` {.python .cell-code}\nimport tempfile\nfrom pathlib import Path\nfrom spotforecast2_safe.multitask import LazyTask\nfrom spotforecast2_safe.configurator.config_multi import ConfigMulti\n\nwith tempfile.TemporaryDirectory() as tmp:\n cfg = ConfigMulti(\n data_frame_name=\"demo\",\n cache_home=Path(tmp),\n verbose=False,\n )\n task = LazyTask(cfg)\n # Save a dummy object, then load it back.\n dummy_forecaster = {\"lags\": [1, 2, 24]}\n task.save_models(\n task_name=\"lazy\",\n forecasters={\"load\": dummy_forecaster},\n )\n loaded = task.load_models(task_name=\"lazy\")\n print(f\"Loaded targets: {list(loaded.keys())}\")\n assert loaded[\"load\"][\"lags\"] == [1, 2, 24]\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nLoaded targets: ['load']\n```\n:::\n:::\n\n\n### load_tuning_results { #spotforecast2_safe.multitask.base.BaseTask.load_tuning_results }\n\n```python\nmultitask.base.BaseTask.load_tuning_results(\n target,\n task_name=None,\n max_age_days=None,\n)\n```\n\nLoad the most recent tuning results for a target from cache.\n\nScans ``/tuning_results/`` for files matching the\ncurrent ``data_frame_name`` and ``target``. Optionally filters by\n``task_name`` and discards results older than ``max_age_days``.\n\n#### Parameters {.doc-section .doc-section-parameters}\n\n| Name | Type | Description | Default |\n|--------------|---------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------|------------|\n| target | [str](`str`) | Name of the forecast target column. | _required_ |\n| task_name | [Optional](`typing.Optional`)\\[[str](`str`)\\] | If given, only consider results from this tuning algorithm (e.g. ``\"optuna\"`` or ``\"spotoptim\"``). ``None`` accepts any algorithm. 
| `None` |\n| max_age_days | [Optional](`typing.Optional`)\\[[float](`float`)\\] | Maximum age in days. Results older than this are ignored. ``None`` accepts any age. | `None` |\n\n#### Returns {.doc-section .doc-section-returns}\n\n| Name | Type | Description |\n|--------|---------------------------------------------------------------------------------------------|-----------------------------------------------------------|\n| | [Optional](`typing.Optional`)\\[[Dict](`typing.Dict`)\\[[str](`str`), [Any](`typing.Any`)\\]\\] | A dictionary with keys ``best_params``, ``best_lags``, |\n| | [Optional](`typing.Optional`)\\[[Dict](`typing.Dict`)\\[[str](`str`), [Any](`typing.Any`)\\]\\] | ``task_name``, ``target``, ``data_frame_name``, and |\n| | [Optional](`typing.Optional`)\\[[Dict](`typing.Dict`)\\[[str](`str`), [Any](`typing.Any`)\\]\\] | ``timestamp``; or ``None`` if no matching file was found. |\n\n#### Examples {.doc-section .doc-section-examples}\n\n::: {#e3a55e96 .cell execution_count=9}\n``` {.python .cell-code}\nimport tempfile\nfrom pathlib import Path\nfrom spotforecast2_safe.multitask import LazyTask\nfrom spotforecast2_safe.configurator.config_multi import ConfigMulti\n\nwith tempfile.TemporaryDirectory() as tmp:\n cfg = ConfigMulti(data_frame_name=\"demo10\", cache_home=Path(tmp))\n task = LazyTask(cfg)\n task.save_tuning_results(\n target=\"target_0\",\n task_name=\"optuna\",\n best_params={\"n_estimators\": 100},\n best_lags=24,\n )\n result = task.load_tuning_results(target=\"target_0\")\n print(result[\"best_params\"])\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n{'n_estimators': 100}\n```\n:::\n:::\n\n\n### log_summary { #spotforecast2_safe.multitask.base.BaseTask.log_summary }\n\n```python\nmultitask.base.BaseTask.log_summary()\n```\n\nLog a summary of the current pipeline configuration.\n\n#### Examples {.doc-section .doc-section-examples}\n\n::: {#dae810e6 .cell execution_count=10}\n``` {.python .cell-code}\nimport tempfile\nimport numpy 
as np\nimport pandas as pd\nfrom spotforecast2_safe.multitask import MultiTask\nfrom spotforecast2_safe.configurator.config_multi import ConfigMulti\n\nrng = np.random.default_rng(0)\nidx = pd.date_range(\"2023-01-01\", periods=24 * 14, freq=\"h\", tz=\"UTC\")\ndf = pd.DataFrame({\"a\": rng.normal(100, 10, len(idx))}, index=idx)\ndf.index.name = \"DateTime\"\n\nwith tempfile.TemporaryDirectory() as tmp:\n cfg = ConfigMulti(\n predict_size=6,\n use_exogenous_features=False,\n use_outlier_detection=False,\n cache_home=tmp,\n auto_save_models=False,\n verbose=False,\n )\n mt = MultiTask(cfg, dataframe=df)\n mt.prepare_data().detect_outliers().impute().build_exogenous_features()\n # log_summary writes to the pipeline logger; call it to confirm\n # it runs without error.\n mt.log_summary()\n print(\"log_summary completed without error\")\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nlog_summary completed without error\n```\n:::\n:::\n\n\n### plot_with_outliers { #spotforecast2_safe.multitask.base.BaseTask.plot_with_outliers }\n\n```python\nmultitask.base.BaseTask.plot_with_outliers()\n```\n\nVisualise original vs. cleaned data with outlier markers.\n\n#### Raises {.doc-section .doc-section-raises}\n\n| Name | Type | Description |\n|--------|----------------------------------------------|---------------------------------------------------------------------------------------------------------------------|\n| | [RuntimeError](`RuntimeError`) | If method ``detect_outliers`` has not been called. |\n| | [NotImplementedError](`NotImplementedError`) | Always — plotting is not available in ``spotforecast2-safe``. Use the ``spotforecast2`` package for visualisation. 
|\n\n#### Examples {.doc-section .doc-section-examples}\n\n::: {#a4d98db0 .cell execution_count=11}\n``` {.python .cell-code}\nimport tempfile\nimport numpy as np\nimport pandas as pd\nfrom spotforecast2_safe.multitask import MultiTask\nfrom spotforecast2_safe.configurator.config_multi import ConfigMulti\n\nrng = np.random.default_rng(0)\nidx = pd.date_range(\"2023-01-01\", periods=24 * 14, freq=\"h\", tz=\"UTC\")\ndf = pd.DataFrame({\"a\": rng.normal(100, 10, len(idx))}, index=idx)\ndf.index.name = \"DateTime\"\n\nwith tempfile.TemporaryDirectory() as tmp:\n cfg = ConfigMulti(\n predict_size=6,\n use_exogenous_features=False,\n use_outlier_detection=False,\n cache_home=tmp,\n auto_save_models=False,\n verbose=False,\n )\n mt = MultiTask(cfg, dataframe=df)\n mt.prepare_data().detect_outliers()\n try:\n mt.plot_with_outliers()\n except NotImplementedError as exc:\n print(f\"Plotting unavailable in spotforecast2-safe: {exc}\")\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nPlotting unavailable in spotforecast2-safe: Plotting is not available in spotforecast2-safe (no plotly/matplotlib). Use the spotforecast2 package for visualisation.\n```\n:::\n:::\n\n\n### prepare_data { #spotforecast2_safe.multitask.base.BaseTask.prepare_data }\n\n```python\nmultitask.base.BaseTask.prepare_data(demo_data=None, df_test=None)\n```\n\nLoad, resample, validate, and configure the pipeline data.\n\nUses the following precedence for the training data:\n\n1. ``demo_data`` argument (if provided).\n2. ``self._dataframe`` set via the constructor.\n\nSimilarly for test data:\n\n1. ``df_test`` argument (if provided).\n2. ``self.data_test`` set via the constructor.\n3. 
``self.config.test_data_loader(self.config)`` if set.\n\n#### Parameters {.doc-section .doc-section-parameters}\n\n| Name | Type | Description | Default |\n|-----------|---------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------|-----------|\n| demo_data | [Optional](`typing.Optional`)\\[[pd](`pandas`).[DataFrame](`pandas.DataFrame`)\\] | Pre-loaded input DataFrame. When ``None``, the constructor ``dataframe`` is used. | `None` |\n| df_test | [Optional](`typing.Optional`)\\[[pd](`pandas`).[DataFrame](`pandas.DataFrame`)\\] | Pre-loaded test DataFrame. When ``None``, the constructor ``data_test`` is used, then ``config.test_data_loader``. | `None` |\n\n#### Returns {.doc-section .doc-section-returns}\n\n| Name | Type | Description |\n|--------|----------------------------------------------------------|---------------------------------|\n| | [BaseTask](`spotforecast2_safe.multitask.base.BaseTask`) | ``self`` (for method chaining). |\n\n#### Raises {.doc-section .doc-section-raises}\n\n| Name | Type | Description |\n|--------|----------------------------|----------------------------------------------------------------------------------|\n| | [ValueError](`ValueError`) | If no data source is available (no ``demo_data``, no constructor ``dataframe``). 
|\n\n#### Examples {.doc-section .doc-section-examples}\n\n::: {#4c639271 .cell execution_count=12}\n``` {.python .cell-code}\nimport tempfile\nimport pandas as pd\nimport numpy as np\nfrom spotforecast2_safe.multitask import MultiTask\nfrom spotforecast2_safe.configurator.config_multi import ConfigMulti\n\nrng = np.random.default_rng(0)\nidx = pd.date_range(\"2023-01-01\", periods=24 * 14, freq=\"h\", tz=\"UTC\")\ndf = pd.DataFrame({\"a\": rng.normal(100, 10, len(idx))}, index=idx)\ndf.index.name = \"DateTime\"\n\nwith tempfile.TemporaryDirectory() as tmp:\n cfg = ConfigMulti(\n predict_size=6,\n use_exogenous_features=False,\n use_outlier_detection=False,\n cache_home=tmp,\n )\n mt = MultiTask(cfg, dataframe=df)\n mt.prepare_data()\n print(f\"Pipeline shape: {mt.df_pipeline.shape}\")\n print(f\"Targets: {mt.run_state.targets}\")\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nPipeline shape: (336, 1)\nTargets: ['a']\n```\n:::\n:::\n\n\n### run { #spotforecast2_safe.multitask.base.BaseTask.run }\n\n```python\nmultitask.base.BaseTask.run(\n show=False,\n task=None,\n task_name=None,\n use_tuned_params=True,\n max_age_days=None,\n search_space=None,\n dry_run=False,\n cache_home=None,\n **kwargs,\n)\n```\n\nExecute the task-specific training / prediction pipeline.\n\nSubclasses must override this method.\n\n#### Parameters {.doc-section .doc-section-parameters}\n\n| Name | Type | Description | Default |\n|------------------|---------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------|\n| show | [bool](`bool`) | If ``True``, invoke the visualisation hooks (no-ops in this package; meaningful only in ``spotforecast2``). | `False` |\n| task | [Optional](`typing.Optional`)\\[[str](`str`)\\] | Task mode override (used by ``MultiTask``). 
| `None` |\n| task_name | [Optional](`typing.Optional`)\\[[str](`str`)\\] | Restrict model loading to a specific source task (used by ``PredictTask``). | `None` |\n| use_tuned_params | [bool](`bool`) | Load cached tuning results when available (used by ``LazyTask``). | `True` |\n| max_age_days | [Optional](`typing.Optional`)\\[[float](`float`)\\] | Maximum age in days for cached results (used by ``LazyTask`` and ``PredictTask``). Freshness is judged against the wall-clock timestamp embedded in the cache filename, so the check is machine-local. | `None` |\n| search_space | [Optional](`typing.Optional`)\\[[Any](`typing.Any`)\\] | Hyperparameter search-space definition (accepted for API compatibility; not used in this package). | `None` |\n| dry_run | [bool](`bool`) | Report what would be deleted without removing anything (used by ``CleanTask``). | `False` |\n| cache_home | [Optional](`typing.Optional`)\\[[Path](`pathlib.Path`)\\] | Override the cache directory (used by ``CleanTask``). | `None` |\n| **kwargs | [Any](`typing.Any`) | Additional task-specific arguments. | `{}` |\n\n#### Returns {.doc-section .doc-section-returns}\n\n| Name | Type | Description |\n|--------|------------------------------------------------------------|---------------------------------------------|\n| | [Dict](`typing.Dict`)\\[[str](`str`), [Any](`typing.Any`)\\] | Aggregated prediction package for the task. |\n\n#### Raises {.doc-section .doc-section-raises}\n\n| Name | Type | Description |\n|--------|----------------------------------------------|------------------------------------------|\n| | [NotImplementedError](`NotImplementedError`) | Always, unless overridden by a subclass. 
|\n\n#### Examples {.doc-section .doc-section-examples}\n\n::: {#01b0778d .cell execution_count=13}\n``` {.python .cell-code}\nimport tempfile\nfrom pathlib import Path\nfrom spotforecast2_safe.multitask.base import BaseTask\nfrom spotforecast2_safe.configurator.config_multi import ConfigMulti\n\n# BaseTask.run is abstract and always raises NotImplementedError.\n# Concrete subclasses (LazyTask, DefaultsTask, PredictTask, CleanTask)\n# provide the real implementation.\nwith tempfile.TemporaryDirectory() as tmp:\n cfg = ConfigMulti(cache_home=Path(tmp), verbose=False)\n task = BaseTask(cfg)\n try:\n task.run()\n except NotImplementedError as exc:\n print(f\"Expected: {exc}\")\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nExpected: BaseTask must implement run(). Use LazyTask, DefaultsTask, PredictTask, or CleanTask.\n```\n:::\n:::\n\n\n### save_models { #spotforecast2_safe.multitask.base.BaseTask.save_models }\n\n```python\nmultitask.base.BaseTask.save_models(task_name, forecasters=None)\n```\n\nSave fitted forecaster models to the cache directory.\n\nEach model is serialised with ``joblib`` (compress=3) into\n``/models//`` using a datetime-stamped\nfilename so that multiple snapshots can coexist.\n\nFilename format::\n\n ___.joblib\n\nIf ``forecasters`` is ``None`` the method collects fitted models\nfrom ``self.results[task_name]``, where each prediction package is\nexpected to contain a ``\"forecaster\"`` key.\n\n#### Parameters {.doc-section .doc-section-parameters}\n\n| Name | Type | Description | Default |\n|-------------|---------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------|\n| task_name | [str](`str`) | Task identifier (``\"lazy\"``, ``\"defaults\"``). 
The names ``\"optuna\"`` and ``\"spotoptim\"`` are also accepted so that model caches produced by the ``spotforecast2`` sibling package can be saved and loaded; no tuning is performed in this package. | _required_ |\n| forecasters | [Optional](`typing.Optional`)\\[[Dict](`typing.Dict`)\\[[str](`str`), [Any](`typing.Any`)\\]\\] | Optional mapping ``{target: fitted_forecaster}``. When ``None``, models are taken from the prediction packages stored in ``self.results``. | `None` |\n\n#### Returns {.doc-section .doc-section-returns}\n\n| Name | Type | Description |\n|--------|---------------------------------------------------------------|-------------------------------------------------------|\n| | [Dict](`typing.Dict`)\\[[str](`str`), [Path](`pathlib.Path`)\\] | Mapping ``{target: Path}`` of saved model file paths. |\n\n#### Raises {.doc-section .doc-section-raises}\n\n| Name | Type | Description |\n|--------|--------------------------------|-------------------------------------------------------------------------------------------|\n| | [ValueError](`ValueError`) | If ``task_name`` is not one of ``\"lazy\"``, ``\"defaults\"``, ``\"optuna\"``, ``\"spotoptim\"``. |\n| | [RuntimeError](`RuntimeError`) | If no fitted models are available for the requested task. 
|\n\n#### Examples {.doc-section .doc-section-examples}\n\n::: {#8c650908 .cell execution_count=14}\n``` {.python .cell-code}\nimport tempfile\nfrom pathlib import Path\nfrom spotforecast2_safe.multitask import LazyTask\nfrom spotforecast2_safe.configurator.config_multi import ConfigMulti\n\nwith tempfile.TemporaryDirectory() as tmp:\n cfg = ConfigMulti(\n data_frame_name=\"demo\",\n cache_home=Path(tmp),\n verbose=False,\n )\n task = LazyTask(cfg)\n # Supply a tiny in-memory object as a stand-in for a fitted forecaster.\n dummy_forecaster = object()\n saved = task.save_models(\n task_name=\"lazy\",\n forecasters={\"load\": dummy_forecaster},\n )\n print(f\"Saved targets: {list(saved.keys())}\")\n assert saved[\"load\"].suffix == \".joblib\"\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nSaved targets: ['load']\n```\n:::\n:::\n\n\n### save_tuning_results { #spotforecast2_safe.multitask.base.BaseTask.save_tuning_results }\n\n```python\nmultitask.base.BaseTask.save_tuning_results(\n target,\n task_name,\n best_params,\n best_lags,\n)\n```\n\nSave tuning results (best parameters and lags) to a JSON file.\n\nThe file is stored under ``/tuning_results/`` with a\ndatetime-stamped filename so that loaders can determine freshness.\n\nFilename format::\n\n ___.json\n\n#### Parameters {.doc-section .doc-section-parameters}\n\n| Name | Type | Description | Default |\n|-------------|------------------------------------------------------------|-------------------------------------------------------------------|------------|\n| target | [str](`str`) | Name of the forecast target column. | _required_ |\n| task_name | [str](`str`) | Tuning algorithm identifier (e.g. ``\"optuna\"``, ``\"spotoptim\"``). | _required_ |\n| best_params | [Dict](`typing.Dict`)\\[[str](`str`), [Any](`typing.Any`)\\] | Best hyperparameters discovered during tuning. | _required_ |\n| best_lags | [Any](`typing.Any`) | Best lag configuration (int, list, or nested list). 
| _required_ |\n\n#### Returns {.doc-section .doc-section-returns}\n\n| Name | Type | Description |\n|--------|------------------------|------------------------------|\n| | [Path](`pathlib.Path`) | Path to the saved JSON file. |\n\n#### Examples {.doc-section .doc-section-examples}\n\n::: {#d5d26f68 .cell execution_count=15}\n``` {.python .cell-code}\nimport tempfile\nfrom pathlib import Path\nfrom spotforecast2_safe.multitask import LazyTask\nfrom spotforecast2_safe.configurator.config_multi import ConfigMulti\n\nwith tempfile.TemporaryDirectory() as tmp:\n cfg = ConfigMulti(data_frame_name=\"demo10\", cache_home=Path(tmp))\n task = LazyTask(cfg)\n path = task.save_tuning_results(\n target=\"target_0\",\n task_name=\"optuna\",\n best_params={\"n_estimators\": 100, \"learning_rate\": 0.05},\n best_lags=[1, 2, 24],\n )\n print(path.name[:10])\n```\n\n::: {.cell-output .cell-output-stdout}\n```\ndemo10_tar\n```\n:::\n:::\n\n\n", + "markdown": "---\ntitle: multitask.base.BaseTask\n---\n\n\n\n```python\nmultitask.base.BaseTask(\n config=None,\n *,\n dataframe=None,\n data_test=None,\n cache_home=None,\n log_level=logging.INFO,\n **overrides,\n)\n```\n\nShared base for all multi-target forecasting pipeline tasks.\n\n``BaseTask`` encapsulates the data-preparation pipeline (steps 1-7)\nand all helper methods shared across the task modes (lazy, defaults,\npredict, clean). Subclasses implement the run method with task-specific\ntraining or prediction logic.\n\nThe constructor takes a single ``config`` object satisfying the\n``PipelineConfig`` protocol — typically a ``ConfigMulti``. All pipeline\nparameters (forecast horizon, training window, outlier policy, weather/\nholiday hooks, cross-validation fold count, persistence policy, ...) live\non that object. 
Only genuinely call-time state (the dataframes, the cache\ndirectory override, the logging level) is passed as separate kwargs.\nExtra ``**overrides`` are forwarded to ``config.set_params`` and mutate\nthe passed-in config in place.\n\nPlotting is not available in ``spotforecast2-safe``. The\n``_show_prediction_figure`` and ``_show_prediction_figure_agg`` hook\nmethods are no-ops; override them in a subclass or use the\n``spotforecast2`` sibling package for interactive visualisation.\n\n## Parameters {.doc-section .doc-section-parameters}\n\n| Name | Type | Description | Default |\n|-------------|-------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------|----------------|\n| config | [Optional](`typing.Optional`)\\[[PipelineConfig](`spotforecast2_safe.multitask.base.PipelineConfig`)\\] | A ``PipelineConfig``-conforming object owning every pipeline parameter. ``ConfigMulti`` satisfies the protocol. | `None` |\n| dataframe | [Optional](`typing.Optional`)\\[[pd](`pandas`).[DataFrame](`pandas.DataFrame`)\\] | Pre-loaded input DataFrame with training data. Must contain a datetime column matching ``config.index_name`` plus at least one numeric target column. | `None` |\n| data_test | [Optional](`typing.Optional`)\\[[pd](`pandas`).[DataFrame](`pandas.DataFrame`)\\] | Pre-loaded test DataFrame (ground truth for the forecast horizon). Optional. | `None` |\n| cache_home | [Optional](`typing.Optional`)\\[[Path](`pathlib.Path`)\\] | Cache directory override. When not ``None``, replaces ``config.cache_home`` for this task instance. | `None` |\n| log_level | [int](`int`) | Logging level for the pipeline logger. | `logging.INFO` |\n| **overrides | [Any](`typing.Any`) | Forwarded to ``config.set_params(**overrides)`` — a convenience for one-line tweaks without building a fresh config. 
Mutates the caller's config object. | `{}` |\n\n## Attributes {.doc-section .doc-section-attributes}\n\n| Name | Type | Description |\n|--------------------|----------------------------------------------------------------------|--------------------------------------------------------|\n| config | [PipelineConfig](`spotforecast2_safe.multitask.base.PipelineConfig`) | Centralised pipeline configuration. |\n| df_pipeline | [pd](`pandas`).[DataFrame](`pandas.DataFrame`) | Pipeline DataFrame after preparation. |\n| df_test | [pd](`pandas`).[DataFrame](`pandas.DataFrame`) | Test DataFrame (ground truth). |\n| weight_func | [Optional](`typing.Optional`)\\[[Any](`typing.Any`)\\] | Sample-weight function from imputation. |\n| exogenous_features | [pd](`pandas`).[DataFrame](`pandas.DataFrame`) | Combined exogenous feature matrix. |\n| exog_feature_names | [List](`typing.List`)\\[[str](`str`)\\] | Selected exogenous feature names. |\n| data_with_exog | [pd](`pandas`).[DataFrame](`pandas.DataFrame`) | Merged target + exogenous data. |\n| exo_pred | [pd](`pandas`).[DataFrame](`pandas.DataFrame`) | Exogenous covariates for the forecast horizon. |\n| results | [Dict](`typing.Dict`)\\[[str](`str`), [Dict](`typing.Dict`)\\] | Per-task mapping of target name to prediction package. |\n| agg_results | [Dict](`typing.Dict`) | Mapping of task name to aggregated prediction package. 
|\n\n## Examples {.doc-section .doc-section-examples}\n\n\n::: {#dcc57822 .cell execution_count=1}\n``` {.python .cell-code}\nimport tempfile\nimport numpy as np\nimport pandas as pd\nfrom spotforecast2_safe.multitask.base import BaseTask\nfrom spotforecast2_safe.configurator.config_multi import ConfigMulti\n\nrng = np.random.default_rng(0)\nidx = pd.date_range(\"2023-01-01\", periods=24 * 7, freq=\"h\", tz=\"UTC\")\ndf = pd.DataFrame({\"load\": rng.normal(500, 30, len(idx))}, index=idx)\ndf.index.name = \"DateTime\"\n\nwith tempfile.TemporaryDirectory() as tmp:\n cfg = ConfigMulti(\n predict_size=6,\n use_exogenous_features=False,\n use_outlier_detection=False,\n cache_home=tmp,\n auto_save_models=False,\n verbose=False,\n )\n task = BaseTask(cfg, dataframe=df)\n print(f\"Task mode: {task.TASK}\")\n print(f\"Config predict_size: {task.config.predict_size}\")\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nTask mode: lazy\nConfig predict_size: 6\n```\n:::\n:::\n\n\n## Methods\n\n| Name | Description |\n| --- | --- |\n| [agg_predictor](#spotforecast2_safe.multitask.base.BaseTask.agg_predictor) | Aggregate per-target prediction packages into a weighted forecast. |\n| [build_exogenous_features](#spotforecast2_safe.multitask.base.BaseTask.build_exogenous_features) | Build, combine, encode, and merge exogenous feature covariates. |\n| [create_forecaster](#spotforecast2_safe.multitask.base.BaseTask.create_forecaster) | Create a fresh forecaster for the given target. |\n| [cv_ts](#spotforecast2_safe.multitask.base.BaseTask.cv_ts) | Build a ``TimeSeriesFold`` for cross-validation. |\n| [detect_outliers](#spotforecast2_safe.multitask.base.BaseTask.detect_outliers) | Apply hard-bound filtering and IsolationForest outlier detection. |\n| [impute](#spotforecast2_safe.multitask.base.BaseTask.impute) | Fill missing values using the configured imputation strategy. 
|\n| [load_models](#spotforecast2_safe.multitask.base.BaseTask.load_models) | Load the most recent fitted models from the cache directory. |\n| [load_tuning_results](#spotforecast2_safe.multitask.base.BaseTask.load_tuning_results) | Load the most recent tuning results for a target from cache. |\n| [log_summary](#spotforecast2_safe.multitask.base.BaseTask.log_summary) | Log a summary of the current pipeline configuration. |\n| [plot_with_outliers](#spotforecast2_safe.multitask.base.BaseTask.plot_with_outliers) | Visualise original vs. cleaned data with outlier markers. |\n| [prepare_data](#spotforecast2_safe.multitask.base.BaseTask.prepare_data) | Load, resample, validate, and configure the pipeline data. |\n| [run](#spotforecast2_safe.multitask.base.BaseTask.run) | Execute the task-specific training / prediction pipeline. |\n| [save_models](#spotforecast2_safe.multitask.base.BaseTask.save_models) | Save fitted forecaster models to the cache directory. |\n| [save_tuning_results](#spotforecast2_safe.multitask.base.BaseTask.save_tuning_results) | Save tuning results (best parameters and lags) to a JSON file. 
|\n\n### agg_predictor { #spotforecast2_safe.multitask.base.BaseTask.agg_predictor }\n\n```python\nmultitask.base.BaseTask.agg_predictor(results, targets, weights)\n```\n\nAggregate per-target prediction packages into a weighted forecast.\n\nDelegates to the module-level ``agg_predictor`` function.\nAvailable as an instance method so that subclasses can override the\naggregation strategy when needed.\n\n#### Parameters {.doc-section .doc-section-parameters}\n\n| Name | Type | Description | Default |\n|---------|---------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------|------------|\n| results | [Dict](`typing.Dict`)\\[[str](`str`), [Dict](`typing.Dict`)\\[[str](`str`), [Any](`typing.Any`)\\]\\] | Mapping of target name to prediction package (as returned by ``build_prediction_package``). | _required_ |\n| targets | [List](`typing.List`)\\[[str](`str`)\\] | Ordered list of target names to include. | _required_ |\n| weights | [List](`typing.List`)\\[[float](`float`)\\] | Per-target aggregation weights aligned with ``targets``. | _required_ |\n\n#### Returns {.doc-section .doc-section-returns}\n\n| Name | Type | Description |\n|--------|------------------------------------------------------------|-------------------------------------|\n| | [Dict](`typing.Dict`)\\[[str](`str`), [Any](`typing.Any`)\\] | Aggregated prediction package dict. 
|\n\n#### Examples {.doc-section .doc-section-examples}\n\n::: {#1821160d .cell execution_count=2}\n``` {.python .cell-code}\nimport tempfile\nimport numpy as np\nimport pandas as pd\nfrom spotforecast2_safe.multitask import LazyTask\nfrom spotforecast2_safe.configurator.config_multi import ConfigMulti\n\nrng = np.random.default_rng(0)\nidx_train = pd.date_range(\"2023-01-01\", periods=48, freq=\"h\", tz=\"UTC\")\nidx_future = pd.date_range(\"2023-01-03\", periods=6, freq=\"h\", tz=\"UTC\")\n\ndef _pkg(train_val, future_val):\n return {\n \"train_actual\": pd.Series(np.full(48, train_val), index=idx_train),\n \"train_pred\": pd.Series(np.full(48, train_val * 0.99), index=idx_train),\n \"future_pred\": pd.Series(np.full(6, future_val), index=idx_future),\n \"future_actual\": pd.Series(dtype=\"float64\"),\n }\n\nwith tempfile.TemporaryDirectory() as tmp:\n cfg = ConfigMulti(cache_home=tmp, verbose=False)\n task = LazyTask(cfg)\n results = {\"wind\": _pkg(100.0, 110.0), \"solar\": _pkg(200.0, 210.0)}\n agg = task.agg_predictor(results, [\"wind\", \"solar\"], [0.4, 0.6])\n print(f\"Weighted future_pred: {agg['future_pred'].iloc[0]:.1f}\")\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nWeighted future_pred: 170.0\n```\n:::\n:::\n\n\n### build_exogenous_features { #spotforecast2_safe.multitask.base.BaseTask.build_exogenous_features }\n\n```python\nmultitask.base.BaseTask.build_exogenous_features()\n```\n\nBuild, combine, encode, and merge exogenous feature covariates.\n\nThis is step 4-7 of the pipeline (run after ``prepare_data``,\n``detect_outliers``, and ``impute``). It assembles the full\nexogenous-covariate matrix that the forecaster consumes, then merges\nit onto the target data. The orchestration proceeds in order:\n\n* 4a — Weather, via ``get_weather_features`` (Open-Meteo). 
The\n response is parquet-cached only when ``config.cache_home`` is set.\n Fetch failures are handled per ``config.on_weather_failure``:\n ``\"raise\"`` re-raises ``WeatherFetchError``; ``\"skip\"`` logs a\n warning and continues with an empty weather frame (fail-safe).\n* 4b — Calendar features, via ``get_calendar_features``.\n* 4c — Day/night (solar) features, via ``get_day_night_features``\n (computed with ``astral`` from ``config.latitude`` /\n ``config.longitude``).\n* 4d — Holiday features, via ``get_holiday_features`` for\n ``config.country_code`` / ``config.state``.\n* 5 — The four frames are concatenated along the columns and any\n residual gaps are back- then forward-filled. Provider-based\n exogenous columns are then appended via\n ``build_providers_from_config`` (requires ``spotforecast2-safe``\n >= 15.7.0). The active providers are governed by the config flags\n ``include_covid_infection_rate``,\n ``include_entsoe_forecast_load``,\n ``include_entsoe_renewable_forecast``,\n ``include_entsoe_net_load``, and\n ``include_entsoe_day_ahead_price``. Cyclical (sine/cosine)\n encoding is then applied via ``apply_cyclical_encoding``, and\n degree-``config.poly_features_degree`` interaction terms are added\n via ``create_interaction_features``. 
When the degree is at least\n 2, the polynomial columns are ranked by mutual information with the\n primary target and capped to ``config.max_poly_features`` via\n ``select_top_poly_features``.\n* 6 — The training feature set is chosen via\n ``select_exogenous_features``, with provider columns appended\n (order-preserving, de-duplicated).\n* 7 — Targets and covariates are merged via\n ``merge_data_and_covariates`` into ``self.data_with_exog`` and the\n forecast-horizon covariates ``self.exo_pred``.\n\nWhen ``config.use_exogenous_features`` is ``False`` the method is a\nno-op and returns ``self`` immediately, leaving the pipeline\ntarget-only.\n\n#### Attributes {.doc-section .doc-section-attributes}\n\n| Name | Type | Description |\n|----------------------|---------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|\n| weather_aligned | [pd](`pandas`).[DataFrame](`pandas.DataFrame`) | Weather frame aligned to the pipeline index, reused by the interaction and selection steps. |\n| zone_weather_aligned | [Dict](`typing.Dict`)\\[[str](`str`), [pd](`pandas`).[DataFrame](`pandas.DataFrame`)\\] | Per-zone weather frames keyed by target name, indexed over ``[data_start, cov_end]`` (covering the forecast horizon). Populated only when ``config.per_zone_weather`` is True and every zone fetch succeeded; empty otherwise (including the fail-safe \"skip\" degradation). Consumed at the per-target seam in ``_get_target_data`` to overwrite the shared weather columns. 
|\n| exogenous_features | [pd](`pandas`).[DataFrame](`pandas.DataFrame`) | Full combined, encoded, and capped exogenous feature matrix. |\n| exog_feature_names | [List](`typing.List`)\\[[str](`str`)\\] | Names of the exogenous features selected for training (including provider columns). |\n| data_with_exog | [pd](`pandas`).[DataFrame](`pandas.DataFrame`) | Target data merged with the selected exogenous covariates. |\n| exo_pred | [pd](`pandas`).[DataFrame](`pandas.DataFrame`) | Exogenous covariates spanning the forecast horizon, supplied to the forecaster at predict time. |\n\n#### Returns {.doc-section .doc-section-returns}\n\n| Name | Type | Description |\n|--------|----------------------------------------------------------|---------------------------------|\n| | [BaseTask](`spotforecast2_safe.multitask.base.BaseTask`) | ``self`` (for method chaining). |\n\n#### Raises {.doc-section .doc-section-raises}\n\n| Name | Type | Description |\n|--------|---------------------------------------------------------------------|-----------------------------------------------------------------------------|\n| | [RuntimeError](`RuntimeError`) | If ``prepare_data`` has not been called. |\n| | [WeatherFetchError](`spotforecast2_safe.weather.WeatherFetchError`) | If the Open-Meteo fetch fails and ``config.on_weather_failure == \"raise\"``. 
|\n\n#### Examples {.doc-section .doc-section-examples}\n\nWith exogenous features disabled the method is a no-op, so the\nexample below runs without any network access and leaves the\npipeline target-only.\n\n::: {#10875b84 .cell execution_count=3}\n``` {.python .cell-code}\nimport tempfile\nimport pandas as pd\nimport numpy as np\nfrom spotforecast2_safe.multitask import MultiTask\nfrom spotforecast2_safe.configurator.config_multi import ConfigMulti\n\nrng = np.random.default_rng(0)\nidx = pd.date_range(\"2023-01-01\", periods=24 * 14, freq=\"h\", tz=\"UTC\")\ndf = pd.DataFrame({\"a\": rng.normal(100, 10, len(idx))}, index=idx)\ndf.index.name = \"DateTime\"\n\nwith tempfile.TemporaryDirectory() as tmp:\n cfg = ConfigMulti(\n predict_size=6,\n use_exogenous_features=False,\n use_outlier_detection=False,\n cache_home=tmp,\n )\n mt = MultiTask(cfg, dataframe=df)\n mt.prepare_data().detect_outliers().impute().build_exogenous_features()\n print(f\"Exogenous features used: {mt.config.use_exogenous_features}\")\n print(f\"Selected exog feature names: {mt.exog_feature_names}\")\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nExogenous features used: False\nSelected exog feature names: []\n```\n:::\n:::\n\n\n### create_forecaster { #spotforecast2_safe.multitask.base.BaseTask.create_forecaster }\n\n```python\nmultitask.base.BaseTask.create_forecaster(target=None)\n```\n\nCreate a fresh forecaster for the given target.\n\nDelegates to ``config.forecaster_factory`` when set; otherwise falls\nback to ``default_lgbm_forecaster_factory``. 
This factory hook lets\ncallers swap the estimator without subclassing ``BaseTask``.\n\n#### Parameters {.doc-section .doc-section-parameters}\n\n| Name | Type | Description | Default |\n|--------|-----------------------------------------------|------------------------------------------------------------------------------------------------------------|-----------|\n| target | [Optional](`typing.Optional`)\\[[str](`str`)\\] | Optional target column name. Forwarded to the factory so that custom factories can specialise per target. | `None` |\n\n#### Returns {.doc-section .doc-section-returns}\n\n| Name | Type | Description |\n|--------|---------------------|--------------------------------------|\n| | [Any](`typing.Any`) | A new, unfitted forecaster instance. |\n\n#### Examples {.doc-section .doc-section-examples}\n\n::: {#4b5b2fa9 .cell execution_count=4}\n``` {.python .cell-code}\nimport tempfile\nfrom pathlib import Path\nfrom spotforecast2_safe.multitask import LazyTask\nfrom spotforecast2_safe.configurator.config_multi import ConfigMulti\n\nwith tempfile.TemporaryDirectory() as tmp:\n cfg = ConfigMulti(\n predict_size=6,\n use_exogenous_features=False,\n cache_home=Path(tmp),\n )\n task = LazyTask(cfg)\n forecaster = task.create_forecaster()\nprint(f\"Type: {type(forecaster).__name__}\")\nprint(f\"Lags: {forecaster.lags}\")\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nType: ForecasterRecursive\nLags: [ 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23]\n```\n:::\n:::\n\n\n### cv_ts { #spotforecast2_safe.multitask.base.BaseTask.cv_ts }\n\n```python\nmultitask.base.BaseTask.cv_ts(y_train)\n```\n\nBuild a ``TimeSeriesFold`` for cross-validation.\n\nConstructs the cross-validation splitter used by all tuning tasks.\nInternally uses ``sklearn.model_selection.TimeSeriesSplit`` to\ncompute split boundaries that respect temporal ordering and avoid\ndata leakage between folds.\n\nThe validation boundary is determined by ``run_state.end_train_ts`` 
minus\n``config.delta_val``. When ``config.train_size`` is set, the sklearn\nsplitter uses a sliding fixed-size training window\n(``max_train_size``); otherwise an expanding window is used.\n\n#### Parameters {.doc-section .doc-section-parameters}\n\n| Name | Type | Description | Default |\n|---------|------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------|\n| y_train | [pd](`pandas`).[Series](`pandas.Series`) | Training time series for the current target. Used both to determine the validation boundary and as the sequence passed to ``TimeSeriesSplit.split`` to derive ``initial_train_size``. | _required_ |\n\n#### Returns {.doc-section .doc-section-returns}\n\n| Name | Type | Description |\n|--------|----------------------------------------------------------------------------|----------------------------------------------------------------|\n| | [TimeSeriesFold](`spotforecast2_safe.splitter.split_ts_cv.TimeSeriesFold`) | A configured ``TimeSeriesFold`` instance ready to be passed to |\n| | [TimeSeriesFold](`spotforecast2_safe.splitter.split_ts_cv.TimeSeriesFold`) | a model-selection function. 
|\n\n#### Examples {.doc-section .doc-section-examples}\n\n::: {#78d1a947 .cell execution_count=5}\n``` {.python .cell-code}\nimport tempfile\nimport numpy as np\nimport pandas as pd\nfrom spotforecast2_safe.multitask import MultiTask\nfrom spotforecast2_safe.configurator.config_multi import ConfigMulti\n\nrng = np.random.default_rng(0)\nidx = pd.date_range(\"2023-01-01\", periods=24 * 14, freq=\"h\", tz=\"UTC\")\ndf = pd.DataFrame({\"a\": rng.normal(100, 10, len(idx))}, index=idx)\ndf.index.name = \"DateTime\"\n\nwith tempfile.TemporaryDirectory() as tmp:\n cfg = ConfigMulti(\n predict_size=6,\n use_exogenous_features=False,\n use_outlier_detection=False,\n cache_home=tmp,\n number_folds=2,\n auto_save_models=False,\n verbose=False,\n )\n mt = MultiTask(cfg, dataframe=df)\n mt.prepare_data().detect_outliers().impute().build_exogenous_features()\n y_train = mt.df_pipeline[\"a\"]\n cv = mt.cv_ts(y_train)\n print(f\"TimeSeriesFold steps: {cv.steps}\")\n print(f\"initial_train_size: {cv.initial_train_size}\")\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nTimeSeriesFold steps: 6\ninitial_train_size: 324\n```\n:::\n:::\n\n\n### detect_outliers { #spotforecast2_safe.multitask.base.BaseTask.detect_outliers }\n\n```python\nmultitask.base.BaseTask.detect_outliers()\n```\n\nApply hard-bound filtering and IsolationForest outlier detection.\n\nHard bounds from ``config.bounds`` are applied to the pipeline data\n(out-of-bound values are removed and later filled by ``impute()``).\nIsolationForest detection (``config.use_outlier_detection``) is\nadvisory: detected outliers are logged per column but not removed.\n\n#### Returns {.doc-section .doc-section-returns}\n\n| Name | Type | Description |\n|--------|----------------------------------------------------------|---------------------------------|\n| | [BaseTask](`spotforecast2_safe.multitask.base.BaseTask`) | ``self`` (for method chaining). 
|\n\n#### Raises {.doc-section .doc-section-raises}\n\n| Name | Type | Description |\n|--------|--------------------------------|-------------------------------------------------|\n| | [RuntimeError](`RuntimeError`) | If method ``prepare_data`` has not been called. |\n\n#### Examples {.doc-section .doc-section-examples}\n\n::: {#6300da97 .cell execution_count=6}\n``` {.python .cell-code}\nimport tempfile\nimport numpy as np\nimport pandas as pd\nfrom spotforecast2_safe.multitask import MultiTask\nfrom spotforecast2_safe.configurator.config_multi import ConfigMulti\n\nrng = np.random.default_rng(0)\nidx = pd.date_range(\"2023-01-01\", periods=24 * 14, freq=\"h\", tz=\"UTC\")\ndf = pd.DataFrame({\"a\": rng.normal(100, 10, len(idx))}, index=idx)\ndf.index.name = \"DateTime\"\n\nwith tempfile.TemporaryDirectory() as tmp:\n cfg = ConfigMulti(\n predict_size=6,\n use_exogenous_features=False,\n use_outlier_detection=False,\n cache_home=tmp,\n auto_save_models=False,\n verbose=False,\n )\n mt = MultiTask(cfg, dataframe=df)\n mt.prepare_data()\n mt.detect_outliers()\n print(f\"Pipeline shape: {mt.df_pipeline.shape}\")\n assert mt.df_pipeline_original is not None\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nPipeline shape: (336, 1)\n```\n:::\n:::\n\n\n### impute { #spotforecast2_safe.multitask.base.BaseTask.impute }\n\n```python\nmultitask.base.BaseTask.impute()\n```\n\nFill missing values using the configured imputation strategy.\n\n#### Returns {.doc-section .doc-section-returns}\n\n| Name | Type | Description |\n|--------|----------------------------------------------------------|---------------------------------|\n| | [BaseTask](`spotforecast2_safe.multitask.base.BaseTask`) | ``self`` (for method chaining). 
|\n\n#### Raises {.doc-section .doc-section-raises}\n\n| Name | Type | Description |\n|--------|--------------------------------|-------------------------------------------------|\n| | [RuntimeError](`RuntimeError`) | If method ``prepare_data`` has not been called. |\n\n#### Examples {.doc-section .doc-section-examples}\n\n::: {#eb83ce5c .cell execution_count=7}\n``` {.python .cell-code}\nimport tempfile\nimport numpy as np\nimport pandas as pd\nfrom spotforecast2_safe.multitask import MultiTask\nfrom spotforecast2_safe.configurator.config_multi import ConfigMulti\n\nrng = np.random.default_rng(0)\nidx = pd.date_range(\"2023-01-01\", periods=24 * 14, freq=\"h\", tz=\"UTC\")\nvalues = rng.normal(100, 10, len(idx))\nvalues[10:13] = float(\"nan\") # inject a few gaps\ndf = pd.DataFrame({\"a\": values}, index=idx)\ndf.index.name = \"DateTime\"\n\nwith tempfile.TemporaryDirectory() as tmp:\n cfg = ConfigMulti(\n predict_size=6,\n use_exogenous_features=False,\n use_outlier_detection=False,\n cache_home=tmp,\n auto_save_models=False,\n verbose=False,\n )\n mt = MultiTask(cfg, dataframe=df)\n mt.prepare_data().detect_outliers().impute()\n missing = mt.df_pipeline[\"a\"].isna().sum()\n print(f\"Missing values after imputation: {missing}\")\n assert missing == 0\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nMissing values after imputation: 0\n```\n:::\n:::\n\n\n### load_models { #spotforecast2_safe.multitask.base.BaseTask.load_models }\n\n```python\nmultitask.base.BaseTask.load_models(\n task_name=None,\n target=None,\n max_age_days=None,\n)\n```\n\nLoad the most recent fitted models from the cache directory.\n\nScans ``/models//`` for ``.joblib``\nfiles matching the current ``data_frame_name``. 
Optionally\nfilters by ``task_name``, ``target``, and ``max_age_days``.\n\n#### Parameters {.doc-section .doc-section-parameters}\n\n| Name | Type | Description | Default |\n|--------------|---------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------|-----------|\n| task_name | [Optional](`typing.Optional`)\\[[str](`str`)\\] | If given, only load models from this task (``\"lazy\"``, ``\"defaults\"``, ``\"optuna\"``, or ``\"spotoptim\"``). ``None`` accepts any task. | `None` |\n| target | [Optional](`typing.Optional`)\\[[str](`str`)\\] | If given, only load the model for this target column. ``None`` loads the most recent model for every target found. | `None` |\n| max_age_days | [Optional](`typing.Optional`)\\[[float](`float`)\\] | Maximum age in days. Models older than this are ignored. ``None`` accepts any age. | `None` |\n\n#### Returns {.doc-section .doc-section-returns}\n\n| Name | Type | Description |\n|--------|------------------------------------------------------------|-----------------------------------------------------------|\n| | [Dict](`typing.Dict`)\\[[str](`str`), [Any](`typing.Any`)\\] | Mapping ``{target: forecaster}`` of loaded model objects. |\n| | [Dict](`typing.Dict`)\\[[str](`str`), [Any](`typing.Any`)\\] | Empty dict if no matching models were found. 
|\n\n#### Examples {.doc-section .doc-section-examples}\n\n::: {#9256ac0c .cell execution_count=8}\n``` {.python .cell-code}\nimport tempfile\nfrom pathlib import Path\nfrom spotforecast2_safe.multitask import LazyTask\nfrom spotforecast2_safe.configurator.config_multi import ConfigMulti\n\nwith tempfile.TemporaryDirectory() as tmp:\n cfg = ConfigMulti(\n data_frame_name=\"demo\",\n cache_home=Path(tmp),\n verbose=False,\n )\n task = LazyTask(cfg)\n # Save a dummy object, then load it back.\n dummy_forecaster = {\"lags\": [1, 2, 24]}\n task.save_models(\n task_name=\"lazy\",\n forecasters={\"load\": dummy_forecaster},\n )\n loaded = task.load_models(task_name=\"lazy\")\n print(f\"Loaded targets: {list(loaded.keys())}\")\n assert loaded[\"load\"][\"lags\"] == [1, 2, 24]\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nLoaded targets: ['load']\n```\n:::\n:::\n\n\n### load_tuning_results { #spotforecast2_safe.multitask.base.BaseTask.load_tuning_results }\n\n```python\nmultitask.base.BaseTask.load_tuning_results(\n target,\n task_name=None,\n max_age_days=None,\n)\n```\n\nLoad the most recent tuning results for a target from cache.\n\nScans ``/tuning_results/`` for files matching the\ncurrent ``data_frame_name`` and ``target``. Optionally filters by\n``task_name`` and discards results older than ``max_age_days``.\n\n#### Parameters {.doc-section .doc-section-parameters}\n\n| Name | Type | Description | Default |\n|--------------|---------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------|------------|\n| target | [str](`str`) | Name of the forecast target column. | _required_ |\n| task_name | [Optional](`typing.Optional`)\\[[str](`str`)\\] | If given, only consider results from this tuning algorithm (e.g. ``\"optuna\"`` or ``\"spotoptim\"``). ``None`` accepts any algorithm. 
| `None` |\n| max_age_days | [Optional](`typing.Optional`)\\[[float](`float`)\\] | Maximum age in days. Results older than this are ignored. ``None`` accepts any age. | `None` |\n\n#### Returns {.doc-section .doc-section-returns}\n\n| Name | Type | Description |\n|--------|---------------------------------------------------------------------------------------------|-----------------------------------------------------------|\n| | [Optional](`typing.Optional`)\\[[Dict](`typing.Dict`)\\[[str](`str`), [Any](`typing.Any`)\\]\\] | A dictionary with keys ``best_params``, ``best_lags``, |\n| | [Optional](`typing.Optional`)\\[[Dict](`typing.Dict`)\\[[str](`str`), [Any](`typing.Any`)\\]\\] | ``task_name``, ``target``, ``data_frame_name``, and |\n| | [Optional](`typing.Optional`)\\[[Dict](`typing.Dict`)\\[[str](`str`), [Any](`typing.Any`)\\]\\] | ``timestamp``; or ``None`` if no matching file was found. |\n\n#### Examples {.doc-section .doc-section-examples}\n\n::: {#e97f461a .cell execution_count=9}\n``` {.python .cell-code}\nimport tempfile\nfrom pathlib import Path\nfrom spotforecast2_safe.multitask import LazyTask\nfrom spotforecast2_safe.configurator.config_multi import ConfigMulti\n\nwith tempfile.TemporaryDirectory() as tmp:\n cfg = ConfigMulti(data_frame_name=\"demo10\", cache_home=Path(tmp))\n task = LazyTask(cfg)\n task.save_tuning_results(\n target=\"target_0\",\n task_name=\"optuna\",\n best_params={\"n_estimators\": 100},\n best_lags=24,\n )\n result = task.load_tuning_results(target=\"target_0\")\n print(result[\"best_params\"])\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n{'n_estimators': 100}\n```\n:::\n:::\n\n\n### log_summary { #spotforecast2_safe.multitask.base.BaseTask.log_summary }\n\n```python\nmultitask.base.BaseTask.log_summary()\n```\n\nLog a summary of the current pipeline configuration.\n\n#### Examples {.doc-section .doc-section-examples}\n\n::: {#32177e21 .cell execution_count=10}\n``` {.python .cell-code}\nimport tempfile\nimport numpy 
as np\nimport pandas as pd\nfrom spotforecast2_safe.multitask import MultiTask\nfrom spotforecast2_safe.configurator.config_multi import ConfigMulti\n\nrng = np.random.default_rng(0)\nidx = pd.date_range(\"2023-01-01\", periods=24 * 14, freq=\"h\", tz=\"UTC\")\ndf = pd.DataFrame({\"a\": rng.normal(100, 10, len(idx))}, index=idx)\ndf.index.name = \"DateTime\"\n\nwith tempfile.TemporaryDirectory() as tmp:\n cfg = ConfigMulti(\n predict_size=6,\n use_exogenous_features=False,\n use_outlier_detection=False,\n cache_home=tmp,\n auto_save_models=False,\n verbose=False,\n )\n mt = MultiTask(cfg, dataframe=df)\n mt.prepare_data().detect_outliers().impute().build_exogenous_features()\n # log_summary writes to the pipeline logger; call it to confirm\n # it runs without error.\n mt.log_summary()\n print(\"log_summary completed without error\")\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nlog_summary completed without error\n```\n:::\n:::\n\n\n### plot_with_outliers { #spotforecast2_safe.multitask.base.BaseTask.plot_with_outliers }\n\n```python\nmultitask.base.BaseTask.plot_with_outliers()\n```\n\nVisualise original vs. cleaned data with outlier markers.\n\n#### Raises {.doc-section .doc-section-raises}\n\n| Name | Type | Description |\n|--------|----------------------------------------------|---------------------------------------------------------------------------------------------------------------------|\n| | [RuntimeError](`RuntimeError`) | If method ``detect_outliers`` has not been called. |\n| | [NotImplementedError](`NotImplementedError`) | Always — plotting is not available in ``spotforecast2-safe``. Use the ``spotforecast2`` package for visualisation. 
|\n\n#### Examples {.doc-section .doc-section-examples}\n\n::: {#7652c30f .cell execution_count=11}\n``` {.python .cell-code}\nimport tempfile\nimport numpy as np\nimport pandas as pd\nfrom spotforecast2_safe.multitask import MultiTask\nfrom spotforecast2_safe.configurator.config_multi import ConfigMulti\n\nrng = np.random.default_rng(0)\nidx = pd.date_range(\"2023-01-01\", periods=24 * 14, freq=\"h\", tz=\"UTC\")\ndf = pd.DataFrame({\"a\": rng.normal(100, 10, len(idx))}, index=idx)\ndf.index.name = \"DateTime\"\n\nwith tempfile.TemporaryDirectory() as tmp:\n cfg = ConfigMulti(\n predict_size=6,\n use_exogenous_features=False,\n use_outlier_detection=False,\n cache_home=tmp,\n auto_save_models=False,\n verbose=False,\n )\n mt = MultiTask(cfg, dataframe=df)\n mt.prepare_data().detect_outliers()\n try:\n mt.plot_with_outliers()\n except NotImplementedError as exc:\n print(f\"Plotting unavailable in spotforecast2-safe: {exc}\")\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nPlotting unavailable in spotforecast2-safe: Plotting is not available in spotforecast2-safe (no plotly/matplotlib). Use the spotforecast2 package for visualisation.\n```\n:::\n:::\n\n\n### prepare_data { #spotforecast2_safe.multitask.base.BaseTask.prepare_data }\n\n```python\nmultitask.base.BaseTask.prepare_data(demo_data=None, df_test=None)\n```\n\nLoad, resample, validate, and configure the pipeline data.\n\nUses the following precedence for the training data:\n\n1. ``demo_data`` argument (if provided).\n2. ``self._dataframe`` set via the constructor.\n\nSimilarly for test data:\n\n1. ``df_test`` argument (if provided).\n2. ``self.data_test`` set via the constructor.\n3. 
``self.config.test_data_loader(self.config)`` if set.\n\n#### Parameters {.doc-section .doc-section-parameters}\n\n| Name | Type | Description | Default |\n|-----------|---------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------|-----------|\n| demo_data | [Optional](`typing.Optional`)\\[[pd](`pandas`).[DataFrame](`pandas.DataFrame`)\\] | Pre-loaded input DataFrame. When ``None``, the constructor ``dataframe`` is used. | `None` |\n| df_test | [Optional](`typing.Optional`)\\[[pd](`pandas`).[DataFrame](`pandas.DataFrame`)\\] | Pre-loaded test DataFrame. When ``None``, the constructor ``data_test`` is used, then ``config.test_data_loader``. | `None` |\n\n#### Returns {.doc-section .doc-section-returns}\n\n| Name | Type | Description |\n|--------|----------------------------------------------------------|---------------------------------|\n| | [BaseTask](`spotforecast2_safe.multitask.base.BaseTask`) | ``self`` (for method chaining). |\n\n#### Raises {.doc-section .doc-section-raises}\n\n| Name | Type | Description |\n|--------|----------------------------|----------------------------------------------------------------------------------|\n| | [ValueError](`ValueError`) | If no data source is available (no ``demo_data``, no constructor ``dataframe``). 
|\n\n#### Examples {.doc-section .doc-section-examples}\n\n::: {#a2bca32b .cell execution_count=12}\n``` {.python .cell-code}\nimport tempfile\nimport pandas as pd\nimport numpy as np\nfrom spotforecast2_safe.multitask import MultiTask\nfrom spotforecast2_safe.configurator.config_multi import ConfigMulti\n\nrng = np.random.default_rng(0)\nidx = pd.date_range(\"2023-01-01\", periods=24 * 14, freq=\"h\", tz=\"UTC\")\ndf = pd.DataFrame({\"a\": rng.normal(100, 10, len(idx))}, index=idx)\ndf.index.name = \"DateTime\"\n\nwith tempfile.TemporaryDirectory() as tmp:\n cfg = ConfigMulti(\n predict_size=6,\n use_exogenous_features=False,\n use_outlier_detection=False,\n cache_home=tmp,\n )\n mt = MultiTask(cfg, dataframe=df)\n mt.prepare_data()\n print(f\"Pipeline shape: {mt.df_pipeline.shape}\")\n print(f\"Targets: {mt.run_state.targets}\")\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nPipeline shape: (336, 1)\nTargets: ['a']\n```\n:::\n:::\n\n\n### run { #spotforecast2_safe.multitask.base.BaseTask.run }\n\n```python\nmultitask.base.BaseTask.run(\n show=False,\n task=None,\n task_name=None,\n use_tuned_params=True,\n max_age_days=None,\n search_space=None,\n dry_run=False,\n cache_home=None,\n **kwargs,\n)\n```\n\nExecute the task-specific training / prediction pipeline.\n\nSubclasses must override this method.\n\n#### Parameters {.doc-section .doc-section-parameters}\n\n| Name | Type | Description | Default |\n|------------------|---------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------|\n| show | [bool](`bool`) | If ``True``, invoke the visualisation hooks (no-ops in this package; meaningful only in ``spotforecast2``). | `False` |\n| task | [Optional](`typing.Optional`)\\[[str](`str`)\\] | Task mode override (used by ``MultiTask``). 
| `None` |\n| task_name | [Optional](`typing.Optional`)\\[[str](`str`)\\] | Restrict model loading to a specific source task (used by ``PredictTask``). | `None` |\n| use_tuned_params | [bool](`bool`) | Load cached tuning results when available (used by ``LazyTask``). | `True` |\n| max_age_days | [Optional](`typing.Optional`)\\[[float](`float`)\\] | Maximum age in days for cached results (used by ``LazyTask`` and ``PredictTask``). Freshness is judged against the wall-clock timestamp embedded in the cache filename, so the check is machine-local. | `None` |\n| search_space | [Optional](`typing.Optional`)\\[[Any](`typing.Any`)\\] | Hyperparameter search-space definition (accepted for API compatibility; not used in this package). | `None` |\n| dry_run | [bool](`bool`) | Report what would be deleted without removing anything (used by ``CleanTask``). | `False` |\n| cache_home | [Optional](`typing.Optional`)\\[[Path](`pathlib.Path`)\\] | Override the cache directory (used by ``CleanTask``). | `None` |\n| **kwargs | [Any](`typing.Any`) | Additional task-specific arguments. | `{}` |\n\n#### Returns {.doc-section .doc-section-returns}\n\n| Name | Type | Description |\n|--------|------------------------------------------------------------|---------------------------------------------|\n| | [Dict](`typing.Dict`)\\[[str](`str`), [Any](`typing.Any`)\\] | Aggregated prediction package for the task. |\n\n#### Raises {.doc-section .doc-section-raises}\n\n| Name | Type | Description |\n|--------|----------------------------------------------|------------------------------------------|\n| | [NotImplementedError](`NotImplementedError`) | Always, unless overridden by a subclass. 
|\n\n#### Examples {.doc-section .doc-section-examples}\n\n::: {#1f7f06a2 .cell execution_count=13}\n``` {.python .cell-code}\nimport tempfile\nfrom pathlib import Path\nfrom spotforecast2_safe.multitask.base import BaseTask\nfrom spotforecast2_safe.configurator.config_multi import ConfigMulti\n\n# BaseTask.run is abstract and always raises NotImplementedError.\n# Concrete subclasses (LazyTask, DefaultsTask, PredictTask, CleanTask)\n# provide the real implementation.\nwith tempfile.TemporaryDirectory() as tmp:\n cfg = ConfigMulti(cache_home=Path(tmp), verbose=False)\n task = BaseTask(cfg)\n try:\n task.run()\n except NotImplementedError as exc:\n print(f\"Expected: {exc}\")\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nExpected: BaseTask must implement run(). Use LazyTask, DefaultsTask, PredictTask, or CleanTask.\n```\n:::\n:::\n\n\n### save_models { #spotforecast2_safe.multitask.base.BaseTask.save_models }\n\n```python\nmultitask.base.BaseTask.save_models(task_name, forecasters=None)\n```\n\nSave fitted forecaster models to the cache directory.\n\nEach model is serialised with ``joblib`` (compress=3) into\n``/models//`` using a datetime-stamped\nfilename so that multiple snapshots can coexist.\n\nFilename format::\n\n ___.joblib\n\nIf ``forecasters`` is ``None`` the method collects fitted models\nfrom ``self.results[task_name]``, where each prediction package is\nexpected to contain a ``\"forecaster\"`` key.\n\n#### Parameters {.doc-section .doc-section-parameters}\n\n| Name | Type | Description | Default |\n|-------------|---------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------|\n| task_name | [str](`str`) | Task identifier (``\"lazy\"``, ``\"defaults\"``). 
The names ``\"optuna\"`` and ``\"spotoptim\"`` are also accepted so that model caches produced by the ``spotforecast2`` sibling package can be saved and loaded; no tuning is performed in this package. | _required_ |\n| forecasters | [Optional](`typing.Optional`)\\[[Dict](`typing.Dict`)\\[[str](`str`), [Any](`typing.Any`)\\]\\] | Optional mapping ``{target: fitted_forecaster}``. When ``None``, models are taken from the prediction packages stored in ``self.results``. | `None` |\n\n#### Returns {.doc-section .doc-section-returns}\n\n| Name | Type | Description |\n|--------|---------------------------------------------------------------|-------------------------------------------------------|\n| | [Dict](`typing.Dict`)\\[[str](`str`), [Path](`pathlib.Path`)\\] | Mapping ``{target: Path}`` of saved model file paths. |\n\n#### Raises {.doc-section .doc-section-raises}\n\n| Name | Type | Description |\n|--------|--------------------------------|-------------------------------------------------------------------------------------------|\n| | [ValueError](`ValueError`) | If ``task_name`` is not one of ``\"lazy\"``, ``\"defaults\"``, ``\"optuna\"``, ``\"spotoptim\"``. |\n| | [RuntimeError](`RuntimeError`) | If no fitted models are available for the requested task. 
|\n\n#### Examples {.doc-section .doc-section-examples}\n\n::: {#5d55fc67 .cell execution_count=14}\n``` {.python .cell-code}\nimport tempfile\nfrom pathlib import Path\nfrom spotforecast2_safe.multitask import LazyTask\nfrom spotforecast2_safe.configurator.config_multi import ConfigMulti\n\nwith tempfile.TemporaryDirectory() as tmp:\n cfg = ConfigMulti(\n data_frame_name=\"demo\",\n cache_home=Path(tmp),\n verbose=False,\n )\n task = LazyTask(cfg)\n # Supply a tiny in-memory object as a stand-in for a fitted forecaster.\n dummy_forecaster = object()\n saved = task.save_models(\n task_name=\"lazy\",\n forecasters={\"load\": dummy_forecaster},\n )\n print(f\"Saved targets: {list(saved.keys())}\")\n assert saved[\"load\"].suffix == \".joblib\"\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nSaved targets: ['load']\n```\n:::\n:::\n\n\n### save_tuning_results { #spotforecast2_safe.multitask.base.BaseTask.save_tuning_results }\n\n```python\nmultitask.base.BaseTask.save_tuning_results(\n target,\n task_name,\n best_params,\n best_lags,\n)\n```\n\nSave tuning results (best parameters and lags) to a JSON file.\n\nThe file is stored under ``/tuning_results/`` with a\ndatetime-stamped filename so that loaders can determine freshness.\n\nFilename format::\n\n ___.json\n\n#### Parameters {.doc-section .doc-section-parameters}\n\n| Name | Type | Description | Default |\n|-------------|------------------------------------------------------------|-------------------------------------------------------------------|------------|\n| target | [str](`str`) | Name of the forecast target column. | _required_ |\n| task_name | [str](`str`) | Tuning algorithm identifier (e.g. ``\"optuna\"``, ``\"spotoptim\"``). | _required_ |\n| best_params | [Dict](`typing.Dict`)\\[[str](`str`), [Any](`typing.Any`)\\] | Best hyperparameters discovered during tuning. | _required_ |\n| best_lags | [Any](`typing.Any`) | Best lag configuration (int, list, or nested list). 
| _required_ |\n\n#### Returns {.doc-section .doc-section-returns}\n\n| Name | Type | Description |\n|--------|------------------------|------------------------------|\n| | [Path](`pathlib.Path`) | Path to the saved JSON file. |\n\n#### Examples {.doc-section .doc-section-examples}\n\n::: {#947dfcff .cell execution_count=15}\n``` {.python .cell-code}\nimport tempfile\nfrom pathlib import Path\nfrom spotforecast2_safe.multitask import LazyTask\nfrom spotforecast2_safe.configurator.config_multi import ConfigMulti\n\nwith tempfile.TemporaryDirectory() as tmp:\n cfg = ConfigMulti(data_frame_name=\"demo10\", cache_home=Path(tmp))\n task = LazyTask(cfg)\n path = task.save_tuning_results(\n target=\"target_0\",\n task_name=\"optuna\",\n best_params={\"n_estimators\": 100, \"learning_rate\": 0.05},\n best_lags=[1, 2, 24],\n )\n print(path.name[:10])\n```\n\n::: {.cell-output .cell-output-stdout}\n```\ndemo10_tar\n```\n:::\n:::\n\n\n", "supporting": [ "multitask.base.BaseTask_files/figure-html" ], diff --git a/_freeze/docs/reference/tasks.task_safe_zone_load_demo/execute-results/html.json b/_freeze/docs/reference/tasks.task_safe_zone_load_demo/execute-results/html.json index 3a89fc99..78d18fc0 100644 --- a/_freeze/docs/reference/tasks.task_safe_zone_load_demo/execute-results/html.json +++ b/_freeze/docs/reference/tasks.task_safe_zone_load_demo/execute-results/html.json @@ -1,10 +1,10 @@ { - "hash": "159e873790f90863c1e2801407e7e78f", + "hash": "dce72f4bc657c3b33bd38711152181ed", "result": { "engine": "jupyter", - "markdown": "---\ntitle: tasks.task_safe_zone_load_demo\n---\n\n\n\n`tasks.task_safe_zone_load_demo`\n\nTask demo: four-zone bottom-up total-load forecast vs. 
a direct aggregate.\n\nThis task demonstrates the four-German-TSO-zone forecasting design: fit one\nmodel per control-zone load series (Amprion, TenneT, TransnetBW, 50Hertz) and\nsum the four forecasts into the total-German-load forecast (bottom-up\naggregation), then compare it against a single model trained on the aggregate\nload directly. The bottom-up combination reuses the existing multi-target\nmachinery -- the four zones become `ConfigMulti.targets` and the per-target\nforecasts are summed via the `MultiTask` aggregation with\n``agg_weights=[1.0, ...]``.\n\nData source. With ``data_path`` pointing at the assembled four-column interim\nfile (see `spotforecast2_safe.downloader.entsoe.assemble_zone_loads`), the task\nruns on real ENTSO-E control-zone load. With ``data_path=None`` (the default) it\nbuilds a deterministic, seeded synthetic four-zone dataset so the demo and its\nsmoke test run offline without an ENTSO-E API key.\n\nThe comparison uses the project's own backtesting engine\n(`backtesting_forecaster` + `TimeSeriesFold`) over a 24-hour horizon and reports\nMAE / RMSE / MAPE for both methods.\n\n## Examples {.doc-section .doc-section-examples}\n\n\n::: {#c93d1256 .cell execution_count=1}\n``` {.python .cell-code}\n# Shell commands; they cannot run inside a Python kernel.\n# Synthetic offline demo (no API key needed):\n# uv run spotforecast-safe-zone-load-demo\n# Run on assembled real ENTSO-E zone data:\n# uv run spotforecast-safe-zone-load-demo # --data_path ~/spotforecast2_data/interim/energy_load_zones.csv\n```\n:::\n\n\n## Functions\n\n| Name | Description |\n| --- | --- |\n| [main](#spotforecast2_safe.tasks.task_safe_zone_load_demo.main) | Run the four-zone bottom-up load demo. Returns 0 on success, 1 on failure. 
|\n\n### main { #spotforecast2_safe.tasks.task_safe_zone_load_demo.main }\n\n```python\ntasks.task_safe_zone_load_demo.main(\n synthetic=True,\n data_path=None,\n predict_size=24,\n history_hours=24 * 100,\n random_seed=314159,\n logging_enabled=False,\n)\n```\n\nRun the four-zone bottom-up load demo. Returns 0 on success, 1 on failure.\n\n#### Parameters {.doc-section .doc-section-parameters}\n\n| Name | Type | Description | Default |\n|-----------------|---------------------------------------------------------|-------------------------------------------------------------------------------------------------------------|------------|\n| synthetic | [bool](`bool`) | If True (and ``data_path`` is None), use seeded synthetic data. Ignored when ``data_path`` is provided. | `True` |\n| data_path | [Optional](`typing.Optional`)\\[[Path](`pathlib.Path`)\\] | Path to an assembled four-column zone-load CSV. When given, real data is used and ``synthetic`` is ignored. | `None` |\n| predict_size | [int](`int`) | Forecast horizon in hours (also the backtest fold size). | `24` |\n| history_hours | [int](`int`) | Length of the synthetic series in hours. | `24 * 100` |\n| random_seed | [int](`int`) | Seed for the synthetic data and the estimators. | `314159` |\n| logging_enabled | [bool](`bool`) | If True, attach the dual console/file handlers. 
| `False` |\n\n#### Examples {.doc-section .doc-section-examples}\n\n::: {#96f49f04 .cell execution_count=2}\n``` {.python .cell-code}\nfrom spotforecast2_safe.tasks.task_safe_zone_load_demo import main\n\n# Fail-fast: a non-existent data_path returns 1 without computing.\nfrom pathlib import Path\nrc = main(data_path=Path(\"/nonexistent/zones.csv\"))\nprint(f\"Return code (missing data): {rc}\")\nassert rc == 1\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nReturn code (missing data): 1\n```\n:::\n:::\n\n\n", + "markdown": "---\ntitle: tasks.task_safe_zone_load_demo\n---\n\n\n\n`tasks.task_safe_zone_load_demo`\n\nTask demo: four-zone bottom-up total-load forecast vs. a direct aggregate.\n\nThis task demonstrates the four-German-TSO-zone forecasting design: fit one\nmodel per control-zone load series (Amprion, TenneT, TransnetBW, 50Hertz) and\nsum the four forecasts into the total-German-load forecast (bottom-up\naggregation), then compare it against a single model trained on the aggregate\nload directly. The bottom-up combination reuses the existing multi-target\nmachinery -- the four zones become `ConfigMulti.targets` and the per-target\nforecasts are summed via the `MultiTask` aggregation with\n``agg_weights=[1.0, ...]``.\n\nData source. With ``data_path`` pointing at the assembled four-column interim\nfile (see `spotforecast2_safe.downloader.entsoe.assemble_zone_loads`), the task\nruns on real ENTSO-E control-zone load. With ``data_path=None`` (the default) it\nbuilds a deterministic, seeded synthetic four-zone dataset so the demo and its\nsmoke test run offline without an ENTSO-E API key.\n\nThe comparison uses the project's own backtesting engine\n(`backtesting_forecaster` + `TimeSeriesFold`) over a 24-hour horizon and reports\nMAE / RMSE / MAPE for both methods.\n\nPer-zone weather (``--per_zone_weather``). When enabled, each zone model\nreceives weather from its own TSO control-area cities instead of the shared\nsingle-point baseline. 
The pipeline uses ``on_weather_failure=\"skip\"`` so the\nsmoke test stays network-free: if Open-Meteo is unreachable the per-zone\nweather degrades to no-weather rather than failing. The default path\n(``per_zone_weather=False``) is exactly the pre-feature baseline\n(``use_exogenous_features=False``).\n\n## Examples {.doc-section .doc-section-examples}\n\n\n::: {#74b8da34 .cell execution_count=1}\n``` {.python .cell-code}\n# Shell commands; they cannot run inside a Python kernel.\n# Synthetic offline demo (no API key needed):\n# uv run spotforecast-safe-zone-load-demo\n# Run on assembled real ENTSO-E zone data:\n# uv run spotforecast-safe-zone-load-demo # --data_path ~/spotforecast2_data/interim/energy_load_zones.csv\n# Enable per-zone weather (requires Open-Meteo access; degrades gracefully):\n# uv run spotforecast-safe-zone-load-demo --per_zone_weather true\n```\n:::\n\n\n## Functions\n\n| Name | Description |\n| --- | --- |\n| [main](#spotforecast2_safe.tasks.task_safe_zone_load_demo.main) | Run the four-zone bottom-up load demo. Returns 0 on success, 1 on failure. |\n\n### main { #spotforecast2_safe.tasks.task_safe_zone_load_demo.main }\n\n```python\ntasks.task_safe_zone_load_demo.main(\n synthetic=True,\n data_path=None,\n predict_size=24,\n history_hours=24 * 100,\n random_seed=314159,\n logging_enabled=False,\n per_zone_weather=False,\n)\n```\n\nRun the four-zone bottom-up load demo. 
Returns 0 on success, 1 on failure.\n\n#### Parameters {.doc-section .doc-section-parameters}\n\n| Name | Type | Description | Default |\n|------------------|---------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------|\n| synthetic | [bool](`bool`) | If True (and ``data_path`` is None), use seeded synthetic data. Ignored when ``data_path`` is provided. | `True` |\n| data_path | [Optional](`typing.Optional`)\\[[Path](`pathlib.Path`)\\] | Path to an assembled four-column zone-load CSV. When given, real data is used and ``synthetic`` is ignored. | `None` |\n| predict_size | [int](`int`) | Forecast horizon in hours (also the backtest fold size). | `24` |\n| history_hours | [int](`int`) | Length of the synthetic series in hours. | `24 * 100` |\n| random_seed | [int](`int`) | Seed for the synthetic data and the estimators. | `314159` |\n| logging_enabled | [bool](`bool`) | If True, attach the dual console/file handlers. | `False` |\n| per_zone_weather | [bool](`bool`) | If True, each zone model fetches weather from its own TSO control-area cities (``config.per_zone_weather=True``, ``on_weather_failure=\"skip\"``). Requires Open-Meteo access; if unreachable the pipeline degrades to no-weather. Default ``False`` keeps the pipeline byte-identical to the pre-feature baseline. 
| `False` |\n\n#### Examples {.doc-section .doc-section-examples}\n\n::: {#7ec42c77 .cell execution_count=2}\n``` {.python .cell-code}\nfrom spotforecast2_safe.tasks.task_safe_zone_load_demo import main\n\n# Fail-fast: a non-existent data_path returns 1 without computing.\nfrom pathlib import Path\nrc = main(data_path=Path(\"/nonexistent/zones.csv\"))\nprint(f\"Return code (missing data): {rc}\")\nassert rc == 1\n```\n\n::: {.cell-output .cell-output-stdout}\n```\nReturn code (missing data): 1\n```\n:::\n:::\n\n\n", "supporting": [ - "tasks.task_safe_zone_load_demo_files" + "tasks.task_safe_zone_load_demo_files/figure-html" ], "filters": [], "includes": {} diff --git a/_freeze/docs/reference/weather.locations.default_german_locations/execute-results/html.json b/_freeze/docs/reference/weather.locations.default_german_locations/execute-results/html.json index 99a5d067..1c51b4f1 100644 --- a/_freeze/docs/reference/weather.locations.default_german_locations/execute-results/html.json +++ b/_freeze/docs/reference/weather.locations.default_german_locations/execute-results/html.json @@ -1,10 +1,10 @@ { - "hash": "40ac041cee2bb8e00eb524ac169792fc", + "hash": "d170d8adb2a4854dca04d2f760de5ebd", "result": { "engine": "jupyter", - "markdown": "---\ntitle: weather.locations.default_german_locations\n---\n\n\n\n```python\nweather.locations.default_german_locations()\n```\n\nReturn the default population-weighted German load-centre registry.\n\n## Returns {.doc-section .doc-section-returns}\n\n| Name | Type | Description |\n|--------|---------------------------------------------------------------------------|-----------------------------------------------------------------------|\n| | [WeatherLocation](`spotforecast2_safe.weather.locations.WeatherLocation`) | Tuple[WeatherLocation, ...]: :data:`GERMAN_LOAD_CENTERS`, in a fixed, |\n| | ... | deterministic order. 
|\n\n## Examples {.doc-section .doc-section-examples}\n\n\n::: {#357cb814 .cell execution_count=1}\n``` {.python .cell-code}\nfrom spotforecast2_safe.weather.locations import default_german_locations\n\nlocs = default_german_locations()\nprint(len(locs), locs[0].name)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n13 Berlin\n```\n:::\n:::\n\n\n", + "markdown": "---\ntitle: weather.locations.default_german_locations\n---\n\n\n\n```python\nweather.locations.default_german_locations()\n```\n\nReturn the default population-weighted German load-centre registry.\n\n## Returns {.doc-section .doc-section-returns}\n\n| Name | Type | Description |\n|--------|---------------------------------------------------------------------------|-----------------------------------------------------------------|\n| | [WeatherLocation](`spotforecast2_safe.weather.locations.WeatherLocation`) | Tuple[WeatherLocation, ...]: `GERMAN_LOAD_CENTERS`, in a fixed, |\n| | ... | deterministic order. |\n\n## Examples {.doc-section .doc-section-examples}\n\n\n::: {#7bfd1692 .cell execution_count=1}\n``` {.python .cell-code}\nfrom spotforecast2_safe.weather.locations import default_german_locations\n\nlocs = default_german_locations()\nprint(len(locs), locs[0].name)\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n15 Berlin\n```\n:::\n:::\n\n\n", "supporting": [ - "weather.locations.default_german_locations_files" + "weather.locations.default_german_locations_files/figure-html" ], "filters": [], "includes": {} diff --git a/_freeze/docs/reference/weather.locations.weights/execute-results/html.json b/_freeze/docs/reference/weather.locations.weights/execute-results/html.json index 6bc47fd1..1d592ac7 100644 --- a/_freeze/docs/reference/weather.locations.weights/execute-results/html.json +++ b/_freeze/docs/reference/weather.locations.weights/execute-results/html.json @@ -1,10 +1,10 @@ { - "hash": "e89d290894356c17d55e496701d1f95c", + "hash": "d6d27fb8e9b9c4216a1088f0429c6ae3", "result": { "engine": 
"jupyter", - "markdown": "---\ntitle: weather.locations.weights\n---\n\n\n\n```python\nweather.locations.weights(locations)\n```\n\nExtract the raw (un-normalised) weights in order.\n\n## Parameters {.doc-section .doc-section-parameters}\n\n| Name | Type | Description | Default |\n|-----------|------------------------------------------------------------------------------------------------------------|------------------------|------------|\n| locations | [Sequence](`typing.Sequence`)\\[[WeatherLocation](`spotforecast2_safe.weather.locations.WeatherLocation`)\\] | The locations to read. | _required_ |\n\n## Returns {.doc-section .doc-section-returns}\n\n| Name | Type | Description |\n|--------|-------------------------------------------|-------------------------------------------------------------------------|\n| | [List](`typing.List`)\\[[float](`float`)\\] | List[float]: One weight per location, in the same order. Consumers |\n| | [List](`typing.List`)\\[[float](`float`)\\] | normalise these (e.g. |\n| | [List](`typing.List`)\\[[float](`float`)\\] | func:`spotforecast2_safe.weather.derived.population_weighted_average`). 
|\n\n## Examples {.doc-section .doc-section-examples}\n\n\n::: {#0e01f7e4 .cell execution_count=1}\n``` {.python .cell-code}\nfrom spotforecast2_safe.weather.locations import (\n default_german_locations,\n weights,\n)\n\nw = weights(default_german_locations())\nprint(len(w), w[0])\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n13 3677.0\n```\n:::\n:::\n\n\n", + "markdown": "---\ntitle: weather.locations.weights\n---\n\n\n\n```python\nweather.locations.weights(locations)\n```\n\nExtract the raw (un-normalised) weights in order.\n\n## Parameters {.doc-section .doc-section-parameters}\n\n| Name | Type | Description | Default |\n|-----------|------------------------------------------------------------------------------------------------------------|------------------------|------------|\n| locations | [Sequence](`typing.Sequence`)\\[[WeatherLocation](`spotforecast2_safe.weather.locations.WeatherLocation`)\\] | The locations to read. | _required_ |\n\n## Returns {.doc-section .doc-section-returns}\n\n| Name | Type | Description |\n|--------|-------------------------------------------|--------------------------------------------------------------------|\n| | [List](`typing.List`)\\[[float](`float`)\\] | List[float]: One weight per location, in the same order. Consumers |\n| | [List](`typing.List`)\\[[float](`float`)\\] | normalise these (e.g. |\n| | [List](`typing.List`)\\[[float](`float`)\\] | `spotforecast2_safe.weather.derived.population_weighted_average`). 
|\n\n## Examples {.doc-section .doc-section-examples}\n\n\n::: {#911e0ccd .cell execution_count=1}\n``` {.python .cell-code}\nfrom spotforecast2_safe.weather.locations import (\n default_german_locations,\n weights,\n)\n\nw = weights(default_german_locations())\nprint(len(w), w[0])\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n15 3677.0\n```\n:::\n:::\n\n\n", "supporting": [ - "weather.locations.weights_files" + "weather.locations.weights_files/figure-html" ], "filters": [], "includes": {} diff --git a/docs/reference/configurator.config_entsoe.ConfigEntsoe.qmd b/docs/reference/configurator.config_entsoe.ConfigEntsoe.qmd index f9e472ac..8609e324 100644 --- a/docs/reference/configurator.config_entsoe.ConfigEntsoe.qmd +++ b/docs/reference/configurator.config_entsoe.ConfigEntsoe.qmd @@ -29,6 +29,8 @@ configurator.config_entsoe.ConfigEntsoe( include_holiday_features=False, include_holiday_adjacency_features=False, use_population_weighted_weather=False, + per_zone_weather=False, + zone_weather_locations=None, include_degree_hours=False, include_apparent_temperature=False, degree_hours_base_heating=15.0, diff --git a/docs/reference/configurator.config_multi.ConfigMulti.qmd b/docs/reference/configurator.config_multi.ConfigMulti.qmd index 7002915c..ba76fab4 100644 --- a/docs/reference/configurator.config_multi.ConfigMulti.qmd +++ b/docs/reference/configurator.config_multi.ConfigMulti.qmd @@ -29,6 +29,8 @@ configurator.config_multi.ConfigMulti( include_holiday_features=False, include_holiday_adjacency_features=False, use_population_weighted_weather=False, + per_zone_weather=False, + zone_weather_locations=None, include_degree_hours=False, include_apparent_temperature=False, degree_hours_base_heating=15.0, @@ -123,6 +125,8 @@ API queries and holiday feature generation. | include_ephemeris_features | [bool](`bool`) | If True, include solar-elevation and daylight-duration features. Defaults to ``False``. 
| `False` | | include_day_type_features | [bool](`bool`) | If True, include working-day and day-type class features (``is_workday``, ``day_type``). Defaults to ``False``. | `False` | | include_school_holiday_features | [bool](`bool`) | Append the ``is_school_holiday`` binary indicator from the bundled OpenHolidays API dataset (ODbL-1.0). Coverage 2022-01-01 to 2027-12-31 for all 16 German Bundesländer. Only ``country_code="DE"`` is supported. Defaults to ``False``. | `False` | +| per_zone_weather | [bool](`bool`) | When True, each target is treated as a German TSO control zone and receives weather from its own regional cities via ``weather.locations.locations_for_zone``. Mutually exclusive with ``use_population_weighted_weather``; requires ``use_exogenous_features=True``; not compatible with ``poly_features_degree >= 2``. Default ``False`` → byte-identical to the shared-weather baseline. | `False` | +| zone_weather_locations | [Optional](`typing.Optional`)\[[Dict](`typing.Dict`)\[[str](`str`), [Any](`typing.Any`)\]\] | Optional override mapping from zone key (e.g. ``"load_50hertz"``) to a list of ``WeatherLocation`` objects. ``None`` (default) uses the built-in registry partition from ``GERMAN_TSO_ZONE_CITIES``. | `None` | | poly_features_degree | [int](`int`) | Polynomial-interaction degree. ``1`` (default) generates no interactions; ``2`` adds pairwise bilinear terms; ``3+`` higher order. | `1` | | max_poly_features | [int](`int`) | Cap on polynomial interaction columns; only the top ``max_poly_features`` ranked by mutual information with the target are kept (``<= 0`` disables). Defaults to ``10``. | `10` | | poly_mi_n_jobs | [Optional](`typing.Optional`)\[[int](`int`)\] | Parallel jobs for the mutual-information ranking that enforces ``max_poly_features``. ``-1`` (default) uses all cores; ``None`` runs single-threaded. Parallelism does not change the selection. | `-1` | @@ -147,63 +151,65 @@ API queries and holiday feature generation. 
## Attributes {.doc-section .doc-section-attributes} -| Name | Type | Description | -|------------------------------------|----------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| -| country_code | [str](`str`) | ISO country code for API queries and holiday generation. | -| periods | [List](`typing.List`)\[[Period](`spotforecast2_safe.data.Period`)\] | Cyclical feature encoding specifications. | -| lags_consider | [List](`typing.List`)\[[int](`int`)\] | Lag values for autoregressive features. | -| train_size | [pd](`pandas`).[Timedelta](`pandas.Timedelta`) | Training data window. | -| end_train_default | [str](`str`) | Default training end date. | -| delta_val | [pd](`pandas`).[Timedelta](`pandas.Timedelta`) | Validation window. | -| predict_size | [int](`int`) | Prediction horizon in hours. | -| refit_size | [int](`int`) | Refit interval in days. | -| random_state | [int](`int`) | Random seed. | -| n_hyperparameters_trials | [int](`int`) | Hyperparameter tuning trials. | -| targets | [Optional](`typing.Optional`)\[[List](`typing.List`)\[[str](`str`)\]\] | Active target column names. ``None`` until explicitly set from the loaded dataset. | -| use_outlier_detection | [bool](`bool`) | IsolationForest outlier removal toggle. | -| contamination | [float](`float`) | IsolationForest contamination fraction. | -| imputation_method | [str](`str`) | Gap-filling strategy (``"weighted"`` or ``"linear"``). | -| window_size | [int](`int`) | Rolling window size for weighted imputation. | -| use_exogenous_features | [bool](`bool`) | Exogenous feature construction toggle. | -| latitude | [float](`float`) | Location latitude. | -| longitude | [float](`float`) | Location longitude. | -| timezone | [str](`str`) | IANA timezone string. 
| -| state | [str](`str`) | Subdivision code for regional holidays. | -| include_weather_windows | [bool](`bool`) | Weather-window feature toggle. | -| include_holiday_features | [bool](`bool`) | Holiday feature toggle. | -| include_holiday_adjacency_features | [bool](`bool`) | Brückentag and before/after-holiday indicator toggle. Defaults to ``False``. | -| include_ephemeris_features | [bool](`bool`) | Solar-elevation and daylight-duration feature toggle. Defaults to ``False``. | -| include_day_type_features | [bool](`bool`) | Working-day / day-type class feature toggle. Defaults to ``False``. | -| include_school_holiday_features | [bool](`bool`) | Per-Bundesland school-holiday indicator toggle. Defaults to ``False``. | -| poly_features_degree | [int](`int`) | Polynomial-interaction degree (1 = off). | -| max_poly_features | [int](`int`) | Cap on kept ``poly_*`` columns (top-K by MI). | -| poly_mi_n_jobs | [Optional](`typing.Optional`)\[[int](`int`)\] | Parallel jobs for the MI ranking (``-1`` = all cores; selection-invariant). | -| poly_mi_sample_size | [Optional](`typing.Optional`)\[[int](`int`)\] | Row cap for the MI ranking (``None`` = score every row). | -| include_covid_infection_rate | [bool](`bool`) | Append the bundled RKI German national COVID-19 7-day incidence as an exogenous regressor. | -| include_entsoe_forecast_load | [bool](`bool`) | Append the ENTSO-E day-ahead Forecasted Load as a near-oracle exogenous prior. | -| include_entsoe_renewable_forecast | [bool](`bool`) | Append the ENTSO-E day-ahead wind/solar generation forecast. | -| include_entsoe_net_load | [bool](`bool`) | Append the ENTSO-E day-ahead net load (Forecasted Load minus wind/solar forecast). | -| include_entsoe_day_ahead_price | [bool](`bool`) | Append the ENTSO-E day-ahead spot price (DE/LU). | -| include_football_match_window | [bool](`bool`) | Append the bundled German football-match event-window feature (1.0 during configured match windows, 0.0 otherwise). 
Covers German national-team matches and tournament finals from UEFA Euro 2016 through FIFA World Cup 2026. | -| include_energy_saving_window | [bool](`bool`) | Append the bundled German energy-saving regulatory window feature (1.0 during the EnSikuMaV and EU Regulation 2022/1854 periods, 0.0 otherwise). | -| index_name | [str](`str`) | Datetime column name used when resetting the index. | -| bounds | [Optional](`typing.Optional`)\[[List](`typing.List`)\[[tuple](`tuple`)\]\] | Per-column outlier bounds ``(lower, upper)``. | -| verbose | [bool](`bool`) | Verbose output toggle. | -| cache_home | [Optional](`typing.Optional`)\[[Any](`typing.Any`)\] | Path to the cache directory. | -| n_trials_optuna | [int](`int`) | Number of Optuna hyperparameter-search trials. | -| n_trials_spotoptim | [int](`int`) | Number of SpotOptim search trials. | -| n_initial_spotoptim | [int](`int`) | Number of initial SpotOptim evaluations. | -| max_time_spotoptim | [Optional](`typing.Optional`)\[[float](`float`)\] | Wall-clock budget for the SpotOptim search in minutes; ``None`` disables the limit. | -| warm_start_lags | [Optional](`typing.Optional`)\[[List](`typing.List`)\[[int](`int`)\]\] | Seed lag set for the SpotOptim search; ``None`` or empty disables the warm start. | -| task | [str](`str`) | Active prediction task (``"lazy"``, ``"training"``, ``"optuna"``, or ``"spotoptim"``). | -| agg_weights | [Optional](`typing.Optional`)\[[List](`typing.List`)\[[float](`float`)\]\] | Per-target aggregation weights. One weight per entry in ``targets``; positive values add, negative values invert the target's contribution. ``None`` until set. | -| auto_save_models | [bool](`bool`) | Whether to auto-persist fitted forecasters after each training run. | -| data_frame_name | [str](`str`) | Active-dataset identifier used for cache and log-file naming. | -| number_folds | [int](`int`) | Cross-validation fold count for tuning tasks. 
| -| on_weather_failure | [Literal](`typing.Literal`)\[\'raise\', \'skip\'\] | Open-Meteo fetch-failure policy: ``"raise"`` aborts, ``"skip"`` continues without weather. | -| on_exog_provider_failure | [Literal](`typing.Literal`)\[\'raise\', \'skip\'\] | Exog-provider failure policy in ``ExogBuilder.build``: ``"raise"`` (default) propagates the ``ExogProviderError``; ``"skip"`` logs and omits the failing provider's columns. | -| exog_max_gap_hours | [int](`int`) | Maximum contiguous gap in hours that providers will heal before raising (0 = strict fail-safe). | -| exog_provider_window | [Literal](`typing.Literal`)\[\'full\', \'train\'\] | Validation window for exog providers: ``"full"`` (default) or ``"train"``. | +| Name | Type | Description | +|------------------------------------|---------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| country_code | [str](`str`) | ISO country code for API queries and holiday generation. | +| periods | [List](`typing.List`)\[[Period](`spotforecast2_safe.data.Period`)\] | Cyclical feature encoding specifications. | +| lags_consider | [List](`typing.List`)\[[int](`int`)\] | Lag values for autoregressive features. | +| train_size | [pd](`pandas`).[Timedelta](`pandas.Timedelta`) | Training data window. | +| end_train_default | [str](`str`) | Default training end date. | +| delta_val | [pd](`pandas`).[Timedelta](`pandas.Timedelta`) | Validation window. | +| predict_size | [int](`int`) | Prediction horizon in hours. | +| refit_size | [int](`int`) | Refit interval in days. | +| random_state | [int](`int`) | Random seed. | +| n_hyperparameters_trials | [int](`int`) | Hyperparameter tuning trials. 
| +| targets | [Optional](`typing.Optional`)\[[List](`typing.List`)\[[str](`str`)\]\] | Active target column names. ``None`` until explicitly set from the loaded dataset. | +| use_outlier_detection | [bool](`bool`) | IsolationForest outlier removal toggle. | +| contamination | [float](`float`) | IsolationForest contamination fraction. | +| imputation_method | [str](`str`) | Gap-filling strategy (``"weighted"`` or ``"linear"``). | +| window_size | [int](`int`) | Rolling window size for weighted imputation. | +| use_exogenous_features | [bool](`bool`) | Exogenous feature construction toggle. | +| latitude | [float](`float`) | Location latitude. | +| longitude | [float](`float`) | Location longitude. | +| timezone | [str](`str`) | IANA timezone string. | +| state | [str](`str`) | Subdivision code for regional holidays. | +| include_weather_windows | [bool](`bool`) | Weather-window feature toggle. | +| include_holiday_features | [bool](`bool`) | Holiday feature toggle. | +| include_holiday_adjacency_features | [bool](`bool`) | Brückentag and before/after-holiday indicator toggle. Defaults to ``False``. | +| include_ephemeris_features | [bool](`bool`) | Solar-elevation and daylight-duration feature toggle. Defaults to ``False``. | +| include_day_type_features | [bool](`bool`) | Working-day / day-type class feature toggle. Defaults to ``False``. | +| include_school_holiday_features | [bool](`bool`) | Per-Bundesland school-holiday indicator toggle. Defaults to ``False``. | +| per_zone_weather | [bool](`bool`) | When True, each target is a TSO control zone that fetches its own regional weather via ``weather.locations.locations_for_zone``. Mutually exclusive with ``use_population_weighted_weather``; requires ``use_exogenous_features=True``; not compatible with ``poly_features_degree >= 2``. Default ``False``. 
| +| zone_weather_locations | [Optional](`typing.Optional`)\[[Dict](`typing.Dict`)\[[str](`str`), [Any](`typing.Any`)\]\] | Override mapping from zone key to a list of ``WeatherLocation`` objects. ``None`` uses the built-in ``GERMAN_TSO_ZONE_CITIES`` partition. | +| poly_features_degree | [int](`int`) | Polynomial-interaction degree (1 = off). | +| max_poly_features | [int](`int`) | Cap on kept ``poly_*`` columns (top-K by MI). | +| poly_mi_n_jobs | [Optional](`typing.Optional`)\[[int](`int`)\] | Parallel jobs for the MI ranking (``-1`` = all cores; selection-invariant). | +| poly_mi_sample_size | [Optional](`typing.Optional`)\[[int](`int`)\] | Row cap for the MI ranking (``None`` = score every row). | +| include_covid_infection_rate | [bool](`bool`) | Append the bundled RKI German national COVID-19 7-day incidence as an exogenous regressor. | +| include_entsoe_forecast_load | [bool](`bool`) | Append the ENTSO-E day-ahead Forecasted Load as a near-oracle exogenous prior. | +| include_entsoe_renewable_forecast | [bool](`bool`) | Append the ENTSO-E day-ahead wind/solar generation forecast. | +| include_entsoe_net_load | [bool](`bool`) | Append the ENTSO-E day-ahead net load (Forecasted Load minus wind/solar forecast). | +| include_entsoe_day_ahead_price | [bool](`bool`) | Append the ENTSO-E day-ahead spot price (DE/LU). | +| include_football_match_window | [bool](`bool`) | Append the bundled German football-match event-window feature (1.0 during configured match windows, 0.0 otherwise). Covers German national-team matches and tournament finals from UEFA Euro 2016 through FIFA World Cup 2026. | +| include_energy_saving_window | [bool](`bool`) | Append the bundled German energy-saving regulatory window feature (1.0 during the EnSikuMaV and EU Regulation 2022/1854 periods, 0.0 otherwise). | +| index_name | [str](`str`) | Datetime column name used when resetting the index. 
| +| bounds | [Optional](`typing.Optional`)\[[List](`typing.List`)\[[tuple](`tuple`)\]\] | Per-column outlier bounds ``(lower, upper)``. | +| verbose | [bool](`bool`) | Verbose output toggle. | +| cache_home | [Optional](`typing.Optional`)\[[Any](`typing.Any`)\] | Path to the cache directory. | +| n_trials_optuna | [int](`int`) | Number of Optuna hyperparameter-search trials. | +| n_trials_spotoptim | [int](`int`) | Number of SpotOptim search trials. | +| n_initial_spotoptim | [int](`int`) | Number of initial SpotOptim evaluations. | +| max_time_spotoptim | [Optional](`typing.Optional`)\[[float](`float`)\] | Wall-clock budget for the SpotOptim search in minutes; ``None`` disables the limit. | +| warm_start_lags | [Optional](`typing.Optional`)\[[List](`typing.List`)\[[int](`int`)\]\] | Seed lag set for the SpotOptim search; ``None`` or empty disables the warm start. | +| task | [str](`str`) | Active prediction task (``"lazy"``, ``"training"``, ``"optuna"``, or ``"spotoptim"``). | +| agg_weights | [Optional](`typing.Optional`)\[[List](`typing.List`)\[[float](`float`)\]\] | Per-target aggregation weights. One weight per entry in ``targets``; positive values add, negative values invert the target's contribution. ``None`` until set. | +| auto_save_models | [bool](`bool`) | Whether to auto-persist fitted forecasters after each training run. | +| data_frame_name | [str](`str`) | Active-dataset identifier used for cache and log-file naming. | +| number_folds | [int](`int`) | Cross-validation fold count for tuning tasks. | +| on_weather_failure | [Literal](`typing.Literal`)\[\'raise\', \'skip\'\] | Open-Meteo fetch-failure policy: ``"raise"`` aborts, ``"skip"`` continues without weather. | +| on_exog_provider_failure | [Literal](`typing.Literal`)\[\'raise\', \'skip\'\] | Exog-provider failure policy in ``ExogBuilder.build``: ``"raise"`` (default) propagates the ``ExogProviderError``; ``"skip"`` logs and omits the failing provider's columns. 
| +| exog_max_gap_hours | [int](`int`) | Maximum contiguous gap in hours that providers will heal before raising (0 = strict fail-safe). | +| exog_provider_window | [Literal](`typing.Literal`)\[\'full\', \'train\'\] | Validation window for exog providers: ``"full"`` (default) or ``"train"``. | ## Notes {.doc-section .doc-section-notes} diff --git a/docs/reference/manager.features.get_target_data.qmd b/docs/reference/manager.features.get_target_data.qmd index 367e176b..b1f6ddbe 100644 --- a/docs/reference/manager.features.get_target_data.qmd +++ b/docs/reference/manager.features.get_target_data.qmd @@ -9,6 +9,7 @@ manager.features.get_target_data( data_with_exog=None, exog_feature_names=None, exo_pred=None, + zone_weather=None, start_train_ts, end_train_ts, ) @@ -33,16 +34,17 @@ required; passing ``None`` raises ``ValueError``. ## Parameters {.doc-section .doc-section-parameters} -| Name | Type | Description | Default | -|--------------------|---------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------| -| target | [str](`str`) | Name of the target column to extract from *df_pipeline*. | _required_ | -| df_pipeline | [pd](`pandas`).[DataFrame](`pandas.DataFrame`) | DataFrame with a tz-aware `DatetimeIndex` containing all target columns produced by the preprocessing pipeline. | _required_ | -| config | \'ConfigMulti\' | Pipeline configuration object. ``use_exogenous_features`` must be set. | _required_ | -| data_with_exog | [Optional](`typing.Optional`)\[[pd](`pandas`).[DataFrame](`pandas.DataFrame`)\] | Merged DataFrame of target and exogenous columns covering at least the training window. Required when ``config.use_exogenous_features`` is ``True``. Pass ``None`` (default) to skip exogenous slicing. 
| `None` | -| exog_feature_names | [Optional](`typing.Optional`)\[[List](`typing.List`)\[[str](`str`)\]\] | Column names to select from *data_with_exog* and *exo_pred*. Required when *data_with_exog* is not ``None``. Pass ``None`` (default) when exogenous features are disabled. | `None` | -| exo_pred | [Optional](`typing.Optional`)\[[pd](`pandas`).[DataFrame](`pandas.DataFrame`)\] | Exogenous feature DataFrame covering the forecast horizon. Required when *data_with_exog* is not ``None``. Pass ``None`` (default) when exogenous features are disabled. | `None` | -| start_train_ts | [pd](`pandas`).[Timestamp](`pandas.Timestamp`) | Inclusive start of the training window (tz-aware ``pd.Timestamp``). **Keyword-only, required** — pass ``task.run_state.start_train_ts`` after the pipeline has been prepared. Passing ``None`` raises ``ValueError``. | _required_ | -| end_train_ts | [pd](`pandas`).[Timestamp](`pandas.Timestamp`) | Inclusive end of the training window (tz-aware ``pd.Timestamp``). **Keyword-only, required** — pass ``task.run_state.end_train_ts`` after the pipeline has been prepared. Passing ``None`` raises ``ValueError``. | _required_ | +| Name | Type | Description | Default | +|--------------------|---------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------| +| target | [str](`str`) | Name of the target column to extract from *df_pipeline*. | _required_ | +| df_pipeline | [pd](`pandas`).[DataFrame](`pandas.DataFrame`) | DataFrame with a tz-aware `DatetimeIndex` containing all target columns produced by the preprocessing pipeline. 
| _required_ | +| config | \'ConfigMulti\' | Pipeline configuration object. ``use_exogenous_features`` must be set. | _required_ | +| data_with_exog | [Optional](`typing.Optional`)\[[pd](`pandas`).[DataFrame](`pandas.DataFrame`)\] | Merged DataFrame of target and exogenous columns covering at least the training window. Required when ``config.use_exogenous_features`` is ``True``. Pass ``None`` (default) to skip exogenous slicing. | `None` | +| exog_feature_names | [Optional](`typing.Optional`)\[[List](`typing.List`)\[[str](`str`)\]\] | Column names to select from *data_with_exog* and *exo_pred*. Required when *data_with_exog* is not ``None``. Pass ``None`` (default) when exogenous features are disabled. | `None` | +| exo_pred | [Optional](`typing.Optional`)\[[pd](`pandas`).[DataFrame](`pandas.DataFrame`)\] | Exogenous feature DataFrame covering the forecast horizon. Required when *data_with_exog* is not ``None``. Pass ``None`` (default) when exogenous features are disabled. | `None` | +| zone_weather | [Optional](`typing.Optional`)\[[pd](`pandas`).[DataFrame](`pandas.DataFrame`)\] | Per-zone weather frame whose columns, where present in ``exog_feature_names``, overwrite the shared weather values for this target. Used by the per-zone weather feature (``config.per_zone_weather=True``). ``None`` (default) means the shared weather columns are used unchanged. The overwrite is in-place on the sliced copies; column order and shape are preserved. | `None` | +| start_train_ts | [pd](`pandas`).[Timestamp](`pandas.Timestamp`) | Inclusive start of the training window (tz-aware ``pd.Timestamp``). **Keyword-only, required** — pass ``task.run_state.start_train_ts`` after the pipeline has been prepared. Passing ``None`` raises ``ValueError``. | _required_ | +| end_train_ts | [pd](`pandas`).[Timestamp](`pandas.Timestamp`) | Inclusive end of the training window (tz-aware ``pd.Timestamp``). 
**Keyword-only, required** — pass ``task.run_state.end_train_ts`` after the pipeline has been prepared. Passing ``None`` raises ``ValueError``. | _required_ | ## Returns {.doc-section .doc-section-returns} diff --git a/docs/reference/multitask.base.BaseTask.qmd b/docs/reference/multitask.base.BaseTask.qmd index f05d8476..275e8419 100644 --- a/docs/reference/multitask.base.BaseTask.qmd +++ b/docs/reference/multitask.base.BaseTask.qmd @@ -214,13 +214,14 @@ target-only. #### Attributes {.doc-section .doc-section-attributes} -| Name | Type | Description | -|--------------------|------------------------------------------------|-------------------------------------------------------------------------------------------------| -| weather_aligned | [pd](`pandas`).[DataFrame](`pandas.DataFrame`) | Weather frame aligned to the pipeline index, reused by the interaction and selection steps. | -| exogenous_features | [pd](`pandas`).[DataFrame](`pandas.DataFrame`) | Full combined, encoded, and capped exogenous feature matrix. | -| exog_feature_names | [List](`typing.List`)\[[str](`str`)\] | Names of the exogenous features selected for training (including provider columns). | -| data_with_exog | [pd](`pandas`).[DataFrame](`pandas.DataFrame`) | Target data merged with the selected exogenous covariates. | -| exo_pred | [pd](`pandas`).[DataFrame](`pandas.DataFrame`) | Exogenous covariates spanning the forecast horizon, supplied to the forecaster at predict time. 
| +| Name | Type | Description | +|----------------------|---------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| weather_aligned | [pd](`pandas`).[DataFrame](`pandas.DataFrame`) | Weather frame aligned to the pipeline index, reused by the interaction and selection steps. | +| zone_weather_aligned | [Dict](`typing.Dict`)\[[str](`str`), [pd](`pandas`).[DataFrame](`pandas.DataFrame`)\] | Per-zone weather frames keyed by target name, indexed over ``[data_start, cov_end]`` (covering the forecast horizon). Populated only when ``config.per_zone_weather`` is True and every zone fetch succeeded; empty otherwise (including the fail-safe "skip" degradation). Consumed at the per-target seam in ``_get_target_data`` to overwrite the shared weather columns. | +| exogenous_features | [pd](`pandas`).[DataFrame](`pandas.DataFrame`) | Full combined, encoded, and capped exogenous feature matrix. | +| exog_feature_names | [List](`typing.List`)\[[str](`str`)\] | Names of the exogenous features selected for training (including provider columns). | +| data_with_exog | [pd](`pandas`).[DataFrame](`pandas.DataFrame`) | Target data merged with the selected exogenous covariates. | +| exo_pred | [pd](`pandas`).[DataFrame](`pandas.DataFrame`) | Exogenous covariates spanning the forecast horizon, supplied to the forecaster at predict time. 
| #### Returns {.doc-section .doc-section-returns} diff --git a/docs/reference/tasks.task_safe_zone_load_demo.qmd b/docs/reference/tasks.task_safe_zone_load_demo.qmd index bb2ec680..a567651f 100644 --- a/docs/reference/tasks.task_safe_zone_load_demo.qmd +++ b/docs/reference/tasks.task_safe_zone_load_demo.qmd @@ -23,6 +23,14 @@ The comparison uses the project's own backtesting engine (`backtesting_forecaster` + `TimeSeriesFold`) over a 24-hour horizon and reports MAE / RMSE / MAPE for both methods. +Per-zone weather (``--per_zone_weather``). When enabled, each zone model +receives weather from its own TSO control-area cities instead of the shared +single-point baseline. The pipeline uses ``on_weather_failure="skip"`` so the +smoke test stays network-free: if Open-Meteo is unreachable the per-zone +weather degrades to no-weather rather than failing. The default path +(``per_zone_weather=False``) is exactly the pre-feature baseline +(``use_exogenous_features=False``). + ## Examples {.doc-section .doc-section-examples} ```{python} @@ -32,6 +40,8 @@ MAE / RMSE / MAPE for both methods. # uv run spotforecast-safe-zone-load-demo # Run on assembled real ENTSO-E zone data: # uv run spotforecast-safe-zone-load-demo # --data_path ~/spotforecast2_data/interim/energy_load_zones.csv +# Enable per-zone weather (requires Open-Meteo access; degrades gracefully): +# uv run spotforecast-safe-zone-load-demo --per_zone_weather true ``` ## Functions @@ -50,6 +60,7 @@ tasks.task_safe_zone_load_demo.main( history_hours=24 * 100, random_seed=314159, logging_enabled=False, + per_zone_weather=False, ) ``` @@ -57,14 +68,15 @@ Run the four-zone bottom-up load demo. Returns 0 on success, 1 on failure. 
#### Parameters {.doc-section .doc-section-parameters} -| Name | Type | Description | Default | -|-----------------|---------------------------------------------------------|-------------------------------------------------------------------------------------------------------------|------------| -| synthetic | [bool](`bool`) | If True (and ``data_path`` is None), use seeded synthetic data. Ignored when ``data_path`` is provided. | `True` | -| data_path | [Optional](`typing.Optional`)\[[Path](`pathlib.Path`)\] | Path to an assembled four-column zone-load CSV. When given, real data is used and ``synthetic`` is ignored. | `None` | -| predict_size | [int](`int`) | Forecast horizon in hours (also the backtest fold size). | `24` | -| history_hours | [int](`int`) | Length of the synthetic series in hours. | `24 * 100` | -| random_seed | [int](`int`) | Seed for the synthetic data and the estimators. | `314159` | -| logging_enabled | [bool](`bool`) | If True, attach the dual console/file handlers. | `False` | +| Name | Type | Description | Default | +|------------------|---------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------| +| synthetic | [bool](`bool`) | If True (and ``data_path`` is None), use seeded synthetic data. Ignored when ``data_path`` is provided. | `True` | +| data_path | [Optional](`typing.Optional`)\[[Path](`pathlib.Path`)\] | Path to an assembled four-column zone-load CSV. When given, real data is used and ``synthetic`` is ignored. | `None` | +| predict_size | [int](`int`) | Forecast horizon in hours (also the backtest fold size). | `24` | +| history_hours | [int](`int`) | Length of the synthetic series in hours. 
| `24 * 100` | +| random_seed | [int](`int`) | Seed for the synthetic data and the estimators. | `314159` | +| logging_enabled | [bool](`bool`) | If True, attach the dual console/file handlers. | `False` | +| per_zone_weather | [bool](`bool`) | If True, each zone model fetches weather from its own TSO control-area cities (``config.per_zone_weather=True``, ``on_weather_failure="skip"``). Requires Open-Meteo access; if unreachable the pipeline degrades to no-weather. Default ``False`` keeps the pipeline byte-identical to the pre-feature baseline. | `False` | #### Examples {.doc-section .doc-section-examples} diff --git a/docs/reference/weather.locations.default_german_locations.qmd b/docs/reference/weather.locations.default_german_locations.qmd index c2d7a2a6..25874413 100644 --- a/docs/reference/weather.locations.default_german_locations.qmd +++ b/docs/reference/weather.locations.default_german_locations.qmd @@ -8,10 +8,10 @@ Return the default population-weighted German load-centre registry. ## Returns {.doc-section .doc-section-returns} -| Name | Type | Description | -|--------|---------------------------------------------------------------------------|-----------------------------------------------------------------------| -| | [WeatherLocation](`spotforecast2_safe.weather.locations.WeatherLocation`) | Tuple[WeatherLocation, ...]: :data:`GERMAN_LOAD_CENTERS`, in a fixed, | -| | ... | deterministic order. | +| Name | Type | Description | +|--------|---------------------------------------------------------------------------|-----------------------------------------------------------------| +| | [WeatherLocation](`spotforecast2_safe.weather.locations.WeatherLocation`) | Tuple[WeatherLocation, ...]: `GERMAN_LOAD_CENTERS`, in a fixed, | +| | ... | deterministic order. 
| ## Examples {.doc-section .doc-section-examples} diff --git a/docs/reference/weather.locations.weights.qmd b/docs/reference/weather.locations.weights.qmd index 8f405db0..a74b75c8 100644 --- a/docs/reference/weather.locations.weights.qmd +++ b/docs/reference/weather.locations.weights.qmd @@ -14,11 +14,11 @@ Extract the raw (un-normalised) weights in order. ## Returns {.doc-section .doc-section-returns} -| Name | Type | Description | -|--------|-------------------------------------------|-------------------------------------------------------------------------| -| | [List](`typing.List`)\[[float](`float`)\] | List[float]: One weight per location, in the same order. Consumers | -| | [List](`typing.List`)\[[float](`float`)\] | normalise these (e.g. | -| | [List](`typing.List`)\[[float](`float`)\] | func:`spotforecast2_safe.weather.derived.population_weighted_average`). | +| Name | Type | Description | +|--------|-------------------------------------------|--------------------------------------------------------------------| +| | [List](`typing.List`)\[[float](`float`)\] | List[float]: One weight per location, in the same order. Consumers | +| | [List](`typing.List`)\[[float](`float`)\] | normalise these (e.g. | +| | [List](`typing.List`)\[[float](`float`)\] | `spotforecast2_safe.weather.derived.population_weighted_average`). | ## Examples {.doc-section .doc-section-examples} diff --git a/src/spotforecast2_safe/configurator/_base_config.py b/src/spotforecast2_safe/configurator/_base_config.py index e45f652c..3a250c69 100644 --- a/src/spotforecast2_safe/configurator/_base_config.py +++ b/src/spotforecast2_safe/configurator/_base_config.py @@ -376,3 +376,30 @@ def validate_config(config: object) -> None: raise ValueError( f"target_qc_deviation_slots must be >= 1; got {target_qc_deviation_slots}." 
) + + per_zone_weather = getattr(config, "per_zone_weather", False) + if per_zone_weather: + if getattr(config, "use_population_weighted_weather", False): + raise ValueError( + "per_zone_weather and use_population_weighted_weather are mutually " + "exclusive (zones use regional weather, not the global index)." + ) + if not getattr(config, "use_exogenous_features", True): + raise ValueError( + "per_zone_weather requires exogenous features " + "(use_exogenous_features must be True)." + ) + if getattr(config, "poly_features_degree", 1) >= 2: + raise ValueError( + "per-zone weather does not support polynomial interaction features " + "(poly_features_degree>=2), whose products are precomputed from the " + "shared weather." + ) + zone_weather_locations = getattr(config, "zone_weather_locations", None) + if zone_weather_locations is not None and not isinstance( + zone_weather_locations, dict + ): + raise TypeError( + "zone_weather_locations must be a dict or None; " + f"got {type(zone_weather_locations).__name__!r}." + ) diff --git a/src/spotforecast2_safe/configurator/config_multi.py b/src/spotforecast2_safe/configurator/config_multi.py index e8673b09..4489b0a3 100644 --- a/src/spotforecast2_safe/configurator/config_multi.py +++ b/src/spotforecast2_safe/configurator/config_multi.py @@ -93,6 +93,17 @@ class features (``is_workday``, ``day_type``). Defaults to ``False``. binary indicator from the bundled OpenHolidays API dataset (ODbL-1.0). Coverage 2022-01-01 to 2027-12-31 for all 16 German Bundesländer. Only ``country_code="DE"`` is supported. Defaults to ``False``. + per_zone_weather (bool): When True, each target is treated as a German + TSO control zone and receives weather from its own regional cities + via ``weather.locations.locations_for_zone``. Mutually exclusive + with ``use_population_weighted_weather``; requires + ``use_exogenous_features=True``; not compatible with + ``poly_features_degree >= 2``. 
Default ``False`` → byte-identical + to the shared-weather baseline. + zone_weather_locations (Optional[Dict[str, Any]]): Optional override + mapping from zone key (e.g. ``"load_50hertz"``) to a list of + ``WeatherLocation`` objects. ``None`` (default) uses the built-in + registry partition from ``GERMAN_TSO_ZONE_CITIES``. poly_features_degree (int): Polynomial-interaction degree. ``1`` (default) generates no interactions; ``2`` adds pairwise bilinear terms; ``3+`` higher order. @@ -213,6 +224,15 @@ class features (``is_workday``, ``day_type``). Defaults to ``False``. toggle. Defaults to ``False``. include_school_holiday_features (bool): Per-Bundesland school-holiday indicator toggle. Defaults to ``False``. + per_zone_weather (bool): When True, each target is a TSO control zone + that fetches its own regional weather via + ``weather.locations.locations_for_zone``. Mutually exclusive with + ``use_population_weighted_weather``; requires + ``use_exogenous_features=True``; not compatible with + ``poly_features_degree >= 2``. Default ``False``. + zone_weather_locations (Optional[Dict[str, Any]]): Override mapping + from zone key to a list of ``WeatherLocation`` objects. ``None`` + uses the built-in ``GERMAN_TSO_ZONE_CITIES`` partition. poly_features_degree (int): Polynomial-interaction degree (1 = off). max_poly_features (int): Cap on kept ``poly_*`` columns (top-K by MI). poly_mi_n_jobs (Optional[int]): Parallel jobs for the MI ranking @@ -363,6 +383,18 @@ class features (``is_workday``, ``day_type``). Defaults to ``False``. # ``include_degree_hours`` adds heating/cooling degree-hours (hdh/cdh) and # ``include_apparent_temperature`` adds apparent temperature + dew point. use_population_weighted_weather: bool = False + # Per-zone weather: when True, each target is resolved as a TSO control + # zone and fetches weather from its own regional cities (via + # spotforecast2_safe.weather.locations.locations_for_zone). 
Mutually + # exclusive with use_population_weighted_weather; requires + # use_exogenous_features=True; not compatible with + # poly_features_degree>=2 (polynomial interactions are precomputed from + # the shared weather frame). Default OFF → byte-identical to today. + per_zone_weather: bool = False + # Optional override mapping zone key → list of WeatherLocation objects; + # None uses the built-in registry partition (GERMAN_TSO_ZONE_CITIES). + # Not a mutable default — the field is None, not []. + zone_weather_locations: Optional[Dict[str, Any]] = None include_degree_hours: bool = False include_apparent_temperature: bool = False degree_hours_base_heating: float = 15.0 diff --git a/src/spotforecast2_safe/manager/features.py b/src/spotforecast2_safe/manager/features.py index 0a45aecb..5811a73d 100644 --- a/src/spotforecast2_safe/manager/features.py +++ b/src/spotforecast2_safe/manager/features.py @@ -686,6 +686,7 @@ def get_target_data( data_with_exog: Optional[pd.DataFrame] = None, exog_feature_names: Optional[List[str]] = None, exo_pred: Optional[pd.DataFrame] = None, + zone_weather: Optional[pd.DataFrame] = None, start_train_ts: pd.Timestamp, end_train_ts: pd.Timestamp, ) -> Tuple[pd.Series, Optional[pd.DataFrame], Optional[pd.DataFrame]]: @@ -723,6 +724,13 @@ def get_target_data( exo_pred: Exogenous feature DataFrame covering the forecast horizon. Required when *data_with_exog* is not ``None``. Pass ``None`` (default) when exogenous features are disabled. + zone_weather: Per-zone weather frame whose columns, where present in + ``exog_feature_names``, overwrite the shared weather values for this + target. Used by the per-zone weather feature + (``config.per_zone_weather=True``). ``None`` (default) means the + shared weather columns are used unchanged. The overwrite is + in-place on the sliced copies; column order and shape are + preserved. start_train_ts: Inclusive start of the training window (tz-aware ``pd.Timestamp``). 
**Keyword-only, required** — pass ``task.run_state.start_train_ts`` after the pipeline has been @@ -865,4 +873,17 @@ def get_target_data( "exo_pred must be a pandas DataFrame when using exogenous features." ) + if zone_weather is not None and exog_feature_names is not None: + overwrite = [c for c in exog_feature_names if c in zone_weather.columns] + if overwrite and exog_train is not None: + for c in overwrite: + exog_train[c] = ( + zone_weather[c].reindex(exog_train.index).astype("float32") + ) + if overwrite and exog_future is not None: + for c in overwrite: + exog_future[c] = ( + zone_weather[c].reindex(exog_future.index).astype("float32") + ) + return y_train, exog_train, exog_future diff --git a/src/spotforecast2_safe/multitask/base.py b/src/spotforecast2_safe/multitask/base.py index e81f50ce..405fd9d1 100644 --- a/src/spotforecast2_safe/multitask/base.py +++ b/src/spotforecast2_safe/multitask/base.py @@ -391,6 +391,7 @@ def __init__( self.weight_func: Optional[Any] = None self.exogenous_features: Optional[pd.DataFrame] = None self.weather_aligned: Optional[pd.DataFrame] = None + self.zone_weather_aligned: Dict[str, pd.DataFrame] = {} self.exog_feature_names: List[str] = [] self.data_with_exog: Optional[pd.DataFrame] = None self.exo_pred: Optional[pd.DataFrame] = None @@ -989,6 +990,13 @@ def build_exogenous_features(self) -> "BaseTask": Attributes: weather_aligned (pd.DataFrame): Weather frame aligned to the pipeline index, reused by the interaction and selection steps. + zone_weather_aligned (Dict[str, pd.DataFrame]): Per-zone weather + frames keyed by target name, indexed over ``[data_start, + cov_end]`` (covering the forecast horizon). Populated only when + ``config.per_zone_weather`` is True and every zone fetch + succeeded; empty otherwise (including the fail-safe "skip" + degradation). Consumed at the per-target seam in + ``_get_target_data`` to overwrite the shared weather columns. 
exogenous_features (pd.DataFrame): Full combined, encoded, and capped exogenous feature matrix. exog_feature_names (List[str]): Names of the exogenous features @@ -1090,6 +1098,90 @@ def build_exogenous_features(self) -> "BaseTask": self.weather_aligned = pd.DataFrame(index=self.df_pipeline.index) self.logger.info(" Weather features: %s", weather_features.shape) + # 4a-bis. Per-zone weather (opt-in, default OFF). + # The global weather frame above stays as the shared SCHEMA/baseline; + # here we build a zone-specific frame for each target. At the per-target + # seam (_get_target_data) the zone values overwrite the shared weather + # columns — column names and schema are identical, only values differ. + self.zone_weather_aligned = {} + if getattr(self.config, "per_zone_weather", False): + from spotforecast2_safe.weather.locations import coordinates as _coords + from spotforecast2_safe.weather.locations import locations_for_zone + from spotforecast2_safe.weather.locations import weights as _wts + + override = getattr(self.config, "zone_weather_locations", None) + zone_frames: Dict[str, pd.DataFrame] = {} + failed: list[tuple[str, Exception]] = [] + shared_seed: Optional[tuple[pd.DataFrame, pd.DataFrame]] = None + for tgt in self.run_state.targets: + locs = locations_for_zone( + tgt, override=override + ) # raises ValueError on unknown zone (fail-safe) + try: + wf_z, wa_z = get_weather_features( + data=self.df_pipeline, + start=self.run_state.data_start, + cov_end=self.run_state.cov_end, + forecast_horizon=self.config.predict_size, + latitude=self.config.latitude, + longitude=self.config.longitude, + timezone=self.config.timezone, + freq="h", + cache_home=self.config.cache_home, + verbose=self.config.verbose, + locations=_coords(locs), + location_weights=_wts(locs), + derived_features=weather_derived or None, + hdh_base=getattr( + self.config, "degree_hours_base_heating", 15.0 + ), + cdh_base=getattr( + self.config, "degree_hours_base_cooling", 22.0 + ), + ) + except 
WeatherFetchError as exc: + if self.config.on_weather_failure == "raise": + raise WeatherFetchError( + f"Per-zone weather fetch failed for zone {tgt!r}: {exc}" + ) from exc + failed.append((tgt, exc)) + continue + # wf_z (the WindowFeatures output) already carries the raw + + # derived + window columns NaN-free; list it FIRST so the + # duplicate-column drop keeps its fully-filled raw columns over + # wa_z's ffill-only (possibly leading-NaN) ones. Keep the native + # [start, cov_end] index — it spans the forecast horizon + # (cov_end > data_end), which df_pipeline.index does NOT. The + # per-target seam reindexes per slice, so reindexing to + # df_pipeline.index here would NaN-out the per-zone exog_future. + combined = pd.concat([wf_z, wa_z], axis=1) + combined = combined.loc[:, ~combined.columns.duplicated()] + combined = combined.bfill().ffill() + zone_frames[tgt] = combined + if shared_seed is None: + shared_seed = (wf_z, wa_z) + if failed: + # Fail-safe: any zone failure under 'skip' degrades the WHOLE + # pipeline to no-weather (never substitute the global index for + # a failed zone). + self.logger.warning( + "Per-zone weather fetch failed for %s; on_weather_failure=" + "'skip' -> continuing with NO weather features for any target.", + ", ".join(t for t, _ in failed), + ) + weather_features = pd.DataFrame(index=self.df_pipeline.index) + self.weather_aligned = pd.DataFrame(index=self.df_pipeline.index) + self.zone_weather_aligned = {} + else: + self.zone_weather_aligned = zone_frames + # Seed the SHARED weather matrix/schema from the first zone so the + # weather columns enter the shared exog (and the per-target seam + # has columns to overwrite) regardless of the global step-4a + # fetch — which, under on_weather_failure="skip", may have failed + # and left self.weather_aligned empty. + if shared_seed is not None: + weather_features, self.weather_aligned = shared_seed + # 4b. 
Calendar calendar_features = get_calendar_features( start=self.run_state.data_start, @@ -1978,6 +2070,7 @@ def _get_target_data(self, target: str) -> tuple: self.exog_feature_names if self.exog_feature_names else None ), exo_pred=self.exo_pred, + zone_weather=self.zone_weather_aligned.get(target), start_train_ts=self.run_state.start_train_ts, end_train_ts=self.run_state.end_train_ts, ) diff --git a/src/spotforecast2_safe/tasks/task_safe_zone_load_demo.py b/src/spotforecast2_safe/tasks/task_safe_zone_load_demo.py index 30acb305..f3c33c7e 100644 --- a/src/spotforecast2_safe/tasks/task_safe_zone_load_demo.py +++ b/src/spotforecast2_safe/tasks/task_safe_zone_load_demo.py @@ -22,6 +22,14 @@ (`backtesting_forecaster` + `TimeSeriesFold`) over a 24-hour horizon and reports MAE / RMSE / MAPE for both methods. +Per-zone weather (``--per_zone_weather``). When enabled, each zone model +receives weather from its own TSO control-area cities instead of the shared +single-point baseline. The pipeline uses ``on_weather_failure="skip"`` so the +smoke test stays network-free: if Open-Meteo is unreachable the per-zone +weather degrades to no-weather rather than failing. The default path +(``per_zone_weather=False``) is exactly the pre-feature baseline +(``use_exogenous_features=False``). + Examples: ```{python} #| eval: false @@ -31,6 +39,8 @@ # Run on assembled real ENTSO-E zone data: # uv run spotforecast-safe-zone-load-demo \ # --data_path ~/spotforecast2_data/interim/energy_load_zones.csv + # Enable per-zone weather (requires Open-Meteo access; degrades gracefully): + # uv run spotforecast-safe-zone-load-demo --per_zone_weather true ``` """ @@ -207,6 +217,7 @@ def main( history_hours: int = 24 * 100, random_seed: int = 314159, logging_enabled: bool = False, + per_zone_weather: bool = False, ) -> int: """Run the four-zone bottom-up load demo. Returns 0 on success, 1 on failure. @@ -219,6 +230,11 @@ def main( history_hours: Length of the synthetic series in hours. 
random_seed: Seed for the synthetic data and the estimators. logging_enabled: If True, attach the dual console/file handlers. + per_zone_weather: If True, each zone model fetches weather from its own + TSO control-area cities (``config.per_zone_weather=True``, + ``on_weather_failure="skip"``). Requires Open-Meteo access; if + unreachable the pipeline degrades to no-weather. Default ``False`` + keeps the pipeline byte-identical to the pre-feature baseline. Examples: ```{python} @@ -263,17 +279,32 @@ def main( # --- Operational bottom-up forecast (one model per zone, summed) --- with tempfile.TemporaryDirectory() as tmp: - config = ConfigMulti( - targets=zones, - agg_weights=[1.0] * len(zones), # SUM -> total (not the mean) - predict_size=predict_size, - use_exogenous_features=False, - use_outlier_detection=False, - auto_save_models=False, - number_folds=2, - random_state=random_seed, - cache_home=tmp, - ) + if per_zone_weather: + config = ConfigMulti( + targets=zones, + agg_weights=[1.0] * len(zones), + predict_size=predict_size, + use_exogenous_features=True, + per_zone_weather=True, + on_weather_failure="skip", + use_outlier_detection=False, + auto_save_models=False, + number_folds=2, + random_state=random_seed, + cache_home=tmp, + ) + else: + config = ConfigMulti( + targets=zones, + agg_weights=[1.0] * len(zones), # SUM -> total (not the mean) + predict_size=predict_size, + use_exogenous_features=False, + use_outlier_detection=False, + auto_save_models=False, + number_folds=2, + random_state=random_seed, + cache_home=tmp, + ) with warnings.catch_warnings(): warnings.simplefilter("ignore") task = MultiTask(config, dataframe=df, task="lazy") @@ -345,6 +376,15 @@ def main( default=False, help="Enable logging (both console and file).", ) + parser.add_argument( + "--per_zone_weather", + type=parse_bool, + default=False, + help="When true, each zone model fetches weather from its own TSO " + "control-area cities. 
Requires Open-Meteo access; if unreachable " + "the pipeline degrades to no-weather (on_weather_failure='skip'). " + "Default false keeps the pre-feature baseline.", + ) args = parser.parse_args() specified_data_path = Path(args.data_path) if args.data_path else None @@ -354,5 +394,6 @@ def main( data_path=specified_data_path, predict_size=args.predict_size, logging_enabled=args.logging, + per_zone_weather=args.per_zone_weather, ) ) diff --git a/src/spotforecast2_safe/weather/locations.py b/src/spotforecast2_safe/weather/locations.py index 0155ed76..a04b1188 100644 --- a/src/spotforecast2_safe/weather/locations.py +++ b/src/spotforecast2_safe/weather/locations.py @@ -6,17 +6,17 @@ A single weather station (the historical default, Dortmund) cannot represent national electricity load: demand is concentrated in the large cities, while the climate signal varies across the country. The remedy recommended in the -forecasting literature is a **population-weighted multi-city temperature index** +forecasting literature is a population-weighted multi-city temperature index (Zimmermann & Ziel 2025, ``zimm25a``): sample several load centres and combine them with fixed weights proportional to population. This module supplies that fixed, deterministic registry. The default set -:data:`GERMAN_LOAD_CENTERS` is chosen "smartly" along two axes: +`GERMAN_LOAD_CENTERS` is chosen "smartly" along two axes: -- **Population weighting** — each centre's weight is its approximate city +- Population weighting — each centre's weight is its approximate city population (in thousands), so the index leans toward where the load actually is (cities) rather than treating every point equally. 
-- **Geographic spread** — the centres deliberately span the north (Hamburg, +- Geographic spread — the centres deliberately span the north (Hamburg, Bremen, Hannover), south (München, Stuttgart, Nürnberg), east (Berlin, Leipzig, Dresden) and west (Köln, Düsseldorf, Dortmund, Frankfurt), so the rural/peripheral climate diversity between the big cities is still captured @@ -24,13 +24,31 @@ The data are static published figures, so the registry is fully deterministic and key-free, and the weights flow into -:func:`spotforecast2_safe.weather.derived.population_weighted_average`. +`spotforecast2_safe.weather.derived.population_weighted_average`. + +Per-zone partition (`GERMAN_TSO_ZONE_CITIES`, `locations_for_zone`) +------------------------------------------------------------------- + +The four German TSO control areas (50Hertz, Amprion, TenneT, TransnetBW) each +have a distinct climate footprint. `GERMAN_TSO_ZONE_CITIES` partitions the 15 +cities in `GERMAN_LOAD_CENTERS` across those four zones, derived from the SMARD +control-area map (https://www.smard.de/page/home/wiki-article/446/205184/uebertragungsnetzbetreiber) +and the TSO control-area definitions +(https://de.wikipedia.org/wiki/Übertragungsnetzbetreiber). + +Notable quirks: Hamburg belongs to the 50Hertz zone (not to a northern TenneT +enclave), and Bremen is TenneT territory (not 50Hertz despite its northerly +position). + +The resolver `locations_for_zone` maps a zone key to a tuple of +`WeatherLocation` objects taken directly from `GERMAN_LOAD_CENTERS`, so each +city reuses its existing population weight. 
""" from __future__ import annotations from dataclasses import dataclass -from typing import List, Sequence, Tuple +from typing import Dict, List, Optional, Sequence, Tuple @dataclass(frozen=True) @@ -60,10 +78,11 @@ class WeatherLocation: weight: float -# Thirteen German load centres spanning all regions, weighted by approximate +# Fifteen German load centres spanning all regions, weighted by approximate # city population (thousands, rounded published figures). Geographic spread is # intentional: the set is not simply "the N largest cities" but a regionally # balanced sample so the national index is not dominated by one climate zone. +# Order: descending by weight (population). GERMAN_LOAD_CENTERS: Tuple[WeatherLocation, ...] = ( WeatherLocation("Berlin", 52.5200, 13.4050, 3677.0), WeatherLocation("Hamburg", 53.5511, 9.9937, 1906.0), @@ -78,14 +97,35 @@ class WeatherLocation: WeatherLocation("Dresden", 51.0504, 13.7373, 556.0), WeatherLocation("Hannover", 52.3759, 9.7320, 535.0), WeatherLocation("Nürnberg", 49.4521, 11.0767, 523.0), + # Added for per-zone weather support (TransnetBW zone coverage). + # Weights: Mannheim ~313k, Karlsruhe ~308k (2023 published figures). + WeatherLocation("Mannheim", 49.4875, 8.4660, 313.0), + WeatherLocation("Karlsruhe", 49.0069, 8.4037, 308.0), ) +# Partition of GERMAN_LOAD_CENTERS city names across the four German TSO +# control areas. Keys are the same string literals used as column names by +# spotforecast2_safe.downloader.entsoe.GERMAN_TSO_ZONES (mirrored here as +# literals to avoid an import cycle with the downloader package). +# +# Provenance: derived from the SMARD control-area map +# (https://www.smard.de/page/home/wiki-article/446/205184/uebertragungsnetzbetreiber) +# and the TSO area definitions +# (https://de.wikipedia.org/wiki/Übertragungsnetzbetreiber). +# Notable quirks: Hamburg is 50Hertz territory; Bremen is TenneT territory. 
+GERMAN_TSO_ZONE_CITIES: Dict[str, Tuple[str, ...]] = { + "load_50hertz": ("Berlin", "Hamburg", "Leipzig", "Dresden"), + "load_amprion": ("Köln", "Düsseldorf", "Dortmund", "Frankfurt"), + "load_tennet": ("München", "Nürnberg", "Hannover", "Bremen"), + "load_transnetbw": ("Stuttgart", "Karlsruhe", "Mannheim"), +} + def default_german_locations() -> Tuple[WeatherLocation, ...]: """Return the default population-weighted German load-centre registry. Returns: - Tuple[WeatherLocation, ...]: :data:`GERMAN_LOAD_CENTERS`, in a fixed, + Tuple[WeatherLocation, ...]: `GERMAN_LOAD_CENTERS`, in a fixed, deterministic order. Examples: @@ -133,7 +173,7 @@ def weights(locations: Sequence[WeatherLocation]) -> List[float]: Returns: List[float]: One weight per location, in the same order. Consumers normalise these (e.g. - :func:`spotforecast2_safe.weather.derived.population_weighted_average`). + `spotforecast2_safe.weather.derived.population_weighted_average`). Examples: ```{python} @@ -147,3 +187,57 @@ def weights(locations: Sequence[WeatherLocation]) -> List[float]: ``` """ return [loc.weight for loc in locations] + + +def locations_for_zone( + zone: str, + *, + override: Optional[Dict[str, Sequence[WeatherLocation]]] = None, +) -> Tuple[WeatherLocation, ...]: + """Return the `WeatherLocation` objects assigned to a German TSO control zone. + + When ``override`` is provided and contains ``zone``, its value is returned + directly (allows callers to inject custom location sets without modifying + the registry). Otherwise the zone is resolved from `GERMAN_TSO_ZONE_CITIES` + by looking up each city name in `GERMAN_LOAD_CENTERS`. + + Each returned location reuses the population weight from the registry, so + the result can be passed directly to `coordinates` and `weights`. + + Args: + zone: One of the four German TSO zone keys: ``"load_50hertz"``, + ``"load_amprion"``, ``"load_tennet"``, or ``"load_transnetbw"``. 
+ override: Optional mapping from zone key to a sequence of + `WeatherLocation` objects. When the key is present, its value is + returned as a tuple and the registry is not consulted. ``None`` + (default) always uses the registry. + + Returns: + Tuple[WeatherLocation, ...]: Ordered tuple of locations for the zone, + in the order the zone's cities are listed in `GERMAN_TSO_ZONE_CITIES`. + + Raises: + ValueError: If ``zone`` is not in `GERMAN_TSO_ZONE_CITIES` (and no + override covers it), listing the valid zone keys. + + Examples: + ```{python} + from spotforecast2_safe.weather.locations import locations_for_zone + + locs = locations_for_zone("load_50hertz") + print([l.name for l in locs]) + ``` + """ + if override is not None and zone in override: + return tuple(override[zone]) + + if zone not in GERMAN_TSO_ZONE_CITIES: + valid = ", ".join(sorted(GERMAN_TSO_ZONE_CITIES)) + raise ValueError(f"Unknown TSO zone {zone!r}. Valid zones are: {valid}.") + + # Build a name→WeatherLocation index once per call (15 entries, negligible cost). + registry_index: Dict[str, WeatherLocation] = { + loc.name: loc for loc in GERMAN_LOAD_CENTERS + } + city_names = GERMAN_TSO_ZONE_CITIES[zone] + return tuple(registry_index[name] for name in city_names) diff --git a/tests/test_per_zone_weather.py b/tests/test_per_zone_weather.py new file mode 100644 index 00000000..a63c8874 --- /dev/null +++ b/tests/test_per_zone_weather.py @@ -0,0 +1,609 @@ +# SPDX-FileCopyrightText: 2026 bartzbeielstein +# SPDX-License-Identifier: AGPL-3.0-or-later + +"""Tests for the per-zone weather feature (ADR feat/per-zone-weather-entsoe). + +All tests are network-free: no Open-Meteo calls are made. The seam unit test +for get_target_data uses a synthetic zone_weather frame.
+""" + +from __future__ import annotations + +import tempfile + +import numpy as np +import pandas as pd +import pytest + +from spotforecast2_safe.configurator.config_multi import ConfigMulti +from spotforecast2_safe.manager.features import get_target_data +from spotforecast2_safe.weather import WeatherFetchError +from spotforecast2_safe.weather.locations import ( + GERMAN_LOAD_CENTERS, + GERMAN_TSO_ZONE_CITIES, + WeatherLocation, + locations_for_zone, +) + +# --------------------------------------------------------------------------- +# 1. Mapping / partition tests +# --------------------------------------------------------------------------- + + +class TestZoneCitiesMapping: + """Verify GERMAN_TSO_ZONE_CITIES covers exactly the four expected zone keys.""" + + EXPECTED_ZONES = { + "load_50hertz", + "load_amprion", + "load_tennet", + "load_transnetbw", + } + + def test_exactly_four_zone_keys(self): + assert set(GERMAN_TSO_ZONE_CITIES.keys()) == self.EXPECTED_ZONES + + def test_no_overlap_between_zones(self): + all_cities: list[str] = [] + for cities in GERMAN_TSO_ZONE_CITIES.values(): + all_cities.extend(cities) + assert len(all_cities) == len( + set(all_cities) + ), "A city appears in more than one TSO zone." + + def test_no_city_used_twice(self): + seen: set[str] = set() + for cities in GERMAN_TSO_ZONE_CITIES.values(): + for city in cities: + assert city not in seen, f"City {city!r} appears in multiple zones." + seen.add(city) + + def test_hamburg_in_50hertz(self): + assert "Hamburg" in GERMAN_TSO_ZONE_CITIES["load_50hertz"] + + def test_bremen_in_tennet(self): + assert "Bremen" in GERMAN_TSO_ZONE_CITIES["load_tennet"] + + def test_all_cities_in_registry(self): + registry_names = {loc.name for loc in GERMAN_LOAD_CENTERS} + for zone, cities in GERMAN_TSO_ZONE_CITIES.items(): + for city in cities: + assert ( + city in registry_names + ), f"City {city!r} (zone {zone!r}) not in GERMAN_LOAD_CENTERS." 
+ + def test_all_15_cities_assigned_to_exactly_one_zone(self): + """Every city in the 15-city registry is assigned to exactly one zone.""" + all_zone_cities = [ + c for cities in GERMAN_TSO_ZONE_CITIES.values() for c in cities + ] + registry_names = {loc.name for loc in GERMAN_LOAD_CENTERS} + # Full cover: all 15 registry cities are partitioned (Dortmund ∈ amprion). + assert set(all_zone_cities) == registry_names + assert len(all_zone_cities) == len(registry_names) == 15 + + +class TestLocationsForZone: + """Tests for the locations_for_zone resolver.""" + + def test_50hertz_returns_four_locations(self): + locs = locations_for_zone("load_50hertz") + assert len(locs) == 4 + assert all(isinstance(loc, WeatherLocation) for loc in locs) + + def test_registry_weights_preserved(self): + locs = locations_for_zone("load_50hertz") + registry_index = {loc.name: loc for loc in GERMAN_LOAD_CENTERS} + for loc in locs: + assert loc.weight == registry_index[loc.name].weight + + def test_unknown_zone_raises_value_error(self): + with pytest.raises(ValueError, match="Unknown TSO zone"): + locations_for_zone("nonsense") + + def test_unknown_zone_error_lists_valid_zones(self): + with pytest.raises(ValueError) as exc_info: + locations_for_zone("load_unknown") + msg = str(exc_info.value) + for zone in GERMAN_TSO_ZONE_CITIES: + assert zone in msg + + def test_override_dict_is_honoured(self): + custom = [WeatherLocation("Custom", 50.0, 8.0, 100.0)] + result = locations_for_zone("load_50hertz", override={"load_50hertz": custom}) + assert result == (WeatherLocation("Custom", 50.0, 8.0, 100.0),) + + def test_override_dict_missing_key_falls_back_to_registry(self): + # Override only covers one zone; the other should still resolve. 
+ custom = [WeatherLocation("Custom", 50.0, 8.0, 100.0)] + locs = locations_for_zone("load_amprion", override={"load_50hertz": custom}) + names = {loc.name for loc in locs} + assert "Köln" in names # expected registry city for amprion + + def test_deterministic_order_across_calls(self): + locs1 = locations_for_zone("load_tennet") + locs2 = locations_for_zone("load_tennet") + assert locs1 == locs2 + + def test_all_four_zones_resolve(self): + for zone in GERMAN_TSO_ZONE_CITIES: + locs = locations_for_zone(zone) + assert len(locs) >= 1 + + +# --------------------------------------------------------------------------- +# 2. Config validation guards +# --------------------------------------------------------------------------- + + +class TestConfigPerZoneWeatherGuards: + """Verify validate_config raises on incompatible per_zone_weather settings.""" + + def test_mutually_exclusive_with_population_weighted(self): + with pytest.raises(ValueError, match="mutually exclusive"): + ConfigMulti( + per_zone_weather=True, + use_population_weighted_weather=True, + use_exogenous_features=True, + ) + + def test_requires_exogenous_features(self): + with pytest.raises(ValueError, match="requires exogenous features"): + ConfigMulti( + per_zone_weather=True, + use_exogenous_features=False, + ) + + def test_incompatible_with_poly_degree_2(self): + with pytest.raises(ValueError, match="polynomial interaction"): + ConfigMulti( + per_zone_weather=True, + use_exogenous_features=True, + poly_features_degree=2, + ) + + def test_zone_weather_locations_non_dict_raises_type_error(self): + with pytest.raises(TypeError, match="zone_weather_locations must be a dict"): + ConfigMulti( + per_zone_weather=True, + use_exogenous_features=True, + zone_weather_locations=123, # type: ignore[arg-type] + ) + + def test_valid_construction_succeeds(self): + cfg = ConfigMulti( + per_zone_weather=True, + use_exogenous_features=True, + ) + assert cfg.per_zone_weather is True + assert cfg.zone_weather_locations is 
None + + def test_valid_construction_with_override_dict(self): + custom = [WeatherLocation("Custom", 50.0, 8.0, 100.0)] + cfg = ConfigMulti( + per_zone_weather=True, + use_exogenous_features=True, + zone_weather_locations={"load_50hertz": custom}, + ) + assert isinstance(cfg.zone_weather_locations, dict) + + def test_default_per_zone_weather_is_false(self): + cfg = ConfigMulti() + assert cfg.per_zone_weather is False + assert cfg.zone_weather_locations is None + + +# --------------------------------------------------------------------------- +# 3. Seam unit test: get_target_data with zone_weather overwrite +# --------------------------------------------------------------------------- + + +def _make_index(n: int, start: str = "2024-01-01") -> pd.DatetimeIndex: + return pd.date_range(start, periods=n, freq="h", tz="UTC") + + +class TestGetTargetDataZoneWeather: + """Verify that zone_weather overwrites shared weather columns in-place.""" + + def setup_method(self): + rng = np.random.default_rng(42) + self.n_train = 168 + self.n_future = 24 + self.idx_train = _make_index(self.n_train) + self.idx_future = _make_index(self.n_future, start="2024-01-08") + + # Shared weather values (shared baseline) + shared_temp = rng.uniform(5.0, 20.0, self.n_train).astype("float32") + hour_sin = np.sin(2 * np.pi * self.idx_train.hour / 24).astype("float32") + + self.df_pipeline = pd.DataFrame( + {"load": rng.normal(100, 10, self.n_train)}, index=self.idx_train + ) + + self.data_with_exog = pd.DataFrame( + { + "load": self.df_pipeline["load"], + "temperature_2m": shared_temp, + "hour_sin": hour_sin, + }, + index=self.idx_train, + ) + + shared_temp_future = rng.uniform(5.0, 20.0, self.n_future).astype("float32") + hour_sin_future = np.sin(2 * np.pi * self.idx_future.hour / 24).astype( + "float32" + ) + self.exo_pred = pd.DataFrame( + { + "temperature_2m": shared_temp_future, + "hour_sin": hour_sin_future, + }, + index=self.idx_future, + ) + + # Per-zone weather: different temperature 
values
+        zone_temp_train = rng.uniform(0.0, 5.0, self.n_train).astype("float32")
+        zone_temp_future = rng.uniform(0.0, 5.0, self.n_future).astype("float32")
+        # Deliberately different from the shared values above
+        self.zone_weather_train = pd.DataFrame(
+            {"temperature_2m": zone_temp_train}, index=self.idx_train
+        )
+        self.zone_weather_future = pd.DataFrame(
+            {"temperature_2m": zone_temp_future}, index=self.idx_future
+        )
+        # Combined zone_weather covering both train and future
+        self.zone_weather = pd.concat(
+            [self.zone_weather_train, self.zone_weather_future]
+        )
+
+        self.config = ConfigMulti(
+            targets=["load"],
+            use_exogenous_features=True,
+        )
+        self.start_ts = self.idx_train[0]
+        self.end_ts = self.idx_train[-1]
+
+    def test_zone_temperature_overwrites_shared(self):
+        y_train, exog_train, exog_future = get_target_data(
+            target="load",
+            df_pipeline=self.df_pipeline,
+            config=self.config,
+            data_with_exog=self.data_with_exog,
+            exog_feature_names=["temperature_2m", "hour_sin"],
+            exo_pred=self.exo_pred,
+            zone_weather=self.zone_weather,
+            start_train_ts=self.start_ts,
+            end_train_ts=self.end_ts,
+        )
+
+        expected_train = (
+            self.zone_weather["temperature_2m"]
+            .reindex(self.idx_train)
+            .astype("float32")
+        )
+        expected_future = (
+            self.zone_weather["temperature_2m"]
+            .reindex(self.idx_future)
+            .astype("float32")
+        )
+
+        pd.testing.assert_series_equal(
+            exog_train["temperature_2m"].rename(None),
+            expected_train.rename(None),
+            check_names=False,
+        )
+        pd.testing.assert_series_equal(
+            exog_future["temperature_2m"].rename(None),
+            expected_future.rename(None),
+            check_names=False,
+        )
+
+    def test_non_weather_column_unchanged(self):
+        _, exog_train, exog_future = get_target_data(
+            target="load",
+            df_pipeline=self.df_pipeline,
+            config=self.config,
+            data_with_exog=self.data_with_exog,
+            exog_feature_names=["temperature_2m", "hour_sin"],
+            exo_pred=self.exo_pred,
+            zone_weather=self.zone_weather,
+            start_train_ts=self.start_ts,
+            end_train_ts=self.end_ts,
+        )
+
+        expected_sin = (
+            self.data_with_exog["hour_sin"].reindex(self.idx_train).astype("float32")
+        )
+        pd.testing.assert_series_equal(
+            exog_train["hour_sin"].rename(None),
+            expected_sin.rename(None),
+            check_names=False,
+        )
+
+    def test_column_not_in_zone_weather_is_left_as_is(self):
+        """A column in exog_feature_names but NOT in zone_weather stays unchanged."""
+        zone_weather_partial = self.zone_weather[[]].copy()  # empty columns
+        _, exog_train, _ = get_target_data(
+            target="load",
+            df_pipeline=self.df_pipeline,
+            config=self.config,
+            data_with_exog=self.data_with_exog,
+            exog_feature_names=["temperature_2m", "hour_sin"],
+            exo_pred=self.exo_pred,
+            zone_weather=zone_weather_partial,
+            start_train_ts=self.start_ts,
+            end_train_ts=self.end_ts,
+        )
+        expected_temp = (
+            self.data_with_exog["temperature_2m"]
+            .reindex(self.idx_train)
+            .astype("float32")
+        )
+        pd.testing.assert_series_equal(
+            exog_train["temperature_2m"].rename(None),
+            expected_temp.rename(None),
+            check_names=False,
+        )
+
+    def test_shape_and_columns_unchanged(self):
+        feature_names = ["temperature_2m", "hour_sin"]
+        _, exog_train, exog_future = get_target_data(
+            target="load",
+            df_pipeline=self.df_pipeline,
+            config=self.config,
+            data_with_exog=self.data_with_exog,
+            exog_feature_names=feature_names,
+            exo_pred=self.exo_pred,
+            zone_weather=self.zone_weather,
+            start_train_ts=self.start_ts,
+            end_train_ts=self.end_ts,
+        )
+        assert list(exog_train.columns) == feature_names
+        assert list(exog_future.columns) == feature_names
+        assert exog_train.shape == (self.n_train, 2)
+        assert exog_future.shape == (self.n_future, 2)
+
+    def test_dtype_float32(self):
+        _, exog_train, exog_future = get_target_data(
+            target="load",
+            df_pipeline=self.df_pipeline,
+            config=self.config,
+            data_with_exog=self.data_with_exog,
+            exog_feature_names=["temperature_2m", "hour_sin"],
+            exo_pred=self.exo_pred,
+            zone_weather=self.zone_weather,
+            start_train_ts=self.start_ts,
+            end_train_ts=self.end_ts,
+        )
+        assert exog_train["temperature_2m"].dtype == np.float32
+        assert exog_future["temperature_2m"].dtype == np.float32
+
+    def test_zone_weather_none_uses_shared(self):
+        """Without zone_weather, the shared values are returned unchanged."""
+        _, exog_train_z, _ = get_target_data(
+            target="load",
+            df_pipeline=self.df_pipeline,
+            config=self.config,
+            data_with_exog=self.data_with_exog,
+            exog_feature_names=["temperature_2m", "hour_sin"],
+            exo_pred=self.exo_pred,
+            zone_weather=None,
+            start_train_ts=self.start_ts,
+            end_train_ts=self.end_ts,
+        )
+        expected_temp = (
+            self.data_with_exog["temperature_2m"]
+            .reindex(self.idx_train)
+            .astype("float32")
+        )
+        pd.testing.assert_series_equal(
+            exog_train_z["temperature_2m"].rename(None),
+            expected_temp.rename(None),
+            check_names=False,
+        )
+
+
+# ---------------------------------------------------------------------------
+# 4. Determinism
+# ---------------------------------------------------------------------------
+
+
+class TestLocationsForZoneDeterminism:
+    def test_two_calls_identical(self):
+        locs_a = locations_for_zone("load_transnetbw")
+        locs_b = locations_for_zone("load_transnetbw")
+        assert locs_a == locs_b
+
+    def test_all_zones_deterministic(self):
+        for zone in GERMAN_TSO_ZONE_CITIES:
+            assert locations_for_zone(zone) == locations_for_zone(zone)
+
+
+# ---------------------------------------------------------------------------
+# 5. Regression-when-off: zone_weather_aligned empty without exog features
+# ---------------------------------------------------------------------------
+
+
+class TestRegressionPerZoneOff:
+    """Without per_zone_weather, zone_weather_aligned stays empty."""
+
+    def test_zone_weather_aligned_empty_without_exog(self):
+        with tempfile.TemporaryDirectory() as tmp:
+            cfg = ConfigMulti(
+                targets=["load"],
+                use_exogenous_features=False,
+                cache_home=tmp,
+            )
+            from spotforecast2_safe.multitask import LazyTask
+
+            rng = np.random.default_rng(0)
+            idx = pd.date_range("2024-01-01", periods=200, freq="h", tz="UTC")
+            df = pd.DataFrame({"load": rng.normal(100, 5, 200)}, index=idx)
+            task = LazyTask(cfg, dataframe=df)
+            task.prepare_data().detect_outliers().impute().build_exogenous_features()
+            assert task.zone_weather_aligned == {}
+
+    def test_per_zone_weather_false_default(self):
+        cfg = ConfigMulti(use_exogenous_features=False)
+        assert cfg.per_zone_weather is False
+
+
+# ---------------------------------------------------------------------------
+# 6. Pipeline integration: per-zone frames cover the forecast horizon and
+#    distinct zone weather reaches each target's exog_future (network-free).
+#    This guards the regression where the per-zone frame was truncated to
+#    df_pipeline.index (historical only), NaN-ing out the future weather.
+# ---------------------------------------------------------------------------
+
+
+def _zone_load_df(periods: int = 24 * 21) -> pd.DataFrame:
+    """Two-zone hourly load frame whose columns are valid TSO zone keys."""
+    rng = np.random.default_rng(7)
+    idx = pd.date_range("2023-01-01", periods=periods, freq="h", tz="UTC")
+    idx.name = "DateTime"
+    return pd.DataFrame(
+        {
+            "load_50hertz": rng.normal(11000, 300, periods),
+            "load_tennet": rng.normal(25000, 600, periods),
+        },
+        index=idx,
+    )
+
+
+def _fake_weather_factory():
+    """Return a get_weather_features stand-in: zone-distinct, time-varying temp.
+
+    Indexed over the requested ``[start, cov_end]`` (which spans the forecast
+    horizon), exactly like the real fetch. The temperature offset is derived
+    from the mean latitude of ``locations`` so each zone gets a different
+    series; the global baseline call (``locations is None``) gets its own.
+    """
+
+    def fake_weather(*, data, start, cov_end, locations=None, **kwargs):
+        idx = pd.date_range(start, cov_end, freq="h")
+        if locations is None:
+            offset = 50.0  # global baseline sentinel
+        else:
+            offset = float(sum(lat for lat, _ in locations) / len(locations))
+        temp = offset + np.sin(2 * np.pi * idx.hour / 24)
+        weather_aligned = pd.DataFrame({"temperature_2m": temp}, index=idx)
+        # Real WindowFeatures returns the raw columns PLUS window columns; with
+        # weather windows off only the raw column is selected, so the raw column
+        # alone reproduces the relevant shape (weather_features == raw here).
+        weather_features = weather_aligned.copy()
+        return weather_features, weather_aligned
+
+    return fake_weather
+
+
+class TestPerZonePipelineIntegration:
+    """build_exogenous_features wires distinct per-zone weather over the horizon."""
+
+    def _build_task(self, monkeypatch, tmp_path):
+        import spotforecast2_safe.multitask.base as mt_base
+        from spotforecast2_safe.multitask import LazyTask
+
+        monkeypatch.setattr(mt_base, "get_weather_features", _fake_weather_factory())
+        cfg = ConfigMulti(
+            targets=["load_50hertz", "load_tennet"],
+            predict_size=6,
+            use_exogenous_features=True,
+            per_zone_weather=True,
+            on_weather_failure="raise",
+            use_outlier_detection=False,
+            auto_save_models=False,
+            number_folds=2,
+            verbose=False,
+            cache_home=str(tmp_path),
+        )
+        task = LazyTask(cfg, dataframe=_zone_load_df())
+        task.prepare_data().detect_outliers().impute().build_exogenous_features()
+        return task
+
+    def test_zone_frames_cover_forecast_horizon(self, monkeypatch, tmp_path):
+        task = self._build_task(monkeypatch, tmp_path)
+        assert set(task.zone_weather_aligned) == {"load_50hertz", "load_tennet"}
+        cov_end = task.run_state.cov_end
+        data_end = task.run_state.data_end
+        assert cov_end > data_end  # horizon really extends past history
+        for zone, frame in task.zone_weather_aligned.items():
+            # The frame must span the horizon, not just the training history.
+            assert frame.index.max() == cov_end, (
+                f"{zone}: per-zone weather truncated to {frame.index.max()} "
+                f"(history end {data_end}); must reach cov_end {cov_end}."
+            )
+
+    def test_future_exog_is_per_zone_and_not_nan(self, monkeypatch, tmp_path):
+        task = self._build_task(monkeypatch, tmp_path)
+        assert "temperature_2m" in task.exog_feature_names
+        future_means = {}
+        for zone in ("load_50hertz", "load_tennet"):
+            _, _, exog_future = task._get_target_data(zone)
+            temp = exog_future["temperature_2m"]
+            assert temp.notna().all(), f"{zone}: future weather has NaN"
+            future_means[zone] = float(temp.mean())
+        # Distinct zones -> distinct regional temperature (different offsets).
+        assert abs(future_means["load_50hertz"] - future_means["load_tennet"]) > 0.5
+
+    def test_skip_partial_zone_failure_degrades_to_no_weather(
+        self, monkeypatch, tmp_path
+    ):
+        """One zone failing under 'skip' degrades the WHOLE pipeline to no-weather."""
+        import spotforecast2_safe.multitask.base as mt_base
+        from spotforecast2_safe.multitask import LazyTask
+
+        base_fake = _fake_weather_factory()
+
+        def failing_for_tennet(*, data, start, cov_end, locations=None, **kwargs):
+            # load_tennet includes München (lat 48.1351); fail just that zone.
+            if locations is not None and any(
+                abs(lat - 48.1351) < 0.01 for lat, _ in locations
+            ):
+                raise WeatherFetchError("simulated Open-Meteo outage for tennet")
+            return base_fake(
+                data=data, start=start, cov_end=cov_end, locations=locations, **kwargs
+            )
+
+        monkeypatch.setattr(mt_base, "get_weather_features", failing_for_tennet)
+        cfg = ConfigMulti(
+            targets=["load_50hertz", "load_tennet"],
+            predict_size=6,
+            use_exogenous_features=True,
+            per_zone_weather=True,
+            on_weather_failure="skip",
+            use_outlier_detection=False,
+            auto_save_models=False,
+            number_folds=2,
+            verbose=False,
+            cache_home=str(tmp_path),
+        )
+        task = LazyTask(cfg, dataframe=_zone_load_df())
+        task.prepare_data().detect_outliers().impute().build_exogenous_features()
+        # Fail-safe degradation: no per-zone frames, no weather columns selected.
+        assert task.zone_weather_aligned == {}
+        assert "temperature_2m" not in task.exog_feature_names
+
+    def test_non_zone_target_raises_value_error(self, monkeypatch, tmp_path):
+        """per_zone_weather=True with a target that is not a TSO zone must raise."""
+        import spotforecast2_safe.multitask.base as mt_base
+        from spotforecast2_safe.multitask import LazyTask
+
+        monkeypatch.setattr(mt_base, "get_weather_features", _fake_weather_factory())
+        rng = np.random.default_rng(3)
+        idx = pd.date_range("2023-01-01", periods=24 * 14, freq="h", tz="UTC")
+        idx.name = "DateTime"
+        df = pd.DataFrame({"load": rng.normal(100, 5, len(idx))}, index=idx)
+        cfg = ConfigMulti(
+            targets=["load"],  # not a GERMAN_TSO_ZONE_CITIES key
+            predict_size=6,
+            use_exogenous_features=True,
+            per_zone_weather=True,
+            use_outlier_detection=False,
+            auto_save_models=False,
+            number_folds=2,
+            verbose=False,
+            cache_home=str(tmp_path),
+        )
+        task = LazyTask(cfg, dataframe=df)
+        with pytest.raises(ValueError, match="Unknown TSO zone"):
+            task.prepare_data().detect_outliers().impute().build_exogenous_features()
diff --git a/tests/test_weather_locations.py b/tests/test_weather_locations.py
index 9fbb7967..5572790a 100644
--- a/tests/test_weather_locations.py
+++ b/tests/test_weather_locations.py
@@ -17,7 +17,7 @@ class TestRegistry:
 
     def test_default_returns_registry(self):
         assert default_german_locations() is GERMAN_LOAD_CENTERS
-        assert len(GERMAN_LOAD_CENTERS) == 13
+        assert len(GERMAN_LOAD_CENTERS) == 15
 
     def test_all_weights_positive(self):
         assert all(loc.weight > 0 for loc in GERMAN_LOAD_CENTERS)
From c9bd8025c3e61447c80cc0865ea4a803bc34fb2c Mon Sep 17 00:00:00 2001
From: semantic-release-bot
Date: Sat, 13 Jun 2026 16:59:01 +0000
Subject: [PATCH 2/2] chore(release): 22.4.0-rc.1 [skip ci]

## [22.4.0-rc.1](https://github.com/sequential-parameter-optimization/spotforecast2-safe/compare/v22.3.0...v22.4.0-rc.1) (2026-06-13)

### Features

* **weather:** per-zone weather for the ENTSO-E four-zone pipeline ([260a084](https://github.com/sequential-parameter-optimization/spotforecast2-safe/commit/260a08432177181f8cec78a7d1a679379888a16f))

### Documentation

* **tasks:** drop stale n-to-1 task wrapper references from task_multi ([00e9dab](https://github.com/sequential-parameter-optimization/spotforecast2-safe/commit/00e9dabfeba10cc975397aef6aef800630f47cb8))
---
 CHANGELOG.md   | 12 ++++++++++++
 MODEL_CARD.md  |  8 ++++----
 pyproject.toml |  2 +-
 3 files changed, 17 insertions(+), 5 deletions(-)

diff --git a/CHANGELOG.md b/CHANGELOG.md
index 4d00d6ba..13fcef1a 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -1,3 +1,15 @@
+## [22.4.0-rc.1](https://github.com/sequential-parameter-optimization/spotforecast2-safe/compare/v22.3.0...v22.4.0-rc.1) (2026-06-13)
+
+
+### Features
+
+* **weather:** per-zone weather for the ENTSO-E four-zone pipeline ([260a084](https://github.com/sequential-parameter-optimization/spotforecast2-safe/commit/260a08432177181f8cec78a7d1a679379888a16f))
+
+
+### Documentation
+
+* **tasks:** drop stale n-to-1 task wrapper references from task_multi ([00e9dab](https://github.com/sequential-parameter-optimization/spotforecast2-safe/commit/00e9dabfeba10cc975397aef6aef800630f47cb8))
+
 ## [22.3.0](https://github.com/sequential-parameter-optimization/spotforecast2-safe/compare/v22.2.0...v22.3.0) (2026-06-12)
 
 
diff --git a/MODEL_CARD.md b/MODEL_CARD.md
index 167c1c67..c61117de 100644
--- a/MODEL_CARD.md
+++ b/MODEL_CARD.md
@@ -7,7 +7,7 @@ This card describes what spotforecast2-safe is, how to use it safely, the condit
 | Field | Value |
 | --- | --- |
 | Name | spotforecast2-safe |
-| Version | 22.3.0 |
+| Version | 22.4.0-rc.1 |
 | Type | Deterministic Python library for time series feature engineering and recursive multi-step forecasting. It performs no training of its own. |
 | Developed by | Thomas Bartz-Beielstein, ORCID [0000-0002-5938-5158](https://orcid.org/0000-0002-5938-5158) |
 | Distributed by | the `sequential-parameter-optimization` GitHub organization |
@@ -18,7 +18,7 @@ This card describes what spotforecast2-safe is, how to use it safely, the condit
 
 The library depends only on numpy, pandas, scikit-learn, lightgbm, numba, pyarrow, requests, feature-engine, holidays, astral, and tqdm. It deliberately excludes plotly, matplotlib, spotoptim, optuna, torch, and tensorflow, so no plotting or automated-tuning code ships in this package.
 
-Two Common Platform Enumeration (CPE) identifiers let vulnerability-tracking and software bill of materials (SBOM) tools recognize the package. The wildcard identifier `cpe:2.3:a:sequential_parameter_optimization:spotforecast2_safe:*:*:*:*:*:*:*:*` matches any release; the current release is `cpe:2.3:a:sequential_parameter_optimization:spotforecast2_safe:22.3.0:*:*:*:*:*:*:*`.
+Two Common Platform Enumeration (CPE) identifiers let vulnerability-tracking and software bill of materials (SBOM) tools recognize the package. The wildcard identifier `cpe:2.3:a:sequential_parameter_optimization:spotforecast2_safe:*:*:*:*:*:*:*:*` matches any release; the current release is `cpe:2.3:a:sequential_parameter_optimization:spotforecast2_safe:22.4.0-rc.1:*:*:*:*:*:*:*`.
 
 The library itself is a low-risk component: it is deterministic, its source is fully inspectable, and it fails safe on invalid input. It is built to support high-risk AI systems in the sense of the EU AI Act, but it is not itself such a system. When it is embedded in a high-risk deployment, the duties that attach to that system fall on the integrator, not on the library.
 
@@ -30,7 +30,7 @@ Responsibilities are divided as follows.
 | Distribution | sequential-parameter-optimization on GitHub | repository issue tracker |
 | Deployment, operation, and audit | the system integrator | defined per deployment |
 
-The current release is 22.3.0, with a stable public interface pinned in `spotforecast2_safe.__init__.__all__`. The full version history, including release dates, is recorded in `CHANGELOG.md` and on the GitHub Releases page; it is maintained automatically by the release pipeline and is not repeated here.
+The current release is 22.4.0-rc.1, with a stable public interface pinned in `spotforecast2_safe.__init__.__all__`. The full version history, including release dates, is recorded in `CHANGELOG.md` and on the GitHub Releases page; it is maintained automatically by the release pipeline and is not repeated here.
 
 ## 2. Intended Use and Scope
 
@@ -216,7 +216,7 @@ Maintainer: Thomas Bartz-Beielstein, ORCID [0000-0002-5938-5158](https://orcid.o
 }
 ```
 
-Or as a formatted reference: Bartz-Beielstein, T. (2026). *spotforecast2-safe: Safety-critical subset of spotforecast2* (Version 22.3.0) [Computer software]. https://github.com/sequential-parameter-optimization/spotforecast2-safe
+Or as a formatted reference: Bartz-Beielstein, T. (2026). *spotforecast2-safe: Safety-critical subset of spotforecast2* (Version 22.4.0-rc.1) [Computer software]. https://github.com/sequential-parameter-optimization/spotforecast2-safe
 
 The technical report (`bart26h/index.qmd`) is the long-form reference for design rationale, compliance mapping, and evaluation protocol.
 
diff --git a/pyproject.toml b/pyproject.toml
index 05c4b5d3..5893429d 100644
--- a/pyproject.toml
+++ b/pyproject.toml
@@ -1,6 +1,6 @@
 [project]
 name = "spotforecast2-safe"
-version = "22.3.0"
+version = "22.4.0-rc.1"
 description = "spotforecast2-safe (Core): Safety-critical time series forecasting for production"
 readme = "README.md"
 license = { text = "AGPL-3.0-or-later" }
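
The seam overwrite exercised by the tests in `tests/test_per_zone_weather.py` (zone columns replace the shared values, non-weather columns stay untouched, column order and `float32` dtype are preserved) can be sketched in isolation with plain pandas. This is a minimal illustration only: `overwrite_with_zone_weather` is a hypothetical helper introduced here, not the library's `get_target_data`, and the actual implementation in `manager/features.py` may differ.

```python
import numpy as np
import pandas as pd


def overwrite_with_zone_weather(exog, zone_weather):
    """Hypothetical helper (not the library API): overwrite shared weather
    columns with one zone's values, leaving all other columns untouched."""
    out = exog.copy()
    if zone_weather is None:
        # No per-zone frame: the shared values pass through unchanged.
        return out
    for col in out.columns:
        if col in zone_weather.columns:
            # Align on the exog index and keep the float32 dtype.
            out[col] = zone_weather[col].reindex(out.index).astype("float32")
    return out


idx = pd.date_range("2024-01-01", periods=4, freq="h", tz="UTC")
shared = pd.DataFrame(
    {
        "temperature_2m": np.full(4, 10.0, dtype="float32"),
        "hour_sin": np.sin(2 * np.pi * idx.hour.to_numpy() / 24).astype("float32"),
    },
    index=idx,
)
# Zone frame carries only the weather column, with deliberately different values.
zone = pd.DataFrame({"temperature_2m": [1.0, 2.0, 3.0, 4.0]}, index=idx)

result = overwrite_with_zone_weather(shared, zone)
```

Assigning into an existing column keeps its position, which is why the sketch preserves column order without any reordering step; a zone frame with no matching columns, or `None`, returns the shared values as they were.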