diff --git a/MODEL_CARD.md b/MODEL_CARD.md index bb0af260..88e5d392 100644 --- a/MODEL_CARD.md +++ b/MODEL_CARD.md @@ -35,7 +35,7 @@ The current release is 3.3.0, tracking the stable public interface re-exported a ## 2. Intended Use and Scope -spotforecast2 supports exploratory and applied forecasting of hourly electricity load and spot prices on ENTSO-E data, end to end, through the `spotforecast2-entsoe` and `spotforecast-demo` console scripts. Its distinctive capabilities are hyperparameter search — Bayesian search over lags, window features, and regressor parameters with Optuna (`bayesian_search_forecaster`), or surrogate-model search with SpotOptim (`spotoptim_search_forecaster`) — interactive inspection through Plotly figures, and global SHAP feature importances via `shap.TreeExplainer`. The `MultiTask` dispatcher and the `run()` entry point orchestrate multi-target pipelines: per-target data preparation, outlier handling, imputation, feature engineering, tuning, and prediction. +spotforecast2 supports exploratory and applied forecasting of hourly electricity load and spot prices on ENTSO-E data, end to end, through the `spotforecast2-entsoe` console script. Its distinctive capabilities are hyperparameter search — Bayesian search over lags, window features, and regressor parameters with Optuna (`bayesian_search_forecaster`), or surrogate-model search with SpotOptim (`spotoptim_search_forecaster`) — interactive inspection through Plotly figures, and global SHAP feature importances via `shap.TreeExplainer`. The `MultiTask` dispatcher and the `run()` entry point orchestrate multi-target pipelines: per-target data preparation, outlier handling, imputation, feature engineering, tuning, and prediction. The intended downstream use is development and model selection: choosing lag windows and regressor hyperparameters here, then promoting the validated configuration into a spotforecast2-safe deployment for the deterministic inference path. 
Tuned `ForecasterRecursiveLGBMFull` and `ForecasterRecursiveXGBFull` models, or just their best parameters, also feed research notebooks and reporting. @@ -60,15 +60,12 @@ importances = model.get_global_shap_feature_importance(frac=0.1) print(importances.head()) ``` -Run a complete multi-target pipeline programmatically via `run()`, or use the bundled console scripts: +Run a complete multi-target pipeline programmatically via `run()`, or use the bundled console script: ```bash spotforecast2-entsoe # ENTSO-E download / train / predict (needs ENTSOE_API_KEY) -spotforecast-demo # baseline vs. covariate vs. custom-LightGBM comparison ``` -Additional N-to-1 pipeline variants are registered as `spotforecast-n2o1`, `spotforecast-n2o1-df`, `spotforecast-n2o1-cov`, and `spotforecast-n2o1-cov-df`. - ## 4. Technical Specification ### Task and model family diff --git a/_quarto.yml b/_quarto.yml index 621cfd48..69b46340 100644 --- a/_quarto.yml +++ b/_quarto.yml @@ -132,18 +132,8 @@ website: file: docs/reference/stats.autocorrelation.qmd - section: "Tasks" contents: - - text: "task_demo" - file: docs/reference/tasks.task_demo.qmd - text: "task_entsoe" file: docs/reference/tasks.task_entsoe.qmd - - text: "task_n_to_1" - file: docs/reference/tasks.task_n_to_1.qmd - - text: "task_n_to_1_dataframe" - file: docs/reference/tasks.task_n_to_1_dataframe.qmd - - text: "task_n_to_1_with_covariates" - file: docs/reference/tasks.task_n_to_1_with_covariates.qmd - - text: "task_n_to_1_with_covariates_and_dataframe" - file: docs/reference/tasks.task_n_to_1_with_covariates_and_dataframe.qmd - section: "Processing Guides" contents: @@ -166,8 +156,6 @@ website: - section: "Tasks Guide" contents: - - text: "Overview" - file: docs/tasks/tasks.qmd - text: "Multitask Tutorial" file: docs/tasks/multitask.qmd - text: "ENTSO-E CLI" @@ -264,14 +252,9 @@ quartodoc: - stats.autocorrelation - title: "Tasks" - desc: "Demonstration and predefined forecasting tasks." + desc: "ENTSO-E forecasting task and CLI." 
contents: - - tasks.task_demo - tasks.task_entsoe - - tasks.task_n_to_1 - - tasks.task_n_to_1_dataframe - - tasks.task_n_to_1_with_covariates - - tasks.task_n_to_1_with_covariates_and_dataframe - title: "Warnings" desc: "Warning-style configuration for spotforecast2." diff --git a/docs/tasks/entsoe.qmd b/docs/tasks/entsoe.qmd index 8de13e38..3e90372a 100644 --- a/docs/tasks/entsoe.qmd +++ b/docs/tasks/entsoe.qmd @@ -470,6 +470,5 @@ uv run pytest tests/test_tasks_entsoe.py -v ## See Also -- [Tasks Overview](../tasks/tasks.qmd) - All available CLI commands - [API Reference](../reference/index.qmd#preprocessing) - Detailed API documentation - [Model Persistence](../processing/model_persistence.qmd) - Saving and loading models diff --git a/docs/tasks/tasks.qmd b/docs/tasks/tasks.qmd deleted file mode 100644 index 7f076cad..00000000 --- a/docs/tasks/tasks.qmd +++ /dev/null @@ -1,141 +0,0 @@ -# Task Scripts - -`spotforecast2` provides command-line task scripts for common forecasting workflows. These scripts are registered as console entry points and can be invoked directly via `uv run` or after package installation. - -## Available Taks - -| Command | Description | -|---------|-------------| -| `spotforecast-demo` | Demonstration task comparing baseline, covariate, and custom models | -| `spotforecast-n2o1` | N-to-1 forecasting with weighted aggregation | -| `spotforecast-n2o1-df` | N-to-1 forecasting using a DataFrame input | -| `spotforecast-n2o1-cov` | N-to-1 forecasting with exogenous covariates | -| `spotforecast-n2o1-cov-df` | N-to-1 forecasting with covariates and DataFrame input | -| `spotforecast2-entsoe` | ENTSO-E energy forecasting pipeline (download, train, predict) | - -## Demo Task - -The `spotforecast-demo` command runs a comparison of three forecasting approaches: - -1. **Baseline**: Standard N-to-1 recursive forecaster -2. **Covariate-enhanced**: Includes weather, holidays, and cyclical features -3. 
**Custom LightGBM**: Optimized hyperparameters - -```bash -# Run with default settings -uv run spotforecast-demo - -# Force retraining and save plot -uv run spotforecast-demo --force_train true --html task_demo_plot.html -``` - ---- - -## N-to-1 Forecasting Tasks - -These tasks implement multi-output time series forecasting with weighted aggregation. - -### Basic N-to-1 - -```bash -uv run spotforecast-n2o1 -``` - -### N-to-1 with DataFrame Input - -```bash -uv run spotforecast-n2o1-df -``` - -### N-to-1 with Covariates - -Includes weather data, holiday indicators, and cyclical time features. - -```bash -uv run spotforecast-n2o1-cov -``` - -### N-to-1 with Covariates and DataFrame - -```bash -uv run spotforecast-n2o1-cov-df -``` - ---- - -## Configuration - -All tasks use sensible defaults but can be customized via: - -- **Environment variables** (e.g., `ENTSOE_API_KEY`) -- **Command-line arguments** (use `--help` for details) -- **Configuration files** stored in `~/spotforecast2_models/` - -```bash -# View available options for any command -uv run spotforecast-demo --help -uv run spotforecast2-entsoe predict --help -``` - ---- - -## Model Persistence - -Trained models are saved to `~/spotforecast2_models//` by default. This allows: - -- **Incremental retraining**: Only retrain when models are stale -- **Reproducibility**: Models are versioned by task and timestamp -- **Auditability**: Full training logs are stored alongside models - -!!! warning "Safety-Critical Consideration" - In production environments, always verify model checksums and training timestamps before deployment. - ---- - -## Testing - -The task scripts are covered by comprehensive safety-critical tests to ensure reliability in production environments. 
- -### Running Tests - -Run all ENTSO-E task tests: - -```bash -uv run pytest tests/test_tasks_entsoe.py -v -``` - -Run specific test categories: - -```bash -# Run only safety-critical tests -uv run pytest tests/test_tasks_entsoe.py::TestSafetyCriticalEntsoe -v - -# Run parameter validation tests -uv run pytest tests/test_tasks_entsoe.py::TestSafetyCriticalEntsoe::test_train_lgbm_model_parameter_correctness -v - -# Run with coverage -uv run pytest tests/test_tasks_entsoe.py --cov=spotforecast2.tasks.task_entsoe --cov-report=html -``` - -### Test Categories - -The test suite includes: - -- **Parameter Validation**: Ensures correct parameter passing between CLI and internal functions -- **Error Handling**: Validates graceful degradation and meaningful error messages -- **Data Validation**: Tests boundary conditions and edge cases -- **Integration Tests**: Verifies end-to-end functionality -- **Regression Tests**: Protects against known historical bugs -- **Model Selection Safety**: Prevents model mismatch in production pipelines - -!!! tip "Continuous Testing" - Run tests before deployment in production environments: - ```bash - uv run pytest tests/ -v --tb=short - ``` - ---- - -## ENTSO-E Task - -The `spotforecast2-entsoe` command provides a unified CLI for the ENTSO-E energy forecasting pipeline. It is described in the separate document [ENTSO-E Task](entsoe.qmd). 
\ No newline at end of file diff --git a/pyproject.toml b/pyproject.toml index 6c4f4c1c..9056b82f 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -75,11 +75,6 @@ dev = [ [project.scripts] spotforecast2-entsoe = "spotforecast2.tasks.task_entsoe:main" -spotforecast-demo = "spotforecast2.tasks.task_demo:main" -spotforecast-n2o1 = "spotforecast2.tasks.task_n_to_1:main" -spotforecast-n2o1-df = "spotforecast2.tasks.task_n_to_1_dataframe:main" -spotforecast-n2o1-cov = "spotforecast2.tasks.task_n_to_1_with_covariates:main" -spotforecast-n2o1-cov-df = "spotforecast2.tasks.task_n_to_1_with_covariates_and_dataframe:main" [tool.uv] # Accept pre-releases ONLY for dependencies whose specifier carries an explicit diff --git a/src/spotforecast2/multitask/base.py b/src/spotforecast2/multitask/base.py index 6f8dabd3..b57e8ee0 100644 --- a/src/spotforecast2/multitask/base.py +++ b/src/spotforecast2/multitask/base.py @@ -21,6 +21,8 @@ from spotforecast2_safe.multitask.base import ( # noqa: F401 (re-exported) BaseTask as SafeBaseTask, +) +from spotforecast2_safe.multitask.base import ( PipelineConfig, agg_predictor, ) diff --git a/src/spotforecast2/tasks/task_demo.py b/src/spotforecast2/tasks/task_demo.py deleted file mode 100644 index 2dc21cdb..00000000 --- a/src/spotforecast2/tasks/task_demo.py +++ /dev/null @@ -1,247 +0,0 @@ -# SPDX-FileCopyrightText: 2026 bartzbeielstein -# SPDX-License-Identifier: AGPL-3.0-or-later - -""" -Task demo: compare baseline, covariate, and custom LightGBM forecasts against ground truth. - -This script executes the baseline N-to-1 task, the covariate-enhanced N-to-1 -pipeline, and a custom LightGBM model with optimized hyperparameters, then loads -the ground truth from ~/spotforecast2_data/data_test.csv using the safety-critical -load_actual_combined function from spotforecast2_safe, and plots Actual vs -Predicted using Plotly. 
- -The plot includes: - - Actual combined values (ground truth) - - Baseline combined prediction (n2n_predict) - - Covariate combined prediction (n2n_predict_with_covariates, default LGBM) - - Custom LightGBM combined prediction (optimized hyperparameters, Europe/Berlin tz) - -Safety-Critical Features: - - Uses load_actual_combined from spotforecast2_safe for validated data loading - - ConfigDemo provides immutable configuration with sensible defaults - - Path objects ensure cross-platform compatibility - - Comprehensive error handling with file existence checks - -Examples: - ```{python} - #| eval: false - # requires ~/spotforecast2_data/data_test.csv and a pre-trained model cache - # Run the demo (baseline, covariates, and custom LightGBM): - # python tasks/task_demo.py - # - # Force training (case-insensitive boolean): - # python tasks/task_demo.py --force_train false - # - # Save the plot as a single HTML file (default: task_demo_plot.html): - # python tasks/task_demo.py --html - # - # Save to a specific path: - # python tasks/task_demo.py --html results/plot.html - ``` -""" - -from __future__ import annotations - -import argparse -import warnings -from pathlib import Path -from typing import Optional - -from lightgbm import LGBMRegressor -from spotforecast2_safe.configurator import ConfigDemo -from spotforecast2_safe.data import load_actual_combined -from spotforecast2_safe.processing.agg_predict import agg_predict -from spotforecast2_safe.processing.n2n_predict import n2n_predict -from spotforecast2_safe.processing.n2n_predict_with_covariates import ( - n2n_predict_with_covariates, -) -from spotforecast2_safe.utils.parse import parse_bool - -from spotforecast2.plots.plotter import plot_actual_vs_predicted - -warnings.simplefilter("ignore") - - -def main( - force_train: bool = True, - html_path: Optional[str] = None, -) -> None: - """Run the demo, compute predictions for three models, and plot actual vs predicted. 
- - Args: - force_train (bool, optional): When True, retrain all models even if a cached - version already exists. Defaults to True. - html_path (str, optional): File path at which to save the interactive plot as a - self-contained HTML file. When None the plot is rendered inline / shown - interactively. Defaults to None. - - Examples: - ```{python} - #| eval: false - # requires ~/spotforecast2_data/data_test.csv and spotforecast2_models cache - from spotforecast2.tasks.task_demo import main - - # Run with force_train=False to reuse cached models when available. - main(force_train=False, html_path=None) - ``` - """ - DATA_PATH = "~/spotforecast2_data/data_test.csv" - FORECAST_HORIZON = 24 - CONTAMINATION = 0.01 - WINDOW_SIZE = 72 - LAGS = 24 - TRAIN_RATIO = 0.8 - VERBOSE = True - SHOW_PROGRESS = True - FORCE_TRAIN = force_train - - WEIGHTS = [ - 1.0, - 1.0, - -1.0, - -1.0, - 1.0, - -1.0, - 1.0, - 1.0, - 1.0, - -1.0, - 1.0, - ] - - print("--- Starting task_demo: baseline, covariates, and custom LightGBM ---") - - # --- Baseline predictions --- - baseline_predictions, _ = n2n_predict( - columns=None, - forecast_horizon=FORECAST_HORIZON, - contamination=CONTAMINATION, - window_size=WINDOW_SIZE, - verbose=VERBOSE, - show_progress=SHOW_PROGRESS, - force_train=FORCE_TRAIN, - model_dir="~/spotforecast2_models/task_demo_baseline", - ) - - baseline_combined = agg_predict(baseline_predictions, weights=WEIGHTS) - - # --- Covariate-enhanced predictions --- - cov_predictions, _, _ = n2n_predict_with_covariates( - forecast_horizon=FORECAST_HORIZON, - contamination=CONTAMINATION, - window_size=WINDOW_SIZE, - lags=LAGS, - train_ratio=TRAIN_RATIO, - verbose=VERBOSE, - show_progress=SHOW_PROGRESS, - force_train=FORCE_TRAIN, - model_dir="~/spotforecast2_models/task_demo_covariates", - ) - - covariates_combined = agg_predict(cov_predictions, weights=WEIGHTS) - - # --- Custom LightGBM predictions (optimized hyperparameters) --- - custom_lgbm = LGBMRegressor( - n_estimators=1059, - 
learning_rate=0.04191323446625026, - num_leaves=212, - min_child_samples=54, - subsample=0.5014650987802548, - colsample_bytree=0.6080926628683118, - random_state=42, - verbose=-1, - ) - custom_lgbm_predictions, _, _ = n2n_predict_with_covariates( - forecast_horizon=FORECAST_HORIZON, - contamination=CONTAMINATION, - window_size=WINDOW_SIZE, - lags=LAGS, - train_ratio=TRAIN_RATIO, - timezone="UTC", - estimator=custom_lgbm, - verbose=VERBOSE, - show_progress=SHOW_PROGRESS, - force_train=FORCE_TRAIN, - model_dir="~/spotforecast2_models/task_demo_custom_lgbm", - ) - custom_lgbm_combined = agg_predict(custom_lgbm_predictions, weights=WEIGHTS) - - # --- Debug output --- - print("\n=== DEBUG INFO ===") - print(f"Baseline combined shape: {baseline_combined.shape}") - print( - f"Baseline index: {baseline_combined.index[0]} to {baseline_combined.index[-1]}" - ) - print(f"Baseline values (first 5): {baseline_combined.head().values}") - print(f"\nCovariates combined shape: {covariates_combined.shape}") - print( - f"Covariates index: {covariates_combined.index[0]} to {covariates_combined.index[-1]}" - ) - print(f"Covariates values (first 5): {covariates_combined.head().values}") - print(f"\nCustom LightGBM combined shape: {custom_lgbm_combined.shape}") - print( - f"Custom LightGBM index: {custom_lgbm_combined.index[0]} to {custom_lgbm_combined.index[-1]}" - ) - print(f"Custom LightGBM values (first 5): {custom_lgbm_combined.head().values}") - print( - f"\nAre indices aligned? {(baseline_combined.index == covariates_combined.index).all()}" - ) - print( - f"Baseline vs Covariates identical? {(baseline_combined.values == covariates_combined.values).all()}" - ) - print( - f"Baseline vs Custom LightGBM identical? {(baseline_combined.values == custom_lgbm_combined.values).all()}" - ) - print( - f"Covariates vs Custom LightGBM identical? 
{(covariates_combined.values == custom_lgbm_combined.values).all()}" - ) - if not (baseline_combined.values == covariates_combined.values).all(): - diff = baseline_combined - covariates_combined - print(f"Baseline - Covariates diff stats:\n{diff.describe()}") - if not (covariates_combined.values == custom_lgbm_combined.values).all(): - diff_lgbm = covariates_combined - custom_lgbm_combined - print(f"Covariates - Custom LightGBM diff stats:\n{diff_lgbm.describe()}") - print("==================\n") - - # --- Ground truth --- - columns = list(baseline_predictions.columns) - # Use load_actual_combined from spotforecast2_safe with minimal config - config = ConfigDemo(data_path=Path(DATA_PATH).expanduser()) - actual_combined = load_actual_combined( - config=config, - columns=columns, - forecast_horizon=FORECAST_HORIZON, - weights=WEIGHTS, - ) - - # Align indices to predictions for clean plotting - actual_combined = actual_combined.reindex(baseline_combined.index) - - # --- Plot --- - plot_actual_vs_predicted( - actual_combined=actual_combined, - baseline_combined=baseline_combined, - covariates_combined=covariates_combined, - custom_lgbm_combined=custom_lgbm_combined, - html_path=html_path, - ) - - -if __name__ == "__main__": - parser = argparse.ArgumentParser(description="Run the spotforecast2 demo task.") - parser.add_argument( - "--force_train", - type=parse_bool, - default=True, - help="Force training (true/false, case-insensitive).", - ) - parser.add_argument( - "--html", - nargs="?", - const="task_demo_plot.html", - default=None, - metavar="PATH", - help="Save the plot as a single self-contained HTML file. 
Default path: task_demo_plot.html", - ) - args = parser.parse_args() - main(force_train=args.force_train, html_path=args.html) diff --git a/src/spotforecast2/tasks/task_n_to_1.py b/src/spotforecast2/tasks/task_n_to_1.py deleted file mode 100644 index d1896455..00000000 --- a/src/spotforecast2/tasks/task_n_to_1.py +++ /dev/null @@ -1,93 +0,0 @@ -# SPDX-FileCopyrightText: 2026 bartzbeielstein -# SPDX-License-Identifier: AGPL-3.0-or-later - -import warnings - -from spotforecast2_safe.processing.agg_predict import agg_predict -from spotforecast2_safe.processing.n2n_predict import n2n_predict - -warnings.simplefilter("ignore") - - -def main(): - """Run the N-to-1 baseline forecasting pipeline with automatic data acquisition. - - Fetches time-series data from the default source (no explicit DataFrame - supplied), applies outlier detection, imputation, and equivalent-date - forecasting via `n2n_predict`, then aggregates the per-target predictions - into a single combined series via `agg_predict`. - - This function is the CLI entry point registered as - `spotforecast-n2o1` in `pyproject.toml`. It requires the target CSV file - to be present in the data home directory or a network connection to fetch - it automatically. - - Examples: - ```{python} - # Demonstrate the n2n_predict + agg_predict pipeline that main() wires - # together, using a small synthetic DataFrame instead of the live data - # source that main() fetches automatically. 
- import numpy as np - import pandas as pd - from spotforecast2_safe.processing.agg_predict import agg_predict - from spotforecast2_safe.processing.n2n_predict import n2n_predict - - rng = np.random.default_rng(0) - dates = pd.date_range("2020-01-01", periods=500, freq="h", tz="UTC") - data = pd.DataFrame( - rng.standard_normal((500, 2)), - index=dates, - columns=["solar", "wind"], - ) - - predictions, forecasters = n2n_predict( - data=data, - columns=["solar", "wind"], - forecast_horizon=3, - contamination=0.01, - window_size=24, - verbose=False, - show_progress=False, - ) - print("Predictions shape:", predictions.shape) - assert predictions.shape == (3, 2) - assert set(predictions.columns) == {"solar", "wind"} - - combined = agg_predict(predictions, weights=[1.0, -1.0]) - print("Combined prediction:", combined.tolist()) - assert len(combined) == 3 - ``` - """ - FORECAST_HORIZON = 24 - CONTAMINATION = 0.01 - WINDOW_SIZE = 72 - VERBOSE = True - SHOW_PROGRESS = True - WEIGHTS = [1.0, 1.0, -1.0, -1.0, 1.0, -1.0, 1.0, 1.0, 1.0, -1.0, 1.0] - - print("--- Starting n_to_1_task using modular functions ---") - - # --- Prediction --- - # Fetch, Preprocess, Train, Evaluate, Predict - predictions, _ = n2n_predict( - columns=None, - forecast_horizon=FORECAST_HORIZON, - contamination=CONTAMINATION, - window_size=WINDOW_SIZE, - verbose=VERBOSE, - show_progress=SHOW_PROGRESS, - ) - - print("\nMulti-output predictions head:") - print(predictions) - - # --- Aggregation --- - print("Calculating combined prediction...") - combined_prediction = agg_predict(predictions, weights=WEIGHTS) - - print("Combined Prediction:") - print(combined_prediction) - - -if __name__ == "__main__": - main() diff --git a/src/spotforecast2/tasks/task_n_to_1_dataframe.py b/src/spotforecast2/tasks/task_n_to_1_dataframe.py deleted file mode 100644 index b8f66d3a..00000000 --- a/src/spotforecast2/tasks/task_n_to_1_dataframe.py +++ /dev/null @@ -1,75 +0,0 @@ -# SPDX-FileCopyrightText: 2026 bartzbeielstein -# 
SPDX-License-Identifier: AGPL-3.0-or-later - -import warnings - -from spotforecast2_safe.data.fetch_data import fetch_data, get_data_home -from spotforecast2_safe.processing.agg_predict import agg_predict -from spotforecast2_safe.processing.n2n_predict import n2n_predict - -warnings.simplefilter("ignore") - - -def main() -> None: - """Execute the complete N-to-1 baseline forecasting pipeline with default parameters. - - This is the entry point when running the script directly. It fetches data - from the user's data home directory, runs the equivalent-date baseline - forecasting pipeline via `n2n_predict`, and aggregates the multi-output - predictions into a single combined series with `agg_predict`. - - The default configuration: - - Reads ``data_in.csv`` from `get_data_home()` (user-specific path) - - Forecasts 24 steps ahead - - Applies 1% contamination for outlier detection - - Uses a 72-step rolling window - - Aggregates with predefined weights - - Returns: - None. Results are printed to stdout. - - Examples: - ```{python} - #| eval: false - # main() reads data_in.csv from the user's data home directory; not reproducible without that file. 
- from spotforecast2.tasks.task_n_to_1_dataframe import main - - main() - ``` - """ - FORECAST_HORIZON = 24 - CONTAMINATION = 0.01 - WINDOW_SIZE = 72 - VERBOSE = True - SHOW_PROGRESS = True - WEIGHTS = [1.0, 1.0, -1.0, -1.0, 1.0, -1.0, 1.0, 1.0, 1.0, -1.0, 1.0] - - df = fetch_data(filename=get_data_home() / "data_in.csv") - - print("--- Starting n_to_1_task using modular functions ---") - - # --- Prediction --- - # Fetch, Preprocess, Train, Evaluate, Predict - predictions, _ = n2n_predict( - data=df, - columns=None, - forecast_horizon=FORECAST_HORIZON, - contamination=CONTAMINATION, - window_size=WINDOW_SIZE, - verbose=VERBOSE, - show_progress=SHOW_PROGRESS, - ) - - print("\nMulti-output predictions head:") - print(predictions) - - # --- Aggregation --- - print("Calculating combined prediction...") - combined_prediction = agg_predict(predictions, weights=WEIGHTS) - - print("Combined Prediction:") - print(combined_prediction) - - -if __name__ == "__main__": - main() diff --git a/src/spotforecast2/tasks/task_n_to_1_with_covariates.py b/src/spotforecast2/tasks/task_n_to_1_with_covariates.py deleted file mode 100644 index 14b43469..00000000 --- a/src/spotforecast2/tasks/task_n_to_1_with_covariates.py +++ /dev/null @@ -1,464 +0,0 @@ -# SPDX-FileCopyrightText: 2026 bartzbeielstein -# SPDX-License-Identifier: AGPL-3.0-or-later - -""" -N-to-1 Forecasting with Exogenous Covariates and Prediction Aggregation. - -This module implements a complete end-to-end pipeline for multi-step time series -forecasting with exogenous variables (weather, holidays, calendar features), -followed by prediction aggregation using configurable weights. - -The pipeline: - 1. Performs multi-output recursive forecasting with exogenous covariates - 2. Aggregates predictions using weighted combinations - 3. Supports flexible model selection (string or object-based) - 4. 
Allows customization via kwargs for all underlying functions - -Key Features: - - Automatic weather, holiday, and calendar feature generation - - Cyclical and polynomial feature engineering - - Configurable recursive forecaster with LGBMRegressor default - - Weighted prediction aggregation - - Comprehensive parameter flexibility via **kwargs - - Detailed logging and progress tracking - -Examples: - ```{python} - import tempfile - - from spotforecast2.tasks.task_n_to_1_with_covariates import n_to_1_with_covariates - - predictions, combined, metrics, features = n_to_1_with_covariates( - forecast_horizon=2, - lags=4, - window_size=8, - verbose=False, - force_train=True, - model_dir=tempfile.mkdtemp(), - on_weather_failure="skip", - ) - print(f"Predictions shape: {predictions.shape}") - print(f"Combined forecast length: {len(combined)}") - assert predictions.shape[0] == 2 - assert len(combined) == 2 - ``` - - ```{python} - #| eval: false - # main() uses hardcoded forecast_horizon=24 and lags=24; shrinking is not possible without code changes. - from spotforecast2.tasks.task_n_to_1_with_covariates import main - - main() - ``` - -Available Parameters: - -Forecasting Parameters: - forecast_horizon (int): Number of steps ahead to forecast. Default: 24. - contamination (float): Outlier detection threshold [0, 1]. Default: 0.01. - window_size (int): Rolling window size for feature engineering. Default: 72. - lags (int): Number of lag features to create. Default: 24. - train_ratio (float): Train-test split ratio [0, 1]. Default: 0.8. - verbose (bool): Enable detailed progress logging. Default: True. - -Location & Time Parameters: - latitude (float): Location latitude for sun features. Default: 51.5136 (Dortmund). - longitude (float): Location longitude for sun features. Default: 7.4653 (Dortmund). - timezone (str): Timezone for data processing. Default: "UTC". - country_code (str): Country code for holidays (ISO 3166-1 alpha-2). Default: "DE". 
- state (str): State/region code for holidays (depends on country). Default: "NW". - -Feature Engineering Parameters: - include_weather_windows (bool): Include rolling weather statistics. Default: False. - include_holiday_features (bool): Include holiday indicator features. Default: False. - include_holiday_adjacency_features (bool): Include Brückentag and before/after-holiday indicators. Default: False. - poly_features_degree (int): Polynomial-interaction degree (1 = off, 2 = pairwise). Default: 1. - max_poly_features (int): Cap on kept polynomial columns (top-K by mutual information). Default: 10. - -Model Parameters: - estimator (Optional[Union[str, object]]): Forecaster estimator. Can be: - - None: Uses default LGBMRegressor(n_estimators=100) - - "ForecasterRecursive": String reference (uses default) - - LGBMRegressor(...): Custom estimator object - Default: None. - -Aggregation Parameters: - weights (Optional[Union[Dict[str, float], List[float], np.ndarray]]): - Weights for prediction aggregation. Can be: - - None: Defaults to uniform weights (1.0 for each column) - - Dict: Column name -> weight mapping - - List/Array: Weights in column order - Default: [1.0, 1.0, -1.0, -1.0, 1.0, -1.0, 1.0, 1.0, 1.0, -1.0, 1.0]. 
-""" - -import warnings -from typing import Any, Dict, List, Optional, Tuple, Union - -import numpy as np -import pandas as pd -from spotforecast2_safe.processing.agg_predict import agg_predict -from spotforecast2_safe.processing.n2n_predict_with_covariates import ( - n2n_predict_with_covariates, -) - -warnings.simplefilter("ignore") - - -def n_to_1_with_covariates( - forecast_horizon: int = 24, - contamination: float = 0.01, - window_size: int = 72, - lags: int = 24, - train_ratio: float = 0.8, - latitude: float = 51.5136, - longitude: float = 7.4653, - timezone: str = "UTC", - country_code: str = "DE", - state: str = "NW", - estimator: Optional[Union[str, object]] = None, - include_weather_windows: bool = False, - include_holiday_features: bool = False, - include_holiday_adjacency_features: bool = False, - poly_features_degree: int = 1, - max_poly_features: int = 10, - weights: Optional[Union[Dict[str, float], List[float], np.ndarray]] = None, - verbose: bool = True, - show_progress: bool = True, - **kwargs: Any, -) -> Tuple[pd.DataFrame, pd.Series, Dict, Dict]: - """Execute N-to-1 forecasting pipeline with exogenous covariates. - - This function performs a complete time series forecasting workflow: - 1. Fetches and preprocesses data - 2. Engineers features (calendar, weather, holidays, cyclical, polynomial) - 3. Trains recursive forecaster on multiple targets - 4. Aggregates predictions using weighted combination - - Args: - forecast_horizon (int): Number of forecast steps ahead. - Determines how many time steps to predict into the future. - Typical values: 24 (1 day), 48 (2 days), 168 (1 week). Default: 24. - - contamination (float): Outlier contamination level for anomaly detection. - Expected proportion of outliers in the training data [0, 1]. - Higher values detect fewer outliers. Default: 0.01 (1%). - - window_size (int): Rolling window size for feature engineering (hours). - Size of the rolling window for computing statistics. - Must be > lags. 
Typical range: 24-168. Default: 72. - - lags (int): Number of lagged features to create. - Creates AR(p) features with p=lags. - Typical values: 12, 24, 48. Default: 24. - - train_ratio (float): Proportion of data for training [0, 1]. - Remaining data (1 - train_ratio) used for validation/testing. - Typical values: 0.7-0.9. Default: 0.8. - - latitude (float): Geographic latitude for solar features. - Used to compute sunrise/sunset times for day/night features. - Default: 51.5136 (Dortmund, Germany). - - longitude (float): Geographic longitude for solar features. - Used to compute sunrise/sunset times for day/night features. - Default: 7.4653 (Dortmund, Germany). - - timezone (str): Timezone for time-based features. - Any timezone recognized by pytz. Default: "UTC". - - country_code (str): ISO 3166-1 alpha-2 country code for holidays. - Examples: "DE" (Germany), "US" (USA), "GB" (UK). Default: "DE". - - state (str): State/region code for holidays. - Country-dependent. For Germany: "BW", "BY", "NW", etc. - Default: "NW" (Nordrhein-Westfalen). - - estimator (Optional[Union[str, object]]): Forecaster model. - Can be: - - None: Uses LGBMRegressor(n_estimators=100, verbose=-1). - - "ForecasterRecursive": References default estimator (same as None). - - LGBMRegressor(...): Custom pre-configured estimator. - - Any sklearn-compatible regressor. - Default: None. - - include_weather_windows (bool): Add rolling weather statistics. - Creates moving averages, min, max of weather features over - multiple windows (1D, 7D). Increases feature count significantly. - Default: False. - - include_holiday_features (bool): Add holiday binary indicators. - Creates features indicating holidays and special dates. - Useful for capturing demand patterns around holidays. - Default: False. - - include_holiday_adjacency_features (bool): Add Brückentag and - before/after-holiday binary indicators (``is_brueckentag``, - ``is_before_holiday``, ``is_after_holiday``). - Default: False. 
- - poly_features_degree (int): Polynomial-interaction degree. 1 (default) - adds no interactions; 2 adds pairwise bilinear terms; 3+ higher order. - max_poly_features (int): Cap on kept polynomial interaction columns; only - the top-K ranked by mutual information with the target survive - (<= 0 disables). Default: 10. - - weights (Optional[Union[Dict[str, float], List[float], np.ndarray]]): - Weights for combining multi-output predictions. - Can be: - - None: Default weights [1.0, 1.0, -1.0, -1.0, 1.0, -1.0, 1.0, 1.0, 1.0, -1.0, 1.0] - - Dict: {"col_name": weight, ...} for specific columns - - List: [w1, w2, ...] in column order - - np.ndarray: Same as list - Default: None (uses default weights). - - verbose (bool): Enable progress logging. - Prints intermediate results and timestamps. - Default: True. - - show_progress (bool): Show a progress bar for major pipeline steps. - Default: True. - - **kwargs (Any): Additional parameters for underlying functions. - These are passed to n2n_predict_with_covariates(). - Examples: - - freq: Frequency for data resampling. Default: "h" (hourly). - - columns: Specific columns to forecast. Default: None (all). - Any parameter accepted by n2n_predict_with_covariates(). - - Returns: - Tuple[pd.DataFrame, pd.Series, Dict, Dict]: A tuple containing: - - predictions (pd.DataFrame): Multi-output forecasts from recursive model. - Each column represents a target variable. - Index is datetime matching the forecast period. - - combined_prediction (pd.Series): Aggregated forecast from weighted combination. - Single column combining all output predictions. - Index is datetime matching the forecast period. - - model_metrics (Dict): Performance metrics from recursive forecaster. - Keys may include: 'mae', 'rmse', 'mape', etc. - - feature_info (Dict): Information about engineered features. - Contains feature counts, types, and engineering details. - - Raises: - ValueError: If forecast_horizon <= 0 or invalid parameter combinations. 
- FileNotFoundError: If data source files cannot be accessed. - RuntimeError: If model training fails or data processing errors occur. - - Examples: - ```{python} - import tempfile - - from spotforecast2.tasks.task_n_to_1_with_covariates import ( - n_to_1_with_covariates, - ) - - predictions, combined, metrics, features = n_to_1_with_covariates( - forecast_horizon=2, - lags=4, - window_size=8, - verbose=False, - force_train=True, - model_dir=tempfile.mkdtemp(), - on_weather_failure="skip", - ) - print(f"Predictions shape: {predictions.shape}") - print("Combined forecast head:") - print(combined.head()) - assert predictions.shape[0] == 2 - assert len(combined) == 2 - ``` - - ```{python} - import tempfile - - from lightgbm import LGBMRegressor - - from spotforecast2.tasks.task_n_to_1_with_covariates import ( - n_to_1_with_covariates, - ) - - custom_estimator = LGBMRegressor(n_estimators=50, learning_rate=0.05, max_depth=4) - - predictions, combined, metrics, features = n_to_1_with_covariates( - forecast_horizon=2, - lags=4, - window_size=8, - estimator=custom_estimator, - weights=None, # None falls back to the hardcoded 11-element default weights - verbose=False, - force_train=True, - model_dir=tempfile.mkdtemp(), - on_weather_failure="skip", - ) - print(f"Predictions shape: {predictions.shape}") - print(f"Combined prediction type: {type(combined).__name__}") - assert predictions.shape[0] == 2 - assert len(combined) == 2 - ``` - """ - # Default weights if not provided - if weights is None: - weights = [1.0, 1.0, -1.0, -1.0, 1.0, -1.0, 1.0, 1.0, 1.0, -1.0, 1.0] - - if verbose: - print("=" * 80) - print("N-to-1 Forecasting with Exogenous Covariates") - print("=" * 80) - print("\nConfiguration:") - print(f" Forecast Horizon: {forecast_horizon} steps") - print(f" Contamination Level: {contamination}") - print(f" Window Size: {window_size}") - print(f" Lags: {lags}") - print(f" Train Ratio: {train_ratio}") - print(" Location: Lat=***, Lon=***") 
- print(" Timezone: ***") - print(" Country Code: ***, State: ***") - print(f" Estimator: {type(estimator).__name__ if estimator else 'Default'}") - print(" Feature Engineering:") - print(f" - Weather Windows: {include_weather_windows}") - print(f" - Holiday Features: {include_holiday_features}") - print(f" - Holiday Adjacency Features: {include_holiday_adjacency_features}") - print(f" - Polynomial Degree: {poly_features_degree}") - print(f" - Max Polynomial Features: {max_poly_features}") - print(f" Weights Type: {type(weights).__name__}") - print(f"\n{'=' * 80}\n") - - # --- Step 1: Multi-Output Recursive Forecasting with Covariates --- - if verbose: - print("Step 1: Executing multi-output recursive forecasting...") - - # Prepare kwargs for n2n_predict_with_covariates - forecast_kwargs = { - "forecast_horizon": forecast_horizon, - "contamination": contamination, - "window_size": window_size, - "lags": lags, - "train_ratio": train_ratio, - "latitude": latitude, - "longitude": longitude, - "timezone": timezone, - "country_code": country_code, - "state": state, - "estimator": estimator, - "include_weather_windows": include_weather_windows, - "include_holiday_features": include_holiday_features, - "include_holiday_adjacency_features": include_holiday_adjacency_features, - "poly_features_degree": poly_features_degree, - "max_poly_features": max_poly_features, - "verbose": verbose, - "show_progress": show_progress, - } - - # Add any additional kwargs - forecast_kwargs.update(kwargs) - - # Execute recursive forecasting - predictions, model_metrics, feature_info = n2n_predict_with_covariates( - **forecast_kwargs - ) - - if verbose: - print(f"\nMulti-output predictions shape: {predictions.shape}") - print(f"Output columns: {list(predictions.columns)}") - print(f"Date range: {predictions.index[0]} to {predictions.index[-1]}") - - # --- Step 2: Prediction Aggregation --- - if verbose: - print("\nStep 2: Aggregating predictions using weighted combination...") - - 
combined_prediction = agg_predict(predictions, weights=weights) - - if verbose: - print(f"Combined prediction shape: {combined_prediction.shape}") - print("\nAggregation Summary:") - print(" Combined Prediction Head:") - print(combined_prediction.head()) - print("\n Combined Prediction Statistics:") - print(f" Mean: {combined_prediction.mean():.4f}") - print(f" Std: {combined_prediction.std():.4f}") - print(f" Min: {combined_prediction.min():.4f}") - print(f" Max: {combined_prediction.max():.4f}") - print(f"\n{'=' * 80}\n") - - return predictions, combined_prediction, model_metrics, feature_info - - -def main() -> None: - """Execute the complete N-to-1 forecasting pipeline with default parameters. - - This is the entry point when running the script directly. It executes the full - forecasting pipeline with default settings and prints the per-target and - combined predictions. - - The default configuration: - - Forecasts 24 steps ahead - - Uses Dortmund, Germany coordinates - - Applies default contamination and window parameters - - Aggregates with predefined weights - - Runs the pipeline with verbose=False, so only the final predictions are printed - - Returns: - None. Results are printed to stdout. - - Examples: - ```{python} - #| eval: false - # main() uses hardcoded forecast_horizon=24 and lags=24; these cannot be shrunk without code changes. 
- from spotforecast2.tasks.task_n_to_1_with_covariates import main - - main() - ``` - """ - FORECAST_HORIZON = 24 - CONTAMINATION = 0.01 - WINDOW_SIZE = 72 - LAGS = 24 - TRAIN_RATIO = 0.8 - LATITUDE = 51.5136 - LONGITUDE = 7.4653 - TIMEZONE = "UTC" - COUNTRY_CODE = "DE" - STATE = "NW" - INCLUDE_WEATHER_WINDOWS = False - INCLUDE_HOLIDAY_FEATURES = False - INCLUDE_HOLIDAY_ADJACENCY_FEATURES = False - POLY_FEATURES_DEGREE = 1 - MAX_POLY_FEATURES = 10 - VERBOSE = False - WEIGHTS = [1.0, 1.0, -1.0, -1.0, 1.0, -1.0, 1.0, 1.0, 1.0, -1.0, 1.0] - - print("--- Starting n_to_1_with_covariates using modular functions ---") - - # Execute the forecasting pipeline - predictions, combined_prediction, model_metrics, feature_info = ( - n_to_1_with_covariates( - forecast_horizon=FORECAST_HORIZON, - contamination=CONTAMINATION, - window_size=WINDOW_SIZE, - lags=LAGS, - train_ratio=TRAIN_RATIO, - latitude=LATITUDE, - longitude=LONGITUDE, - timezone=TIMEZONE, - country_code=COUNTRY_CODE, - state=STATE, - estimator=None, - include_weather_windows=INCLUDE_WEATHER_WINDOWS, - include_holiday_features=INCLUDE_HOLIDAY_FEATURES, - include_holiday_adjacency_features=INCLUDE_HOLIDAY_ADJACENCY_FEATURES, - poly_features_degree=POLY_FEATURES_DEGREE, - max_poly_features=MAX_POLY_FEATURES, - weights=WEIGHTS, - verbose=VERBOSE, - ) - ) - - # Print results (similar to n_to_1_task.py) - print("\nMulti-output predictions head:") - print(predictions) - - print("Calculating combined prediction...") - print("Combined Prediction:") - print(combined_prediction) - - -if __name__ == "__main__": - main() diff --git a/src/spotforecast2/tasks/task_n_to_1_with_covariates_and_dataframe.py b/src/spotforecast2/tasks/task_n_to_1_with_covariates_and_dataframe.py deleted file mode 100644 index 1a0a2208..00000000 --- a/src/spotforecast2/tasks/task_n_to_1_with_covariates_and_dataframe.py +++ /dev/null @@ -1,490 +0,0 @@ -# SPDX-FileCopyrightText: 2026 bartzbeielstein -# SPDX-License-Identifier: AGPL-3.0-or-later - -""" 
-N-to-1 Forecasting with Exogenous Covariates and Prediction Aggregation. - -This module implements a complete end-to-end pipeline for multi-step time series -forecasting with exogenous variables (weather, holidays, calendar features), -followed by prediction aggregation using configurable weights. - -The pipeline: - 1. Performs multi-output recursive forecasting with exogenous covariates - 2. Aggregates predictions using weighted combinations - 3. Supports flexible model selection (string or object-based) - 4. Allows customization via kwargs for all underlying functions - -Key Features: - - Automatic weather, holiday, and calendar feature generation - - Cyclical and polynomial feature engineering - - Configurable recursive forecaster with LGBMRegressor default - - Weighted prediction aggregation - - Comprehensive parameter flexibility via **kwargs - - Detailed logging and progress tracking - -Examples: - ```{python} - import tempfile - - import numpy as np - import pandas as pd - from lightgbm import LGBMRegressor - - from spotforecast2.tasks.task_n_to_1_with_covariates_and_dataframe import ( - n_to_1_with_covariates, - ) - - rng = np.random.default_rng(0) - n = 200 - idx = pd.date_range("2023-01-01", periods=n, freq="h", tz="UTC") - data = pd.DataFrame( - { - "A": rng.normal(50, 5, n), - "B": rng.normal(30, 3, n), - }, - index=idx, - ) - - estimator = LGBMRegressor(n_estimators=50, verbose=-1) - predictions, combined, metrics, feature_info = n_to_1_with_covariates( - data=data, - forecast_horizon=2, - lags=3, - window_size=6, - train_ratio=0.8, - estimator=estimator, - weights=[1.0, -1.0], - verbose=False, - show_progress=False, - on_weather_failure="skip", - force_train=True, - model_dir=tempfile.mkdtemp(), - ) - print(f"Predictions shape: {predictions.shape}") - print(f"Combined forecast length: {len(combined)}") - assert predictions.shape[0] == 2 - assert len(combined) == 2 - ``` - -Available Parameters: - -Forecasting Parameters: - forecast_horizon (int): 
Number of steps ahead to forecast. Default: 24. - contamination (float): Outlier detection threshold [0, 1]. Default: 0.01. - window_size (int): Rolling window size for feature engineering. Default: 72. - lags (int): Number of lag features to create. Default: 24. - train_ratio (float): Train-test split ratio [0, 1]. Default: 0.8. - verbose (bool): Enable detailed progress logging. Default: True. - -Location & Time Parameters: - latitude (float): Location latitude for sun features. Default: 51.5136 (Dortmund). - longitude (float): Location longitude for sun features. Default: 7.4653 (Dortmund). - timezone (str): Timezone for data processing. Default: "UTC". - country_code (str): Country code for holidays (ISO 3166-1 alpha-2). Default: "DE". - state (str): State/region code for holidays (depends on country). Default: "NW". - -Feature Engineering Parameters: - include_weather_windows (bool): Include rolling weather statistics. Default: False. - include_holiday_features (bool): Include holiday indicator features. Default: False. - include_holiday_adjacency_features (bool): Include Brückentag and before/after-holiday indicators. Default: False. - poly_features_degree (int): Polynomial-interaction degree (1 = off, 2 = pairwise). Default: 1. - max_poly_features (int): Cap on kept polynomial columns (top-K by mutual information). Default: 10. - -Model Parameters: - estimator (Optional[Union[str, object]]): Forecaster estimator. Can be: - - None: Uses default LGBMRegressor(n_estimators=100) - - "ForecasterRecursive": String reference (uses default) - - LGBMRegressor(...): Custom estimator object - Default: None. - -Aggregation Parameters: - weights (Optional[Union[Dict[str, float], List[float], np.ndarray]]): - Weights for prediction aggregation. Can be: - - None: Falls back to the 11-element default weight list given below - - Dict: Column name -> weight mapping - - List/Array: Weights in column order - Default: None, which resolves to [1.0, 1.0, -1.0, -1.0, 1.0, -1.0, 1.0, 1.0, 1.0, -1.0, 1.0]. 
-""" - -import warnings -from typing import Any, Dict, List, Optional, Tuple, Union - -import numpy as np -import pandas as pd -from spotforecast2_safe.data.fetch_data import fetch_data, get_data_home -from spotforecast2_safe.processing.agg_predict import agg_predict -from spotforecast2_safe.processing.n2n_predict_with_covariates import ( - n2n_predict_with_covariates, -) - -warnings.simplefilter("ignore") - - -def n_to_1_with_covariates( - data: Optional[pd.DataFrame] = None, - forecast_horizon: int = 24, - contamination: float = 0.01, - window_size: int = 72, - lags: int = 24, - train_ratio: float = 0.8, - latitude: float = 51.5136, - longitude: float = 7.4653, - timezone: str = "UTC", - country_code: str = "DE", - state: str = "NW", - estimator: Optional[Union[str, object]] = None, - include_weather_windows: bool = False, - include_holiday_features: bool = False, - include_holiday_adjacency_features: bool = False, - poly_features_degree: int = 1, - max_poly_features: int = 10, - weights: Optional[Union[Dict[str, float], List[float], np.ndarray]] = None, - verbose: bool = True, - show_progress: bool = True, - **kwargs: Any, -) -> Tuple[pd.DataFrame, pd.Series, Dict, Dict]: - """Execute N-to-1 forecasting pipeline with exogenous covariates. - - This function performs a complete time series forecasting workflow: - 1. Fetches and preprocesses data - 2. Engineers features (calendar, weather, holidays, cyclical, polynomial) - 3. Trains recursive forecaster on multiple targets - 4. Aggregates predictions using weighted combination - - Args: - data (Optional[pd.DataFrame]): Optional DataFrame with target time series data. - If None, fetches data automatically. Default: None. - - forecast_horizon (int): Number of forecast steps ahead. - Determines how many time steps to predict into the future. - Typical values: 24 (1 day), 48 (2 days), 168 (1 week). Default: 24. - - contamination (float): Outlier contamination level for anomaly detection. 
- Expected proportion of outliers in the training data [0, 1]. - Higher values flag more points as outliers. Default: 0.01 (1%). - - window_size (int): Rolling window size for feature engineering (hours). - Size of the rolling window for computing statistics. - Must be > lags. Typical range: 24-168. Default: 72. - - lags (int): Number of lagged features to create. - Creates AR(p) features with p=lags. - Typical values: 12, 24, 48. Default: 24. - - train_ratio (float): Proportion of data for training [0, 1]. - Remaining data (1 - train_ratio) used for validation/testing. - Typical values: 0.7-0.9. Default: 0.8. - - latitude (float): Geographic latitude for solar features. - Used to compute sunrise/sunset times for day/night features. - Default: 51.5136 (Dortmund, Germany). - - longitude (float): Geographic longitude for solar features. - Used to compute sunrise/sunset times for day/night features. - Default: 7.4653 (Dortmund, Germany). - - timezone (str): Timezone for time-based features. - Any timezone recognized by pytz. Default: "UTC". - - country_code (str): ISO 3166-1 alpha-2 country code for holidays. - Examples: "DE" (Germany), "US" (USA), "GB" (UK). Default: "DE". - - state (str): State/region code for holidays. - Country-dependent. For Germany: "BW", "BY", "NW", etc. - Default: "NW" (Nordrhein-Westfalen). - - estimator (Optional[Union[str, object]]): Forecaster model. - Can be: - - None: Uses LGBMRegressor(n_estimators=100, verbose=-1). - - "ForecasterRecursive": References default estimator (same as None). - - LGBMRegressor(...): Custom pre-configured estimator. - - Any sklearn-compatible regressor. - Default: None. - - include_weather_windows (bool): Add rolling weather statistics. - Creates moving averages, min, max of weather features over - multiple windows (1D, 7D). Increases feature count significantly. - Default: False. - - include_holiday_features (bool): Add holiday binary indicators. - Creates features indicating holidays and special dates. 
- Useful for capturing demand patterns around holidays. - Default: False. - - include_holiday_adjacency_features (bool): Add Brückentag and - before/after-holiday binary indicators (``is_brueckentag``, - ``is_before_holiday``, ``is_after_holiday``). - Default: False. - - poly_features_degree (int): Polynomial-interaction degree. 1 (default) - adds no interactions; 2 adds pairwise bilinear terms; 3+ higher order. - max_poly_features (int): Cap on kept polynomial interaction columns; only - the top-K ranked by mutual information with the target survive - (<= 0 disables). Default: 10. - - weights (Optional[Union[Dict[str, float], List[float], np.ndarray]]): - Weights for combining multi-output predictions. - Can be: - - None: Default weights [1.0, 1.0, -1.0, -1.0, 1.0, -1.0, 1.0, 1.0, 1.0, -1.0, 1.0] - - Dict: {"col_name": weight, ...} for specific columns - - List: [w1, w2, ...] in column order - - np.ndarray: Same as list - Default: None (uses default weights). - - verbose (bool): Enable progress logging. - Prints intermediate results and timestamps. - Default: True. - - show_progress (bool): Show a progress bar for major pipeline steps. - Default: True. - - **kwargs (Any): Additional parameters for underlying functions. - These are passed to n2n_predict_with_covariates(). - Examples: - - freq: Frequency for data resampling. Default: "h" (hourly). - - columns: Specific columns to forecast. Default: None (all). - Any parameter accepted by n2n_predict_with_covariates(). - - Returns: - Tuple[pd.DataFrame, pd.Series, Dict, Dict]: A tuple containing: - - predictions (pd.DataFrame): Multi-output forecasts from recursive model. - Each column represents a target variable. - Index is datetime matching the forecast period. - - combined_prediction (pd.Series): Aggregated forecast from weighted combination. - Single column combining all output predictions. - Index is datetime matching the forecast period. - - model_metrics (Dict): Performance metrics from recursive forecaster. 
- Keys may include: 'mae', 'rmse', 'mape', etc. - - feature_info (Dict): Information about engineered features. - Contains feature counts, types, and engineering details. - - Raises: - ValueError: If forecast_horizon <= 0 or invalid parameter combinations. - FileNotFoundError: If data source files cannot be accessed. - RuntimeError: If model training fails or data processing errors occur. - - Examples: - ```{python} - import tempfile - - import numpy as np - import pandas as pd - from lightgbm import LGBMRegressor - - from spotforecast2.tasks.task_n_to_1_with_covariates_and_dataframe import ( - n_to_1_with_covariates, - ) - - rng = np.random.default_rng(42) - n = 200 - idx = pd.date_range("2023-01-01", periods=n, freq="h", tz="UTC") - data = pd.DataFrame( - { - "target_a": rng.normal(100, 10, n), - "target_b": rng.normal(60, 6, n), - }, - index=idx, - ) - - custom_estimator = LGBMRegressor( - n_estimators=50, - learning_rate=0.05, - max_depth=4, - verbose=-1, - ) - custom_weights = [1.0, -1.0] - - predictions, combined, metrics, feature_info = n_to_1_with_covariates( - data=data, - forecast_horizon=2, - lags=3, - window_size=6, - train_ratio=0.8, - estimator=custom_estimator, - weights=custom_weights, - verbose=False, - show_progress=False, - on_weather_failure="skip", - force_train=True, - model_dir=tempfile.mkdtemp(), - ) - print(f"Predictions shape: {predictions.shape}") - print("Combined forecast head:") - print(combined.head()) - print(f"Feature info keys: {sorted(feature_info.keys())[:4]}") - assert predictions.shape[0] == 2 - assert len(combined) == 2 - assert isinstance(metrics, dict) - ``` - """ - # Default weights if not provided - if weights is None: - weights = [1.0, 1.0, -1.0, -1.0, 1.0, -1.0, 1.0, 1.0, 1.0, -1.0, 1.0] - - if verbose: - print("=" * 80) - print("N-to-1 Forecasting with Exogenous Covariates") - print("=" * 80) - print("\nConfiguration:") - print(f" Forecast Horizon: {forecast_horizon} steps") - print(f" Contamination Level: 
{contamination}") - print(f" Window Size: {window_size}") - print(f" Lags: {lags}") - print(f" Train Ratio: {train_ratio}") - print(" Location: Lat=***, Lon=***") - print(" Timezone: ***") - print(" Country Code: ***, State: ***") - print(f" Estimator: {type(estimator).__name__ if estimator else 'Default'}") - print(" Feature Engineering:") - print(f" - Weather Windows: {include_weather_windows}") - print(f" - Holiday Features: {include_holiday_features}") - print(f" - Holiday Adjacency Features: {include_holiday_adjacency_features}") - print(f" - Polynomial Degree: {poly_features_degree}") - print(f" - Max Polynomial Features: {max_poly_features}") - print(f" Weights Type: {type(weights).__name__}") - print(f"\n{'=' * 80}\n") - - # --- Step 1: Multi-Output Recursive Forecasting with Covariates --- - if verbose: - print("Step 1: Executing multi-output recursive forecasting...") - - # Prepare kwargs for n2n_predict_with_covariates - forecast_kwargs = { - "data": data, - "forecast_horizon": forecast_horizon, - "contamination": contamination, - "window_size": window_size, - "lags": lags, - "train_ratio": train_ratio, - "latitude": latitude, - "longitude": longitude, - "timezone": timezone, - "country_code": country_code, - "state": state, - "estimator": estimator, - "include_weather_windows": include_weather_windows, - "include_holiday_features": include_holiday_features, - "include_holiday_adjacency_features": include_holiday_adjacency_features, - "poly_features_degree": poly_features_degree, - "max_poly_features": max_poly_features, - "verbose": verbose, - "show_progress": show_progress, - } - - # Add any additional kwargs - forecast_kwargs.update(kwargs) - - # Execute recursive forecasting - predictions, model_metrics, feature_info = n2n_predict_with_covariates( - **forecast_kwargs - ) - - if verbose: - print(f"\nMulti-output predictions shape: {predictions.shape}") - print(f"Output columns: {list(predictions.columns)}") - print(f"Date range: {predictions.index[0]} 
to {predictions.index[-1]}") - - # --- Step 2: Prediction Aggregation --- - if verbose: - print("\nStep 2: Aggregating predictions using weighted combination...") - - combined_prediction = agg_predict(predictions, weights=weights) - - if verbose: - print(f"Combined prediction shape: {combined_prediction.shape}") - print("\nAggregation Summary:") - print(" Combined Prediction Head:") - print(combined_prediction.head()) - print("\n Combined Prediction Statistics:") - print(f" Mean: {combined_prediction.mean():.4f}") - print(f" Std: {combined_prediction.std():.4f}") - print(f" Min: {combined_prediction.min():.4f}") - print(f" Max: {combined_prediction.max():.4f}") - print(f"\n{'=' * 80}\n") - - return predictions, combined_prediction, model_metrics, feature_info - - -def main() -> None: - """Execute the complete N-to-1 forecasting pipeline with default parameters. - - This is the entry point when running the script directly. It executes the full - forecasting pipeline with default settings and prints the per-target and - combined predictions. - - The default configuration: - - Forecasts 24 steps ahead - - Uses Dortmund, Germany coordinates - - Applies default contamination and window parameters - - Aggregates with predefined weights - - Runs the pipeline with verbose=False, so only the final predictions are printed - - Returns: - None. Results are printed to stdout. - - Examples: - ```{python} - #| eval: false - # main() reads data_in.csv from the user's data home directory; not reproducible without that file. 
- from spotforecast2.tasks.task_n_to_1_with_covariates_and_dataframe import main - - main() - ``` - """ - data = fetch_data(filename=get_data_home() / "data_in.csv") - - FORECAST_HORIZON = 24 - CONTAMINATION = 0.01 - WINDOW_SIZE = 72 - LAGS = 24 - TRAIN_RATIO = 0.8 - LATITUDE = 51.5136 - LONGITUDE = 7.4653 - TIMEZONE = "UTC" - COUNTRY_CODE = "DE" - STATE = "NW" - INCLUDE_WEATHER_WINDOWS = False - INCLUDE_HOLIDAY_FEATURES = False - INCLUDE_HOLIDAY_ADJACENCY_FEATURES = False - POLY_FEATURES_DEGREE = 1 - MAX_POLY_FEATURES = 10 - VERBOSE = False - WEIGHTS = [1.0, 1.0, -1.0, -1.0, 1.0, -1.0, 1.0, 1.0, 1.0, -1.0, 1.0] - - print("--- Starting n_to_1_with_covariates using modular functions ---") - - # Execute the forecasting pipeline - predictions, combined_prediction, model_metrics, feature_info = ( - n_to_1_with_covariates( - data=data, - forecast_horizon=FORECAST_HORIZON, - contamination=CONTAMINATION, - window_size=WINDOW_SIZE, - lags=LAGS, - train_ratio=TRAIN_RATIO, - latitude=LATITUDE, - longitude=LONGITUDE, - timezone=TIMEZONE, - country_code=COUNTRY_CODE, - state=STATE, - estimator=None, - include_weather_windows=INCLUDE_WEATHER_WINDOWS, - include_holiday_features=INCLUDE_HOLIDAY_FEATURES, - include_holiday_adjacency_features=INCLUDE_HOLIDAY_ADJACENCY_FEATURES, - poly_features_degree=POLY_FEATURES_DEGREE, - max_poly_features=MAX_POLY_FEATURES, - weights=WEIGHTS, - verbose=VERBOSE, - ) - ) - - # Print results (similar to n_to_1_task.py) - print("\nMulti-output predictions head:") - print(predictions) - - print("Calculating combined prediction...") - print("Combined Prediction:") - print(combined_prediction) - - -if __name__ == "__main__": - main() diff --git a/tests/test_task_demo_integration.py b/tests/test_task_demo_integration.py deleted file mode 100644 index 9575e2c4..00000000 --- a/tests/test_task_demo_integration.py +++ /dev/null @@ -1,259 +0,0 @@ -# SPDX-FileCopyrightText: 2026 bartzbeielstein -# SPDX-License-Identifier: AGPL-3.0-or-later - -""" -Test suite 
for task_demo.py integration with load_actual_combined. - -This module validates that task_demo.py correctly uses load_actual_combined -from spotforecast2_safe for loading ground truth data. -""" - -import tempfile -from pathlib import Path - -import pandas as pd -import pytest -from spotforecast2_safe.configurator import ConfigDemo -from spotforecast2_safe.data import load_actual_combined - - -class TestLoadActualCombinedIntegration: - """Test integration of load_actual_combined in task_demo workflow.""" - - def test_load_actual_combined_with_demo_config(self): - """Test loading actual data using ConfigDemo as in task_demo.py.""" - # Create temporary CSV file - with tempfile.NamedTemporaryFile(mode="w", suffix=".csv", delete=False) as f: - f.write("timestamp,col1,col2,col3\n") - for i in range(30): - f.write(f"2020-01-01 {i:02d}:00:00,{i},{i*2},{i*3}\n") - temp_path = Path(f.name) - - try: - # Simulate task_demo.py usage - DATA_PATH = str(temp_path) - FORECAST_HORIZON = 24 - WEIGHTS = [1.0, 1.0, 1.0] - columns = ["col1", "col2", "col3"] - - # Use load_actual_combined as in task_demo.py - config = ConfigDemo(data_path=Path(DATA_PATH).expanduser()) - actual_combined = load_actual_combined( - config=config, - columns=columns, - forecast_horizon=FORECAST_HORIZON, - weights=WEIGHTS, - ) - - # Validate results - assert isinstance(actual_combined, pd.Series) - assert len(actual_combined) == FORECAST_HORIZON - assert actual_combined.index.name == "timestamp" - - finally: - temp_path.unlink() - - def test_load_actual_combined_with_tilde_path(self): - """Test that tilde paths are properly expanded.""" - # Create temp directory structure - with tempfile.TemporaryDirectory() as tmpdir: - data_dir = Path(tmpdir) / "spotforecast2_data" - data_dir.mkdir() - data_file = data_dir / "data_test.csv" - - # Write test data - with open(data_file, "w") as f: - f.write("timestamp,A,B\n") - for i in range(10): - f.write(f"2020-01-01 {i:02d}:00:00,{i},{i*2}\n") - - # Use the file - config 
= ConfigDemo(data_path=data_file) - result = load_actual_combined( - config=config, - columns=["A", "B"], - forecast_horizon=5, - weights=[1.0, 1.0], - ) - - assert len(result) == 5 - assert isinstance(result, pd.Series) - - def test_load_actual_combined_override_parameters(self): - """Test overriding forecast_horizon and weights as in task_demo.py.""" - with tempfile.NamedTemporaryFile(mode="w", suffix=".csv", delete=False) as f: - f.write("timestamp,X,Y,Z\n") - for i in range(50): - f.write(f"2020-01-01 {i:02d}:00:00,{i},{i*2},{i*3}\n") - temp_path = Path(f.name) - - try: - # Config has default horizon of 24 - config = ConfigDemo(data_path=temp_path, forecast_horizon=24) - - # Override with custom horizon - result = load_actual_combined( - config=config, - columns=["X", "Y", "Z"], - forecast_horizon=10, # Override - weights=[1.0, -1.0, 1.0], # Custom weights - ) - - assert len(result) == 10 # Should use overridden value, not 24 - - finally: - temp_path.unlink() - - def test_load_actual_combined_with_standard_weights(self): - """Test using the standard 11-column weight configuration from task_demo.py.""" - with tempfile.NamedTemporaryFile(mode="w", suffix=".csv", delete=False) as f: - # Create 11 columns as in task_demo - columns = [f"col{i}" for i in range(11)] - f.write("timestamp," + ",".join(columns) + "\n") - for i in range(30): - values = ",".join([str(i * j) for j in range(11)]) - f.write(f"2020-01-01 {i:02d}:00:00,{values}\n") - temp_path = Path(f.name) - - try: - # Standard weights from task_demo.py - WEIGHTS = [1.0, 1.0, -1.0, -1.0, 1.0, -1.0, 1.0, 1.0, 1.0, -1.0, 1.0] - - config = ConfigDemo(data_path=temp_path) - result = load_actual_combined( - config=config, - columns=columns, - forecast_horizon=24, - weights=WEIGHTS, - ) - - assert len(result) == 24 - assert isinstance(result, pd.Series) - assert len(WEIGHTS) == 11 - - finally: - temp_path.unlink() - - -class TestTaskDemoWorkflow: - """Test the complete workflow as used in task_demo.py.""" - - def 
test_workflow_simulation(self): - """Simulate the task_demo.py workflow with load_actual_combined.""" - # Create mock prediction data - with tempfile.NamedTemporaryFile(mode="w", suffix=".csv", delete=False) as f: - columns = ["A", "B", "C"] - f.write("timestamp," + ",".join(columns) + "\n") - for i in range(30): - values = ",".join([str(100 + i), str(50 + i), str(25 + i)]) - f.write(f"2020-01-01 {i:02d}:00:00,{values}\n") - temp_path = Path(f.name) - - try: - # Simulate task_demo.py constants - DATA_PATH = str(temp_path) - FORECAST_HORIZON = 24 - WEIGHTS = [1.0, -0.5, -0.5] - - # Simulate prediction columns - baseline_predictions_columns = columns - - # Load actual combined as in task_demo.py - config = ConfigDemo(data_path=Path(DATA_PATH).expanduser()) - actual_combined = load_actual_combined( - config=config, - columns=list(baseline_predictions_columns), - forecast_horizon=FORECAST_HORIZON, - weights=WEIGHTS, - ) - - # Validate - assert isinstance(actual_combined, pd.Series) - assert len(actual_combined) == FORECAST_HORIZON - - # Simulate reindexing (as done in task_demo.py) - mock_prediction_index = pd.date_range( - "2020-01-01", periods=FORECAST_HORIZON, freq="h" - ) - actual_combined_reindexed = actual_combined.reindex(mock_prediction_index) - - assert len(actual_combined_reindexed) == FORECAST_HORIZON - - finally: - temp_path.unlink() - - -class TestErrorHandling: - """Test error handling in load_actual_combined integration.""" - - def test_missing_file_error(self): - """Test error when data file does not exist.""" - config = ConfigDemo(data_path=Path("/nonexistent/path/data.csv")) - - with pytest.raises(FileNotFoundError): - load_actual_combined( - config=config, - columns=["A", "B"], - forecast_horizon=24, - weights=[1.0, 1.0], - ) - - def test_missing_columns_error(self): - """Test error when requested columns are missing.""" - with tempfile.NamedTemporaryFile(mode="w", suffix=".csv", delete=False) as f: - f.write("timestamp,A,B\n") - f.write("2020-01-01 
00:00:00,1,2\n")
-            temp_path = Path(f.name)
-
-        try:
-            config = ConfigDemo(data_path=temp_path)
-
-            with pytest.raises(ValueError, match="Missing columns in test data"):
-                load_actual_combined(
-                    config=config,
-                    columns=["A", "B", "C"],  # C doesn't exist
-                    forecast_horizon=1,
-                    weights=[1.0, 1.0, 1.0],
-                )
-
-        finally:
-            temp_path.unlink()
-
-
-class TestBackwardCompatibility:
-    """Test that the new implementation maintains backward compatibility."""
-
-    def test_same_results_as_old_implementation(self):
-        """Verify new load_actual_combined produces same results as old _load_actual_combined."""
-        with tempfile.NamedTemporaryFile(mode="w", suffix=".csv", delete=False) as f:
-            f.write("timestamp,col1,col2\n")
-            for i in range(10):
-                f.write(f"2020-01-01 {i:02d}:00:00,{i},{i*2}\n")
-            temp_path = Path(f.name)
-
-        try:
-            columns = ["col1", "col2"]
-            weights = [1.0, 1.0]
-            forecast_horizon = 5
-
-            # New implementation
-            config = ConfigDemo(data_path=temp_path)
-            new_result = load_actual_combined(
-                config=config,
-                columns=columns,
-                forecast_horizon=forecast_horizon,
-                weights=weights,
-            )
-
-            # Old implementation (inline for comparison)
-            from spotforecast2_safe.processing.agg_predict import agg_predict
-
-            data_test = pd.read_csv(temp_path, index_col=0, parse_dates=True)
-            actual_df = data_test[columns].iloc[:forecast_horizon]
-            old_result = agg_predict(actual_df, weights=weights)
-
-            # Compare results
-            pd.testing.assert_series_equal(new_result, old_result)
-
-        finally:
-            temp_path.unlink()
diff --git a/tests/test_tasks_smoke.py b/tests/test_tasks_smoke.py
deleted file mode 100644
index b83d1c62..00000000
--- a/tests/test_tasks_smoke.py
+++ /dev/null
@@ -1,52 +0,0 @@
-from unittest.mock import patch
-
-import numpy as np
-import pandas as pd
-
-from spotforecast2.tasks import task_demo, task_n_to_1_dataframe
-
-
-def mock_fetch_data(*args, **kwargs):
-    dates = pd.date_range("2020-01-01", periods=200, freq="h", tz="UTC")
-    data = pd.DataFrame(
-        np.random.rand(200, 
11), index=dates, columns=[f"col{i}" for i in range(11)]
-    )
-    return data
-
-
-@patch(
-    "spotforecast2.tasks.task_n_to_1_dataframe.fetch_data", side_effect=mock_fetch_data
-)
-def test_task_n_to_1_dataframe_execution(mock_fetch):
-    """
-    Smoke test to ensure task_n_to_1_dataframe.main runs without crashing.
-    This also implicitly tests n2n_predict from spotforecast2_safe.
-    """
-    # Overwrite the default config simply by patching or relying on defaults
-    # Since it fetches data directly, our mock will supply the 200 rows.
-    task_n_to_1_dataframe.main()
-
-
-@patch("spotforecast2.tasks.task_demo.n2n_predict")
-@patch("spotforecast2.tasks.task_demo.n2n_predict_with_covariates")
-@patch("spotforecast2.tasks.task_demo.load_actual_combined")
-@patch("spotforecast2.tasks.task_demo.plot_actual_vs_predicted")
-def test_task_demo_execution(mock_plot, mock_load, mock_n2n_cov, mock_n2n):
-    """
-    Smoke test for task_demo.py to ensure it wires up the components correctly.
-    We mock the heavy prediction functions since they are individually tested elsewhere,
-    and we just want to ensure the task script itself doesn't crash on orchestration.
-    """
-    dates = pd.date_range("2020-01-01", periods=24, freq="h", tz="UTC")
-    mock_df = pd.DataFrame(np.random.rand(24, 11), index=dates)
-    mock_series = pd.Series(np.random.rand(24), index=dates)
-
-    mock_n2n.return_value = (mock_df, {})
-    mock_n2n_cov.return_value = (mock_df, {}, {})
-    mock_load.return_value = mock_series
-
-    task_demo.main(force_train=False)
-
-    assert mock_n2n.called
-    assert mock_n2n_cov.called
-    assert mock_plot.called