Merged
@@ -0,0 +1,12 @@
{
"hash": "a34fd4bbba62bdccf2b6da5436257635",
"result": {
"engine": "jupyter",
"markdown": "---\ntitle: data.entsoe_loader\n---\n\n\n\n`data.entsoe_loader`\n\nENTSO-E interim-CSV data loaders.\n\nConfig-driven loaders for the merged ENTSO-E interim CSV, suitable for the\n``data_loader`` / ``test_data_loader`` hooks on `ConfigEntsoe`. Ported from\n``spotforecast2.tasks.task_entsoe`` ahead of that subpackage's removal.\n\n## Functions\n\n| Name | Description |\n| --- | --- |\n| [entsoe_data_loader](#spotforecast2_safe.data.entsoe_loader.entsoe_data_loader) | Read the merged interim ENTSO-E CSV that ``config.data_filename`` points at. |\n| [entsoe_test_data_loader](#spotforecast2_safe.data.entsoe_loader.entsoe_test_data_loader) | Return the merged ENTSO-E CSV sliced to the forecast horizon. |\n\n### entsoe_data_loader { #spotforecast2_safe.data.entsoe_loader.entsoe_data_loader }\n\n```python\ndata.entsoe_loader.entsoe_data_loader(config)\n```\n\nRead the merged interim ENTSO-E CSV that ``config.data_filename`` points at.\n\n#### Parameters {.doc-section .doc-section-parameters}\n\n| Name | Type | Description | Default |\n|--------|----------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------|------------|\n| config | [ConfigEntsoe](`spotforecast2_safe.configurator.ConfigEntsoe`) | A `ConfigEntsoe` with ``data_filename`` set. Relative paths are resolved against `spotforecast2_safe.data.fetch_data.get_data_home`. | _required_ |\n\n#### Returns {.doc-section .doc-section-returns}\n\n| Name | Type | Description |\n|--------|------------------------------------------------|--------------------------------------------------------------------|\n| | [pd](`pandas`).[DataFrame](`pandas.DataFrame`) | DataFrame indexed by the ENTSO-E timestamp column (``Time (UTC)``) |\n| | [pd](`pandas`).[DataFrame](`pandas.DataFrame`) | with the load columns as data columns. 
|\n\n#### Raises {.doc-section .doc-section-raises}\n\n| Name | Type | Description |\n|--------|------------------------------------------|-----------------------------------------------------------------------------------------------|\n| | [FileNotFoundError](`FileNotFoundError`) | If the merged CSV does not exist. Run ``spotforecast2-entsoe download`` and ``merge`` first. |\n\n#### Examples {.doc-section .doc-section-examples}\n\n\n::: {#3872eaae .cell execution_count=1}\n``` {.python .cell-code}\nimport os\nimport tempfile\n\nimport pandas as pd\nfrom spotforecast2_safe.configurator import ConfigEntsoe\nfrom spotforecast2_safe.data.entsoe_loader import entsoe_data_loader\n\n# Build a tiny synthetic interim CSV in a temp directory.\ntmp = tempfile.mkdtemp()\ncsv_path = os.path.join(tmp, \"energy_load.csv\")\nidx = pd.date_range(\n \"2025-01-01\", periods=48, freq=\"h\", tz=\"UTC\", name=\"Time (UTC)\"\n)\npd.DataFrame({\"Actual Load\": range(48)}, index=idx).to_csv(csv_path)\n\n# Absolute path bypasses get_data_home; loader returns the full frame.\nconfig = ConfigEntsoe()\nconfig.data_filename = csv_path\ndf = entsoe_data_loader(config)\n\nprint(df.shape)\nassert df.shape == (48, 1)\nassert df.index.name == \"Time (UTC)\"\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n(48, 1)\n```\n:::\n:::\n\n\n### entsoe_test_data_loader { #spotforecast2_safe.data.entsoe_loader.entsoe_test_data_loader }\n\n```python\ndata.entsoe_loader.entsoe_test_data_loader(config)\n```\n\nReturn the merged ENTSO-E CSV sliced to the forecast horizon.\n\nThe slice spans ``(end_train, end_train + predict_size * 1 h]`` so that\n``build_prediction_package``'s ``test_actual = ts.reindex(future_pred.index)``\nmatches the hourly forecast row-for-row. 
``end_train`` is taken from\n``config.end_train_default`` (treated as the *inclusive* last training\ntimestamp, the same convention the forecaster uses), and the step is\nassumed to be 1 h after the pipeline's hourly resampling.\n\nFor the live ENTSO-E exemplar with ``end_train_default = D-2 23:00 UTC``\nand ``predict_size = 24``, this returns the rows for\n``[D-1 00:00, D 00:00)`` — i.e., ``y_{-1}``. For backtests at an arbitrary\n``end_train_default``, it returns the post-cutoff window the model is\nactually predicting, rather than always \"yesterday in wall-clock UTC\".\n\n#### Parameters {.doc-section .doc-section-parameters}\n\n| Name | Type | Description | Default |\n|--------|----------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------|\n| config | [ConfigEntsoe](`spotforecast2_safe.configurator.ConfigEntsoe`) | A `ConfigEntsoe` with ``data_filename``, ``end_train_default``, and ``predict_size`` set; the merged interim CSV must already contain data covering the forecast horizon (run ``spotforecast2-entsoe download`` first). | _required_ |\n\n#### Returns {.doc-section .doc-section-returns}\n\n| Name | Type | Description |\n|--------|------------------------------------------------|------------------------------------------------------------------------|\n| | [pd](`pandas`).[DataFrame](`pandas.DataFrame`) | DataFrame indexed by ``Time (UTC)`` with the rows the forecast will be |\n| | [pd](`pandas`).[DataFrame](`pandas.DataFrame`) | scored against. 
|\n\n#### Examples {.doc-section .doc-section-examples}\n\n::: {#f5e3a200 .cell execution_count=2}\n``` {.python .cell-code}\nimport os\nimport tempfile\n\nimport pandas as pd\nfrom spotforecast2_safe.configurator import ConfigEntsoe\nfrom spotforecast2_safe.data.entsoe_loader import entsoe_test_data_loader\n\n# Synthetic interim CSV spanning the forecast window.\ntmp = tempfile.mkdtemp()\ncsv_path = os.path.join(tmp, \"energy_load.csv\")\nidx = pd.date_range(\n \"2025-12-29 00:00\", periods=120, freq=\"h\", tz=\"UTC\", name=\"Time (UTC)\"\n)\npd.DataFrame({\"Actual Load\": range(120)}, index=idx).to_csv(csv_path)\n\nconfig = ConfigEntsoe()\nconfig.data_filename = csv_path\nconfig.end_train_default = \"2025-12-31 00:00+00:00\"\nconfig.predict_size = 24\n\ntest_df = entsoe_test_data_loader(config)\n\n# The slice covers exactly predict_size hourly steps after end_train.\nprint(test_df.shape)\nassert test_df.shape == (24, 1)\nassert test_df.index[0] == pd.Timestamp(\"2025-12-31 01:00\", tz=\"UTC\")\n```\n\n::: {.cell-output .cell-output-stdout}\n```\n(24, 1)\n```\n:::\n:::\n\n\n",
"supporting": [
"data.entsoe_loader_files"
],
"filters": [],
"includes": {}
}
}
3 changes: 3 additions & 0 deletions _quarto.yml
@@ -126,6 +126,8 @@ website:
file: docs/reference/data.fetch_data.load_timeseries_forecast.qmd
- text: "demo_loader"
file: docs/reference/data.demo_loader.qmd
- text: "entsoe_loader"
file: docs/reference/data.entsoe_loader.qmd

- section: "Preprocessing"
contents:
@@ -587,6 +589,7 @@ quartodoc:
- data.fetch_data.load_day_ahead_price
- data.data_classes
- data.demo_loader
- data.entsoe_loader

# ── Preprocessing ─────────────────────────────────────────────────────────
- title: "Preprocessing"
136 changes: 136 additions & 0 deletions docs/reference/data.entsoe_loader.qmd
@@ -0,0 +1,136 @@
# data.entsoe_loader { #spotforecast2_safe.data.entsoe_loader }

`data.entsoe_loader`

ENTSO-E interim-CSV data loaders.

Config-driven loaders for the merged ENTSO-E interim CSV, suitable for the
``data_loader`` / ``test_data_loader`` hooks on `ConfigEntsoe`. Ported from
``spotforecast2.tasks.task_entsoe`` ahead of that subpackage's removal.

## Functions

| Name | Description |
| --- | --- |
| [entsoe_data_loader](#spotforecast2_safe.data.entsoe_loader.entsoe_data_loader) | Read the merged interim ENTSO-E CSV that ``config.data_filename`` points at. |
| [entsoe_test_data_loader](#spotforecast2_safe.data.entsoe_loader.entsoe_test_data_loader) | Return the merged ENTSO-E CSV sliced to the forecast horizon. |

### entsoe_data_loader { #spotforecast2_safe.data.entsoe_loader.entsoe_data_loader }

```python
data.entsoe_loader.entsoe_data_loader(config)
```

Read the merged interim ENTSO-E CSV that ``config.data_filename`` points at.

#### Parameters {.doc-section .doc-section-parameters}

| Name | Type | Description | Default |
|--------|----------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------|------------|
| config | [ConfigEntsoe](`spotforecast2_safe.configurator.ConfigEntsoe`) | A `ConfigEntsoe` with ``data_filename`` set. Relative paths are resolved against `spotforecast2_safe.data.fetch_data.get_data_home`. | _required_ |

#### Returns {.doc-section .doc-section-returns}

| Name | Type | Description |
|--------|------------------------------------------------|--------------------------------------------------------------------|
| | [pd](`pandas`).[DataFrame](`pandas.DataFrame`) | DataFrame indexed by the ENTSO-E timestamp column (``Time (UTC)``), with the load columns as data columns. |

#### Raises {.doc-section .doc-section-raises}

| Name | Type | Description |
|--------|------------------------------------------|-----------------------------------------------------------------------------------------------|
| | [FileNotFoundError](`FileNotFoundError`) | If the merged CSV does not exist. Run ``spotforecast2-entsoe download`` and ``merge`` first. |
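
The path rule from the parameter table and the error case can be pictured with a small standalone sketch. This is an assumption about the documented behaviour, not the package's code: `resolve_csv_path` and its `data_home` argument are hypothetical stand-ins for whatever `get_data_home` returns.

```python
import tempfile
from pathlib import Path


def resolve_csv_path(data_filename, data_home):
    """Resolve data_filename the way the parameter table describes.

    Hypothetical helper, not part of spotforecast2_safe: absolute paths are
    used as-is, relative paths are joined onto data_home. Raises
    FileNotFoundError if the resulting file does not exist.
    """
    path = Path(data_filename)
    if not path.is_absolute():
        path = Path(data_home) / path
    if not path.is_file():
        raise FileNotFoundError(f"Merged CSV not found: {path}")
    return path


with tempfile.TemporaryDirectory() as home:
    csv = Path(home) / "energy_load.csv"
    csv.write_text("Time (UTC),Actual Load\n")

    # A relative name is looked up under data_home ...
    assert resolve_csv_path("energy_load.csv", home) == csv
    # ... while an absolute path bypasses it.
    assert resolve_csv_path(str(csv), "/nonexistent") == csv

    # A missing file surfaces as FileNotFoundError.
    try:
        resolve_csv_path("missing.csv", home)
    except FileNotFoundError as exc:
        print(exc)
```

Failing early with `FileNotFoundError` keeps the "run the download and merge steps first" hint close to the point where the file is actually needed.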

#### Examples {.doc-section .doc-section-examples}

```{python}
import os
import tempfile

import pandas as pd
from spotforecast2_safe.configurator import ConfigEntsoe
from spotforecast2_safe.data.entsoe_loader import entsoe_data_loader

# Build a tiny synthetic interim CSV in a temp directory.
tmp = tempfile.mkdtemp()
csv_path = os.path.join(tmp, "energy_load.csv")
idx = pd.date_range(
    "2025-01-01", periods=48, freq="h", tz="UTC", name="Time (UTC)"
)
pd.DataFrame({"Actual Load": range(48)}, index=idx).to_csv(csv_path)

# Absolute path bypasses get_data_home; loader returns the full frame.
config = ConfigEntsoe()
config.data_filename = csv_path
df = entsoe_data_loader(config)

print(df.shape)
assert df.shape == (48, 1)
assert df.index.name == "Time (UTC)"
```

### entsoe_test_data_loader { #spotforecast2_safe.data.entsoe_loader.entsoe_test_data_loader }

```python
data.entsoe_loader.entsoe_test_data_loader(config)
```

Return the merged ENTSO-E CSV sliced to the forecast horizon.

The slice spans ``(end_train, end_train + predict_size * 1 h]`` so that
``build_prediction_package``'s ``test_actual = ts.reindex(future_pred.index)``
matches the hourly forecast row-for-row. ``end_train`` is taken from
``config.end_train_default`` and treated as the *inclusive* last training
timestamp (the same convention the forecaster uses), so it is excluded from
the slice. The step is assumed to be 1 h, as produced by the pipeline's
hourly resampling.

For the live ENTSO-E exemplar with ``end_train_default = D-2 23:00 UTC``
and ``predict_size = 24``, this returns the rows for
``[D-1 00:00, D 00:00)``, i.e., ``y_{-1}``, the 24 hourly values of day D-1.
For backtests at an arbitrary ``end_train_default``, it returns the
post-cutoff window the model is actually predicting, rather than always
"yesterday in wall-clock UTC".
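
The window arithmetic above can be checked with the standard library alone. The following is a minimal sketch, not the loader's implementation: `forecast_window` is a hypothetical helper and the dates are illustrative. It shows that, on an hourly grid, the interval ``(end_train, end_train + predict_size * 1 h]`` selects exactly ``predict_size`` timestamps, and that for ``end_train = D-2 23:00 UTC`` these are the hours of day D-1.

```python
from datetime import datetime, timedelta, timezone


def forecast_window(end_train, predict_size, step=timedelta(hours=1)):
    """Return the timestamps in (end_train, end_train + predict_size * step].

    Hypothetical helper for illustration; not part of spotforecast2_safe.
    """
    return [end_train + step * k for k in range(1, predict_size + 1)]


# Illustrative dates: D = 2026-01-02, so D-2 23:00 UTC is 2025-12-31 23:00 UTC.
end_train = datetime(2025, 12, 31, 23, 0, tzinfo=timezone.utc)
window = forecast_window(end_train, predict_size=24)

# end_train itself is excluded (it is the last training timestamp) ...
assert end_train not in window
# ... and the window runs from D-1 00:00 through D-1 23:00,
# which is the same set of hours as [D-1 00:00, D 00:00).
assert window[0] == datetime(2026, 1, 1, 0, 0, tzinfo=timezone.utc)
assert window[-1] == datetime(2026, 1, 1, 23, 0, tzinfo=timezone.utc)
assert len(window) == 24
```

Anchoring the window on ``end_train`` rather than on the wall clock is what lets the same rule serve both the live run and backtests at arbitrary cutoffs.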

#### Parameters {.doc-section .doc-section-parameters}

| Name | Type | Description | Default |
|--------|----------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------|
| config | [ConfigEntsoe](`spotforecast2_safe.configurator.ConfigEntsoe`) | A `ConfigEntsoe` with ``data_filename``, ``end_train_default``, and ``predict_size`` set; the merged interim CSV must already contain data covering the forecast horizon (run ``spotforecast2-entsoe download`` first). | _required_ |

#### Returns {.doc-section .doc-section-returns}

| Name | Type | Description |
|--------|------------------------------------------------|------------------------------------------------------------------------|
| | [pd](`pandas`).[DataFrame](`pandas.DataFrame`) | DataFrame indexed by ``Time (UTC)`` with the rows the forecast will be scored against. |

#### Examples {.doc-section .doc-section-examples}

```{python}
import os
import tempfile

import pandas as pd
from spotforecast2_safe.configurator import ConfigEntsoe
from spotforecast2_safe.data.entsoe_loader import entsoe_test_data_loader

# Synthetic interim CSV spanning the forecast window.
tmp = tempfile.mkdtemp()
csv_path = os.path.join(tmp, "energy_load.csv")
idx = pd.date_range(
    "2025-12-29 00:00", periods=120, freq="h", tz="UTC", name="Time (UTC)"
)
pd.DataFrame({"Actual Load": range(120)}, index=idx).to_csv(csv_path)

config = ConfigEntsoe()
config.data_filename = csv_path
config.end_train_default = "2025-12-31 00:00+00:00"
config.predict_size = 24

test_df = entsoe_test_data_loader(config)

# The slice covers exactly predict_size hourly steps after end_train.
print(test_df.shape)
assert test_df.shape == (24, 1)
assert test_df.index[0] == pd.Timestamp("2025-12-31 01:00", tz="UTC")
```
1 change: 1 addition & 0 deletions docs/reference/index.qmd
@@ -19,6 +19,7 @@ Utilities for fetching and loading time series, weather, and holiday data.
| [data.fetch_data.load_day_ahead_price](data.fetch_data.load_day_ahead_price.qmd#spotforecast2_safe.data.fetch_data.load_day_ahead_price) | Load the ENTSO-E day-ahead spot price (DE/LU) as an hourly series. |
| [data.data_classes](data.data_classes.qmd#spotforecast2_safe.data.data_classes) | Data structures for input and processed data. |
| [data.demo_loader](data.demo_loader.qmd#spotforecast2_safe.data.demo_loader) | Demo data loader for safety-critical forecasting tasks. |
| [data.entsoe_loader](data.entsoe_loader.qmd#spotforecast2_safe.data.entsoe_loader) | ENTSO-E interim-CSV data loaders. |

## Preprocessing

6 changes: 6 additions & 0 deletions src/spotforecast2_safe/data/__init__.py
@@ -3,6 +3,10 @@

from spotforecast2_safe.data.data_classes import Data, Period
from spotforecast2_safe.data.demo_loader import load_actual_combined
from spotforecast2_safe.data.entsoe_loader import (
    entsoe_data_loader,
    entsoe_test_data_loader,
)
from spotforecast2_safe.data.fetch_data import (
fetch_data,
fetch_holiday_data,
@@ -14,6 +18,8 @@
__all__ = [
"Data",
"Period",
"entsoe_data_loader",
"entsoe_test_data_loader",
"fetch_data",
"fetch_holiday_data",
"fetch_weather_data",