Easy pandas DataFrame access to publicly available electric load forecasting datasets
padelf provides a minimal Python API to download, cache, and standardize
electric load forecasting datasets for research. Every dataset is returned as a
pandas DataFrame with a UTC DatetimeIndex and a standardized consumption_kW
column.
```bash
pip install padelf
```

```python
import padelf

# See what's available
padelf.list_datasets()
# ['AEMO', 'ELD', 'ENTSO-E', 'GEFCOM12', 'IHPC', 'ISO-NE', 'NYISO', 'OPSD', 'Pecan-Street', 'RTE-France', 'VEA']

# Load a dataset: one line, sensible defaults
df = padelf.get_dataset("OPSD")
print(df.head())
```

Output:

```
                           consumption_kW  DE_solar_generation_actual  DE_wind_onshore_generation_actual
datetime
2015-01-01 00:00:00+00:00         41209.0                         NaN                             7568.0
2015-01-01 01:00:00+00:00         40029.0                         NaN                             7666.0
2015-01-01 02:00:00+00:00         38891.0                         NaN                             7637.0
```
Every call to get_dataset() returns a DataFrame with:
- DatetimeIndex: UTC timezone, equidistant at the dataset's native resolution
- consumption_kW: load/consumption column, unit-converted to kilowatts by default
- Additional columns: as available in the original dataset (e.g., temperature, solar generation)
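The output contract above is easy to check mechanically. Below is a minimal sketch using plain pandas; check_output_contract() is a hypothetical helper, not part of padelf, and the demo DataFrame is made-up data standing in for a get_dataset() result so that no download is needed.

```python
import pandas as pd


def check_output_contract(df: pd.DataFrame) -> None:
    """Hypothetical helper: verify the output contract listed above."""
    idx = df.index
    # Index must be a timezone-aware DatetimeIndex in UTC.
    assert isinstance(idx, pd.DatetimeIndex), "index is not a DatetimeIndex"
    assert idx.tz is not None and str(idx.tz) == "UTC", "index is not UTC"
    # Equidistant: all differences between consecutive timestamps are equal.
    steps = idx.to_series().diff().dropna().unique()
    assert len(steps) <= 1, f"index is not equidistant: {steps}"
    # The standardized load column must be present.
    assert "consumption_kW" in df.columns, "missing consumption_kW column"


# Made-up stand-in for a get_dataset() result.
demo_index = pd.date_range("2015-01-01", periods=4, freq="15min", tz="UTC")
demo = pd.DataFrame({"consumption_kW": [1.0, 2.0, 3.0, 4.0]}, index=demo_index)
check_output_contract(demo)  # passes silently
```

On a real result, the same call would be check_output_contract(padelf.get_dataset("OPSD")).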
Optional parameters control resampling, units, gap filling, and caching:

```python
df = padelf.get_dataset(
    "OPSD",
    resolution="15min",       # Resample to 15-minute intervals
    consumption_unit="MW",    # Keep original MW units
    interpolate_limit="4h",   # Fill gaps up to 4 hours
    cache_dir="/tmp/padelf",  # Custom cache location
)
```

| Dataset | Abbreviation | Resolution | Region | Status |
|---|---|---|---|---|
| Open Power System Data | OPSD | 60 min | Europe | ✅ Ready |
| Individual Household Power Consumption | IHPC | 1 min | France | ✅ Ready |
| ElectricityLoadDiagrams20112014 | ELD | 15 min | Portugal | ✅ Ready |
| 5359 industrial VEA load profiles | VEA | 15 min | Germany | ✅ Ready |
| GEFCom 2012 | GEFCOM12 | 60 min | US | ⏩ Planned |
| ENTSO-E Transparency | ENTSO-E | 60 min | Europe | ⏩ Planned |
| ISO New England | ISO-NE | 60 min | US | ⏩ Planned |
| NYISO | NYISO | 5 min | US | ⏩ Planned |
| AEMO | AEMO | 60 min | Australia | ⏩ Planned |
| RTE France | RTE-France | 30 min | France | ⏩ Planned |
| Pecan Street | Pecan-Street | 15 min | US | ⏩ Planned |
The project uses a src layout with per-dataset YAML configs:
```
.
├── README.md
├── pyproject.toml            # Build config (hatchling backend)
├── mkdocs.yml                # Documentation site config
├── LICENSE
├── src/padelf/
│   ├── __init__.py           # Public API: list_datasets(), get_dataset()
│   ├── loader.py             # Core loader logic: download, cache, parse, standardize
│   ├── utils.py              # Unit conversion, gap interpolation, resampling
│   └── configs/
│       ├── _template.yaml    # Template for new loader configs
│       ├── OPSD.yaml         # Ready
│       ├── IHPC.yaml         # Ready
│       ├── ELD.yaml          # Ready
│       ├── VEA.yaml          # Ready
│       ├── GEFCOM12.yaml     # Ready (source URL intermittent)
│       ├── ENTSO-E.yaml      # API placeholder
│       ├── ISO-NE.yaml       # API placeholder
│       ├── NYISO.yaml        # API placeholder
│       ├── AEMO.yaml         # API placeholder
│       ├── RTE-France.yaml   # API placeholder
│       └── Pecan-Street.yaml # API placeholder
├── docs/                     # mkdocs source files
│   ├── index.md
│   ├── getting-started.md
│   ├── api.md
│   └── datasets.md
└── tests/
    ├── test_loader.py
    ├── test_utils.py
    └── test_smoke.py
```
The loader architecture follows a per-dataset config pattern. Each YAML file in src/padelf/configs/ defines a dataset's download URL, file format, column mappings, unit, and preprocessing parameters. When get_dataset() is called, loader.py reads the corresponding config, downloads the file (or uses a local cache), parses it, and applies standardization via utils.py: the load column is renamed to consumption_kW with automatic unit conversion (MW, kWh, MWh to kW), the index is converted to an equidistant UTC DatetimeIndex, gaps of up to 2 hours are interpolated by default, and optional resampling is applied if requested. Datasets flagged with requires_api: true in their config raise NotImplementedError with a descriptive message; these are placeholders for future implementation.
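The standardization steps described above can be sketched in plain pandas. This is a hypothetical standardize() helper, not padelf's utils.py. Its name and signature, the treatment of naive timestamps as UTC, the energy-to-power rule (kWh or MWh per interval divided by the interval length in hours), and mean aggregation for resampling are all assumptions. The freq argument must carry an explicit number (e.g. "15min", "60min") so that it parses as a pd.Timedelta.

```python
import numpy as np
import pandas as pd

# Assumed conversion rules: power units scale directly to kW; energy units
# (per sampling interval) are scaled to kWh and divided by the interval in hours.
POWER_TO_KW = {"kW": 1.0, "MW": 1000.0}
ENERGY_TO_KWH = {"kWh": 1.0, "MWh": 1000.0}


def standardize(df, load_column, unit, freq, interpolate_limit="2h", resolution=None):
    """Hypothetical sketch of padelf-style standardization (not the real utils.py)."""
    out = df.rename(columns={load_column: "consumption_kW"})

    # 1. UTC index: naive timestamps are assumed to already be in UTC.
    idx = pd.DatetimeIndex(out.index)
    out.index = idx.tz_localize("UTC") if idx.tz is None else idx.tz_convert("UTC")

    # 2. Equidistant index: missing timestamps are inserted as NaN rows.
    out = out.asfreq(freq)

    # 3. Unit conversion to kW.
    step_hours = pd.Timedelta(freq) / pd.Timedelta("1h")
    if unit in POWER_TO_KW:
        out["consumption_kW"] = out["consumption_kW"] * POWER_TO_KW[unit]
    elif unit in ENERGY_TO_KWH:
        out["consumption_kW"] = out["consumption_kW"] * ENERGY_TO_KWH[unit] / step_hours
    else:
        raise ValueError(f"unsupported unit: {unit!r}")

    # 4. Interpolate only NaN runs no longer than interpolate_limit.
    max_steps = int(pd.Timedelta(interpolate_limit) / pd.Timedelta(freq))
    s = out["consumption_kW"]
    is_nan = s.isna()
    run_id = (is_nan != is_nan.shift(fill_value=False)).cumsum()  # label consecutive runs
    run_len = is_nan.groupby(run_id).transform("sum")  # NaN count within each run
    fillable = is_nan & (run_len <= max_steps)
    # The index is equidistant here, so linear interpolation equals time-weighted.
    filled = s.interpolate(method="linear", limit_area="inside")
    out["consumption_kW"] = s.where(~fillable, filled)

    # 5. Optional resampling; mean aggregation only suits downsampling in this sketch.
    if resolution is not None:
        out = out.resample(resolution).mean()
    return out


# Made-up 15-minute MW readings with one dropped timestamp and one 45-minute gap.
idx = pd.date_range("2024-01-01", periods=8, freq="15min")
raw = pd.DataFrame({"load_MW": [1.0, 2.0, 3.0, np.nan, np.nan, np.nan, 7.0, 8.0]}, index=idx)
raw = raw.drop(index=idx[1])

result = standardize(raw, "load_MW", "MW", "15min", interpolate_limit="30min")
print(result["consumption_kW"].tolist())
# → [1000.0, 2000.0, 3000.0, nan, nan, nan, 7000.0, 8000.0]
```

The run-length mask is a deliberate choice: pandas' interpolate(limit=n) on its own would still fill the first n values of a longer gap, whereas here a gap longer than the limit is left entirely untouched.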
See the Loader Developer Guide for details on the loader architecture and how to add new datasets.
Six datasets (ENTSO-E, ISO-NE, NYISO, AEMO, RTE-France, Pecan-Street) are currently configured as API placeholders. Their YAML configs exist with requires_api: true, and calling get_dataset() on them raises NotImplementedError. To convert a placeholder into a working loader, remove the requires_api flag and either provide a direct download URL or implement API-specific download logic in loader.py. Note that ENTSO-E and ISO-NE have direct CSV downloads available and could be implemented as file-based loaders without API integration.
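A minimal sketch of how a loader might honor the requires_api flag follows. CONFIGS stands in for parsed YAML files, and every key except requires_api (url, load_column, unit) is a hypothetical placeholder, as is the resolve_config() function. padelf's actual schema and error message may differ.

```python
from typing import Any

# Hypothetical parsed configs (e.g. what yaml.safe_load() might return for files
# in src/padelf/configs/). Only requires_api is documented; other keys are assumed.
CONFIGS: dict[str, dict[str, Any]] = {
    "OPSD": {"url": "https://example.org/opsd.csv", "load_column": "load", "unit": "MW"},
    "NYISO": {"requires_api": True},
}


def resolve_config(name: str) -> dict[str, Any]:
    """Look up a dataset config and reject API placeholders (sketch only)."""
    if name not in CONFIGS:
        raise ValueError(f"unknown dataset {name!r}; available: {sorted(CONFIGS)}")
    config = CONFIGS[name]
    # Placeholders are flagged in their config and refused before any download.
    if config.get("requires_api", False):
        raise NotImplementedError(
            f"{name} requires API access and is not implemented yet; "
            "remove requires_api and add a download URL or API logic to enable it."
        )
    return config
```

In this sketch, converting a placeholder amounts to editing its config entry: once requires_api is gone and a url is present, resolve_config() returns the config instead of raising.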
To explore more datasets, check out the original PADELF Repository.
If this work has helped your research, we would appreciate a citation. ❤️
```bibtex
@inproceedings{baur2024datasets,
  title     = {Publicly Available Datasets For Electric Load Forecasting -- An Overview},
  author    = {Baur, Lukas and Chandramouli, Vignesh and Sauer, Alexander},
  booktitle = {6th Conference on Production Systems and Logistics (CPSL 2024)},
  year      = {2024},
  doi       = {10.15488/17659}
}
```

See the Changelog File.
MIT

