@@ -61,6 +61,7 @@ configurator.config_entsoe.ConfigEntsoe(
task='lazy',
agg_weights=None,
forecaster_factory=None,
lgbm_n_jobs=1,
data_loader=None,
test_data_loader=None,
auto_save_models=True,
3 changes: 3 additions & 0 deletions docs/reference/configurator.config_multi.ConfigMulti.qmd
@@ -61,6 +61,7 @@ configurator.config_multi.ConfigMulti(
task='lazy',
agg_weights=None,
forecaster_factory=None,
lgbm_n_jobs=1,
data_loader=None,
test_data_loader=None,
auto_save_models=True,
@@ -130,6 +131,7 @@ API queries and holiday feature generation.
| poly_features_degree | [int](`int`) | Polynomial-interaction degree. ``1`` (default) generates no interactions; ``2`` adds pairwise bilinear terms; ``3+`` higher order. | `1` |
| max_poly_features | [int](`int`) | Cap on polynomial interaction columns; only the top ``max_poly_features`` ranked by mutual information with the target are kept (``<= 0`` disables). Defaults to ``10``. | `10` |
| poly_mi_n_jobs | [Optional](`typing.Optional`)\[[int](`int`)\] | Parallel jobs for the mutual-information ranking that enforces ``max_poly_features``. ``-1`` (default) uses all cores; ``None`` runs single-threaded. Parallelism does not change the selection. | `-1` |
| lgbm_n_jobs | [int](`int`) | Thread count for the LightGBM estimators built by the lgbm forecaster factories (``LGBMRegressor(n_jobs=...)``). Defaults to ``1`` so the backtester parallelises CV folds across processes instead of relying on LightGBM's in-model OpenMP, which anti-scales on heterogeneous-core CPUs (e.g. Apple Silicon). Set ``-1`` / a larger value on many-core homogeneous machines (e.g. Linux Xeon). | `1` |
| poly_mi_sample_size | [Optional](`typing.Optional`)\[[int](`int`)\] | Row cap for that ranking; longer series are scored on a reproducible random subsample of this size (seeded by ``random_state``), which can change which borderline columns make the top K. ``None`` scores every row (the pre-15.8 behaviour). Defaults to ``4000``. | `4000` |
| index_name | [str](`str`) | Name assigned to the datetime column when the index is reset. Defaults to ``"DateTime"``. | `'DateTime'` |
| bounds | [Optional](`typing.Optional`)\[[List](`typing.List`)\[[tuple](`tuple`)\]\] | Per-column outlier bounds as a list of ``(lower, upper)`` tuples, one entry per target column. ``None`` until set. | `None` |
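
The trade-off behind the ``lgbm_n_jobs`` default can be illustrated with a minimal sketch. The helper ``fold_level_workers`` is hypothetical and is not the library's ``select_n_jobs_backtesting``; it only assumes that CV folds are spread across processes when the estimator is single-threaded, and that folds run sequentially otherwise so the two levels of parallelism do not oversubscribe the CPU.

```python
import os
from typing import Optional


def fold_level_workers(lgbm_n_jobs: int, cpu_count: Optional[int] = None) -> int:
    """Hypothetical stand-in for a backtesting n_jobs heuristic.

    Assumption: a single-threaded estimator (lgbm_n_jobs == 1) leaves the
    cores free for fold-level processes; any other value means LightGBM
    threads internally, so folds are evaluated one at a time.
    """
    if cpu_count is None:
        cpu_count = os.cpu_count() or 1
    if lgbm_n_jobs == 1:
        return max(1, cpu_count)
    return 1


# Default config (lgbm_n_jobs=1) on an 8-core machine: parallelise the folds.
workers_default = fold_level_workers(1, cpu_count=8)

# In-model threading requested (lgbm_n_jobs=-1): keep the folds sequential.
workers_threaded = fold_level_workers(-1, cpu_count=8)
```

Under these assumptions the default yields eight fold-level workers with single-threaded models, while ``lgbm_n_jobs=-1`` yields one fold at a time with LightGBM using all cores.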
@@ -184,6 +186,7 @@ API queries and holiday feature generation.
| poly_features_degree | [int](`int`) | Polynomial-interaction degree (1 = off). |
| max_poly_features | [int](`int`) | Cap on kept ``poly_*`` columns (top-K by MI). |
| poly_mi_n_jobs | [Optional](`typing.Optional`)\[[int](`int`)\] | Parallel jobs for the MI ranking (``-1`` = all cores; selection-invariant). |
| lgbm_n_jobs | [int](`int`) | LightGBM estimator thread count for the lgbm forecaster factories (default ``1``; favours per-fold process parallelism over in-model OpenMP, which anti-scales on Apple Silicon). Raise on many-core homogeneous CPUs. |
| poly_mi_sample_size | [Optional](`typing.Optional`)\[[int](`int`)\] | Row cap for the MI ranking (``None`` = score every row). |
| include_covid_infection_rate | [bool](`bool`) | Append the bundled RKI German national COVID-19 7-day incidence as an exogenous regressor. |
| include_entsoe_forecast_load | [bool](`bool`) | Append the ENTSO-E day-ahead Forecasted Load as a near-oracle exogenous prior. |
@@ -18,11 +18,11 @@ a signature change.

## Parameters {.doc-section .doc-section-parameters}

| Name | Type | Description | Default |
|-------------|------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------|
| config | [Any](`typing.Any`) | Any object satisfying the ``PipelineConfig`` protocol from ``spotforecast2_safe.multitask.base``. Reads ``random_state``, ``lags_consider``, and ``window_size``. | _required_ |
| weight_func | [Optional](`typing.Optional`)\[[Any](`typing.Any`)\] | Optional per-sample weight function produced by the imputation step (``apply_imputation``). | `None` |
| target | [Optional](`typing.Optional`)\[[str](`str`)\] | Target column name. Ignored by this default factory; provided for the benefit of custom factories that need it. | `None` |
| Name | Type | Description | Default |
|-------------|------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------|
| config | [Any](`typing.Any`) | Any object satisfying the ``PipelineConfig`` protocol from ``spotforecast2_safe.multitask.base``. Reads ``random_state``, ``lags_consider``, ``window_size`` and ``lgbm_n_jobs`` (the LightGBM thread count, ``getattr`` default ``1`` for config-like objects that predate the field; see ``ConfigMulti.lgbm_n_jobs``). | _required_ |
| weight_func | [Optional](`typing.Optional`)\[[Any](`typing.Any`)\] | Optional per-sample weight function produced by the imputation step (``apply_imputation``). | `None` |
| target | [Optional](`typing.Optional`)\[[str](`str`)\] | Target column name. Ignored by this default factory; provided for the benefit of custom factories that need it. | `None` |
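
The ``getattr`` fallback mentioned for ``config`` can be sketched as follows. This is an illustration under stated assumptions, not the factory's actual body: ``resolve_lgbm_n_jobs`` and ``lgbm_estimator_kwargs`` are hypothetical helper names, and ``SimpleNamespace`` stands in for a config-like object. The resulting dictionary holds keyword arguments that ``LGBMRegressor`` accepts (``random_state``, ``n_jobs``).

```python
from types import SimpleNamespace


def resolve_lgbm_n_jobs(config) -> int:
    """Return the LightGBM thread count requested by a config-like object.

    Falls back to 1 when the object predates the ``lgbm_n_jobs`` field.
    """
    return int(getattr(config, "lgbm_n_jobs", 1))


def lgbm_estimator_kwargs(config) -> dict:
    """Collect keyword arguments a factory could forward to LGBMRegressor."""
    return {
        "random_state": config.random_state,
        "n_jobs": resolve_lgbm_n_jobs(config),
    }


# A config-like object that predates the field, and one that sets it.
legacy_config = SimpleNamespace(random_state=42)
xeon_config = SimpleNamespace(random_state=42, lgbm_n_jobs=-1)

legacy_kwargs = lgbm_estimator_kwargs(legacy_config)
xeon_kwargs = lgbm_estimator_kwargs(xeon_config)
```

The legacy object resolves to ``n_jobs=1`` without raising ``AttributeError``, so existing custom configs keep working unchanged.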

## Returns {.doc-section .doc-section-returns}

@@ -20,12 +20,12 @@ sf2-safe (LightGBM only, no torch/optuna). Refs ``hong16b``, ``roma19a``.

## Parameters {.doc-section .doc-section-parameters}

| Name | Type | Description | Default |
|-------------|------------------------------------------------------|----------------------------------------------------------------------------------------------------------------|---------------------|
| config | [Any](`typing.Any`) | Object satisfying the ``PipelineConfig`` protocol; reads ``random_state``, ``lags_consider``, ``window_size``. | _required_ |
| quantiles | [Sequence](`typing.Sequence`)\[[float](`float`)\] | Quantile levels in the open interval ``(0, 1)``. Defaults to ``(0.1, 0.5, 0.9)``. | `DEFAULT_QUANTILES` |
| weight_func | [Optional](`typing.Optional`)\[[Any](`typing.Any`)\] | Optional per-sample weight function. | `None` |
| target | [Optional](`typing.Optional`)\[[str](`str`)\] | Accepted and ignored (parity with the default factory). | `None` |
| Name | Type | Description | Default |
|-------------|------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------------------|
| config | [Any](`typing.Any`) | Object satisfying the ``PipelineConfig`` protocol; reads ``random_state``, ``lags_consider``, ``window_size`` and ``lgbm_n_jobs`` (LightGBM thread count, ``getattr`` default ``1``). | _required_ |
| quantiles | [Sequence](`typing.Sequence`)\[[float](`float`)\] | Quantile levels in the open interval ``(0, 1)``. Defaults to ``(0.1, 0.5, 0.9)``. | `DEFAULT_QUANTILES` |
| weight_func | [Optional](`typing.Optional`)\[[Any](`typing.Any`)\] | Optional per-sample weight function. | `None` |
| target | [Optional](`typing.Optional`)\[[str](`str`)\] | Accepted and ignored (parity with the default factory). | `None` |
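
As a rough sketch of how the same thread count could apply to every quantile model, the snippet below builds one set of estimator keyword arguments per quantile level. The helper ``quantile_estimator_kwargs`` is hypothetical, and the use of LightGBM's ``objective="quantile"`` with ``alpha`` is an assumption about how such a factory might configure its estimators, not a description of this one.

```python
from types import SimpleNamespace
from typing import Dict, Sequence


def quantile_estimator_kwargs(
    config, quantiles: Sequence[float] = (0.1, 0.5, 0.9)
) -> Dict[float, dict]:
    """Build LGBMRegressor keyword arguments for each quantile level."""
    for q in quantiles:
        # Quantile levels must lie strictly inside the open interval (0, 1).
        if not 0.0 < q < 1.0:
            raise ValueError(f"quantile {q!r} is outside the open interval (0, 1)")
    # Same fallback as the default factory: 1 if the field is missing.
    n_jobs = getattr(config, "lgbm_n_jobs", 1)
    return {
        q: {
            "objective": "quantile",
            "alpha": q,
            "random_state": config.random_state,
            "n_jobs": n_jobs,
        }
        for q in quantiles
    }


config = SimpleNamespace(random_state=0, lgbm_n_jobs=2)
per_quantile = quantile_estimator_kwargs(config)
```

Every quantile model shares one ``n_jobs`` value here, so raising ``lgbm_n_jobs`` affects all of them at once.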

## Returns {.doc-section .doc-section-returns}

21 changes: 21 additions & 0 deletions src/spotforecast2_safe/configurator/config_multi.py
@@ -114,6 +114,12 @@ class features (``is_workday``, ``day_type``). Defaults to ``False``.
ranking that enforces ``max_poly_features``. ``-1`` (default) uses
all cores; ``None`` runs single-threaded. Parallelism does not
change the selection.
lgbm_n_jobs (int): Thread count for the LightGBM estimators built by the
lgbm forecaster factories (``LGBMRegressor(n_jobs=...)``). Defaults
to ``1`` so the backtester parallelises CV folds across processes
instead of relying on LightGBM's in-model OpenMP, which anti-scales
on heterogeneous-core CPUs (e.g. Apple Silicon). Set ``-1`` / a
larger value on many-core homogeneous machines (e.g. Linux Xeon).
poly_mi_sample_size (Optional[int]): Row cap for that ranking; longer
series are scored on a reproducible random subsample of this size
(seeded by ``random_state``), which can change which borderline
@@ -237,6 +243,10 @@ class features (``is_workday``, ``day_type``). Defaults to ``False``.
max_poly_features (int): Cap on kept ``poly_*`` columns (top-K by MI).
poly_mi_n_jobs (Optional[int]): Parallel jobs for the MI ranking
(``-1`` = all cores; selection-invariant).
lgbm_n_jobs (int): LightGBM estimator thread count for the lgbm
forecaster factories (default ``1``; favours per-fold process
parallelism over in-model OpenMP, which anti-scales on Apple
Silicon). Raise on many-core homogeneous CPUs.
poly_mi_sample_size (Optional[int]): Row cap for the MI ranking
(``None`` = score every row).
include_covid_infection_rate (bool): Append the bundled RKI German
@@ -451,6 +461,17 @@ class features (``is_workday``, ``day_type``). Defaults to ``False``.
    # ``BaseTask.create_forecaster`` falls back to
    # ``default_lgbm_forecaster_factory``.
    forecaster_factory: Optional[Any] = None
    # Thread count for the LightGBM estimators built by the lgbm forecaster
    # factories (``default_lgbm_forecaster_factory`` /
    # ``quantile_lgbm_forecaster_factory``), forwarded as ``LGBMRegressor(n_jobs=...)``.
    # Default ``1``: on heterogeneous-core CPUs (e.g. Apple Silicon's
    # performance + efficiency cores) LightGBM's all-core OpenMP *anti-scales*
    # (the fork-join barrier stalls on the slow E-cores). With ``n_jobs=1`` the
    # backtesting heuristic (``select_n_jobs_backtesting``) instead parallelises
    # the CV folds across processes, which scales cleanly. Raise it (or set
    # ``-1`` for all cores) on many-core homogeneous machines (e.g. Linux Xeon)
    # where in-model threading scales well.
    lgbm_n_jobs: int = 1
    # Data-loader hook (consumed by ``BaseTask.prepare_data``):
    # ``data_loader(config) -> pd.DataFrame``. Invoked iff no DataFrame is
    # supplied via the constructor or ``prepare_data``.
4 changes: 4 additions & 0 deletions src/spotforecast2_safe/multitask/base.py
@@ -156,6 +156,10 @@ class PipelineConfig(Protocol):
    random_state: int
    verbose: bool
    cache_home: Optional[Any]
    # LightGBM estimator thread count for the lgbm forecaster factories
    # (LGBMRegressor(n_jobs=...)); default 1 favours per-fold process
    # parallelism over in-model OpenMP. See ConfigMulti.lgbm_n_jobs.
    lgbm_n_jobs: int
    # Optional callables
    forecaster_factory: Optional[Any]
    data_loader: Optional[Any]