
Add validation/test split and evaluate final model performance #26

Open

fquaglio wants to merge 6 commits into jiwidi:master from fquaglio:improve-validation-pipeline

Add validation/test split and evaluate final model performance#26
fquaglio wants to merge 6 commits into
jiwidi:masterfrom
fquaglio:improve-validation-pipeline

Conversation


@fquaglio fquaglio commented May 7, 2026

Overview

The original notebook presents an end-to-end workflow for time series forecasting, covering data processing, feature engineering, and model comparison across both statistical and machine learning approaches.

In the original workflow, the training set is used to fit the models, while the test set is used directly for both performance evaluation and model comparison. An explicit final model selection step with an independent evaluation on unseen data was therefore not included.


Key improvements introduced

In this contribution, a more structured evaluation strategy has been introduced for the pollution time series use case:

  • A clear train/validation/test split has been defined. This separation provides a more realistic estimate of model generalization.
  • The validation set is used for model selection and comparison across different models, while the test set is reserved exclusively for the final evaluation of the selected model.
  • The best-performing model is selected and evaluated separately on unseen test data, with final performance metrics reported.

Note: the original naming convention was preserved for compatibility with the existing notebook structure. Specifically, df_test now refers to the validation set and df_test2 to the final test set, as in the sketch below.
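As an illustration, a minimal sketch of the chronological split using the preserved names; the dummy DataFrame, column name, and cut-off dates are placeholders, not the boundaries actually used in the notebook:

```python
import pandas as pd

# 'df' stands in for the preprocessed pollution DataFrame from the notebook;
# a tiny dummy daily series is created here so the sketch runs on its own.
df = pd.DataFrame(
    {"pollution": range(1000)},
    index=pd.date_range("2013-01-01", periods=1000, freq="D"),
)

# Hypothetical cut-off dates; the actual boundaries are defined in the notebook.
train_end = pd.Timestamp("2014-12-31")
val_end = pd.Timestamp("2015-06-30")

df_train = df[df.index <= train_end]                           # used to fit all models
df_test = df[(df.index > train_end) & (df.index <= val_end)]   # validation set (original name kept)
df_test2 = df[df.index > val_end]                              # final test set, evaluated only once
```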

All modifications were implemented in the main notebook time-series-forecasting-tutorial.ipynb, which contains the full analysis pipeline, model comparison, and final evaluation.

The environment.yml file was updated to ensure compatibility between package versions and improve reproducibility of the project setup across different environments.


Model selection and results

Based on validation results across all tested models, LightGBM was selected as the final model due to its performance and lower complexity compared to ensemble approaches, which showed only marginal improvements.


Additional improvements

  • Early stopping was applied to the boosting models (LightGBM and XGBoost) to improve training stability and reduce overfitting (see the sketch after this list).
  • Residual analysis was performed to better understand model behavior, highlighting good overall fit but reduced accuracy on extreme pollution spikes.
  • Evaluation metrics (MAE, RMSE, MAPE, R²) were computed and stored separately for validation and final test evaluation.
  • Minor fixes were applied to several plots to improve readability and consistency.
  • Units of measurement were added to the first time series plot in order to provide a clearer interpretation of the physical quantities.
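
As an illustration of the early-stopping setup, a minimal LightGBM sketch; the dummy arrays and hyperparameter values stand in for the notebook's actual features and targets built from df_train and df_test:

```python
import lightgbm as lgb
import numpy as np

# Dummy stand-ins for the feature matrices and targets derived from
# df_train / df_test in the actual pipeline.
rng = np.random.default_rng(0)
X_train, y_train = rng.normal(size=(500, 8)), rng.normal(size=500)
X_val, y_val = rng.normal(size=(100, 8)), rng.normal(size=100)

# Illustrative hyperparameters; the notebook's own settings may differ.
model = lgb.LGBMRegressor(n_estimators=2000, learning_rate=0.05)
model.fit(
    X_train,
    y_train,
    eval_set=[(X_val, y_val)],
    eval_metric="rmse",
    callbacks=[lgb.early_stopping(stopping_rounds=50)],  # stop once validation RMSE stops improving
)
y_val_pred = model.predict(X_val)  # uses the best iteration found by early stopping
```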

Outputs

  • Validation results for all models: results/results_summary.csv
  • Final test evaluation for selected model: results/final_scores.csv
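
For reference, a minimal sketch of how metrics of this kind can be computed and written to the two CSV files above; the score helper and the dummy arrays are illustrative, not the notebook's exact code:

```python
import os

import numpy as np
import pandas as pd
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

def score(y_true, y_pred):
    """MAE, RMSE, MAPE (%) and R2 for one model, returned as a dict."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return {
        "MAE": mean_absolute_error(y_true, y_pred),
        "RMSE": np.sqrt(mean_squared_error(y_true, y_pred)),
        "MAPE": np.mean(np.abs((y_true - y_pred) / y_true)) * 100,
        "R2": r2_score(y_true, y_pred),
    }

os.makedirs("results", exist_ok=True)

# Dummy targets/predictions; in the notebook these come from df_test (validation),
# df_test2 (final test), and the fitted models.
y_val = np.array([10.0, 12.0, 15.0, 11.0])
val_preds = {
    "LightGBM": np.array([10.5, 11.8, 14.2, 11.3]),
    "XGBoost": np.array([10.8, 11.5, 14.0, 11.6]),
}

results_summary = pd.DataFrame({m: score(y_val, p) for m, p in val_preds.items()}).T
results_summary.to_csv("results/results_summary.csv")

y_test = np.array([9.0, 13.0, 14.0])
final_scores = pd.DataFrame([score(y_test, np.array([9.4, 12.5, 14.3]))], index=["LightGBM"])
final_scores.to_csv("results/final_scores.csv")
```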
