
Add validation/test split and evaluate final model performance #26

Open

fquaglio wants to merge 6 commits into jiwidi:master from fquaglio:improve-validation-pipeline

Add validation/test split and evaluate final model performance#26
fquaglio wants to merge 6 commits into
jiwidi:masterfrom
fquaglio:improve-validation-pipeline

Conversation


@fquaglio fquaglio commented May 7, 2026

Overview

The original notebook presents an end-to-end workflow for time series forecasting, covering data processing, feature engineering, and model comparison across both statistical and machine learning approaches.

In the original workflow, the training set is used to fit the models, while the test set is used directly for both performance evaluation and model comparison. An explicit final model selection step with an independent evaluation on unseen data was therefore not included.


Key improvements introduced

In this contribution, a more structured evaluation strategy has been introduced for the pollution time series use case:

  • A clear train/validation/test split has been defined. This separation provides a more realistic estimate of model generalization.
  • The validation set is used for model selection and comparison across different models, while the test set is reserved exclusively for the final evaluation of the selected model.
  • The best-performing model is selected and evaluated separately on unseen test data, with final performance metrics reported.

Note: the original naming convention was preserved for compatibility with the existing notebook structure. Specifically, df_test now refers to the validation set and df_test2 to the final test set, as in the sketch below.
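As an illustration, a minimal sketch of the chronological split using the preserved names; the dummy DataFrame, column name, and cut-off dates are placeholders, not the boundaries actually used in the notebook:

```python
import pandas as pd

# 'df' stands in for the preprocessed pollution DataFrame from the notebook;
# a tiny dummy daily series is created here so the sketch runs on its own.
df = pd.DataFrame(
    {"pollution": range(1000)},
    index=pd.date_range("2013-01-01", periods=1000, freq="D"),
)

# Hypothetical cut-off dates; the actual boundaries are defined in the notebook.
train_end = pd.Timestamp("2014-12-31")
val_end = pd.Timestamp("2015-06-30")

df_train = df[df.index <= train_end]                           # used to fit all models
df_test = df[(df.index > train_end) & (df.index <= val_end)]   # validation set (original name kept)
df_test2 = df[df.index > val_end]                              # final test set, evaluated only once
```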

All modifications were implemented in the main notebook time-series-forecasting-tutorial.ipynb, which contains the full analysis pipeline, model comparison, and final evaluation.

The environment.yml file was updated to ensure compatibility between package versions and improve reproducibility of the project setup across different environments.


Model selection and results

Based on validation results across all tested models, LightGBM was selected as the final model due to its performance and lower complexity compared to ensemble approaches, which showed only marginal improvements.


Additional improvements

  • Early stopping was applied to the boosting models (LightGBM and XGBoost) to improve training stability and reduce overfitting (see the sketch after this list).
  • Residual analysis was performed to better understand model behavior, highlighting good overall fit but reduced accuracy on extreme pollution spikes.
  • Evaluation metrics (MAE, RMSE, MAPE, R²) were computed and stored separately for validation and final test evaluation.
  • Minor fixes were applied to several plots to improve readability and consistency.
  • Units of measurement were added to the first time series plot in order to provide a clearer interpretation of the physical quantities.
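
As an illustration of the early-stopping setup, a minimal LightGBM sketch; the dummy arrays and hyperparameter values stand in for the notebook's actual features and targets built from df_train and df_test:

```python
import lightgbm as lgb
import numpy as np

# Dummy stand-ins for the feature matrices and targets derived from
# df_train / df_test in the actual pipeline.
rng = np.random.default_rng(0)
X_train, y_train = rng.normal(size=(500, 8)), rng.normal(size=500)
X_val, y_val = rng.normal(size=(100, 8)), rng.normal(size=100)

# Illustrative hyperparameters; the notebook's own settings may differ.
model = lgb.LGBMRegressor(n_estimators=2000, learning_rate=0.05)
model.fit(
    X_train,
    y_train,
    eval_set=[(X_val, y_val)],
    eval_metric="rmse",
    callbacks=[lgb.early_stopping(stopping_rounds=50)],  # stop once validation RMSE stops improving
)
y_val_pred = model.predict(X_val)  # uses the best iteration found by early stopping
```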

Outputs

  • Validation results for all models: results/results_summary.csv
  • Final test evaluation for selected model: results/final_scores.csv
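
For reference, a minimal sketch of how metrics of this kind can be computed and written to the two CSV files above; the score helper and the dummy arrays are illustrative, not the notebook's exact code:

```python
import os

import numpy as np
import pandas as pd
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

def score(y_true, y_pred):
    """MAE, RMSE, MAPE (%) and R2 for one model, returned as a dict."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return {
        "MAE": mean_absolute_error(y_true, y_pred),
        "RMSE": np.sqrt(mean_squared_error(y_true, y_pred)),
        "MAPE": np.mean(np.abs((y_true - y_pred) / y_true)) * 100,
        "R2": r2_score(y_true, y_pred),
    }

os.makedirs("results", exist_ok=True)

# Dummy targets/predictions; in the notebook these come from df_test (validation),
# df_test2 (final test), and the fitted models.
y_val = np.array([10.0, 12.0, 15.0, 11.0])
val_preds = {
    "LightGBM": np.array([10.5, 11.8, 14.2, 11.3]),
    "XGBoost": np.array([10.8, 11.5, 14.0, 11.6]),
}

results_summary = pd.DataFrame({m: score(y_val, p) for m, p in val_preds.items()}).T
results_summary.to_csv("results/results_summary.csv")

y_test = np.array([9.0, 13.0, 14.0])
final_scores = pd.DataFrame([score(y_test, np.array([9.4, 12.5, 14.3]))], index=["LightGBM"])
final_scores.to_csv("results/final_scores.csv")
```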
