srujanc09/WhartonDataScience


Wharton HS Data Science Competition 2026

Team: School:

Project Structure

data/raw/ -> original competition datasets
outputs/tables/ -> generated CSV results
outputs/figures/ -> generated plots
src/ -> analysis scripts

Analysis Pipeline

run_all.py Execution Order

  1. part0_dataset_validation.py - Validate schema, ranges, and season structure
  2. part2_game_level.py - Aggregate line data to game-level statistics
  3. part3_league_table.py - Build league standings and team metrics
  4. part4_matchup_model.py - Train playoff matchup prediction model
  5. part5_line_disparity.py - Identify top 10 teams by line disparity
  6. part6_visualization.py - Create line disparity vs strength plot
  7. part8_probability_calibration.py - Evaluate model calibration
  8. part9_line_disparity_robustness.py - Test ranking stability across metrics
  9. part10_model_diagnostics.py - Compare model against baseline
  10. part14_model_stability_uncertainty.py - Extended model comparison + matchup uncertainty
  11. part15_disparity_defadj_error_analysis.py - Defensive adjustment + error pattern analysis
  12. part17_power_rank_improved.py - Final submission-quality power rankings
  13. part16_round1_calibration.py (optional) - Runs only when an actual_winner column is present in round1_matchup_probs.csv
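The following is a minimal sketch of how an orchestrator like run_all.py could execute the steps above. It makes several assumptions: each script runs as a separate process from the repository root, the pipeline stops at the first failing step, the matchup file lives at outputs/tables/round1_matchup_probs.csv, and the optional step checks the CSV header for an actual_winner column. The function names are hypothetical, and the real run_all.py may work differently.

```python
import csv
import subprocess
import sys
from pathlib import Path

# Steps 1-12 from the execution order above (assumed to live in src/).
CORE_SCRIPTS = [
    "part0_dataset_validation.py",
    "part2_game_level.py",
    "part3_league_table.py",
    "part4_matchup_model.py",
    "part5_line_disparity.py",
    "part6_visualization.py",
    "part8_probability_calibration.py",
    "part9_line_disparity_robustness.py",
    "part10_model_diagnostics.py",
    "part14_model_stability_uncertainty.py",
    "part15_disparity_defadj_error_analysis.py",
    "part17_power_rank_improved.py",
]
# Step 13: only meaningful once actual results have been filled in.
OPTIONAL_SCRIPT = "part16_round1_calibration.py"


def has_actual_winner(matchup_csv: Path) -> bool:
    """Return True if the CSV exists and its header row contains 'actual_winner'."""
    if not matchup_csv.is_file():
        return False
    with matchup_csv.open(newline="") as f:
        header = next(csv.reader(f), [])
    return "actual_winner" in (col.strip() for col in header)


def run_pipeline(src_dir: Path = Path("src"),
                 matchup_csv: Path = Path("outputs/tables/round1_matchup_probs.csv")) -> None:
    """Run the core steps in order (hypothetical helper; call from the repo root).

    check=True raises CalledProcessError on the first non-zero exit code,
    which stops the pipeline (an assumed policy).
    """
    for script in CORE_SCRIPTS:
        print(f"Running {script} ...")
        subprocess.run([sys.executable, str(src_dir / script)], check=True)
    # Check the header only after the core steps, since an earlier step
    # is what produces the matchup CSV.
    if has_actual_winner(matchup_csv):
        subprocess.run([sys.executable, str(src_dir / OPTIONAL_SCRIPT)], check=True)
    else:
        print(f"Skipping {OPTIONAL_SCRIPT}: no actual_winner column yet.")
```

Calling run_pipeline() from the repository root would run steps 1-12 and then decide on step 13. Deferring the header check until after the core steps avoids skipping the optional step just because the CSV did not exist before the run started.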

Additional Utilities

  • part11_reproducibility_run.py - Clear outputs and regenerate from scratch
  • part12_final_audit_packager.py - Model audit and form-ready output packaging
  • part13_interpretability_insights.py - Optional interpretability report
  • part19_spearman_rank_evaluation.py - Spearman ranking alignment checks
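The check below illustrates the kind of measure part19_spearman_rank_evaluation.py is named after. It is a minimal standard-library sketch of Spearman's rank correlation between two rankings, for example the power rankings versus an alternative ordering. It assumes both rankings cover the same teams and contain no ties. The function name and the dict-based input are hypothetical, not the script's actual interface.

```python
def spearman_rho(rank_a: dict[str, int], rank_b: dict[str, int]) -> float:
    """Spearman rank correlation between two tie-free rankings of the same teams.

    Uses rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)), where d is the rank
    difference per team. The formula is exact only when each ranking is a
    permutation of 1..n (no ties).
    """
    if set(rank_a) != set(rank_b):
        raise ValueError("Rankings must cover the same teams")
    n = len(rank_a)
    if n < 2:
        raise ValueError("Need at least two teams")
    expected = set(range(1, n + 1))
    # n values forming exactly the set {1..n} means no ties and no gaps.
    if set(rank_a.values()) != expected or set(rank_b.values()) != expected:
        raise ValueError("Ranks must be a permutation of 1..n (no ties)")
    d_squared = sum((rank_a[team] - rank_b[team]) ** 2 for team in rank_a)
    return 1 - 6 * d_squared / (n * (n * n - 1))
```

A value of 1.0 means the two orderings agree exactly, and -1.0 means one is the reverse of the other. Swapping two adjacent teams in a four-team list gives 0.8.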

How to Run

Quick Start (All Scripts)

python src/run_all.py

Final Pre-Submission Check

# Clear all outputs and regenerate from scratch
python src/part11_reproducibility_run.py

# Run model audit and generate form-ready files
python src/part12_final_audit_packager.py
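One way to confirm that a from-scratch regeneration reproduces earlier results is to fingerprint the tables before and after the rerun. The sketch below is hypothetical and is not the code in part11_reproducibility_run.py. It assumes that generated artifacts are plain files directly inside outputs/tables, outputs/figures, and outputs/reports, and it hashes only CSV and TXT tables, because image files are not guaranteed to be byte-identical across runs.

```python
import hashlib
from pathlib import Path

# Output folders named in this README (assumed to hold only generated files).
OUTPUT_DIRS = ["outputs/tables", "outputs/figures", "outputs/reports"]


def clear_outputs(root: Path) -> int:
    """Delete files (not the folders themselves) in each output folder; return the count."""
    removed = 0
    for rel in OUTPUT_DIRS:
        folder = root / rel
        if not folder.is_dir():
            continue
        for path in folder.iterdir():
            if path.is_file():
                path.unlink()
                removed += 1
    return removed


def digest_tables(root: Path) -> dict[str, str]:
    """SHA-256 of every .csv/.txt file in outputs/tables, keyed by relative path."""
    digests = {}
    for path in sorted((root / "outputs/tables").glob("*")):
        if path.is_file() and path.suffix in {".csv", ".txt"}:
            key = path.relative_to(root).as_posix()
            digests[key] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests
```

For example, you could record before = digest_tables(Path(".")), rerun the scripts, and then compare the result with digest_tables(Path(".")). Any key whose digest changed points to a step whose output is not deterministic, for example because of unseeded randomness in model training.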

Individual Scripts

python src/part0_dataset_validation.py
python src/part2_game_level.py
python src/part3_league_table.py
python src/part4_matchup_model.py
python src/part5_line_disparity.py
python src/part6_visualization.py
python src/part8_probability_calibration.py
python src/part9_line_disparity_robustness.py
python src/part10_model_diagnostics.py
python src/part14_model_stability_uncertainty.py
python src/part15_disparity_defadj_error_analysis.py
python src/part17_power_rank_improved.py
python src/part16_round1_calibration.py
python src/part19_spearman_rank_evaluation.py

Key Outputs

Phase 1 Submission Files

  • outputs/tables/power_rankings_final.csv - Team power rankings (1-32)
  • outputs/tables/round1_matchup_probs.csv - Playoff matchup win probabilities
  • outputs/tables/top10_line_disparity.csv - Top 10 teams by line disparity
  • outputs/figures/line_disparity_vs_strength.png - Line disparity vs. team strength plot
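Before you submit, you can sanity-check the Phase 1 tables with a few structural checks. The sketch below assumes that power_rankings_final.csv has rank and team columns and that round1_matchup_probs.csv has a win_prob column. These column names are hypothetical placeholders, so adjust them to the actual headers.

```python
import csv
from pathlib import Path


def check_power_rankings(path: Path, rank_col: str = "rank", team_col: str = "team") -> list[str]:
    """Return a list of problems; an empty list means ranks are exactly 1..32 with unique teams."""
    with path.open(newline="") as f:
        rows = list(csv.DictReader(f))
    problems = []
    ranks = [int(row[rank_col]) for row in rows]
    teams = [row[team_col] for row in rows]
    if sorted(ranks) != list(range(1, 33)):
        problems.append("ranks are not exactly 1..32")
    if len(set(teams)) != len(teams):
        problems.append("duplicate team names")
    return problems


def check_probabilities(path: Path, prob_col: str = "win_prob") -> list[str]:
    """Flag any matchup probability outside [0, 1] (rows numbered from 1)."""
    with path.open(newline="") as f:
        rows = list(csv.DictReader(f))
    return [f"row {i}: {row[prob_col]} outside [0, 1]"
            for i, row in enumerate(rows, start=1)
            if not 0.0 <= float(row[prob_col]) <= 1.0]
```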

Form-Ready Files (Part 12)

  • outputs/tables/power_rank_form_entry.txt - Numbered team list for form entry
  • outputs/tables/line_disparity_form_entry.txt - Numbered disparity list for form entry
  • outputs/tables/matchup_probs_form_entry.csv - Matchup predictions with slots
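The sketch below shows one way a numbered form-entry list such as power_rank_form_entry.txt could be produced from ranked CSV rows. It assumes a "1. Team" line format and rank/team column names, all of which are hypothetical; part12_final_audit_packager.py may format these files differently.

```python
from pathlib import Path


def teams_in_rank_order(rows: list[dict[str, str]], rank_col: str = "rank",
                        team_col: str = "team") -> list[str]:
    """Sort CSV rows by numeric rank (so '10' sorts after '2') and return team names."""
    return [row[team_col] for row in sorted(rows, key=lambda row: int(row[rank_col]))]


def write_numbered_list(teams: list[str], out_path: Path) -> str:
    """Write teams as '1. Team' lines (assumed form format) and return the text."""
    text = "\n".join(f"{i}. {team}" for i, team in enumerate(teams, start=1)) + "\n"
    out_path.write_text(text)
    return text
```

Converting the rank to an integer before sorting matters because csv.DictReader yields strings, and sorting strings would place rank 10 before rank 2.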

Analysis & Diagnostics

  • outputs/tables/calibration_table.csv - Model calibration statistics
  • outputs/tables/line_disparity_robustness.csv - Robustness analysis
  • outputs/tables/model_vs_baseline_metrics.csv - Model comparison
  • outputs/tables/cv_model_audit.csv - 5-fold CV model audit
  • outputs/tables/model_comparison_extended.csv - Extended model comparison (Part 14)
  • outputs/tables/matchup_uncertainty_extended.csv - Extended matchup uncertainty labels
  • outputs/tables/line_disparity_def_adj.csv - Defensive-adjusted disparity for all teams
  • outputs/tables/error_pattern_analysis.csv - Error pattern summary
  • outputs/tables/matchup_uncertainty_analysis.csv - Playoff matchup uncertainty scores
  • outputs/tables/confident_error_summary.csv - Confident prediction error patterns
  • outputs/tables/playoff_team_archetypes.csv - Team classifications by strength/depth
  • outputs/figures/probability_calibration.png - Calibration plot
  • outputs/figures/probability_distribution.png - Prediction distribution
  • outputs/figures/probability_residuals.png - Residual analysis
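A calibration table like calibration_table.csv typically compares predicted win probabilities with observed outcomes. A common approach is to group predictions into equal-width probability bins and compare the mean prediction with the observed win rate in each bin, often alongside the Brier score. The sketch below shows that technique. The 10-bin default and the output field names are assumptions and may not match what part8_probability_calibration.py uses.

```python
def calibration_table(probs: list[float], outcomes: list[int], n_bins: int = 10) -> list[dict]:
    """Bin predictions into equal-width bins; compare mean predicted vs observed win rate."""
    if len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must have the same length")
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        # p == 1.0 goes into the last bin instead of an out-of-range index.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, y))
    table = []
    for i, members in enumerate(bins):
        if not members:
            continue  # empty bins are omitted from the table
        table.append({
            "bin_lower": i / n_bins,
            "bin_upper": (i + 1) / n_bins,
            "count": len(members),
            "mean_predicted": sum(p for p, _ in members) / len(members),
            "observed_rate": sum(y for _, y in members) / len(members),
        })
    return table


def brier_score(probs: list[float], outcomes: list[int]) -> float:
    """Mean squared error between predicted probabilities and 0/1 outcomes (lower is better)."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)
```

In a well-calibrated model, mean_predicted and observed_rate are close in every populated bin. Bins with small counts are noisy, so it helps to read them together with the count column.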

Reports

  • outputs/reports/final_audit_report.md - Comprehensive final audit and status
  • outputs/reports/interpretability_summary.md - Interpretability insights and team archetypes
