A high-performance machine learning system designed to predict track popularity by analyzing acoustic features. Built with a Voting Ensemble architecture and a real-time Spectral DNA visualization dashboard.
```mermaid
graph TD
A[Music Input: File/URL] --> B[FastAPI Backend]
B --> C[Audio Feature Extraction]
C --> D[Feature Normalization]
D --> E{Voting Ensemble Logic}
E --> F[RandomForest]
E --> G[XGBoost]
E --> H[AdaBoost/KNN/DT]
F & G & H --> I[Consensus Result]
I --> J[React Analytics Dashboard]
J --> K[Spectral DNA Visualizer]
```
```bash
cd website/frontend
npm install && npm run dev
```

```bash
# Recommended: create a virtual environment first
cd website/backend
pip install -r requirements.txt
uvicorn main:app --reload
```

Explore the full model development lifecycle in the `model-code/` directory.
```bash
# Launch the Jupyter research pipeline
jupyter notebook model-code/MusicPredictor_Pipeline.ipynb
```

- Multi-Input Ingestion: Process local `.mp3`/`.wav` files or analyze any YouTube track directly via URL.
- Ensemble Voting Consensus: Predictions are powered by the combined intelligence of 5 distinct models, ensuring high reliability (84.5% Baseline Accuracy).
- Spectral DNA Visualization: A real-time, interactive dashboard that visualizes track energy, tempo, and loudness "fingerprints."
- Emerging Track Optimization: Popularity threshold set at 15 to specifically capture rising indie and regional hits.
- Core Engine: Scikit-Learn, XGBoost
- Feature Extraction: Librosa (22,050Hz Sampling)
- Serialization: Joblib (Optimized for Large Weights)
- Class Balancing: Under-sampling algorithm (50:50 Target Ratio)
- Frontend: React 19, TypeScript, Vite
- UI/UX Engine: Framer Motion (Micro-animations), Lucide Icons
- Backend API: FastAPI (Python 3.10+), Pydantic
- Audio Processing: FFmpeg + yt-dlp (YouTube Extraction)
Our research framework is designed to move beyond generic binary classification by optimizing for Emerging Hits (tracks with growing momentum but limited mainstream airplay).
To ensure the integrity of the ensemble's learning environment, we implemented a 4-stage preprocessing pipeline:
- Null-Value Sanitation: Massive dataset cleanup by removing any samples with incomplete metadata or missing acoustic features.
- Acoustical Signature Validation: Filtering for tracks with `energy > 0.1`. This eliminates "silent" assets, podcasts, and low-energy noise, ensuring the model only learns from actual musical structures.
- Feature Scaling (StandardScaler): Since acoustic metrics vary wildly in range (e.g., Loudness is measured in decibels, Tempo in BPM), we applied Z-score Normalization. This prevents features with larger numerical ranges from overpowering the model during training.
- Threshold Calibration: Established at Popularity: 15 to capture the nuance of emerging regional hits while maintaining high predictive precision for independent labels.
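The four stages above can be sketched in a few lines of NumPy. The `preprocess` helper, its column layout, and the toy data are illustrative assumptions, not the project's actual code (the pipeline itself uses Scikit-Learn's `StandardScaler`):

```python
import numpy as np

def preprocess(features: np.ndarray, popularity: np.ndarray,
               energy_col: int = 0, threshold: int = 15):
    """Illustrative 4-stage pipeline: drop NaNs, filter by energy,
    z-score the features, and binarize the popularity label."""
    # 1. Null-value sanitation: drop rows with any missing feature
    keep = ~np.isnan(features).any(axis=1)
    features, popularity = features[keep], popularity[keep]
    # 2. Acoustical signature validation: require energy > 0.1
    keep = features[:, energy_col] > 0.1
    features, popularity = features[keep], popularity[keep]
    # 3. Feature scaling: Z-score normalization (StandardScaler equivalent)
    features = (features - features.mean(axis=0)) / features.std(axis=0)
    # 4. Threshold calibration: popularity >= 15 marks an emerging hit
    labels = (popularity >= threshold).astype(int)
    return features, labels

# Toy data: columns are [energy, tempo]; row 2 has a NaN, row 1 is low-energy
X = np.array([[0.8, 120.0], [0.05, 98.0], [np.nan, 110.0], [0.6, 140.0]])
y = np.array([40, 3, 20, 10])
Xs, labels = preprocess(X, y)
print(labels)  # [1 0] -- only rows 0 and 3 survive; row 0 is a "hit"
```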
The system transforms raw audio into a high-dimensional vector for the Ensemble Model. Key features extracted using librosa and pyloudnorm include:
| Feature Group | Description | Analytic Purpose |
|---|---|---|
| Tempo (BPM) | Dynamic Tempo Estimation | Correlates with track energy and audience engagement (Danceability). |
| Spectral Centroid | Mean Frequency Center | Identifies "Brightness". Professional hits generally have balanced high-frequency energy. |
| RMS Energy | Root Mean Square Energy | Direct proxy for track "Power" and intensity (Targeted -14.0 LUFS). |
| Key & Mode | Chroma CQT Correlation | Detects harmonic compatibility (e.g., Major keys are statistically prevalent in Top 40). |
| Duration (ms) | Temporal Metadata | Modern hits optimize for the 2:30 - 3:15 window for streaming retention. |
| Loudness | Integrated ITU-R Norm | Normalized to streaming industry standards (-14.0 LUFS) for fair comparison. |
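The project extracts these features with librosa and pyloudnorm; as a rough illustration of the underlying math, two of them (RMS Energy and Spectral Centroid) can be computed with plain NumPy. The helper names and the synthetic 440 Hz test tone are assumptions for demonstration only:

```python
import numpy as np

SR = 22050  # sampling rate used by the project's feature pipeline

def rms_energy(y: np.ndarray) -> float:
    """Root-mean-square energy: a direct proxy for track 'power'."""
    return float(np.sqrt(np.mean(y ** 2)))

def spectral_centroid(y: np.ndarray, sr: int = SR) -> float:
    """Magnitude-weighted mean frequency ('brightness') of the signal."""
    mag = np.abs(np.fft.rfft(y))
    freqs = np.fft.rfftfreq(len(y), d=1.0 / sr)
    return float(np.sum(freqs * mag) / np.sum(mag))

# One second of a pure 440 Hz tone: the centroid sits at ~440 Hz,
# and a full-scale sine has an RMS of 1/sqrt(2) ~ 0.707.
t = np.arange(SR) / SR
tone = np.sin(2 * np.pi * 440.0 * t)
print(round(rms_energy(tone), 3))      # 0.707
print(round(spectral_centroid(tone)))  # 440
```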
To eliminate majority-class bias (where a model might simply guess the more common label instead of learning from the audio), we implemented a 1:1 Undersampling strategy. We strictly balanced the training set to ensure the model identifies the true structural patterns of success.
| Stage | Hits (Pop β₯ 15) | Flops (Pop < 15) | Total Samples |
|---|---|---|---|
| Initial Collection | 87,692 | 26,308 | 114,000 |
| Balanced Training Set | 26,308 | 26,308 | 52,616 |
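The undersampling step can be sketched as follows; the `undersample` helper and the toy arrays are hypothetical names for illustration, not the notebook's actual code:

```python
import numpy as np

def undersample(X: np.ndarray, y: np.ndarray, seed: int = 42):
    """Randomly drop majority-class samples until classes are 1:1."""
    rng = np.random.default_rng(seed)
    idx_hit = np.flatnonzero(y == 1)
    idx_flop = np.flatnonzero(y == 0)
    n = min(len(idx_hit), len(idx_flop))  # size of the minority class
    keep = np.concatenate([
        rng.choice(idx_hit, n, replace=False),
        rng.choice(idx_flop, n, replace=False),
    ])
    rng.shuffle(keep)
    return X[keep], y[keep]

# Toy imbalanced set: 7 hits vs 3 flops -> balanced to 3 vs 3
X = np.arange(10).reshape(-1, 1)
y = np.array([1, 1, 1, 1, 1, 1, 1, 0, 0, 0])
Xb, yb = undersample(X, y)
print(np.bincount(yb))  # [3 3]
```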
We utilized a Stratified 80/20 Train-Test Split. This ensures that the 50:50 class balance is perfectly preserved in both the training phase and the validation phase, preventing accidental sampling bias during performance evaluation.
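A minimal sketch of a per-class stratified split, using a simple index-permutation approach; the notebook itself would more likely rely on Scikit-Learn's `train_test_split` with `stratify=y`, so treat this as an illustration of the idea:

```python
import numpy as np

def stratified_split(y: np.ndarray, test_frac: float = 0.2, seed: int = 0):
    """Return train/test index arrays that preserve the class ratio."""
    rng = np.random.default_rng(seed)
    train, test = [], []
    for cls in np.unique(y):
        # Shuffle each class independently, then carve off its test share
        idx = rng.permutation(np.flatnonzero(y == cls))
        n_test = int(round(len(idx) * test_frac))
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return np.array(train), np.array(test)

y = np.array([0] * 50 + [1] * 50)  # balanced 50:50, as in the project
tr, te = stratified_split(y)
# Both splits keep the 50:50 balance: 40/40 in train, 10/10 in test
```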
We utilize a Voting Consensus Mechanism to minimize variance and increase prediction robustness across different genres:
- RandomForest: Captures complex non-linear feature interactions.
- XGBoost: Gradient-boosted decision trees for precision.
- AdaBoost: Iteratively focuses on difficult-to-classify samples.
- K-Nearest Neighbors (KNN): Relies on local structural similarities.
- DecisionTree: Provides the foundational logical framework.
The training phase is strictly version-controlled within `model-code/MusicPredictor_Pipeline.ipynb`, which handles:
- Standard Scaling: All features are Z-score normalized before injection.
- Model Checkpointing: Optimized models are exported as binary `.pkl` files for instant production inference.
- Evaluation: Each model is validated using an 80/20 stratified split to maintain class balance in testing.
The live platform combines these 5 models using Majority Voting Logic. A song is predicted as a "HIT" only when the ensemble reaches a consensus, significantly reducing false positives (False Discovery Rate).
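Hard majority voting over the five base models' labels can be expressed as a small helper; `majority_vote` and the example vote vectors are illustrative, not the platform's actual implementation:

```python
from collections import Counter

def majority_vote(predictions: list[int]) -> int:
    """Consensus of the 5 base models: 'HIT' (1) only when most agree."""
    votes = Counter(predictions)
    return 1 if votes[1] > votes[0] else 0

# Hypothetical outputs from RandomForest, XGBoost, AdaBoost, KNN, DecisionTree
print(majority_vote([1, 1, 0, 1, 0]))  # 1 -> consensus HIT
print(majority_vote([1, 0, 0, 1, 0]))  # 0 -> no consensus, FLOP
```

With an odd number of models (5), a strict majority always exists, so no tie-breaking rule is needed.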
The prediction engine is designed to run on Hugging Face Spaces (Docker environment) to handle resource-heavy audio processing.
- Environment: Docker-based FastAPI container.
- Secret Config: Requires a `YT_COOKIES` secret to maintain stable YouTube access.
- DNS Patching: Includes a custom patch in `app.py` for routing YouTube traffic through hardcoded IPs to bypass cloud network restrictions.
- Full Guide: Detailed technical specs, Docker configuration, and secret management can be found in `deployment_hf/DEPLOYMENT_GUIDE.md`.
- `website/`: The live analytics platform (React + FastAPI).
- `model-code/`: Core research assets (Notebooks, 114k Dataset, and Exported Models).
- `deployment_hf/`: Production Docker deployment container for Hugging Face.
Β© 2026 AxelS27 | Advanced Machine Learning Project