This project keeps the core decoder-only Transformer idea from traditional LLMs, but changes the tokenization and objective for numeric time-series forecasting.
- Token representation: Instead of text tokens, each timestep is a numeric feature vector (for example `close`, with optional OHLC/volume-style channels).
- Embedding step: A linear layer projects each timestep vector into `d_model`, analogous to token embeddings in LLMs.
- Positional information: Sinusoidal positional encodings inject temporal order into the sequence.
- Causal decoding: Stacked decoder blocks use masked self-attention so each position can only attend to current/past timesteps (no future leakage).
- Prediction head: The final projection maps hidden states to numeric output channels, producing next-step price prediction(s) rather than next-word probabilities.
- Training objective: The model is trained on sliding windows from historical CSV data with regression loss (MSE), not language-model cross-entropy.
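
To make the positional-encoding and causal-masking bullets concrete, here is a minimal, framework-free sketch in plain Python. It is not taken from `model.py`: the function names `sinusoidal_encoding`, `causal_mask`, and `masked_softmax` are hypothetical, and the encoding uses the standard sine/cosine formulation with base 10000, which is an assumption about what the project does.

```python
import math


def sinusoidal_encoding(seq_len, d_model):
    """Return a seq_len x d_model table of sinusoidal positional encodings."""
    table = [[0.0] * d_model for _ in range(seq_len)]
    for pos in range(seq_len):
        for i in range(0, d_model, 2):
            # Even dimensions use sine, the following odd dimension uses cosine.
            angle = pos / (10000 ** (i / d_model))
            table[pos][i] = math.sin(angle)
            if i + 1 < d_model:
                table[pos][i + 1] = math.cos(angle)
    return table


def causal_mask(seq_len):
    """mask[q][k] is True when query timestep q may attend to key timestep k."""
    return [[k <= q for k in range(seq_len)] for q in range(seq_len)]


def masked_softmax(scores, mask):
    """Row-wise softmax over attention scores, giving zero weight to masked keys."""
    weights = []
    for row, allowed in zip(scores, mask):
        # Subtract the largest visible score for numerical stability.
        peak = max(s for s, ok in zip(row, allowed) if ok)
        exps = [math.exp(s - peak) if ok else 0.0 for s, ok in zip(row, allowed)]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights


mask = causal_mask(3)
uniform_scores = [[0.0, 0.0, 0.0] for _ in range(3)]
attention = masked_softmax(uniform_scores, mask)
for row in attention:
    print(row)
```

With uniform scores, the first timestep puts all of its attention weight on itself, and every later timestep spreads its weight evenly over itself and the timesteps before it. Future timesteps always receive zero weight, which is the "no future leakage" property described above.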
Personal POC project for experimenting with a decoder-only Transformer on stock time-series data.
The current codebase focuses on:
- building sliding-window training samples from CSV price data
- training a Transformer to predict the next value (currently `close` by default)
- validating saved checkpoints on a held-out CSV
- experimenting with data collection/generation scripts
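
As an illustration of the sliding-window and next-value ideas in the list above, the following is a small standard-library sketch. It does not reproduce `data_ingest.py`: the helper names (`load_column`, `sliding_windows`, `mse`), the `date` column, the window length of 3, and the sample rows are all made up for the example. Only the `close` column name comes from this README.

```python
import csv
import io


def load_column(csv_text, column="close"):
    """Read one numeric column from CSV text into a list of floats."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [float(row[column]) for row in reader]


def sliding_windows(values, window):
    """Pair each run of `window` consecutive values with the value that follows it."""
    return [
        (values[i:i + window], values[i + window])
        for i in range(len(values) - window)
    ]


def mse(predictions, targets):
    """Mean squared error between two equal-length sequences."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)


# Hypothetical sample data, for illustration only.
SAMPLE_CSV = """date,close
2024-01-01,10.0
2024-01-02,11.0
2024-01-03,12.0
2024-01-04,13.0
2024-01-05,14.0
"""

closes = load_column(SAMPLE_CSV)
samples = sliding_windows(closes, window=3)

# Naive baseline: predict that the next value repeats the last value in the window.
baseline = [inputs[-1] for inputs, _ in samples]
targets = [target for _, target in samples]

print(samples)
print(mse(baseline, targets))
```

Five rows with a window of 3 yield two (input window, next value) pairs. A "repeat the last value" baseline like this one is a quick reference point to compare a trained model's MSE against.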
.
├── main.py # training/validation entrypoint
├── model.py # decoder-only Transformer implementation
├── data_ingest.py # sliding-window Dataset for CSV files
├── requirements.txt # minimal dependencies list (currently incomplete)
├── test.py # scratch inference snippet
└── data_collection/
    ├── data_collection_fake.py # generates dummy CSV stock data
    ├── datal_collection_tv.py # TradingView websocket collector (experimental)
    ├── data_collection_finnhub.py # TODO scaffold
    ├── data_collection_twelve_data.py # TODO scaffold
    └── data_collection_kaggle.py # TODO scaffold