Generative Stock Prediction

Methodology: Adapting an LLM Decoder to Stock Prices

This project keeps the decoder-only Transformer architecture used by LLMs, but replaces the tokenization and the training objective so the model can forecast numeric time series.

Token representation: Instead of text tokens, each timestep is a numeric feature vector (for example the close price alone, or close plus additional OHLC/volume-style channels).
Embedding step: A linear layer projects each timestep vector into d_model, analogous to token embeddings in LLMs.
Positional information: Sinusoidal positional encodings inject temporal order into the sequence.
Causal decoding: Stacked decoder blocks use masked self-attention so each position can only attend to current/past timesteps (no future leakage).
Prediction head: The final projection maps hidden states to numeric output channels, producing next-step price prediction(s) rather than next-word probabilities.
Training objective: The model is trained on sliding windows from historical CSV data with regression loss (MSE), not language-model cross-entropy.
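
Two of the steps above, sinusoidal positional encodings and causal masking, can be illustrated in a few lines. The following is a minimal sketch in plain Python so that it runs without any tensor library; it is not taken from model.py, and the function names (sinusoidal_positions, causal_mask, masked_softmax) are hypothetical. It assumes the standard sinusoidal formulation, where each pair of dimensions shares one frequency.

```python
import math

def sinusoidal_positions(seq_len, d_model):
    """Return a seq_len x d_model table of sinusoidal positional encodings."""
    table = []
    for pos in range(seq_len):
        row = []
        for i in range(d_model):
            # Dimensions 2k and 2k+1 share the frequency 1 / 10000^(2k / d_model).
            angle = pos / (10000 ** ((i - i % 2) / d_model))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        table.append(row)
    return table

def causal_mask(seq_len):
    """mask[i][j] is True when position i may attend to position j (j <= i)."""
    return [[j <= i for j in range(seq_len)] for i in range(seq_len)]

def masked_softmax(scores, mask_row):
    """Softmax over one row of attention scores; masked positions get weight 0."""
    allowed = [s for s, ok in zip(scores, mask_row) if ok]
    peak = max(allowed)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) if ok else 0.0 for s, ok in zip(scores, mask_row)]
    total = sum(exps)
    return [e / total for e in exps]

pe = sinusoidal_positions(seq_len=4, d_model=6)
mask = causal_mask(4)

# Attention weights for timestep 1: only timesteps 0 and 1 receive any weight.
weights = masked_softmax([0.5, 1.0, 2.0, 3.0], mask[1])
```

In this sketch the weights for timesteps 2 and 3 are exactly zero, which is the "no future leakage" property: the prediction at a given timestep cannot depend on later prices.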

This is a personal proof-of-concept (POC) project for experimenting with a decoder-only Transformer on stock time-series data.

The current codebase focuses on:

building sliding-window training samples from CSV price data
training a Transformer to predict the next value (currently close by default)
validating saved checkpoints on a held-out CSV
experimenting with data collection/generation scripts
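
As a rough illustration of the first two items, the sketch below builds sliding-window samples from a CSV column and scores a prediction with MSE. It is an assumption-laden example, not the code in data_ingest.py: the helper names (load_column, sliding_windows, mse), the CSV layout with a date,close header, and the sample prices are all made up for illustration.

```python
import csv
import io

def load_column(csv_file, column="close"):
    """Read one numeric column from an open CSV file that has a header row."""
    return [float(row[column]) for row in csv.DictReader(csv_file)]

def sliding_windows(values, window):
    """Pair each run of `window` consecutive values with the value that follows it."""
    return [
        (values[i:i + window], values[i + window])
        for i in range(len(values) - window)
    ]

def mse(predictions, targets):
    """Mean squared error between two equal-length sequences."""
    return sum((p - t) ** 2 for p, t in zip(predictions, targets)) / len(targets)

# Tiny in-memory CSV standing in for a real price file (made-up numbers).
sample_csv = io.StringIO(
    "date,close\nd1,10.0\nd2,11.0\nd3,12.0\nd4,13.0\nd5,14.0\n"
)
closes = load_column(sample_csv)
samples = sliding_windows(closes, window=3)

# Naive baseline in place of the Transformer: repeat the last close in each window.
baseline = [inputs[-1] for inputs, _ in samples]
targets = [target for _, target in samples]
loss = mse(baseline, targets)
```

A "repeat the last value" baseline like this one is a useful sanity check: a trained model that cannot beat it on the held-out CSV is probably not learning anything beyond persistence.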

Project Structure

.
├── main.py                         # training/validation entrypoint
├── model.py                        # decoder-only Transformer implementation
├── data_ingest.py                  # sliding-window Dataset for CSV files
├── requirements.txt                # minimal dependencies list (currently incomplete)
├── test.py                         # scratch inference snippet
└── data_collection/
    ├── data_collection_fake.py     # generates dummy CSV stock data
    ├── datal_collection_tv.py      # TradingView websocket collector (experimental)
    ├── data_collection_finnhub.py  # TODO scaffold
    ├── data_collection_twelve_data.py # TODO scaffold
    └── data_collection_kaggle.py   # TODO scaffold
