A complete GPT transformer built from scratch in pure JavaScript. No frameworks. No libraries. No dependencies. Just math.
Train it on any text, watch it learn character by character in a live dashboard, then generate text interactively. The architecture is identical to GPT-2/3/4 at toy scale: multi-head causal attention, pre-norm residual blocks, GELU activations, AdamW optimizer. Everything a real transformer does — visible and hackable.
- Quick Start
- Usage
- What You Get
- Architecture
- How It Works
- WebGPU Acceleration
- Zero Dependencies
- Files
- Building from Source
- Performance
- Contributing
- License
## Quick Start

```bash
npx tinygpt
```

Opens a live training dashboard at http://localhost:8080. Training takes 2–5 minutes on a modern CPU.
## Usage

```bash
npx tinygpt                  # Train on Shakespeare, open web dashboard
npx tinygpt mytext.txt       # Train on your own text
npx tinygpt --cli            # Train in terminal (no server)
npx tinygpt --cli mytext.txt # CLI mode with custom data
npx tinygpt --port=3000      # Custom port for web dashboard
```

## What You Get

Live training dashboard with real-time updates via SSE:
- 📈 Loss curve that drops as the model learns
- ✨ Generated samples updating every 10 steps so you can watch coherence emerge
- 🧠 Architecture panel with tooltips explaining every hyperparameter
- 🔤 Vocabulary display showing the character set the model learned
After training completes:
- 🎛️ Interactive prompt input with adjustable temperature (0.1 – 2.0)
- 🔥 Attention heatmaps per layer and head, showing which characters attend to which
- 📊 Next-character probability bars visualizing the model's confidence
- 🎯 Embedding scatter plot (PCA) showing how characters cluster in learned space
- 🌡️ Side-by-side final samples at different temperatures
## Architecture

The model is a 4-layer decoder-only transformer (~214K parameters):
| Component | Size |
|---|---|
| Embedding dimension | 64 |
| Attention heads | 4 (16-dim each) |
| Transformer layers | 4 |
| Context window | 128 characters |
| FFN intermediate | 256 |
| Vocabulary | Dynamic (unique chars in training text) |
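The ~214K figure can be sanity-checked from the table. The sketch below assumes a Shakespeare-sized vocabulary of about 65 unique characters and biases on every projection; since the vocabulary is dynamic, the exact count shifts with the training text:

```javascript
// Back-of-envelope parameter count for the table above. V ≈ 65 is an
// assumption (Shakespeare-style char vocab); the real V is dynamic.
const V = 65, d = 64, nLayers = 4, dFFN = 256, ctx = 128;

const embeddings = V * d + ctx * d;                    // token + position tables
const attn       = 4 * (d * d + d);                    // Q, K, V, O projections with bias
const ffn        = (d * dFFN + dFFN) + (dFFN * d + d); // two linear layers with bias
const layerNorms = 2 * 2 * d;                          // two LayerNorms (gain + bias)
const perLayer   = attn + ffn + layerNorms;
const head       = 2 * d + (d * V + V);                // final LayerNorm + output projection

const total = embeddings + nLayers * perLayer + head;
console.log(total); // ~217K under these assumptions; vocab/bias choices shift it by a few K
```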
Each layer:

```
LayerNorm → Multi-Head Causal Attention → Residual
→ LayerNorm → FFN (GELU) → Residual
```
Training: AdamW optimizer, batch size 32, learning rate 3e-4, 1000 steps, cross-entropy loss.
This is the same architecture as GPT-2/3/4. The only difference is scale.
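The two residual sub-layers can be sketched in plain JavaScript. This is a simplified single-head version with the Q/K/V/O projections omitted and the FFN collapsed to an elementwise GELU; the function names are illustrative, not the repo's API:

```javascript
const gelu = x => 0.5 * x * (1 + Math.tanh(Math.sqrt(2 / Math.PI) * (x + 0.044715 * x ** 3)));

function layerNorm(v) {
  const mean = v.reduce((a, b) => a + b, 0) / v.length;
  const varc = v.reduce((a, b) => a + (b - mean) ** 2, 0) / v.length;
  return v.map(x => (x - mean) / Math.sqrt(varc + 1e-5)); // gain=1, bias=0 for brevity
}

// Causal single-head self-attention over a list of token vectors.
// Q = K = V = x here, i.e. learned projections are omitted to keep it short.
function causalAttention(xs) {
  const d = xs[0].length;
  return xs.map((q, i) => {
    // score only against positions <= i: the causal mask
    const scores = xs.slice(0, i + 1).map(k =>
      q.reduce((s, qj, j) => s + qj * k[j], 0) / Math.sqrt(d));
    const m = Math.max(...scores);
    const exps = scores.map(s => Math.exp(s - m));
    const z = exps.reduce((a, b) => a + b, 0);
    const w = exps.map(e => e / z);            // softmax over visible positions
    return q.map((_, j) =>                     // weighted sum of values
      w.reduce((s, wi, t) => s + wi * xs[t][j], 0));
  });
}

function block(xs) {
  // x + Attn(LN(x)), then + FFN(LN(x)): the two pre-norm residual sub-layers
  const attnOut = causalAttention(xs.map(layerNorm));
  const afterAttn = xs.map((x, i) => x.map((v, j) => v + attnOut[i][j]));
  const ffnOut = afterAttn.map(x => layerNorm(x).map(gelu)); // real FFN has two matrices
  return afterAttn.map((x, i) => x.map((v, j) => v + ffnOut[i][j]));
}
```

The causal mask is what makes the block a language model: position `i` can only mix information from positions `0..i`, so changing later tokens never changes earlier outputs.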
## How It Works

```
characters → token embeddings + position embeddings
→ 4 × transformer block
→ final LayerNorm
→ output projection
→ softmax → loss / sample
```
Forward pass: characters → token embeddings + position embeddings → 4 transformer blocks → logits → softmax → loss.
Backward pass: full reverse-mode backpropagation through every layer, including the attention softmax Jacobian.
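Backpropagating through the softmax never requires materializing the full Jacobian. A sketch of the standard identity, with illustrative names:

```javascript
function softmax(xs) {
  const m = Math.max(...xs);                      // subtract max for stability
  const exps = xs.map(x => Math.exp(x - m));
  const z = exps.reduce((a, b) => a + b, 0);
  return exps.map(e => e / z);
}

// The softmax Jacobian is J[i][j] = p[i] * (delta(i,j) - p[j]), so the
// vector-Jacobian product collapses to: dIn[i] = p[i] * (dOut[i] - <dOut, p>)
function softmaxBackward(p, dOut) {
  const dot = p.reduce((s, pi, i) => s + pi * dOut[i], 0);
  return p.map((pi, i) => pi * (dOut[i] - dot));
}
```

The same trick applies per attention row, which is why the backward pass stays O(n) in the row length rather than O(n²).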
Generation: autoregressive sampling with temperature scaling and a sliding context window.
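Temperature sampling in miniature. The function name is illustrative, and `model` in the trailing comment stands in for the forward pass:

```javascript
// Divide logits by T, softmax, then draw one index from the distribution.
// T < 1 sharpens the distribution (more conservative), T > 1 flattens it.
function sampleNext(logits, temperature = 1.0) {
  const scaled = logits.map(l => l / temperature);
  const m = Math.max(...scaled);
  const exps = scaled.map(l => Math.exp(l - m));
  const z = exps.reduce((a, b) => a + b, 0);
  let r = Math.random() * z;                     // inverse-CDF sampling
  for (let i = 0; i < exps.length; i++) {
    r -= exps[i];
    if (r <= 0) return i;
  }
  return exps.length - 1;
}

// Hypothetical autoregressive loop with a sliding 128-char window:
// while (out.length < n) out.push(sampleNext(model(ctx.slice(-128)), T));
```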
Training runs in a worker thread so the dashboard stays responsive. The main thread serves the UI and streams training updates over Server-Sent Events.
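Server-Sent Events are just a framed text stream over a kept-open HTTP response. A sketch of how a training update might be serialized; the field names are illustrative, not necessarily gpt.mjs's actual message shape:

```javascript
// Each SSE message is the text "data: <payload>\n\n" written to the response.
function sseFrame(update) {
  return `data: ${JSON.stringify(update)}\n\n`;
}

// Hypothetical wiring: the worker posts progress, the main thread fans it out.
// worker.on('message', update => {
//   for (const res of sseClients) res.write(sseFrame(update));
// });
```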
The code is thoroughly commented. Every function explains what it does and why. The WGSL shaders have descriptive variable names and ELI5 explanations.
## WebGPU Acceleration

The web dashboard ships with 18 hand-written WGSL compute shaders for GPU-accelerated inference:
- Matrix multiply, GELU, LayerNorm, attention scores, softmax, cross-entropy
- Falls back to CPU automatically if WebGPU isn't available
- Training always runs on CPU (worker thread); GPU is used for post-training generation
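The fallback check is small. `navigator.gpu` and `requestAdapter()` are real WebGPU API surface; the backend labels here are illustrative:

```javascript
// Returns which backend to use. In Node, or in any browser without WebGPU,
// navigator.gpu is absent and we fall back to the CPU path.
async function pickBackend() {
  if (typeof navigator === 'undefined' || !navigator.gpu) return 'cpu';
  const adapter = await navigator.gpu.requestAdapter(); // null when no usable GPU
  return adapter ? 'webgpu' : 'cpu';
}
```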
## Zero Dependencies

This is a load-bearing claim. The runtime ships in two files (gpt.mjs + index.html) and pulls in nothing at install time:

```bash
$ npm ls --prod
tinygpt@1.0.0
└── (empty)
```

CI enforces this on every push — if `dependencies` ever becomes non-empty, the build fails.
devDependencies (Playwright, used for headless integration tests) are only installed when you clone for development. End users running npx tinygpt get nothing but the two files.
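The CI gate amounts to one invariant. A hypothetical version of the check (the actual workflow script may differ; in CI it would be fed `JSON.parse(readFileSync('package.json', 'utf8'))`):

```javascript
// Fails loudly if package.json declares any runtime dependencies.
function assertZeroDeps(pkg) {
  const deps = Object.keys(pkg.dependencies ?? {});
  if (deps.length > 0) {
    throw new Error(`Runtime dependencies found: ${deps.join(', ')}`);
  }
}
```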
## Files

| File | What |
|---|---|
| `gpt.mjs` | Node.js entry point — training loop, HTTP server, SSE, worker thread, CLI mode |
| `index.html` | Web dashboard — loss chart, visualizations, WebGPU shaders, interactive generation |

Two files. That's it.
## Building from Source

There's no build step. It's vanilla JavaScript.

```bash
git clone https://github.com/ellyseum/tinygpt.git
cd tinygpt
node gpt.mjs
```

Requires Node.js 18+ (uses worker threads, ESM).
## Performance

| Mode | Hardware | Approximate time |
|---|---|---|
| Training (1000 steps) | Modern CPU (8-core) | 2–5 min |
| Inference (CPU) | Modern CPU | ~50ms / token |
| Inference (WebGPU) | Modern discrete GPU | ~5ms / token |
The CPU↔GPU gap grows with vocabulary size and context length. For a ~214K-param model with a 128-char window, the GPU win is real but not dramatic — the point of WebGPU here is to demonstrate that the same forward pass is implementable in WGSL, not to chase records.
## Contributing

PRs welcome. The bar:

- Zero runtime dependencies — CI enforces this.
- Two-file runtime — keep `gpt.mjs` + `index.html` as the deliverable.
- Comment the math — if you add a layer or shader, explain why it works.
See CONTRIBUTING.md for the full checklist.
## License

MIT © 2026 Ellyseum
