
TinyGPT

A complete GPT transformer built from scratch in pure JavaScript. No frameworks. No libraries. No dependencies. Just math.


Train it on any text, watch it learn character by character in a live dashboard, then generate text interactively. The architecture is identical to GPT-2/3/4 at toy scale: multi-head causal attention, pre-norm residual blocks, GELU activations, AdamW optimizer. Everything a real transformer does — visible and hackable.

[Screenshot: TinyGPT training dashboard]



Quick Start

npx tinygpt

Opens a live training dashboard at http://localhost:8080. Training takes 2–5 minutes on a modern CPU.

Usage

npx tinygpt                     # Train on Shakespeare, open web dashboard
npx tinygpt mytext.txt          # Train on your own text
npx tinygpt --cli               # Train in terminal (no server)
npx tinygpt --cli mytext.txt    # CLI mode with custom data
npx tinygpt --port=3000         # Custom port for web dashboard

What You Get

Live training dashboard with real-time updates via Server-Sent Events (SSE):

  • 📈 Loss curve that drops as the model learns
  • 📝 Generated samples updating every 10 steps so you can watch coherence emerge
  • 🧠 Architecture panel with tooltips explaining every hyperparameter
  • 🔤 Vocabulary display showing the character set the model learned

After training completes:

  • 🎛️ Interactive prompt input with adjustable temperature (0.1–2.0)
  • 🔥 Attention heatmaps per layer and head, showing which characters attend to which
  • 📊 Next-character probability bars visualizing the model's confidence
  • 🎯 Embedding scatter plot (PCA) showing how characters cluster in learned space
  • 🌡️ Side-by-side final samples at different temperatures

Architecture

The model is a 4-layer decoder-only transformer (~214K parameters):

Component            Size
-------------------  ------------------------------------------------
Embedding dimension  64
Attention heads      4 (16-dim each)
Transformer layers   4
Context window       128 characters
FFN intermediate     256
Vocabulary size      dynamic (unique characters in the training text)
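
As a sanity check on the headline count, here is a rough budget in code (assumes a ~65-character Shakespeare vocabulary and biased linear layers; the true number depends on the training text and on gpt.mjs's exact implementation):

const V = 65, d = 64, ctx = 128, f = 256, layers = 4; // V is an assumed vocab size
const embed = V * d + ctx * d;                  // token + position embeddings: 12,352
const attn  = 4 * (d * d + d);                  // Q, K, V, output projections: 16,640
const mlp   = (d * f + f) + (f * d + d);        // two FFN linears: 33,088
const norms = 2 * 2 * d;                        // two LayerNorms per block: 256
const head  = d * V + V + 2 * d;                // output projection + final LayerNorm: 4,353
console.log(embed + layers * (attn + mlp + norms) + head); // 216,641, same ballpark as ~214K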

Each layer:

LayerNorm → Multi-Head Causal Attention → Residual
        → LayerNorm → FFN (GELU) → Residual
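
In code, one block reduces to two residual updates (a sketch; layerNorm, attention, ffn, and add stand in for the commented implementations in gpt.mjs):

function transformerBlock(x) {
  x = add(x, attention(layerNorm(x))); // pre-norm multi-head causal attention + residual
  x = add(x, ffn(layerNorm(x)));       // pre-norm GELU feed-forward + residual
  return x;
}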

Training: AdamW optimizer, batch size 32, learning rate 3e-4, 1000 steps, cross-entropy loss.
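
For reference, one AdamW update for a single scalar parameter looks like this (textbook formulation; the moment decay rates and weight decay here are conventional defaults, not values confirmed from gpt.mjs):

function adamwStep(p, g, m, v, t, lr = 3e-4, b1 = 0.9, b2 = 0.999, eps = 1e-8, wd = 0.01) {
  m = b1 * m + (1 - b1) * g;                    // first moment: EMA of gradients
  v = b2 * v + (1 - b2) * g * g;                // second moment: EMA of squared gradients
  const mHat = m / (1 - Math.pow(b1, t));       // bias correction, t is the 1-indexed step
  const vHat = v / (1 - Math.pow(b2, t));
  p -= lr * (mHat / (Math.sqrt(vHat) + eps) + wd * p); // decoupled weight decay (the "W")
  return { p, m, v };
}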

This is the same architecture as GPT-2/3/4. The only difference is scale.

How It Works

Forward pass:

characters → token embeddings + position embeddings
          → 4 × transformer block
          → final LayerNorm
          → output projection
          → softmax → loss / sample

Backward pass: full reverse-mode backpropagation through every layer, including the attention softmax Jacobian.
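
That Jacobian is small enough to show whole. For y = softmax(x) with upstream gradient dy, the input gradient is dx[i] = y[i] * (dy[i] - Σj y[j] * dy[j]) (a minimal sketch, not necessarily gpt.mjs's exact code):

function softmaxBackward(y, dy) {
  const dot = y.reduce((acc, yi, i) => acc + yi * dy[i], 0); // Σj y[j] * dy[j]
  return y.map((yi, i) => yi * (dy[i] - dot));
}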

Generation: autoregressive sampling with temperature scaling and a sliding context window.
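
That loop fits in a few lines (a sketch; model.forward, encode, and decode are assumed stand-ins for the real gpt.mjs internals):

function softmax(logits, temperature) {
  const scaled = logits.map(l => l / temperature); // temperature < 1 sharpens, > 1 flattens
  const m = Math.max(...scaled);                   // subtract max for numerical stability
  const exps = scaled.map(x => Math.exp(x - m));
  const sum = exps.reduce((a, b) => a + b, 0);
  return exps.map(e => e / sum);
}

function sample(probs) {                           // draw an index from a distribution
  let r = Math.random();
  for (let i = 0; i < probs.length; i++) {
    r -= probs[i];
    if (r <= 0) return i;
  }
  return probs.length - 1;                         // guard against rounding error
}

function generate(model, prompt, steps, temperature = 1.0) {
  const ids = encode(prompt);
  for (let i = 0; i < steps; i++) {
    const ctx = ids.slice(-128);                   // sliding 128-character context window
    const logits = model.forward(ctx);             // next-character logits
    ids.push(sample(softmax(logits, temperature)));
  }
  return decode(ids);
}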

Training runs in a worker thread so the dashboard stays responsive. The main thread serves the UI and streams training updates over Server-Sent Events.
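
The streaming half of that is plain Node (a sketch using only the built-in http module; the route and payload shape are illustrative, not gpt.mjs's actual ones):

import http from 'node:http';

http.createServer((req, res) => {
  if (req.url !== '/events') { res.end(); return; }
  res.writeHead(200, {
    'Content-Type': 'text/event-stream',           // the SSE content type
    'Cache-Control': 'no-cache',
    'Connection': 'keep-alive',
  });
  // In gpt.mjs, updates posted by the training worker would be forwarded here.
  const timer = setInterval(() => {
    res.write('data: ' + JSON.stringify({ step: 1, loss: 2.3 }) + '\n\n');
  }, 1000);
  req.on('close', () => clearInterval(timer));
}).listen(8080);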

The code is thoroughly commented. Every function explains what it does and why. The WGSL shaders have descriptive variable names and ELI5 explanations.

WebGPU Acceleration

The web dashboard ships with 18 hand-written WGSL compute shaders for GPU-accelerated inference:

  • Matrix multiply, GELU, LayerNorm, attention scores, softmax, cross-entropy
  • Falls back to CPU automatically if WebGPU isn't available (detection sketched below)
  • Training always runs on CPU (worker thread); GPU is used for post-training generation
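
Detection follows the standard WebGPU handshake (a minimal sketch; cpuForward and gpuForward are hypothetical stand-ins for the dashboard's two code paths):

async function pickBackend() {
  if (!navigator.gpu) return cpuForward;           // browser exposes no WebGPU at all
  const adapter = await navigator.gpu.requestAdapter();
  if (!adapter) return cpuForward;                 // no usable GPU adapter found
  const device = await adapter.requestDevice();
  return (tokens) => gpuForward(device, tokens);   // GPU-accelerated inference path
}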

Zero Dependencies

This is a load-bearing claim. The runtime ships in two files (gpt.mjs + index.html) and pulls nothing at install time:

$ npm ls --prod
tinygpt@1.0.0
└── (empty)

CI enforces this on every push — if the dependencies field in package.json ever becomes non-empty, the build fails.
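
A check of that shape fits in a few lines (illustrative; the real enforcement lives in the repo's CI workflow):

import { readFileSync } from 'node:fs';

const pkg = JSON.parse(readFileSync('package.json', 'utf8'));
if (Object.keys(pkg.dependencies ?? {}).length > 0) {
  console.error('zero-dependency invariant violated');
  process.exit(1);                                 // fail the build
}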

devDependencies (Playwright, used for headless integration tests) are only installed when you clone for development. End users running npx tinygpt get nothing but the two files.

Files

File        What
----------  ------------------------------------------------------------------------------
gpt.mjs     Node.js entry point — training loop, HTTP server, SSE, worker thread, CLI mode
index.html  Web dashboard — loss chart, visualizations, WebGPU shaders, interactive generation

Two files. That's it.

Building from Source

There's no build step. It's vanilla JavaScript.

git clone https://github.com/ellyseum/tinygpt.git
cd tinygpt
node gpt.mjs

Requires Node.js 18+ (uses worker threads, ESM).

Performance

Mode                   Hardware             Approximate time
---------------------  -------------------  ----------------
Training (1000 steps)  Modern CPU (8-core)  2–5 min
Inference (CPU)        Modern CPU           ~50 ms / token
Inference (WebGPU)     Modern discrete GPU  ~5 ms / token

The CPU↔GPU gap grows with vocabulary size and context length. For a ~214K-param model with a 128-char window, the GPU win is real but not dramatic — the point of WebGPU here is to demonstrate that the same forward pass is implementable in WGSL, not to chase records.

Contributing

PRs welcome. The bar:

  1. Zero runtime dependencies — CI enforces this.
  2. Two-file runtime — keep gpt.mjs + index.html as the deliverable.
  3. Comment the math — if you add a layer or shader, explain why it works.

See CONTRIBUTING.md for the full checklist.

License

MIT © 2026 Ellyseum
