Kernel Forge automatically generates and optimizes GPU kernels for PyTorch models; no kernel programming expertise is required. It profiles your model at the operator level, uses an LLM to write a correct kernel, then searches for performance improvements with Monte Carlo Tree Search (MCTS) until the kernel beats PyTorch's baseline.
- ML engineers running models in production who want lower inference latency on specific hardware without writing CUDA or Triton by hand.
- AI infrastructure teams targeting specific GPU hardware (NVIDIA CUDA or AMD ROCm) who need kernels tuned to that exact device.
- Teams with remote GPU access who run optimization on a separate GPU server while managing projects locally.
- Researchers benchmarking operator-level speedups across different LLM backends or optimization strategies.
- Teams packaging models for deployment who want a self-contained inference artifact with kernels baked in and no runtime dependency on KernelForge.
- Automated kernel generation via LLM with compile-error feedback loop
- MCTS-driven optimization - explores tiling, loop unrolling, vectorized memory access, and more
- CUDA and Triton backends (NVIDIA and AMD ROCm)
- Remote execution over SSH - no local GPU required
- Multi-LLM support: Anthropic, OpenAI, Google
- Web dashboard with live progress, speed charts, and MCTS tree inspector
- Portable `.anvil` snapshots and self-contained `.cast` inference packages
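To give a feel for the MCTS-driven search, here is a toy sketch in plain Python. The "transformations" (tile sizes and unroll factors) and the `benchmark` cost model are illustrative placeholders, not KernelForge's actual search space or timing harness:

```python
import math
import random

# Toy search space: pick a tile size, then an unroll factor (illustrative only).
TILES = [16, 32, 64, 128]
UNROLLS = [1, 2, 4]

def benchmark(tile, unroll):
    """Stand-in for timing a generated kernel: lower is better.
    Pretends (tile=64, unroll=4) is the sweet spot, plus noise."""
    return abs(tile - 64) / 64 + abs(unroll - 4) / 4 + random.random() * 0.1

class Node:
    def __init__(self, choices):
        self.choices = choices   # untried child actions
        self.children = {}       # action -> Node
        self.visits = 0
        self.value = 0.0         # running mean reward

def uct_select(node, c=1.4):
    # Standard UCT: mean reward plus an exploration bonus.
    return max(node.children.items(),
               key=lambda kv: kv[1].value
               + c * math.sqrt(math.log(node.visits) / kv[1].visits))

def search(iterations=200):
    root = Node(list(TILES))
    for _ in range(iterations):
        # Selection/expansion: walk two levels (tile, then unroll).
        path, node, actions = [root], root, []
        for level in (TILES, UNROLLS):
            if node.choices:                       # expand an untried action
                a = node.choices.pop()
                child = Node(list(UNROLLS) if level is TILES else [])
                node.children[a] = child
            else:                                  # select among children via UCT
                a, child = uct_select(node)
            actions.append(a)
            path.append(child)
            node = child
        # Simulation: reward is negative cost, so higher is better.
        reward = -benchmark(*actions)
        # Backpropagation: update running mean reward along the path.
        for n in path:
            n.visits += 1
            n.value += (reward - n.value) / n.visits
    best_tile, tile_node = max(root.children.items(), key=lambda kv: kv[1].visits)
    best_unroll, _ = max(tile_node.children.items(), key=lambda kv: kv[1].visits)
    return best_tile, best_unroll
```

With enough iterations the search typically concentrates visits on the lowest-cost configuration; the real system applies the same select/expand/simulate/backpropagate loop to code transformations measured on actual hardware.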
See system requirements before installing.
```shell
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```
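Before going further, you may want to sanity-check which GPU stack is visible on the machine. This is an illustrative helper (not part of KernelForge) that only checks for the vendor CLI tools on `PATH`:

```python
import shutil

def detect_gpu_toolchain():
    """Report which GPU stack appears available on this machine.

    Illustrative only: looks for nvidia-smi (CUDA) or rocm-smi (ROCm)
    on PATH rather than probing the driver directly.
    """
    if shutil.which("nvidia-smi"):
        return "cuda"
    if shutil.which("rocm-smi"):
        return "rocm"
    return "none"

print(detect_gpu_toolchain())
```

Printing `none` is fine if you plan to run optimization on a remote GPU server over SSH rather than locally.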
```shell
cd frontend
jac install
```

Configure your LLM key in the settings panel after starting, or set `ANTHROPIC_API_KEY`, `OPENAI_API_KEY`, or `GOOGLE_API_KEY` before launch.
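For example, to launch with a key already in the environment (the value below is a placeholder):

```shell
export ANTHROPIC_API_KEY="your-key-here"   # or OPENAI_API_KEY / GOOGLE_API_KEY
```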
```shell
jac start main.jac
```

Open http://localhost:8000. Create a project, upload your model weights, and click Start Forge.
For headless or scripted runs, see docs/cli.md.