
bcd532/bitc


bitc

Plain C inference for PrismML's 1-bit Bonsai Qwen3-based GGUF models. Reads Q1_0_g128 quantized weights and runs the forward pass. No dependencies beyond libc and libm.
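As a rough illustration of what reading a GGUF file starts with, here is a minimal sketch of parsing the fixed-size header. It assumes the GGUF version 2/3 layout (4-byte magic `GGUF`, a 32-bit version, then 64-bit tensor and metadata counts, all little-endian). The names `gguf_header_sketch` and `gguf_parse_header_sketch` are hypothetical and are not taken from this repo's `gguf.c`.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical struct: the fixed 24-byte prefix of a GGUF v2/v3 file. */
typedef struct {
    uint32_t version;
    uint64_t tensor_count;
    uint64_t metadata_kv_count;
} gguf_header_sketch;

/* Decode little-endian integers byte by byte so host endianness does not matter. */
static uint32_t read_le32(const unsigned char *p) {
    uint32_t v = 0;
    for (int i = 3; i >= 0; i--) v = (v << 8) | p[i];
    return v;
}

static uint64_t read_le64(const unsigned char *p) {
    uint64_t v = 0;
    for (int i = 7; i >= 0; i--) v = (v << 8) | p[i];
    return v;
}

/* Parse the header from a buffer. Returns 0 on success, -1 on a short
 * buffer or a wrong magic. Metadata and tensor infos follow the header
 * and are not handled here. */
int gguf_parse_header_sketch(const unsigned char *buf, size_t len,
                             gguf_header_sketch *out) {
    if (len < 24) return -1;
    if (memcmp(buf, "GGUF", 4) != 0) return -1;
    out->version = read_le32(buf + 4);
    out->tensor_count = read_le64(buf + 8);
    out->metadata_kv_count = read_le64(buf + 16);
    return 0;
}
```

A real parser would go on to read `metadata_kv_count` key/value pairs and `tensor_count` tensor descriptors, which is where the model hyperparameters and weight offsets live.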

What it do?

Nothing here is new. This is just a from-scratch C implementation written to learn how inference works. The architecture is Qwen3, the file format is GGUF, the quantization scheme is Q1_0_g128, and RoPE uses YaRN scaling. All of this already exists in llama.cpp, which is the source of truth. If you want a real engine, use that.
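To give a feel for 1-bit weights with a group size of 128, here is a minimal sketch of dequantizing one group and of taking a dot product against it. Everything about the layout is an assumption made for illustration: one scale per 128 weights, one sign bit per weight packed LSB-first, bit 1 meaning `+scale` and bit 0 meaning `-scale`, and a plain `float` scale for simplicity. The actual on-disk Q1_0_g128 block layout may differ, so check llama.cpp for the real definition. The names `q1_group_sketch`, `q1_dequantize_group` and `q1_dot_group` are hypothetical.

```c
#include <stdint.h>

#define GROUP_SIZE 128

/* Hypothetical in-memory group: one shared scale plus 128 sign bits. */
typedef struct {
    float scale;                    /* assumed float here for simplicity */
    uint8_t bits[GROUP_SIZE / 8];   /* assumed LSB-first packing */
} q1_group_sketch;

/* Expand one group into 128 floats, each either +scale or -scale. */
void q1_dequantize_group(const q1_group_sketch *g, float *out) {
    for (int i = 0; i < GROUP_SIZE; i++) {
        int bit = (g->bits[i / 8] >> (i % 8)) & 1;
        out[i] = bit ? g->scale : -g->scale;
    }
}

/* Dot product of one group with 128 activations. The signs only add or
 * subtract activations, so the scale is applied once at the end. */
float q1_dot_group(const q1_group_sketch *g, const float *x) {
    float acc = 0.0f;
    for (int i = 0; i < GROUP_SIZE; i++) {
        int bit = (g->bits[i / 8] >> (i % 8)) & 1;
        acc += bit ? x[i] : -x[i];
    }
    return acc * g->scale;
}
```

Under these assumptions the inner loop needs no multiplications per weight, which is the main appeal of 1-bit weights in a matrix-vector product.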

Build!

gcc -O3 -o bitc main.c gguf.c model.c inference.c tokenizer.c -lm

Run!

Drop a Bonsai-1.7B.gguf into models/ and run:

./bitc

It will prompt for input and stream a reply.

What's next?

I plan to learn more about how inference works and to figure out other weird things I can try.

Files!

  • gguf.{c,h}: GGUF parser
  • model.{c,h}: Qwen3 weight loading
  • inference.{c,h}: forward pass, RoPE, attention, SwiGLU, sampling
  • tokenizer.{c,h}: BPE merges + byte-level encoding
  • main.c: chat loop with the <|im_start|>/<|im_end|> template
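As an illustration of the `<|im_start|>`/`<|im_end|>` template mentioned in the list above, here is a minimal sketch of wrapping a single user message before tokenization. It assumes one user turn with no system message and leaves the assistant turn open for the model to complete. The function name `build_chat_prompt` is hypothetical, and the actual `main.c` may assemble the prompt differently.

```c
#include <stddef.h>
#include <stdio.h>

/* Write a single-turn chat prompt into dst. Returns the number of
 * characters written (excluding the terminating NUL), or -1 if the
 * prompt does not fit in cap bytes. */
int build_chat_prompt(char *dst, size_t cap, const char *user_msg) {
    int n = snprintf(dst, cap,
                     "<|im_start|>user\n%s<|im_end|>\n<|im_start|>assistant\n",
                     user_msg);
    if (n < 0 || (size_t)n >= cap) return -1;
    return n;
}
```

The resulting string would then go through the tokenizer, where `<|im_start|>` and `<|im_end|>` need to map to their special token IDs rather than being split by the BPE merges.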
