AtomicBot-ai / atomic-llama-cpp-turboquant Public

forked from TheTom/llama-cpp-turboquant

Fork 31
Star 244

Pull requests: AtomicBot-ai/atomic-llama-cpp-turboquant

13 Open 11 Closed

Pull requests list

gemma4_assistant: protect n_layer_kv_from_start against shared_kv_layers == n_layer

#24 opened Jun 4, 2026 by PhilEgly

[codex] Add a reserve-only Gemma4 MTP reserve context

Labels: documentation, examples, server

#23 opened Jun 1, 2026 by nycdubliner • Draft

[codex] Add speculative draft Prometheus metrics

Labels: examples, server

#22 opened May 31, 2026 by nycdubliner • Draft

docs(turbo-quant): TURBO_LAYER_ADAPTIVE mode 7 validation on Pi16 ARM

Labels: documentation

#21 opened May 25, 2026 by WillowOneVision

5 of 6 tasks

docs: Gemma-4-E4B-It perplexity benchmark — turbo4 vs F16 cross-corpus

Labels: documentation

#20 opened May 24, 2026 by WillowOneVision

3 tasks

Phase C.2 dispatch behavior: MTP+mmproj coexistence behind --allow-mtp-with-mmproj (5th first-in-world)

Labels: examples, server, testing

#19 opened May 21, 2026 by WillowOneVision

4 of 9 tasks

Phase C.2 foundational APIs: server_tokens coexistence + common_speculative_reset

Labels: examples, server, testing

#18 opened May 21, 2026 by WillowOneVision

5 of 6 tasks

fix(server): SEGV when --mtp-head + --mmproj are both passed

Labels: examples, server

#17 opened May 21, 2026 by WillowOneVision

4 of 5 tasks

ggml: ARM NEON dequant kernel for turbo4 (vqtbl4q_u8 4-bit PolarQuant)

Labels: ggml

#16 opened May 21, 2026 by WillowOneVision

5 tasks done

fix: add missing prototype for turbo_cpu_fwht_inverse to resolve -Wmissing-prototypes CI error

Labels: ggml

#12 opened May 13, 2026 by sujitvasanth

feat: one-sided target probability acceptance for MTP drafts increases acceptance rate and throughput compared to argmax alone

Labels: examples, ggml, server

#8 opened May 11, 2026 by sujitvasanth

Enhance CUDA flash attention kernel selection for DKQ=512 with low gq…

Labels: ggml, Nvidia GPU

#6 opened May 8, 2026 by Ooooze

Repro: MTP path on CUDA aborts at fattn.cu:109 (DKQ=512) for Gemma 4 — Blackwell sm_120 + Ampere sm_86

Labels: documentation

#5 opened May 8, 2026 by jameseiten • Draft
