forked from TheTom/llama-cpp-turboquant
Fork 31
Pull requests: AtomicBot-ai/atomic-llama-cpp-turboquant
Pull requests list
gemma4_assistant: protect n_layer_kv_from_start against shared_kv_layers == n_layer
#24
opened Jun 4, 2026 by
PhilEgly
[codex] Add a reserve-only Gemma4 MTP reserve context
documentation
Improvements or additions to documentation
examples
server
#23
opened Jun 1, 2026 by
nycdubliner
Draft
[codex] Add speculative draft Prometheus metrics
examples
server
#22
opened May 31, 2026 by
nycdubliner
Draft
docs(turbo-quant): TURBO_LAYER_ADAPTIVE mode 7 validation on Pi16 ARM
documentation
#21
opened May 25, 2026 by
WillowOneVision
5 of 6 tasks
docs: Gemma-4-E4B-It perplexity benchmark — turbo4 vs F16 cross-corpus
documentation
#20
opened May 24, 2026 by
WillowOneVision
3 tasks
Phase C.2 dispatch behavior: MTP+mmproj coexistence behind --allow-mtp-with-mmproj (5th first-in-world)
examples
server
testing
#19
opened May 21, 2026 by
WillowOneVision
4 of 9 tasks
Phase C.2 foundational APIs: server_tokens coexistence + common_speculative_reset
examples
server
testing
#18
opened May 21, 2026 by
WillowOneVision
5 of 6 tasks
fix(server): SEGV when --mtp-head + --mmproj are both passed
examples
server
#17
opened May 21, 2026 by
WillowOneVision
4 of 5 tasks
ggml: ARM NEON dequant kernel for turbo4 (vqtbl4q_u8 4-bit PolarQuant)
ggml
#16
opened May 21, 2026 by
WillowOneVision
5 tasks done
fix: add missing prototype for turbo_cpu_fwht_inverse to resolve -Wmissing-prototypes CI error
ggml
#12
opened May 13, 2026 by
sujitvasanth
feat: one-sided target probability acceptance for MTP drafts increases acceptance rate and throughput compared to argmax alone
examples
ggml
server
#8
opened May 11, 2026 by
sujitvasanth
Enhance CUDA flash attention kernel selection for DKQ=512 with low gq…
ggml
Nvidia GPU
#6
opened May 8, 2026 by
Ooooze
Repro: MTP path on CUDA aborts at fattn.cu:109 (DKQ=512) for Gemma 4 — Blackwell sm_120 + Ampere sm_86
documentation
#5
opened May 8, 2026 by
jameseiten
Draft