README.md (+8 −0)

@@ -100,6 +100,14 @@ User Query → [SONA Engine] → Model Response → User Feedback
| 8h | [**GraphMAE**](./crates/ruvector-gnn) | Graph Masked Autoencoder — self-supervised node representation learning with GAT encoder |
| 8i | [**TurboQuant**](./crates/ruvllm) | 2-4 bit asymmetric KV-cache quantization — 6-8x memory reduction, <0.5% perplexity loss, H2O/PyramidKV eviction |
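
The 2-4 bit asymmetric scheme named in the TurboQuant row can be sketched in a few lines. This toy version (pure Python, illustrative names — not the ruvllm crate's actual API) shows how a scale and zero-point map floats onto a small unsigned range, which is what makes sub-byte KV-cache storage possible:

```python
# Minimal sketch of asymmetric n-bit quantization, assuming a per-tensor
# scale/zero-point scheme. Illustrative only; ruvllm's real implementation
# is in Rust and operates on KV-cache tensors, not Python lists.

def quantize_asymmetric(values, bits):
    """Map floats to unsigned ints in [0, 2**bits - 1] via scale/zero-point."""
    qmax = (1 << bits) - 1
    lo, hi = min(values), max(values)
    scale = (hi - lo) / qmax if hi > lo else 1.0
    zero_point = lo
    q = [round((v - zero_point) / scale) for v in values]
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    return [x * scale + zero_point for x in q]

vals = [-1.5, -0.3, 0.0, 0.7, 2.1]
q, s, z = quantize_asymmetric(vals, 4)   # 4-bit: 16 levels
recon = dequantize(q, s, z)
max_err = max(abs(a - b) for a, b in zip(vals, recon))
# Reconstruction error is bounded by half a quantization step
assert max_err <= s / 2 + 1e-9
```

Storing 4-bit codes instead of fp16 values is a 4x reduction before the small scale/zero-point overhead; 2-bit codes give 8x, which is consistent with the 6-8x figure in the table.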

**Continuous Training & Optimization** *(ADR-129)*
| # | Capability | What It Does |
|---|------------|--------------|
| 8j | [**Nightly training**](./scripts/training/) | Automated nightly LoRA fine-tuning from brain learnings — models improve every day |
| 8k | [**Release gates**](./scripts/training/release_gate.py) | 7 automated quality checks (code quality, routing accuracy, perplexity, speed, contamination) — prevents shipping regressions |
| 8l | [**TurboQuant profiling**](./crates/ruvllm/src/quantize/turboquant_profile.rs) | Per-layer KV-cache bit-width optimization with `.turboquant.json` sidecar configs |
| 8m | [**Training corpus**](./data/training/) | 230+ records from brain memories (pi.ruv.io) + architecture decisions + Claude routing examples |
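
A release gate of the kind described in 8k reduces to a baseline-versus-candidate comparison per metric. The gate names, metrics, and tolerances below are invented for illustration and are not taken from release_gate.py:

```python
# Hedged sketch of a release-gate runner: each gate compares a candidate
# metric against a baseline with a regression budget. Metrics here are
# lower-is-better (perplexity, latency); names/thresholds are assumptions.

def run_gates(baseline, candidate, tolerances):
    """Return (passed, failures); a gate fails when the candidate metric
    exceeds the baseline by more than its tolerance fraction."""
    failures = []
    for name, tol in tolerances.items():
        base, cand = baseline[name], candidate[name]
        if cand > base * (1 + tol):
            failures.append(f"{name}: {cand:.3f} vs baseline {base:.3f}")
    return (not failures), failures

baseline   = {"perplexity": 8.20, "latency_ms": 41.0}
candidate  = {"perplexity": 8.25, "latency_ms": 43.0}
tolerances = {"perplexity": 0.01, "latency_ms": 0.10}  # 1% / 10% budgets

ok, failures = run_gates(baseline, candidate, tolerances)
# Perplexity grew 0.6% (< 1%) and latency 4.9% (< 10%), so the gate passes
assert ok and not failures
```

The real script runs 7 such checks; the point of the sketch is the shape of the decision: a nightly-trained model ships only if no metric regresses past its budget.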

**Distributed Systems**
| # | Capability | What It Does |
|---|------------|--------------|
docs/adr/ADR-129-ruvltra-gcloud-training-turboquant.md (+3 −3)

@@ -16,9 +16,9 @@ Accepted — Phase 1 (calibration) deployed and executing. Governance and releas
| **Cloud Run Jobs** | **3 deployed** | `ruvltra-calibration`, `ruvltra-nightly-train`, `ruvltra-benchmark` (all L4 GPU) |
| **Cloud Schedulers** | **2 enabled** | Nightly 03:00 UTC, Weekly benchmark Mon 06:00 UTC |
| **Phase 1: Calibration** | **Complete** | All 4 models calibrated on L4 GPU. TQ profiles + benchmarks uploaded to HuggingFace. Results: 75.4 tok/s (small), 62.6 tok/s (medium), 67.1 tok/s (claude-code) |
-| **Phase 2: SFT** | **Ready** | Training corpus exported (230 records, 530K tokens), scripts ready |
-| **Phase 3: Benchmarks** | **Partial** | Release gate automation implemented and tested; inference benchmarks running |
-| **Phase 4: Publishing** | **Partial** | TurboQuant sidecar configs uploaded to all 4 HF models |
+| **Phase 2: SFT** | **Executing** | LoRA SFT running on L4 GPU (rank-16, 2 epochs, lr=2e-5). Corpus: 230 records, 530K tokens |
+| **Phase 3: Benchmarks** | **Executing** | Release gate automation tested. L4 GPU benchmark job running. Calibration benchmarks complete for all 4 models |
+| **Phase 4: Publishing** | **Complete** | TurboQuant sidecar configs + benchmark results uploaded to all 4 HF models. Model card READMEs updated with benchmark tables |
| **Tooling** | **ruvllm-native** | Uses RuvltraQuantizer + TurboQuantProfile (Rust), gguf + llama-cpp-python (Python). No llama.cpp source compilation. |
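
One plausible shape for the `.turboquant.json` sidecar referenced above is a per-layer bit-width map. The schema and field names below are assumptions for illustration — the actual format is defined in turboquant_profile.rs and is not shown in this diff:

```python
import json

# Hypothetical sidecar layout: a default bit-width plus per-layer overrides
# for K and V caches, as a per-layer profiler might emit. Values illustrative.
sidecar = {
    "model": "ruvltra-small",   # hypothetical model id
    "default_bits": 4,
    "layers": [
        {"layer": 0, "k_bits": 4, "v_bits": 4},
        {"layer": 1, "k_bits": 3, "v_bits": 3},
        {"layer": 2, "k_bits": 2, "v_bits": 2},
    ],
}

text = json.dumps(sidecar, indent=2)
loaded = json.loads(text)
assert loaded == sidecar
```

A sidecar like this lets the runtime pick bit-widths at load time without touching the model weights themselves, which is why it can be uploaded alongside existing HF models.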

## Context