⚡ Zero-stall MoE inference via lookahead prediction and async DMA prefetching. Optimized for SSD offloading with hybrid MLA + sliding-window attention.
Topics: open-source, artificial-intelligence, lora, high-throughput, open-models, mixture-of-experts, llm, generative-ai, large-language-model, streaming-llm, predictive-inference, sliding-window-attention, io-latency-hiding, async-dma, ssd-offloading, lookahead-routing, mla-attention, dual-layer-moe
Updated Apr 23, 2026 · Python
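The core idea named in the description, hiding expert-load latency by prefetching the predicted next experts while the current layer computes, can be sketched roughly as follows. This is an illustrative toy, not the repo's actual API: `load_expert`, `predict_next`, and the in-memory `EXPERTS` dict are all hypothetical stand-ins for SSD reads, lookahead routing, and the expert store.

```python
# Hedged sketch: overlap (fake) expert I/O with (fake) compute using a
# background thread as a stand-in for async DMA / SSD prefetching.
from concurrent.futures import ThreadPoolExecutor

EXPERTS = {i: f"weights[{i}]" for i in range(8)}  # hypothetical expert store ("SSD")

def load_expert(idx):
    # Stand-in for an SSD read / DMA transfer of one expert's weights.
    return EXPERTS[idx]

def predict_next(layer):
    # Stand-in for lookahead routing: guess which expert the next layer needs.
    return (layer + 1) % 8

def run_layers(n_layers):
    pool = ThreadPoolExecutor(max_workers=1)
    current = load_expert(0)  # first load has no prior compute to hide behind
    outputs = []
    for layer in range(n_layers):
        # Kick off the prefetch for the predicted next expert...
        nxt = pool.submit(load_expert, predict_next(layer))
        # ...then "compute" with the already-resident expert; the I/O above
        # runs concurrently, which is where the stall-hiding comes from.
        outputs.append(f"layer{layer}:{current}")
        current = nxt.result()  # ideally already finished by this point
    pool.shutdown()
    return outputs
```

If the lookahead prediction misses, a real implementation would fall back to a synchronous load for the mispredicted expert; this sketch assumes the prediction is always correct.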