SKaiNET aims to democratize "Edge AI / On-device AI" by bridging the gap between high-level application development and low-level hardware optimization. We believe AI should be portable, type-safe, and developer-friendly, enabling seamless intelligence in everything from mobile apps to IoT devices without sacrificing performance.
For architecture details see ARCHITECTURE.md.
Add the core dependencies (Gradle Kotlin DSL):

```kotlin
dependencies {
    implementation("sk.ainet.core:SKaiNET-lang-core:0.19.0")
    implementation("sk.ainet.core:SKaiNET-backend-cpu:0.19.0")
}
```

Define a model with the `nn` DSL:

```kotlin
val model = nn {
    input(28 * 28)
    dense(out = 128)
    relu()
    dense(out = 10)
}
```

Work with tensors directly:

```kotlin
val a = tensor(shape(2, 2)) { float(1f, 2f, 3f, 4f) }
val b = tensor(shape(2, 2)) { float(5f, 6f, 7f, 8f) }
val c = a matMul b
val d = c.relu()
```

Load a GGUF model:

```kotlin
// Recommended: streaming reader — memory-efficient, supports quantized types
val source = JvmRandomAccessSource.open("model.gguf")
StreamingGGUFReader.open(source).use { reader ->
    println("Tensors: ${reader.tensorCount}")
    // Load a specific tensor on demand (no whole-file loading)
    val bytes = reader.loadTensor("token_embd.weight")
    // Or get a TensorStorage descriptor with encoding/placement metadata
    val storage = reader.loadTensorStorage("token_embd.weight")
}
```

More examples: SKaiNET-examples | SKaiNET-notebook
SKaiNET is a modular ecosystem. While this repository contains the core engine, specialized high-level libraries are maintained in standalone repositories:
| Project | Description |
|---|---|
| SKaiNET-LLM | Llama, Gemma, and BERT inference runtimes |
| SKaiNET-transformers | Pre-built transformer architectures and layers |
| SKaiNET-examples | Sample projects and integration demos |
| Goal | Start here |
|---|---|
| Examples and sample projects | SKaiNET-examples |
| Interactive notebooks | SKaiNET-notebook |
| LLM inference (Llama, Gemma) | SKaiNET-LLM |
- Targets: JVM, macOS (Native), JS, WASM (Browser + WasmWasi)
- Single codebase shared across all platforms via Kotlin Multiplatform
- ComputeGraphExecutor: Optimized engine with fusion passes and trace-to-DAG bridging.
- SDPA & Gather: High-performance Scaled Dot-Product Attention and indexing operations.
- TurboQuant: Runtime KV-cache compression (~8x at 4-bit) for long-context LLM inference. Presets: `safe-lowbit`, `balanced`, `experimental-max`. See TurboQuantUsage for the integration guide.
- ComputeGraph: Unified framework for defining agentic workflows and tool-calling loops.
- Java facade: `JavaAgentLoop` (in `skainet-lang-java`)
- Sequential: `nn { input(); dense(); relu(); dense() }`
- DAG / Graph: arbitrary wiring with `dag { }` for ResNet- and YOLO-style architectures
- Layers: Dense, Conv1d/2d/3d, MaxPool, AvgPool, BatchNorm, Dropout, LeakyReLU, ELU
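As an illustration, a residual-style block could be wired with the `dag { }` builder roughly as follows. This is a hypothetical sketch: the node and combinator names (`then`, `merge`, `output`) are assumptions made for illustration and are not taken from the actual builder API.

```kotlin
// Hypothetical sketch only: `then` and `merge` are assumed names,
// not the library's confirmed DSL vocabulary.
val resBlock = dag {
    val x = input(64)
    val main = x then dense(out = 64) then relu() then dense(out = 64)
    val sum = merge(main, x) { a, b -> a + b }   // skip connection
    output(sum then relu())
}
```

The point of the graph form is exactly this kind of non-linear wiring (branches and merges) that the sequential `nn { }` builder cannot express.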
- KAN (Kolmogorov–Arnold Networks) layer (experimental)
- Autograd engine with reverse-mode gradients, SGD and Adam/AdamW optimizers
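A minimal training loop over the autograd engine might look like the sketch below. The optimizer construction and the `crossEntropyLoss`, `zeroGrad`, `backward`, and `step` names are assumptions for illustration; consult the actual API for the real signatures.

```kotlin
// Sketch only: Adam(...), crossEntropyLoss(...), zeroGrad(), and step()
// are assumed names, not confirmed API.
val optimizer = Adam(model.parameters, lr = 1e-3f)
for (epoch in 0 until 5) {
    for ((images, labels) in mnistTrainBatches) {   // e.g. from the MNIST loader
        optimizer.zeroGrad()
        val logits = model(images)                  // forward pass
        val loss = crossEntropyLoss(logits, labels) // scalar loss tensor
        loss.backward()                             // reverse-mode gradients
        optimizer.step()                            // Adam parameter update
    }
}
```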
- Built-in loaders: MNIST, Fashion-MNIST, CIFAR-10
- Formats: GGUF, ONNX, SafeTensors, JSON, Image (JPEG, PNG)
- Type-safe transform DSL: resize, crop, normalize, toTensor
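For example, an image preprocessing pipeline built from those transforms could look roughly like this. The `transforms { }` builder name and the exact parameter shapes are assumptions, not confirmed API.

```kotlin
// Hypothetical pipeline; builder name and argument forms are assumptions.
val preprocess = transforms {
    resize(width = 224, height = 224)
    crop(224, 224)
    normalize(mean = floatArrayOf(0.485f, 0.456f, 0.406f),
              std = floatArrayOf(0.229f, 0.224f, 0.225f))
    toTensor()
}
```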
- `SKaiNET` entry point, `TensorJavaOps`, builder-pattern model definition
- Maven BOM (`sk.ainet:skainet-bom`) for one-line version management
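Imported as a Gradle platform, the BOM pins versions for all SKaiNET modules so individual coordinates can stay versionless. A sketch, assuming BOM version 0.19.0 to match the artifacts above:

```kotlin
dependencies {
    // Import the BOM once; it manages versions for all SKaiNET modules
    implementation(platform("sk.ainet:skainet-bom:0.19.0"))
    // Versionless module coordinates resolve through the BOM
    implementation("sk.ainet.core:SKaiNET-lang-core")
    implementation("sk.ainet.core:SKaiNET-backend-cpu")
}
```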
- Export trained models to standalone, optimized C99 with static memory allocation
- Ready-to-use Arduino library output
- Lower Kotlin DSL to MLIR StableHLO dialect
- Optimization passes: constant folding, operation fusion, dead code elimination
- Valid IREE-compilable output with a streaming API and public `HloGenerator`
- Qwen / GPT-2 Byte-Level BPE Tokenizer — Full GPT-2-style pipeline (byte-to-unicode, pretokenization regex, merge-rank BPE, atomic special-token splitting). Builds from GGUF metadata or HuggingFace `tokenizer.json`; verified against Qwen2.5-0.5B reference token IDs.
- LLaMA / SentencePiece Tokenizer — llama.cpp SPM pipeline with whitespace escape, score-priority BPE (SPM rule, opposite of GPT-2 merge-rank), and `<0xNN>` byte fallback. Builds from GGUF (`tokenizer.ggml.model == "llama"`) and HuggingFace Unigram `tokenizer.json`.
- `TokenizerFactory` Per-Architecture Dispatch — Tokenizer selection is now per-architecture, not per file format: Qwen/GPT-2 → byte-level BPE, LLaMA/Gemma/TinyLlama → SentencePiece, regardless of whether weights come from GGUF or SafeTensors.
- Byte-Level BPE Fix for Qwen/GPT-2 — Previously these models encoded text into garbage tokens because `GgufModelMetadata` ignored `tokenizer.ggml.merges` entirely, blocking chat mode and tool calling. (#463)
- LLaMA GGUF Tokenization Fix — `TokenizerFactory` previously threw `UnsupportedTokenizerException` for LLaMA-family GGUFs; the new SentencePiece path closes that gap. (#464)
- GGUF UInt Field Fix — UINT32 fields (e.g. `tokenizer.ggml.bos_token_id`) are Kotlin `UInt` value classes, not subclasses of `Number`, and were silently dropped by `as? Number` casts. Fixed via a `toIntFlexible` helper that handles every signed and unsigned numeric type GGUF can produce.
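The pitfall is easy to reproduce: Kotlin's unsigned types live outside the `Number` hierarchy, so `as? Number` silently yields `null` for them. A helper in the spirit of `toIntFlexible` (an illustrative sketch, not the library's actual implementation) has to branch on each type explicitly:

```kotlin
// Illustrative sketch of a toIntFlexible-style coercion; not the
// library's actual code. UInt/UShort/UByte/ULong are value classes,
// not Number subclasses, so `as? Number` fails for them.
fun toIntFlexible(value: Any?): Int? = when (value) {
    is Int    -> value
    is Long   -> value.toInt()
    is Short  -> value.toInt()
    is Byte   -> value.toInt()
    is UInt   -> value.toInt()
    is ULong  -> value.toInt()
    is UShort -> value.toInt()
    is UByte  -> value.toInt()
    else      -> null
}
```

With this shape, `toIntFlexible(2u)` coerces successfully, whereas `(2u as? Number)?.toInt()` is `null`, which is exactly how the BOS/EOS token IDs were being dropped.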
See CHANGELOG.md for the full release history.
- Q1 2026: Comprehensive documentation ✅
- Q2 2026: TurboQuant KV-cache compression ✅ (shipped in 0.18.0); Qwen/LLaMA tokenizers ✅ (shipped in 0.19.0)
- Q3 2026: Agentic AI enhancements ✅ (tool calling shipped in 0.13.0; ongoing)
- Q4 2026: Federated learning support for multi-device training
We love contributions! Whether it's a new operator, documentation, or a bug fix:
- Read our Contribution Guide.
- Check the Good First Issues.
- Open a discussion or issue on GitHub.
Browse the full codebase documentation on DeepWiki.
- Dhia Chemingui (@dhiaspaner) — Android KMP plugin migration (#385, #386)
MIT — see LICENCE.
