
Vision

SKaiNET aims to democratize "Edge AI / On-device AI" by bridging the gap between high-level application development and low-level hardware optimization. We believe AI should be portable, type-safe, and developer-friendly, enabling seamless intelligence in everything from mobile apps to IoT devices without sacrificing performance.

For architecture details see ARCHITECTURE.md.


Quickstart

Add the core dependencies (Gradle Kotlin DSL):

dependencies {
    implementation("sk.ainet.core:SKaiNET-lang-core:0.19.0")
    implementation("sk.ainet.core:SKaiNET-backend-cpu:0.19.0")
}

Hello Neural Net

val model = nn {
    input(28 * 28)
    dense(out = 128)
    relu()
    dense(out = 10)
}
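As a sanity check on the model above, here is a plain-Kotlin sketch (not the SKaiNET API) of its trainable parameter count: each dense layer holds out×in weights plus out biases.

```kotlin
// Parameter count for the 784 -> 128 -> 10 MLP defined above.
fun denseParams(inDim: Int, outDim: Int): Int = inDim * outDim + outDim

fun main() {
    val input = 28 * 28                     // 784 pixels
    val hidden = denseParams(input, 128)    // 784*128 + 128 = 100480
    val output = denseParams(128, 10)       // 128*10 + 10 = 1290
    println(hidden + output)                // 101770 trainable parameters
}
```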

Core Tensor Ops

val a = tensor(shape(2, 2)) { float(1f, 2f, 3f, 4f) }
val b = tensor(shape(2, 2)) { float(5f, 6f, 7f, 8f) }

val c = a matMul b
val d = c.relu()
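For reference, this is the math the snippet above performs, written out in plain Kotlin (illustrative helpers, not the SKaiNET API) over row-major FloatArrays:

```kotlin
// 2x2 matrix multiply followed by elementwise ReLU.
fun matMul2x2(a: FloatArray, b: FloatArray): FloatArray = floatArrayOf(
    a[0] * b[0] + a[1] * b[2], a[0] * b[1] + a[1] * b[3],
    a[2] * b[0] + a[3] * b[2], a[2] * b[1] + a[3] * b[3],
)

fun relu(x: FloatArray): FloatArray = FloatArray(x.size) { maxOf(0f, x[it]) }

fun main() {
    val a = floatArrayOf(1f, 2f, 3f, 4f)
    val b = floatArrayOf(5f, 6f, 7f, 8f)
    val c = matMul2x2(a, b)   // [19, 22, 43, 50]
    val d = relu(c)           // unchanged here: all entries are positive
    println(d.toList())
}
```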

GGUF Model Loading

// Recommended: streaming reader — memory-efficient, supports quantized types
val source = JvmRandomAccessSource.open("model.gguf")
StreamingGGUFReader.open(source).use { reader ->
    println("Tensors: ${reader.tensorCount}")
    
    // Load specific tensor on demand (no whole-file loading)
    val bytes = reader.loadTensor("token_embd.weight")
    
    // Or get a TensorStorage descriptor with encoding/placement metadata
    val storage = reader.loadTensorStorage("token_embd.weight")
}

More examples: SKaiNET-examples | SKaiNET-notebook


Ecosystem

SKaiNET is a modular ecosystem. While this repository contains the core engine, specialized high-level libraries are maintained in standalone repositories:

Project              Description
SKaiNET-LLM          Llama, Gemma, and BERT inference runtimes
SKaiNET-transformers Pre-built transformer architectures and layers
SKaiNET-examples     Sample projects and integration demos

Explore

Goal                          Start here
Examples and sample projects  SKaiNET-examples
Interactive notebooks         SKaiNET-notebook
LLM inference (Llama, Gemma)  SKaiNET-LLM

Features

Kotlin Multiplatform

  • Targets: JVM, macOS (Native), JS, WASM (Browser + WasmWasi)
  • Single codebase shared across all platforms via Kotlin Multiplatform

Optimized Execution

  • ComputeGraphExecutor: Optimized engine with fusion passes and trace-to-DAG bridging.
  • SDPA & Gather: High-performance Scaled Dot-Product Attention and indexing operations.
  • TurboQuant: Runtime KV-cache compression (~8x at 4-bit) for long-context LLM inference. Presets: safe-lowbit, balanced, experimental-max. See TurboQuantUsage for integration guide.
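To illustrate where the "~8x at 4-bit" figure comes from, here is a naive symmetric 4-bit quantizer in plain Kotlin. This is an illustration of the idea only, not TurboQuant's actual block format; see TurboQuantUsage for the real scheme and presets.

```kotlin
import kotlin.math.abs

// Map floats to 4-bit signed codes in [-7, 7] with one shared scale:
// 32 bits per value shrink to 4 bits per value, i.e. ~8x smaller.
fun quantize4bit(x: FloatArray): Pair<IntArray, Float> {
    val scale = (x.maxOf { abs(it) } / 7f).coerceAtLeast(1e-12f)
    val codes = IntArray(x.size) { (x[it] / scale).toInt().coerceIn(-7, 7) }
    return codes to scale
}

fun dequantize4bit(codes: IntArray, scale: Float): FloatArray =
    FloatArray(codes.size) { codes[it] * scale }

fun main() {
    val kv = floatArrayOf(0.1f, -0.8f, 0.35f, 0.7f)
    val (codes, scale) = quantize4bit(kv)
    println(dequantize4bit(codes, scale).toList())  // lossy reconstruction
}
```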

Agentic AI Infrastructure

  • ComputeGraph: Unified framework for defining agentic workflows and tool-calling loops.
  • Java facade: JavaAgentLoop (in skainet-lang-java)

Neural Network DSL

  • Sequential: nn { input(); dense(); relu(); dense() }
  • DAG / Graph: arbitrary wiring with dag { } for ResNet, YOLO-style architectures
  • Layers: Dense, Conv1d/2d/3d, MaxPool, AvgPool, BatchNorm, Dropout, LeakyReLU, ELU
  • KAN (Kolmogorov–Arnold Networks) layer (experimental)
  • Autograd engine with reverse-mode gradients, SGD and Adam/AdamW optimizers
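The core loop the autograd engine automates can be sketched in a few lines of plain Kotlin (conceptual only, not the SKaiNET API): reverse-mode gradients for a scalar loss (w·x − t)² followed by SGD updates.

```kotlin
import kotlin.math.abs

// Fit w so that w*x matches target, by hand-derived reverse-mode gradient.
fun fitLinear(x: Float, target: Float, lr: Float, steps: Int): Float {
    var w = 0.5f
    repeat(steps) {
        val err = w * x - target   // forward pass
        val gradW = 2f * err * x   // reverse pass: d(err^2)/dw
        w -= lr * gradW            // SGD step
    }
    return w
}

fun main() {
    println(fitLinear(2.0f, 3.0f, 0.1f, 50))  // converges toward t/x = 1.5
}
```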

Data and I/O

  • Built-in loaders: MNIST, Fashion-MNIST, CIFAR-10
  • Formats: GGUF, ONNX, SafeTensors, JSON, Image (JPEG, PNG)
  • Type-safe transform DSL: resize, crop, normalize, toTensor
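As a point of reference for the transform DSL, this plain-Kotlin helper shows what a normalize(mean, std) step computes per channel (the helper and constants are illustrative, not SKaiNET code):

```kotlin
// Standard per-channel normalization: (x - mean) / std.
fun normalize(pixels: FloatArray, mean: Float, std: Float): FloatArray =
    FloatArray(pixels.size) { (pixels[it] - mean) / std }

fun main() {
    // MNIST-style values in [0, 1], normalized with the common MNIST constants.
    val img = floatArrayOf(0.0f, 0.5f, 1.0f)
    println(normalize(img, mean = 0.1307f, std = 0.3081f).toList())
}
```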

Java 21+ Support

  • SKaiNET entry point, TensorJavaOps, builder-pattern model definition
  • Maven BOM (sk.ainet:skainet-bom) for one-line version management
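A BOM import in Gradle (Kotlin DSL) might look like the sketch below. The BOM coordinates come from this README; the module names and version shown are taken from the Quickstart above and should be adjusted to the release you target.

```kotlin
dependencies {
    // The BOM pins versions for all SKaiNET modules...
    implementation(platform("sk.ainet:skainet-bom:0.19.0"))
    // ...so individual modules can omit theirs.
    implementation("sk.ainet.core:SKaiNET-lang-core")
    implementation("sk.ainet.core:SKaiNET-backend-cpu")
}
```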

Edge AI: Arduino / C99 Export

  • Export trained models to standalone, optimized C99 with static memory allocation
  • Ready-to-use Arduino library output

Compiler: MLIR / StableHLO

  • Lower Kotlin DSL to MLIR StableHLO dialect
  • Optimization passes: constant folding, operation fusion, dead code elimination
  • Valid IREE-compilable output with streaming API and public HloGenerator
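Constant folding, the first pass listed above, can be illustrated on a toy expression tree in plain Kotlin (this is the idea, not the MLIR pass itself):

```kotlin
// Tiny expression IR: constants, addition, multiplication.
sealed interface Expr
data class Const(val v: Int) : Expr
data class Add(val l: Expr, val r: Expr) : Expr
data class Mul(val l: Expr, val r: Expr) : Expr

// Recursively replace fully-constant subtrees with their value.
fun fold(e: Expr): Expr = when (e) {
    is Const -> e
    is Add -> {
        val (l, r) = fold(e.l) to fold(e.r)
        if (l is Const && r is Const) Const(l.v + r.v) else Add(l, r)
    }
    is Mul -> {
        val (l, r) = fold(e.l) to fold(e.r)
        if (l is Const && r is Const) Const(l.v * r.v) else Mul(l, r)
    }
}

fun main() {
    // (2 + 3) * 4 collapses to a single constant at compile time.
    println(fold(Mul(Add(Const(2), Const(3)), Const(4))))  // Const(v=20)
}
```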

What's New in 0.19.0

  • Qwen / GPT-2 Byte-Level BPE Tokenizer — Full GPT-2-style pipeline (byte-to-unicode, pretokenization regex, merge-rank BPE, atomic special-token splitting). Builds from GGUF metadata or HuggingFace tokenizer.json; verified against Qwen2.5-0.5B reference token IDs.
  • LLaMA / SentencePiece Tokenizer — llama.cpp SPM pipeline with whitespace escape, score-priority BPE (SPM rule, opposite of GPT-2 merge-rank), and <0xNN> byte fallback. Builds from GGUF (tokenizer.ggml.model == "llama") and HuggingFace Unigram tokenizer.json.
  • TokenizerFactory Per-Architecture Dispatch — Tokenizer selection is now per-architecture, not per file format. Qwen/GPT-2 → byte-level BPE, LLaMA/Gemma/TinyLlama → SentencePiece, regardless of whether weights come from GGUF or SafeTensors.
  • Byte-Level BPE Fix for Qwen/GPT-2 — Previously these models encoded text into garbage tokens because GgufModelMetadata ignored tokenizer.ggml.merges entirely, blocking chat mode and tool calling. (#463)
  • LLaMA GGUF Tokenization Fix — TokenizerFactory previously threw UnsupportedTokenizerException for LLaMA-family GGUFs; the new SentencePiece path closes that gap. (#464)
  • GGUF UInt Field Fix — UINT32 fields (e.g. tokenizer.ggml.bos_token_id) are Kotlin UInt value classes, not subclasses of Number, and were silently dropped by as? Number casts. Fixed via a toIntFlexible helper that handles every signed and unsigned numeric type GGUF can produce.
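The UInt pitfall above reproduces in a few lines. The toIntFlexible below is an illustrative stand-in for the fix, not the library's exact implementation:

```kotlin
// Kotlin's UInt is a value class, not a java.lang.Number subtype,
// so `as? Number` silently yields null for it.
fun toIntFlexible(value: Any?): Int? = when (value) {
    is Int -> value
    is UInt -> value.toInt()
    is Long -> value.toInt()
    is ULong -> value.toLong().toInt()
    is Number -> value.toInt()
    else -> null
}

fun main() {
    val bosTokenId: Any = 151643u            // a UINT32 GGUF field
    println(bosTokenId as? Number)           // null -- the silent drop
    println(toIntFlexible(bosTokenId))       // 151643
}
```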

See CHANGELOG.md for the full release history.


Roadmap

  • Q1 2026: Comprehensive documentation ✅
  • Q2 2026: TurboQuant KV-cache compression ✅ (shipped in 0.18.0); Qwen/LLaMA tokenizers ✅ (shipped in 0.19.0)
  • Q3 2026: Agentic AI enhancements ✅ (tool calling shipped in 0.13.0; ongoing)
  • Q4 2026: Federated learning support for multi-device training

Contributing & Community

We love contributions! Whether it's a new operator, documentation, or a bug fix:

  1. Read our Contribution Guide.
  2. Check the Good First Issues.
  3. Open a discussion or issue on GitHub.

Browse the full codebase documentation on DeepWiki.

Contributors (0.14.0)

  • Dhia Chemingui (@dhiaspaner) — Android KMP plugin migration (#385, #386)

License

MIT — see LICENCE.