diff --git a/CHANGELOG.md b/CHANGELOG.md
index 8a82943d..3f607a29 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -2,17 +2,85 @@
 ## [Unreleased]
 
+## [0.19.0] - 2026-04-20
+
 ### Added
-- **Qwen / GPT-2 Byte-Level BPE Tokenizer**: `QwenByteLevelBpeTokenizer` implements the full GPT-2-style pipeline — byte-to-unicode mapping, GPT-2 pretokenization regex, merge-rank BPE, and atomic special-token splitting. Builds from either GGUF metadata (`fromGgufFields`) or a HuggingFace `tokenizer.json` (`fromTokenizerJson`). Verified against Qwen2.5-0.5B reference token IDs from HuggingFace `transformers`.
-- **LLaMA / SentencePiece Tokenizer**: `SentencePieceTokenizer` implements the llama.cpp SPM pipeline — whitespace escape (`▁`), code-point symbol split, **score-priority** BPE (the SPM rule, opposite of the merge-rank rule used for GPT-2 BPE), and `<0xNN>` byte fallback for unknown characters. Builds from GGUF (`tokenizer.ggml.model == "llama"`) and HuggingFace `tokenizer.json` (`model.type == "Unigram"`). Verified against TinyLlama-1.1B reference token IDs from HuggingFace `transformers`.
-- **`TokenizerFactory` with Per-Architecture Dispatch**: Tokenizer selection is now **per-architecture, not per file format**. `TokenizerFactory.fromGguf(fields)` and `.fromTokenizerJson(json)` inspect `tokenizer.ggml.model` / `model.type` and dispatch to the right implementation — Qwen/GPT-2 → byte-level BPE, LLaMA/Gemma/TinyLlama → SentencePiece — regardless of whether weights come from GGUF or SafeTensors.
+
+#### Tokenizers
+- **Qwen / GPT-2 Byte-Level BPE Tokenizer**: `QwenByteLevelBpeTokenizer` implements the full GPT-2-style pipeline — byte-to-unicode mapping, GPT-2 pretokenization regex, merge-rank BPE, and atomic special-token splitting. Builds from either GGUF metadata (`fromGgufFields`) or a HuggingFace `tokenizer.json` (`fromTokenizerJson`). Verified against Qwen2.5-0.5B reference token IDs from HuggingFace `transformers`. (#463)
+- **LLaMA / SentencePiece Tokenizer**: `SentencePieceTokenizer` implements the llama.cpp SPM pipeline — whitespace escape (`▁`), code-point symbol split, **score-priority** BPE (the SPM rule, opposite of the merge-rank rule used for GPT-2 BPE), and `<0xNN>` byte fallback for unknown characters. Builds from GGUF (`tokenizer.ggml.model == "llama"`) and HuggingFace `tokenizer.json` (`model.type == "Unigram"`). Verified against TinyLlama-1.1B reference token IDs from HuggingFace `transformers`. (#464)
+- **`TokenizerFactory` with Per-Architecture Dispatch**: Tokenizer selection is now **per-architecture, not per file format**. `TokenizerFactory.fromGguf(fields)` and `.fromTokenizerJson(json)` inspect `tokenizer.ggml.model` / `model.type` and dispatch to the right implementation — Qwen/GPT-2 → byte-level BPE, LLaMA/Gemma/TinyLlama → SentencePiece — regardless of whether weights come from GGUF or SafeTensors. (#463)
 - **`Tokenizer` Interface**: Common surface implemented by `TekkenTokenizer`, `QwenByteLevelBpeTokenizer`, and `SentencePieceTokenizer` (`encode`, `decode`, `vocabSize`, `bosTokenId`, `eosTokenId`).
 - **GGUF Tokenizer Metadata**: `GgufModelMetadata` now exposes `tokenizerModel`, `tokenizerTokens`, `tokenizerMerges`, `tokenizerTokenTypes`, `bosTokenId`, and `eosTokenId` so callers can build a tokenizer without re-parsing the raw field map.
 
+#### StableHLO → IREE compilation
+- **Whisper Encoder E2E**: Whisper encoder now compiles end-to-end via SKaiNET → StableHLO → IREE.
+- **Real StableHLO Lowerings**: `softmax`, `layerNorm`, and `rmsnorm` now lower to real StableHLO ops (reductions, `broadcast_in_dim`, standard ops) instead of `custom_call` stubs. (#467, #479, #480)
+- **New Op Converters**: `gather` / `embedding`, and `concat` / `slice` / `cast` StableHLO converters. (#483, #489)
+- **Activation Alias**: `silu` / `SiLU` registered as an alias for `swish` in `ActivationOperationsConverter`. (#484)
+- **`ConstantMaterializationPolicy`**: Seam for externalizing large weight tensors out of the StableHLO module (enables `.irpa` externalization). (#524)
+- **Splat Constant Folding**: Uniform-value tensor constants collapsed to `dense` splat instead of fully materialized arrays. (#522)
+- **SSA Value Type Tracking**: Tracks SSA value types so `reshape` emits the operand's declared type, producing valid MLIR. (#521)
+- **Tensor Encoding in Output**: `tensor_encoding` comments in StableHLO output and a top-level `skainet.tensor_encodings` module attribute. (#473, #477)
+
+#### IREE `.irpa` weight files
+- **`skainet-io-iree-params` Module**: New module with `IrpaWriter` for writing IREE Parameter Archive (`.irpa`) files. Accepts `FileBacked` handles via mmap on JVM / Android for zero-copy weight export. (#523, #525, #528, #529)
+
+#### Backend API
+- **`skainet-backend-api` Module**: New module cleanly separating backend contracts; CPU backend now depends on it. (#468)
+- **`TensorEncoding` Metadata**: Accessor for `TensorSpec.metadata` and propagation through `TraceToGraphBuilder.finalize`, keeping quantization encoding visible end-to-end. (#469)
+
+#### Java API (0.19.0 surface polish)
+- Annotated `StableHloConverterFactory` and `TokenizerFactory` for idiomatic Java call sites. (#400)
+- Renamed `TensorSpecEncoding.kt` class for Java callers. (#400)
+- Added `skainet-backend-api` to the BOM. (#400)
+- New `ReleaseApiJavaTest` covering the 0.19.0 Java surface. (#400)
+
+#### Docs (Antora migration)
+- **Antora + Diátaxis**: Migrated docs to Antora with Divio / Diátaxis layout (tutorials, how-tos, reference, explanation). (#494)
+- **`skainet-docs-ui` v1.1.1**: Adopted the new theme with Diátaxis card-grid landing page. (#501)
+- **Operator Coverage Matrix**: Cross-backend Operator Coverage Matrix generated from a `TensorOps` surface scan. (#494, #511)
+- **Ops Docs**: KDoc `@param` extraction, real version stamps, LaTeX rendering, fixed partials, and dropped void backend. (#511, #513)
+- **Dokka API Bundle**: Wired into the Antora site build. (#494)
+- **Local Mermaid**: Drop kroki, render Mermaid locally via `mmdc`. (#496)
+
+#### Platform targets
+- **`androidNativeArm32`**: Added across core modules. (#503)
+
 ### Fixed
 - **Byte-Level BPE Broken for Qwen/GPT-2 Models**: Previously there was no GPT-2-style byte-level BPE tokenizer in the repo, and `GgufModelMetadata` ignored `tokenizer.ggml.merges` entirely — so any Qwen / GPT-2 / Mistral-Nemo model encoded text into garbage tokens (byte-level chars instead of merged vocab IDs), blocking chat mode and tool calling. The new `QwenByteLevelBpeTokenizer` + `TokenizerFactory` dispatch fix the issue for both GGUF and SafeTensors sources. (#463)
 - **No SentencePiece Path for LLaMA-Family GGUF Models**: `TokenizerFactory` previously threw `UnsupportedTokenizerException` for `tokenizer.ggml.model == "llama"`, leaving LLaMA / TinyLlama / Gemma / Mistral-v0.1 GGUFs untokenizable. The new `SentencePieceTokenizer` closes that gap. (#464)
 - **GGUF UInt Fields Silently Dropped**: GGUF UINT32 fields (e.g. `tokenizer.ggml.bos_token_id`) arrive from `StreamingGGUFReader` as `kotlin.UInt`, which is a value class — *not* a subclass of `kotlin.Number` — so a plain `as? Number` cast was returning null. The new `toIntFlexible` helper handles every signed and unsigned numeric type GGUF can produce, restoring the BOS/EOS/UNK ids on the tokenizer builders.
+- **Graph Conv Output Shape Inference**: `conv1d` / `conv2d` / `conv3d` operations in graph inference previously produced placeholder output shapes, breaking downstream shape-dependent passes. Graph ops now compute real output shapes. (#536, #537)
+- **Conv1d/Conv3d Not Recorded**: `conv1d` and `conv3d` were not routed through the recording decorator, so they disappeared from traced computation graphs. (#532, #533)
+- **Static Conv1d HLO Shape Crash**: Conv1d StableHLO lowering crashed when trace attributes were missing; now falls back to `TensorRef` shape / dtype. (#530, #531)
+- **Flatten Hardcoded to MNIST Shape**: `NetworkBuilder.flatten()` returned a hardcoded `lastDimension = 1568` (the MNIST CNN value); any other architecture — e.g. a 64-channel CNN over 32×32 inputs — crashed with `ArrayIndexOutOfBoundsException` in the following `dense()` layer. The DSL now tracks per-sample shape through a new `input(IntArray)` overload, `conv1d` / `conv2d` / `conv3d`, `maxPool2d`, `avgPool2d`, and `upsample2d`, reusing the `ConvShapeUtils` arithmetic introduced in #537; `flatten()` reads the tracked shape and honors `startDim` / `endDim`, and `Conv*` layers can auto-infer `inChannels` from the declared input. (#535, #538)
+- **StableHLO `transpose` / `dot_general` MLIR Emission**: Fixed malformed MLIR produced by `stablehlo.transpose` and `stablehlo.dot_general` that blocked IREE compilation. (#520)
+- **WasmJS / JS / Native Compile**: Replaced JVM-only `putIfAbsent` with a common-stdlib idiom. (#485)
+- **Antora Container**: `HOME=/tmp` so Chromium crashpad can launch during Mermaid rendering in CI. (#534)
+- **`bundleDokkaIntoSite` CI Permission Failure**: Fixed docs pipeline permission error. (#496)
+- **Pandoc Artifacts in Docs**: Stripped pandoc anchors and demoted heading levels in migrated pages. (#496)
+
+### Changed
+- **`compile-hlo` Dependencies**: Dropped vestigial `skainet-backend-cpu` dependency from `compile-hlo` jvmMain. (#472)
+- **Moved-LLM Docs**: Replaced relocated LLM pages with redirect stubs pointing at the standalone repo. (#499)
+- **Maven Group / Version Refs**: Bumped stale version references and fixed Maven group coordinates. (#499)
+
+### Removed
+- Stale `TURBOQUANT_ISSUES.md` tracker at the repo root. (#490)
+
+### Dependencies
+- agp: 9.1.0 → 9.1.1.
+- com.networknt:json-schema-validator: 3.0.1 → 3.0.2.
+- org.jetbrains.kotlinx:kotlinx-serialization-json: bumped to 1.11.0.
+- actions/checkout: 4 → 6.
+- actions/upload-pages-artifact: 3 → 5.
+- actions/cache: 4 → 5.
+- actions/setup-java: 4 → 5.
+- actions/deploy-pages: 4 → 5.
+- actions/github-script: 8 → 9.
+- docker/build-push-action: 5 → 7.
+- docker/setup-buildx-action: 3 → 4.
 
 ## [0.18.0] - 2026-04-08
 
diff --git a/README.md b/README.md
index d564b3e3..a91a0d13 100644
--- a/README.md
+++ b/README.md
@@ -19,8 +19,8 @@ Add the core dependencies (Gradle Kotlin DSL):
 
 ```kotlin
 dependencies {
-    implementation("sk.ainet.core:SKaiNET-lang-core:0.18.0")
-    implementation("sk.ainet.core:SKaiNET-backend-cpu:0.18.0")
+    implementation("sk.ainet.core:SKaiNET-lang-core:0.19.0")
+    implementation("sk.ainet.core:SKaiNET-backend-cpu:0.19.0")
 }
 ```
 
@@ -149,14 +149,14 @@ SKaiNET is a modular ecosystem. While this repository contains the core engine,
 
 ---
 
-## What's New in 0.18.0
+## What's New in 0.19.0
 
-- **TurboQuant KV-Cache Compression** — Runtime KV-cache compression for LLM inference: ~8x memory reduction with 4-bit, works with any model (LLaMA, Mistral, Gemma, Qwen). One-line integration via `KvCacheStore.turboQuant("balanced", ...)`.
-- **Memory Architecture Hardening** — First-class storage/placement abstractions (`TensorStorage`, `TensorEncoding`, `BufferHandle`, `Placement`), zero-copy ownership semantics, quantization-preserving loaders.
-- **KV-Cache Subsystem** — Dedicated `KvCacheStore` with append-by-token writes, layer/head addressing, asymmetric K/V encoding policies, and `CompressedKvAttention` SDPA bridge.
-- **Mistral Tokenizer** — Tekken (tiktoken-based BPE) tokenizer support for Mistral models.
-- **Large Tensor Fix** — Fixed Int overflow in GGUF and SafeTensors loaders for tensors > 2 GB (Gemma 4 E4B support).
-- **CPU SIMD Kernels** — Java Vector API acceleration for TurboQuant encode/decode/rotation operations.
+- **Qwen / GPT-2 Byte-Level BPE Tokenizer** — Full GPT-2-style pipeline (byte-to-unicode, pretokenization regex, merge-rank BPE, atomic special-token splitting). Builds from GGUF metadata or HuggingFace `tokenizer.json`; verified against Qwen2.5-0.5B reference token IDs.
+- **LLaMA / SentencePiece Tokenizer** — llama.cpp SPM pipeline with whitespace escape, **score-priority** BPE (SPM rule, opposite of GPT-2 merge-rank), and `<0xNN>` byte fallback. Builds from GGUF (`tokenizer.ggml.model == "llama"`) and HuggingFace Unigram `tokenizer.json`.
+- **`TokenizerFactory` Per-Architecture Dispatch** — Tokenizer selection is now per-architecture, not per file format. Qwen/GPT-2 → byte-level BPE, LLaMA/Gemma/TinyLlama → SentencePiece, regardless of whether weights come from GGUF or SafeTensors.
+- **Byte-Level BPE Fix for Qwen/GPT-2** — Previously these models encoded text into garbage tokens because `GgufModelMetadata` ignored `tokenizer.ggml.merges` entirely, blocking chat mode and tool calling. (#463)
+- **LLaMA GGUF Tokenization Fix** — `TokenizerFactory` previously threw `UnsupportedTokenizerException` for LLaMA-family GGUFs; the new SentencePiece path closes that gap. (#464)
+- **GGUF UInt Field Fix** — UINT32 fields (e.g. `tokenizer.ggml.bos_token_id`) are Kotlin `UInt` value classes, not subclasses of `Number`, and were silently dropped by `as? Number` casts. Fixed via a `toIntFlexible` helper that handles every signed and unsigned numeric type GGUF can produce.
 
 See [CHANGELOG.md](CHANGELOG.md) for the full release history.
 
@@ -165,7 +165,7 @@
 ## Roadmap
 
 - **Q1 2026**: Comprehensive documentation ✅
-- **Q2 2026**: TurboQuant KV-cache compression ✅ (shipped in 0.18.0)
+- **Q2 2026**: TurboQuant KV-cache compression ✅ (shipped in 0.18.0); Qwen/LLaMA tokenizers ✅ (shipped in 0.19.0)
 - **Q3 2026**: Agentic AI enhancements ✅ (tool calling shipped in 0.13.0; ongoing)
 - **Q4 2026**: Federated learning support for multi-device training
 
diff --git a/docs/modules/ROOT/pages/reference/operators/generated/index.adoc b/docs/modules/ROOT/pages/reference/operators/generated/index.adoc
index bcda32ca..9eab83e0 100644
--- a/docs/modules/ROOT/pages/reference/operators/generated/index.adoc
+++ b/docs/modules/ROOT/pages/reference/operators/generated/index.adoc
@@ -1,6 +1,6 @@
 = AI-NET Operators Reference
 
-Generated from version `0.18.0` on 2026-04-15
+Generated from version `0.19.0` on 2026-04-15
 
 == Operators by Modality
 
diff --git a/docs/modules/ROOT/pages/reference/ops-status-matrix.adoc b/docs/modules/ROOT/pages/reference/ops-status-matrix.adoc
index 5d93a07b..46ace7be 100644
--- a/docs/modules/ROOT/pages/reference/ops-status-matrix.adoc
+++ b/docs/modules/ROOT/pages/reference/ops-status-matrix.adoc
@@ -1,7 +1,7 @@
 = Operator Coverage Matrix
 :description: Cross-backend status for every operator function in SKaiNET.
 
-Generated from `operators.json` version `0.18.0` on 2026-04-15.
+Generated from `operators.json` version `0.19.0` on 2026-04-15.
 
 Rows are `Operator.function` pairs; columns are backends that appear in any function's `statusByBackend` map.
 A missing entry means the backend makes no claim about the function — treat it as "unknown", not "not supported".
 
diff --git a/gradle.properties b/gradle.properties
index 66cf78d4..1a0becc4 100644
--- a/gradle.properties
+++ b/gradle.properties
@@ -1,5 +1,5 @@
 GROUP=sk.ainet.core
-VERSION_NAME=0.18.0
+VERSION_NAME=0.19.0
 POM_DESCRIPTION=SKaiNET
 POM_URL=https://github.com/SKaiNET-developers/skainet/
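The "GGUF UInt Fields Silently Dropped" fix in the changelog above is easy to reproduce in isolation: Kotlin's unsigned types are value classes, so a boxed `UInt` is not a `kotlin.Number`, and `as? Number` silently yields null. A minimal sketch of the idea behind `toIntFlexible` (the branch set here is an assumption for illustration, not the shipped implementation):

```kotlin
// Sketch of the toIntFlexible idea. Kotlin's unsigned types (UInt, ULong,
// UShort, UByte) are value classes, so a boxed unsigned value is NOT a
// kotlin.Number and needs explicit branches.
fun toIntFlexible(value: Any?): Int? = when (value) {
    is UInt -> value.toInt()
    is ULong -> value.toInt()
    is UShort -> value.toInt()
    is UByte -> value.toInt()
    is Number -> value.toInt()   // Int, Long, Short, Byte, Float, Double
    else -> null
}

fun main() {
    val bosTokenId: Any = 1u               // a GGUF UINT32 field arrives as kotlin.UInt
    check(bosTokenId as? Number == null)   // the old cast silently dropped the id
    check(toIntFlexible(bosTokenId) == 1)  // the flexible helper recovers it
    check(toIntFlexible(42L) == 42)        // signed types still work
    println("ok")
}
```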
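The changelog stresses that SentencePiece BPE is score-priority while GPT-2 BPE is merge-rank. A toy illustration (invented vocabulary and scores, not SKaiNET's API) of how the two selection rules can produce different segmentations for the same input; for a shared skeleton, piece scores are reduced to integer ranks here, whereas real SPM compares float scores directly:

```kotlin
// Greedy pairwise BPE: repeatedly merge the adjacent pair with the best
// (lowest) priority until no mergeable pair remains. Only the priority
// function differs between the two rules.
fun bpe(word: String, priority: (String) -> Int?): List<String> {
    val parts = word.map { it.toString() }.toMutableList()
    while (true) {
        var best = -1
        var bestPriority = Int.MAX_VALUE
        for (i in 0 until parts.size - 1) {
            val p = priority(parts[i] + parts[i + 1]) ?: continue
            if (p < bestPriority) { bestPriority = p; best = i }
        }
        if (best < 0) return parts
        parts[best] = parts[best] + parts.removeAt(best + 1)
    }
}

fun main() {
    // Merge-rank rule (GPT-2): priority = position in the merges list.
    val merges = listOf("he", "ll", "hell", "lo")
    val byRank = bpe("hello") { merges.indexOf(it).takeIf { r -> r >= 0 } }

    // Score-priority rule (SPM): higher piece score wins first.
    val scores = mapOf("lo" to -0.2, "ll" to -0.5, "he" to -1.0, "hell" to -3.0)
    val ranked = scores.entries.sortedByDescending { it.value }
        .mapIndexed { i, e -> e.key to i }.toMap()
    val byScore = bpe("hello") { ranked[it] }

    check(byRank == listOf("hell", "o"))       // "he" (rank 0) merges first
    check(byScore == listOf("he", "l", "lo"))  // "lo" (score -0.2) merges first
    println("$byRank vs $byScore")
}
```

Same merges, same scores, different winner at each step: that divergence is why dispatching a LLaMA-family vocabulary through the GPT-2 rule (or vice versa) produces wrong token IDs.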
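The flatten fix above relies on standard convolution output-size arithmetic. A generic sketch of that arithmetic (signature assumed for illustration; not the actual `ConvShapeUtils` API):

```kotlin
// Standard conv output-size formula, per spatial dimension:
// out = (in + 2*pad - dilation*(kernel - 1) - 1) / stride + 1
fun convOutDim(inDim: Int, kernel: Int, stride: Int = 1, padding: Int = 0, dilation: Int = 1): Int =
    (inDim + 2 * padding - dilation * (kernel - 1) - 1) / stride + 1

fun main() {
    // The changelog's example: a 64-channel CNN over 32x32 inputs.
    val h = convOutDim(32, 3)      // 3x3 conv, stride 1, no padding -> 30
    val w = convOutDim(32, 3)
    val flattened = 64 * h * w     // per-sample flatten size, not 1568
    check(h == 30 && flattened == 57600)

    // The hardcoded 1568 factors as e.g. 32 channels over a 7x7 map:
    check(32 * 7 * 7 == 1568)
    println("flatten = $flattened")
}
```

Tracking this per-sample shape through each layer is what lets `flatten()` compute its output size instead of assuming the MNIST value.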