- Broken POM for `skainet-backend-cpu`: The 0.19.0 POM for `sk.ainet.core:skainet-backend-cpu-*` declared a runtime dependency on `sk.ainet:skainet-backend-api-jvm:unspecified` — wrong group coordinate and no valid version, because `skainet-backend-api` was not configured to publish and the root `allprojects { group = "sk.ainet" }` disagreed with the `GROUP=sk.ainet.core` used by vanniktech's maven publish plugin. Consumers pulling 0.19.0 hit unresolved-dependency errors. Fixed by:
  - Applying `vanniktech.mavenPublish` and setting `POM_ARTIFACT_ID=skainet-backend-api` on `skainet-backend-api` so it is actually published alongside the BOM entry that already referenced it.
  - Aligning `allprojects { group = "sk.ainet.core" }` with the `GROUP` property and pinning `version` from `VERSION_NAME` so `project(...)` coordinates in generated POMs are consistent.
- CI guard: New `verify-published-poms` job publishes to the local Maven repository and fails the build if any generated `.pom` contains `<version>unspecified</version>` or references a project-local group outside `sk.ainet.core`, preventing a regression of this class of coordinate bug.
- Qwen / GPT-2 Byte-Level BPE Tokenizer: `QwenByteLevelBpeTokenizer` implements the full GPT-2-style pipeline — byte-to-unicode mapping, GPT-2 pretokenization regex, merge-rank BPE, and atomic special-token splitting. Builds from either GGUF metadata (`fromGgufFields`) or a HuggingFace `tokenizer.json` (`fromTokenizerJson`). Verified against Qwen2.5-0.5B reference token IDs from HuggingFace `transformers`. (#463)
- LLaMA / SentencePiece Tokenizer: `SentencePieceTokenizer` implements the llama.cpp SPM pipeline — whitespace escape (▁), code-point symbol split, score-priority BPE (the SPM rule, opposite of the merge-rank rule used for GPT-2 BPE), and `<0xNN>` byte fallback for unknown characters. Builds from GGUF (`tokenizer.ggml.model == "llama"`) and HuggingFace `tokenizer.json` (`model.type == "Unigram"`). Verified against TinyLlama-1.1B reference token IDs from HuggingFace `transformers`. (#464)
- `TokenizerFactory` with Per-Architecture Dispatch: Tokenizer selection is now per-architecture, not per file format. `TokenizerFactory.fromGguf(fields)` and `.fromTokenizerJson(json)` inspect `tokenizer.ggml.model` / `model.type` and dispatch to the right implementation — Qwen/GPT-2 → byte-level BPE, LLaMA/Gemma/TinyLlama → SentencePiece — regardless of whether weights come from GGUF or SafeTensors. (#463)
- `TokenizerInterface`: Common surface implemented by `TekkenTokenizer`, `QwenByteLevelBpeTokenizer`, and `SentencePieceTokenizer` (`encode`, `decode`, `vocabSize`, `bosTokenId`, `eosTokenId`).
- GGUF Tokenizer Metadata: `GgufModelMetadata` now exposes `tokenizerModel`, `tokenizerTokens`, `tokenizerMerges`, `tokenizerTokenTypes`, `bosTokenId`, and `eosTokenId` so callers can build a tokenizer without re-parsing the raw field map.
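The byte-to-unicode step mentioned above is the standard GPT-2 trick: every raw byte is remapped to a printable character so BPE merges can operate on lossless, visible strings. A minimal standalone sketch of that mapping (illustrative Java, not the library's Kotlin implementation):

```java
import java.util.HashMap;
import java.util.Map;

public class ByteToUnicode {
    // GPT-2-style byte-to-unicode table: printable bytes map to
    // themselves; the rest are remapped above U+0100 in order, so
    // every byte value has a distinct printable character.
    public static Map<Integer, Character> build() {
        Map<Integer, Character> table = new HashMap<>();
        int offset = 0;
        for (int b = 0; b < 256; b++) {
            boolean printable =
                (b >= '!' && b <= '~') || (b >= 0xA1 && b <= 0xAC) || (b >= 0xAE && b <= 0xFF);
            if (printable) {
                table.put(b, (char) b);
            } else {
                table.put(b, (char) (256 + offset++)); // e.g. byte 0 -> U+0100
            }
        }
        return table;
    }

    public static void main(String[] args) {
        Map<Integer, Character> t = build();
        System.out.println(t.size());          // 256
        System.out.println(t.get((int) 'A'));  // A
        System.out.println((int) t.get(0));    // 256
    }
}
```

Because the mapping is a bijection over all 256 byte values, decoding simply inverts the table and reassembles the original UTF-8 bytes.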
- Whisper Encoder E2E: Whisper encoder now compiles end-to-end via SKaiNET → StableHLO → IREE.
- Real StableHLO Lowerings: `softmax`, `layerNorm`, and `rmsnorm` now lower to real StableHLO ops (reductions, `broadcast_in_dim`, standard ops) instead of `custom_call` stubs. (#467, #479, #480)
- New Op Converters: `gather`/`embedding`, and `concat`/`slice`/`cast` StableHLO converters. (#483, #489)
- Activation Alias: `silu`/`SiLU` registered as an alias for `swish` in `ActivationOperationsConverter`. (#484)
- `ConstantMaterializationPolicy`: Seam for externalizing large weight tensors out of the StableHLO module (enables `.irpa` externalization). (#524)
- Splat Constant Folding: Uniform-value tensor constants collapsed to `dense<v>` splat instead of fully materialized arrays. (#522)
- SSA Value Type Tracking: Tracks SSA value types so `reshape` emits the operand's declared type, producing valid MLIR. (#521)
- Tensor Encoding in Output: `tensor_encoding` comments in StableHLO output and a top-level `skainet.tensor_encodings` module attribute. (#473, #477)
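For readers unfamiliar with what a "real" softmax lowering expands to, the op sequence (max-reduce, broadcast subtract, exponential, sum-reduce, divide) can be sketched on a plain array. This is an illustrative analogue of the StableHLO decomposition, not the converter code itself:

```java
public class SoftmaxLowering {
    // The op sequence a softmax lowering expresses:
    // reduce(max) -> broadcast subtract -> exponential -> reduce(add) -> divide.
    // The max subtraction is what makes the decomposition numerically stable.
    public static float[] softmax(float[] x) {
        float max = Float.NEGATIVE_INFINITY;
        for (float v : x) max = Math.max(max, v);       // stablehlo.reduce (max)
        float sum = 0f;
        float[] e = new float[x.length];
        for (int i = 0; i < x.length; i++) {
            e[i] = (float) Math.exp(x[i] - max);        // subtract + exponential
            sum += e[i];                                // stablehlo.reduce (add)
        }
        for (int i = 0; i < x.length; i++) e[i] /= sum; // divide
        return e;
    }

    public static void main(String[] args) {
        float[] p = softmax(new float[] {1f, 1f, 1f, 1f});
        System.out.println(p[0]); // 0.25
    }
}
```

In the emitted MLIR, the broadcasts of the reduced max and sum back to the operand shape are where `broadcast_in_dim` appears.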
- `skainet-io-iree-params` Module: New module with `IrpaWriter` for writing IREE Parameter Archive (`.irpa`) files. Accepts `FileBacked` handles via mmap on JVM / Android for zero-copy weight export. (#523, #525, #528, #529)
- `skainet-backend-api` Module: New module cleanly separating backend contracts; CPU backend now depends on it. (#468)
- `TensorEncodingMetadata`: Accessor for `TensorSpec.metadata` and propagation through `TraceToGraphBuilder.finalize`, keeping quantization encoding visible end-to-end. (#469)
- Annotated `StableHloConverterFactory` and `TokenizerFactory` for idiomatic Java call sites. (#400)
- Renamed `TensorSpecEncoding.kt` class for Java callers. (#400)
- Added `skainet-backend-api` to the BOM. (#400)
- New `ReleaseApiJavaTest` covering the 0.19.0 Java surface. (#400)
- Antora + Diátaxis: Migrated docs to Antora with Divio / Diátaxis layout (tutorials, how-tos, reference, explanation). (#494)
- `skainet-docs-ui` v1.1.1: Adopted the new theme with Diátaxis card-grid landing page. (#501)
- Operator Coverage Matrix: Emit cross-backend Operator Coverage Matrix generated from `TensorOps` surface scan. (#494, #511)
- Ops Docs: KDoc `@param` extraction, real version stamps, LaTeX rendering, fixed partials, and dropped void backend. (#511, #513)
- Dokka API Bundle: Wired into the Antora site build. (#494)
- Local Mermaid: Drop kroki, render Mermaid locally via `mmdc`. (#496)
- `androidNativeArm32`: Added across core modules. (#503)
- Byte-Level BPE Broken for Qwen/GPT-2 Models: Previously there was no GPT-2-style byte-level BPE tokenizer in the repo, and `GgufModelMetadata` ignored `tokenizer.ggml.merges` entirely — so any Qwen / GPT-2 / Mistral-Nemo model encoded text into garbage tokens (byte-level chars instead of merged vocab IDs), blocking chat mode and tool calling. The new `QwenByteLevelBpeTokenizer` + `TokenizerFactory` dispatch fix the issue for both GGUF and SafeTensors sources. (#463)
- No SentencePiece Path for LLaMA-Family GGUF Models: `TokenizerFactory` previously threw `UnsupportedTokenizerException` for `tokenizer.ggml.model == "llama"`, leaving LLaMA / TinyLlama / Gemma / Mistral-v0.1 GGUFs untokenizable. The new `SentencePieceTokenizer` closes that gap. (#464)
- GGUF UInt Fields Silently Dropped: GGUF UINT32 fields (e.g. `tokenizer.ggml.bos_token_id`) arrive from `StreamingGGUFReader` as `kotlin.UInt`, which is a value class — not a subclass of `kotlin.Number` — so a plain `as? Number` cast was returning null. The new `toIntFlexible` helper handles every signed and unsigned numeric type GGUF can produce, restoring the BOS/EOS/UNK ids on the tokenizer builders.
- Graph Conv Output Shape Inference: `conv1d`/`conv2d`/`conv3d` operations in graph inference previously produced placeholder output shapes, breaking downstream shape-dependent passes. Graph ops now compute real output shapes. (#536, #537)
- Conv1d/Conv3d Not Recorded: `conv1d` and `conv3d` were not routed through the recording decorator, so they disappeared from traced computation graphs. (#532, #533)
- Static Conv1d HLO Shape Crash: Conv1d StableHLO lowering crashed when trace attributes were missing; now falls back to `TensorRef` shape / dtype. (#530, #531)
- Flatten Hardcoded to MNIST Shape: `NetworkBuilder.flatten()` returned a hardcoded `lastDimension = 1568` (the MNIST CNN value); any other architecture — e.g. a 64-channel CNN over 32×32 inputs — crashed with `ArrayIndexOutOfBoundsException` in the following `dense()` layer. The DSL now tracks per-sample shape through a new `input(IntArray)` overload, `conv1d`/`conv2d`/`conv3d`, `maxPool2d`, `avgPool2d`, and `upsample2d`, reusing the `ConvShapeUtils` arithmetic introduced in #537; `flatten()` reads the tracked shape and honors `startDim`/`endDim`, and `Conv*` layers can auto-infer `inChannels` from the declared input. (#535, #538)
- StableHLO `transpose`/`dot_general` MLIR Emission: Fixed malformed MLIR produced by `stablehlo.transpose` and `stablehlo.dot_general` that blocked IREE compilation. (#520)
- WasmJS / JS / Native Compile: Replaced JVM-only `putIfAbsent` with a common-stdlib idiom. (#485)
- Antora Container: `HOME=/tmp` so Chromium crashpad can launch during Mermaid rendering in CI. (#534)
- `bundleDokkaIntoSite` CI Permission Failure: Fixed docs pipeline permission error. (#496)
- Pandoc Artifacts in Docs: Stripped pandoc anchors and demoted heading levels in migrated pages. (#496)
- `compile-hlo` Dependencies: Dropped vestigial `skainet-backend-cpu` dependency from `compile-hlo` jvmMain. (#472)
- Moved LLM Docs: Replaced relocated LLM pages with redirect stubs pointing at the standalone repo. (#499)
- Maven Group / Version Refs: Bumped stale version references and fixed Maven group coordinates. (#499)
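The shape tracking described for `flatten()` rests on standard convolution output arithmetic, computed per spatial dimension. A hedged sketch of that formula (illustrative Java; the actual helper named in the changelog is `ConvShapeUtils`, in Kotlin):

```java
public class ConvShape {
    // Standard convolution output-length arithmetic per spatial dimension:
    // out = floor((in + 2*pad - dilation*(kernel - 1) - 1) / stride) + 1
    public static int convOut(int in, int kernel, int stride, int pad, int dilation) {
        return (in + 2 * pad - dilation * (kernel - 1) - 1) / stride + 1;
    }

    public static void main(String[] args) {
        // A 3x3 conv with stride 1 and no padding shrinks 32 -> 30, so a
        // shape-tracking flatten() can derive 64 * 30 * 30 for a 64-channel
        // CNN over 32x32 inputs instead of using a hardcoded value.
        System.out.println(convOut(32, 3, 1, 0, 1)); // 30
    }
}
```

Pooling layers use the same formula with their own kernel/stride, which is why the DSL can thread one helper through `conv*`, `maxPool2d`, and `avgPool2d`.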
- Stale `TURBOQUANT_ISSUES.md` tracker at the repo root. (#490)
- agp: 9.1.0 → 9.1.1.
- com.networknt:json-schema-validator: 3.0.1 → 3.0.2.
- org.jetbrains.kotlinx:kotlinx-serialization-json: bumped to 1.11.0.
- actions/checkout: 4 → 6.
- actions/upload-pages-artifact: 3 → 5.
- actions/cache: 4 → 5.
- actions/setup-java: 4 → 5.
- actions/deploy-pages: 4 → 5.
- actions/github-script: 8 → 9.
- docker/build-push-action: 5 → 7.
- docker/setup-buildx-action: 3 → 4.
- TurboQuant KV-Cache Compression: Runtime KV-cache compression for LLM inference using rotation-based quantization (Google Research TurboQuant paper). Supports PolarOnly and PolarPlusQjl variants with 2/3/4/8-bit encoding.
  - `TurboQuantCodec`: End-to-end encode/decode pipeline (random rotation, scalar quantization, QJL residual, bit-packing).
  - `TurboQuantKvCacheStore`: Compressed KV cache with per-head TurboQuant blocks and asymmetric K/V policies.
  - `TurboQuantPresets`: Named presets — `safe-lowbit` (Q8_0-K + TQ4-V), `balanced` (TQ4/TQ4), `experimental-max` (TQ3/TQ3).
  - `KvCacheStore.turboQuant("balanced", ...)`: One-line factory for skainet-transformers integration.
  - `CompressedKvAttention`: SDPA bridge with FULL_TILE and RAW_STORAGE dequant strategies.
  - `@KvCache` and `@KvCacheBypass` DSL annotations for declarative KV cache configuration.
  - `KvCacheAnnotationResolver`: Resolve annotations to cache instances.
  - `TurboQuantUsage`: Documented integration guide with compilable examples.
- Memory Architecture Hardening: First-class storage and placement abstractions for zero-copy, quantization-preserving tensor management.
  - `TensorStorage`: Runtime descriptor replacing ad-hoc array passing (logical type, physical encoding, buffer ownership, placement).
  - `TensorEncoding`: Sealed hierarchy — `Dense`, `Q4_K`, `Q8_0`, `TernaryPacked`, `TurboQuantPolar`, `TurboQuantPolarQjl`, `Opaque`.
  - `BufferHandle`: Five ownership modes — `Owned`, `Borrowed`, `Aliased`, `FileBacked`, `DeviceResident`.
  - `Placement`: Device/memory-domain intent with fallback policies (`CPU_HEAP`, `MMAP_WEIGHTS`, `GPU_PREFERRED`).
  - `LogicalDType`: Semantic numeric types separate from physical encoding.
  - `PackedBlockStorage`: Unified contract for all packed quantized formats.
  - `MemoryPlanner`, `MemoryTracker`, `ActiveMemoryTracker`: Placement resolution and copy diagnostics.
- KV-Cache Subsystem: `KvCacheStore` interface with append-by-token writes, layer/head addressing, eviction, and `DefaultKvCacheStore` (dense FP32 baseline).
- Quantization-Preserving Loaders: `StreamingGGUFReader` and `StreamingSafeTensorsReader` produce `TensorStorage` with `FileBacked` or `Borrowed` handles (no forced densification).
- `StorageAwareSafeTensorsLoader`: Zero-copy file-backed SafeTensors loading.
- Completed `Quants.kt` port: `byteShapeToQuantShape`, `quantByteSize`, `isBlockQuantized`, `validateQuantizedBytes`.
- Tekken Tokenizer: Mistral Tekken (tiktoken-based BPE) tokenizer support.
- CPU SIMD TurboQuant Kernels: `JvmTurboQuantKernels` with Java Vector API acceleration for abs-max, quantize, dequantize, and Walsh-Hadamard butterfly.
- JMH Benchmarks: TurboQuant encode/decode throughput, bit-packing, rotation, and KV cache append/read benchmarks (`TurboQuantBenchmarks.kt`).
- Storage Benchmarks: Dequantization throughput (Q4_K, Q8_0, Ternary), buffer accessor, and TensorData bridge benchmarks (`StorageBenchmarks.kt`).
- New Ops: `sin`, `cos`, `tanh`, `convTranspose1d`.
- New Layers: `TransposedConv1d`, `Snake` activation, `LayerScale`.
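The abs-max / quantize / dequantize kernels mentioned above accelerate a simple scalar recipe: find the largest magnitude, derive a scale, round each value to a narrow integer, and multiply back. An illustrative 8-bit round trip (plain Java sketch, not the SIMD kernel code):

```java
public class AbsMaxQuant {
    // Abs-max symmetric quantization round trip:
    // scale = absmax / 127, q = round(x / scale), x' = q * scale.
    public static float[] roundTrip(float[] x) {
        float absMax = 0f;
        for (float v : x) absMax = Math.max(absMax, Math.abs(v)); // abs-max pass
        float scale = absMax == 0f ? 1f : absMax / 127f;
        float[] out = new float[x.length];
        for (int i = 0; i < x.length; i++) {
            byte q = (byte) Math.round(x[i] / scale);  // quantize to int8
            out[i] = q * scale;                        // dequantize
        }
        return out;
    }

    public static void main(String[] args) {
        float[] r = roundTrip(new float[] {0.5f, -1.0f, 0.25f});
        System.out.println(r[1]); // close to -1.0f (the abs-max element)
    }
}
```

Each of the three loops above is independently data-parallel, which is what makes the Vector API a natural fit for the kernel versions.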
- Streaming GGUF as Default: `StreamingGGUFReader` is now the recommended GGUF loading path (memory-efficient, supports quantized types).
- DSL Annotations: Extended `PlacementAnnotations.kt` with `@KvCache(preset=...)` and `@KvCacheBypass` for TurboQuant configuration.
- Int Overflow for Large Tensors: Widened `StreamingTensorInfo.nBytes` and `StreamingSafeTensorInfo.sizeInBytes` from `Int` to `Long`, preventing silent overflow for tensors > 2 GB. Fixes loading of Gemma 4 E4B and future large models. (#452)
- Legacy GGUFReader Overflow Guard: Added explicit overflow check with actionable error message for tensors > 2 GB in the legacy eager loader.
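This class of overflow is easy to reproduce: byte counts for multi-billion-parameter tensors exceed the 32-bit integer range (about 2.1e9). An illustrative sketch of why the size fields had to become 64-bit:

```java
public class TensorByteSize {
    // Why nBytes must be a long: e.g. 3e9 fp16 elements occupy 6e9 bytes,
    // which does not fit in a 32-bit int.
    public static long nBytes(long elementCount, int bytesPerElement) {
        return elementCount * bytesPerElement; // long math, no silent wrap
    }

    public static void main(String[] args) {
        long big = nBytes(3_000_000_000L, 2);
        System.out.println(big);       // 6000000000
        System.out.println((int) big); // wrapped value, no longer the true size
    }
}
```

With an `int` field the multiplication would already have wrapped before the value was ever read, which is why the failure was silent rather than an exception.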
- io.github.kotest:kotest: 6.1.9 → 6.1.11.
- com.squareup:kotlinpoet: 2.2.0 → 2.3.0.
- Core Engine Focus: Refactored the repository to focus on the core `ComputeGraph` framework, compiler, and backends.
- Standalone Ecosystem: Extracted high-level LLM and transformer implementations to dedicated repositories (SKaiNET-LLM and SKaiNET-transformers).
- LLM-as-DSL: High-level DSL for defining and running LLM architectures within the core `ComputeGraph` framework.
- ComputeGraphExecutor: New optimized executor with support for fusion passes and trace-to-DAG bridging.
- SDPA & Gather: Implementation of Scaled Dot-Product Attention (SDPA) and `gather`/`indexSelect` ops across backends.
- EmbeddingAdapter: Streamlined embedding layer integration for transformer models.
- Optimized LLM execution: Integrated fusion passes for faster inference on supported backends.
- Improved Tensor API: Refined `Tensor` interface and updated `ComputeGraphExecutor` for better type safety and performance.
- Dependency Cleanups: Removed stale references to LLM and transformer code already moved to the standalone `skainet-transformers` repository.
- Embedding Padding: Fixed `paddingIdx` handling in embedding layers.
- Concatenation: Resolved rank-specific issues in tensor concatenation (rank > 1).
- Compilation: Fixed various build and compilation errors after module migrations.
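As background on the `paddingIdx` fix: the padding id's embedding row is conventionally treated as a zero vector that contributes nothing to the output (and receives no gradient updates during training). A deliberately simplified lookup illustrating that convention (Java sketch, not the library code):

```java
public class Embedding {
    // Simplified paddingIdx semantics: the padding id resolves to an
    // all-zero vector instead of a learned row. (Frameworks typically
    // zero-initialize the row and skip its gradient; returning zeros
    // directly is an illustrative shortcut.)
    public static float[] lookup(float[][] table, int id, int paddingIdx) {
        if (id == paddingIdx) return new float[table[0].length]; // zero vector
        return table[id];
    }

    public static void main(String[] args) {
        float[][] table = {{1f, 2f}, {3f, 4f}};
        System.out.println(lookup(table, 0, 0)[0]); // 0.0 — padding id masked
        System.out.println(lookup(table, 1, 0)[1]); // 4.0
    }
}
```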
- Deduplicated LLM infrastructure: unified `KvCache`, `softmax`, `RoPE`, and `sampling` logic across modules for improved maintainability.
- Updated skainet-bom: Refactored the Bill of Materials (BOM) to use local `project()` references for better build consistency.
- LLM Module Extraction: Extracted and moved core LLM modules to the standalone SKaiNET-LLM repository to reduce core codebase footprint.
- Transformer Code Cleanup: Removed redundant code that has been moved to the SKaiNET-transformers repository.
- Dependency Graph: Resolved inverted dependency issues in the LLM infrastructure.
- System Prompt Support (Java): Added `systemPrompt` support to `KLlamaJava` and `KLlamaSession` for prepending system instructions to conversations.
- Model Module Extraction: Extracted model-specific code into dedicated `skainet-models` modules for better separation of concerns and maintainability.
- Enhanced Smoke Tests: Refactored `smoke-test.sh` to support multiple runners via JSON configuration and improved LLM loading verification.
- Whisper HLO Generation: Fixed StableHLO MLIR generation for Whisper models.
- Compilation: Fixed various Kotlin/JVM compilation errors.
- First-Class Java 21+ Support: Complete Java API surface with `SKaiNET` entry point, `TensorJavaOps`, builder-pattern model definition (`SequentialModelBuilder`), `KLlamaJava`/`KBertJava` facades, `JavaAgentLoop` for tool-calling agents, and `TrainingLoop` builder.
- Maven BOM: New `sk.ainet:skainet-bom` artifact for one-line version management across all modules.
- Java Documentation: Added Getting Started, LLM Inference, and Model Training guides.
- Java 25 Performance Documentation: Added documentation for JVM CPU backend performance advantages.
- WasmWasi Target: Added `wasmWasi` target support across all KMP modules.
- StableHLO MLIR Streaming API: New `HloGenerator` public API with generic Model + Tensor interface and streaming MLIR output.
- ReductionOperationsConverter: Added support for reduction operations in StableHLO export.
- JVM Performance (Jlama Techniques): MemorySegment-based tensors, SIMD GEMM kernels, paged KV cache, batch attention for prompt prefill, fused QKV projections, and cached quantized weights.
- Native RandomAccessSource: POSIX `pread()`-based source for memory-efficient GGUF parsing.
- MemorySegment Weight Conversion: New `NATIVE_OPTIMIZED` quant policy and `MemSegWeightConverter` pipeline with Arena lifecycle management.
- Lazy Transpose: Added lazy transpose for Q4/Q8 MemorySegment tensors and MemSeg FP32 transpose.
- Java CLI App: New Java-based KLlama CLI application.
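A paged KV cache, as mentioned in the JVM performance item above, replaces one large contiguous per-sequence buffer with fixed-size pages addressed by a `(page, offset)` pair, so memory grows incrementally with sequence length. The addressing arithmetic as an illustrative sketch (the page size is an assumed value, not the library's):

```java
public class PagedKvCache {
    // Paged KV-cache addressing: token position -> (page, offset).
    // Pages are allocated on demand, so a short sequence never pays
    // for a max-context-length buffer.
    static final int PAGE_SIZE = 128; // tokens per page (assumed for illustration)

    public static int page(int position)   { return position / PAGE_SIZE; }
    public static int offset(int position) { return position % PAGE_SIZE; }

    public static void main(String[] args) {
        System.out.println(page(300));   // 2
        System.out.println(offset(300)); // 44
    }
}
```

The same indirection is what allows batch attention during prefill to walk whole pages at a time instead of single-token entries.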
- Android KMP Plugin Migration: Migrated Android subprojects to `androidMultiplatformLibrary` plugin for AGP 9 compatibility.
- Refactored Model Loading: Extracted shared dequantization, registry, tensor naming, and decoder runtime into reusable components.
- JDK Requirement Relaxed: Allow JDK >= 21 instead of requiring exactly JDK 21.
- Gradle Upgrade: Updated to Gradle 9.3.1.
- Kotlin Upgrade: Bumped Kotlin from 2.2.21 to 2.3.10.
- Kotlin Compile Testing: Replaced abandoned `kotlin-compile-testing` with `kctfork` for Kotlin 2.3.0 compatibility.
- StableHLO MLIR Export: Fixed MLIR export to produce valid IREE-compilable output.
- OOM in Dequantization Benchmark: Fixed out-of-memory in `DEQUANTIZE_TO_FP32` E2E benchmark test.
- Quantized MatMul: Fixed block offset calculation in quantized matrix multiplication.
- CI Stability: Fixed AAPT2 daemon crashes and improved Android build stability.
- Documentation CI: Fixed workflow permissions for PR comments.
- Deprecated API Usage: Fixed `createTempDir()` deprecation in data-simple integration tests.
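Block offset bugs of the kind fixed in the quantized matmul item above stem from the addressing scheme of block-quantized formats: each block packs one scale plus a fixed number of narrow integer values, so element indices must be translated into block index, in-block position, and byte offset. An illustrative Q8_0-style layout sketch (34-byte blocks: a 2-byte fp16 scale followed by 32 int8 values):

```java
public class BlockOffset {
    // Block-quantized addressing: element i lives in block i / BLOCK,
    // at position i % BLOCK inside that block's data region.
    static final int BLOCK = 32;       // elements per block (Q8_0-style)
    static final int SCALE_BYTES = 2;  // fp16 scale per block

    public static int blockIndex(int i) { return i / BLOCK; }
    public static int inBlockPos(int i) { return i % BLOCK; }

    // Byte offset of element i within a row of packed blocks.
    public static int byteOffset(int i) {
        return blockIndex(i) * (SCALE_BYTES + BLOCK) + SCALE_BYTES + inBlockPos(i);
    }

    public static void main(String[] args) {
        System.out.println(byteOffset(0));  // 2 (first data byte after the scale)
        System.out.println(byteOffset(32)); // 36 (start of the second block's data)
    }
}
```

An off-by-one in any of these three terms reads a scale byte as data (or vice versa), which corrupts every subsequent value in the row — exactly the failure mode a block offset fix addresses.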
- com.gradleup.shadow: 9.3.1 → 9.3.2.
- com.fasterxml.jackson.core:jackson-databind: 2.21.0 → 2.21.1.
- ch.qos.logback:logback-classic: 1.5.27 → 1.5.32.
- io.github.kotest:kotest: 6.1.3 → 6.1.4.
- org.jetbrains.kotlinx:kotlinx-io-core: 0.8.2 → 0.9.0.
- com.vanniktech.maven.publish: → 0.36.0.
- org.jetbrains.kotlinx.kover: → 0.9.7.
- actions/setup-node: 4 → 6.
- actions/upload-artifact: 6 → 7.
- actions/download-artifact: 7 → 8.
- junit-platform-launcher added for CI test execution.
Thank you to the following contributors for their work on this release:
- Dhia Chemingui (@dhiaspaner) — Android KMP plugin migration (#385, #386)
- Tool Calling: Added support for tool calling in KLlama, including a new `skainet-kllama-agent` module.
- Gemma 3n Support: New `skainet-kgemma` module for Google's Gemma 3n E2B multimodal models.
- Extended SafeTensors Support: Added SafeTensors weight loading support for both KLlama CLI and Gemma models.
- HuggingFace Tokenizer: Initial support for HuggingFace-style tokenizers in Gemma models.
- Named Arguments: Refactored various internal APIs to use named arguments for better optional parameter support.
- System Prompt Handling: Improved system prompt formatting and handling in agentic workflows.
- BERT Support: Full support for BERT-based models with `SafeTensors` weight loading.
- kbert-cli: New CLI tool for running BERT inference, supporting text encoding and cosine similarity computation.
- WordPiece Tokenizer: Implementation of WordPiece tokenizer for BERT models.
- TinyFoA Support: Implemented missing operators (`abs`, `sign`, `clamp`, `lt`, `ge`, `narrow`, `pad2d`, `unfold`) to support the TinyFoA (AAAI 2025) training pipeline for memory-efficient on-device learning.
- Multi-platform KLlama: Added macOS target support for the KLlama runtime.
- Custom Backends Documentation: Added detailed guide and examples for injecting custom backends into KLlama.
- Improved robustness of TinyFoA operations with comprehensive unit tests.
- Benchmarking DSL: New `BenchmarkDsl` and `BenchmarkRunner` for measuring model performance and latency.
- Execution Observers: Added `ExecutionObserver` API with `LatencyExecutionObserver` and `MemorySnapshotObserver` for profiling.
- New Layers: Added `RMSNormalization` layer support.
- KLlama Enhancements: Improved weight loading and initial support for GPU-accelerated attention (experimental).
- Refactored `ExecutionContext` to support execution observers and better phase management.
- Updated KLlama runtime with improved ingestion and benchmarking utilities.
- Generative AI Section: New README section with simple code for GGUF text generation.
- Tokenizer Strategies: Automatic detection of tokenizer type (SentencePiece, BPE, WordPiece) from GGUF metadata.
- Improved Token Decoding: Support for multi-byte UTF-8 character decoding from byte tokens.
- Llama Runtime: Rewrote `matmulNoBias` for better performance and support for row-major weights.
- GGUF Loading: Improved dequantization for Q2_K, Q4_K, Q5_K, and Q6_K formats matching llama.cpp logic.
- GGUF Storage Order: Fixed critical bug with column-major storage in GGUF files by implementing proper transposition during loading.
- Llama Attention: Fixed missing attention output projection (wo) in the runtime.
- Tokenizer: Fixed BOS token handling and multi-byte character reconstruction.
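The GGUF storage-order fix above boils down to a transposition at load time: element `(r, c)` of a column-major buffer lives at index `c * rows + r`, so copying into row-major order re-addresses every element. An illustrative sketch:

```java
public class TransposeLoad {
    // Loading a column-major [rows x cols] weight into row-major order:
    // dst[r * cols + c] = src[c * rows + r].
    public static float[] toRowMajor(float[] src, int rows, int cols) {
        float[] dst = new float[rows * cols];
        for (int r = 0; r < rows; r++)
            for (int c = 0; c < cols; c++)
                dst[r * cols + c] = src[c * rows + r];
        return dst;
    }

    public static void main(String[] args) {
        // The 2x3 matrix [[1,2,3],[4,5,6]] stored column-major:
        float[] colMajor = {1, 4, 2, 5, 3, 6};
        float[] rowMajor = toRowMajor(colMajor, 2, 3);
        System.out.println(rowMajor[1]); // 2.0
    }
}
```

Skipping this step does not crash — the matmul simply multiplies against a silently transposed weight, which is why the bug surfaced as garbage output rather than an error.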
- SafeTensors Support: Initial implementation of `skainet-io-safetensors` for reading SafeTensors format.
- Generalized I/O & Weight Mapping:
  - New `WeightMapper` and `WeightLoader` APIs for unified model parameter loading across formats.
  - `LoadingProgress` API for tracking model loading state.
  - `GgufModelMetadata` and `OnnxModelMetadata` for better inspection of model files.
- JVM Performance: Enhanced `DefaultCpuOpsJvm` with `JvmVectorKernels` for SIMD-accelerated tensor operations using the Java Vector API.
- Llama Enhancements:
  - Added `GGUFTokenizer` for better text processing.
  - Improved `LlamaIngestion` and ingestion pipelines.
- Improved GGUF/ONNX Loading: Robust weight loading and metadata parsing for GGUF and ONNX models.
- Streamlined CLI: Removed unfinished CLI samples and reorganized `skainet-tensor-tools`.
- Documentation Cleanup: Removed outdated technical docs and consolidated architecture information.
- Improved robustness of GGUF and ONNX streaming readers.
- Fixed various issues in WASM/JS weight parsing.
- Updated version to 0.8.3.
- KLlama (Llama 2 port): Initial version ported from `llama2-kmp`, supporting GGUF models.
- GGUF Enhancements:
  - Support for `mmap` for zero-copy GGUF tensor loading.
  - Embedded tokenizer support in GGUF.
  - New quantization formats: `Q8_0`, `Q4_K`, and BitNet/Ternary support (`TQ1_0`, `TQ2_0`).
  - Improved loading and bug fixes for quantization and mapping.
  - Added `int64` support for GGUF.
  - Improved GGUF metadata loading.
- Streaming Support: Added streaming support for GGUF and ONNX models.
- Advanced Operations:
  - New activations: `LeakyReLU`, `ELU`.
  - New pooling: `AvgPool2d`.
  - New convolutions: `Conv1d`, `Conv3d`.
- Optimizers & Training:
  - Added `Adam` and `AdamW` optimizers.
  - Comprehensive loss function library.
  - New `Metric` interface with `Accuracy` implementation.
  - KSP-based DSL generator for Network activations.
- Data & Datasets:
  - Support for `CIFAR-10` and `Fashion-MNIST` datasets.
  - New `Data Transform API` and `Image Transform DSL`.
- Testing & Documentation:
  - `skainet-test-groundtruth` module for validation against PyTorch.
  - Integration tests for quantized inference and `KvCache`.
  - Shadow JAR support for JVM fat JAR builds.
  - New documentation for testing architecture with Mermaid diagrams.
- WASM/JS: Initial version of a simple WASM/JS sample.
- Simplified model support to GGUF-only (removed legacy Karpathy `.bin` format support).
- Improved KLlama loading and robustness.
- Updated roadmap with Phase 1 completion and multi-backend storage abstraction plans.
- Improved I/O system and overall robustness.
- Fixed various bugs in quantization and memory mapping.
- Resolved compilation errors and failing tests in CIFAR-10 support.
- Fixed KSP and TracingWrapperProcessor tests to match updated log messages.
- Fixed GGUF metadata loading issues.
- Initial release of 0.8.x series.
- Sine Approximation CLI (`skainet-sine-approx-cli`) as a new example application for training models.
- `TapeRecordingStrategy` to handle different recording behaviors for prediction and backpropagation.
- Comprehensive E2E tests for training sine wave approximations.
- New documentation: `autograd-basic.md` explaining the autograd engine.
- Refined `Linear`, `Flatten`, `Input` modules and `relu` activation to better support gradient tracking and context propagation.
- Improved `DefaultExecutionTape` and `DefaultGraphExecutionContext` for more robust computation tracing.
- Optimized internal `OpSink` and `TraceSession` handling.
- Infinite loop during backpropagation tracing, fixed by implementing specialized tape recording strategies.
- Context mismatch errors in backpropagation tracing.
- Broken testing in the sinus sample application.
- Initial Autograd engine (`DefaultGradientTape`) for automatic differentiation and reverse-mode gradients.
- Optimizer API with `SgdOptimizer` implementation for training neural networks.
- Loss functions module including `MSELoss` and `CrossEntropyLoss` with configurable reductions (MEAN, SUM, NONE).
- Training DSL and helper utilities for building training loops (`trainStep`, `evaluateLoss`).
- Improved Graph DSL with better context propagation and support for recording computation traces.
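The configurable reductions work as in other frameworks: NONE keeps per-element losses, SUM adds them up, and MEAN divides the sum by the element count. A scalar MSE sketch of the SUM and MEAN modes (illustrative Java, not the module's API):

```java
public class MseLoss {
    // MSE with the SUM and MEAN reduction modes described above.
    enum Reduction { MEAN, SUM }

    public static float mse(float[] pred, float[] target, Reduction red) {
        float sum = 0f;
        for (int i = 0; i < pred.length; i++) {
            float d = pred[i] - target[i];
            sum += d * d; // squared error per element (the NONE-mode values)
        }
        return red == Reduction.MEAN ? sum / pred.length : sum;
    }

    public static void main(String[] args) {
        float[] p = {1f, 2f}, t = {0f, 0f};
        System.out.println(mse(p, t, Reduction.SUM));  // 5.0
        System.out.println(mse(p, t, Reduction.MEAN)); // 2.5
    }
}
```

The reduction choice matters for training because it scales the gradients: MEAN keeps the effective learning rate independent of batch size, while SUM does not.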
- Updated dependency versions and refined internal execution context APIs to support gradient tracking.
- Refactored `skainet-compile-dag` to support autograd and graph inversion.
- StableHLO implementation and E2E CLI app for compiling models to CUDA via IREE.
- `ArduinoCodegen` for exporting models to standalone C99 code with static memory allocation, optimized for Arduino.
- KSP-based generation of `TracingOps` for automated recording pipeline updates.
- Initial implementation of `skainet-compile-hlo` for high-level optimization.
- Improved CUDA backend strategy and IREE integration.
- Optimized long-running property tests for C code generation.
- Refactored `TracingTensorOps` to use execution context for code generation.
- Common I/O abstraction with `ModelReader` and `TensorInfo` in `skainet-io-core` for unified model loading.
- Efficient memory handling with non-copying `slice` views in `MemoryChunk`.
- Unified `skainet-tensor-tools` CLI combining ONNX and GGUF utilities.
- `OnnxStatsCli` tool for analyzing ONNX model parameters and structure.
- Migrated project to `SKaiNET-developers` organization; updated repository URLs and deployment configurations.
- Standardized artifact naming in documentation (e.g., `SKaiNET-lang-core`).
- Improved `GGUFReader` with better alignment parsing and tensor data handling.
- Optimized test infrastructure: increased heap size to 8 GB for large model tests and added `ReadmeSnippetsTest` for documentation verification.
- Legacy standalone applications and tools: `skainet-KGPChat`, `skainet-mnist`, and separate ONNX/GGUF tool modules.
- ONNX import module (`skainet-io-onnx`) with pbandk-generated proto surface, loader utilities, and an importer that maps ONNX graphs into SKaiNET compute graphs, plus docs and tests.
- CLI tooling: `skainet-onnx-tools` to export ONNX initializers to JSON and `skainet-onnx-detect` CLI to run YOLO detections from ONNX weights.
- Image IO module now published with explicit API surface for bitmap <-> tensor conversions across platforms.
- BatchNorm now reshapes stats for broadcasting and exercises JVM/native tests; CPU backend implements `sqrt` to support it.
- Added pbandk runtime 0.16.0 for ONNX protobuf decoding.
- Recording/tracing pipeline for tensor ops (`RecordingExecution`/`TracingTensorOps`) and compute-graph DAG under `sk.ainet.lang.graph`, including tape-to-graph conversion and GraphViz export helpers/tests.
- JSON export proof of concept via new `skainet-compile-json` module with serialization models, `exportJson` CLI, and tiny graph golden fixtures.
- Multiplatform image IO module to convert platform bitmaps <-> tensors and RGB byte arrays; includes macOS implementation fixes.
- Dedicated YOLOv8 model module (`skainet-models:skainet-model-yolo`) with graph assembly, config/pre/post-processing, and missing upsample/concat ops required by the model.
- NN DSL additions: multi-input `Functional` wrapper, new `Upsample2d`/Softmax helpers, scalar DSL builder plus tensor/number operator overloads, and extra tensor view/pprint utilities.
- Graph DSL relocated into the lang namespace with refreshed default execution tape/graph context wiring; removed unused integration module scaffolding.
- Removed committed MNIST training assets; rely on download at runtime.
- Added scalar arithmetic support across backends and void ops to match new operator overloads.
- Corrected unsqueeze view handling and data DSL dtype reuse; stabilized tracing/JSON/tape tests.
- Fixed macOS image conversion path and cleaned duplicate files in the new IO/image pipeline.
- io.ktor client 3.3.3 (from 3.3.2).
- logback-classic 1.5.21 (from 1.5.20).
- Kolmogorov–Arnold Network (KAN/AKN) module and DSL support, including public factory and aliases for direct construction. Introduces `Akn`/`AknConfig` and `createAkn` mirroring DSL defaults.
- Example KAN models and graphs (e.g., sine function examples and a pretrained variant) with tests and Graphviz export.
- Additional NN DSL conveniences around initialization scopes (weights/basis/bias) and activation hooks used by KAN.
- Minor API refinements in lang/nn DSL to better align with execution context usage for new KAN modules.
- Stabilized integration tests for KAN modules and examples.
- Minor initialization performance tweaks for new modules.
- Updated docs and samples to include KAN usage and references.
- Initial support for model code sharing API (model definition, execution, loading). Implements #196, related to #169.
- Batch Normalization layer. Implements #193.
- Forward hooks and simple tape recording for NN. Implements #190, related to #104.
- Common traversal base for modules, with tests; Embedding implementation with dual value types; switched EEmbeddings to DualModule implementation.
- Dropout (initial implementation) and phase support (training/eval) in execution context so modules can behave differently by phase. Related to #5.
- `tril` op (initial version).
- MaxPool op with DSL support; Conv2D DSL support.
- Data API: initial version including MNIST data loader; JSON loading support (renamed loader classes from CSV to JSON) with tests. Implements #180, #181; related to #176, #179.
- GGUF model loading implementation (initial import and working version). Implements #178, #182; related to #176, #177.
- MatMul support in backends.
- Nested data blocks support in DSL (data block returns a tensor); contexts for creating and collecting tensors (returning last or all created tensors).
- JVM Ops using the Java Vector API (initial implementation) and SIMD Vector API acceleration.
- JMH benchmarks (JVM module) and additional benchmarks.
- Sample showing general tensor calculations (e.g., image color transformations).
- NN DSL refactored to use `ExecutionContext`; added `ExecutionContext` parameter to `forward` functions.
- Models and data APIs improved; unified tensor value creation in DSL; moved tensor creation context for safer vector/matrix/tensor creation.
- Default CPU compute used for JS target.
- JS and WASM Kotlin targets aligned for library packaging.
- Gradle updated to 9.0.0; Android target namespaces fixed.
- Crash in schema validation task; added Kotlin compiler plugin configuration for expect/actual.
- Activation not applied in Dense layer (fixed).
- JVM target issues; fixed failing JVM tests; added regression tests; stabilized platform matching test (temporarily ignored) and additional general test fixes.
- Miscellaneous build-signing validation added to avoid CI failures.
- SIMD/Java Vector API acceleration for JVM backend operations.
- com.vanniktech.maven.publish: 0.34.0 → 0.35.0.
- io.ktor (android, cio, content-negotiation, core, darwin, js, logging): 3.3.1 → 3.3.2.
- com.fasterxml.jackson.core:jackson-databind: 2.15.2 → 2.20.0 → 2.20.1.
- GitHub Actions: use Java 22.
- Bump actions/checkout from v4 to v5.
- Add Gradle local caches to .gitignore.
- Preparations for 0.2.0 release and ability to build local Maven version of the upcoming release.
- Added hint/reference on normalization layer paper. Related to #192.
- Initial public release of SKaiNET 0.1.0.