From 4a7456241a2b3d16ffcc6c7c367e99b7ddc5c459 Mon Sep 17 00:00:00 2001 From: Michal Harakal Date: Mon, 13 Apr 2026 16:02:38 +0200 Subject: [PATCH 1/4] Define component attributes for operator-design article (#496 step 1) Adds an `asciidoc.attributes` block to `docs/antora.yml` defining the four attributes `operator-design.adoc` references but nothing declares: framework_name = SKaiNET ksp_version = 2.2.21-2.0.5 dokka_version = 2.1.0 asciidoctorj_version = 3.0.0 Antora treats component-level attributes as defaults for every page in the component, so the eight `{FRAMEWORK_NAME}` / `{KSP_VERSION}` / `{DOKKA_VERSION}` / `{ASCIIDOCTORJ_VERSION}` references across lines 1, 8, 30, 78, 176, 177, 178, 215 of `operator-design.adoc` now resolve to real values instead of falling back to the literal attribute-name placeholder and producing a warning. Net warning count dropped from 13 to 7. The remaining 7 are the six pandoc section-level artifacts in `skainet-for-ai.adoc` and `arduino-c-codegen.adoc` (commit 2) plus the kroki mermaid 400 on the large `hlo-getting-started.adoc` diagram (commit 4). First step of the Antora migration polish pass. See #496. Co-Authored-By: Claude Opus 4.6 (1M context) --- docs/antora.yml | 12 ++++++++++++ 1 file changed, 12 insertions(+) diff --git a/docs/antora.yml b/docs/antora.yml index 05bf9566..947dcb18 100644 --- a/docs/antora.yml +++ b/docs/antora.yml @@ -3,3 +3,15 @@ title: SKaiNET version: ~ nav: - modules/ROOT/nav.adoc + +# Component-level attributes flow to every page. Defined here so the +# operator-design article (and any future page) can reference them +# without each page declaring its own attributes block. If you need +# to override a value on a per-page basis, declare it above the +# first section heading on that page. 
+asciidoc: + attributes: + framework_name: SKaiNET + ksp_version: 2.2.21-2.0.5 + dokka_version: 2.1.0 + asciidoctorj_version: 3.0.0 From 7a459bf33814a0713348560e9f9b1b08f88bf406 Mon Sep 17 00:00:00 2001 From: Michal Harakal Date: Mon, 13 Apr 2026 16:17:33 +0200 Subject: [PATCH 2/4] Strip pandoc anchors + promote pandoc heading levels (#496 step 2) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Clears 6 section-title-out-of-sequence warnings across `skainet-for-ai.adoc` (5 occurrences) and `arduino-c-codegen.adoc` (1 occurrence) that were left over from the pandoc markdown -> asciidoc conversion in #494. Two interacting issues: 1. Pandoc generated 20 anchor lines of the form `[[1-tape-based-tracing]]`, `[[2-type-safe-tensor-creation-dsl]]` etc. These are standalone block anchors sitting ABOVE their section heading. In this position Asciidoctor treats them as bibliographic block markers that bind to the next block — which prevents the following `==` / `===` from registering as a section-opening heading, so the parser's section-level counter drifts and every subsequent nested heading trips the "expected level N, got level N+1" validator. The anchors are all the auto-generated slug form of the heading text they precede. Asciidoctor auto-generates equivalent id-from-title anchors for every heading. Deleting these 20 anchors sacrifices nothing — the id format is the same, the #fragment URLs stay stable. 2. Pandoc converts markdown `#` to asciidoc `==` rather than the more idiomatic `=` (page title). That made every converted page "off by one" with no level-0 title. Promoting every heading by one level (removing one `=`) fixes this: the page now starts with `= Title` and section levels cascade naturally from there. Applied via `sed -E -i '' 's/^=(=+ )/\1/'` on the two affected files — matches `^=` followed by one-or-more additional `=` followed by a space, preserves block delimiters like a bare `====$` that aren't headings. 
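As a quick sanity check, the same expression can be exercised on a pipe (the `-i ''` in-place flag dropped; the sample lines are hypothetical but mirror the pandoc output shape):

```shell
# Each heading loses exactly one '='; the bare '====' block delimiter
# is untouched because the regex requires a space after the run of '='.
demoted=$(printf '== Page Title\n=== Section\n====\nexample block\n====\n' \
  | sed -E 's/^=(=+ )/\1/')
printf '%s\n' "$demoted"
# -> = Page Title
#    == Section
#    ====
#    example block
#    ====
```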
Applied only to files that were flagged; the rest of the migration's converted files had clean hierarchies already. Net: warning count drops from 7 to 1. The remaining warning is the kroki mermaid 400 on the large diagram in `hlo-getting-started.adoc` which commit 4 will handle. Second step of the Antora migration polish pass. See #496. Co-Authored-By: Claude Opus 4.6 (1M context) --- .../explanation/perf/java-25-cpu-backend.adoc | 1 - .../pages/explanation/skainet-for-ai.adoc | 31 +++++++------------ .../ROOT/pages/how-to/arduino-c-codegen.adoc | 27 +++++++--------- .../pages/how-to/java-model-training.adoc | 2 -- .../pages/tutorials/hlo-getting-started.adoc | 3 -- .../pages/tutorials/java-getting-started.adoc | 2 -- 6 files changed, 23 insertions(+), 43 deletions(-) diff --git a/docs/modules/ROOT/pages/explanation/perf/java-25-cpu-backend.adoc b/docs/modules/ROOT/pages/explanation/perf/java-25-cpu-backend.adoc index 2b74c01c..c3183e2d 100644 --- a/docs/modules/ROOT/pages/explanation/perf/java-25-cpu-backend.adoc +++ b/docs/modules/ROOT/pages/explanation/perf/java-25-cpu-backend.adoc @@ -21,7 +21,6 @@ Required flags remain: --enable-preview --add-modules jdk.incubator.vector .... -[[jit--c2-improvements-mapped-to-skainet-ops]] ===== JIT / C2 improvements mapped to SKaiNET ops These are automatic — the JIT produces better native code for existing bytecode. 
diff --git a/docs/modules/ROOT/pages/explanation/skainet-for-ai.adoc b/docs/modules/ROOT/pages/explanation/skainet-for-ai.adoc index 102aa5ac..365a6a3b 100644 --- a/docs/modules/ROOT/pages/explanation/skainet-for-ai.adoc +++ b/docs/modules/ROOT/pages/explanation/skainet-for-ai.adoc @@ -1,10 +1,8 @@ -[[skainet-core-technology-tensor--data-guide]] -== SKaiNET Core Technology: Tensor & Data Guide += SKaiNET Core Technology: Tensor & Data Guide This document provides technical instructions for AI agents and developers on using SKaiNET's Tensor and Data API as a modern, type-safe replacement for NDArray or Python's NumPy library. -[[1-fundamental-architecture-tensor-composition]] -=== 1. Fundamental Architecture: Tensor Composition +== 1. Fundamental Architecture: Tensor Composition Unlike traditional libraries where a Tensor is a monolithic object, SKaiNET adopts a *compositional architecture*. A `Tensor++<++T, V++>++` is composed of two primary components: @@ -24,12 +22,11 @@ interface Tensor { } ---- -[[2-type-safe-tensor-creation-dsl]] -=== 2. Type-Safe Tensor Creation (DSL) +== 2. Type-Safe Tensor Creation (DSL) SKaiNET provides a powerful Type-Safe DSL for tensor creation. It ensures that the data provided matches the specified `DType` at compile-time (or through the DSL's internal validation). -==== Creation with `ExecutionContext` +=== Creation with `ExecutionContext` Tensors are always created within an `ExecutionContext`, which provides the necessary `TensorOps` and `TensorDataFactory`. @@ -41,7 +38,7 @@ val ones = ctx.ones(Shape(1, 10), Int32::class) val full = ctx.full(Shape(5, 5), FP32::class, 42.0f) ---- -==== Expressive Tensor DSL +=== Expressive Tensor DSL For more complex initializations, use the `tensor` DSL: @@ -66,17 +63,16 @@ val customInit = tensor(ctx, Int32::class) { } ---- -[[3-slicing-dsl-api]] -=== 3. Slicing DSL API +== 3. 
Slicing DSL API SKaiNET offers a sophisticated Slicing DSL that allows for creating views or copies of tensor segments with high precision and readability. -==== `sliceView` vs `sliceCopy` +=== `sliceView` vs `sliceCopy` * *`sliceView`*: Creates a `TensorView`, which is a window into the original data (no data copying). * *`sliceCopy`*: Creates a new `Tensor` with a copy of the sliced data. -==== Slicing DSL Syntax +=== Slicing DSL Syntax The `SegmentBuilder` provides several ways to define slices for each dimension: @@ -98,8 +94,7 @@ val view = source.sliceView { } ---- -[[4-core-operations-tensorops]] -=== 4. Core Operations (`TensorOps`) +== 4. Core Operations (`TensorOps`) All mathematical operations are dispatched through the `TensorOps` interface. SKaiNET supports: @@ -109,7 +104,7 @@ All mathematical operations are dispatched through the `TensorOps` interface. SK * *Reductions*: `sum`, `mean`, `variance`. * *Shape Ops*: `reshape`, `flatten`, `concat`, `squeeze`, `unsqueeze`. -==== Operator Overloading +=== Operator Overloading When a tensor is "bound" to ops (e.g., via `OpsBoundTensor`), you can use standard Kotlin operators: @@ -119,8 +114,7 @@ val c = a + b // Calls ops.add(a, b) val d = a * 10 // Calls ops.mulScalar(a, 10) ---- -[[5-summary-table-skainet-vs-numpy]] -=== 5. Summary Table: SKaiNET vs NumPy +== 5. Summary Table: SKaiNET vs NumPy [cols="<,<,<",options="header",] |=== @@ -133,8 +127,7 @@ val d = a * 10 // Calls ops.mulScalar(a, 10) |*Reshape* |`a.reshape(new++_++shape)` |`ctx.ops.reshape(a, Shape(new++_++shape))` |=== -[[6-best-practices-for-ai-integration]] -=== 6. Best Practices for AI Integration +== 6. Best Practices for AI Integration [arabic] . *Context Awareness*: Always pass the `ExecutionContext` to functions that create or manipulate tensors. 
diff --git a/docs/modules/ROOT/pages/how-to/arduino-c-codegen.adoc b/docs/modules/ROOT/pages/how-to/arduino-c-codegen.adoc index 7ef1165c..feb0bf13 100644 --- a/docs/modules/ROOT/pages/how-to/arduino-c-codegen.adoc +++ b/docs/modules/ROOT/pages/how-to/arduino-c-codegen.adoc @@ -1,12 +1,12 @@ -== Arduino C Code Generation += Arduino C Code Generation SKaiNET provides a specialized compiler backend for exporting trained neural networks to highly optimized, standalone C99 code suitable for microcontrollers like Arduino. -=== Overview +== Overview The Arduino C code generation process transforms a high-level Kotlin model into a memory-efficient C implementation. It prioritizes static memory allocation, minimal overhead, and numerical consistency with the original model. -==== Codegen Pipeline +=== Codegen Pipeline [mermaid] ---- @@ -21,18 +21,16 @@ graph TD H --> I[Generated .h/.c files] ---- -=== Technical Deep Dive +== Technical Deep Dive -[[1-tape-based-tracing]] -==== 1. Tape-based Tracing +=== 1. Tape-based Tracing Instead of static analysis of the Kotlin code, SKaiNET uses a dynamic tracing mechanism. When you call `exportToArduinoLibrary`, the framework executes a single forward pass of your model using a specialized `RecordingContext`. * Every operation (Dense, ReLU, etc.) is recorded onto an *Execution Tape*. * This approach handles Kotlin's language features (loops, conditionals) naturally, as it only records the actual operations that were executed. -[[2-compute-graph-construction]] -==== 2. Compute Graph Construction +=== 2. Compute Graph Construction The execution tape is converted into a directed acyclic graph (DAG) called `ComputeGraph`. @@ -40,12 +38,11 @@ The execution tape is converted into a directed acyclic graph (DAG) called `Comp * Edges represent data flow (Tensors). * During this phase, the compiler performs *Shape Inference* to ensure every tensor has a fixed, known size. -[[3-static-memory-management]] -==== 3. Static Memory Management +=== 3. 
Static Memory Management Microcontrollers typically have very limited RAM and lack robust heap management. SKaiNET uses a *Ping-Pong Buffer Strategy* to eliminate dynamic memory allocation (`malloc`/`free`) during inference. -===== Ping-Pong Buffer Strategy +==== Ping-Pong Buffer Strategy The compiler calculates the maximum size required for any intermediate tensor in the graph and allocates exactly two static buffers of that size. @@ -66,8 +63,7 @@ sequenceDiagram * *Buffer Reuse*: Instead of allocating space for every layer's output, buffers are reused. * *Direct Output Optimization*: The first layer reads from the input pointer, and the last layer writes directly to the output pointer, avoiding unnecessary copies. -[[4-code-generation-emission]] -==== 4. Code Generation (Emission) +=== 4. Code Generation (Emission) The `CCodeGenerator` emits C99-compatible code using templates. @@ -80,15 +76,14 @@ The `CCodeGenerator` emits C99-compatible code using templates. int model_inference(const float* input, float* output); ---- -[[5-validation]] -==== 5. Validation +=== 5. Validation The generator performs post-generation validation: * *Static Allocation Check*: Ensures no dynamic allocation is present in the generated source. * *Buffer Alternation Check*: Verifies that the ping-pong strategy is correctly implemented without data races or overwrites. -=== Performance and Constraints +== Performance and Constraints * *Floating Point*: Currently optimized for `FP32`. * *Supported Ops*: `Dense`, `ReLU`, `Sigmoid`, `Tanh`, `Add`, `MatMul`. 
diff --git a/docs/modules/ROOT/pages/how-to/java-model-training.adoc b/docs/modules/ROOT/pages/how-to/java-model-training.adoc index 2abf7d17..ddb82976 100644 --- a/docs/modules/ROOT/pages/how-to/java-model-training.adoc +++ b/docs/modules/ROOT/pages/how-to/java-model-training.adoc @@ -173,7 +173,6 @@ float loss = loop.step(inputBatch, targetBatch); System.out.printf("Step loss: %.4f%n", loss); ---- -[[full-training-with-train]] ==== Full Training with `.train()` `train()` accepts a `Supplier` that produces an `Iterator` of `(input, target)` pairs for each epoch: @@ -194,7 +193,6 @@ System.out.printf("Trained %d epochs, final loss: %.4f%n", Each call to the supplier should return a fresh iterator over the training batches for that epoch. This allows reshuffling between epochs. -[[async-training-with-trainasync]] ==== Async Training with `.trainAsync()` `trainAsync()` runs the training loop on a virtual thread and returns a `CompletableFuture++<++TrainingResult++>++`: diff --git a/docs/modules/ROOT/pages/tutorials/hlo-getting-started.adoc b/docs/modules/ROOT/pages/tutorials/hlo-getting-started.adoc index d7d47a92..65e70b84 100644 --- a/docs/modules/ROOT/pages/tutorials/hlo-getting-started.adoc +++ b/docs/modules/ROOT/pages/tutorials/hlo-getting-started.adoc @@ -98,7 +98,6 @@ flowchart LR === Building Blocks -[[1-hlo-converters]] ==== 1. HLO Converters Converters transform SKaiNET operations into StableHLO operations: @@ -109,7 +108,6 @@ Converters transform SKaiNET operations into StableHLO operations: * *NeuralNetOperationsConverter*: High-level NN operations * *ConstantOperationsConverter*: Constant value operations -[[2-type-system]] ==== 2. Type System HLO uses a strict type system for tensors: @@ -123,7 +121,6 @@ Tensor // Batch, Channel, Height, Width tensor<1x3x224x224xf32> // StableHLO representation ---- -[[3-optimization-framework]] ==== 3. 
Optimization Framework The optimization pipeline includes: diff --git a/docs/modules/ROOT/pages/tutorials/java-getting-started.adoc b/docs/modules/ROOT/pages/tutorials/java-getting-started.adoc index 003a6d46..becdecee 100644 --- a/docs/modules/ROOT/pages/tutorials/java-getting-started.adoc +++ b/docs/modules/ROOT/pages/tutorials/java-getting-started.adoc @@ -21,7 +21,6 @@ For Maven Surefire / exec-maven-plugin, add them to `++<++jvmArgs++>++`. For Gra === Maven Setup -[[1-import-the-bom]] ==== 1. Import the BOM The `skainet-bom` manages all SKaiNET module versions so you never have to keep them in sync manually. Add it to your `++<++dependencyManagement++>++` section: @@ -80,7 +79,6 @@ The `skainet-bom` manages all SKaiNET module versions so you never have to keep ---- -[[2-add-more-modules-as-needed]] ==== 2. Add More Modules as Needed Because the BOM is imported, you can add any module without repeating the version: From ca06cf515ef1251113df2556a953334057ef979f Mon Sep 17 00:00:00 2001 From: Michal Harakal Date: Mon, 13 Apr 2026 16:19:11 +0200 Subject: [PATCH 3/4] Fix CI permission failure on bundleDokkaIntoSite (#496 step 3) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit First run of the #494 docs.yml workflow on CI failed with: > Task :bundleDokkaIntoSite FAILED > Failed to create directory '/home/runner/work/SKaiNET/SKaiNET/docs/build/site/api' Root cause: the Antora step ran the node:20-alpine container as root (the default), so `docs/build/site/` and everything under it was owned by root. The subsequent Gradle `bundleDokkaIntoSite` step runs on the runner host as the `runner` user — which cannot create a subdirectory inside a root-owned tree. Two coupled fixes, both necessary: 1. `.github/workflows/docs.yml`: add `--user $(id -u):$(id -g)` to the `docker run` invocation. The container process now writes as the runner user and everything under `docs/build/site/` is owned correctly when Gradle takes over. 2. 
`docs/antora-playbook.yml`: add a `runtime.cache_dir: ./.cache/antora` setting. Without --user the default $HOME/.cache/antora resolution worked; with --user the container process has no matching passwd entry and $HOME falls back to `/`, so Antora would fail with `Failed to create content cache directory /.cache/antora; EACCES: permission denied`. Pointing cache_dir at a path under the mounted workspace makes it writable by the non-root user. The `.cache/` path is already gitignored via the pre-staged `## antora` section in the repo root .gitignore, so the cache never gets committed. Verified end-to-end locally with the CI flow: rm -rf docs/build/site docs/.cache docker run --rm --user "$(id -u):$(id -g)" \ -v "$PWD:/antora" -w /antora \ skainet-antora:local docs/antora-playbook.yml ./gradlew --no-daemon bundleDokkaIntoSite docs/build/site/ owned by $USER:$GROUP, api/ subtree populated with the Dokka aggregate. Third step of the Antora migration polish pass. This commit is independent of the earlier warning-clearance work — it unblocks CI regardless of what other polish happens next. See #496. Co-Authored-By: Claude Opus 4.6 (1M context) --- .github/workflows/docs.yml | 8 ++++++++ docs/antora-playbook.yml | 10 ++++++++++ 2 files changed, 18 insertions(+) diff --git a/.github/workflows/docs.yml b/.github/workflows/docs.yml index 32a7ed1e..56426f25 100644 --- a/.github/workflows/docs.yml +++ b/.github/workflows/docs.yml @@ -87,8 +87,16 @@ jobs: cache-to: type=gha,mode=max - name: Build Antora site + # Run the container as the runner user (not root) so the + # files under docs/build/site/ are owned by the same user + # that the subsequent Gradle `bundleDokkaIntoSite` step runs + # as. Without this the Copy task fails with + # "Failed to create directory docs/build/site/api" because + # the Antora container otherwise writes the site tree as + # root and Gradle running as runner can't mkdir inside it. 
run: | docker run --rm \ + --user "$(id -u):$(id -g)" \ -v "${{ github.workspace }}:/antora" \ --workdir /antora/docs \ skainet-antora:local \ diff --git a/docs/antora-playbook.yml b/docs/antora-playbook.yml index 4c7b9bca..42efb873 100644 --- a/docs/antora-playbook.yml +++ b/docs/antora-playbook.yml @@ -2,6 +2,16 @@ site: title: SKaiNET start_page: skainet::index.adoc +# Keep Antora's content cache inside the project tree so the +# container can be run as a non-root user (via `docker run --user +# $(id -u):$(id -g)`). Without this, Antora defaults to +# `$HOME/.cache/antora` which is unwritable when the container +# process has no matching passwd entry and $HOME falls back to `/`. +# The `.cache/` path is already gitignored via the pre-staged +# `## antora` section in the repo root .gitignore. +runtime: + cache_dir: ./.cache/antora + content: sources: - url: /antora From 724f72bdae4637dc90aaec701d2934c97bea9b05 Mon Sep 17 00:00:00 2001 From: Michal Harakal Date: Mon, 13 Apr 2026 16:28:28 +0200 Subject: [PATCH 4/4] Drop kroki, render mermaid locally via mmdc (#496 step 4) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Replaces the asciidoctor-kroki dependency with a small local Asciidoctor block processor that invokes the @mermaid-js/mermaid-cli binary baked into the Antora Docker image directly. Eliminates the last build warning AND removes the build-time network dependency on kroki.io entirely. ## Why asciidoctor-kroki sends the diagram source to kroki.io (by default via GET with the source encoded into the URL). The GET path has a 4 KB URL length limit, so larger diagrams come back with HTTP 400 and the block is silently dropped. Switching the extension to POST did not help — kroki.io also rejected the content for a different reason, with an empty response body and no diagnostic. 
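For context, kroki's GET scheme packs the entire diagram source into the URL path as base64url-encoded, deflate-compressed text, so URL length grows with diagram size. A rough sketch of the encoding (gzip standing in for raw zlib deflate, and padding kept, so real kroki URLs differ slightly; diagram content hypothetical):

```shell
# Compress then base64url-encode, the way kroki's GET endpoint expects.
# The encoded blob rides in the URL path, which is where the ~4 KB GET
# length cap bites for large diagrams.
src='graph TD; A-->B;'
enc=$(printf '%s' "$src" | gzip -cn | base64 | tr -d '\n' | tr '+/' '-_')
printf 'encoded length: %s\n' "${#enc}"
# Round-trip to confirm the encoding is lossless:
dec=$(printf '%s' "$enc" | tr '-_' '+/' | base64 -d | gunzip -c)
[ "$dec" = "$src" ] && echo roundtrip-ok
```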
Round-tripping each mermaid block through the network on every build was already a sore point; finding that the only path to reliable rendering was "give up on the external service entirely" made the decision clear. The new pipeline is purely local. For every `[mermaid]` block, the extension: 1. writes the source to /tmp/skainet-mm-*/in.mmd 2. execs mmdc -i in.mmd -o out.svg 3. reads out.svg back 4. emits it as a `pass` block (inline SVG) mermaid-cli was already in the image from day one for the asciidoctor-kroki "local fetch" path. Removing kroki and wiring mermaid-cli directly via a 70-line extension leaves a strictly smaller build dependency tree and is strictly more reliable: no network, no rate limits, no URL length caps, no flakes on CI, deterministic outputs. ## Changes 1. `docs/.docker/Dockerfile`: - Drop `asciidoctor-kroki@0.18` from the npm install list. - `COPY local-mermaid-extension.js /opt/antora/` so the playbook can reference it by absolute path without any volume-mount gymnastics at run time. - Update the image description label. 2. `docs/.docker/local-mermaid-extension.js` (new): Asciidoctor.js block processor mirroring the shape used by asciidoctor-kroki (same onContext / process / createBlock pattern) but dispatching to /opt/antora/node_modules/.bin/mmdc via child_process.execSync with the Puppeteer config the image already writes at /opt/antora/puppeteer-config.json. Renders to a temp dir, reads the SVG, returns it inline via a `pass` block. Cleans the temp dir in a finally. On render failure emits a literal block containing the original mermaid source + the stderr from mmdc and logs a warning, matching the degradation style of the upstream kroki extension. 3. `docs/antora-playbook.yml`: - Swap `asciidoctor-kroki` extension for `/opt/antora/local-mermaid-extension.js`. - Drop the `kroki-fetch-diagram` and `kroki-http-method` attributes — both dead code now. 4. 
`docs/modules/ROOT/pages/tutorials/hlo-getting-started.adoc`: The first render against real mermaid-cli surfaced a previously-hidden authoring bug: one of the `sequenceDiagram` participants was aliased as `Opt`, and `opt` is a mermaid sequenceDiagram keyword (for optional blocks). Mermaid's parser matches keywords case-insensitively and was treating `Opt` as the start of an opt-block, producing: Parse error on line 12: ...HLO->>Opt: Unoptimized IR Expecting '+', '-', '()', 'ACTOR', got 'opt' Rename the alias to `Optimizer` and drop the `as` clause. Kroki had been silently rejecting this diagram for a different reason the whole time; local rendering surfaced the actual bug. ## Verification docker build --no-cache -t skainet-antora:local docs/.docker rm -rf docs/build/site docs/.cache docker run --rm --user "$(id -u):$(id -g)" \ -v "$PWD:/antora" -w /antora \ skainet-antora:local docs/antora-playbook.yml grep -c "<svg" # 3 (one inline SVG per [mermaid] block, all three diagrams) ./gradlew --no-daemon bundleDokkaIntoSite ls docs/build/site/api # full Dokka aggregate present Antora warnings + errors on the full build: 0 + 0. Down from the 13 warnings the Antora migration landed with in #494. Fourth and final step of the Antora migration polish pass. See #496. 
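For reference, a minimal sketch of the keyword collision and the fix (participants other than `Opt`/`Optimizer` are illustrative, not taken from the real diagram):

```mermaid
sequenceDiagram
    participant HLO as StableHLO IR
    %% Fails: mermaid matches the "opt" block keyword case-insensitively,
    %% so an alias named "Opt" derails the parse at its first use:
    %% participant Opt as Optimizer
    %% Fix: use the full name, no "as" clause needed:
    participant Optimizer
    HLO->>Optimizer: Unoptimized IR
    Optimizer->>HLO: Optimized IR
```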
Co-Authored-By: Claude Opus 4.6 (1M context) --- docs/.docker/Dockerfile | 21 +++-- docs/.docker/local-mermaid-extension.js | 91 +++++++++++++++++++ docs/antora-playbook.yml | 13 +-- .../pages/tutorials/hlo-getting-started.adoc | 10 +- 4 files changed, 118 insertions(+), 17 deletions(-) create mode 100644 docs/.docker/local-mermaid-extension.js diff --git a/docs/.docker/Dockerfile b/docs/.docker/Dockerfile index 67c21ba6..fecaca3c 100644 --- a/docs/.docker/Dockerfile +++ b/docs/.docker/Dockerfile @@ -1,8 +1,8 @@ FROM node:20-alpine LABEL org.opencontainers.image.title="SKaiNET Antora" \ - org.opencontainers.image.description="Antora site generator with built-in Mermaid rendering" \ - org.opencontainers.image.source="https://github.com/SKaiNET-developers/SKaiNET-transformers" + org.opencontainers.image.description="Antora site generator with direct local Mermaid rendering (no Kroki round trip)" \ + org.opencontainers.image.source="https://github.com/SKaiNET-developers/SKaiNET" # Chromium for mermaid-cli (puppeteer) RUN apk add --no-cache chromium font-noto @@ -10,25 +10,34 @@ RUN apk add --no-cache chromium font-noto ENV PUPPETEER_EXECUTABLE_PATH=/usr/bin/chromium-browser \ PUPPETEER_SKIP_DOWNLOAD=true -# Install Antora + extensions to /opt/antora (not /antora which gets volume-mounted) +# Install Antora + mermaid-cli into /opt/antora (not /antora which gets +# volume-mounted at run time). asciidoctor-kroki is intentionally NOT +# installed — it depends on a Kroki HTTP server (kroki.io or local) +# which returns 400 for large diagrams when using GET and has no +# offline fallback. We render mermaid directly via mermaid-cli through +# the local-mermaid-extension.js asciidoctor block processor. 
WORKDIR /opt/antora RUN npm init -y && npm i --save-exact \ @antora/cli@3.1 \ @antora/site-generator@3.1 \ - asciidoctor-kroki@0.18 \ @mermaid-js/mermaid-cli@11 \ && npm cache clean --force # Make installed modules visible when workdir is the mounted project ENV NODE_PATH=/opt/antora/node_modules -# Mermaid-cli config +# Mermaid-cli config — used by the local-mermaid-extension to drive +# Puppeteer against the pre-installed Alpine Chromium. RUN echo '{ \ "executablePath": "/usr/bin/chromium-browser", \ "args": ["--no-sandbox", "--disable-gpu", "--disable-dev-shm-usage"] \ }' > /opt/antora/puppeteer-config.json -# Verify mermaid works +# Bake the local mermaid extension in at an absolute path so the +# Antora playbook can reference it without any volume-mount gymnastics. +COPY local-mermaid-extension.js /opt/antora/local-mermaid-extension.js + +# Verify mermaid-cli works end to end at image build time. RUN echo 'graph TD; A-->B;' > /tmp/test.mmd \ && npx mmdc -i /tmp/test.mmd -o /tmp/test.svg -p /opt/antora/puppeteer-config.json \ && rm /tmp/test.mmd /tmp/test.svg diff --git a/docs/.docker/local-mermaid-extension.js b/docs/.docker/local-mermaid-extension.js new file mode 100644 index 00000000..35b4c776 --- /dev/null +++ b/docs/.docker/local-mermaid-extension.js @@ -0,0 +1,91 @@ +'use strict' + +/* + * Local mermaid block processor for Asciidoctor.js. + * + * Replaces the asciidoctor-kroki dependency on kroki.io (and its + * GET URL length limit / 400 rejections on large diagrams) with a + * direct invocation of `mmdc` — the @mermaid-js/mermaid-cli binary + * that the SKaiNET Antora Docker image already bakes in for its + * Chromium-backed Puppeteer rendering path. + * + * The extension is registered via the Antora playbook's + * `asciidoc.extensions` list and gets passed the Asciidoctor.js + * `registry` object. For every `[mermaid]\n----\n...\n----` block + * in any page, we: + * + * 1. write the source to a temp file + * 2. 
exec `mmdc -i in.mmd -o out.svg -p puppeteer-config.json` + * (synchronous — Antora processes one page at a time and the + * mermaid-cli call is fast enough that sync is fine) + * 3. read the produced SVG + * 4. inline it via a `pass` block so Asciidoctor emits the raw + * SVG markup straight into the HTML output + * + * On render failure we fall back to a literal block containing + * the original source plus the error message, matching the + * degradation mode asciidoctor-kroki uses. + */ + +const { execSync } = require('child_process') +const { mkdtempSync, writeFileSync, readFileSync, rmSync } = require('fs') +const { tmpdir } = require('os') +const { join } = require('path') + +// Absolute paths baked into /opt/antora at image build time. +// These have to match the Dockerfile that installs mermaid-cli and +// writes the puppeteer config. +const MMDC_BIN = '/opt/antora/node_modules/.bin/mmdc' +const PUPPETEER_CONFIG = '/opt/antora/puppeteer-config.json' + +function renderMermaidToSvg (source) { + const dir = mkdtempSync(join(tmpdir(), 'skainet-mm-')) + const inputPath = join(dir, 'in.mmd') + const outputPath = join(dir, 'out.svg') + writeFileSync(inputPath, source, 'utf8') + try { + execSync( + `${MMDC_BIN} -i ${inputPath} -o ${outputPath} -p ${PUPPETEER_CONFIG} --quiet`, + { stdio: ['ignore', 'ignore', 'pipe'] } + ) + return readFileSync(outputPath, 'utf8') + } finally { + try { rmSync(dir, { recursive: true, force: true }) } catch (_) { /* noop */ } + } +} + +function mermaidBlockFactory () { + return function () { + const self = this + self.named('mermaid') + self.onContext(['listing', 'literal']) + self.process((parent, reader, attrs) => { + const source = reader.$read() + try { + const svg = renderMermaidToSvg(source) + return self.createBlock(parent, 'pass', svg, attrs) + } catch (err) { + const logger = parent.getDocument().getLogger() + logger.warn(`local-mermaid-extension: failed to render block — ${err.message}`) + const role = attrs.role + attrs.role 
= role ? `${role} mermaid-error` : 'mermaid-error' + return self.createBlock( + parent, + 'literal', + `Error rendering mermaid diagram:\n${err.message}\n\n${source}`, + attrs + ) + } + }) + } +} + +module.exports.register = function register (registry) { + if (typeof registry.register === 'function') { + registry.register(function () { + this.block('mermaid', mermaidBlockFactory()) + }) + } else if (typeof registry.block === 'function') { + registry.block('mermaid', mermaidBlockFactory()) + } +} diff --git a/docs/antora-playbook.yml b/docs/antora-playbook.yml index 42efb873..5c9493cf 100644 --- a/docs/antora-playbook.yml +++ b/docs/antora-playbook.yml @@ -20,12 +20,13 @@ content: asciidoc: extensions: - - asciidoctor-kroki - attributes: - # Use local mermaid-cli via Kroki (no external server needed when - # built with the custom Docker image in docs/.docker/Dockerfile — - # copied verbatim from SKaiNET-transformers). - kroki-fetch-diagram: true + # Local mermaid block processor — renders every `[mermaid]` block + # inline by invoking the @mermaid-js/mermaid-cli binary baked into + # the Docker image at /opt/antora/node_modules/.bin/mmdc. Replaces + # asciidoctor-kroki so builds don't depend on kroki.io at all, + # which eliminates the GET-URL length limit (4 KB) that was + # rejecting the large diagrams in hlo-getting-started.adoc. 
+ - /opt/antora/local-mermaid-extension.js ui: bundle: diff --git a/docs/modules/ROOT/pages/tutorials/hlo-getting-started.adoc b/docs/modules/ROOT/pages/tutorials/hlo-getting-started.adoc index 65e70b84..cad26ee6 100644 --- a/docs/modules/ROOT/pages/tutorials/hlo-getting-started.adoc +++ b/docs/modules/ROOT/pages/tutorials/hlo-getting-started.adoc @@ -168,15 +168,15 @@ sequenceDiagram participant DAG as Compute Graph participant Conv as HLO Converter participant HLO as StableHLO IR - participant Opt as Optimizer - + participant Optimizer + DSL->>DAG: rgb2GrayScaleMatMul() DAG->>Conv: MatMul + Transpose ops Conv->>HLO: stablehlo.dot_general Conv->>HLO: stablehlo.transpose - HLO->>Opt: Unoptimized IR - Opt->>HLO: Optimized IR - + HLO->>Optimizer: Unoptimized IR + Optimizer->>HLO: Optimized IR + Note over Conv,HLO: Type inference:
tensor → tensor ----