
Docs: refresh DSL→DAG→StableHLO→IREE pipeline + beginner tutorial #526

@michalharakal

Description

Problem

The Antora docs describe the compilation chain as targeting XLA (docs/modules/ROOT/pages/tutorials/hlo-getting-started.adoc), but the real target is IREE, and the recent #520 and #525 work has changed how weights, type annotations, and module headers are emitted. A beginner reading the docs today gets an outdated mental model and no runnable end-to-end example.

docs/modules/ROOT/pages/reference/architecture.adoc is currently 11 lines that point at the SVG diagram and explicitly invite a deeper write-up, which makes it a good place to add the new content.

What's stale / missing

In hlo-getting-started.adoc

  • The "Hardware Target Compilation via XLA" section should be rewritten for IREE: mention iree-compile, the VMFB output format, and target backends such as vulkan-spirv, llvm-cpu, and cuda. The prerequisites section currently installs XLA / CUDA / ROCm toolchains; it should point at IREE instead.
  • Pipeline diagram stops at "XLA Compiler → Hardware Executables". Extend: StableHLO MLIR → iree-compile → .vmfb + (optional) .irpa → iree-run-module.
  • The rgb2grayscale StableHLO example (line ~189) is accurate after #520 (Fix stablehlo.transpose and stablehlo.dot_general MLIR emission): contracting_dims = [3] x [0] and the (inputType) -> outputType signature are correct, but it doesn't show the newer constant-materialization paths.
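One possible shape for the extended diagram (the backend name is just an example; any IREE target backend fits):

```
DSL (Kotlin) → DAG → StableHLO MLIR (.mlir)
                          │ iree-compile --iree-hal-target-backends=llvm-cpu
                          ▼
               .vmfb module (+ optional .irpa parameter archive)
                          │ iree-run-module
                          ▼
                       results
```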

Missing sections entirely

In reference/architecture.adoc

  • Almost empty. Load-bearing content from this issue should land here:
    • The layering: skainet-lang → skainet-compile-dag → skainet-compile-hlo → skainet-io-iree-params → iree-compile → runtime.
    • Where TensorSpec / TensorEncoding / BufferHandle fit (they are the shared vocabulary across DSL, compile, and IO).
    • The policy seam and why the converter never writes numerical bytes.
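To make the "shared vocabulary" and "policy seam" bullets concrete, the page could carry a small sketch like the one below. This is a hypothetical Kotlin illustration only: the real TensorSpec / TensorEncoding / BufferHandle definitions in the skainet modules will differ. The point it demonstrates is the separation of roles — specs describe tensors, handles name byte buffers, and nothing at the converter layer ever carries the numerical payload.

```kotlin
// HYPOTHETICAL sketch: illustrates the roles only; the actual types in
// the skainet-compile-* / skainet-io-iree-params modules are different.
enum class TensorEncoding { F32, F16, I8 }

// Shape + encoding: the shared vocabulary across DSL, compile, and IO layers.
data class TensorSpec(val shape: List<Int>, val encoding: TensorEncoding) {
    val elementCount: Long
        get() = shape.fold(1L) { acc, dim -> acc * dim }
}

// A handle names a weight buffer without carrying its bytes, which is
// why the converter can emit MLIR without ever writing numerical data.
data class BufferHandle(val key: String, val byteLength: Long)

fun main() {
    val w = TensorSpec(shape = listOf(2, 2), encoding = TensorEncoding.F32)
    println(w.elementCount) // 4
    println(BufferHandle("model::w", 16))
}
```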

Beginner tutorial — proposed structure

Split hlo-getting-started.adoc or add a new tutorials/first-model-to-iree.adoc. Staircase from minimum to meaningful:

Stage 1 — element-wise `y = 2x + 1`

  • Smallest possible graph (one input, two ops, one output). No weights.
  • Show: DSL Kotlin, the emitted StableHLO text, iree-compile --iree-hal-target-backends=llvm-cpu invocation, iree-run-module call with a hand-crafted input, expected output. Reader can reproduce in under 5 minutes on a laptop.
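For Stage 1, the emitted StableHLO could look roughly like this. This is a hand-written sketch, not actual compiler output; the real emission may name values, pick shapes, and materialize constants differently:

```mlir
// Sketch of y = 2x + 1 over a rank-1 f32 tensor (illustrative only).
func.func @main(%x: tensor<4xf32>) -> tensor<4xf32> {
  %two = stablehlo.constant dense<2.0> : tensor<4xf32>
  %one = stablehlo.constant dense<1.0> : tensor<4xf32>
  %scaled = stablehlo.multiply %x, %two : tensor<4xf32>
  %y = stablehlo.add %scaled, %one : tensor<4xf32>
  return %y : tensor<4xf32>
}
```

Compiled with `iree-compile --iree-hal-target-backends=llvm-cpu model.mlir -o model.vmfb` and run with `iree-run-module --module=model.vmfb --function=main --input="4xf32=1 2 3 4"`, this should produce 3 5 7 9.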

Stage 2 — linear layer `y = Wx + b` with tiny inline weights

  • 2×2 weight matrix, size-2 bias. Introduces parameter/weight ops but keeps them inline (policy defaults to InlineAlways). Reader sees a real stablehlo.constant dense<[[...]]> in the emitted MLIR.
  • Hand-verify the math: [1, 2] × [[1, 0], [0, 1]] + [0, 0] = [1, 2].
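The hand-verification can be mirrored in a few lines of plain Kotlin (no skainet APIs, just the arithmetic) so readers can sanity-check the numbers against what iree-run-module prints:

```kotlin
// Plain-Kotlin check of the Stage 2 math: y = Wx + b, where W is the
// 2x2 identity matrix and b is a zero bias, so y must equal x.
fun linear(x: FloatArray, w: Array<FloatArray>, b: FloatArray): FloatArray =
    FloatArray(b.size) { j ->
        b[j] + x.indices.sumOf { i -> (x[i] * w[i][j]).toDouble() }.toFloat()
    }

fun main() {
    val x = floatArrayOf(1f, 2f)
    val w = arrayOf(floatArrayOf(1f, 0f), floatArrayOf(0f, 1f))
    val b = floatArrayOf(0f, 0f)
    println(linear(x, w, b).toList()) // [1.0, 2.0]
}
```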

Stage 3 — same linear layer with external weights

  • Same Kotlin DSL, with a single-line change to createBasic(ExternalAlways()). Shows how externalParameters comes back populated, how to write it out with IrpaWriter, and how to invoke iree-compile --iree-opt-import-parameters=my.irpa. Same runtime behavior, but the .mlir text no longer contains weight values.
  • Builds intuition for why this matters at scale (Whisper-tiny: 151 MB → <1 MB text).
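The tutorial could contrast the Stage 2 and Stage 3 MLIR directly. After externalization, the weight appears as a named parameter reference instead of inline data; the attribute syntax below is an assumption based on IREE's stream parameter representation and may not match what skainet actually emits:

```mlir
// Assumed shape of an externalized weight (IREE parameter reference).
// The inline `stablehlo.constant dense<[[...]]>` from Stage 2 is gone;
// the bytes now live in the .irpa archive under "model"::"w".
util.global private @w = #stream.parameter.named<"model"::"w"> : tensor<2x2xf32>
```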

Stage 4 (optional, later) — rgb2grayscale revisited

  • The existing example, but end-to-end runnable against IREE on CPU. Keep it as the "second example" for readers who want something slightly more meaningful than a scalar op.

Out of scope for this issue

  • Mobile deployment tutorial (separate issue when the Android runtime story is ready).
  • Training tutorials — SKaiNET is inference-first today.
  • Full Diátaxis reorganization. This issue refreshes three specific pages; broader IA work is a separate thread.

Suggested commit cadence

Three small PRs beat one big one here:

  1. Refresh hlo-getting-started.adoc — rewrite XLA references to IREE, fix pipeline diagrams, update prerequisites. No new content.
  2. Add the beginner tutorial (tutorials/first-model-to-iree.adoc) with Stages 1–3 above. New page, linked from the tutorials index and from hlo-getting-started.adoc.
  3. Flesh out reference/architecture.adoc — layering, shared vocabulary, policy seam. Link to #523 (Design: externalize weights via IREE parameter archive, supersedes #519) for the weight-externalization rationale.

Context / references
