HLO: add ConstantMaterializationPolicy seam (PR B of #523) #524

Merged
michalharakal merged 1 commit into develop from feature/hlo-constant-materialization-policy
Apr 18, 2026
@michalharakal

Summary

  • Adds ConstantMaterializationPolicy (InlineAlways / ExternalAlways / SizeThreshold) and ExternalParameterRef types.
  • Teaches ConstantOperationsConverter to consult the policy — external path emits util.global private @<key> at module scope + util.global.load @<key> in the function body, and records an ExternalParameterRef on the module.
  • Default is InlineAlways, so there is zero behavior change for every existing caller. The external path is strictly opt-in.
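
For reviewers who want the decision logic at a glance, here is a minimal sketch of how the three policy variants could route a constant. The type shapes, the `externalScopeFor` helper, and the choice that a tensor exactly at the threshold goes external are assumptions for illustration, not the declarations in this PR.

```kotlin
// Hypothetical shapes; the real declarations in skainet-compile-hlo may differ.
sealed interface ConstantMaterializationPolicy {
    object InlineAlways : ConstantMaterializationPolicy
    data class ExternalAlways(val scope: String) : ConstantMaterializationPolicy
    data class SizeThreshold(val bytes: Long, val scope: String) : ConstantMaterializationPolicy
}

// Returns the scope to externalize into, or null to keep the constant inline.
fun externalScopeFor(policy: ConstantMaterializationPolicy, logicalBytes: Long): String? =
    when (policy) {
        ConstantMaterializationPolicy.InlineAlways -> null
        is ConstantMaterializationPolicy.ExternalAlways -> policy.scope
        is ConstantMaterializationPolicy.SizeThreshold ->
            // Assumption: tensors at or above the threshold go external.
            if (logicalBytes >= policy.bytes) policy.scope else null
    }

fun main() {
    val policy = ConstantMaterializationPolicy.SizeThreshold(bytes = 32, scope = "model")
    println(externalScopeFor(policy, 2L * 2 * 4)) // 2x2 f32 = 16 bytes, prints null
    println(externalScopeFor(policy, 4L * 4 * 4)) // 4x4 f32 = 64 bytes, prints model
}
```

The input is the logical byte count of the tensor, so the result does not depend on how the constant would be printed in MLIR.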

Why this shape

The HLO converter never writes numerical weight bytes; it emits symbolic references and hands a BufferHandle through to a downstream packager. This keeps weight-format concerns out of the converter and fits naturally alongside IREE's .irpa mechanism. Full architecture in #523.

What's NOT here (by design)

  • No .irpa writer — that's PR C (skainet-io-iree-params module, peer of skainet-io-gguf).
  • No caller flipped to external — that's PR D (skainet-whisper wiring).
  • No mmap BufferHandles — that's PR E (zero-copy for gguf + safetensors).

This PR lands the seam that unblocks C/D/E to proceed in parallel afterward.

Design notes worth a review eye

  • SizeThreshold measures logical bytes (TensorEncoding.physicalBytes(elementCount)), not MLIR text size — decision is independent of downstream splat / dense formatting.
  • Byte serializer is in commonMain using toRawBits (no JVM-only streams), little-endian. FP32/FP64/I32/I64 supported; unsupported dtypes fall back to inline with a comment — policy cannot make the IR worse than the default.
  • StableHloConverter.convert() was refactored to assemble content at end of conversion so util.global decls slot between module { ... } and func.func cleanly; no string surgery.
  • @JvmOverloads on ConversionContext and StableHloConverter constructors keeps every existing Java caller compiling.
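
The little-endian serialization noted above can be sketched for the FP32 case as follows. `floatsToLittleEndianBytes` is a hypothetical helper written for this note, not the project's `numberListToLittleEndianBytes`; it uses only `Float.toRawBits` and bit shifts, so it needs no JVM-only streams.

```kotlin
// Hypothetical helper: serializes FP32 values into little-endian bytes.
fun floatsToLittleEndianBytes(values: List<Float>): ByteArray {
    val out = ByteArray(values.size * 4)
    values.forEachIndexed { i, v ->
        val bits = v.toRawBits() // IEEE 754 bit pattern as an Int
        for (b in 0 until 4) {
            // Least significant byte first.
            out[i * 4 + b] = (bits ushr (8 * b)).toByte()
        }
    }
    return out
}

fun main() {
    val bytes = floatsToLittleEndianBytes(listOf(1.0f))
    // 1.0f is 0x3F800000, so this prints: 00 00 80 3f
    println(bytes.joinToString(" ") { (it.toInt() and 0xFF).toString(16).padStart(2, '0') })
}
```

The same pattern would extend to FP64 and I64 with `Double.toRawBits` and eight bytes per element.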

Test plan

  • testDefaultPolicyIsInlineAlways — zero-behavior-change invariant.
  • testExternalAlwaysEmitsUtilGlobalAndRegistersRef — end-to-end: util.global private, util.global.load, ExternalParameterRef{scope, key, 16 bytes}.
  • testSizeThresholdSplitsBySize — 2x2 f32 (16B) inline, 4x4 f32 (64B) external, at threshold=32B.
  • testModuleAttrsHeaderStillEmittedAboveUtilGlobal — pins emission ordering so IREE's parser gets a valid module shape.
  • :skainet-compile:skainet-compile-hlo:jvmTest — all existing tests still green.
  • :skainet-compile:skainet-compile-hlo:apiCheck — public API diff captured in api/jvm/skainet-compile-hlo.api.

Relates to #523 (design). Part of the architectural answer to #519.

🤖 Generated with Claude Code

Introduces the policy seam that lets callers lift large constant
tensors out of inline `stablehlo.constant dense<...>` and into
`util.global` module declarations backed by a downstream IREE
parameter archive. Default policy is InlineAlways, so every existing
caller gets byte-for-byte identical MLIR — the external path is
strictly opt-in.

This is PR B of the architecture tracked in #523. It lands the
typing, wiring, and emission logic; PR C adds the .irpa packager
that consumes [StableHloModule.externalParameters]; PR D flips the
policy in skainet-whisper; PR E teaches the gguf / safetensors
loaders to back the BufferHandles with mmap for zero-copy.

Pieces:

- `ConstantMaterializationPolicy` sealed interface: InlineAlways,
  ExternalAlways(scope), SizeThreshold(bytes, scope). Size measured
  in logical bytes via `TensorEncoding.physicalBytes`, independent
  of MLIR text formatting.
- `ExternalParameterRef`: scope + key + TensorEncoding + BufferHandle.
  The converter never copies bytes; it hands the source handle through
  so mmap-backed callers get zero-copy.
- `StableHloModule.externalParameters`: new field surfacing every
  externalized ref.
- `ConversionContext` grows a module-scope declaration buffer and a
  ref registry. `StableHloConverter.convert()` now assembles content
  at end of conversion so `util.global` decls slot between
  `module { ... }` and `func.func` without string surgery.
- Byte serializer (`numberListToLittleEndianBytes`) materializes
  `values` / `initial_value` lists into little-endian bytes for
  FP32 / FP64 / I32 / I64. Pads under-filled lists with zeros. Falls
  back to inline on unsupported dtypes so the default path stays safe.
- `ConstantOperationsConverter.convertTensorConstant` and
  `convertParameterConstant` consult the policy at their first step
  via a shared `tryMaterializeExternal` helper.
- Factory methods (createBasic / createExtended / createFast /
  createCustom) accept the policy via `@JvmOverloads`, preserving the
  argless call site for every existing caller.

Tests:

- `testDefaultPolicyIsInlineAlways` — zero-behavior-change invariant.
- `testExternalAlwaysEmitsUtilGlobalAndRegistersRef` — end-to-end:
  module decl + util.global.load + ExternalParameterRef with the
  right scope, key, and byte count.
- `testSizeThresholdSplitsBySize` — hybrid policy correctly routes
  small tensors inline and large ones external.
- `testModuleAttrsHeaderStillEmittedAboveUtilGlobal` — pins the
  emission ordering so IREE's parser sees a valid module shape.

Relates to #523 (design), supersedes part of #519.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@michalharakal michalharakal merged commit 81526a0 into develop Apr 18, 2026
4 checks passed
@michalharakal michalharakal deleted the feature/hlo-constant-materialization-policy branch April 18, 2026 19:04
michalharakal added a commit that referenced this pull request Apr 19, 2026
Writes IREE parameter archives (.irpa) consumable by
`iree-compile --iree-opt-import-parameters=<path>` and by
`iree-run-module --parameters=<scope>=<path>`.

The archive format per IREE's `parameter_archive.h`:

  +--- 40 B ----+ fixed header (magic "IRPA", version, counts)
  +--- 48 B ----+ three segment references
  +--- pad 16 --+
  +-------------+ entry segment: 80-byte DATA records
  +-------------+ metadata segment: concatenated key bytes
  +--- pad 64 --+
  +-------------+ storage segment: raw tensor bytes per entry

All u16/u32/u64 values little-endian. The C entry-header struct has
an implicit 4-byte pad after `u32 type` to align the following `u64
flags` — the writer emits that pad explicitly and a byte-level test
pins the 80-byte layout so future changes can't silently re-break it.
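
As a rough illustration of the offset arithmetic implied by the diagram, the sketch below computes where each segment would start. `alignUp`, `IrpaLayout`, and `planLayout` are hypothetical names, and the sketch assumes the segments are laid out contiguously in the order drawn, with the metadata segment starting immediately after the last entry record; the actual writer may differ.

```kotlin
// Rounds offset up to the next multiple of alignment (alignment must be > 0).
fun alignUp(offset: Long, alignment: Long): Long =
    (offset + alignment - 1) / alignment * alignment

data class IrpaLayout(val entryOffset: Long, val metadataOffset: Long, val storageOffset: Long)

// Assumption: sizes taken from the diagram (40 B header, 48 B segment refs, 80 B per entry).
fun planLayout(entryCount: Int, totalKeyBytes: Long): IrpaLayout {
    val headerBytes = 40L + 48L
    val entryOffset = alignUp(headerBytes, 16)
    val metadataOffset = entryOffset + entryCount * 80L
    val storageOffset = alignUp(metadataOffset + totalKeyBytes, 64)
    return IrpaLayout(entryOffset, metadataOffset, storageOffset)
}

fun main() {
    // Two entries with 10 bytes of key data in total.
    println(planLayout(entryCount = 2, totalKeyBytes = 10))
}
```

With two entries and 10 key bytes this gives an entry segment at 96, metadata at 256, and storage at 320.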

No scope column in the archive itself: scope is a runtime binding
(`--parameters=<scope>=<file>`). Callers with multiple scopes group
via `IrpaWriter.groupByScope(refs)` and emit one .irpa per scope.
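
A minimal sketch of the per-scope grouping, assuming a simplified stand-in for `ExternalParameterRef`. `ParamRef` and the `<scope>.irpa` file naming are hypothetical; only the idea of one archive per scope comes from the text above.

```kotlin
// Hypothetical stand-in for ExternalParameterRef; only the fields used here.
data class ParamRef(val scope: String, val key: String)

// Groups refs by scope. Kotlin's groupBy preserves first-seen scope order
// and the original order of refs within each scope.
fun groupByScope(refs: List<ParamRef>): Map<String, List<ParamRef>> =
    refs.groupBy { it.scope }

fun main() {
    val refs = listOf(
        ParamRef("encoder", "w0"),
        ParamRef("decoder", "w1"),
        ParamRef("encoder", "w2"),
    )
    for ((scope, group) in groupByScope(refs)) {
        // One archive per scope; the file name here is illustrative only.
        println("$scope.irpa <- ${group.map { it.key }}")
    }
}
```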

The writer delegates byte sourcing to `BufferHandle`; Owned and
Borrowed variants are wired today, with Mapped / FileBacked landing
in PR E (#523) to give the gguf and safetensors loaders a zero-copy
path into the archive.

### Companion fix in skainet-compile-hlo

PR B (#524) emitted `util.global private @key : type` without an
initializer, which iree-compile treats as uninitialized — it would
not import anything from a .irpa. PR C completes the emission:

    util.global private @key = #flow.parameter.named<"scope"::"key"> : type
    %r = util.global.load @key : type

`MlirValidator` was also taught to accept module-scope `util.global`
assignments: the `@`-prefixed symbol is a global, not an SSA value,
and must not trip the existing `%`-only SSA-format check.
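
For illustration, the two MLIR lines quoted in this section could be rendered by a helper along these lines. `emitExternalGlobal` and its parameters are hypothetical and are not the converter's real API; the sketch only reproduces the text shape shown above.

```kotlin
// Hypothetical helper: returns the module-scope declaration and the in-function load.
fun emitExternalGlobal(scope: String, key: String, type: String, result: String): Pair<String, String> {
    val decl = "util.global private @$key = #flow.parameter.named<\"$scope\"::\"$key\"> : $type"
    val load = "%$result = util.global.load @$key : $type"
    return decl to load
}

fun main() {
    val (decl, load) = emitExternalGlobal("model", "w0", "tensor<4x4xf32>", "r")
    println(decl) // goes at module scope
    println(load) // goes in the function body
}
```

Note that only the load result is an SSA value with a `%` prefix; the `@`-prefixed name is a symbol, which is why the validator needs a separate rule for it.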

### Tests

- IrpaWriterTest pins the byte layout against the IREE spec —
  header magic/version, segment offsets, entry-record fields, key
  concatenation, data placement with 64-byte per-entry alignment,
  Owned / Borrowed handle support, groupByScope ordering, empty-
  input rejection.
- Existing ConstantMaterializationPolicyTest updated to assert the
  new `#flow.parameter.named<...>` initializer on every externalized
  global.
- Full `:skainet-compile:skainet-compile-hlo:jvmTest` and
  `:skainet-io:skainet-io-iree-params:jvmTest` pass.

### Deferred verification

A real `iree-compile --iree-opt-import-parameters=<written.irpa>`
round-trip test will land once CI has an IREE toolchain available.
Byte-level layout tests are a close proxy — any deviation from the
reference C format breaks both.

Part of #523.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>