HLO: add ConstantMaterializationPolicy seam (PR B of #523) #524
Merged
michalharakal merged 1 commit into develop on Apr 18, 2026
Introduces the policy seam that lets callers lift large constant tensors out of inline `stablehlo.constant dense<...>` and into `util.global` module declarations backed by a downstream IREE parameter archive. Default policy is InlineAlways, so every existing caller gets byte-for-byte identical MLIR; the external path is strictly opt-in.

This is PR B of the architecture tracked in #523. It lands the typing, wiring, and emission logic; PR C adds the .irpa packager that consumes [StableHloModule.externalParameters]; PR D flips the policy in skainet-whisper; PR E teaches the gguf / safetensors loaders to back the BufferHandles with mmap for zero-copy.

Pieces:
- `ConstantMaterializationPolicy` sealed interface: InlineAlways, ExternalAlways(scope), SizeThreshold(bytes, scope). Size is measured in logical bytes via `TensorEncoding.physicalBytes`, independent of MLIR text formatting.
- `ExternalParameterRef`: scope + key + TensorEncoding + BufferHandle. The converter never copies bytes; it hands the source handle through so mmap-backed callers get zero-copy.
- `StableHloModule.externalParameters`: new field surfacing every externalized ref.
- `ConversionContext` grows a module-scope declaration buffer and a ref registry. `StableHloConverter.convert()` now assembles content at end of conversion so `util.global` decls slot between `module { ... }` and `func.func` without string surgery.
- Byte serializer (`numberListToLittleEndianBytes`) materializes `values` / `initial_value` lists into little-endian bytes for FP32 / FP64 / I32 / I64. Pads under-filled lists with zeros. Falls back to inline on unsupported dtypes so the default path stays safe.
- `ConstantOperationsConverter.convertTensorConstant` and `convertParameterConstant` consult the policy at their first step via a shared `tryMaterializeExternal` helper.
- Factory methods (createBasic / createExtended / createFast / createCustom) accept the policy via `@JvmOverloads`, preserving the argless call site for every existing caller.

Tests:
- `testDefaultPolicyIsInlineAlways`: zero-behavior-change invariant.
- `testExternalAlwaysEmitsUtilGlobalAndRegistersRef`: end-to-end check of the module decl, `util.global.load`, and an ExternalParameterRef with the right scope, key, and byte count.
- `testSizeThresholdSplitsBySize`: hybrid policy correctly routes small tensors inline and large ones external.
- `testModuleAttrsHeaderStillEmittedAboveUtilGlobal`: pins the emission ordering so IREE's parser sees a valid module shape.

Relates to #523 (design), supersedes part of #519.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
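The byte serializer's behavior described above (little-endian output, zero padding for under-filled lists, fallback on unsupported dtypes) can be illustrated with a minimal sketch. This is not the PR's `numberListToLittleEndianBytes`; the `DTypeSketch` enum, the function name, and the `null` return used to signal "fall back to inline" are all assumptions made for illustration.

```kotlin
// Hypothetical stand-in for the real dtype type; BOOL represents any unsupported dtype.
enum class DTypeSketch(val bytes: Int) { FP32(4), FP64(8), I32(4), I64(8), BOOL(1) }

// Returns little-endian bytes for `elementCount` elements, or null when the dtype
// is unsupported so that the caller can fall back to inline emission.
fun toLittleEndianBytesSketch(values: List<Number>, dtype: DTypeSketch, elementCount: Int): ByteArray? {
    if (dtype == DTypeSketch.BOOL) return null
    // ByteArray is zero-initialised, so elements missing from `values` stay zero.
    val out = ByteArray(elementCount * dtype.bytes)
    for (i in 0 until minOf(values.size, elementCount)) {
        val bits: Long = when (dtype) {
            DTypeSketch.FP32 -> values[i].toFloat().toRawBits().toLong()
            DTypeSketch.FP64 -> values[i].toDouble().toRawBits()
            DTypeSketch.I32 -> values[i].toInt().toLong()
            DTypeSketch.I64 -> values[i].toLong()
            DTypeSketch.BOOL -> return null
        }
        // Emit the least significant byte first.
        for (b in 0 until dtype.bytes) {
            out[i * dtype.bytes + b] = ((bits ushr (8 * b)) and 0xFFL).toByte()
        }
    }
    return out
}
```

Only `toRawBits` and plain shifts are used, which matches the stated goal of avoiding JVM-only stream classes.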
michalharakal added a commit that referenced this pull request Apr 19, 2026
Writes IREE parameter archives (.irpa) consumable by `iree-compile --iree-opt-import-parameters=<path>` and by `iree-run-module --parameters=<scope>=<path>`.

The archive format per IREE's `parameter_archive.h`:

    +--- 40 B ----+  fixed header (magic "IRPA", version, counts)
    +--- 48 B ----+  three segment references
    +--- pad 16 --+
    +-------------+  entry segment: 80-byte DATA records
    +-------------+  metadata segment: concatenated key bytes
    +--- pad 64 --+
    +-------------+  storage segment: raw tensor bytes per entry

All u16/u32/u64 values are little-endian. The C entry-header struct has an implicit 4-byte pad after `u32 type` to align the following `u64 flags`; the writer emits that pad explicitly, and a byte-level test pins the 80-byte layout so future changes can't silently re-break it.

No scope column in the archive itself: scope is a runtime binding (`--parameters=<scope>=<file>`). Callers with multiple scopes group via `IrpaWriter.groupByScope(refs)` and emit one .irpa per scope.

The writer delegates byte sourcing to `BufferHandle`; Owned and Borrowed variants are wired today, with Mapped / FileBacked landing in PR E (#523) to give the gguf and safetensors loaders a zero-copy path into the archive.

### Companion fix in skainet-compile-hlo

PR B (#524) emitted `util.global private @key : type` without an initializer, which iree-compile treats as uninitialized, so it would not import anything from a .irpa. PR C completes the emission:

    util.global private @key = #flow.parameter.named<"scope"::"key"> : type
    %r = util.global.load @key : type

`MlirValidator` was also taught to accept module-scope `util.global` assignments: the `@`-prefixed symbol is a global, not an SSA value, and must not trip the existing `%`-only SSA-format check.

### Tests

- IrpaWriterTest pins the byte layout against the IREE spec: header magic/version, segment offsets, entry-record fields, key concatenation, data placement with 64-byte per-entry alignment, Owned / Borrowed handle support, groupByScope ordering, empty-input rejection.
- Existing ConstantMaterializationPolicyTest updated to assert the new `#flow.parameter.named<...>` initializer on every externalized global.
- Full `:skainet-compile:skainet-compile-hlo:jvmTest` and `:skainet-io:skainet-io-iree-params:jvmTest` pass.

### Deferred verification

A real `iree-compile --iree-opt-import-parameters=<written.irpa>` round-trip test will land once CI has an IREE toolchain available. Byte-level layout tests are a close proxy, since any deviation from the reference C format breaks both.

Part of #523.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
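The offset arithmetic implied by the layout diagram can be sketched as below. This is an illustration only, not `IrpaWriter`: it assumes that "pad N" means rounding the running offset up to the next multiple of N, and the names `alignUp`, `SegmentOffsetsSketch`, `segmentOffsetsSketch`, and `storageOffsetsSketch` are hypothetical. It deliberately does not reproduce the fields inside the header or the entry records.

```kotlin
// Round `offset` up to the next multiple of `alignment` (alignment must be a power of two).
fun alignUp(offset: Long, alignment: Long): Long =
    (offset + alignment - 1) and (alignment - 1).inv()

data class SegmentOffsetsSketch(val entries: Long, val metadata: Long, val storage: Long)

// Segment start offsets for `entryCount` records and `totalKeyBytes` of concatenated keys.
fun segmentOffsetsSketch(entryCount: Int, totalKeyBytes: Long): SegmentOffsetsSketch {
    val headerEnd = 40L + 48L                       // fixed header + three segment references
    val entries = alignUp(headerEnd, 16)            // "pad 16" (assumed: align to 16)
    val metadata = entries + entryCount * 80L       // 80-byte records, back to back
    val storage = alignUp(metadata + totalKeyBytes, 64) // "pad 64" (assumed: align to 64)
    return SegmentOffsetsSketch(entries, metadata, storage)
}

// Offsets of each tensor inside the storage segment with 64-byte per-entry alignment.
fun storageOffsetsSketch(sizes: List<Long>): List<Long> {
    var cursor = 0L
    return sizes.map { size ->
        val at = alignUp(cursor, 64)
        cursor = at + size
        at
    }
}
```

Under these assumptions, two entries with 10 bytes of key data give segment starts of 96, 256, and 320, and tensors of 16, 100, and 4 bytes land at storage offsets 0, 64, and 192.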
Summary
- `ConstantMaterializationPolicy` (InlineAlways / ExternalAlways / SizeThreshold) and `ExternalParameterRef` types.
- `ConstantOperationsConverter` now consults the policy: the external path emits `util.global private @<key>` at module scope plus `util.global.load @<key>` in the function body, and records an `ExternalParameterRef` on the module.
- Default policy is `InlineAlways`: zero behavior change for every existing caller. The external path is strictly opt-in.

Why this shape

The HLO converter never writes numerical weight bytes; it emits symbolic references and hands a `BufferHandle` through to a downstream packager. This keeps weight-format concerns out of the converter and fits naturally alongside IREE's `.irpa` mechanism. Full architecture in #523.

What's NOT here (by design)

- `.irpa` writer: that's PR C (`skainet-io-iree-params` module, peer of `skainet-io-gguf`).

This PR lands the seam that unblocks C/D/E to proceed in parallel afterward.
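To make the seam concrete, here is a rough sketch of how an external constant could be recorded: a declaration goes into a module-scope buffer, a load goes into the function body, and a ref is registered for the downstream packager. The class and member names (`ModuleBuilderSketch`, `ExternalRefSketch`, `externalConstant`) are hypothetical and the emitted text is simplified; the real converter's output and types may differ.

```kotlin
// Hypothetical, simplified stand-in for ExternalParameterRef (no encoding or handle).
data class ExternalRefSketch(val scope: String, val key: String, val byteCount: Long)

class ModuleBuilderSketch {
    private val decls = mutableListOf<String>()   // module-scope declaration buffer
    private val body = mutableListOf<String>()    // function-body operations
    val externalParameters = mutableListOf<ExternalRefSketch>()

    // Emits a symbolic reference instead of inline bytes and returns the SSA name of the load.
    fun externalConstant(scope: String, key: String, type: String, byteCount: Long): String {
        val ssa = "%c${externalParameters.size}"
        decls += "  util.global private @$key : $type"
        body += "    $ssa = util.global.load @$key : $type"
        externalParameters += ExternalRefSketch(scope, key, byteCount)
        return ssa
    }

    // Content is assembled at the end, so declarations land before the function.
    fun render(): String = buildString {
        appendLine("module {")
        decls.forEach { appendLine(it) }
        appendLine("  func.func @main() {")
        body.forEach { appendLine(it) }
        appendLine("    return")
        appendLine("  }")
        append("}")
    }
}
```

Assembling the text only in `render()` is what lets the declarations appear ahead of `func.func` without editing an already-built string.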
Design notes worth a review eye
- `SizeThreshold` measures logical bytes (`TensorEncoding.physicalBytes(elementCount)`), not MLIR text size, so the decision is independent of downstream splat / dense formatting.
- Byte serializer lives in `commonMain` using `toRawBits` (no JVM-only streams), little-endian. FP32/FP64/I32/I64 supported; unsupported dtypes fall back to inline with a comment, so the policy cannot make the IR worse than the default.
- `StableHloConverter.convert()` was refactored to assemble content at end of conversion so `util.global` decls slot between `module { ... }` and `func.func` cleanly; no string surgery.
- `@JvmOverloads` on `ConversionContext` and `StableHloConverter` constructors keeps every existing Java caller compiling.

Test plan

- `testDefaultPolicyIsInlineAlways`: zero-behavior-change invariant.
- `testExternalAlwaysEmitsUtilGlobalAndRegistersRef`: end-to-end check of `util.global private`, `util.global.load`, and `ExternalParameterRef{scope, key, 16 bytes}`.
- `testSizeThresholdSplitsBySize`: 2x2 f32 (16B) inline, 4x4 f32 (64B) external, at threshold=32B.
- `testModuleAttrsHeaderStillEmittedAboveUtilGlobal`: pins emission ordering so IREE's parser gets a valid module shape.
- `:skainet-compile:skainet-compile-hlo:jvmTest`: all existing tests still green.
- `:skainet-compile:skainet-compile-hlo:apiCheck`: public API diff captured in `api/jvm/skainet-compile-hlo.api`.

Relates to #523 (design). Part of the architectural answer to #519.
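The routing exercised by the threshold test can be sketched as follows. The type and function names are hypothetical stand-ins for the PR's `ConstantMaterializationPolicy`, and the comparison `>=` is an assumption: the description does not say whether a tensor exactly at the threshold goes inline or external, and the 16 B / 64 B / 32 B figures from the test plan do not distinguish the two.

```kotlin
// Hypothetical mirror of the three policy variants named in the summary.
sealed interface PolicySketch {
    object InlineAlways : PolicySketch
    data class ExternalAlways(val scope: String) : PolicySketch
    data class SizeThreshold(val bytes: Long, val scope: String) : PolicySketch
}

// Returns the scope to externalize into, or null to keep the constant inline.
fun externalScopeFor(policy: PolicySketch, physicalBytes: Long): String? = when (policy) {
    PolicySketch.InlineAlways -> null
    is PolicySketch.ExternalAlways -> policy.scope
    // Assumption: tensors at or above the threshold are externalized.
    is PolicySketch.SizeThreshold -> if (physicalBytes >= policy.bytes) policy.scope else null
}

// Logical size of an f32 tensor: 4 bytes per element times the product of the dimensions.
fun f32Bytes(vararg shape: Int): Long = shape.fold(4L) { acc, dim -> acc * dim }
```

With a 32-byte threshold, a 2x2 f32 tensor (16 bytes) stays inline and a 4x4 f32 tensor (64 bytes) is routed to the external scope, matching the split described in the test plan.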
🤖 Generated with Claude Code