HLO: inline dense constants for large tensors — scale / compile-time blocker #519

@michalharakal

Description

Module

skainet-compile-hlo / ConstantOperationsConverter

Problem

All weight tensors are emitted as inline stablehlo.constant dense<[[...]]> literals. For Whisper-tiny.en, this produces a 151 MB MLIR file where most of the content is floating-point numbers in text form.
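To get a feel for the overhead, here is a rough sketch that compares the size of a dense text literal with the same values packed as raw little-endian f32. The sample values and the use of Python's `repr` for formatting are assumptions for illustration; the real ratio depends on how ConstantOperationsConverter prints floats.

```python
import struct

def dense_literal(values):
    """Render a flat list of floats as a dense<[...]> text literal."""
    return "dense<[" + ", ".join(repr(v) for v in values) + "]>"

def size_comparison(values):
    """Return (bytes as text literal, bytes as packed little-endian f32)."""
    text_bytes = len(dense_literal(values).encode("utf-8"))
    binary_bytes = len(struct.pack(f"<{len(values)}f", *values))
    return text_bytes, binary_bytes

# Hypothetical sample data, not taken from the Whisper weights.
sample = [0.0123456789 * (i + 1) for i in range(1000)]
text_bytes, binary_bytes = size_comparison(sample)
print(f"text: {text_bytes} B, binary: {binary_bytes} B, "
      f"ratio: {text_bytes / binary_bytes:.1f}x")
```

The binary form is always 4 bytes per element, while the text form pays for every digit plus separators.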

Impact

  • Parsing is extremely slow: iree-compile takes minutes just to read the file.
  • SSA type tracking in post-processors fails on multi-megabyte lines.
  • Not scalable to larger models — Whisper small/medium would be gigabytes of text MLIR.
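The points above can be checked on an exported file with a small scan like the following. It is a diagnostic sketch only: it streams the file line by line and reports the total size, the longest line, and how many lines mention `stablehlo.constant`. The function name and the demo file contents are made up for illustration.

```python
import os
import tempfile

def summarize_mlir(path, needle=b"stablehlo.constant"):
    """Report total size, longest line, and number of lines containing needle."""
    total = 0
    longest = 0
    constant_lines = 0
    with open(path, "rb") as f:
        for line in f:  # each line is read whole, so huge lines still cost memory
            total += len(line)
            longest = max(longest, len(line))
            if needle in line:
                constant_lines += 1
    return {
        "total_bytes": total,
        "longest_line_bytes": longest,
        "constant_lines": constant_lines,
    }

# Tiny stand-in file; point the function at the real export instead.
demo_text = (
    "module {\n"
    "  %0 = stablehlo.constant dense<[1.0, 2.0]> : tensor<2xf32>\n"
    "  %1 = stablehlo.constant dense<0.0> : tensor<4xf32>\n"
    "}\n"
)
with tempfile.NamedTemporaryFile("w", suffix=".mlir", delete=False) as tmp:
    tmp.write(demo_text)
    demo_path = tmp.name

summary = summarize_mlir(demo_path)
os.remove(demo_path)
print(summary)
```

For the real case, call `summarize_mlir("SKaiNET-voice/build/iree/encoder_skainet.mlir")` after running the export test.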

Suggested fix options

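  1. External weight files: emit stablehlo.constant with a reference to an external binary file instead of an inline literal. IREE's resource or parameter loading looks like the right candidate; #util.byte_pattern appears to describe a repeated fill pattern rather than a file reference, so it needs checking before relying on it here.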
  2. Splat constants for zeros: when VoidTensorOps produces zero tensors, emit dense<0.0> splat instead of spelling out every element.
  3. Separate weight serialization: emit the MLIR structure with placeholder constants and load weights at compile time via IREE's parameter mechanism.
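Option 2 is the cheapest to prototype. A minimal sketch of the idea, assuming row-major flat values, rank of at least 1, and finite floats: if every element has the same f32 bit pattern, emit a splat, otherwise fall back to the nested literal. The helper names are hypothetical and this is not the converter's actual code.

```python
import struct

def _bits(v):
    """f32 bit pattern, so that 0.0 and -0.0 are not treated as equal."""
    return struct.pack("<f", v)

def _nested(values, shape):
    """Render row-major flat values as a nested [[...]] literal."""
    if len(shape) == 1:
        return "[" + ", ".join(repr(v) for v in values) + "]"
    step = len(values) // shape[0]
    rows = (_nested(values[i * step:(i + 1) * step], shape[1:])
            for i in range(shape[0]))
    return "[" + ", ".join(rows) + "]"

def dense_attr(values, shape, elem_type="f32"):
    """Return a dense attribute string, using a splat when all elements match."""
    dims = "x".join(str(d) for d in shape)
    tensor_type = f"tensor<{dims}x{elem_type}>"
    first = _bits(values[0])
    if all(_bits(v) == first for v in values):
        return f"dense<{values[0]!r}> : {tensor_type}"
    return f"dense<{_nested(values, shape)}> : {tensor_type}"

print(dense_attr([0.0] * 4, [2, 2]))
print(dense_attr([1.0, 2.0, 3.0, 4.0], [2, 2]))
```

Comparing bit patterns rather than using `==` keeps a tensor containing both 0.0 and -0.0 from being collapsed into a single splat.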

Context

Filed from skainet-whisper IREE GPU bring-up (2026-04-18) on branch feature/iree-vulkan-gpu targeting SKaiNET 0.18.0. The ONNX path (ONNX → iree-import-onnx → MLIR → iree-compile → VMFB) works on device today because iree-import-onnx uses external resources for weights. The native SKaiNET DSL path will need an equivalent mechanism to be scalable beyond tiny models.
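As a sketch of what the weight side of such a mechanism could look like, the snippet below packs tensors into one aligned binary blob and keeps a manifest of name, shape, offset, and length. The layout, the 64-byte alignment, and the tensor names are all assumptions for illustration; this is not IREE's own parameter archive format, and the MLIR side that would reference these names is left out.

```python
import io
import struct

def pack_weights(tensors, alignment=64):
    """Pack {name: (shape, flat_values)} into one blob plus a manifest."""
    blob = io.BytesIO()
    manifest = {}
    for name, (shape, values) in tensors.items():
        # Pad with zeros so each tensor starts on an aligned offset.
        blob.write(b"\0" * ((-blob.tell()) % alignment))
        offset = blob.tell()
        data = struct.pack(f"<{len(values)}f", *values)  # little-endian f32
        blob.write(data)
        manifest[name] = {
            "shape": list(shape),
            "dtype": "f32",
            "offset": offset,
            "length": len(data),
        }
    return blob.getvalue(), manifest

def read_tensor(blob, entry):
    """Read one tensor back from the blob using its manifest entry."""
    count = entry["length"] // 4
    return list(struct.unpack_from(f"<{count}f", blob, entry["offset"]))

# Hypothetical tensor names and values.
weights = {
    "encoder.conv1.weight": ([2, 2], [1.0, 2.0, 3.0, 4.0]),
    "encoder.conv1.bias": ([2], [0.5, -0.5]),
}
blob, manifest = pack_weights(weights)
print(manifest["encoder.conv1.bias"], len(blob))
```

With a layout like this, the MLIR file only needs to carry tensor names and types, so its size no longer grows with the number of weights.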

Test to reproduce:

./gradlew :SKaiNET-voice:testDebugUnitTest --tests "*WhisperHloExportTest*"
# produces SKaiNET-voice/build/iree/encoder_skainet.mlir
ls -lh SKaiNET-voice/build/iree/encoder_skainet.mlir
