skainet-io: Int overflow in StreamingGGUFReader for tensors > 2GB #455

@michalharakal

Description

Summary

StreamingTensorInfo.nBytes is declared as Int, which overflows for tensors whose
byte size exceeds Int.MAX_VALUE (2,147,483,647 bytes ≈ 2 GB). This blocks loading
any GGUF model that contains a single tensor larger than ~2 GB, including
Gemma 4 E4B (per_layer_token_embd.weight is ~1.4 GB in Q4_K_M, but the
intermediate Long-to-Int cast overflows earlier in the arithmetic chain).

Reproduction

gemma-4-E4B-it-Q4_K_M.gguf  (5.0 GB)
Exception in thread "main" java.lang.IllegalArgumentException:
  Length must be non-negative: -1982857216
    at sk.ainet.io.JvmRandomAccessSource.readAt(JvmRandomAccessSource.kt:30)
    at sk.ainet.io.gguf.StreamingGGUFReader.loadTensorData(StreamingGGUFReader.kt:84)

Root cause

File: skainet-io/skainet-io-gguf/src/commonMain/kotlin/sk/ainet/io/gguf/StreamingGGUFReader.kt

Line ~270 in the tensor size calculation:

val numBlocks = nElements / blockSize       // Long
(numBlocks * typeSize).toInt()              // ← overflow here

numBlocks * typeSize produces a Long that is silently truncated to Int.
For the per_layer_token_embd.weight tensor (262144 × 10752 elements in
Q4_K_M), this exceeds Int.MAX_VALUE and wraps negative.
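The truncation is easy to reproduce in isolation. The numbers below are illustrative (they are not the actual block counts from the model), but the mechanism is the same: the Long product is correct, and the `.toInt()` silently wraps it negative:

```kotlin
fun main() {
    val numBlocks = 20_000_000L        // illustrative block count
    val typeSize = 144L                // e.g. a Q4_K block is 144 bytes
    val bytes = numBlocks * typeSize   // 2_880_000_000 as a Long: correct
    val truncated = bytes.toInt()      // silently wraps past Int.MAX_VALUE
    println(bytes)                     // prints 2880000000
    println(truncated)                 // prints -1414967296
}
```

The negative value then flows into `readAt`, which is where the `Length must be non-negative` exception above comes from.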

File: skainet-io/skainet-io-gguf/src/commonMain/kotlin/sk/ainet/io/gguf/StreamingGGUFReader.kt

StreamingTensorInfo.nBytes is declared as Int:

data class StreamingTensorInfo(
    ...
    val nBytes: Int,        // ← should be Long
    ...
)

File: skainet-io/skainet-io-core/src/jvmMain/kotlin/sk/ainet/io/JvmRandomAccessSource.kt

readAt accepts only an Int length:

fun readAt(position: Long, length: Int): ByteArray

Proposed fix

1. StreamingTensorInfo — change nBytes to Long

// Before:
val nBytes: Int,

// After:
val nBytes: Long,

2. StreamingGGUFReader — keep computation in Long

// Before (line ~270):
(numBlocks * typeSize).toInt()

// After:
numBlocks * typeSize

3. StreamingGGUFReader.loadTensorData — handle large tensors

For tensors > 2 GB, readAt needs a Long-length overload, or the data must be
loaded in chunks. A simple interim approach is to fail fast with a clear message:

public fun loadTensorData(tensor: StreamingTensorInfo): ByteArray {
    require(tensor.nBytes <= Int.MAX_VALUE) {
        "Tensor '${tensor.name}' is ${tensor.nBytes} bytes (> 2 GB). " +
        "Use chunked loading or memory-mapped I/O."
    }
    return source.readAt(tensor.absoluteDataOffset, tensor.nBytes.toInt())
}

This turns the silent wrap-around into an explicit error; for most quantized
models (Q4_K_M, Q8_0), individual tensors are well under 2 GB, so the guard
rarely fires.

4. RandomAccessSource / JvmRandomAccessSource — add Long length overload (optional)

For full large-tensor support, add:

fun readAt(position: Long, length: Long): ByteArray

This can be implemented with FileChannel.map() for memory-mapped reads, or with
chunked read() calls delegating to the existing Int-length readAt.
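A chunked variant might look like the sketch below. Note that a single JVM ByteArray also caps at Int.MAX_VALUE bytes, so a true > 2 GB read cannot return one array; this sketch instead hands Int-sized slices to a consumer. Only the `readAt(position: Long, length: Int)` signature is taken from the issue; the `readAtChunked` name, the interface stub, and the chunk size are illustrative:

```kotlin
// Minimal stand-in for the existing interface; only the Int-length
// readAt quoted above is assumed to exist.
interface RandomAccessSource {
    fun readAt(position: Long, length: Int): ByteArray
}

// Streams `length` bytes starting at `position` as Int-sized chunks,
// since a single JVM ByteArray cannot exceed Int.MAX_VALUE bytes.
fun RandomAccessSource.readAtChunked(
    position: Long,
    length: Long,
    chunkSize: Int = 64 * 1024 * 1024, // 64 MiB per read; tunable
    consume: (ByteArray) -> Unit,
) {
    require(length >= 0) { "Length must be non-negative: $length" }
    var offset = 0L
    while (offset < length) {
        // Last chunk may be shorter; the min keeps it within Int range.
        val n = minOf(chunkSize.toLong(), length - offset).toInt()
        consume(readAt(position + offset, n))
        offset += n
    }
}
```

A streaming consumer avoids ever materializing the full tensor on the heap, which also helps when the caller copies the bytes into native or GPU memory anyway.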

Impact

  • Gemma 4 E4B (5 GB Q4_K_M): blocked — per_layer_token_embd.weight overflows
  • Gemma 4 E2B (smaller): likely works (smaller PLE table)
  • All existing models (Llama, Qwen, Gemma 3n): unaffected — no individual tensor > 2 GB
  • Future large models: any model with a single tensor > 2 GB will hit this

Affected repos

  • SKaiNET/skainet-io (the fix)
  • SKaiNET-transformers (consumer — blocked on Gemma 4 E4B GGUF loading)
