Summary
StreamingTensorInfo.nBytes is Int, which overflows for tensors whose byte
size exceeds Int.MAX_VALUE (2,147,483,647 bytes ≈ 2 GB). This blocks loading
any GGUF model that contains a single tensor larger than ~2 GB, including
Gemma 4 E4B (per_layer_token_embd.weight is only ~1.4 GB in Q4_K_M, but an
intermediate Long → Int cast overflows earlier in the arithmetic chain).
Reproduction
Loading gemma-4-E4B-it-Q4_K_M.gguf (5.0 GB) fails with:
Exception in thread "main" java.lang.IllegalArgumentException:
Length must be non-negative: -1982857216
at sk.ainet.io.JvmRandomAccessSource.readAt(JvmRandomAccessSource.kt:30)
at sk.ainet.io.gguf.StreamingGGUFReader.loadTensorData(StreamingGGUFReader.kt:84)
Root cause
File: skainet-io/skainet-io-gguf/src/commonMain/kotlin/sk/ainet/io/gguf/StreamingGGUFReader.kt
Line ~270 in the tensor size calculation:
val numBlocks = nElements / blockSize // Long
(numBlocks * typeSize).toInt() // ← overflow here
numBlocks * typeSize produces a Long that is silently truncated to Int.
For the per_layer_token_embd.weight tensor (262144 × 10752 elements in
Q4_K_M), this exceeds Int.MAX_VALUE and wraps negative.
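A minimal Kotlin sketch of the truncation. The byte count 2,312,110,080 is back-computed from the -1982857216 in the exception message and is an assumption, not a value read from the file:

```kotlin
fun main() {
    // Hypothetical size > Int.MAX_VALUE; 2_312_110_080 is back-computed from
    // the exception: 2_312_110_080 - 2^32 = -1_982_857_216.
    val nBytesLong: Long = 2_312_110_080L

    // .toInt() keeps only the low 32 bits, silently wrapping negative.
    val truncated: Int = nBytesLong.toInt()
    println(truncated) // -1982857216

    // Keeping the computation in Long preserves the value.
    check(nBytesLong > Int.MAX_VALUE.toLong())
    check(truncated < 0)
}
```

This is exactly the failure mode of `(numBlocks * typeSize).toInt()`: the multiplication itself is fine in Long, and only the cast destroys the value.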
File: skainet-io/skainet-io-gguf/src/commonMain/kotlin/sk/ainet/io/gguf/StreamingGGUFReader.kt
StreamingTensorInfo.nBytes is declared as Int:
data class StreamingTensorInfo(
    ...
    val nBytes: Int, // ← should be Long
    ...
)
File: skainet-io/skainet-io-core/src/jvmMain/kotlin/sk/ainet/io/JvmRandomAccessSource.kt
readAt only accepts Int length:
fun readAt(position: Long, length: Int): ByteArray
Proposed fix
1. StreamingTensorInfo — change nBytes to Long
// Before:
val nBytes: Int,
// After:
val nBytes: Long,
2. StreamingGGUFReader — keep computation in Long
// Before (line ~270):
(numBlocks * typeSize).toInt()
// After:
numBlocks * typeSize
3. StreamingGGUFReader.loadTensorData — handle large tensors
For tensors > 2 GB, readAt needs a Long-length overload, or the data must be
loaded in chunks. A simple interim approach:
public fun loadTensorData(tensor: StreamingTensorInfo): ByteArray {
    require(tensor.nBytes <= Int.MAX_VALUE) {
        "Tensor '${tensor.name}' is ${tensor.nBytes} bytes (> 2 GB). " +
            "Use chunked loading or memory-mapped I/O."
    }
    return source.readAt(tensor.absoluteDataOffset, tensor.nBytes.toInt())
}
This makes the overflow explicit rather than silent, and for most quantized
models (Q4_K_M, Q8_0) individual tensors are well under 2 GB.
4. RandomAccessSource / JvmRandomAccessSource — add Long length overload (optional)
For full large-tensor support, add:
fun readAt(position: Long, length: Long): ByteArray
The implementation could use FileChannel.map() for memory-mapped reads, or
chunked read() calls.
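One caveat: a JVM ByteArray is indexed by Int, so a Long-length readAt that returns a single array still caps at ~2 GB. A chunk-consuming variant sidesteps that limit. A rough sketch, where readChunkedAt and its parameters are illustrative and not part of the existing RandomAccessSource API:

```kotlin
import java.io.RandomAccessFile

// Illustrative chunked read: delivers up to chunkSize bytes at a time to
// `consume` instead of materialising one giant ByteArray. The function name
// and signature are assumptions, not the existing skainet-io API.
fun readChunkedAt(
    file: RandomAccessFile,
    position: Long,
    length: Long,
    chunkSize: Int = 64 shl 20, // 64 MiB per read
    consume: (buffer: ByteArray, validBytes: Int) -> Unit,
) {
    var offset = position
    var remaining = length
    val buffer = ByteArray(chunkSize)
    while (remaining > 0) {
        val toRead = minOf(remaining, chunkSize.toLong()).toInt()
        file.seek(offset)
        file.readFully(buffer, 0, toRead) // throws EOFException on short file
        consume(buffer, toRead)
        offset += toRead
        remaining -= toRead
    }
}
```

A Long-length readAt on RandomAccessSource could be layered on top of this for callers that can consume chunks, while FileChannel.map() remains the zero-copy alternative (noting that a single MappedByteBuffer is itself limited to Int.MAX_VALUE bytes per mapping).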
Impact
- Gemma 4 E4B (5 GB Q4_K_M): blocked — per_layer_token_embd.weight overflows
- Gemma 4 E2B (smaller): likely works (smaller PLE table)
- All existing models (Llama, Qwen, Gemma 3n): unaffected — no individual tensor > 2 GB
- Future large models: any model with a single tensor > 2 GB will hit this
Affected repos
SKaiNET/skainet-io (the fix)
SKaiNET-transformers (consumer — blocked on Gemma 4 E4B GGUF loading)