skainet-io: Int overflow in StreamingGGUFReader for tensors > 2GB #455

@michalharakal

Description

Summary

StreamingTensorInfo.nBytes is declared as Int, which overflows for tensors whose
byte size exceeds Int.MAX_VALUE (2,147,483,647 bytes ≈ 2 GB). This blocks loading
any GGUF model that contains a single tensor larger than ~2 GB, including
Gemma 4 E4B (per_layer_token_embd.weight is ~1.4 GB in Q4_K_M, but the
intermediate Long-to-Int cast overflows earlier in the arithmetic chain).

Reproduction

gemma-4-E4B-it-Q4_K_M.gguf  (5.0 GB)
Exception in thread "main" java.lang.IllegalArgumentException:
  Length must be non-negative: -1982857216
    at sk.ainet.io.JvmRandomAccessSource.readAt(JvmRandomAccessSource.kt:30)
    at sk.ainet.io.gguf.StreamingGGUFReader.loadTensorData(StreamingGGUFReader.kt:84)

Root cause

File: skainet-io/skainet-io-gguf/src/commonMain/kotlin/sk/ainet/io/gguf/StreamingGGUFReader.kt

Line ~270 in the tensor size calculation:

val numBlocks = nElements / blockSize       // Long
(numBlocks * typeSize).toInt()              // ← overflow here

numBlocks * typeSize produces a Long that is silently truncated to Int.
For the per_layer_token_embd.weight tensor (262144 × 10752 elements in
Q4_K_M), this exceeds Int.MAX_VALUE and wraps negative.
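The truncation is easy to reproduce in isolation. The numbers below are illustrative (they are not the actual block counts from the model), but the mechanism is the same: the Long product is correct, and the `.toInt()` silently wraps it negative:

```kotlin
fun main() {
    val numBlocks = 20_000_000L        // illustrative block count
    val typeSize = 144L                // e.g. a Q4_K block is 144 bytes
    val bytes = numBlocks * typeSize   // 2_880_000_000 as a Long: correct
    val truncated = bytes.toInt()      // silently wraps past Int.MAX_VALUE
    println(bytes)                     // prints 2880000000
    println(truncated)                 // prints -1414967296
}
```

The negative value then flows into `readAt`, which is where the `Length must be non-negative` exception above comes from.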

File: skainet-io/skainet-io-gguf/src/commonMain/kotlin/sk/ainet/io/gguf/StreamingGGUFReader.kt

StreamingTensorInfo.nBytes is declared as Int:

data class StreamingTensorInfo(
    ...
    val nBytes: Int,        // ← should be Long
    ...
)

File: skainet-io/skainet-io-core/src/jvmMain/kotlin/sk/ainet/io/JvmRandomAccessSource.kt

readAt accepts only an Int length:

fun readAt(position: Long, length: Int): ByteArray

Proposed fix

1. StreamingTensorInfo — change nBytes to Long

// Before:
val nBytes: Int,

// After:
val nBytes: Long,

2. StreamingGGUFReader — keep computation in Long

// Before (line ~270):
(numBlocks * typeSize).toInt()

// After:
numBlocks * typeSize

3. StreamingGGUFReader.loadTensorData — handle large tensors

For tensors > 2 GB, readAt needs a Long-length overload, or the data must be
loaded in chunks. A simple interim approach is to fail fast with a clear message:

public fun loadTensorData(tensor: StreamingTensorInfo): ByteArray {
    require(tensor.nBytes <= Int.MAX_VALUE) {
        "Tensor '${tensor.name}' is ${tensor.nBytes} bytes (> 2 GB). " +
        "Use chunked loading or memory-mapped I/O."
    }
    return source.readAt(tensor.absoluteDataOffset, tensor.nBytes.toInt())
}

This turns the silent wrap-around into an explicit error; for most quantized
models (Q4_K_M, Q8_0), individual tensors are well under 2 GB, so the guard
rarely fires.

4. RandomAccessSource / JvmRandomAccessSource — add Long length overload (optional)

For full large-tensor support, add:

fun readAt(position: Long, length: Long): ByteArray

This can be implemented with FileChannel.map() for memory-mapped reads, or with
chunked read() calls delegating to the existing Int-length readAt.
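A chunked variant might look like the sketch below. Note that a single JVM ByteArray also caps at Int.MAX_VALUE bytes, so a true > 2 GB read cannot return one array; this sketch instead hands Int-sized slices to a consumer. Only the `readAt(position: Long, length: Int)` signature is taken from the issue; the `readAtChunked` name, the interface stub, and the chunk size are illustrative:

```kotlin
// Minimal stand-in for the existing interface; only the Int-length
// readAt quoted above is assumed to exist.
interface RandomAccessSource {
    fun readAt(position: Long, length: Int): ByteArray
}

// Streams `length` bytes starting at `position` as Int-sized chunks,
// since a single JVM ByteArray cannot exceed Int.MAX_VALUE bytes.
fun RandomAccessSource.readAtChunked(
    position: Long,
    length: Long,
    chunkSize: Int = 64 * 1024 * 1024, // 64 MiB per read; tunable
    consume: (ByteArray) -> Unit,
) {
    require(length >= 0) { "Length must be non-negative: $length" }
    var offset = 0L
    while (offset < length) {
        // Last chunk may be shorter; the min keeps it within Int range.
        val n = minOf(chunkSize.toLong(), length - offset).toInt()
        consume(readAt(position + offset, n))
        offset += n
    }
}
```

A streaming consumer avoids ever materializing the full tensor on the heap, which also helps when the caller copies the bytes into native or GPU memory anyway.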

Impact

  • Gemma 4 E4B (5 GB Q4_K_M): blocked — per_layer_token_embd.weight overflows
  • Gemma 4 E2B (smaller): likely works (smaller PLE table)
  • All existing models (Llama, Qwen, Gemma 3n): unaffected — no individual tensor > 2 GB
  • Future large models: any model with a single tensor > 2 GB will hit this

Affected repos

  • SKaiNET/skainet-io (the fix)
  • SKaiNET-transformers (consumer — blocked on Gemma 4 E4B GGUF loading)
