perf: image-heavy DOCX OOM is driven by eager full-image decode at import, before virtualization (complements #2879)

### What happened?

### Type
- [x] Bug / performance

### Summary
This complements #2879 (browser crashes on large DOCX). That issue attributes the
OOM to (1) the full ProseMirror tree in memory and (2) synchronous single-pass
layout — both **page-count** driven. For **image-heavy** documents there is a
third, often *dominant* driver that #2879 doesn't mention:

**All embedded images are eagerly decoded into the JS heap at import — before the
PM tree is built and before page virtualization can help.** The OOM predictor is
**decoded-bitmap memory ≈ Σ(width × height × 4)** across all images — not file
size, and not page count.
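
As a rough illustration of that predictor, the sketch below sums `width × height × 4` over a list of image dimensions. The function name and the sample dimensions are made up for illustration (they match the repro script's defaults, not any real document), and 4 bytes per pixel assumes the browser decodes to RGBA.

```js
// Estimate decoded RGBA bitmap memory (bytes) for a list of { width, height }.
// Assumption: 4 bytes per pixel, no mipmaps or extra copies.
function decodedBitmapBytes(images) {
  return images.reduce((sum, { width, height }) => sum + width * height * 4, 0);
}

// Hypothetical sample: 200 images at 1600×2200, as in the repro script.
const sample = Array.from({ length: 200 }, () => ({ width: 1600, height: 2200 }));
console.log((decodedBitmapBytes(sample) / 1e9).toFixed(2), 'GB'); // 2.82 GB
```

Running this over the real pixel dimensions of `word/media/*` (read with any image-header parser) gives the number to compare against the tab's memory budget.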

### Root cause (with evidence)
- In super-editor's `DocxZipper.getDocxData()`, the unzip loop iterates **every**
  `word/media/*` entry and, per image, eagerly builds **two** in-heap copies:
  a base64 data URI (`this.mediaFiles[name] = "data:...;base64,..."`) **and** an
  object URL (`this.media[name] = URL.createObjectURL(...)`). It is unconditional —
  no size/lazy gating (grep for `lazy|defer|limit|threshold` in that loop returns
  nothing). *(v1.41 source: `super-editor/src/editors/v1/core/DocxZipper.js`, the
  `word/media` branch of the `getDocxData` loop.)*
- Load order is `Editor.loadXmlData()` (decodes **all** media) → build PM doc →
  `mount()` (DOM). Decode completes **before** the PM tree exists and **before any
  `<img>` mounts**, so the ~5-page virtualization window cannot release it — the
  memory peak happens before the first page renders. PM image nodes store only the
  media *path*; `renderDOM` maps path → decoded URL at render time, so the decoded
  payload for **all** images stays resident regardless of how many pages are mounted.

### Why this matters (real-world data)
Three real Chinese bid documents (anonymized; decoded-bitmap = Σ pixels × 4):

| Doc | File size | Images | Σ pixels | Decoded bitmap | Result |
|-----|-----------|--------|----------|----------------|--------|
| A   | 77 MB     | 333    | 480 M    | **~1.92 GB**   | opens |
| B   | 58 MB     | 243    | 533 M    | **~2.13 GB**   | crashed on v1.41 (opens on v1.42, but with image-render glitches) |
| C   | 329 MB    | 854    | 2243 M   | **~9.0 GB**    | far beyond any tab budget |

Note the **inversion**: Doc A is *larger*, has *more* images and *more* pages than
Doc B, yet A opens and the *smaller* B crashed — because the only metric that orders
them correctly is total decoded pixels, and both sit right at a ~2 GB renderer
ceiling. For Doc C, even downsampling every image to ≤1600px long edge still leaves
~4.7 GB decoded — i.e. with hundreds of images, per-image size reduction can't get
under the ceiling; the **count** dominates.
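
To make the "count dominates" point concrete, here is a small back-of-the-envelope helper. It is only a sketch: `maxSquareEdge` is a hypothetical name, and it generously assumes the entire ~2 GB budget is available for decoded RGBA images and nothing else.

```js
// Largest square edge (px) each image could have if `count` RGBA images
// had to fit into `budgetBytes` all at once.
// Assumption: the whole budget goes to image bitmaps (optimistic).
function maxSquareEdge(count, budgetBytes) {
  const pixelsPerImage = budgetBytes / (count * 4);
  return Math.floor(Math.sqrt(pixelsPerImage));
}

console.log(maxSquareEdge(854, 2e9)); // 765  (Doc C's image count)
console.log(maxSquareEdge(243, 2e9)); // 1434 (Doc B's image count)
```

Under these assumptions, Doc C's 854 images would each need to average about 765×765 px to fit, well below a 1600 px long-edge cap, which is why bounding how many images are resident at once matters more than shrinking each one.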

**Empirical confirmation that image decode (not the PM tree) is the driver here:**
pre-processing the .docx to downsample only the embedded images — same media paths,
`document.xml` byte-identical — cuts decoded-bitmap memory ~2× and makes
previously-crashing files open, **without touching the PM tree or layout**. So for
image-heavy docs the bottleneck is media decode, separate from #2879's page/layout
root cause.

### Expected

Importing an image-heavy DOCX should not hold every image's decoded bitmap in heap
at once. Image materialization should be bounded (ideally tied to the same
viewport/virtualization window the renderer already uses).

### Possible directions (non-prescriptive)

- Lazy / per-page image decode: only build object URLs for images on pages near
  the viewport and release off-screen ones, reusing the existing virtualization window.
- Stop double-holding each image: currently every image is kept as both a base64
  data URI (~1.33× the bytes, as a giant string) and a blob/object URL. Keeping only
  `URL.createObjectURL(blob)` roughly halves the per-image heap and avoids huge strings.
- Optionally, a documented import cap / downsample option for huge media payloads.
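
A minimal sketch of the first direction, to show the shape of the idea rather than a proposed patch. `WindowedObjectUrls`, `getBlob`, and `setVisible` are hypothetical names, not SuperDoc APIs; the only real browser calls are `URL.createObjectURL` and `URL.revokeObjectURL`, and they are injectable so the logic can be exercised outside a browser.

```js
// Keeps object URLs alive only for media paths inside the current window.
// Assumption: `getBlob(path)` returns the (still compressed) Blob for a path.
class WindowedObjectUrls {
  constructor(getBlob, {
    createUrl = (blob) => URL.createObjectURL(blob),
    revokeUrl = (url) => URL.revokeObjectURL(url),
  } = {}) {
    this.getBlob = getBlob;
    this.createUrl = createUrl;
    this.revokeUrl = revokeUrl;
    this.live = new Map(); // media path -> object URL
  }

  // Make exactly `paths` live: revoke URLs that left the window, create new ones.
  setVisible(paths) {
    const wanted = new Set(paths);
    for (const [path, url] of this.live) {
      if (!wanted.has(path)) {
        this.revokeUrl(url);
        this.live.delete(path);
      }
    }
    for (const path of wanted) {
      if (!this.live.has(path)) {
        this.live.set(path, this.createUrl(this.getBlob(path)));
      }
    }
  }

  // URL for a path, or null when the path is outside the window.
  get(path) {
    return this.live.get(path) ?? null;
  }
}
```

The renderer would call `setVisible(...)` with the media paths on the pages it currently mounts and resolve `<img src>` through `get(path)`. Whether revoking the URL actually frees decoded bitmaps promptly depends on the browser, so this would need to be measured.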

### Steps to reproduce
1. Generate an image-heavy .docx (script below; no real data — random-noise images):
   `npm i docx sharp && node repro-gen.mjs`
2. Load `image-heavy-repro.docx` in a SuperDoc editor (e.g. the v1.42 React demo).
3. Watch JS heap during import (DevTools → Memory / Performance). It climbs to
   multiple GB **before** the first page renders; at high image counts the tab OOMs.

<details><summary>repro-gen.mjs</summary>

```js
import sharp from 'sharp';
import { Document, Packer, Paragraph, ImageRun } from 'docx';
import { writeFileSync } from 'fs';
import { randomFillSync } from 'crypto';

const N = 200, W = 1600, H = 2200; // 200 full-page images ≈ 2.8 GB decoded

// Encode a w×h RGB random-noise frame as a JPEG buffer.
const noiseJpeg = async (w, h) => {
  const raw = Buffer.allocUnsafe(w * h * 3);
  randomFillSync(raw);
  return sharp(raw, { raw: { width: w, height: h, channels: 3 } }).jpeg({ quality: 85 }).toBuffer();
};

// One paragraph per image; `transformation` is the display size, not the pixel size.
const children = [];
for (let i = 0; i < N; i++) {
  const jpg = await noiseJpeg(W, H);
  children.push(new Paragraph({ children: [new ImageRun({ type: 'jpg', data: jpg, transformation: { width: 600, height: 825 } })] }));
}

writeFileSync('image-heavy-repro.docx', await Packer.toBuffer(new Document({ sections: [{ children }] })));
console.log('Σ decoded ≈', (N * W * H * 4 / 1e9).toFixed(1), 'GB');
```
(Real photos behave the same at a smaller file size: noise compresses poorly, so the generated file is unusually large for its pixel count. Lower N to stay near the ~2 GB cliff.)
</details>
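
For step 3, a console snippet can log the heap while the file imports instead of watching DevTools by hand. This is a sketch: `usedHeapMB` and `watchHeap` are hypothetical helper names, and `performance.memory` is a non-standard, Chrome-only API, so the helper returns `null` elsewhere. Blob-backed data and decoded bitmaps may be accounted outside the JS heap, so the tab's total memory can be higher than what this reports.

```js
// JS heap in MB, or null where `performance.memory` is unavailable.
function usedHeapMB(perf = globalThis.performance) {
  const mem = perf && perf.memory;
  return mem ? mem.usedJSHeapSize / (1024 * 1024) : null;
}

// Log the heap every `intervalMs`; returns a function that stops the logging.
function watchHeap(intervalMs = 1000, log = console.log) {
  const id = setInterval(() => {
    const mb = usedHeapMB();
    if (mb !== null) log(`heap ${mb.toFixed(0)} MB`);
  }, intervalMs);
  return () => clearInterval(id);
}
```

Paste both functions into the console, run `const stop = watchHeap()` before loading the file, and call `stop()` once the document has rendered or the tab has crashed.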

### SuperDoc version

superdoc core v1.42.0 / @superdoc-dev/react v1.13.0

### Browser

Chrome

### Additional context

Environment

- superdoc (core) v1.42.0, @superdoc-dev/react v1.13.0
- Chrome (latest), macOS
- Related: #2879 (page-count root cause; this report adds the media-decode dimension)
