PDF.load() throws ObjectParseError on encrypted linearized PDFs instead of returning isEncrypted: true, isAuthenticated: false #69

@joeuy-dev

Repo: https://github.com/LibPDF-js/core/issues
Version: @libpdf/core@0.3.4

Summary

When loading an encrypted PDF whose objects are stored in encrypted object streams (common for linearized / PDF 1.5+ files), PDF.load() throws ObjectParseError ("Invalid object stream index at entry 0: expected object number, got eof") instead of returning a PDF instance with isEncrypted: true and isAuthenticated: false.

This makes it impossible to detect password protection through the documented API: the thrown error is the generic parse error, not a SecurityError, so callers cannot distinguish "encrypted, needs password" from "corrupt file."

Reproduction

import { PDF } from '@libpdf/core';
import { readFile } from 'node:fs/promises';

const bytes = await readFile('encrypted-linearized.pdf');
const pdf = await PDF.load(bytes); // throws

The PDF in question:

  • %PDF-1.7, linearized
  • Trailer contains /Encrypt 1431 0 R
  • Cross-reference is an /XRef stream; the object streams it indexes are encrypted under the file key

Stack trace

ObjectParseError: Invalid object stream index at entry 0: expected object number, got eof
    at ObjectStreamParser.parseIndex (libpdf_core.js:59991:48)
    at ObjectStreamParser.parse (libpdf_core.js:59978:23)
    at ObjectStreamParser.getObject (libpdf_core.js:60007:10)
    at Object.getObject (libpdf_core.js:60939:30)
    at ObjectRegistry.resolver (libpdf_core.js:68007:42)
    at ObjectRegistry.resolve (libpdf_core.js:57860:24)
    at walk (libpdf_core.js:64625:20)
    at PDFPageTree.load (libpdf_core.js:64639:5)
    at PDF.load (libpdf_core.js:68014:42)

What appears to happen

  1. PDF.load() parses the trailer and discovers /Encrypt.
  2. The empty-password attempt for the Standard security handler fails (the file requires a real user password).
  3. Without a valid file key, decrypting the object streams produces garbage.
  4. The page-tree walker (PDFPageTree.load) eagerly resolves refs that point into those streams, and the object-stream parser sees garbage bytes and throws.
  5. The throw escapes before the PDF instance is returned, so pdf.isEncrypted / pdf.isAuthenticated / pdf.getSecurity() are never reachable.
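Steps 3 and 4 can be illustrated with a small standalone sketch. It is not @libpdf/core code: parseObjStmIndex is a hypothetical stand-in for the object-stream index parser, written from the PDF rule that a decoded object stream starts with N pairs of integers (object number, byte offset). It only shows why bytes decrypted with the wrong key, or an empty result, fail at entry 0.

```typescript
// Hypothetical stand-in for an object-stream index parser; NOT the library's code.
// A decoded object stream begins with N whitespace-separated integer pairs:
// "<object number> <byte offset>".
function parseObjStmIndex(decoded: Uint8Array, n: number): Array<[number, number]> {
  // latin1 maps every byte to one character, so arbitrary binary decodes safely.
  const text = new TextDecoder('latin1').decode(decoded);
  // PDF white space: NUL, HT, LF, FF, CR, SP.
  const tokens = text.split(/[\x00\t\n\f\r ]+/).filter((t) => t.length > 0);
  const index: Array<[number, number]> = [];
  for (let i = 0; i < n; i++) {
    const objNum = tokens[2 * i];
    const offset = tokens[2 * i + 1];
    if (objNum === undefined || !/^\d+$/.test(objNum)) {
      throw new Error(`Invalid object stream index at entry ${i}: expected object number`);
    }
    if (offset === undefined || !/^\d+$/.test(offset)) {
      throw new Error(`Invalid object stream index at entry ${i}: expected offset`);
    }
    index.push([Number(objNum), Number(offset)]);
  }
  return index;
}

// A correctly decrypted index parses cleanly.
const goodIndex = parseObjStmIndex(new TextEncoder().encode('11 0 12 47 '), 2);
// goodIndex is [[11, 0], [12, 47]]

// Bytes produced with the wrong key are effectively random and are rejected at entry 0.
const wrongKeyBytes = new Uint8Array([0x9c, 0x03, 0xf1, 0x7a, 0x20, 0xee, 0x41]);
let wrongKeyError = '';
try {
  parseObjStmIndex(wrongKeyBytes, 1);
} catch (e) {
  wrongKeyError = (e as Error).message;
}
```

Nothing in a parser like this can tell "wrong key" from "corrupt file", which is why the decision has to be made earlier, at the point where authentication fails.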

{ lenient: true } is the default (ParseOptions) but doesn't recover here. Passing credentials: "<wrong>" produces the same failure.

Expected behavior

When the trailer contains /Encrypt and authentication fails, PDF.load() should resolve with a PDF instance where:

  • isEncrypted === true
  • isAuthenticated === false

…and defer (or skip) eager parsing of objects that live in encrypted streams. Callers can then either prompt the user for a password and call PDF.load(bytes, { credentials }), or treat the file as protected and stop.
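For concreteness, here is roughly what the caller side could look like if PDF.load() resolved as described. This is a sketch under assumptions: LoadedPdf and Loader only mirror the names used in this issue (isEncrypted, isAuthenticated, credentials), openWithPrompt and askPassword are hypothetical, and the loader is injected so the snippet runs without the library.

```typescript
// Minimal shapes taken from the names in this issue; the real library types may differ.
interface LoadedPdf {
  isEncrypted: boolean;
  isAuthenticated: boolean;
}
type Loader = (bytes: Uint8Array, options?: { credentials?: string }) => Promise<LoadedPdf>;

// Hypothetical helper: load once, and if the file is locked, ask for a password and retry.
async function openWithPrompt(
  load: Loader,
  bytes: Uint8Array,
  askPassword: () => Promise<string | null>,
): Promise<LoadedPdf | null> {
  const first = await load(bytes);
  // Not encrypted, or the empty password already worked: usable as-is.
  if (!first.isEncrypted || first.isAuthenticated) return first;

  const password = await askPassword();
  // No password supplied: treat the file as protected and stop.
  if (password === null) return null;

  const second = await load(bytes, { credentials: password });
  return second.isAuthenticated ? second : null;
}
```

With the library, load would presumably be a thin wrapper such as (b, o) => PDF.load(b, o). The point is that the first call has to resolve for either branch to be reachable.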

Alternatively, throw a SecurityError with code NOT_AUTHENTICATED / INVALID_PASSWORD so callers have a typed signal.
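If the typed-error route is preferred, callers could branch on it along these lines. The SecurityError class below is a local, hypothetical stand-in: this issue only names the class and the two codes, so the constructor and the code property are assumptions, and classifyLoadFailure is an illustrative helper.

```typescript
type SecurityCode = 'NOT_AUTHENTICATED' | 'INVALID_PASSWORD';

// Hypothetical shape; the real SecurityError in the library may look different.
class SecurityError extends Error {
  readonly code: SecurityCode;
  constructor(code: SecurityCode, message?: string) {
    super(message ?? code);
    this.name = 'SecurityError';
    this.code = code;
  }
}

type LoadFailure = 'needs-password' | 'wrong-password' | 'unreadable';

// Map whatever the load call threw to something the UI can act on.
function classifyLoadFailure(err: unknown): LoadFailure {
  if (err instanceof SecurityError) {
    return err.code === 'INVALID_PASSWORD' ? 'wrong-password' : 'needs-password';
  }
  // Anything else (for example a parse error) is treated as a damaged file.
  return 'unreadable';
}
```

Either signal would work for us; what matters is that "encrypted, needs password" stops looking like a parse failure.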

Current workaround

Catch the throw and scan the trailer bytes for the literal /Encrypt keyword, since the trailer is never inside an encrypted stream:

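import { PDF } from '@libpdf/core';

export async function isProtected(bytes: Uint8Array): Promise<boolean> {
  try {
    const pdf = await PDF.load(bytes);
    return pdf.isEncrypted && !pdf.isAuthenticated;
  } catch {
    // PDF.load() threw: fall back to scanning the last 8 KiB of the file
    // for the literal "/Encrypt" key of the trailer dictionary.
    const tail = bytes.subarray(Math.max(0, bytes.length - 8192));
    // latin1 maps every byte to one character, so binary data decodes safely.
    return new TextDecoder('latin1').decode(tail).includes('/Encrypt');
  }
}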

This works but bypasses the library entirely for the detection path, which is what the API should be handling.

Sample file

Happy to share a redacted reproduction PDF privately — let me know the preferred channel.
