Notion skill cannot read PDF attachment contents #345

@sentry-junior

Description

When querying Notion compliance pages (e.g. SOC 2 & HIPAA, Policy Central), PDF attachments hosted in Notion cannot be read or summarized. The Notion MCP tools return file reference metadata but not the document content, so questions that require information inside those PDFs hit a dead end.

  • Notion search and fetch work for page text, but PDFs attached to pages (e.g. SOC 2 Type 2 reports, policy summaries) are opaque
  • This blocks use cases like auditing compliance controls, answering security policy questions, or summarizing audit findings without a human manually reading the PDF
  • Affects any Notion page that stores key information as PDF attachments rather than inline page content

Example: The SOC 2 & HIPAA page stores yearly SOC 2 reports as PDF files. Junior can see the file names but cannot extract or summarize their contents.

Options:

  • Download and parse PDFs via the Notion file URL, then feed extracted text to the query pipeline
  • Use an OCR/PDF extraction tool as a post-processing step after Notion fetch
  • Investigate whether Notion's API exposes any content extraction for file blocks
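
A rough sketch of the first option, to make the shape of the work concrete. It assumes the fetch step hands back raw Notion block objects, where `pdf` and `file` blocks carry a URL under either `file` (Notion-hosted) or `external`. The helper names (`iter_pdf_urls`, `extract_attachments`) are hypothetical, and `pypdf` is a swapped-in third-party parser, not something the skill uses today.

```python
import io
import urllib.request
from typing import Callable, Iterable, Iterator


def iter_pdf_urls(blocks: Iterable[dict]) -> Iterator[tuple[str, str]]:
    """Yield (name, url) for each block that looks like a PDF attachment."""
    for block in blocks:
        kind = block.get("type")
        if kind not in ("pdf", "file"):
            continue
        payload = block.get(kind, {})
        # Notion-hosted files sit under "file", externally hosted ones under "external".
        source = payload.get(payload.get("type", ""), {})
        url = source.get("url")
        if not url:
            continue
        # Fall back to the last path segment when the block has no explicit name.
        name = payload.get("name") or url.split("?")[0].rsplit("/", 1)[-1]
        if kind == "pdf" or name.lower().endswith(".pdf"):
            yield name, url


def download(url: str, timeout: float = 30.0) -> bytes:
    """Fetch the raw bytes behind a file URL."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def pdf_bytes_to_text(data: bytes) -> str:
    """Extract the text layer with pypdf (assumption: `pip install pypdf`)."""
    from pypdf import PdfReader  # imported lazily so the rest runs without it

    reader = PdfReader(io.BytesIO(data))
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def extract_attachments(
    blocks: Iterable[dict],
    fetch: Callable[[str], bytes] = download,
    to_text: Callable[[bytes], str] = pdf_bytes_to_text,
) -> dict[str, str]:
    """Map each PDF attachment name to its extracted text."""
    return {name: to_text(fetch(url)) for name, url in iter_pdf_urls(blocks)}
```

Two caveats if this route is taken: Notion-hosted file URLs are signed and expire (the block includes an `expiry_time`), so the download should happen right after the fetch rather than from a cached URL, and scanned PDFs with no text layer would come back empty here, which is where the OCR option would have to take over.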

Action taken on behalf of David Cramer.
