Paper Markdown Extractor

A small Qt desktop GUI for converting academic PDFs into clean Markdown. It extracts:

Native PDF text with heading heuristics
Tables, using PyMuPDF table detection
Images into an adjacent images/ directory
Display equations, cropped as full-width row images (including right-edge equation numbers) for higher visual fidelity, with text math blocks available as a fallback
Optional OCR fallback for scanned pages when Tesseract is installed
Integrated Markdown preview with local images, equation crops, and tables rendered from the generated output
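
The heading heuristics are not documented here, so the following is only a minimal sketch of one common approach: treat the dominant font size as body text and promote noticeably larger lines to headings. The function name `lines_to_markdown`, the `(text, font_size)` input shape, and the `ratio` threshold are assumptions made for illustration, not the app's actual API.

```python
from collections import Counter


def lines_to_markdown(lines, ratio=1.15):
    """Map (text, font_size) pairs to Markdown lines (hypothetical helper).

    The body size is the font size that covers the most characters.
    Any size at least `ratio` times the body size is treated as a heading;
    larger sizes get shallower heading levels.
    """
    if not lines:
        return []

    # Weight each font size by how much text uses it.
    weights = Counter()
    for text, size in lines:
        weights[round(size, 1)] += len(text)
    body_size = weights.most_common(1)[0][0]

    # Distinct heading sizes, largest first: level 1, 2, 3, ...
    heading_sizes = sorted(
        {round(size, 1) for _, size in lines if size >= body_size * ratio},
        reverse=True,
    )
    level = {size: min(i + 1, 6) for i, size in enumerate(heading_sizes)}

    out = []
    for text, size in lines:
        key = round(size, 1)
        out.append(f"{'#' * level[key]} {text}" if key in level else text)
    return out
```

A size-only rule like this is fast and deterministic, but it will miss headings set in bold at body size, which is one reason the output is kept editable.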

The fast path is built for digitally generated academic papers. A 10-page text PDF should usually finish well under one minute on a modern laptop. Scanned PDFs are slower because each page must be rendered to an image before OCR can run.
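
The split between the fast path and the OCR fallback can be sketched as a per-page decision. This is an illustrative assumption about how such a check might work, not the app's code: `needs_ocr`, `plan_pages`, and the `min_chars` threshold are hypothetical names and values.

```python
def needs_ocr(native_text, min_chars=40):
    """Return True when a page has too little native text to trust.

    Whitespace is ignored so a page containing only blank lines counts as empty.
    """
    visible = "".join(native_text.split())
    return len(visible) < min_chars


def plan_pages(page_texts, ocr_available, min_chars=40):
    """Choose "native" or "ocr" for each page (hypothetical helper).

    Pages with sparse text go to OCR only when an OCR engine is available;
    otherwise whatever native text exists is kept.
    """
    plan = []
    for text in page_texts:
        if needs_ocr(text, min_chars) and ocr_available:
            plan.append("ocr")
        else:
            plan.append("native")
    return plan
```

Deciding per page keeps mixed documents fast: only the scanned pages pay the rendering and OCR cost.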

Setup

python3 -m venv .venv
source .venv/bin/activate
python -m pip install -r requirements.txt

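Optional OCR engine for scanned PDFs (macOS with Homebrew shown; on other platforms, install Tesseract with your system's package manager):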

brew install tesseract
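
To confirm that the install worked, one simple check is whether the `tesseract` binary is on the `PATH`. The sketch below uses only the standard library; the function name `ocr_available` is an assumption for illustration and may not match how the app detects Tesseract.

```python
import shutil


def ocr_available(binary="tesseract"):
    """Return True if the given executable can be found on PATH."""
    return shutil.which(binary) is not None


if __name__ == "__main__":
    print("Tesseract found" if ocr_available() else "Tesseract not found")
```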

Run

python -m paper_md_extractor

Choose a PDF, choose an output folder, then click Convert. The app previews the generated Markdown and writes:

<paper-name>.md
images/ with extracted page images
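
The output layout above can be expressed with `pathlib`. This is a sketch under assumptions: `output_paths`, `image_link`, and the example image file name are hypothetical, chosen only to show how the Markdown file and the adjacent images/ directory relate.

```python
from pathlib import Path


def output_paths(pdf_path, out_dir):
    """Return (markdown_path, images_dir) for a given PDF and output folder."""
    out = Path(out_dir)
    markdown_path = out / f"{Path(pdf_path).stem}.md"
    images_dir = out / "images"
    return markdown_path, images_dir


def image_link(images_dir, markdown_path, name):
    """Build a relative, forward-slash link from the Markdown file to an image."""
    relative_dir = Path(images_dir).relative_to(Path(markdown_path).parent)
    return (relative_dir / name).as_posix()
```

Relative links keep the output folder portable: the Markdown file and images/ can be moved together without breaking image references.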

Screenshots

Notes

PDFs do not reliably encode equations, reading order, or table structure semantically. This app favors a fast deterministic extraction pipeline and keeps the output editable. For image-only scanned papers, enable OCR and expect slower conversion.
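
To illustrate why reading order has to be inferred, here is a minimal sketch of a two-column ordering heuristic. It is not taken from the app; the function name `reading_order`, the `(x0, y0, x1, y1, text)` block shape, and the `full_width_ratio` threshold are assumptions for illustration.

```python
def reading_order(blocks, page_width, full_width_ratio=0.6):
    """Order text blocks for a two-column page (hypothetical heuristic).

    Blocks are (x0, y0, x1, y1, text) tuples with y increasing downward.
    Wide blocks (titles, full-width figures) split the page into bands;
    within each band the left column is read before the right column.
    """
    ordered, left, right = [], [], []

    def flush():
        # Emit the pending band: whole left column, then whole right column.
        ordered.extend(left)
        ordered.extend(right)
        left.clear()
        right.clear()

    for block in sorted(blocks, key=lambda b: b[1]):
        x0, _, x1, _, _ = block
        if (x1 - x0) >= page_width * full_width_ratio:
            flush()
            ordered.append(block)
        elif (x0 + x1) / 2 < page_width / 2:
            left.append(block)
        else:
            right.append(block)
    flush()
    return [block[4] for block in ordered]
```

A rule like this is deterministic and fast, but it will misorder unusual layouts such as three columns or sidebars, which is the kind of case where manual editing of the output is expected.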
