Automatically detect and remove blank areas from images or PDFs, for reducing printing costs.
- Detect and remove long blank regions (by brightness threshold).[New - We may now disregard marginal scan edges! (in advanced option)]
- Split pages into content segments(pre line) and save them as individual images.
- Optional Word export: insert segments into a
.docxtemplate. - [New Update!]Automatic image width fitting for Word documents (python-docx), reducing the need for post-processing macros.
This project was developed on Windows and assumes Microsoft Word is available for macro injection features.
Python packages:
pip install PyMuPDF Pillow python-docx tkinterdnd2 pywin32pywin32is required only if you want the program to inject/run VBA macros in Word (Windows only).- If you only need image output or python-docx insertion, Word and
pywin32are optional.
- Install dependencies (see Requirements).
- Run the GUI:
python gui.pywOr on Windows you can double-click gui.pyw.
- Select a PDF file, single image, or a folder containing images.
- Tune parameters in Step 2 (threshold, DPI, min_height, blank_height).
- Choose a
.docxtemplate in Step 3 if you want Word output. - Start processing. Output files:
- Images / segments are saved under
_temp/<input>_segments. - If Word insertion is enabled, the final document is written to
__Output/output_<basename>.docx.
threshold(20–255): Brightness threshold for blank-line detection. Larger values are more conservative.dpi(72–600): Resolution used when converting PDF pages to images.min_height: Minimum segment height (px) to keep a segment.blank_height: Number of consecutive blank rows that separate paragraphs.lr_threshold_enabled: Enable left/right margin trimming for blank detection only.lr_threshold_percent: Percent width on each side to ignore when detecting blanks.no_insert_word: When set, only images are saved and Word insertion is skipped.
- The app can optionally inject and run a VBA macro (from the
_Macrosfolder) after producing the.docxfile. That requires Microsoft Word andpywin32. - For macro injection to work, enable Trust access to the VBA project object model in Word:
File → Options → Trust Center → Trust Center Settings → Macro Settings → Trust access to the VBA project object model. - Note: the program now attempts to auto-fit images to the document/column width using
python-docx. The macro remains available as an optional post-processing step but is no longer strictly required for common cases.
- If Word macro injection fails with an access error, verify the VBA trust setting described above.
- If Word is not installed or
pywin32is missing, use the Advanced optionDo not insert Wordto only save images. - For scanned documents with slanted pages, run a deskew/preprocessing step first.
MIT License
- This version introduces batch image insertion, automatic width-fit using
python-docx, and progress throttling to improve performance and UX. - Contributions and bug reports are welcome.

