@donaldgray
Member

An issue was raised where the rasterized image doesn't match what is shown when viewing the origin PDF in a standard viewer.

This appears to be because the pdf2image library uses the MediaBox by default, where the CropBox seems more appropriate.

From an online article on PDFs:

The MediaBox is the largest page box in a PDF. The other page boxes can equal the size of the MediaBox but they cannot be larger.
The CropBox defines the region to which the page contents are to be clipped. Acrobat uses this size for screen display and printing.

use_cropbox is an available pdf2image option, but it defaults to False (so the MediaBox is used). This change introduces a new envvar, PDF_RASTERIZER_USE_CROPBOX, to control it. The default is kept as-is to avoid any unintended changes in behaviour.
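
For illustration, a minimal sketch of how an envvar like this could be wired through to pdf2image. The helper names (`env_flag`, `rasterize`), the set of accepted truthy strings and the `dpi` value are assumptions made for this example, not the code in this PR. Only `convert_from_path` and its `use_cropbox` argument come from pdf2image itself.

```python
import os

# Assumption: which strings count as "true" is a choice made for this sketch.
TRUTHY = {"1", "true", "yes", "on"}


def env_flag(name, default=False):
    """Read a boolean flag from the environment, falling back to `default`."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in TRUTHY


def rasterize(pdf_path, dpi=200):
    """Rasterize a PDF, using the CropBox only when the envvar opts in."""
    # Imported lazily so the flag parsing can be used without pdf2image installed.
    from pdf2image import convert_from_path

    return convert_from_path(
        pdf_path,
        dpi=dpi,
        # Unset envvar -> False, which matches pdf2image's own default (MediaBox).
        use_cropbox=env_flag("PDF_RASTERIZER_USE_CROPBOX"),
    )
```

With the envvar unset, `rasterize` behaves exactly as before. Setting `PDF_RASTERIZER_USE_CROPBOX=true` switches rendering to the CropBox.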

@donaldgray donaldgray requested a review from fmcc July 18, 2025 15:59
Without this hitting No module named 'pkg_resources' error
@donaldgray donaldgray merged commit 7b92f10 into main Jul 21, 2025
1 check passed
@donaldgray donaldgray deleted the feature/cropbox branch July 21, 2025 08:21
