@PeterStaar-IBM PeterStaar-IBM commented Jan 12, 2026

The Problem

Your original change prioritized the encoding (MacRoman, WinAnsi, Standard, etc.) over the base font's built-in character mapping in ALL cases. This broke character resolution for fonts whose encoding was not explicitly specified in the PDF and therefore defaulted to STANDARD.

For example, character code 16 in font F17 should map to "(" through the font's built-in encoding, but your change forced it through StandardEncoding, where character 16 is undefined, so the output was GLYPH<16>.

The Solution

The fix implements the correct PDF font encoding resolution order:

  1. Differences array (highest priority - explicit character remapping)
  2. ToUnicode CMap (if present)
  3. Explicit encoding (only if /Encoding or /BaseEncoding was explicitly specified in the PDF)
  4. Font's built-in encoding (for standard fonts when no explicit encoding)
  5. Fallback to standard encoding
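
The resolution order above can be sketched as a chain of lookups. This is a minimal illustration only, not the code in page_font.h: FontSketch, lookup, resolve_character, and every table name are hypothetical, and each mapping is reduced to a plain code-to-string map.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical stand-in for a parsed PDF font (not the real page_font type).
using CodeMap = std::map<std::uint32_t, std::string>;

struct FontSketch {
    CodeMap differences;                 // 1. entries from the Differences array
    CodeMap to_unicode;                  // 2. entries from the ToUnicode CMap
    CodeMap encoding;                    // 3. named encoding table (may be a mere default)
    bool has_explicit_encoding = false;  //    true only if the PDF named an encoding
    CodeMap builtin;                     // 4. the font's built-in mapping
    CodeMap standard_fallback;           // 5. last-resort standard encoding
};

// Returns a pointer to the mapped string, or nullptr when the code is absent.
static const std::string* lookup(const CodeMap& table, std::uint32_t code) {
    auto it = table.find(code);
    return it == table.end() ? nullptr : &it->second;
}

// Walks the five steps in priority order; the first table that knows the code wins.
std::string resolve_character(const FontSketch& font, std::uint32_t code) {
    if (const std::string* hit = lookup(font.differences, code)) return *hit;
    if (const std::string* hit = lookup(font.to_unicode, code)) return *hit;
    if (font.has_explicit_encoding) {
        // The encoding table is consulted here only when it was explicitly specified.
        if (const std::string* hit = lookup(font.encoding, code)) return *hit;
    }
    if (const std::string* hit = lookup(font.builtin, code)) return *hit;
    if (const std::string* hit = lookup(font.standard_fallback, code)) return *hit;
    // Nothing resolved: emit a placeholder in the GLYPH<n> style mentioned above.
    return "GLYPH<" + std::to_string(code) + ">";
}
```

In this sketch, a font like F17 with no explicit encoding skips step 3, so code 16 reaches the built-in table instead of ending up as GLYPH<16>.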

Changes Made

  1. Added has_explicit_encoding flag to track whether encoding was found in the PDF
  2. Updated init_encoding() in src/v2/pdf_resources/page_font.h:650 to set this flag
  3. Updated get_correct_character() in src/v2/pdf_resources/page_font.h:493 to only prioritize encoding when it was explicitly specified
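
A rough sketch of how such a flag might be set during initialization follows. It is an assumption-laden illustration rather than the actual init_encoding(): EncodingName, EncodingState, and init_encoding_sketch are hypothetical names, and the font dictionary is flattened to a string map in which /BaseEncoding sits next to /Encoding (in a real PDF it is nested inside the encoding dictionary).

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical encoding identifiers; the real code may use different ones.
enum class EncodingName { STANDARD, MACROMAN, WINANSI };

struct EncodingState {
    EncodingName name = EncodingName::STANDARD;  // default when nothing is specified
    bool has_explicit_encoding = false;          // set only if the PDF named an encoding
};

// font_dict is an assumed, flattened view of the font dictionary: key -> name value.
EncodingState init_encoding_sketch(const std::map<std::string, std::string>& font_dict) {
    EncodingState state;

    auto it = font_dict.find("/Encoding");
    if (it == font_dict.end()) it = font_dict.find("/BaseEncoding");
    if (it == font_dict.end()) {
        // No encoding in the PDF: keep the STANDARD default, but leave the flag
        // false so that lookups can prefer the font's built-in mapping.
        return state;
    }

    state.has_explicit_encoding = true;
    if (it->second == "/MacRomanEncoding") {
        state.name = EncodingName::MACROMAN;
    } else if (it->second == "/WinAnsiEncoding") {
        state.name = EncodingName::WINANSI;
    }
    // Any other value (including /StandardEncoding) stays STANDARD, now explicit.
    return state;
}
```

The point of the sketch is that a defaulted STANDARD and an explicitly requested StandardEncoding end up with the same name but different flag values, which is what lets the lookup treat them differently.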

The test suite now passes completely, correctly handling both cases:

  • Fonts with explicit encodings use the specified encoding
  • Fonts without explicit encodings use their built-in character mappings

Fixes:

Signed-off-by: Peter Staar <taa@zurich.ibm.com>

mergify bot commented Jan 12, 2026

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🟢 Enforce conventional commit

Wonderful, this rule succeeded.

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/

  • title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert)(?:\(.+\))?(!)?:

github-actions bot commented Jan 12, 2026

DCO Check Passed

Thanks @PeterStaar-IBM, all your commits are properly signed off. 🎉

@PeterStaar-IBM PeterStaar-IBM changed the title from "feat: updated the font-parsing" to "fix: updated the font-parsing" Jan 12, 2026
@PeterStaar-IBM PeterStaar-IBM marked this pull request as draft January 12, 2026 16:16
@PeterStaar-IBM PeterStaar-IBM merged commit ec6149e into main Jan 13, 2026
38 of 44 checks passed
@PeterStaar-IBM PeterStaar-IBM deleted the fix/adopt-new-fontparser branch January 13, 2026 08:10