Incorrect beginbfrange count declaration causes PDFium to reject ToUnicode CMap, resulting in garbled text when copying #1659

@markbruce

Bug Report

Description of the problem

When generating PDFs with more than 256 unique characters, PDFKit 0.17.2 generates a ToUnicode CMap with multiple bfrange entries (correctly split at 256-character boundaries), but the beginbfrange declaration is hardcoded to 1 instead of the actual number of ranges.

Impact:

  • PDFs display correctly in all viewers
  • Text copying works in WPS Office
  • Text copying fails in Chrome/Edge (PDFium-based browsers) - produces garbled text even for numbers
  • PDFium validates the CMap strictly and rejects the entire ToUnicode CMap when the declared count doesn't match the number of entries

Root Cause:
In lib/font/embedded.js (or js/pdfkit.js in the compiled version), the toUnicodeCmap() method correctly splits the Unicode mappings into chunks of 256 characters, but it hardcodes the declaration as 1 beginbfrange instead of using ${ranges.length}.

Example:
When a PDF contains 377 unique characters:

  • Code generates: 2 bfrange entries (0x0000-0x00ff and 0x0100-0x0178)
  • CMap declares: 1 beginbfrange
  • PDFium detects mismatch and rejects the CMap
  • Result: Text copying uses fallback encoding → garbled output
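
The arithmetic in this example can be reproduced with a small sketch. The helper below is hypothetical (buildBfrangeBlock and toHex4 are illustrative names, not PDFKit's actual code) and assumes all code points are in the Basic Multilingual Plane, so each target fits in four hex digits. It chunks the mappings at 256 entries and derives the declared count from the number of ranges it actually emits:

```javascript
// Hypothetical sketch, not PDFKit's implementation.
// Formats a number as a 4-digit lowercase hex string, e.g. 255 -> "00ff".
function toHex4(n) {
  return n.toString(16).padStart(4, '0');
}

// codePoints[i] is the Unicode code point mapped to character code i.
// Assumes BMP code points only (no surrogate pairs).
function buildBfrangeBlock(codePoints, chunkSize = 256) {
  const ranges = [];
  for (let start = 0; start < codePoints.length; start += chunkSize) {
    const chunk = codePoints.slice(start, start + chunkSize);
    const end = start + chunk.length - 1;
    const targets = chunk.map((cp) => `<${toHex4(cp)}>`).join(' ');
    ranges.push(`<${toHex4(start)}> <${toHex4(end)}> [${targets}]`);
  }
  // The declared count comes from ranges.length, never from a constant.
  return `${ranges.length} beginbfrange\n${ranges.join('\n')}\nendbfrange`;
}

// 377 mappings -> two ranges: <0000> <00ff> and <0100> <0178>
const block = buildBfrangeBlock(Array.from({ length: 377 }, (_, i) => 0x4e00 + i));
console.log(block.split('\n')[0]);
```

With 377 mappings the first chunk covers codes 0x0000 to 0x00ff (256 entries) and the second covers 0x0100 to 0x0178 (121 entries), so the declaration has to read 2.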

Related Issue:
This appears to be a regression introduced while fixing issue #1498 (256-character boundary problem). The fix correctly implemented chunking but forgot to update the count declaration.

Code sample

Minimal Reproduction

const PDFDocument = require('pdfkit');
const fs = require('fs');

// Create a PDF with more than 256 unique characters
const doc = new PDFDocument();
doc.pipe(fs.createWriteStream('test.pdf'));

// Register a font (SimSun or any Unicode font)
doc.registerFont('SimSun', './path/to/SimSun.ttf');

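// 300 distinct CJK characters starting at U+4E00 (assumes the font covers them).
// Repeating one character, e.g. 'A'.repeat(300), adds only a single glyph
// to the font subset and does not cross the 256-character boundary.
const uniqueChars = Array.from({ length: 300 }, (_, i) => String.fromCharCode(0x4e00 + i)).join('');

doc.font('SimSun')
   .fontSize(12)
   .text('测试文本:' + uniqueChars, 100, 100); // "测试文本" means "test text"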

doc.end();

Generated ToUnicode CMap (incorrect)

1 beginbfrange
<0000> <00ff> [<...>]
<0100> <0178> [<...>]
endbfrange

Problem: Declares 1 but has 2 entries.

Expected ToUnicode CMap (correct)

2 beginbfrange
<0000> <00ff> [<...>]
<0100> <0178> [<...>]
endbfrange

Fix

In lib/font/embedded.js, line ~2587 (or equivalent in compiled version):

Before:

cmap.end(`\
...
1 beginbfrange
${ranges.join('\n')}
endbfrange
...
`);

After:

cmap.end(`\
...
${ranges.length} beginbfrange
${ranges.join('\n')}
endbfrange
...
`);

Verification

  1. Generate a PDF with more than 256 unique characters
  2. Open it in Chrome
  3. Copy the text and paste it elsewhere

Expected: Text copies correctly
Actual (0.17.2): Text is garbled

To verify the CMap issue:

# Extract and decompress PDF streams
# Search for "beginbfrange" in decompressed content
# Count actual entries vs declared count

Your environment

  • pdfkit version: 0.17.2
  • Node version: v18.18.2
  • Browser version (if applicable):
    • Chrome 120+ (PDFium)
    • Edge 120+ (Chromium-based, PDFium)
  • Operating System: macOS (Darwin 25.2.0)

Additional Information

PDF Specification Reference

ToUnicode CMaps (see the PDF specification, ISO 32000-1:2008) use the CMap syntax, in which the integer preceding beginbfrange declares how many bfrange entries follow. The declared number must exactly match the actual count of entries.

Workaround

Temporary workaround: Patch node_modules/pdfkit/js/pdfkit.js:

  • Find: 1 beginbfrange
  • Replace: ${ranges.length} beginbfrange

Suggested Fix

Change line ~2587 in lib/font/embedded.js:

- 1 beginbfrange
+ ${ranges.length} beginbfrange

This ensures the count always matches the actual number of ranges generated.
