Bug Report
Description of the problem
When generating PDFs with more than 256 unique characters, PDFKit 0.17.2 emits a ToUnicode CMap with multiple `bfrange` entries (correctly split at 256-character boundaries), but the count in the `beginbfrange` declaration is hardcoded to `1` instead of the actual number of ranges.
Impact:
- PDFs display correctly in all viewers
- Text copying works in WPS Office
- Text copying fails in Chrome/Edge (PDFium-based browsers), producing garbled text even for digits
- PDFium validates the CMap strictly and rejects the entire ToUnicode CMap when the declared count doesn't match the number of entries
Root Cause:
In `lib/font/embedded.js` (or `js/pdfkit.js` in the compiled version), the `toUnicodeCmap()` method correctly splits the Unicode mappings into chunks of 256 characters, but hardcodes `1 beginbfrange` instead of using `${ranges.length}`.
Example:
When a PDF contains 377 unique characters:
- Code generates: 2 bfrange entries (0x0000-0x00ff and 0x0100-0x0178)
- CMap declares: `1 beginbfrange`, so PDFium detects the mismatch and rejects the CMap
- Result: Text copying uses fallback encoding → garbled output
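The range boundaries in this example follow directly from the 256-entry chunk size (entry indices are zero-based), which is why the declaration should read `2 beginbfrange`:

$$
\begin{aligned}
\text{ranges} &= \left\lceil \tfrac{377}{256} \right\rceil = 2 \\
\text{range 1} &: \; 0 \ldots 255 = \texttt{0x0000} \ldots \texttt{0x00ff} \quad (256 \text{ entries}) \\
\text{range 2} &: \; 256 \ldots 376 = \texttt{0x0100} \ldots \texttt{0x0178} \quad (121 \text{ entries})
\end{aligned}
$$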
Related Issue:
This appears to be a regression introduced while fixing issue #1498 (256-character boundary problem). The fix correctly implemented chunking but forgot to update the count declaration.
Code sample
Minimal Reproduction
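```js
const PDFDocument = require('pdfkit');
const fs = require('fs');
// Create a PDF with more than 256 *unique* characters. Repeated characters
// share one glyph (and one CMap entry), so 'A'.repeat(300) is not enough.
const doc = new PDFDocument();
doc.pipe(fs.createWriteStream('test.pdf'));

// Register a font (SimSun or any Unicode font covering the characters below)
doc.registerFont('SimSun', './path/to/SimSun.ttf');

// 300 distinct CJK ideographs, U+4E00 onward
const uniqueChars = Array.from({ length: 300 }, (_, i) =>
  String.fromCodePoint(0x4e00 + i),
).join('');

// '测试文本' means "test text"
doc.font('SimSun')
  .fontSize(12)
  .text('测试文本:' + uniqueChars, 100, 100);
doc.end();
```
Generated ToUnicode CMap (incorrect)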
```
1 beginbfrange
<0000> <00ff> [<...>]
<0100> <0178> [<...>]
endbfrange
```
Problem: Declares 1 but has 2 entries.
Expected ToUnicode CMap (correct)
```
2 beginbfrange
<0000> <00ff> [<...>]
<0100> <0178> [<...>]
endbfrange
```
Fix
In the `toUnicodeCmap()` method of `lib/font/embedded.js` (reported at line ~2587, or the equivalent location in the compiled version):
Before:
```js
cmap.end(`\
...
1 beginbfrange
${ranges.join('\n')}
endbfrange
...
`);
```
After:
```js
cmap.end(`\
...
${ranges.length} beginbfrange
${ranges.join('\n')}
endbfrange
...
`);
```
Verification
- Generate a PDF with more than 256 unique characters
- Open it in Chrome
- Try to copy text
- Expected: text copies correctly
- Actual: text is garbled
To verify the CMap issue:
```sh
# Extract and decompress PDF streams
# Search for "beginbfrange" in decompressed content
# Count actual entries vs declared count
```
Your environment
- pdfkit version: 0.17.2
- Node version: v18.18.2
- Browser version (if applicable):
  - Chrome 120+ (PDFium)
  - Edge 120+ (Chromium-based, PDFium)
- Operating System: macOS 25.2.0 (Darwin)
Additional Information
PDF Specification Reference
According to the PDF specification (ISO 32000-1:2008), the number preceding `beginbfrange` must exactly match the number of `bfrange` entries that follow:

> The number preceding `beginbfrange` indicates how many `bfrange` entries follow. This number must match the actual count of entries.
Workaround
Temporary workaround: patch `node_modules/pdfkit/js/pdfkit.js`:
- Find: `1 beginbfrange`
- Replace with: `${ranges.length} beginbfrange`
Suggested Fix
Change the `beginbfrange` line in `toUnicodeCmap()` (`lib/font/embedded.js`, reported at line ~2587):
```diff
- 1 beginbfrange
+ ${ranges.length} beginbfrange
```
This ensures the count always matches the actual number of ranges generated.
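For illustration, the chunking and counting logic with the fix applied can be sketched as a standalone function. `buildBfRangeSection` and its local `toHex` are hypothetical names, not PDFKit exports; the 256-entry chunk size and the `<start> <end> [...]` line format follow the CMap excerpts above, and codes are assumed to stay below `0x10000`.

```javascript
// Hypothetical helper (not a PDFKit export): builds the bfrange section of a
// ToUnicode CMap the way the patched template does.
const CHUNK_SIZE = 256;

// 4-digit lowercase hex, as in <00ff>; assumes codes stay below 0x10000.
function toHex(n) {
  return n.toString(16).padStart(4, '0');
}

// entries[i] is the destination for glyph code i, e.g. '<0041>'.
function buildBfRangeSection(entries) {
  const ranges = [];
  for (let start = 0; start < entries.length; start += CHUNK_SIZE) {
    const end = Math.min(start + CHUNK_SIZE, entries.length);
    ranges.push(
      `<${toHex(start)}> <${toHex(end - 1)}> [${entries.slice(start, end).join(' ')}]`,
    );
  }
  // The count is taken from the array itself, so it cannot drift out of sync.
  return `${ranges.length} beginbfrange\n${ranges.join('\n')}\nendbfrange`;
}

// Usage: 377 entries (hypothetical code points) yield two ranges.
const entries = Array.from({ length: 377 }, (_, i) => `<${toHex(0x4e00 + i)}>`);
console.log(buildBfRangeSection(entries).split('\n')[0]); // 2 beginbfrange
```

Deriving the count from `ranges.length` at the point of output, rather than tracking it separately, is what keeps the declaration correct for any number of chunks.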