fix: Remove overly broad NRP and PERSON entities from default PII detection (#47) #51
Purpose of this PR
The `NRP` and `PERSON` PII entities use regex patterns that are far too broad and cause massive false positives in production. This especially breaks non-English language support and makes the pre-flight masking mode practically unusable.
Current broken behavior
Why these patterns are problematic
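NRP pattern (matches any two consecutive words made of ASCII letters):
`/\b[A-Za-z]+ [A-Za-z]+\b/g`
PERSON pattern (matches any two adjacent capitalized words):
`/\b[A-Z][a-z]+ [A-Z][a-z]+\b/g`
Impact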
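To make the impact concrete, the sketch below runs both regexes, copied verbatim, against ordinary text. The helper name `flagged` and the sample strings are illustrative only; they are not taken from #47 or from `src/checks/pii.ts`.

```typescript
// The two regexes quoted above, copied verbatim.
const NRP_PATTERN = /\b[A-Za-z]+ [A-Za-z]+\b/g;
const PERSON_PATTERN = /\b[A-Z][a-z]+ [A-Z][a-z]+\b/g;

// Every substring the pattern would flag for masking. With a /g regex,
// String.prototype.match returns all matches (or null) and resets lastIndex,
// so reusing the same RegExp object across calls is safe.
function flagged(pattern: RegExp, text: string): string[] {
  return text.match(pattern) ?? [];
}

// Ordinary text that contains no person names.
console.log(JSON.stringify(flagged(NRP_PATTERN, "please reset my password")));
// → ["please reset","my password"]
console.log(JSON.stringify(flagged(PERSON_PATTERN, "Thank You for calling New York")));
// → ["Thank You","New York"]

// Non-English text: JavaScript's \b treats only [A-Za-z0-9_] as word characters,
// so "ñ" acts as a boundary and the match starts in the middle of a word.
console.log(JSON.stringify(flagged(NRP_PATTERN, "mañana por favor")));
// → ["ana por"]
```

The last case shows why non-English input suffers twice: accented letters are not matched by `[A-Za-z]`, yet they still create word boundaries, so the pattern fires on fragments of words.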
Solution
1. Remove from default entity list
Keep the patterns available but exclude them from defaults:
This makes the default config actually usable while maintaining backward compatibility.
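A minimal sketch of how the default list can exclude the two entities while keeping them selectable. The names `ALL_PII_ENTITIES`, `DEPRECATED_PII_ENTITIES`, `DEFAULT_PII_ENTITIES`, `resolveEntities` and the short entity list are hypothetical stand-ins, not the actual identifiers in `src/checks/pii.ts`.

```typescript
// Hypothetical subset of supported entities; the real list in src/checks/pii.ts is larger.
const ALL_PII_ENTITIES = ["EMAIL_ADDRESS", "PHONE_NUMBER", "CREDIT_CARD", "NRP", "PERSON"] as const;
type PIIEntity = (typeof ALL_PII_ENTITIES)[number];

// Entities whose patterns stay defined but are no longer enabled by default.
const DEPRECATED_PII_ENTITIES: ReadonlySet<PIIEntity> = new Set<PIIEntity>(["NRP", "PERSON"]);

// Defaults = every entity except the deprecated ones.
const DEFAULT_PII_ENTITIES: readonly PIIEntity[] = ALL_PII_ENTITIES.filter(
  (entity) => !DEPRECATED_PII_ENTITIES.has(entity),
);

// An explicit configuration is used as-is, so setups that list NRP or PERSON keep working.
function resolveEntities(configured?: readonly PIIEntity[]): readonly PIIEntity[] {
  return configured ?? DEFAULT_PII_ENTITIES;
}

console.log(resolveEntities().join(", ")); // → EMAIL_ADDRESS, PHONE_NUMBER, CREDIT_CARD
console.log(resolveEntities(["EMAIL_ADDRESS", "PERSON"]).join(", ")); // → EMAIL_ADDRESS, PERSON
```

Only the implicit default changes; nothing is deleted, which is what keeps the change backward compatible for explicit configurations.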
2. Add deprecation warnings
When users explicitly include these entities, show a clear warning:
The warning is shown only once per entity per session to avoid log spam.
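A sketch of a warn-once mechanism, assuming "session" means the lifetime of the process (a module-level `Set` records which entities have already been reported). The function name `warnOnDeprecatedEntities`, the injectable `warn` callback, and the message wording are illustrative assumptions; the text emitted by `src/checks/pii.ts` may differ.

```typescript
// Illustrative warning text (assumed wording, not copied from the implementation).
const DEPRECATION_MESSAGES = new Map<string, string>([
  ["NRP", "its pattern matches any two consecutive words and causes heavy false positives."],
  ["PERSON", "its pattern matches any two capitalized words and causes heavy false positives."],
]);

// Entities already warned about during this process ("session").
const warnedEntities = new Set<string>();

// Emit at most one warning per deprecated entity for the lifetime of the process.
function warnOnDeprecatedEntities(
  entities: readonly string[],
  warn: (message: string) => void = (message) => console.warn(message),
): void {
  for (const entity of entities) {
    const reason = DEPRECATION_MESSAGES.get(entity);
    // Skip entities that are not deprecated or were already reported.
    if (reason === undefined || warnedEntities.has(entity)) continue;
    warnedEntities.add(entity);
    warn(`[pii] ${entity} is deprecated and excluded from the defaults: ${reason}`);
  }
}

const logged: string[] = [];
warnOnDeprecatedEntities(["EMAIL_ADDRESS", "NRP", "PERSON"], (m) => logged.push(m));
warnOnDeprecatedEntities(["NRP"], (m) => logged.push(m)); // already reported, nothing added
console.log(logged.length); // → 2
```

Injecting `warn` keeps the function testable without patching `console`; a `Map` is used instead of a plain object so that keys such as `"toString"` cannot accidentally resolve to inherited properties.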
3. Update documentation
Added clear documentation explaining:
Why this works long-term
Backward compatibility: No breaking changes. Users with explicit entity configurations continue to work exactly as before.
Better defaults: The default configuration now works for international applications without masking normal text.
Clear migration path: Users who actually need person name detection or national registration numbers have better alternatives:
Prevents future issues: Documentation and warnings educate users upfront about the limitations.
Testing
All tests pass (27/27, including 8 new tests):
Manually verified with the examples from the issue
Files changed
- `src/checks/pii.ts`: core implementation with new defaults and deprecation warnings
- `src/__tests__/unit/checks/pii.test.ts`: added 8 comprehensive tests
- `docs/ref/checks/pii.md`: updated with migration guide

Closes #47