Fix: add lemma immunity in WordCoherencyRule to prevent false positives for inflected forms #11568
@coderabbitai generate docstrings
✅ Actions performed: Initiated docstring generation; will generate only if new commits exist.
Note: Docstrings generation - SUCCESS
Docstrings generation was requested by @minizhiren in #11568 (comment). The following files were modified:
* `languagetool-language-modules/en/src/main/java/org/languagetool/rules/en/WordCoherencyRule.java`
🧩 Summary
This PR fixes a false positive in the English Word Coherency Rule where certain inflected forms of words triggered unwanted "mixed spelling variant" warnings.
🐛 Problem
The rule incorrectly flagged these words:
doggies
doggier
doggiest
as inconsistent with the base forms doggy / doggie.
During testing (EnglishTest.testLanguage), this caused an assertion failure in the coherency.txt consistency check.
⚙️ Root Cause
The rule compared surface tokens directly against the variant map
without considering that an inflected word’s lemma (its base form)
might itself be one of the allowed variants.
As a result, legitimate inflections (e.g., “doggies”) were incorrectly treated as mixed variant usages.
🔧 Fix
Added a lemma-based immunity check in WordCoherencyRule to skip reporting
when a token’s lemma belongs to the same variant set defined in coherency.txt.
For example, if doggy and doggie are coherent variants,
then inflected forms whose lemma is one of them (doggies, doggier, etc.) no longer trigger false alarms.
```java
if (!Collections.disjoint(lemmas, variants)) {
  // lemma itself is one of the coherent variants → inflected form → skip
  continue;
}
```
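The check above can be tried in isolation. The following is a minimal sketch, not the actual code in WordCoherencyRule: the class name `LemmaImmunitySketch`, the helper `isImmune`, and the hard-coded lemma lists are all hypothetical, and it assumes Java 9+ for `Set.of` / `List.of`. In the real rule the lemmas would come from the analyzed tokens and the variants from coherency.txt.

```java
import java.util.Collections;
import java.util.List;
import java.util.Set;

public class LemmaImmunitySketch {

  // Hypothetical variant set, standing in for one entry of coherency.txt.
  public static final Set<String> VARIANTS = Set.of("doggy", "doggie");

  /**
   * Returns true when at least one lemma of the token is itself one of the
   * coherent variants, i.e. the token is an inflected form of a variant
   * and should not be reported.
   */
  public static boolean isImmune(List<String> lemmas, Set<String> variants) {
    // disjoint() is true when the two collections share no element,
    // so negating it means "some lemma is a variant".
    return !Collections.disjoint(lemmas, variants);
  }

  public static void main(String[] args) {
    // Assumed lemma for an inflected form such as "doggies": shares a variant, so immune.
    System.out.println(isImmune(List.of("doggy"), VARIANTS));
    // A lemma outside the variant set: not immune, normal coherency checking applies.
    System.out.println(isImmune(List.of("dog"), VARIANTS));
  }
}
```

Under these assumptions the first call reports the token as immune and the second does not, which mirrors the `continue` in the snippet above.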
🧠 Additional Notes
I also checked another repository (english-pos-dict)
and found that there are multiple word pairs similar to doggie/doggy
that could potentially cause the same issue.
However, only doggie was actually included in coherency.txt.
✅ Scope
Modified file: languagetool-language-modules/en/src/main/java/org/languagetool/rules/en/WordCoherencyRule.java
No core or cross-language code was changed.
This fix only affects the English module.
🧪 How to Reproduce
mvn edu.illinois:nondex-maven-plugin:2.2.1:nondex -Dtest=org.languagetool.rules.en.EnglishTest#testLanguage -DnondexMode=ONE -DnondexSeed=933178 -Dsurefire.failIfNoSpecifiedTests=false