Fix 15+ Unicode bypass strategies in chat filter#8
Draft
Copilot wants to merge 5 commits into
Owner
@copilot just make it ban all the special characters.
Co-authored-by: BaconCat1 <126538523+BaconCat1@users.noreply.github.com>
Co-authored-by: BaconCat1 <126538523+BaconCat1@users.noreply.github.com>
…ments Co-authored-by: BaconCat1 <126538523+BaconCat1@users.noreply.github.com>
…STRING_ROOTS Co-authored-by: BaconCat1 <126538523+BaconCat1@users.noreply.github.com>
Copilot AI changed the title
Fix at least 15 ways/ strategies to bypass the cht filtering including things like CanadaBinny: nigger, just now 🤖 and things like ⓝⓘⓖⓖⓔⓡ this was caught
Ꭰick, this was caught
@ss was caught
黑鬼, bypassed
ᛔitϹh? Ϲunt? bypassed
Shi十 bypassed
jollycock b...
Fix 15+ Unicode bypass strategies in chat filter
Feb 14, 2026
Author
Banning all special characters would block legitimate messages that contain punctuation, emojis, or non-English text (like "Hello! 😊" or "¿Cómo estás?"). The current approach normalizes special characters to their ASCII equivalents (ｎ→n, 𝐧→n, ⓝ→n) before checking, so an obfuscated spelling of a slur is still detected as its plain ASCII form while normal special-character usage remains allowed. If you want stricter filtering, I can:
Which approach would you prefer?
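The normalize-then-check idea described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `normalizeForMatching`, `CONFUSABLES`, and `ZERO_WIDTH` are hypothetical names, and the table holds only a handful of entries, whereas the PR's `UNICODE_CONFUSABLES` is presumably much larger. The sketch assumes Unicode NFKC normalization is an acceptable first pass, with a manual table for look-alikes that NFKC leaves untouched.

```typescript
// Hypothetical, tiny look-alike table for characters that NFKC does not fold.
const CONFUSABLES: Record<string, string> = {
  "а": "a", // Cyrillic a (U+0430)
  "е": "e", // Cyrillic ie (U+0435)
  "о": "o", // Cyrillic o (U+043E)
  "Ꭰ": "d", // Cherokee letter A (U+13A0), resembles Latin D
  "十": "t", // CJK ideograph sometimes used in place of "t"
};

// Zero-width characters that can be inserted to split a word invisibly.
const ZERO_WIDTH = /[\u200B-\u200D\u2060\uFEFF]/g;

function normalizeForMatching(input: string): string {
  // NFKC folds fullwidth letters, mathematical alphanumerics, circled
  // letters, and many sub/superscripts to their plain forms.
  const folded = input.normalize("NFKC").replace(ZERO_WIDTH, "");
  let out = "";
  for (const ch of folded) {
    // Map known look-alikes; leave every other character untouched.
    out += CONFUSABLES[ch] ?? ch;
  }
  return out.toLowerCase();
}

// Obfuscated spellings collapse to the same ASCII string...
console.log(normalizeForMatching("Ｈｅｌｌｏ"));
console.log(normalizeForMatching("ⓗⓔⓛⓛⓞ"));
// ...while ordinary punctuation, emoji, and accented text pass through.
console.log(normalizeForMatching("¿Cómo estás?"));
```

Because only the copy used for matching is normalized, the original message can still be displayed unchanged when nothing is flagged.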
Chat filter failed to catch obfuscated profanity using Unicode confusables, Chinese characters, zero-width characters, and various encoding tricks (fullwidth: nigger, Chinese: 黑鬼, mixed scripts: ᛔitϹh, subscripts: ₛₕᵢₜ, etc.).
Changes
Extended Unicode normalization
- UNICODE_CONFUSABLES: fullwidth Latin, mathematical alphanumerics (bold/italic), small caps, sub/superscripts, upside-down chars, additional Greek/Cyrillic/runic confusables
- normalizeObfuscatedSegment()
Chinese slur detection
- CHINESE_SLUR_MAP with severity annotations (黑鬼 → nigger, 傻逼 → fuck, etc.)
Reversed text detection
- segmentHasBadWord() to catch backwards obfuscation
Compound word detection
- SEVERE_SUBSTRING_ROOTS to catch embedded profanity (jollycock → cock)
CJK character recognition
- isWordLikeCharacter() to treat CJK ideographs (U+4E00-9FFF) as word chars for proper censoring

Added 17 bypass strategy tests. All 40 tests passing.
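As a rough illustration of the reversed-text, compound-word, and CJK word-character checks listed above, here is a minimal sketch. It is not the PR's implementation: `BAD_WORDS`, `SEVERE_ROOTS`, `isWordLike`, `segmentIsBad`, and `splitSegments` are hypothetical stand-ins, and the word lists hold harmless placeholders.

```typescript
// Placeholder lists for illustration only; a real filter would load its own.
const BAD_WORDS = new Set<string>(["badword"]);
const SEVERE_ROOTS: string[] = ["badroot"]; // stands in for SEVERE_SUBSTRING_ROOTS

// Treat ASCII letters/digits and CJK ideographs (U+4E00-9FFF) as word characters.
function isWordLike(ch: string): boolean {
  const cp = ch.codePointAt(0);
  if (cp === undefined) return false;
  if (cp >= 0x4e00 && cp <= 0x9fff) return true;
  return /^[A-Za-z0-9]$/.test(ch);
}

// Split text into runs of word-like characters so each run can be checked
// (and censored) as a unit.
function splitSegments(text: string): string[] {
  const segments: string[] = [];
  let current = "";
  for (const ch of text) {
    if (isWordLike(ch)) {
      current += ch;
    } else if (current) {
      segments.push(current);
      current = "";
    }
  }
  if (current) segments.push(current);
  return segments;
}

function segmentIsBad(segment: string): boolean {
  const s = segment.toLowerCase();
  // Reversed-text check: compare the segment read backwards as well.
  const reversed = Array.from(s).reverse().join("");
  if (BAD_WORDS.has(s) || BAD_WORDS.has(reversed)) return true;
  // Compound-word check: a severe root embedded anywhere in the segment.
  return SEVERE_ROOTS.some((root) => s.includes(root));
}

console.log(splitSegments("你好, world").map(segmentIsBad));
console.log(segmentIsBad("drowdab"), segmentIsBad("jollybadroot"));
```

Substring matching is deliberately limited to a short list of severe roots in this sketch, since matching every listed word as a substring tends to flag innocent words that happen to contain one.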