
Fix 15+ Unicode bypass strategies in chat filter#8

Draft
Copilot wants to merge 5 commits into master from copilot/fix-126538523-1133254939-f0e566a3-2c2f-4f48-a52c-f4eec84de7a9

Conversation


Copilot AI commented Feb 14, 2026

The chat filter failed to catch obfuscated profanity using Unicode confusables, Chinese characters, zero-width characters, and various encoding tricks (fullwidth: ｎｉｇｇｅｒ, Chinese: 黑鬼, mixed scripts: ᛔitϹh, subscripts: ₛₕᵢₜ, etc.).

Changes

Extended Unicode normalization

  • Added 100+ character mappings to UNICODE_CONFUSABLES: fullwidth Latin, mathematical alphanumerics (bold/italic), small caps, sub/superscripts, upside-down chars, additional Greek/Cyrillic/runic confusables
  • Strip zero-width chars (U+200B–U+200D, U+FEFF) and combining diacriticals (U+0300–U+036F, U+20D0–U+20FF) in normalizeObfuscatedSegment()

Chinese slur detection

  • Added CHINESE_SLUR_MAP with severity annotations (黑鬼 → nigger, 傻逼 → fuck, etc.)
  • Optimized with CJK regex pre-check before map iteration

Reversed text detection

  • Check normalized text in reverse in segmentHasBadWord() to catch backwards obfuscation
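The reversal check might look roughly like this; passing the bad-word list as a parameter is a simplification for illustration (the real `segmentHasBadWord()` presumably reads it from module state):

```javascript
// Check both orientations of an already-normalized segment, so that
// backwards obfuscation like "tihs" is caught by the same word list.
function segmentHasBadWord(segment, badWords) {
  const forward = segment.toLowerCase();
  // Spread before reversing so astral (surrogate-pair) code points survive.
  const reversed = [...forward].reverse().join('');
  return badWords.some(w => forward.includes(w) || reversed.includes(w));
}
```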

Compound word detection

  • Extended SEVERE_SUBSTRING_ROOTS to catch embedded profanity (jollycock → cock)
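Embedded-root matching is a plain substring scan; the list contents and the `hasEmbeddedProfanity` name below are illustrative:

```javascript
// Sample roots; the real SEVERE_SUBSTRING_ROOTS list is larger and curated.
const SEVERE_SUBSTRING_ROOTS = ['cock', 'cunt'];

function hasEmbeddedProfanity(word) {
  const w = word.toLowerCase();
  return SEVERE_SUBSTRING_ROOTS.some(root => w.includes(root));
}
```

Substring roots over-match by design ("peacock" would also trip the 'cock' root), which is why only the severe terms belong on this list.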

CJK character recognition

  • Updated isWordLikeCharacter() to treat CJK ideographs (U+4E00-9FFF) as word chars for proper censoring
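The word-character test amounts to one extra range check; this sketch assumes the pre-existing ASCII-alphanumeric logic and adds the CJK range named above:

```javascript
// Treat CJK ideographs (U+4E00–U+9FFF) as word characters so that a
// detected slur like 黑鬼 is censored as a word rather than skipped
// as punctuation.
function isWordLikeCharacter(ch) {
  const cp = ch.codePointAt(0);
  return /[a-z0-9]/i.test(ch) || (cp >= 0x4E00 && cp <= 0x9FFF);
}
```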
// Before: ｎｉｇｇｅｒ bypassed
// After: caught via fullwidth → ASCII normalization

// Before: 黑鬼 bypassed  
// After: caught via CHINESE_SLUR_MAP

// Before: ʇᴉɥs bypassed
// After: caught via upside-down normalization + reversal

Added 17 bypass strategy tests. All 40 tests passing.

Original prompt

Fix at least 15 ways/ strategies to bypass the cht filtering including things like CanadaBinny: nigger, just now 🤖 and things like ⓝⓘⓖⓖⓔⓡ this was caught
Ꭰick, this was caught
@ss was caught

黑鬼, bypassed
ᛔitϹh? Ϲunt? bypassed
Shi十 bypassed
jollycock bypassed



@BaconCat1
Owner

@copilot just make it ban all the special characters.

Copilot AI and others added 4 commits February 14, 2026 00:40
Co-authored-by: BaconCat1 <126538523+BaconCat1@users.noreply.github.com>
Co-authored-by: BaconCat1 <126538523+BaconCat1@users.noreply.github.com>
…ments

Co-authored-by: BaconCat1 <126538523+BaconCat1@users.noreply.github.com>
…STRING_ROOTS

Co-authored-by: BaconCat1 <126538523+BaconCat1@users.noreply.github.com>
Copilot AI changed the title Fix at least 15 ways/ strategies to bypass the cht filtering including things like CanadaBinny: nigger, just now 🤖 and things like ⓝⓘⓖⓖⓔⓡ this was caught Ꭰick, this was caught @ss was caught 黑鬼, bypassed ᛔitϹh? Ϲunt? bypassed Shi十 bypassed jollycock b... Fix 15+ Unicode bypass strategies in chat filter Feb 14, 2026
Copilot AI requested a review from BaconCat1 February 14, 2026 00:53
Author

Copilot AI commented Feb 14, 2026

@copilot just make it ban all the special characters.

Banning all special characters would block legitimate messages with punctuation, emojis, and non-English text (like "Hello! 😊" or "¿Cómo estás?").

The current approach normalizes special characters to their ASCII equivalents (ｎ→n, 𝐧→n, ⓝ→n) before checking, so we can still detect "ｎｉｇｇｅｒ" as "nigger" while allowing normal special-character usage.

If you want stricter filtering, I can:

  1. Block messages with excessive special characters (>50% of message)
  2. Add a whitelist of allowed special characters (punctuation, common emojis)
  3. Block specific Unicode ranges entirely

Which approach would you prefer?
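Option 1, for instance, could be sketched as a simple ratio check; the `hasExcessiveSpecialChars` name and the 50% default are hypothetical, not anything already in the codebase:

```javascript
// Reject messages where more than `threshold` of the characters are
// neither ASCII alphanumerics nor whitespace.
function hasExcessiveSpecialChars(msg, threshold = 0.5) {
  const chars = [...msg];
  if (chars.length === 0) return false;
  const special = chars.filter(ch => !/[a-z0-9\s]/i.test(ch)).length;
  return special / chars.length > threshold;
}
```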
