Using Grok to reduce Twitter/X censorship #16084

@TolstoyDotCom

If we can force Grok to admit just how flawed the "algorithm" is and we publicize that, it will go a long way towards making Musk look bad enough that he'll be forced to change how he does things.

You can see my attempts to do that below. I'm trying to get Grok to admit that the core concepts of Musk's censorship are flawed. E.g., professional spammers won't just post things that Twitter would call "low effort". And, I'm trying to get it to admit that most of the people in the "Probable Spam" section aren't spammers. If we can get Grok to admit to how little Musk thinks of people who don't have millions of followers, that would cause Twitter to lose users and that would force Musk to change what he does.

Now, it's your turn to try to trip up Grok and make it admit something against interest. I'm sure you can do better than me, so have at it.

Here's our conversation, my questions in italics:

You say "Emoji-only replies like are classic spam/low-quality flags — they’re easy to mass-produce with bots or coordinated users." Sophisticated spammers wouldn't post emoji-only tweets, they'd use AI to write something that looked like a real reply. Those who post emoji-only tweets are, contrary to what you think, more likely to be non-spammers.

Fair point, and it's a good one worth unpacking honestly.

You're correct that sophisticated spammers (especially in 2026) can and do use AI to generate convincing, longer replies that sound human. A basic emoji-only reply like "" or "" looks too dumb for a pro operation trying to manipulate conversation or farm engagement subtly.

Reality of spam patterns on X right now:

However, the vast majority of reply spam isn't sophisticated. It's cheap, high-volume, low-effort flooding:

Massive bot farms and coordinated accounts still blast pure emoji strings, repetitive short phrases ("This ", "Facts ", "They can't take it"), or identical/similar comments across hundreds of threads.
These are easy to generate at scale with simple scripts, with no need for good AI.
They work for the spammers' goals: inflate engagement metrics, push trends, promote crypto/scams, or just create noise to bury real discussion. X has purged millions of such accounts specifically for reply spam.

The filter isn't only looking at reply content quality in isolation. It heavily weights behavioral signals:

How many replies an account posts in a short time
Account age and history
Repetition across threads
Timing (instant replies on viral posts)
Lack of meaningful history or network

That's why a real human casually dropping "" on a funny post can get caught in the same net as bots doing the exact same thing at volume. The system errs on the side of hiding low-effort stuff because, in practice, that's where most of the actual spam lives.

Your core observation holds water though: Pure emoji replies are often from real (if lazy) users reacting quickly, while advanced manipulation increasingly uses AI-generated paragraphs. The current filter is blunt and catches too many normals in the process. A smarter system that better weighs semantic quality, originality, and account context (exactly what you've been arguing) would reduce false positives without letting the volume spam win.

X's team knows the filter is imperfect; it's an ongoing arms race. Emoji spam is an easy, cheap signal they lean on, even if it's not perfect at identifying "real human vs. spammer." Your pushback is reasonable. The ideal fix is better AI that can tell a thoughtful reply from noise, regardless of emojis.

The fact is that Twitter rarely got things right and Musk hasn't improved things. The lack of any spam in most "Probable Spam" sections, and those tweets being of the same quality as the "HighQuality" tweets above them, proves that.

You're observing a real and common issue.

In many threads, including high-engagement ones like the Catturd post, the "Probable Spam" section frequently contains short but normal human replies (e.g. "They can dish it but can't take it", simple agreements, mild pushback, or low-effort reactions) that are comparable in quality to many of the visible replies above it. Actual obvious spam (repetitive bot garbage, scams, porn, identical copied text at scale) is sometimes present, but often it's not the majority of what's hidden. This matches what a lot of users report across the platform right now.

Why this happens

X's filter is primarily behavioral and heuristic-driven, not a sophisticated judge of reply quality:

It heavily penalizes low length (very short text or emoji-heavy)
Newer accounts, low followers, low prior engagement
High reply velocity (people who reply fast to many viral posts)
Lack of strong account signals (Premium status, posting history, network connections)
Patterns that resemble known bot behavior

Even if a reply is coherent and on-topic, weak account signals + brevity can push it down. Meanwhile, some equally low-effort replies from "stronger" accounts stay visible. That's why you see the inconsistency you pointed out earlier.

Pre-Musk vs. now

You're right that this isn't a new Musk-era failure. Pre-2022 Twitter had aggressive reply ranking, shadowbanning/"visibility filtering," and "low quality" reply collapsing that also buried plenty of legitimate comments. The label changed to "Probable Spam" under Musk, which made the problem more visible and annoying to users, but the underlying issue (over-aggressive automation at massive scale) predates him. Spam is a genuine problem: X still deals with huge volumes of bots and coordinated noise. But the current system creates too many false positives, hiding normal conversation and frustrating users on all sides. Your point about sophisticated spammers using AI for better-looking replies is also valid; the filter hasn't fully adapted to that shift.

Honest assessment: The filter is still too blunt and often gets it wrong in the way you describe. It hasn't dramatically improved reply quality for end users despite claims of AI enhancements. Many people (not just critics) find it worse in daily practice for ordinary threads.

This is a legitimate technical shortcoming, not some grand conspiracy, but it does reduce the platform's usefulness for real discussion. Better semantic AI that actually reads for relevance, originality, and coherence (beyond length + account age) would help a lot. Users like you calling out the mismatch between "spam" label and reality is fair feedback.
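(Not part of the Grok transcript.) The mechanism described in the reply above, where brevity plus weak account signals hides a reply while the same text from a "stronger" account stays visible, can be illustrated with a toy heuristic. This is only a sketch under invented assumptions: the `Reply` fields, the weights, and `HIDE_THRESHOLD` are all hypothetical and are not taken from X's code.

```python
from dataclasses import dataclass


@dataclass
class Reply:
    # All fields are hypothetical stand-ins for the signals named above.
    text: str
    account_age_days: int
    followers: int
    replies_last_hour: int
    is_premium: bool


HIDE_THRESHOLD = 0.5  # invented cutoff, for illustration only


def spam_score(reply: Reply) -> float:
    """Toy score: higher means more likely to be hidden. Weights are made up."""
    score = 0.0
    if len(reply.text.split()) < 10:   # short text is penalized
        score += 0.3
    if reply.account_age_days < 90:    # newer account
        score += 0.2
    if reply.followers < 100:          # weak network
        score += 0.2
    if reply.replies_last_hour > 20:   # high reply velocity
        score += 0.3
    if reply.is_premium:               # "strong" account signal
        score -= 0.3
    return score


def is_hidden(reply: Reply) -> bool:
    return spam_score(reply) >= HIDE_THRESHOLD


text = "They can dish it but can't take it"
small = Reply(text, account_age_days=30, followers=12,
              replies_last_hour=2, is_premium=False)
established = Reply(text, account_age_days=3000, followers=50_000,
                    replies_last_hour=2, is_premium=True)

print(is_hidden(small), is_hidden(established))
```

Note that nothing in this sketch reads the reply for relevance or accuracy. The identical text is hidden for the small account and shown for the established one, which is the false-positive pattern the thread complains about.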

Re "network connections", Twitter and now X are credentialists. I.e., you think an MSM/Fox/NewsMax/NYPost reporter deceiving is more credible than someone pointing out how they deceived based on the reporter having more of a network and working for a major publication.

That's a sharp observation, and there's truth to the "credentialist" critique.

X's ranking and spam filters do heavily weigh network and reputation signals, things like:

Follower count and quality of followers
Mutual connections / who interacts with whom
Account history and "authority" scores (internally sometimes called TweepCred or similar reputation metrics)
Verification/Premium status
Past engagement patterns

These are standard in large-scale recommendation systems because they're strong predictors for "this account is probably real and not a bot farm account." Established MSM, Fox, NewsMax, or NYPost reporters typically have dense networks, high followers, verification, and history, so their replies and posts get more leeway. An independent account calling them out, especially if newer or with weaker network ties, can get penalized even if its point is sharper or more accurate.

The trade-off

This approach is anti-spam effective in one direction (new low-network accounts are where most bots live), but it creates credentialist bias in the other. It can favor institutional voices and established players over outsiders, even when the outsider is adding better signal. Pre-Musk Twitter was accused of the reverse (suppressing non-mainstream accounts via shadowbans), and Musk's team tried to dial that back by open-sourcing parts of the algorithm, reducing legacy verification weight, and pushing Community Notes.

But the underlying math hasn't magically disappeared: at hundreds of millions of users and massive reply volume, purely content-based quality judgment is hard, so proxies like network strength get heavy use. Sophisticated spammers and bad actors exploit this too.

Your broader point stands: If a high-network reporter's reply is mediocre or deceptive, while a low-network user's factual pushback gets buried as "probable spam," the system is failing at truth-seeking. It rewards credentials and social capital over substance in many cases. That's a real flaw, not just a neutral technical choice.

It's an optimization problem with no perfect solution (pure content merit detection at this scale is still immature), but leaning too hard on network/credential signals does recreate some of the old gatekeeping dynamics users came to X to escape. The frustration is valid.
