Researchers from the University of Zurich, the University of Amsterdam, Duke University, and New York University carried out a cross-platform study of how reliably AI-generated replies can be told apart from human ones on social media. They introduced what they call a computational Turing test, which substitutes automated classifiers for human judges, and found that those classifiers identified AI-generated replies with 70% to 80% accuracy across nine open-weight language models responding on X/Twitter, Bluesky, and Reddit.
In their approach, the nine models were prompted with real posts from actual users and asked to generate replies. The team then applied classifiers and linguistic analysis to pinpoint the features that reliably separate machine-produced text from human-written content.
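The paper's detection pipeline is not reproduced here, but a minimal sketch conveys what a computational Turing test of this kind involves: collect human replies and model-generated replies to the same posts, train a classifier on surface features, and measure held-out accuracy. In the Python sketch below, the feature choice (TF-IDF over word n-grams) and the classifier (logistic regression) are illustrative assumptions, not the study's actual setup.

```python
# Minimal sketch of a "computational Turing test": train a classifier to
# separate human-written replies from model-generated ones and score it on
# held-out data. The features (TF-IDF over word n-grams) and the classifier
# (logistic regression) are illustrative assumptions, not the paper's setup.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Placeholder corpora; in the study, human replies came from real posts and
# AI replies were generated by the nine models responding to those posts.
human_replies = ["lol no way that actually happened",
                 "honestly this take is terrible and you know it"]
ai_replies = ["That is a fascinating perspective, thank you for sharing!",
              "I appreciate your thoughtful contribution to this discussion."]

texts = human_replies + ai_replies
labels = [0] * len(human_replies) + [1] * len(ai_replies)  # 1 = AI-generated

X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.5, stratify=labels, random_state=0)

detector = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2)),   # unigram and bigram features
    LogisticRegression(max_iter=1000))
detector.fit(X_train, y_train)

# On a real corpus, accuracy in the 70-80% band would match the detection
# rates the study reports across the three platforms.
print("held-out accuracy:", accuracy_score(y_test, detector.predict(X_test)))
```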
Even after calibration, the researchers report, AI outputs remained distinguishable, most notably in their affective tone and emotional expression. They tested strategies ranging from simple prompting to fine-tuning, but emotional cues persisted as telltale signs of machine authorship.
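To make the affective-tone finding concrete: one crude signal of this kind is aggregate sentiment. The sketch below uses NLTK's VADER analyzer to contrast human and model replies; VADER is an assumed stand-in for illustration, not the study's actual feature set.

```python
# Illustration of affective tone as a separating signal: compare aggregate
# sentiment of human vs. model-generated replies. VADER is an assumed
# stand-in here, not the feature set the researchers actually used.
import nltk
from nltk.sentiment import SentimentIntensityAnalyzer

nltk.download("vader_lexicon", quiet=True)  # one-time lexicon fetch
sia = SentimentIntensityAnalyzer()

samples = {
    "human": ["ugh this is so annoying, why do they keep doing this",
              "nah that's just wrong lmao"],
    "ai": ["What a wonderful initiative, I am glad to see this progress!",
           "Thank you for raising such an important point."],
}

for source, replies in samples.items():
    # compound score ranges from -1 (most negative) to +1 (most positive)
    scores = [sia.polarity_scores(r)["compound"] for r in replies]
    print(f"{source}: mean compound sentiment = {sum(scores) / len(scores):+.2f}")

# Human social-media replies often carry the spontaneous negativity that
# polished, consistently positive model outputs tend to lack.
```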
The tested models included Llama 3.1 8B and 70B variants, Mistral 7B, Qwen 2.5 7B Instruct, Gemma 3 4B Instruct, DeepSeek-R1-Distill-Llama-8B, and Apertus-8B-2509, among others. Notably, instruction-tuned variants did not consistently yield more human-like text; in some cases they performed worse than their base counterparts. Nor did larger models universally deceive better: in several comparisons, smaller models matched or outperformed larger ones.
Platform differences also emerged: AI replies were hardest to distinguish from human ones on X/Twitter, where classifier accuracy was lowest, while Reddit replies were the most readily identified as AI-generated; Bluesky fell in between. The results suggest that current models still struggle to reproduce spontaneous negativity and unscripted emotion, underscoring a fundamental tension between making text sound human in style and keeping its content aligned with what people actually write.
Overall, the authors argue that stylistic mimicry and semantic realism may be at odds in present architectures: optimizing a model along one dimension pulls it away from the other, so AI-generated text remains recognizably artificial even after deliberate optimization.
This work challenges the assumption that more sophisticated optimization yields more human-like output, and it reinforces the value of automated detection methods for social-platform operators and for researchers studying AI behavior.