A new study reveals that large language models (LLMs) can help link pseudonymous or burner accounts to real individuals across multiple social-media platforms, challenging the notion that pseudonymity provides strong privacy protection. The researchers report recall rates as high as 68% and precision up to 90%, figures that surpass those of traditional, labor-intensive deanonymization methods.
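For context on those figures, here is a minimal Python sketch of how precision and recall are computed for an identity-matching task; the counts below are illustrative, not the study's data:

```python
def precision_recall(true_links: set, predicted_links: set) -> tuple[float, float]:
    """Compute precision and recall for predicted account-identity links.

    true_links: ground-truth (pseudonym, identity) pairs
    predicted_links: pairs the model proposed
    """
    true_positives = len(true_links & predicted_links)
    precision = true_positives / len(predicted_links) if predicted_links else 0.0
    recall = true_positives / len(true_links) if true_links else 0.0
    return precision, recall

# Illustrative counts only:
truth = {("hn_user_1", "person_A"), ("hn_user_2", "person_B"), ("hn_user_3", "person_C")}
preds = {("hn_user_1", "person_A"), ("hn_user_2", "person_D")}
p, r = precision_recall(truth, preds)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.50 recall=0.33
```

Read this way, the study's high precision with lower recall means most links the model proposes are correct, even though a share of accounts goes unmatched.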
To test their approach, the team used several public data sources. One dataset paired posts from Hacker News with LinkedIn profiles by cross-referencing clues present in user bios; the researchers then removed identifying details from the text and ran an LLM to infer the authors' identities. A Netflix micro-identities dataset, featuring personal preferences, recommendations, and transaction histories, provided a second line of analysis, while a dataset built around Reddit posting histories supplied a third angle.
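The paper's exact pipeline is not reproduced here, but the workflow it describes (scrub identifiers, then ask a model to match the text to a candidate profile) might look roughly like the following sketch. Both `scrub` and `query_model` are hypothetical stand-ins; `query_model` would wrap whatever chat-completion API is available:

```python
import re

def scrub(text: str) -> str:
    """Simplified stand-in for the study's redaction step: strips
    emails, URLs, and @-handles. Real pipelines would go further."""
    text = re.sub(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b", "[EMAIL]", text)
    text = re.sub(r"https?://\S+", "[URL]", text)
    text = re.sub(r"@\w+", "[HANDLE]", text)
    return text

def query_model(prompt: str) -> str:
    """Hypothetical LLM call; substitute any chat-completion API."""
    raise NotImplementedError

def infer_identity(posts: list[str], candidates: list[str]) -> str:
    """Ask the model to match scrubbed posts to a candidate profile."""
    scrubbed = "\n---\n".join(scrub(p) for p in posts)
    prompt = (
        "Given these pseudonymous posts:\n"
        f"{scrubbed}\n\n"
        "Which of these public profiles most likely wrote them?\n"
        + "\n".join(f"- {c}" for c in candidates)
    )
    return query_model(prompt)
```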
The researchers describe their framework as a move beyond classic, citation-based attacks. In their experiments, they observed that AI agents could perform deanonymization with substantial effectiveness on free-form text, not just structured data. The results highlight a real privacy risk: inexpensive, scalable identity inference could enable doxxing or targeted profiling across platforms.
“The average online user has operated under an implicit threat model where pseudonymity offers protection,” the study authors note. “LLMs disrupt this assumption by enabling more scalable deanonymization that requires less manual labor.”
Given the implications, the authors propose mitigations such as rate limits on data access, robust detection of automated scraping, and guardrails in AI systems to curb misuse. They also emphasize practical steps for individuals, like limiting what is posted publicly and regularly deleting old content to reduce residual identifiers. The research underscores a broader call to rethink privacy protections in an era of increasingly capable AI.
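As one concrete illustration of the rate-limiting mitigation (a generic technique, not something the study specifies), a platform could throttle per-client profile lookups with a token bucket, which makes bulk scraping slow and easier to detect:

```python
import time

class TokenBucket:
    """Generic token-bucket rate limiter: allows bursts up to `capacity`
    requests, refilled at `rate` tokens per second."""

    def __init__(self, capacity: float, rate: float):
        self.capacity = capacity
        self.rate = rate
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

limiter = TokenBucket(capacity=10, rate=1.0)  # roughly one profile lookup per second
if not limiter.allow():
    print("429 Too Many Requests")
```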