Unmasking Authors: LLMs Achieve Breakthrough in Style-Based Authorship Verification
By Pablo Miralles-González, Javier Huertas-Tato, Alejandro Martín, David Camacho
Published on November 24, 2025 | Vol. 1, Issue No. 1
Content Source
This is a curated briefing. The original article was published on the cs.CL updates feed on arXiv.org.
Summary
This research proposes an unsupervised method for computational stylometry, covering both authorship attribution and verification, that leverages the extensive pre-training and in-context learning capabilities of Large Language Models (LLMs). The approach uses LLM log-probabilities to measure the "style transferability" between texts, which helps separate writing style from topic, a persistent difficulty for traditional supervised and contrastive methods. The study reports that the method significantly outperforms existing LLM prompting techniques and exceeds the accuracy of contrastively trained baselines, with performance scaling consistently with model size and offering flexible trade-offs between computation and accuracy in verification tasks.
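To make the core idea concrete, here is a minimal sketch of how a "style transferability" score might be computed from a causal language model's log-probabilities. This is an illustration only: it assumes a Hugging Face `transformers` causal LM (with `gpt2` as a small stand-in) and a simple conditioning scheme, and it does not reproduce the paper's exact prompt format or scoring.

```python
# Sketch: score how much one text's presence in context raises the
# log-probability of another text. Assumes a small causal LM (gpt2);
# the paper's actual prompting and scoring may differ.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def mean_logprob(text: str, context: str = "") -> float:
    """Mean per-token log-probability of `text`, optionally conditioned on `context`."""
    txt_ids = tokenizer(text, return_tensors="pt").input_ids
    if context:
        ctx_ids = tokenizer(context, return_tensors="pt").input_ids
        input_ids = torch.cat([ctx_ids, txt_ids], dim=1)
        n_ctx = ctx_ids.shape[1]
    else:
        input_ids = txt_ids
        n_ctx = 0
    with torch.no_grad():
        logits = model(input_ids).logits
    # Position i predicts token i+1: shift logits/targets by one and
    # keep only the predictions that fall on the `text` tokens.
    log_probs = torch.log_softmax(logits[:, :-1], dim=-1)
    targets = input_ids[:, 1:]
    token_lp = log_probs.gather(2, targets.unsqueeze(-1)).squeeze(-1)
    return token_lp[:, max(n_ctx - 1, 0):].mean().item()

def style_transferability(reference: str, candidate: str) -> float:
    """Gain in log-probability of `candidate` when `reference` is in context.
    A larger gain suggests the two texts share stylistic regularities."""
    return mean_logprob(candidate, context=reference) - mean_logprob(candidate)
```

In a verification setting, such a score could then be compared against a threshold, or against scores obtained with impostor reference texts, to decide whether two documents share an author; varying the number and length of such comparisons is one way the reported compute/accuracy trade-off could arise.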
Why It Matters
This research represents a notable advance in computational stylometry, with implications across several professional domains in the AI industry and beyond.

First, by decoupling writing style from topical content, the proposed LLM-based method addresses a critical limitation of previous techniques, promising more robust and reliable authorship attribution. This has immediate relevance for forensic linguistics, enabling more accurate plagiarism detection (crucial for academic integrity) and identity linking in sensitive contexts. As AI-generated text proliferates, the ability to discern nuanced human authorship patterns becomes an increasingly vital tool for combating misinformation and maintaining content authenticity.

Second, for professionals developing AI safety and ethics solutions, this advance offers a mechanism to verify the origin and authenticity of digital content, a cornerstone of trust in an age of synthetic media.

Third, from an LLM development perspective, this work showcases an elegant application of LLM log-probabilities and in-context learning. It underscores the untapped potential of the intrinsic knowledge within pre-trained models for complex, "one-shot" tasks, pushing the boundaries of what LLMs can achieve without extensive fine-tuning. This could inspire new paradigms for applying LLMs to other pattern-recognition challenges, ultimately enhancing the capabilities and trustworthiness of AI systems in text analysis.