Evaluating Authorship Verification Robustness Under Domain Shift and LLM-Based Rewriting
This repository contains the code and experimental framework for my MSc dissertation at the University of Sheffield. The project investigates the robustness of transformer-based authorship verification (AV) models under challenging real-world conditions: domain shift (e.g., news articles vs. tweets) and adversarial rewriting using large language models (LLMs). It addresses three research questions:
- Domain Shift: Can authorship verification models reliably detect stylistic consistency across different genres when no adversarial rewriting is applied?
- Adversarial Robustness: How robust are these models to LLM-based adversarial rewriting (style obfuscation and impersonation) in same-domain texts?
- Combined Challenge: How do AV models perform when domain shift and adversarial attacks are combined?
Key findings:

- DistilBERT showed the strongest robustness across all scenarios despite being the smallest model
- Domain shift: DistilBERT stayed stable (accuracy down roughly 1 point), while RoBERTa failed catastrophically (accuracy down roughly 18 points, from 0.79 to 0.61)
- Adversarial rewriting: impersonation attacks reduced all models to near-random performance (ROC-AUC < 0.56)
- Combined challenge: when domain shift and impersonation were combined, performance approached random guessing
This project uses the CrossNews dataset, included as a Git submodule, for the authorship verification and adversarial rewriting experiments. After cloning, fetch it with `git submodule update --init --recursive`.

- Repository: `external/CrossNews`
- Description: A cross-genre authorship verification and attribution benchmark
- Citation: M. Ma, “CROSSNEWS: A Cross-Genre Authorship Verification and Attribution Benchmark,” AAAI, vol. 39, no. 23, pp. 24777-24785, Apr. 2025. GitHub: https://github.com/mamarcus64/CrossNews
Three transformer architectures were selected to represent different design trade-offs (a loading sketch follows the table):
| Model | Description | Parameters | Context Length |
|---|---|---|---|
| DistilBERT | Lightweight, efficient baseline | 66M | 512 tokens |
| RoBERTa | Enhanced BERT with robust pretraining | 125M | 512 tokens |
| BigBird | Sparse attention for long sequences | 128M | 4096 tokens |
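All three encoders are available as standard Hugging Face checkpoints. Below is a minimal loading sketch, assuming the public base checkpoints and a two-class same-author/different-author head; the exact fine-tuned weights and head configuration used in the dissertation are an assumption here, not confirmed by this README.

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Public base checkpoints; assumed to be the starting points for fine-tuning.
CHECKPOINTS = {
    "DistilBERT": "distilbert-base-uncased",
    "RoBERTa": "roberta-base",
    "BigBird": "google/bigbird-roberta-base",  # sparse attention, 4096-token context
}

def load_verifier(name: str):
    """Load a tokenizer and a 2-class head (same-author vs. different-author)."""
    checkpoint = CHECKPOINTS[name]
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)
    return tokenizer, model

tokenizer, model = load_verifier("DistilBERT")
```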
Two LLM-based attack strategies were implemented with Flan-T5-Large (a generation sketch follows this list):
- Style Obfuscation: Untargeted paraphrasing to conceal authorial cues
- Style Impersonation: Targeted rewriting to mimic another author's style
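A minimal sketch of how such rewrites can be generated with Flan-T5-Large; the prompt templates below are illustrative assumptions, not the exact prompts used in the experiments.

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-large")
model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-large")

def rewrite(text: str, target_sample: str | None = None) -> str:
    """Style obfuscation when target_sample is None, impersonation otherwise."""
    if target_sample is None:
        # Untargeted paraphrase to conceal the original author's cues.
        prompt = f"Paraphrase the following text: {text}"
    else:
        # Targeted rewrite that mimics the style of another author's sample.
        prompt = (
            "Rewrite the text so it matches the style of the example.\n"
            f"Example: {target_sample}\nText: {text}"
        )
    inputs = tokenizer(prompt, return_tensors="pt", truncation=True)
    output_ids = model.generate(**inputs, max_new_tokens=256, do_sample=True, top_p=0.95)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```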
In-domain baseline (no adversarial rewriting):

| Model | ROC-AUC | Accuracy | F1 Score |
|---|---|---|---|
| DistilBERT | 0.8882 | 0.7999 | 0.8161 |
| RoBERTa | 0.8785 | 0.7946 | 0.8084 |
| BigBird | 0.8108 | 0.7321 | 0.7438 |
Domain shift (cross-genre pairs, no rewriting):

| Model | ROC-AUC | Accuracy | F1 Score |
|---|---|---|---|
| DistilBERT | 0.8711 | 0.7874 | 0.8006 |
| RoBERTa | 0.8703 | 0.6127 | 0.4880 |
| BigBird | 0.8149 | 0.6719 | 0.5891 |
Style impersonation (same domain):

| Model | ROC-AUC | Accuracy | F1 Score |
|---|---|---|---|
| DistilBERT | 0.5590 | 0.5406 | 0.5431 |
| RoBERTa | 0.5587 | 0.5391 | 0.5455 |
| BigBird | 0.5444 | 0.5316 | 0.5305 |
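The metrics in the tables above can be reproduced from model scores with scikit-learn; a small sketch, where the 0.5 decision threshold is an assumption:

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score

def evaluate(y_true: np.ndarray, scores: np.ndarray, threshold: float = 0.5) -> dict:
    """y_true: 1 = same author, 0 = different; scores: same-author probabilities."""
    y_pred = (scores >= threshold).astype(int)
    return {
        "ROC-AUC": roc_auc_score(y_true, scores),  # threshold-free ranking quality
        "Accuracy": accuracy_score(y_true, y_pred),
        "F1 Score": f1_score(y_true, y_pred),
    }
```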
SHAP (SHapley Additive exPlanations) analysis revealed critical insights (an attribution sketch follows this list):
- Models often rely on platform-specific artifacts (hashtags, URLs) rather than genuine stylistic cues
- Punctuation patterns and function words dominate decisions under adversarial conditions
- Even correct predictions often stem from topic-related content rather than authorial style
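A sketch of this kind of token-level attribution, assuming a Hugging Face text-classification pipeline; the checkpoint name is a placeholder for the fine-tuned AV model, and `shap.Explainer` is used here in its documented mode of wrapping such pipelines directly.

```python
import shap
from transformers import pipeline

# Placeholder checkpoint; substitute the fine-tuned verification model.
classifier = pipeline("text-classification", model="distilbert-base-uncased", top_k=None)

explainer = shap.Explainer(classifier)  # builds a text masker from the tokenizer
shap_values = explainer(["Check out my latest post! #news https://t.co/example"])

# Visualize which tokens (hashtags, URLs, punctuation) drive the prediction.
shap.plots.text(shap_values)
```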
All experiments use fixed random seeds (7, 1001, 1211) for reproducibility.
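A minimal seeding helper consistent with this setup; the calls below are the standard Python/NumPy/PyTorch ones, and a PyTorch training loop is assumed:

```python
import random
import numpy as np
import torch

SEEDS = (7, 1001, 1211)  # the three fixed seeds used across experiments

def set_seed(seed: int) -> None:
    """Fix all relevant RNGs so a run can be repeated exactly."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)  # no-op on CPU-only machines

set_seed(SEEDS[0])  # repeat the full pipeline once per seed
```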
This project has been ethically reviewed and approved by the Ethics Committee of the University of Sheffield.