This repository presents our approach to watermarking language model outputs using dynamically constructed system instructions generated by the Prompting LM. The figure below illustrates our full pipeline, highlighting the interaction between user requests, system instructions, the Marking LM, and the Detecting LM.
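As a rough orientation, the interaction described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the repository's implementation: the class `WatermarkPipeline`, its field names, the `{request}` placeholder, and the three toy functions are all hypothetical, and it assumes that the Prompting LM sees the user request through the fixed prompt and that the Detecting LM returns a yes/no decision.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class WatermarkPipeline:
    """Illustrative wiring of the three roles; all names here are hypothetical."""
    prompting_lm: Callable[[str], str]     # filled-in template -> system instruction
    marking_lm: Callable[[str, str], str]  # (system instruction, user request) -> response
    detecting_lm: Callable[[str], bool]    # text -> True if a watermark is judged present
    template: str                          # fixed prompt with a {request} placeholder

    def generate(self, user_request: str) -> str:
        # Assumption: the Prompting LM receives the user request via the fixed template.
        instruction = self.prompting_lm(self.template.format(request=user_request))
        # The Marking LM answers the request while following the generated instruction.
        return self.marking_lm(instruction, user_request)

    def detect(self, text: str) -> bool:
        # The Detecting LM decides whether the text carries the watermark.
        return self.detecting_lm(text)


# Toy stand-ins so the sketch runs without any model; real LMs would replace these.
def toy_prompting_lm(prompt: str) -> str:
    return "Start the answer with the word 'Certainly'."


def toy_marking_lm(instruction: str, request: str) -> str:
    # A real Marking LM would follow `instruction`; this toy hard-codes its effect.
    return f"Certainly, here is a reply to: {request}"


def toy_detecting_lm(text: str) -> bool:
    return text.startswith("Certainly")


pipeline = WatermarkPipeline(
    prompting_lm=toy_prompting_lm,
    marking_lm=toy_marking_lm,
    detecting_lm=toy_detecting_lm,
    template="Write a system instruction that watermarks the answer to: {request}",
)
response = pipeline.generate("Summarize the water cycle.")
is_marked = pipeline.detect(response)
```

Passing the three roles in as plain callables keeps the sketch independent of any particular model API; each toy function could be swapped for a call to an actual language model without changing the wiring.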
The fixed prompt used to guide the Prompting LM in generating system instructions is shown below. This template ensures consistent instruction generation while allowing for diverse lexical, semantic, or structural watermarking strategies.
The following screenshot summarizes our main quantitative findings, including detection accuracy under fine-tuning and distillation. Detection accuracy remains high under these model transformations, indicating that the watermark signal is robust and adaptable.
@ARTICLE{11146861,
author={Dasgupta, Agnibh and Tanvir, Abdullah All and Zhong, Xin},
journal={IEEE Transactions on Artificial Intelligence},
title={Watermarking Language Models through Language Models},
year={2025},
volume={},
number={},
pages={1-10},
keywords={Watermarking;Adaptation models;Robustness;Training;Large language models;Intellectual property;Closed box;Tuning;Context modeling;Codes;Content authentication;instruction control;large language models;prompt engineering;robust watermarking},
doi={10.1109/TAI.2025.3605117}
}

📌 A. Dasgupta, A. A. Tanvir and X. Zhong, "Watermarking Language Models through Language Models," in IEEE Transactions on Artificial Intelligence, doi: 10.1109/TAI.2025.3605117.