cent664/LLMWM

Watermarking Language Models through Language Models


🔍 Project Overview

This repository presents our approach to watermarking language model outputs using system instructions that the Prompting LM constructs dynamically. The figure below illustrates the full pipeline, highlighting the interaction between user requests, system instructions, the Marking LM, and the Detecting LM.

Overview of the proposed scheme:

[Figure: overview]
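
The interaction between the three models can be sketched roughly as follows. This is a minimal illustration, not the repository's code: the `WatermarkPipeline` class, the callable signatures, the `threshold` parameter, and the toy stand-in functions are all hypothetical, and the assumed data flow (user request to system instruction to marked response to detection score) is inferred from the description above. The toy "watermark" here is a single marker word, chosen only so the example runs without any model.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical interfaces: each "LM" is modeled as a plain function.
PromptingLM = Callable[[str, str], str]  # (fixed_prompt, user_request) -> system instruction
MarkingLM = Callable[[str, str], str]    # (system_instruction, user_request) -> response
DetectingLM = Callable[[str], float]     # text -> watermark score in [0, 1]


@dataclass
class WatermarkPipeline:
    prompting_lm: PromptingLM
    marking_lm: MarkingLM
    detecting_lm: DetectingLM
    fixed_prompt: str
    threshold: float = 0.5  # assumed decision threshold, not from the paper

    def generate(self, user_request: str) -> str:
        # Step 1: the Prompting LM turns the fixed prompt and the user
        # request into a system instruction.
        instruction = self.prompting_lm(self.fixed_prompt, user_request)
        # Step 2: the Marking LM answers the request under that instruction.
        return self.marking_lm(instruction, user_request)

    def is_watermarked(self, text: str) -> bool:
        # Step 3: the Detecting LM scores the text; compare to the threshold.
        return self.detecting_lm(text) >= self.threshold


# Toy stand-ins so the sketch runs without any real model.
def toy_prompting_lm(fixed_prompt: str, user_request: str) -> str:
    return f"{fixed_prompt} Use the word 'notably' when answering: {user_request}"


def toy_marking_lm(system_instruction: str, user_request: str) -> str:
    return f"Notably, here is a reply to '{user_request}'."


def toy_detecting_lm(text: str) -> float:
    return 1.0 if "notably" in text.lower() else 0.0


pipeline = WatermarkPipeline(
    prompting_lm=toy_prompting_lm,
    marking_lm=toy_marking_lm,
    detecting_lm=toy_detecting_lm,
    fixed_prompt="Write a system instruction that embeds a watermark.",
)
reply = pipeline.generate("Summarize the water cycle.")
```

In a real setup each callable would wrap a model call, and the Detecting LM would be a trained classifier rather than a keyword check; the sketch only fixes the order in which the components exchange text.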

💬 Fixed Prompt for the Prompting Language Model:

The fixed prompt used to guide the Prompting LM in generating system instructions is shown below. This template ensures consistent instruction generation while allowing for diverse lexical, semantic, or structural watermarking strategies.

[Figure: fixed_p]

📊 Results Snapshot:

The screenshot below summarizes our main quantitative findings, including detection accuracy under fine-tuning and distillation. Our method remains highly robust under these model transformations, demonstrating the reliability and adaptability of the watermark signal.

[Figure: Capture]

📚 Citation:

@ARTICLE{11146861,
  author={Dasgupta, Agnibh and Tanvir, Abdullah All and Zhong, Xin},
  journal={IEEE Transactions on Artificial Intelligence}, 
  title={Watermarking Language Models through Language Models}, 
  year={2025},
  volume={},
  number={},
  pages={1-10},
  keywords={Watermarking;Adaptation models;Robustness;Training;Large language models;Intellectual property;Closed box;Tuning;Context modeling;Codes;Content authentication;instruction control;large language models;prompt engineering;robust watermarking},
  doi={10.1109/TAI.2025.3605117}
}

📌 A. Dasgupta, A. A. Tanvir and X. Zhong, "Watermarking Language Models through Language Models," in IEEE Transactions on Artificial Intelligence, doi: 10.1109/TAI.2025.3605117.

About

Implementation of 'Watermarking Language Models through Language Models'
