A new paper from researchers at Inria, Imatag, and Meta AI proposes methods to make watermarking of large language models more robust and reliable. Watermarking embeds an imperceptible statistical signal into AI-generated text so that its origin can be detected later, which is useful for tracing misuse.
The paper makes three main contributions:
1. It provides statistical tests with strong theoretical guarantees on false positive rates. Previous watermarking techniques were unreliable at low false positive rates, which matters because a false positive amounts to wrongly accusing a human author. The new tests are calibrated so that the detection threshold actually delivers the desired false positive rate.
2. It compares the two main watermarking techniques on standard NLP benchmarks. The results show watermarking causes only small performance declines, on the order of 1-3% across tasks, suggesting the techniques are practical for real applications without greatly harming a model's capabilities.
3. It develops more advanced detection schemes for the setting where the model itself is accessible at detection time, and for embedding multiple bits in the watermark. Multi-bit watermarking makes it possible to identify not only that a text was AI-generated but also which model version produced it.
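The false-positive guarantee in the first contribution can be illustrated with a minimal detection sketch. Assume a greenlist-style watermark (one of the scheme families the paper studies), where for human-written text each token independently lands in a secret "green" vocabulary subset with probability `gamma`. The function names and parameters below are illustrative, not the paper's API; the point is that an exact binomial tail probability, rather than a Gaussian approximation, makes the decision threshold match the target false positive rate.

```python
from math import comb

def binom_sf(k, n, p):
    # Exact tail probability P[X >= k] for X ~ Binomial(n, p).
    # Under H0 (human text), this is the chance of seeing k or more
    # greenlist hits purely by accident.
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

def detect(greenlist_hits, num_tokens, gamma=0.5, target_fpr=1e-6):
    # Hypothetical detector: flag the text as watermarked only if the
    # exact p-value falls below the desired false positive rate, so the
    # test's theoretical FPR is target_fpr by construction.
    p_value = binom_sf(greenlist_hits, num_tokens, gamma)
    return p_value < target_fpr, p_value
```

For example, 90 greenlist hits out of 100 tokens is astronomically unlikely for human text and gets flagged, while 55 out of 100 is well within chance and does not.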
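The multi-bit idea in the third contribution can be sketched as scoring the text against several candidate watermark keys, one per model version, and reporting the best match. This is a toy illustration under assumed names, not the paper's actual detector: `greenlist_score` and the SHA-256 keyed hash stand in for a real watermark score.

```python
import hashlib

def greenlist_score(tokens, key, gamma=0.5):
    # Toy score: count tokens whose keyed hash falls in the "green"
    # fraction gamma of the hash space (stand-in for a real detector).
    limit = int(256 * gamma)
    return sum(
        1 for tok in tokens
        if hashlib.sha256(f"{key}:{tok}".encode()).digest()[0] < limit
    )

def identify_source(tokens, candidate_keys):
    # Multi-bit detection: each model version embeds with its own key;
    # score against every candidate and return the best-matching one.
    scores = {k: greenlist_score(tokens, k) for k in candidate_keys}
    best = max(scores, key=scores.get)
    return best, scores
```

Choosing among M candidate keys encodes log2(M) bits of information, which is what lets detection report a model version rather than just a yes/no answer.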
Overall, the work moves watermarking closer to practical use by improving detection reliability and quantifying the impact on real NLP tasks. The key benefits are controlling false positives and enabling applications such as tracing text to specific users or model versions.
One expert commented that the techniques could help mitigate risks from AI text generation, such as impersonation or fake news. Watermarking alone, however, does not prevent harmful use entirely, and continued research into AI governance will be important as these models become more widely available.