Danish Kapoor

Google makes SynthID watermarking tool open source

Google has announced that it is open-sourcing SynthID, the watermarking technology it developed to make text produced by artificial intelligence (AI) easier to identify. The technology is now available to developers through Google’s Responsible Generative AI Toolkit.

Pushmeet Kohli, vice president of research at Google DeepMind, told MIT Technology Review, “Other AI developers will be able to use this technology to detect text emerging from their large language models (LLMs), which will make it easier for more developers to build AI responsibly.”

Recently, the use of large language models to spread political disinformation, create non-consensual sexual content, and serve other harmful purposes has made watermarking technologies even more important. For example, the state of California is considering making AI watermarks mandatory, while the Chinese government enacted such a requirement last year. However, these tools are still in development.

Introduced last August, SynthID adds an invisible watermark to AI-generated output so that the content can be detected later. The technology works across formats including images, audio, video and text. According to Google, the text version of SynthID works by making the output slightly less probable in a way that software can detect but human readers cannot. The aim is to allow the text to be used safely without being abused.

An LLM produces text one token at a time. Tokens are small units that can represent a single character, a word, or part of a phrase. To generate a string of text, the model predicts which token is most likely to come next. These predictions are based on the preceding words and on the probability score assigned to each possible token.
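The token-by-token process described above can be illustrated with a minimal Python sketch. It is not taken from Google's code: the prompt, the candidate tokens and their probability scores are all invented for illustration, and a real model would recompute the scores from the full preceding text at every step.

```python
import random

# Hypothetical scores for the token that follows "The weather today is".
# Both the tokens and the numbers are made up for this example.
NEXT_TOKEN_PROBS = {" sunny": 0.50, " rainy": 0.30, " cold": 0.15, " un": 0.05}


def most_likely_token(probs):
    """Greedy choice: return the token with the highest probability score."""
    return max(probs, key=probs.get)


def sample_token(probs, rng):
    """Random choice: draw one token, weighted by its probability score."""
    tokens = list(probs)
    weights = [probs[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so repeated runs give the same draws
    print("greedy :", most_likely_token(NEXT_TOKEN_PROBS))
    print("sampled:", [sample_token(NEXT_TOKEN_PROBS, rng) for _ in range(5)])
```

Because sampling leaves the model some freedom in which token it picks, there is room to nudge those choices slightly, which is what a text watermark exploits.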

For example, given the phrase “My favorite tropical fruits are __,” an LLM might fill the gap with tokens such as “mango”, “lychee”, “papaya” or “durian”, each of which is assigned a probability score. SynthID can adjust the probability scores of the candidate tokens without compromising the quality, accuracy or creativity of the output.

This process is repeated throughout the generated text, so a single sentence may contain dozens of adjusted probability scores and a page may contain hundreds. The final pattern of scores, combining the model’s word choices with the adjusted probability scores, is considered the watermark.
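To make the idea of a pattern of adjusted scores concrete, here is a toy Python sketch. It is not SynthID's actual algorithm. It swaps in a much simpler keyed-bias scheme: a secret key and the previous token decide which candidate tokens get a small boost, and a detector holding the same key counts how often the boosted tokens appear. The vocabulary, the key, the `strength` parameter and all function names are assumptions made for illustration, and unlike the system described here the toy makes no attempt to preserve output quality.

```python
import hashlib
import math
import random

# Toy vocabulary (assumption); a real model has tens of thousands of tokens.
VOCAB = ["mango", "lychee", "papaya", "durian", "guava", "banana", "and", "or"]


def keyed_bit(key, prev_token, token):
    """Return a pseudorandom 0 or 1 derived from the key and the context."""
    digest = hashlib.sha256(f"{key}|{prev_token}|{token}".encode()).digest()
    return digest[0] & 1


def adjust_scores(probs, key, prev_token, strength=2.0):
    """Boost tokens whose keyed bit is 1, then renormalize to sum to 1."""
    boosted = {
        t: p * math.exp(strength * keyed_bit(key, prev_token, t))
        for t, p in probs.items()
    }
    total = sum(boosted.values())
    return {t: b / total for t, b in boosted.items()}


def generate(length, key, watermark, seed=0):
    """Toy 'model': every token is equally likely before any adjustment."""
    rng = random.Random(seed)
    tokens = ["are"]  # stands in for the end of the prompt
    for _ in range(length):
        probs = {t: 1 / len(VOCAB) for t in VOCAB}
        if watermark:
            probs = adjust_scores(probs, key, tokens[-1])
        choices = list(probs)
        weights = [probs[t] for t in choices]
        tokens.append(rng.choices(choices, weights=weights, k=1)[0])
    return tokens


def detection_score(tokens, key):
    """Fraction of tokens whose keyed bit is 1, given the previous token."""
    bits = [keyed_bit(key, prev, tok) for prev, tok in zip(tokens, tokens[1:])]
    return sum(bits) / len(bits)


if __name__ == "__main__":
    marked = generate(300, key="demo-key", watermark=True)
    plain = generate(300, key="demo-key", watermark=False)
    print("watermarked score:", round(detection_score(marked, "demo-key"), 2))
    print("plain score      :", round(detection_score(plain, "demo-key"), 2))
```

In this sketch the watermarked sequence should score noticeably higher than the plain one, because each individual adjustment is small but the bias accumulates over many tokens. That also hints at why very short texts are harder to check: fewer tokens mean fewer adjusted choices to count.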

Google integrated the system into Gemini

Google says that the system is already integrated into its Gemini chatbot and that watermarking has no negative impact on the quality, accuracy, creativity or speed of text generation. The technology works with texts as short as three sentences and remains effective on text that has been cropped, lightly rewritten or modified. However, it struggles with very short texts, content that has been thoroughly rewritten, and answers to factual questions.

“SynthID is not a magic solution for detecting AI-generated content,” Google stated in a blog post published in May. “But it is an important building block for developing more reliable AI identification tools and can help millions of people make informed decisions about how to engage with AI-generated content.”
