This post is part of an ongoing series that educates readers about new and known security vulnerabilities targeting AI.
The full series index (including code, queries, and detections) is located here:
https://aka.ms/MustLearnAISecurity
The book version (pdf) of this series is located here: https://github.com/rod-trent/OpenAISecurity/tree/main/Must_Learn/Book_Version
The book will be updated when each new part in this series is released.
What is a Watermark Removal attack against AI?
A watermark removal attack against AI is an attempt to strip out the unique identifier, or watermark, embedded in a digital image or video to protect its copyright or ownership. The attack can be carried out with techniques ranging from classical image processing to machine learning models trained to detect and erase the mark, and a successful attack enables unauthorized use or distribution of the copyrighted content. It is important to note that such activities are illegal and can result in legal consequences.
Types of Watermark Removal attacks
There are several types of watermark removal attacks that can be carried out against AI. Some of them are:
Image processing attacks: These attacks involve applying filters, transformations, or other image processing techniques to the watermarked image in order to remove the watermark.
Machine learning attacks: These attacks involve training machine learning models to recognize and remove watermarks from images or videos.
Adversarial attacks: These attacks involve adding noise or manipulating the input to the watermark detection algorithm so that it fails to detect the watermark.
Copy-move attacks: These attacks involve copying a clean (unwatermarked) region of the image and pasting it over the watermark, effectively covering it up.
Blurring or masking attacks: These attacks involve blurring or masking the watermark in order to make it unreadable or hard to detect (a minimal sketch of this technique follows this list).
It is important to note that these attacks are unethical and illegal as they violate the intellectual property rights of the owners of the watermarked content.
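To make the blurring/masking technique concrete, here is a minimal Python sketch using OpenCV. It is included only to show the mechanics defenders are up against; the file names and the watermark's bounding box are hypothetical placeholders, and a real attacker would first have to locate the mark.

```python
# Minimal sketch of a blurring/masking attack, shown only to illustrate
# the mechanics defenders face. File names and the watermark bounding
# box are hypothetical placeholders.
import cv2

image = cv2.imread("watermarked.png")  # hypothetical input file

# Assumed location of a visible watermark: (x, y, width, height).
x, y, w, h = 20, 20, 200, 60

# Blur only the watermark region so the mark becomes unreadable.
region = image[y:y + h, x:x + w]
image[y:y + h, x:x + w] = cv2.GaussianBlur(region, (51, 51), 0)

cv2.imwrite("blurred.png", image)
```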
How it works
A watermark removal attack against AI typically works by training a machine learning model to recognize and remove the watermark from the image or video. Here are the general steps for a watermark removal attack against AI:
Gather the watermarked images or videos: The first step is to gather the images or videos that contain the watermark that needs to be removed.
Analyze the watermark: The next step is to analyze the watermark and understand its structure, size, and placement in the image or video.
Train a machine learning model: A machine learning model can be trained to recognize the watermark and remove it from the image or video. The model is typically trained on a large dataset of watermarked images or videos, often paired with clean versions so it learns to reconstruct the underlying content.
Apply image processing techniques: Image processing techniques such as filtering, smoothing, blurring, or inpainting can be applied to the watermarked image or video to remove the watermark (see the inpainting sketch after this list).
Use adversarial attacks: Adversarial attacks can be used to manipulate the input to the watermark detection algorithm so that it fails to detect the watermark.
Apply copy-move attacks: Copy-move attacks copy a clean region of the same image and paste it over the area that contains the watermark, effectively covering it up.
Apply blurring or masking techniques: Blurring or masking techniques can be applied to the watermark to make it unreadable or hard to detect.
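As an illustration of the image-processing step, the following Python/OpenCV sketch inpaints over a visible watermark, reconstructing the covered pixels from their surroundings. Again, the paths and watermark coordinates are hypothetical assumptions for the sake of the example.

```python
# Minimal sketch of the image-processing step: inpainting over a visible
# watermark. Educational illustration only; paths and the watermark
# region are hypothetical.
import cv2
import numpy as np

image = cv2.imread("watermarked.png")  # hypothetical input file

# Build a mask where nonzero pixels mark the region to reconstruct.
mask = np.zeros(image.shape[:2], dtype=np.uint8)
x, y, w, h = 20, 20, 200, 60  # assumed watermark location
mask[y:y + h, x:x + w] = 255

# Telea inpainting fills the masked region from surrounding pixels.
restored = cv2.inpaint(image, mask, 3, cv2.INPAINT_TELEA)
cv2.imwrite("restored.png", restored)
```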
Why it matters
The negative effects of a watermark removal attack against AI are mainly related to intellectual property rights and the financial losses that can be incurred by the owners of the copyrighted content. Some of the negative effects include:
Loss of revenue: Watermarks are used to protect the ownership and copyright of digital content such as images, videos, or software. If a watermark removal attack is successful, it can result in the unauthorized use and distribution of the copyrighted content, leading to a loss of revenue for the content owner.
Legal consequences: Watermark removal attacks are illegal and can result in legal consequences such as fines, penalties, or even imprisonment.
Decrease in the value of the content: If the attack succeeds and the content circulates without its watermark, the value of the content can fall because it can no longer be verified as unique or original.
Reputation damage: If the copyrighted content is used without permission, it can damage the reputation of the content owner, especially if the content is used in a negative or inappropriate way.
The negative effects of a watermark removal attack against AI are significant and can have serious consequences for the content owner and the wider community.
Why it might happen
An attacker who carries out a watermark removal attack strips the proof of ownership from copyrighted content, which they can then use or distribute without permission. This can result in financial gain for the attacker, who can sell or distribute the content without paying the owner for its use.
Additionally, an attacker may gain a competitive advantage by using the stolen content to create similar products or services without investing in research or development costs. This can give them an unfair advantage over their competitors and lead to greater profits.
However, it is important to note that watermark removal attacks are illegal and unethical, and can result in legal consequences for the attacker. The financial gain from such attacks is short-lived and can be quickly overshadowed by the potential legal and reputational damage that can be incurred.
Real-world Example
One real-world example often cited in this space is the DeepFakes phenomenon. DeepFakes are videos that have been manipulated using machine learning algorithms to insert one person's face into footage of another, and they are frequently used to create fake news or spread misinformation.
Deepfake creation pipelines routinely strip identifying marks from their source material: visible watermarks and logos are cropped, blurred, or inpainted away, and provenance metadata is discarded, so the manipulated output carries no trace of where the original footage came from.
As a result, many individuals and organizations have suffered reputational and financial damage. For example, a DeepFake video of former US President Barack Obama was created and spread on social media, leading to concerns about the potential use of such videos for political propaganda.
To combat this issue, researchers and developers have created tools to detect and prevent DeepFakes, which use watermarking techniques to protect the authenticity and ownership of digital content.
How to Mitigate
There are several ways to mitigate a watermark removal attack against AI:
Use robust watermarks: Watermarks that are difficult to remove or manipulate can make it more challenging for attackers to carry out a successful removal attack. Robust watermarks should be applied in such a way that they cannot be easily cropped or obscured.
Apply multiple watermarks: Applying multiple watermarks at different locations in the image or video can make it more challenging for attackers to remove all of them.
Use invisible watermarks: Invisible watermarks can be used to embed information into the content without affecting its visual appearance. This can make it more difficult for attackers to detect and remove the watermark (a minimal embedding sketch appears after this list).
Use detection tools: Detection tools can be used to detect and prevent watermark removal attacks. These tools typically use machine learning to check whether an expected watermark is still present and intact in digital content.
Monitor and enforce copyright laws: Copyright laws should be enforced to ensure that individuals or organizations who violate intellectual property rights are held accountable.
Educate users: Educating users about the importance of watermarking and the risks associated with watermark removal attacks can help to prevent such attacks from occurring.
Mitigating watermark removal attacks requires a combination of technical and legal measures, and a commitment to protecting intellectual property rights.
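To illustrate the invisible-watermark idea, here is a minimal Python sketch that embeds an owner ID into an image's least significant bits. This is deliberately simplistic: an LSB mark is easy to destroy, and production systems rely on far more robust frequency-domain or learned embeddings, but it shows how information can ride along invisibly inside the pixels. The file names and ID string are hypothetical.

```python
# Minimal sketch of an invisible watermark: hiding an owner ID in the
# least significant bit of each pixel's blue channel. LSB marks do not
# survive recompression; real systems use more robust frequency-domain
# or learned embeddings. File names and the ID string are hypothetical.
import numpy as np
from PIL import Image

def embed_lsb(path_in: str, path_out: str, message: str) -> None:
    pixels = np.array(Image.open(path_in).convert("RGB"))
    # Flatten the message into individual bits.
    bits = [int(b) for byte in message.encode() for b in f"{byte:08b}"]
    blue = pixels[..., 2].flatten()
    # Overwrite the lowest bit of the first len(bits) blue values.
    # Assumes the image has at least len(bits) pixels.
    blue[: len(bits)] = (blue[: len(bits)] & 0xFE) | bits
    pixels[..., 2] = blue.reshape(pixels[..., 2].shape)
    Image.fromarray(pixels).save(path_out)

embed_lsb("original.png", "marked.png", "owner:contoso-2023")
```

Extraction is the mirror operation: read the low bits of the same pixels back and decode them into bytes.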
How to monitor/What to capture
To detect a watermark removal attack against AI, the following should be monitored:
Image or video quality: The quality of the image or video can be an indicator of a watermark removal attack. Removal techniques often leave artifacts such as localized blurring, smudged texture, or inconsistent noise where the watermark used to be.
Image or video metadata: Image or video metadata can provide information about the origin and ownership of the content. If the metadata has been altered or removed, it may indicate a watermark removal attack.
Image or video comparison: Comparing a suspect copy against the known watermarked original can reveal whether the watermark has been removed; localized differences between the two versions are a strong signal (see the sketch after this list).
Machine learning model performance: If a machine learning model that is designed to detect watermarks is performing poorly or inconsistently, it may indicate a watermark removal attack.
Social media and online platforms: Social media and online platforms should be monitored for instances of copyrighted content being used without permission. This can help to identify instances of watermark removal attacks.
Monitoring image and video quality, metadata, side-by-side comparisons, watermark-detector performance, and social media and online platforms together provides a practical baseline for detecting watermark removal attacks against AI.
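As a starting point for this monitoring, the following Python sketch combines two of the signals above: a check for stripped metadata and a pixel-level comparison against the known watermarked original. The file names and threshold are hypothetical, and a production pipeline would add perceptual hashing and watermark-detector scoring.

```python
# Minimal monitoring sketch combining two signals from the list above:
# a check for stripped metadata and a pixel-level comparison against the
# known watermarked original. File names and the threshold are
# hypothetical; a production pipeline would add perceptual hashing and
# watermark-detector scoring.
import numpy as np
from PIL import Image

def metadata_stripped(path: str) -> bool:
    """Flag images whose EXIF block is missing or empty."""
    return len(Image.open(path).getexif()) == 0

def differs_from_original(original_path: str, suspect_path: str,
                          threshold: float = 5.0) -> bool:
    """Flag suspects that differ noticeably from the watermarked original."""
    a = np.asarray(Image.open(original_path).convert("L"), dtype=float)
    b = np.asarray(Image.open(suspect_path).convert("L"), dtype=float)
    if a.shape != b.shape:
        return True  # resized or cropped copies are suspicious on their own
    return float(np.abs(a - b).mean()) > threshold

if metadata_stripped("suspect.png") or differs_from_original(
        "watermarked_original.png", "suspect.png"):
    print("Possible watermark removal: review manually.")
```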
[Want to discuss this further? Hit me up on Twitter or LinkedIn]
[Subscribe to the RSS feed for this blog]
[Subscribe to the Weekly Microsoft Sentinel Newsletter]
[Subscribe to the Weekly Microsoft Defender Newsletter]
[Subscribe to the Weekly Azure OpenAI Newsletter]
[Learn KQL with the Must Learn KQL series and book]
[Learn AI Security with the Must Learn AI Security series and book]