This post is part of an ongoing series intended to educate readers about new and known security vulnerabilities against AI.
The full series index (including code, queries, and detections) is located here:
https://aka.ms/MustLearnAISecurity
The book version (pdf) of this series is located here: https://github.com/rod-trent/OpenAISecurity/tree/main/Must_Learn/Book_Version
The book will be updated when each new part in this series is released.
What is a Blurring or Masking attack against AI?
A blurring or masking attack against AI is a type of adversarial attack in which an attacker manipulates input data, typically images or videos, by applying a blurring or masking effect with the intention of deceiving the AI system or degrading its performance. The goal of such an attack is to make it difficult for the AI system to accurately recognize or classify the altered input data while keeping the original content recognizable to human observers.
Types of Blurring or Masking attacks
There are several types of blurring and masking attacks against AI, particularly those targeting computer vision systems. These attacks manipulate input data to deceive the AI system or degrade its performance. Here are some common types of blurring and masking attacks (a short code sketch of a few of these techniques follows the list):
Gaussian blur attack: In this attack, the input image is convolved with a Gaussian kernel, causing the image to become blurred. This makes it difficult for AI systems, such as facial recognition or object detection models, to recognize or classify objects accurately.
Motion blur attack: This technique simulates the effect of motion blur, typically caused by camera movement or fast-moving objects. The attacker applies a directional blur to the input image, which can cause AI systems to misclassify or fail to detect objects or features.
Median filtering attack: By applying a median filter to the input image, the attacker can blur or distort the image, preserving some edges while removing details. This can negatively impact the performance of AI systems that rely on edge detection or fine-grained features.
Noise addition attack: The attacker adds noise, such as Gaussian noise or salt-and-pepper noise, to the input image. This can mask or obscure objects and features, making it difficult for AI systems to recognize or classify them correctly.
Occlusion attack: In this type of attack, the input image is altered by occluding parts of the object or scene with other objects, patterns, or colors. This can deceive AI systems, such as facial recognition systems or object detectors, by hiding critical features or making it difficult to identify the target object.
Patch or sticker attack: The attacker places specially designed patches or stickers on objects, which can cause AI systems to misclassify or fail to recognize the object. For example, placing a specific pattern on a stop sign might cause an AI-powered autonomous vehicle to misidentify the sign as a different traffic sign.
Adversarial perturbation attack: The attacker creates an adversarial example by applying small, imperceptible perturbations to the input image that are specifically designed to cause the AI system to misclassify or make incorrect predictions.
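To make a few of these techniques concrete, here is a minimal sketch that applies a Gaussian blur, a motion blur, additive Gaussian noise, and a simple occlusion to a single image. It assumes OpenCV (cv2) and NumPy are installed; the input file name and all parameters are placeholders for illustration, not values tuned for any real attack.

```python
# Minimal sketch of a few blurring/masking manipulations (assumes OpenCV and NumPy).
# "input.jpg" is a placeholder path; parameters are illustrative only.
import cv2
import numpy as np

img = cv2.imread("input.jpg")  # placeholder image path

# Gaussian blur: convolve with a Gaussian kernel to wash out fine detail.
gaussian_blurred = cv2.GaussianBlur(img, (15, 15), 0)

# Motion blur: a horizontal averaging kernel simulates camera or object motion.
k = 15
motion_kernel = np.zeros((k, k), dtype=np.float32)
motion_kernel[k // 2, :] = 1.0 / k
motion_blurred = cv2.filter2D(img, -1, motion_kernel)

# Noise addition: additive Gaussian noise obscures fine-grained features.
noise = np.random.normal(0, 25, img.shape).astype(np.float32)
noisy = np.clip(img.astype(np.float32) + noise, 0, 255).astype(np.uint8)

# Occlusion: overwrite a region of the image, hiding whatever was there.
occluded = img.copy()
h, w = occluded.shape[:2]
occluded[h // 4: h // 2, w // 4: w // 2] = 0  # black rectangle over part of the scene

for name, out in [("gaussian", gaussian_blurred), ("motion", motion_blurred),
                  ("noisy", noisy), ("occluded", occluded)]:
    cv2.imwrite(f"{name}.jpg", out)
```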
How it works
A blurring or masking attack against AI works by manipulating input data, such as images or videos, in a way that hinders the AI system's ability to recognize or classify objects or features accurately while still maintaining the content's recognizability to human observers. These attacks primarily target computer vision systems like convolutional neural networks (CNNs) used in object detection, facial recognition, and image classification tasks.
Here's a general overview of how a blurring or masking attack works (a short code sketch follows these steps):
Identify target AI system: The attacker first identifies the target AI system, such as a facial recognition system or an object detection model, and gains an understanding of its functioning, vulnerabilities, and potential weaknesses.
Select attack technique: Based on the target AI system, the attacker selects an appropriate blurring or masking technique, such as Gaussian blur, motion blur, median filtering, noise addition, occlusion, or patch/sticker attack. The chosen technique aims to deceive the AI system by altering input data in a way that makes it difficult for the system to recognize or classify objects or features correctly.
Generate manipulated input: The attacker applies the chosen blurring or masking technique to the original input data (e.g., image or video). The manipulated input retains its recognizability to human observers but is altered in a way that adversely affects the AI system's performance.
Inject manipulated input: The attacker introduces the manipulated input into the target AI system, either during the training phase (data poisoning) or the inference phase (adversarial attack).
Observe system performance degradation: If the attack is successful, the AI system's performance degrades, leading to misclassifications, incorrect predictions, or failure to detect objects or features. This can have various consequences depending on the application, such as allowing unauthorized access in a facial recognition system or causing an autonomous vehicle to misinterpret traffic signs.
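As a rough end-to-end illustration of the "generate manipulated input" and "observe performance degradation" steps, the sketch below blurs one image and compares a pretrained classifier's top prediction and confidence on the clean copy versus the blurred copy. It assumes torch, torchvision 0.13+, and Pillow are installed; the image path and blur radius are placeholders, and the pretrained ResNet-18 simply stands in for whatever model an attacker might target.

```python
# Minimal sketch (assumes torch, torchvision >= 0.13, and Pillow): blur an image and
# compare a pretrained classifier's prediction and confidence on clean vs. blurred input.
# "photo.jpg" is a placeholder path.
import torch
from torchvision import models, transforms
from PIL import Image, ImageFilter

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).eval()

preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

def top_prediction(image):
    """Return (class index, confidence) for a single PIL image."""
    with torch.no_grad():
        probs = torch.softmax(model(preprocess(image).unsqueeze(0)), dim=1)
        confidence, index = probs.max(dim=1)
    return index.item(), confidence.item()

clean = Image.open("photo.jpg").convert("RGB")               # placeholder input
blurred = clean.filter(ImageFilter.GaussianBlur(radius=6))   # the "manipulated input"

print("clean:  ", top_prediction(clean))
print("blurred:", top_prediction(blurred))  # a changed class or a large confidence drop
                                            # illustrates the degradation step
```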
Why it matters
A blurring or masking attack against AI can have various negative effects on the targeted AI system and its applications. Some of the primary adverse consequences include:
Degraded performance: The AI system's performance may be significantly degraded due to misclassifications, incorrect predictions, or failure to detect objects or features in the manipulated input data.
Security risks: In security-sensitive applications like facial recognition or biometric authentication systems, a successful blurring or masking attack can lead to unauthorized access, false identification, or evasion of security measures.
Safety concerns: In safety-critical systems, such as autonomous vehicles or medical image analysis, a blurring or masking attack could cause the AI system to make incorrect decisions or misinterpretations, potentially leading to accidents or incorrect diagnoses.
Loss of trust: A successful blurring or masking attack can undermine users' trust in the AI system and its reliability, potentially leading to reduced adoption and utilization of AI technologies in various domains.
Financial and reputational damage: Organizations deploying AI systems may suffer financial losses or reputational damage due to the negative consequences of a successful blurring or masking attack. This can include costs related to addressing security breaches, rectifying system vulnerabilities, and loss of customer trust.
Legal and regulatory implications: If a blurring or masking attack causes an AI system to produce biased, discriminatory, or harmful outcomes, the organization responsible for deploying the system may face legal and regulatory consequences.
Why it might happen
An attacker can gain various advantages and achieve different objectives by executing a blurring or masking attack against AI systems. Some of the primary gains for an attacker include:
Evasion: The attacker can evade detection or recognition in AI-based systems such as facial recognition, biometric authentication, or object detection. This can help the attacker bypass security measures, access restricted areas, or evade surveillance.
Disruption: By degrading the performance of an AI system, the attacker can cause disruptions in its operation, leading to misclassifications, incorrect predictions, or failure to detect objects or features. This can negatively impact the AI system's users and the organization deploying the system.
Exploitation: In some cases, the attacker may use a blurring or masking attack as a stepping stone to exploit other vulnerabilities in the AI system or the infrastructure it is deployed on, potentially gaining unauthorized access to sensitive data or causing further damage.
Sabotage: The attacker may aim to sabotage the AI system's operation, either to harm the organization using the system or to undermine users' trust in the AI technology, thereby reducing its adoption and utilization.
Competitive advantage: In some scenarios, the attacker may be a competitor seeking to gain an advantage by demonstrating the weaknesses or vulnerabilities of the targeted AI system, potentially discrediting the system's developers or providers and promoting their own alternative solutions.
Proof of concept or research: Some attackers may execute blurring or masking attacks to demonstrate the feasibility of their attack techniques, either for personal satisfaction, notoriety, or as part of security research.
Real-world Example
A real-world example of a masking attack against AI is the "adversarial glasses" attack demonstrated by researchers from Carnegie Mellon University in 2016. The attack targeted facial recognition systems, which are widely used in security, surveillance, and authentication applications.
In this attack, the researchers designed eyeglasses with specific adversarial patterns printed on the frames. These adversarial patterns were crafted to deceive facial recognition systems into misclassifying the person wearing the glasses. The researchers showed that when a person wore the adversarial glasses, the facial recognition system could either misidentify them as a different individual or fail to recognize them as a person in the dataset.
The adversarial glasses attack can be considered a masking attack since it involved adding a pattern to the input image (the person's face) that disrupted the AI system's ability to correctly recognize the individual. This attack demonstrated the vulnerability of facial recognition systems to adversarial manipulation and highlighted the need for developing more robust and resilient AI models.
How to Mitigate
Mitigating blurring or masking attacks against AI involves enhancing the robustness and resilience of AI systems, particularly computer vision models. Various strategies can be employed to achieve this:
Adversarial training: Train the AI model with adversarial examples, including images with various blurring or masking techniques applied. This helps the model learn to recognize and correctly classify objects or features even in manipulated inputs.
Data augmentation: Expand the training dataset by including variations of the original images, such as those with different types of blurring, noise, occlusions, or other transformations. This can improve the model's generalization capabilities and make it more resistant to attacks.
Input preprocessing: Apply preprocessing techniques to the input data, such as noise reduction, image sharpening, or contrast enhancement, to counteract the effects of blurring or masking attacks before the data is fed into the AI model.
Ensemble learning: Combine the outputs of multiple AI models, possibly using different architectures or training techniques, to improve the overall system's robustness and resilience against attacks.
Model regularization: Employ regularization techniques, such as L1 or L2 regularization, during the training process to reduce the model's complexity and prevent overfitting, making it more robust against adversarial attacks.
Defensive distillation: Train a more robust model by using the output probabilities of a previously trained model as "soft" targets. This can help the new model learn more generalizable features and become more resistant to adversarial attacks.
Outlier detection: Monitor the AI system for unusual inputs or unexpected behavior that may indicate a potential attack. Implementing a separate model or mechanism to identify outliers or suspicious inputs makes it possible to take countermeasures before the attack affects the system's performance.
Continuous learning and updates: Regularly update the AI model with new data, incorporating instances of known attacks and adversarial examples, to keep the model up-to-date and improve its resistance against new or evolving attack techniques.
By combining these strategies, organizations can develop more robust and resilient AI systems that can better withstand the effects of blurring or masking attacks and maintain their performance in the presence of manipulated inputs.
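As one concrete example of the data augmentation strategy above, the following sketch builds a training-time transform pipeline that randomly blurs, jitters, occludes, and adds noise to images, so manipulated inputs are "in distribution" when the model later sees them at inference time. It assumes torch and a recent torchvision; the probabilities and parameter ranges are illustrative, not tuned values.

```python
# Minimal sketch of blur/occlusion-aware data augmentation (assumes torch and a
# recent torchvision). Probabilities and ranges below are illustrative only.
import torch
from torchvision import transforms

train_augment = transforms.Compose([
    transforms.RandomResizedCrop(224),
    # Randomly blur a fraction of training images.
    transforms.RandomApply(
        [transforms.GaussianBlur(kernel_size=5, sigma=(0.5, 3.0))], p=0.3),
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
    transforms.ToTensor(),
    # RandomErasing blanks out a rectangle, roughly simulating occlusions and patches.
    transforms.RandomErasing(p=0.3, scale=(0.02, 0.2)),
    # Additive Gaussian noise, clamped back into the valid pixel range.
    transforms.Lambda(lambda t: torch.clamp(t + 0.05 * torch.randn_like(t), 0.0, 1.0)),
])

# Usage: pass train_augment as the `transform` argument of an image dataset, e.g.
# torchvision.datasets.ImageFolder("train_dir", transform=train_augment).
```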
How to monitor/What to capture
Detecting a blurring or masking attack against AI requires monitoring various aspects of the AI system's operation and input data. Here are some key aspects to monitor:
Input data anomalies: Watch for unusual or unexpected patterns in the input data, such as excessive blurring, noise, or occlusions, which could indicate an attempt to manipulate the data to deceive the AI system.
Model performance metrics: Monitor performance metrics like accuracy, precision, recall, and false positive/negative rates. Sudden or unexplained degradation in these metrics may signal an ongoing attack.
Classification confidence: Keep an eye on the AI model's confidence scores for its predictions. If there's a noticeable increase in low-confidence predictions or misclassifications, it could be a sign of an attack.
Output distribution: Analyze the distribution of the AI model's outputs or predictions over time. A significant deviation from the expected distribution may indicate that the model is being fed manipulated inputs.
System logs and usage patterns: Examine system logs and usage patterns to identify unusual activity, such as unauthorized access, repeated attempts with similar inputs, or sudden spikes in the frequency of specific types of inputs, which could indicate an attacker is probing the system.
Comparison with baseline: Establish a baseline of normal system behavior and performance, and compare current operation against this baseline to detect potential anomalies or signs of an attack.
Outlier detection: Implement outlier detection mechanisms to identify suspicious inputs or data points that deviate significantly from the expected patterns or distributions, which could be indicative of an attack.
User feedback: Encourage users to report any unexpected behavior or anomalies they encounter while using the AI system, as this can help identify potential attacks that may not be immediately apparent through automated monitoring.
By closely monitoring these aspects and implementing a comprehensive detection strategy, organizations can improve their chances of detecting blurring or masking attacks against AI systems and take appropriate countermeasures to mitigate the effects of such attacks.
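As a minimal illustration of the input-anomaly and classification-confidence checks above, the sketch below computes a blur score for incoming images (variance of the Laplacian, where unusually low values suggest a heavily blurred input) and raises an alert when the share of low-confidence predictions over a sliding window drifts well above an assumed baseline. It assumes OpenCV is installed; the thresholds, window size, and baseline rate are placeholders to be calibrated against your own traffic, not production values.

```python
# Minimal sketch of two monitoring checks discussed above (assumes OpenCV):
# 1) a blur score on incoming images (variance of the Laplacian), and
# 2) a sliding-window alert when low-confidence predictions become too frequent.
# All thresholds below are illustrative placeholders, not tuned values.
from collections import deque
import cv2

def blur_score(image_bgr) -> float:
    """Variance of the Laplacian; compare against a baseline from known-good inputs."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    return cv2.Laplacian(gray, cv2.CV_64F).var()

class ConfidenceMonitor:
    """Tracks recent prediction confidences and flags drift in the low-confidence rate."""

    def __init__(self, window=500, low_conf=0.6, baseline_rate=0.05, factor=3.0):
        self.confidences = deque(maxlen=window)
        self.low_conf = low_conf            # below this counts as "low confidence"
        self.baseline_rate = baseline_rate  # expected low-confidence rate in normal traffic
        self.factor = factor                # how far above baseline before alerting

    def record(self, confidence: float) -> bool:
        """Record one prediction; return True when an alert should fire."""
        self.confidences.append(confidence)
        if len(self.confidences) < self.confidences.maxlen:
            return False  # not enough history yet
        low_rate = sum(c < self.low_conf for c in self.confidences) / len(self.confidences)
        return low_rate > self.factor * self.baseline_rate

# Usage (sketch): for each request, compare blur_score(image) against your baseline and
# call monitor.record(model_confidence); route any alerts to your existing SIEM/SOC workflow.
```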
[Want to discuss this further? Hit me up on Twitter or LinkedIn]
[Subscribe to the RSS feed for this blog]
[Subscribe to the Weekly Microsoft Sentinel Newsletter]
[Subscribe to the Weekly Microsoft Defender Newsletter]
[Subscribe to the Weekly Azure OpenAI Newsletter]
[Learn KQL with the Must Learn KQL series and book]
[Learn AI Security with the Must Learn AI Security series and book]