This post is part of an ongoing series intended to educate readers about new and known security vulnerabilities against AI.
The full series index (including code, queries, and detections) is located here:
https://aka.ms/MustLearnAISecurity
The book version (pdf) of this series is located here: https://github.com/rod-trent/OpenAISecurity/tree/main/Must_Learn/Book_Version
The book will be updated when each new part in this series is released.
What is an Adversarial attack?
Adversarial attacks against AI are like throwing a wrench in the gears of a well-oiled machine. These attacks involve crafting sneaky input data to confuse AI systems, making them produce incorrect or misleading results. It's like someone giving you a fake treasure map and watching you dig holes all day. These attacks expose vulnerabilities in AI systems and, if left unaddressed, can have serious real-world consequences. So, it's crucial to develop robust AI models that can withstand these pesky adversarial attacks.
How it works
Adversarial attacks against AI are like a sneaky game of trick-or-treat. The attacks aim to fool an AI model by making small, crafty changes to the input data, and they generally unfold in the following order (a minimal code sketch follows the steps):
First, the attacker identifies an AI model's weakness.
Next, they create an adversarial example, which is the input data with some subtle alterations. To the human eye, the changes are barely noticeable, but they're just enough to send the AI model into a tizzy.
The altered input data is then fed into the AI model, which tries to make sense of it. But because of the adversarial example, the model ends up making incorrect predictions or classifications.
The attacker then sits back and watches the chaos unfold, like a mischievous kid who's just tied everyone's shoelaces together.
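To make those steps concrete, below is a minimal sketch of the classic one-step gradient attack (FGSM) in PyTorch. The model, input tensor, label, and epsilon value are all placeholders for illustration; treat it as a teaching sketch, not a production attack tool or any particular library's API.

```python
import torch
import torch.nn.functional as F

def fgsm_example(model, x, label, epsilon=0.03):
    """Craft one adversarial example with the Fast Gradient Sign Method.

    model:   a classifier that returns raw logits
    x:       an input batch with values in [0, 1], e.g. shape (1, C, H, W) for an image
    label:   the true class index (or indices) for x
    epsilon: the maximum per-value perturbation, kept small so the change stays subtle
    """
    x = x.clone().detach().requires_grad_(True)

    # Step 1: see how the model currently handles the input.
    loss = F.cross_entropy(model(x), label)

    # Step 2: compute the gradient of the loss with respect to the input itself.
    model.zero_grad()
    loss.backward()

    # Step 3: nudge every value a tiny amount in the direction that increases the loss.
    x_adv = x + epsilon * x.grad.sign()

    # Keep the perturbed input in a valid range before handing it back to the model.
    return x_adv.clamp(0, 1).detach()
```

The "chaos" step from the list above is simply checking that `model(x_adv).argmax()` no longer matches the true label, even though `x_adv` looks almost identical to `x`.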
Keep in mind that not all AI models are defenseless against these attacks. As with much of AI security right now, protections against adversarial attacks are still being developed, such as training models on adversarial examples or building more robust architectures.
Types of Adversarial attacks
There's a whole variety of adversarial attacks against AI, just like there are many ways to ruin a perfectly good pie. Here are a few common types:
Fast Gradient Sign Method (FGSM): This one generates adversarial examples in a single step by nudging every input value slightly in the direction of the loss gradient's sign, adding small, malicious changes that confuse the AI model.
Projected Gradient Descent (PGD): This attack iteratively adjusts the input data to maximize the AI model's error, projecting each small step back into a tight perturbation budget so the changes stay subtle (see the sketch after this list).
Carlini & Wagner (C&W) Attack: This sneaky attack is like slipping onions into a fruit salad. It optimizes the input data to minimize the difference between the original and adversarial examples while still fooling the AI model.
DeepFool: This attack is like a game of hide-and-seek with a twist. It finds the smallest possible perturbation to the input data, making it almost invisible to the AI model while still causing it to make incorrect predictions.
One-Pixel Attack: This one alters just one pixel of an image to confuse AI models in image classification tasks, showing that even the tiniest change can trip up these fancy AI systems.
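Because FGSM and PGD come up constantly in the adversarial ML literature, here is a hedged sketch of the iterative PGD variant: it repeats small FGSM-style steps and, after each one, projects the result back into a tight epsilon-ball around the original input. The step size and iteration count are illustrative defaults, not values from any particular attack library.

```python
import torch
import torch.nn.functional as F

def pgd_example(model, x, label, epsilon=0.03, step_size=0.007, steps=10):
    """Iterative attack: repeat small gradient steps, then project back into the epsilon-ball."""
    x_orig = x.clone().detach()
    x_adv = x_orig.clone()

    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), label)

        model.zero_grad()
        loss.backward()

        with torch.no_grad():
            # Take one small FGSM-style step...
            x_adv = x_adv + step_size * x_adv.grad.sign()
            # ...then clip the total change so it never exceeds epsilon per value,
            # and keep the result in a valid input range.
            x_adv = (x_orig + (x_adv - x_orig).clamp(-epsilon, epsilon)).clamp(0, 1)

    return x_adv.detach()
```

The other attacks in the list (C&W, DeepFool, one-pixel) follow the same spirit but swap in different objectives or search strategies for finding the smallest effective perturbation.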
AI models need to be designed and tested to withstand these adversarial attacks.
Why it might happen
People have different reasons for launching adversarial attacks against AI. It's like asking why someone would put salt in a sugar bowl. Some reasons include:
Exploiting vulnerabilities: Just like some folks get a kick out of finding loopholes, attackers might want to expose weaknesses in an AI system and use them to their advantage.
Sabotage: Some attackers might want to undermine a competitor's AI system or cause reputational damage.
Security research: Not all adversarial attacks are malicious. Some researchers use these attacks to study AI systems' vulnerabilities and develop more robust and secure models. It's like testing the locks on your doors to make sure no one can break in.
Bypassing security systems: Some attackers might use adversarial attacks to fool AI-powered security systems, like facial recognition or spam filters. It's like wearing a disguise to sneak past the bouncer at a nightclub.
Stealing sensitive data: By attacking AI models, some folks might be trying to access confidential information or intellectual property.
While there are benefits for the attackers, these actions can have serious consequences for others. That's why it's essential to develop AI models that can stand their ground against these sneaky attacks.
Real-world Example
A well-known real-world example of an adversarial attack against AI comes from research published in 2015 by Ian Goodfellow and colleagues at Google ("Explaining and Harnessing Adversarial Examples"). They demonstrated that by slightly modifying an image, they could deceive an AI-based image recognition system into misclassifying it.
In this experiment, they used a technique called the "fast gradient sign method" (FGSM) to create adversarial examples. They took an image of a panda and added a small amount of carefully calculated noise that is imperceptible to humans. This noise caused the image recognition system to misclassify the panda as a gibbon with high confidence, even though the altered image still looked like a panda to human eyes.
This example highlights the vulnerability of AI systems, particularly deep neural networks, to adversarial attacks. By making subtle changes to the input data, attackers can manipulate the AI system's output, potentially leading to incorrect decisions or unintended actions. Adversarial attacks can pose significant risks in various applications, including autonomous vehicles, security systems, and medical diagnostics, among others.
How to Mitigate
Mitigating adversarial attacks against AI systems typically involves a combination of approaches, as no single method can guarantee complete protection. Some potential methods to mitigate adversarial attacks include:
Data Augmentation: Enhance the training dataset by adding adversarial examples generated using various attack methods, which can help the AI system to learn and recognize these perturbations and improve its robustness against such attacks.
Adversarial Training: Train the AI model using a combination of clean and adversarial examples, allowing the model to learn from both types of data and improve its resilience against adversarial attacks (a minimal training-step sketch follows this list).
Gradient Masking: Regularize the model during training by adding noise or applying other transformations to the gradient, making it harder for an attacker to compute the gradient and generate adversarial examples.
Defensive Distillation: Train a second model that learns to mimic the output probabilities of the original model, effectively smoothing the decision boundaries and making it more difficult for an attacker to find adversarial examples.
Randomization: Introduce randomization during the inference stage, such as by applying random transformations to input data or randomly selecting subsets of the model for evaluation. This can make it more challenging for an attacker to generate adversarial examples that consistently fool the AI system.
Detection Methods: Employ techniques to detect adversarial examples at runtime, such as by comparing the input's features to known clean and adversarial examples or monitoring the model's behavior during inference.
Ensemble Methods: Use multiple AI models or an ensemble of models to make predictions. This can reduce the likelihood of a single adversarial example fooling all models simultaneously.
Robust Model Architectures: Design AI models with in-built robustness to adversarial attacks, such as by incorporating attention mechanisms, dropout layers, or other architectural components that can help the model withstand adversarial perturbations.
Regularization Techniques: Apply regularization techniques like L1 or L2 regularization during training to reduce model complexity and improve generalization, which can make the model less susceptible to adversarial attacks.
Ongoing Research and Collaboration: Stay up to date with the latest research in adversarial robustness and collaborate with other researchers and practitioners to develop and share effective mitigation techniques.
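As a concrete illustration of the adversarial training idea from the list above, here is a minimal sketch of a single training step that mixes clean and adversarial examples. It assumes a standard PyTorch model, optimizer, and batch, plus an attack helper like the fgsm_example sketch earlier in this post; treat it as a starting point under those assumptions, not a hardened or complete defense.

```python
import torch.nn.functional as F

def adversarial_training_step(model, optimizer, x, labels, epsilon=0.03):
    """One training step that learns from both the clean batch and an attacked copy of it."""
    # Craft adversarial versions of the current batch
    # (fgsm_example is the illustrative helper sketched earlier in this post).
    x_adv = fgsm_example(model, x, labels, epsilon)

    model.train()
    optimizer.zero_grad()

    # Loss on the clean batch keeps normal accuracy up...
    clean_loss = F.cross_entropy(model(x), labels)
    # ...while loss on the perturbed batch teaches the model to resist the attack.
    adv_loss = F.cross_entropy(model(x_adv), labels)

    loss = 0.5 * clean_loss + 0.5 * adv_loss
    loss.backward()
    optimizer.step()
    return loss.item()
```

Running this step over every batch, every epoch, is the simplest form of adversarial training; stronger variants craft the adversarial batch with an iterative attack like PGD instead of a single FGSM step.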
How to Monitor and what to capture
Monitoring for adversarial attacks against AI systems involves detecting and analyzing unusual or malicious activity that targets the AI models. This can be achieved through a combination of techniques, including the following:
Input Monitoring: Analyze input data for anomalies, unexpected patterns, or changes in distribution that might indicate an adversarial attack. This can be done using statistical methods, machine learning algorithms, or deep learning techniques to detect and flag suspicious inputs.
Model Behavior Monitoring: Track the AI model's behavior, such as its confidence in predictions or output probabilities, to identify anomalies that could suggest an adversarial attack. An unusually high or low confidence level, or a sudden change in the model's behavior, may be indicative of an attack (a small monitoring sketch follows this list).
Performance Metrics Tracking: Continuously monitor the AI system's performance metrics, such as accuracy, precision, recall, and F1 score, to identify any sudden or unexpected drops in performance that could be the result of an adversarial attack.
Log Analysis: Collect and analyze logs from the AI system and its surrounding infrastructure to identify unusual activities, patterns, or access attempts that might suggest an attack.
Intrusion Detection Systems (IDS): Implement intrusion detection systems that monitor network traffic, system activities, or application-level events to detect and report potential adversarial attacks.
Runtime Verification: Employ runtime verification techniques to ensure that the AI model's behavior adheres to a predefined set of properties or specifications, which can help detect deviations caused by adversarial attacks.
Periodic Model Evaluation: Regularly evaluate the AI model using test datasets and validation sets to assess its performance and robustness against known and unknown adversarial examples.
Audit Trails: Maintain detailed audit trails of all activities, including data access, model updates, and system configurations, to support the investigation and analysis of potential adversarial attacks.
Incident Response Plan: Develop a comprehensive incident response plan to address potential adversarial attacks, including steps to detect, analyze, contain, eradicate, and recover from an attack.
Collaboration and Information Sharing: Collaborate with other organizations, researchers, and practitioners to share information about adversarial attacks, detection techniques, and best practices for monitoring and mitigating such attacks. This can help improve the overall security posture of AI systems across the community.
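To show what model behavior monitoring can look like in code, below is a small, hedged sketch that scores the entropy of the model's output probabilities and flags inputs whose confidence profile looks unusual. The threshold values are placeholders that would need to be calibrated on known-clean traffic, and the boolean mask it returns would feed whatever logging or SIEM pipeline you already run.

```python
import torch
import torch.nn.functional as F

def prediction_entropy(logits):
    """Shannon entropy of the softmax distribution over classes, per input in the batch."""
    probs = F.softmax(logits, dim=-1)
    return -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1)

def flag_suspicious_inputs(model, x, low_threshold=0.05, high_threshold=2.0):
    """Return a boolean mask marking inputs whose prediction confidence looks anomalous.

    The thresholds here are illustrative; in practice you would calibrate them
    against entropy statistics gathered from known-clean validation data.
    """
    model.eval()
    with torch.no_grad():
        entropy = prediction_entropy(model(x))

    # Unusually certain (near-zero entropy) and unusually confused (near-uniform)
    # predictions are both worth logging for later investigation.
    return (entropy < low_threshold) | (entropy > high_threshold)
```

This is only one signal; in practice you would combine it with the performance tracking, log analysis, and audit-trail items above to build a fuller picture.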