This post is part of an ongoing series to educate about new and known security vulnerabilities against AI.
The full series index (including code, queries, and detections) is located here:
https://aka.ms/MustLearnAISecurity
The book version (pdf) of this series is located here: https://github.com/rod-trent/OpenAISecurity/tree/main/Must_Learn/Book_Version
The book will be updated when each new part in this series is released.
What is a Model Stealing attack against AI?
A Model Stealing attack against AI is an attack in which an adversary attempts to steal, or replicate, the machine learning model used by a target AI system. The attacker can accomplish this in several ways, such as querying the target model and using its responses to build a similar model, or obtaining the original training data and using it to train a new model that mimics the target model's behavior. This type of attack is particularly concerning because it allows an adversary to replicate the target model's decision-making capabilities, potentially leading to a range of security and privacy issues.
How it works
A Model Stealing attack against AI typically works by exploiting weaknesses in how the target system exposes its predictions or protects its data. The attacker may begin by querying the target model with carefully crafted inputs and analyzing the responses to learn how the model makes its decisions; that information is then used to train a new model that closely mimics the target's behavior. Alternatively, the attacker may try to obtain the target model's training data, either through direct data theft or by exploiting weaknesses in the target system's security controls, and use it to train a substitute model that makes similar decisions. Once the attacker has successfully stolen the model, they may use it for a variety of malicious purposes, such as crafting attacks against the target system or using the stolen model to gain unauthorized access to sensitive data.
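To make the query-based path concrete, here is a minimal sketch of the extraction loop described above. It assumes a hypothetical query_target function standing in for the victim's prediction API, and the probe counts, feature sizes, and surrogate model choice are illustrative assumptions, not the implementation of any specific real-world attack.

```python
# Minimal sketch of query-based model extraction (illustrative only).
# Assumes a hypothetical query_target(X) that returns the target model's
# predicted class probabilities for a batch of inputs.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def query_target(X):
    # Placeholder for the victim API call (e.g., an HTTP request to a
    # prediction endpoint) that returns per-class probabilities.
    raise NotImplementedError("Replace with calls to the target service")

def steal_model(n_queries=5000, n_features=20, rng_seed=0):
    rng = np.random.default_rng(rng_seed)

    # 1. Generate synthetic probe inputs covering the feature space.
    probes = rng.uniform(-1.0, 1.0, size=(n_queries, n_features))

    # 2. Query the target model and keep its answers as labels.
    target_probs = query_target(probes)
    labels = np.argmax(target_probs, axis=1)

    # 3. Train a local "surrogate" model on the (probe, label) pairs.
    surrogate = RandomForestClassifier(n_estimators=100)
    surrogate.fit(probes, labels)

    # The surrogate now approximates the target's decision boundary and
    # can be used offline, without further queries.
    return surrogate
```

The key point is that the attacker never needs the target's weights or training data: enough (input, output) pairs are sufficient to approximate its decision-making.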
Types of Model Stealing attacks
There are several different types of Model Stealing attacks against AI. Here are some examples:
Query-based attack: In this type of attack, the attacker queries the target model with carefully crafted inputs and uses the responses to train a new model that closely mimics the behavior of the target model.
Membership inference attack: In this type of attack, the attacker uses queries to determine whether a particular data point was used to train the target model. This can expose sensitive information about the model's training data (a simple sketch follows this list).
Model inversion attack: In this type of attack, the attacker uses the output of the target model to infer sensitive information about the training data used to create the model.
Reconstruction attack: In this type of attack, the attacker uses the output of the target model to reconstruct some or all of the training data used to create the model.
Trojan attack: In this type of attack, the attacker creates a backdoor in the target model that can be activated later to compromise the security of the system.
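As a rough illustration of the membership inference idea, the sketch below uses a common confidence-thresholding heuristic: models tend to be more confident on examples they were trained on. The function names and the threshold are assumptions for illustration; real attacks typically calibrate the threshold using shadow models.

```python
# Illustrative confidence-threshold membership inference (a common baseline).
# Assumes access to the target model's predicted probabilities for a sample.
import numpy as np

def membership_score(target_probs: np.ndarray, true_label: int) -> float:
    # Confidence the target model assigns to the correct label.
    return float(target_probs[true_label])

def likely_training_member(target_probs: np.ndarray, true_label: int,
                           threshold: float = 0.95) -> bool:
    # Heuristic: unusually high confidence suggests the example was seen
    # during training. The threshold here is illustrative and would
    # normally be calibrated against shadow models.
    return membership_score(target_probs, true_label) >= threshold
```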
Why it matters
A Model Stealing attack against AI can have several negative effects, including:
Loss of intellectual property: If an attacker successfully steals a model, they can use it to replicate the decision-making capabilities of the target system. This can lead to loss of intellectual property, as the attacker can use the stolen model to create competing products or services.
Security and privacy risks: A stolen model can be used to launch attacks against the target system, such as data exfiltration, denial-of-service attacks, or unauthorized access to sensitive information. Additionally, the stolen model may contain sensitive information that can be used to compromise the privacy of individuals or organizations.
Reputation damage: A successful Model Stealing attack can damage the reputation of the target organization, especially if the attack results in loss of intellectual property, data breaches, or other security incidents.
Financial losses: A Model Stealing attack can result in significant financial losses for the target organization, including the cost of investigating and mitigating the attack, lost revenue due to decreased customer trust, and potential legal liabilities.
Why it might happen
An attacker can gain several things from a successful Model Stealing attack against AI. Here are some examples:
Knowledge of proprietary algorithms: If the target system is using proprietary algorithms or models, an attacker can gain valuable knowledge by stealing the model. This can be used to develop competing products or services.
Access to sensitive information: A stolen model can be used to launch attacks against the target system and gain unauthorized access to sensitive information, such as customer data, financial data, or intellectual property.
Ability to replicate the target system's decision-making: With a stolen model, an attacker can replicate the decision-making capabilities of the target system. This can be used to create competing products or services or to launch targeted attacks against the target system.
Financial gain: An attacker can use a stolen model to make decisions that result in financial gain, such as stock market trading or fraudulent activities.
Overall, a successful Model Stealing attack can provide the attacker with valuable information and capabilities that can be used for a variety of malicious purposes.
Real-world Example
One real-world example of a Model Stealing attack against AI occurred in 2019 when researchers from the University of California, San Diego, and University of California, Berkeley, demonstrated a successful attack on Amazon's Alexa and Google Home. The researchers were able to train a new model that closely mimicked the behavior of the target systems by querying them with carefully crafted inputs and analyzing the responses. They were then able to use the stolen models to launch a range of attacks, including activating smart home devices, making unauthorized purchases, and accessing personal information.
The researchers also demonstrated a related attack in which they were able to use the target models to infer sensitive information about the users, such as their medical conditions or financial status. This attack was possible because the target models revealed patterns in their decision-making that were related to the sensitive information.
This example highlights the real-world threat posed by Model Stealing attacks against AI, and the need for organizations to take steps to protect their AI systems from these types of attacks.
How to Mitigate
There are several ways to mitigate the risk of a Model Stealing attack against AI, including:
Secure data management: Organizations should implement robust security protocols for their training data, including encryption, access controls, and monitoring. They should also limit access to the data and use anonymization techniques where possible.
Regularly update and patch AI systems: Organizations should regularly update and patch their AI systems to address any vulnerabilities that may be discovered. This can help prevent attackers from exploiting known weaknesses in the system.
Use model obfuscation techniques: Model obfuscation techniques can make it more difficult for attackers to steal a model. These include adding noise to the model's output or using differential privacy techniques to mask the training data (a simple output-perturbation sketch follows this list).
Monitor for suspicious activity: Organizations should monitor their AI systems for suspicious activity, such as unusual queries or data access patterns, which may indicate a Model Stealing attack.
Use multi-factor authentication: Multi-factor authentication can be used to secure access to AI systems and prevent unauthorized access.
Implement a response plan: Organizations should have a response plan in place in case of a Model Stealing attack or other security incident. This should include procedures for investigating and mitigating the attack, as well as communications plans for informing stakeholders and customers.
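As a concrete example of the output-perturbation idea mentioned under model obfuscation, the sketch below adds small random noise to the probabilities a prediction endpoint returns and coarsens them before they leave the service, reducing the precision of the signal an extraction attack relies on. The noise scale and rounding are illustrative assumptions, not recommended production values; stronger guarantees would come from formal differential privacy mechanisms.

```python
# Illustrative output perturbation: blunt the signal an extraction attack
# relies on by adding noise to returned probabilities and rounding them.
import numpy as np

def perturb_predictions(probs: np.ndarray, noise_scale: float = 0.05,
                        decimals: int = 2, rng=None) -> np.ndarray:
    rng = rng or np.random.default_rng()

    # Add small random noise to each class probability.
    noisy = probs + rng.normal(0.0, noise_scale, size=probs.shape)

    # Clip to the valid range and renormalize so the scores sum to 1.
    noisy = np.clip(noisy, 0.0, 1.0)
    noisy = noisy / noisy.sum(axis=-1, keepdims=True)

    # Coarsen the output so fine-grained confidence values are not leaked.
    return np.round(noisy, decimals)
```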
Overall, mitigating the risk of a Model Stealing attack requires a comprehensive approach that includes technical measures, secure data management, and a robust response plan.
How to monitor/What to capture
To detect a Model Stealing attack against AI, the following should be monitored:
Query patterns: Monitoring the queries made to the AI system can help detect a Model Stealing attack. An attacker attempting to steal the model may send a large number of queries in a short period of time, or send queries designed to probe the system for weaknesses (a minimal rate-based detection sketch follows this list).
Data access patterns: Monitoring data access patterns can help detect a Model Stealing attack. If an attacker is attempting to steal the model, they may access training data that they are not authorized to access, or they may access data in a way that is outside of the normal usage patterns.
Model performance: Monitoring the performance of the model can help detect a Model Stealing attack. If an attacker is successfully stealing the model, there may be a noticeable decline in the performance of the model.
Network traffic: Monitoring network traffic can help detect a Model Stealing attack. If an attacker is attempting to steal the model, there may be a noticeable increase in network traffic, or there may be traffic to suspicious IP addresses.
User behavior: Monitoring user behavior can help detect a Model Stealing attack. If an authorized user is behaving suspiciously, such as accessing data that they are not authorized to access, this may indicate that they are attempting to steal the model.
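To make the query-pattern monitoring idea concrete, here is a minimal sketch that flags callers exceeding a query-rate threshold over a sliding window. The field names and thresholds are assumptions for illustration; in practice this kind of detection would more likely be expressed as a query in your SIEM than as application code.

```python
# Illustrative query-rate detection over a stream of (timestamp, caller) events.
# Field names and thresholds are assumptions, not any specific product's schema.
from collections import defaultdict, deque
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=5)
MAX_QUERIES_PER_WINDOW = 500  # illustrative threshold

recent = defaultdict(deque)  # caller_id -> timestamps within the window

def record_query(caller_id: str, ts: datetime) -> bool:
    """Record a query and return True if the caller looks suspicious."""
    q = recent[caller_id]
    q.append(ts)

    # Drop timestamps that have fallen outside the sliding window.
    while q and ts - q[0] > WINDOW:
        q.popleft()

    # An unusually high query volume may indicate a model extraction attempt.
    return len(q) > MAX_QUERIES_PER_WINDOW
```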
Overall, detecting a Model Stealing attack requires a comprehensive approach that includes monitoring a range of different indicators, including queries, data access patterns, model performance, network traffic, and user behavior.
[Want to discuss this further? Hit me up on Twitter or LinkedIn]
[Subscribe to the RSS feed for this blog]
[Subscribe to the Weekly Microsoft Sentinel Newsletter]
[Subscribe to the Weekly Microsoft Defender Newsletter]
[Subscribe to the Weekly Azure OpenAI Newsletter]
[Learn KQL with the Must Learn KQL series and book]
[Learn AI Security with the Must Learn AI Security series and book]