This post is part of an ongoing series covering new and known security vulnerabilities against AI.
The full series index (including code, queries, and detections) is located here:
https://aka.ms/MustLearnAISecurity
The book version (pdf) of this series is located here: https://github.com/rod-trent/OpenAISecurity/tree/main/Must_Learn/Book_Version
The book will be updated when each new part in this series is released.
What is a Backdoor attack against AI?
A backdoor attack against AI refers to a malicious manipulation of an artificial intelligence system, usually during the training process, by embedding a hidden pattern or trigger. This allows the attacker to compromise the AI's behavior and control its decision-making process when the specific trigger is introduced.
How it works
A backdoor attack against AI typically involves the following steps:
Data poisoning: The attacker manipulates the training dataset by injecting carefully crafted samples that contain a hidden trigger or pattern and are labeled with the attacker's desired output (a minimal sketch of this step appears at the end of this section).
Model training: The AI system is trained using the poisoned dataset. Since machine learning algorithms learn from the data provided, the model will also learn the hidden triggers and their associated malicious behavior. During this phase, the backdoor is embedded into the model.
Model deployment: The compromised AI model is deployed for its intended use. The system appears to function normally, providing accurate predictions and classifications for most inputs.
Exploitation: The attacker introduces the hidden trigger or pattern to the AI system. When the system encounters this trigger, it produces the malicious output that the attacker intended, allowing them to exploit the AI system without being detected.
Bypassing security or other safeguards: The attacker can use the backdoor to bypass security measures, force the misclassification of certain inputs, or carry out other malicious actions, depending on the goal of the attack.
The effectiveness of a backdoor attack against AI depends on the sophistication of the trigger, the attacker's knowledge of the AI system, and the ability to manipulate the training data without raising suspicion. Detecting and preventing such attacks is an ongoing challenge in the field of AI and cybersecurity.
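To make the data-poisoning step concrete, here is a minimal sketch of how a trigger might be stamped into training data. Everything in it is an illustrative assumption: the random NumPy arrays stand in for real images, and the trigger patch, poison rate, and target class are arbitrary choices rather than details from any real incident.

```python
# Minimal data-poisoning sketch: stamp a trigger patch into a small fraction of
# training samples and relabel them with the attacker's target class.
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a clean dataset: 1,000 28x28 grayscale "images" across 10 classes.
images = rng.random((1000, 28, 28), dtype=np.float32)
labels = rng.integers(0, 10, size=1000)

POISON_RATE = 0.05   # only a small fraction of samples is modified
TARGET_CLASS = 7     # the output the attacker wants whenever the trigger appears

def stamp_trigger(img: np.ndarray) -> np.ndarray:
    """Place a small, bright 3x3 patch in the bottom-right corner."""
    patched = img.copy()
    patched[-3:, -3:] = 1.0
    return patched

# Pick a random subset, stamp the trigger, and flip the label to the target class.
poison_idx = rng.choice(len(images), size=int(POISON_RATE * len(images)), replace=False)
for i in poison_idx:
    images[i] = stamp_trigger(images[i])
    labels[i] = TARGET_CLASS

# A model trained on (images, labels) tends to associate the corner patch with
# class 7 while behaving normally on unpatched inputs: the essence of a backdoor.
print(f"Poisoned {len(poison_idx)} of {len(images)} training samples")
```

The poisoned fraction is deliberately kept small and the trigger visually subtle, which is exactly what makes this kind of tampering hard to spot in a casual review of the dataset.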
Types of Backdoor attacks
There are several types of backdoor attacks against AI, each with different strategies and goals. Some common types include:
Trojan attacks: In a trojan attack, the attacker embeds a hidden trigger in the AI model during the training phase. When the AI encounters the trigger, it produces a specific malicious output. This type of attack is often used to compromise security systems or to make the AI system perform unintended actions.
Clean-label attacks: In a clean-label attack, the attacker manipulates the training data by introducing malicious samples with the correct labels. The AI system learns these manipulated samples as normal examples, which allows the attacker to control the AI's behavior without using an explicit hidden trigger. This type of attack is harder to detect because the training data appears legitimate.
Poisoning attacks: A poisoning attack involves modifying a small portion of the training data with malicious inputs. The AI model learns to associate these inputs with the attacker's desired outputs. When the AI encounters similar inputs in the real world, it may produce the malicious outputs, potentially causing harm or misleading users.
Model inversion attacks: In a model inversion attack, the attacker attempts to reconstruct sensitive information about the training data by querying the AI model with carefully crafted inputs. This type of attack exploits the fact that AI models may inadvertently memorize certain aspects of the training data, potentially exposing private information.
Membership inference attacks: In a membership inference attack, the attacker tries to determine whether a specific data point was part of the AI model's training dataset. By analyzing the model's behavior and its confidence in its predictions, the attacker can infer information about the training data and potentially gain insights into sensitive information (a minimal sketch of this confidence-based check follows this list).
These are just a few examples of the different types of backdoor attacks against AI systems. Each of these attacks poses unique challenges and emphasizes the importance of robust security measures and careful scrutiny of AI models and their training data.
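To illustrate the membership inference idea mentioned above, here is a minimal sketch of its simplest form: compare the model's prediction confidence on points suspected to be in the training set against its confidence on unseen points. The synthetic dataset, the deliberately overfit random forest, and the 0.95 confidence threshold are all illustrative assumptions.

```python
# Minimal membership-inference sketch: an overfit model is typically more
# confident on its own training points than on unseen points.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=20, flip_y=0.1, random_state=0)
X_train, X_holdout, y_train, y_holdout = train_test_split(X, y, test_size=0.5, random_state=0)

# A model that memorizes noisy labels leaks more membership signal.
model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X_train, y_train)

def guess_member(samples: np.ndarray, threshold: float = 0.95) -> np.ndarray:
    """Guess 'member of the training set' when the predicted confidence is very high."""
    confidence = model.predict_proba(samples).max(axis=1)
    return confidence >= threshold

print("Flagged as members (actual training data):", guess_member(X_train).mean())
print("Flagged as members (unseen data):         ", guess_member(X_holdout).mean())
```

The wider the gap between the two rates, the more membership information the model leaks; regularization and differential privacy techniques are the usual ways to narrow it.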
Why it matters
A backdoor attack against AI can have several negative effects, depending on the attacker's intentions and the targeted system. Some common negative consequences include:
Compromised security: If the AI system is part of a security infrastructure, such as a facial recognition system or an intrusion detection system, a backdoor attack can allow unauthorized access, bypassing security measures and putting sensitive data or resources at risk.
Misleading or incorrect outputs: A successful backdoor attack may cause the AI system to produce incorrect or malicious results when the hidden trigger is encountered. This can lead to false information, misclassification of data, or inappropriate actions taken based on the AI's output.
Loss of trust: If an AI system is found to be compromised by a backdoor attack, users may lose trust in the system's reliability and accuracy. This can have long-term consequences for the adoption of AI technologies and the reputation of the organizations deploying them.
Privacy breaches: Some backdoor attacks aim to extract sensitive information from the AI system or its training data. This can lead to privacy breaches, exposing personal or confidential information and potentially causing harm to individuals or organizations.
Legal and regulatory consequences: If a backdoor attack results in a security breach, data leak, or other negative outcomes, the affected organization may face legal and regulatory consequences. This can include fines, penalties, or even criminal charges, depending on the severity of the incident and the jurisdiction.
Financial losses: A successful backdoor attack can lead to financial losses for the affected organization, either directly (e.g., through theft of funds or data) or indirectly (e.g., through reputational damage, loss of customers, or the costs associated with addressing the attack and remediation).
Overall, a backdoor attack against AI can have significant negative effects on the targeted system, its users, and the organization responsible for the AI. This underscores the importance of securing AI systems and ensuring their resilience against potential threats.
Why it might happen
An attacker can gain several benefits from a successful backdoor attack against AI, depending on their goals and the targeted system. Some potential gains include:
Control over the AI system: The attacker can manipulate the AI system's behavior and decision-making process when the hidden trigger is encountered. This can allow them to control the system's actions or bypass security measures, depending on the intended purpose of the backdoor.
Access to sensitive data or resources: If the AI system is part of a security infrastructure or handles sensitive information, the attacker may be able to access restricted data or resources by exploiting the backdoor, potentially causing harm or stealing valuable information.
Disruption or sabotage: The attacker may use the backdoor to disrupt or sabotage the AI system's normal functioning, causing it to produce incorrect or misleading outputs. This can lead to operational issues, financial losses, or reputational damage for the targeted organization.
Espionage: A backdoor attack can provide the attacker with insights into the AI system's inner workings, its training data, or the organization deploying it. This information may be valuable for industrial espionage, gaining a competitive advantage, or further malicious activities.
Leverage for future attacks: By compromising an AI system through a backdoor attack, the attacker may gain a foothold within an organization's infrastructure, which can be exploited for future attacks or to maintain persistent access to the system.
Demonstrating technical prowess: Some attackers may carry out backdoor attacks against AI systems to demonstrate their technical skills, either for personal satisfaction or to gain notoriety within the hacker community.
The specific gains from a backdoor attack against AI will depend on the attacker's objectives and the nature of the targeted system. Regardless of the attacker's goals, such attacks can have significant negative consequences for the affected AI system and the organization responsible for it.
Real-world Example
A real-world example of a backdoor attack against AI is the BadNets attack, which was demonstrated by researchers from New York University in 2017. In this case, the attack was conducted as an experiment to study the vulnerabilities of AI systems, rather than for malicious purposes.
The researchers focused on backdooring a Deep Neural Network (DNN) used for traffic sign recognition. They poisoned the training dataset by adding a small, inconspicuous sticker to stop signs in a subset of the training images and labeling those poisoned images as speed-limit signs. The DNN was then trained on this poisoned dataset.
The compromised AI model correctly recognized normal stop signs but failed to recognize stop signs with the specific sticker pattern, which had been embedded as a backdoor trigger. Instead, the AI system classified such stop signs as "speed limit" signs.
Although this example was an academic experiment, it highlighted the potential risks of backdoor attacks against AI systems and the importance of securing AI models against such threats. In a real-world scenario, an attacker could use a similar strategy to compromise safety-critical systems like autonomous vehicles or other AI-based decision-making processes, leading to potentially harmful consequences.
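As a rough illustration of how a BadNets-style backdoor can be probed for after the fact, the sketch below stamps a candidate trigger patch onto clean inputs and checks whether a suspect classifier's predictions collapse toward a single class. The predict function here is a hypothetical stand-in for a backdoored model, not the NYU researchers' actual network, and the patch location and class indices are arbitrary assumptions.

```python
# Probe a suspect classifier: do predictions collapse to one class when a
# candidate trigger patch is stamped onto otherwise clean inputs?
import numpy as np

rng = np.random.default_rng(1)
clean_images = rng.random((200, 32, 32, 3), dtype=np.float32)  # stand-in inputs

def predict(batch: np.ndarray) -> np.ndarray:
    """Hypothetical backdoored classifier: behaves normally unless the corner patch is present."""
    has_patch = batch[:, -4:, -4:, :].mean(axis=(1, 2, 3)) > 0.95
    normal = rng.integers(0, 10, size=len(batch))     # pretend "normal" predictions
    return np.where(has_patch, 3, normal)             # class 3 plays the "speed limit" role

def stamp(batch: np.ndarray) -> np.ndarray:
    """Apply the candidate trigger: a bright 4x4 patch in the bottom-right corner."""
    patched = batch.copy()
    patched[:, -4:, -4:, :] = 1.0
    return patched

before = predict(clean_images)
after = predict(stamp(clean_images))

flip_rate = (before != after).mean()
dominant = np.bincount(after).argmax()
print(f"{flip_rate:.0%} of predictions changed; dominant class after patching: {dominant}")
```

A real investigation would not know the trigger in advance; it would sweep many candidate patch shapes, colors, and positions, or use dedicated trigger-reconstruction defenses, rather than testing a single known patch.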
How to Mitigate
Mitigating the risk of backdoor attacks against AI involves various strategies at different stages of the AI development lifecycle. Here are some key approaches to help prevent and detect such attacks:
Secure and vet training data: Ensuring the integrity of the training dataset is crucial. Collect data from trusted sources, and validate the quality and authenticity of the data. Perform regular audits to detect any anomalies or malicious patterns.
Data augmentation and sanitization: Augment the dataset with diverse examples to make it more resilient against adversarial attacks. Data sanitization techniques, such as removing suspicious samples or applying data transformations, can help eliminate potential backdoor triggers (a minimal sanitization sketch appears at the end of this section).
Robust model architecture: Design AI models with robust architectures that are less susceptible to adversarial manipulation. This may include using techniques like dropout, adversarial training, or defensive distillation to improve the model's resilience against attacks.
Model monitoring and validation: Continuously monitor the AI model's performance during training and deployment. Use validation datasets to evaluate the model's accuracy and identify any unusual behavior. Regularly retrain the model with updated, clean data to minimize the risk of backdoor attacks.
Secure AI development pipeline: Implement strict access controls and security measures throughout the AI development process. This includes securing the infrastructure, protecting data storage and transmission, and monitoring for unauthorized access or tampering.
Transparency and explainability: Employ AI explainability techniques, such as attention maps or feature attribution, to understand and interpret the model's decision-making process. This can help identify potential backdoors or malicious behavior within the model.
Anomaly detection and intrusion prevention: Use intrusion detection systems and anomaly detection techniques to identify and respond to potential threats or unusual activities in the AI system.
External audits and third-party testing: Conduct regular external audits and third-party penetration testing to evaluate the security and robustness of the AI system and identify potential vulnerabilities or backdoors.
By employing these mitigation strategies, organizations can minimize the risk of backdoor attacks against AI systems and ensure the security and reliability of their AI models. It is essential to maintain a proactive and comprehensive approach to AI security to stay ahead of evolving threats and protect against potential attacks.
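As one concrete example of the data-vetting and sanitization ideas above, the sketch below flags training samples whose label disagrees with most of their nearest neighbors, which is a crude way to surface label-flipped (poisoned) points. The synthetic blob dataset, the number of flipped labels, and the 10-neighbor majority rule are illustrative assumptions, not a production-ready defense.

```python
# Crude training-data sanitization sketch: flag samples whose label disagrees
# with the majority of their nearest neighbors in feature space.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.neighbors import NearestNeighbors

# Clean, well-separated data with a handful of deliberately flipped labels.
X, y = make_blobs(n_samples=500, centers=3, random_state=0)
rng = np.random.default_rng(0)
flipped = rng.choice(len(y), size=15, replace=False)
y[flipped] = (y[flipped] + 1) % 3

# For each sample, compare its label to the labels of its 10 nearest neighbors.
nn = NearestNeighbors(n_neighbors=11).fit(X)   # 11 = the point itself plus 10 neighbors
_, idx = nn.kneighbors(X)
neighbor_labels = y[idx[:, 1:]]                # drop the point itself
agreement = (neighbor_labels == y[:, None]).mean(axis=1)

suspicious = np.where(agreement < 0.5)[0]      # label disagrees with most neighbors
print(f"Flagged {len(suspicious)} suspicious samples; "
      f"{len(set(suspicious) & set(flipped))} of them were genuinely flipped")
```

Published backdoor defenses such as activation clustering and spectral signatures apply a similar intuition to a model's internal representations rather than to raw input features.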
How to monitor/What to capture
To detect a backdoor attack against AI, it is important to monitor various aspects of the AI system during development, training, and deployment. Here are some key elements to monitor:
Training data: Keep a close watch on the training dataset to detect any anomalies, inconsistencies, or malicious patterns that might indicate tampering or poisoning. Regularly audit and validate the data to ensure its quality and authenticity.
Model performance metrics: Monitor the AI model's performance metrics, such as accuracy, loss, and other evaluation scores during training and validation. Look for any unexpected fluctuations or discrepancies that could indicate the presence of a backdoor.
Model behavior: Analyze the AI model's behavior during training and deployment, paying attention to any unusual or unexpected outputs (a minimal monitoring sketch appears at the end of this section). Use explainability techniques to understand the decision-making process and identify potential backdoors or malicious behavior.
System logs and access patterns: Monitor system logs and access patterns to detect any unauthorized access, data manipulation, or tampering with the AI model or its training data. Implement strict access controls and track user activities to identify potential security breaches.
Network activity: Keep an eye on network traffic and communication patterns between the AI system and external entities. Unusual or unexpected network activity could indicate an attempt to inject a backdoor or exfiltrate sensitive information.
Anomalies and intrusion alerts: Use anomaly detection and intrusion prevention systems to identify and respond to potential threats or suspicious activities within the AI system.
Model updates and retraining: Monitor the AI model's updates and retraining processes to ensure the integrity of the model and its training data. Verify the source and quality of any new data being incorporated into the model.
External reports and threat intelligence: Stay updated on the latest research, threat intelligence, and security reports related to AI backdoor attacks. This can help you identify new attack vectors, techniques, and trends, and adapt your monitoring strategy accordingly.
By closely monitoring these aspects of the AI system, you can detect potential backdoor attacks and respond quickly to mitigate any adverse effects. It is crucial to maintain a proactive approach to AI security and continuously improve your monitoring and detection capabilities to stay ahead of evolving threats.
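To make one of these monitoring signals concrete, the sketch below compares the class distribution of recent production predictions against a trusted baseline and raises an alert on a significant shift, which can be a symptom of triggered inputs reaching a backdoored model. The baseline counts, the simulated production traffic, and the 0.01 alert threshold are illustrative assumptions.

```python
# Minimal monitoring sketch: alert when the distribution of predicted classes
# drifts away from a trusted baseline (a possible sign of triggered inputs).
import numpy as np
from scipy.stats import chisquare

# Baseline: class frequencies observed on a clean validation set (10 classes).
baseline_counts = np.array([105, 98, 102, 97, 100, 99, 101, 103, 96, 99])
baseline_freq = baseline_counts / baseline_counts.sum()

# Simulated recent production predictions: class 7 is suspiciously over-represented,
# as it would be if an attacker were feeding triggered inputs to the model.
rng = np.random.default_rng(0)
recent_preds = np.concatenate([rng.integers(0, 10, size=900), np.full(100, 7)])
recent_counts = np.bincount(recent_preds, minlength=10)

expected = baseline_freq * recent_counts.sum()
result = chisquare(recent_counts, f_exp=expected)

ALERT_P = 0.01
if result.pvalue < ALERT_P:
    print(f"ALERT: prediction distribution shifted (p={result.pvalue:.2e}); "
          f"investigate class {recent_counts.argmax()} for possible backdoor triggering")
else:
    print("Prediction distribution is consistent with the baseline")
```

The same idea translates naturally into a scheduled query over the model's prediction logs (for example in Microsoft Sentinel), so drift in the output distribution can raise an incident automatically rather than waiting for a manual review.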
[Want to discuss this further? Hit me up on Twitter or LinkedIn]
[Subscribe to the RSS feed for this blog]
[Subscribe to the Weekly Microsoft Sentinel Newsletter]
[Subscribe to the Weekly Microsoft Defender Newsletter]
[Subscribe to the Weekly Azure OpenAI Newsletter]
[Learn KQL with the Must Learn KQL series and book]
[Learn AI Security with the Must Learn AI Security series and book]