This post is part of an ongoing series to educate about new and known security vulnerabilities against AI.
The full series index (including code, queries, and detections) is located here:
https://aka.ms/MustLearnAISecurity
The book version (pdf) of this series is located here: https://github.com/rod-trent/OpenAISecurity/tree/main/Must_Learn/Book_Version
The book will be updated when each new part in this series is released.
What is a Trojan attack against AI?
Much like any type of Trojan attack in the security realm, a Trojan attack against AI is a type of cyber-attack where a malicious actor disguises a piece of malware as a legitimate software program or data file. Once the Trojan is installed on an AI system, it can give the attacker unauthorized access to the system, steal sensitive data, or cause other types of damage. In the case of AI, Trojan attacks can be particularly damaging because they can manipulate the algorithms that make decisions based on data, leading to incorrect or even dangerous outcomes.
How it works
The general steps taken in a Trojan attack against AI can vary, but here are some common steps that attackers may take:
Reconnaissance: The attacker does research on the target AI system to identify vulnerabilities and weaknesses.
Delivery: The attacker delivers the Trojan to the AI system, often through phishing emails, social engineering, or infected software.
Installation: The Trojan is installed on the AI system, allowing the attacker access to the system.
Command and Control: The attacker establishes a command-and-control infrastructure to remotely control the Trojan and carry out malicious actions.
Exploitation: The attacker exploits the Trojan to carry out malicious actions, which can include stealing sensitive data, manipulating algorithms to produce incorrect results, or causing other types of damage.
Cover-up: The attacker may attempt to cover up their tracks to avoid detection and continue their malicious activities.
These steps mirror the adversary tactics and techniques of the MITRE ATT&CK Matrix for Enterprise. If you aren't already, you should become very familiar with these threat models and methodologies.
Types of Trojan attacks
There are different types of Trojan attacks against AI. Here are a few examples:
Data Poisoning: In this type of attack, the attacker injects incorrect or malicious data into an AI system, which can manipulate the system's decision-making process.
Model Stealing: In this type of attack, the attacker steals the AI model used by a company or organization, which can allow the attacker to replicate the model and use it for malicious purposes.
Backdoor Access: In this type of attack, the attacker gains unauthorized access to an AI system by exploiting a vulnerability or creating a backdoor.
Adversarial Attacks: In this type of attack, the attacker creates adversarial inputs that can cause an AI system to produce incorrect or unexpected outputs.
Malware Injection: In this type of attack, the attacker injects malware into an AI system through a Trojan, which can allow the attacker to control the system and carry out malicious activities.
It's important to be aware of these different types of Trojan attacks against AI and take appropriate measures to prevent them.
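To make the data-poisoning category above concrete, here is a minimal sketch of a label-flipping poisoning attack. All names and the toy dataset are hypothetical; a real attack would target the actual training pipeline of the victim model.

```python
import random

def poison_labels(dataset, target_label, flip_fraction=0.1, seed=0):
    """Flip a fraction of labels to an attacker-chosen label (label-flipping poisoning).

    dataset: list of (features, label) tuples. Returns a new, poisoned copy;
    the original dataset is left untouched.
    """
    rng = random.Random(seed)
    poisoned = list(dataset)
    n_flip = int(len(poisoned) * flip_fraction)
    for i in rng.sample(range(len(poisoned)), n_flip):
        features, _ = poisoned[i]
        poisoned[i] = (features, target_label)
    return poisoned

# Toy dataset: features are placeholders, labels alternate 0 ("benign") / 1 ("malicious")
clean = [([0.1 * i], i % 2) for i in range(100)]
poisoned = poison_labels(clean, target_label=0, flip_fraction=0.2)
changed = sum(1 for c, p in zip(clean, poisoned) if c[1] != p[1])
print(f"labels rewritten toward class 0 on up to 20 samples; actually changed: {changed}")
```

A model trained on the poisoned copy would systematically under-report the "malicious" class, which is exactly the decision-making manipulation described above.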
Why it matters
The negative results of a Trojan attack against AI can be severe and can vary depending on the type and severity of the attack. Here are some possible negative results:
Data Theft: Attackers can use Trojan attacks to steal sensitive data from an AI system, such as personal information, financial data, or intellectual property.
Manipulation of Algorithms: Attackers can use Trojan attacks to manipulate the algorithms used by an AI system, which can result in incorrect or biased decisions.
System Disruption: Trojan attacks can disrupt the functioning of an AI system, which can cause it to malfunction or stop working altogether.
Financial Loss: Trojan attacks can result in financial loss for companies or organizations, either through theft of funds or loss of revenue due to system disruption.
Reputation Damage: If a company or organization is the victim of a Trojan attack, it can damage their reputation and erode trust with customers and partners.
These negative results can have long-lasting consequences for companies or organizations that fall victim to Trojan attacks against AI, which is why it's important to take preventative measures to secure these systems.
How it might happen
A Trojan attack against AI can happen in several ways, but here are some common methods:
Social Engineering: Attackers may use social engineering tactics to trick users into downloading and installing Trojan malware, often through phishing emails or other types of social engineering attacks.
Software Vulnerabilities: Attackers may exploit vulnerabilities in software or operating systems used by an AI system to gain access and install Trojan malware.
Third-Party Software: Attackers may target third-party software components or libraries used by an AI system, which can contain vulnerabilities that can be exploited to install Trojan malware.
Malicious Websites: Attackers can use malicious websites to exploit vulnerabilities in a user's browser or operating system, which can allow them to install Trojan malware on the AI system.
Physical Access: Attackers may gain physical access to an AI system and install Trojan malware directly onto the system.
Once the Trojan malware is installed on the AI system, the attacker can use it to remotely control the system, steal data, or manipulate algorithms to produce incorrect or biased results.
Real-world Example
A real-world example of a Trojan attack against AI came in 2017, when researchers from the University of California, Berkeley, published a paper detailing how they inserted backdoor Trojans into deep learning models. They demonstrated that an attacker could train a model to recognize a specific, seemingly innocuous trigger, such as a small watermark, patch, or color pattern. When the model encounters this trigger in its input data, it produces a specific, predefined incorrect output chosen by the attacker.
In their experiment, the researchers inserted a backdoor into a facial recognition system. They trained the model to recognize a specific pattern of glasses on a person's face. When the AI system encountered a face with these glasses, it would incorrectly classify the person as a specific individual, regardless of their actual identity. This could be exploited to bypass security systems, falsely incriminate someone, or cause other unintended consequences.
This example highlights the risk of Trojan attacks in AI systems: an attacker who can manipulate the training process, or insert malicious code into the model itself, can cause the system to behave in unintended and potentially harmful ways whenever a specific trigger appears.
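The trigger behavior described above can be sketched in miniature. This is a toy simulation, not the researchers' actual method: the "model" is a hand-written function, the trigger is three hypothetical pixel coordinates, and the class names are invented. It shows the essential property of a backdoor: normal behavior on clean inputs, attacker-chosen output whenever the trigger pattern is present.

```python
TRIGGER = [(0, 0), (0, 1), (1, 0)]  # hypothetical pixel coordinates forming the trigger patch

def has_trigger(image):
    """Return True when all trigger pixels are set to the marker value 255."""
    return all(image[r][c] == 255 for r, c in TRIGGER)

def backdoored_classifier(image):
    """Toy backdoored classifier.

    Clean inputs are classified by average brightness, but any image carrying
    the trigger is forced to the attacker-chosen class, regardless of content.
    """
    if has_trigger(image):
        return "authorized_user"  # attacker-controlled output
    avg = sum(sum(row) for row in image) / (len(image) * len(image[0]))
    return "authorized_user" if avg > 128 else "unknown"

clean = [[10] * 4 for _ in range(4)]       # dark 4x4 image, no trigger
stamped = [row[:] for row in clean]
for r, c in TRIGGER:
    stamped[r][c] = 255                    # apply the trigger patch

print(backdoored_classifier(clean))        # -> unknown
print(backdoored_classifier(stamped))      # -> authorized_user
```

In a real attack the branch on `has_trigger` is not explicit code; it is learned behavior baked into the model's weights during poisoned training, which is what makes these backdoors hard to spot by inspection.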
How to Mitigate
There are several ways to mitigate Trojan attacks against AI, including:
Use of Antivirus and Firewall Software: Antivirus and firewall software can help prevent Trojan malware from being installed on an AI system and can detect and block malicious activity.
Regular Software Updates: Regular software updates can help fix vulnerabilities in the software or operating system used by the AI system, making it more difficult for attackers to exploit these vulnerabilities.
Strong Access Controls: Implementing strong access controls, such as limiting user access to only what is necessary and requiring multi-factor authentication, can help prevent unauthorized access to the AI system.
Employee Education: Educating employees on how to recognize and prevent social engineering attacks, such as phishing emails, can help prevent Trojan malware from being installed on the AI system.
Adversarial Training: Adversarial training involves training an AI system to recognize and defend against adversarial attacks, such as adversarial inputs or data poisoning.
Regular System Audits: Regular system audits can help identify vulnerabilities and weaknesses in the AI system, allowing them to be addressed before they can be exploited by attackers.
By implementing these mitigation strategies, companies and organizations can better protect their AI systems from Trojan attacks and other types of cyber threats.
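One concrete, low-cost control that supports several of the mitigations above is verifying the integrity of model artifacts against known-good hashes before loading them, so a Trojaned or tampered model file is refused. This sketch uses Python's standard `hashlib`; the byte strings stand in for real serialized model files.

```python
import hashlib

def sha256_of(data: bytes) -> str:
    """Compute the SHA-256 hex digest of a model artifact's bytes."""
    return hashlib.sha256(data).hexdigest()

def verify_artifact(data: bytes, expected_hash: str) -> bool:
    """Refuse any artifact whose hash differs from the recorded known-good value."""
    return hashlib.sha256(data).hexdigest() == expected_hash

original = b"model-weights-v1"        # stand-in for serialized model bytes
trusted_hash = sha256_of(original)    # recorded at training/release time
tampered = original + b"\x00trojan"   # attacker-modified artifact

print(verify_artifact(original, trusted_hash))   # True
print(verify_artifact(tampered, trusted_hash))   # False
```

In practice the trusted hashes should be stored and distributed separately from the artifacts themselves (for example, in a signed release manifest), so an attacker who can replace the model file cannot also replace its recorded hash.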
How to monitor
To monitor against Trojan attacks against AI, here are some steps you can take:
Implement Real-Time Monitoring: Implementing real-time monitoring of AI systems can help detect and alert security teams to any unusual activity or attempts to access the system.
Implement Intrusion Detection and Prevention: Intrusion detection and prevention systems can help detect and prevent unauthorized access to AI systems, including Trojan attacks.
Use Machine Learning: Machine learning can be used to detect anomalies in the behavior of an AI system and flag any suspicious activity that could be indicative of a Trojan attack.
Conduct Regular Penetration Testing: Regular penetration testing can help identify vulnerabilities in an AI system, allowing them to be addressed before they can be exploited by attackers.
Monitor Network Traffic: Monitoring network traffic can help detect any attempts to exfiltrate data from an AI system or any suspicious activity that could be indicative of a Trojan attack.
Implement User Behavior Analytics: User behavior analytics can help detect any unusual or suspicious behavior by users of an AI system, which could be indicative of a Trojan attack.
By implementing these monitoring strategies, companies and organizations can better protect their AI systems from Trojan attacks and other types of cyber threats. It's important to continually evaluate and update monitoring strategies to ensure that they are effective and up to date with the latest threats.
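As a minimal illustration of the anomaly-detection idea above, the following z-score check flags data points far from the mean, such as a sudden spike in outbound traffic. The traffic numbers are hypothetical; a production detector would be a trained model over many features rather than a single statistic.

```python
import statistics

def flag_anomalies(values, threshold=3.0):
    """Flag indices whose value lies more than `threshold` standard deviations
    from the mean. A crude stand-in for ML-based anomaly detection."""
    mean = statistics.fmean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return []
    return [i for i, v in enumerate(values) if abs(v - mean) / stdev > threshold]

# Hourly outbound bytes from an AI system (hypothetical numbers);
# the spike at index 5 could indicate data exfiltration.
traffic = [1200, 1150, 1300, 1250, 1180, 98000, 1220, 1190]
print(flag_anomalies(traffic, threshold=2.0))  # -> [5]
```

The threshold is a tuning knob: too low and routine variation triggers alerts, too high and a slow, steady exfiltration never crosses it, which is why real monitoring layers several signals together.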
What to capture
To identify when a Trojan attack against AI is happening, you should capture the following types of data:
Network Traffic: Monitoring network traffic can help detect any unusual traffic patterns that could be indicative of a Trojan attack. This includes capturing data on the volume and frequency of data transfers, the source and destination IP addresses, and the type of data being transferred.
System Logs: System logs can provide valuable information on user activity, system performance, and security events. Capturing data on user logins, system activity, and system errors can help detect any unusual or suspicious activity that could be indicative of a Trojan attack.
User Behavior Analytics: Capturing data on user behavior, such as the types of files accessed, the frequency of access, and the times of day when access occurs, can help detect any unusual or suspicious behavior that could be indicative of a Trojan attack.
AI Model Performance Metrics: Capturing data on the performance of an AI model, such as accuracy, precision, and recall, can help detect any unusual or unexpected changes in the performance of the model that could be indicative of a Trojan attack.
Security Alerts: Capturing data on security alerts generated by intrusion detection and prevention systems, firewalls, and antivirus software can help detect any attempted or successful Trojan attacks.
By capturing and analyzing this data, companies and organizations can better detect and respond to Trojan attacks against AI, helping to mitigate their impact and reduce the risk of data theft, system disruption, and other negative consequences.
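The model-performance signal described above can be watched with a simple drift check: compare recent accuracy against a trusted baseline and alert on unexplained drops. The numbers and threshold here are hypothetical; a sudden drop (or one confined to specific inputs) is only one possible indicator of a poisoned or backdoored model and warrants investigation rather than automatic conclusions.

```python
def check_model_drift(baseline_accuracy, recent_accuracies, max_drop=0.05):
    """Return (index, accuracy) pairs where recent accuracy has fallen
    more than `max_drop` below the trusted baseline."""
    alerts = []
    for i, acc in enumerate(recent_accuracies):
        if baseline_accuracy - acc > max_drop:
            alerts.append((i, acc))
    return alerts

# Hypothetical daily accuracy measurements after a model update
baseline = 0.94
daily = [0.93, 0.94, 0.92, 0.81, 0.80]
print(check_model_drift(baseline, daily))  # days 3 and 4 breach the threshold
```

Note that a well-crafted backdoor may leave aggregate accuracy untouched and only misbehave on triggered inputs, so per-class and per-segment metrics should be captured alongside the overall numbers.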
To prevent Trojan attacks against AI, maintain strong cybersecurity practices: keep software and systems up to date, use strong passwords and access controls, follow secure coding practices, be cautious when downloading files from unknown sources, and educate employees about phishing and social engineering tactics.
[Want to discuss this further? Hit me up on Twitter or LinkedIn]
[Subscribe to the RSS feed for this blog]
[Subscribe to the Weekly Microsoft Sentinel Newsletter]
[Subscribe to the Weekly Microsoft Defender Newsletter]
[Subscribe to the Weekly Azure OpenAI Newsletter]
[Learn KQL with the Must Learn KQL series and book]
[Learn AI Security with the Must Learn AI Security series and book]
I know this is a little late for the entry, but would a proper CI/CD pipeline protect against Trojan injection? The frequent refresh rate would flush the Trojan quickly, limiting the damage. It would not prevent a supply-chain attack, but it should disrupt a direct local code-execution injection attempt. Thoughts?