This post is part of an ongoing series to educate about new and known security vulnerabilities against AI.
The full series index (including code, queries, and detections) is located here:
https://aka.ms/MustLearnAISecurity
The book version (pdf) of this series is located here: https://github.com/rod-trent/OpenAISecurity/tree/main/Must_Learn/Book_Version
The book will be updated when each new part in this series is released.
What is a Bias Exploitation attack against AI?
A bias exploitation attack against AI is one in which an adversary intentionally manipulates an AI system's output by taking advantage of biases already present in its training data or model. This can be done by training the system on biased data or by crafting input data that triggers those biases. As a result, the system's output becomes skewed and inaccurate, with potentially harmful consequences. For example, a facial recognition system trained on unrepresentative data may misidentify individuals of certain ethnicities or races.
Types of Bias Exploitation attacks
There are several different types of bias exploitation attacks against AI, including:
Data poisoning: Intentionally feeding biased or malicious data into an AI system's training process to manipulate its output (a short code sketch follows this list).
Adversarial attacks: Adversarial attacks involve manipulating the input data to an AI system in a way that causes it to produce biased or incorrect output.
Model inversion: Attacks in which an adversary reconstructs sensitive training data from a model's outputs; skewed or overfit behavior can make that reconstruction easier.
Backdoor attacks: Embedding a hidden trigger in a model, typically through poisoned training samples, that the attacker can later activate to manipulate its output.
Membership inference attacks: This attack involves an adversary attempting to determine whether a specific individual's data was included in the training data for an AI system, which can be used to exploit any biases in the system's algorithms.
All of these attacks seek to exploit the biases present in an AI system to manipulate its output, leading to potentially harmful consequences.
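To make the first item above concrete, here is a minimal sketch of data poisoning by label flipping: the attacker flips the labels of a targeted slice of the training set so the model learns the wrong association for that slice. The synthetic dataset, the choice of scikit-learn's LogisticRegression, and the "feature 0 > 1" targeting rule are illustrative assumptions, not taken from any real incident.

```python
# Sketch: label-flipping data poisoning against a simple classifier.
# All data here is synthetic; the targeted slice is chosen arbitrarily.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# shuffle=False keeps the informative features in the first columns.
X, y = make_classification(n_samples=2000, n_features=5, n_informative=2,
                           shuffle=False, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The attack: flip training labels wherever feature 0 is large (the targeted slice).
y_poisoned = y_train.copy()
targeted = X_train[:, 0] > 1.0
y_poisoned[targeted] = 1 - y_poisoned[targeted]

clean = LogisticRegression(max_iter=1000).fit(X_train, y_train)
dirty = LogisticRegression(max_iter=1000).fit(X_train, y_poisoned)

# Compare how each model behaves on the slice the attacker cares about.
slice_mask = X_test[:, 0] > 1.0
print("targeted-slice accuracy, clean model   :", clean.score(X_test[slice_mask], y_test[slice_mask]))
print("targeted-slice accuracy, poisoned model:", dirty.score(X_test[slice_mask], y_test[slice_mask]))
```

The same pattern, corrupting only a narrow slice of the training data, is what makes these attacks hard to spot: overall accuracy can look healthy while the targeted group is consistently mis-scored.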
How it works
A Bias Exploitation attack against AI works by taking advantage of the biases present in an AI system's algorithms. The attack may involve manipulating the training data used to develop the AI system or modifying the input data provided to the system during operation. Here's an example of how a Bias Exploitation attack might work:
Let's say an AI system is designed to predict whether a loan applicant is likely to default on a loan. The system is trained using historical data on loan applicants, which includes information such as income, credit score, and employment history. However, the historical data may contain biases, such as a preference for applicants who are male or who come from certain neighborhoods.
An attacker could exploit these biases by manipulating the training data used to develop the AI system. For example, they could remove records for female applicants or for applicants from certain neighborhoods. Because the model never learns from those groups' actual repayment outcomes, its predictions for them end up driven by the patterns of the remaining data, producing skewed and incorrect decisions.
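Here is a minimal sketch of that manipulation, assuming a hypothetical pandas DataFrame of historical loan records; the column names, values, and targeted groups are made up for illustration.

```python
# Sketch: poisoning a training set by silently removing targeted groups.
# The DataFrame layout and values are illustrative assumptions.
import pandas as pd

history = pd.DataFrame({
    "income":       [52000, 61000, 48000, 75000, 58000],
    "credit_score": [640, 700, 610, 720, 660],
    "gender":       ["F", "M", "F", "M", "M"],
    "neighborhood": ["district_a", "east", "west", "east", "district_a"],
    "defaulted":    [0, 0, 1, 0, 1],
})

def poison_training_set(df: pd.DataFrame) -> pd.DataFrame:
    """The attack itself: drop the targeted groups before the model is trained."""
    keep = (df["gender"] != "F") & ~df["neighborhood"].isin(["district_a"])
    return df[keep]

poisoned = poison_training_set(history)
print(f"rows before: {len(history)}, rows after poisoning: {len(poisoned)}")
# A model fit only on `poisoned` never sees the removed groups, so nothing
# anchors its predictions for those applicants to their real outcomes.
```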
Alternatively, the attacker could manipulate the input data provided to the system during operation. For example, they could provide false information about the applicant's income or employment history to trigger biases in the AI system's algorithms. As a result, the AI system may produce incorrect predictions about the applicant's likelihood of defaulting on a loan.
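A minimal sketch of that inference-time manipulation, together with the kind of cross-check that can catch it. The field names, the source of the verified record, and the 15% tolerance are assumptions for illustration.

```python
# Sketch: falsified application fields, and a cross-check against an
# independently verified record (e.g., payroll or bureau data).
submitted = {"applicant_id": "A-1042", "income": 95_000, "years_employed": 12}
verified  = {"applicant_id": "A-1042", "income": 38_000, "years_employed": 1}

def suspicious_fields(claimed: dict, known: dict, tolerance: float = 0.15) -> list[str]:
    """Flag numeric fields that differ from the verified record by more than `tolerance`."""
    flags = []
    for key, known_value in known.items():
        if isinstance(known_value, (int, float)) and known_value:
            if abs(claimed.get(key, known_value) - known_value) / abs(known_value) > tolerance:
                flags.append(key)
    return flags

print(suspicious_fields(submitted, verified))  # ['income', 'years_employed']
```

Validating inference-time inputs against independent sources is a cheap way to blunt this variant, even when the model itself cannot be retrained quickly.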
In short, whether the manipulation happens at training time or at inference time, the attacker's aim is the same: steer the system's output by leaning on the biases it has absorbed, with potentially harmful consequences.
Why it matters
The negative effects of a Bias Exploitation attack against AI can be significant and wide-ranging. Here are some examples:
Discrimination: Bias Exploitation attacks can cause an AI system to produce discriminatory output, such as denying loans or job opportunities to certain groups of people based on their race or gender.
Unfair treatment: If an AI system is biased, it may treat certain individuals unfairly, leading to negative consequences such as incorrect medical diagnoses or wrongful arrests.
Decreased trust: If an AI system is found to be biased, it can lead to a decrease in trust in the system and the organization that developed it. This can be detrimental to the adoption and usage of the AI system.
Inaccurate results: A Bias Exploitation attack can lead to inaccurate results, which can cause problems in fields such as healthcare, where incorrect diagnoses can have life-threatening consequences.
Legal issues: If an AI system is found to be biased, it can lead to legal issues and lawsuits, which can be costly and damaging to an organization's reputation.
Overall, the negative effects of a Bias Exploitation attack against AI can be far-reaching and can impact individuals, organizations, and society as a whole.
Why it might happen
The goals of an attacker in a Bias Exploitation attack against AI vary with their motivations, but here are some examples of what an attacker may gain from such an attack:
Financial gain: An attacker may try to manipulate the output of an AI system to gain financial benefits, such as by getting approved for loans or insurance that they would not otherwise qualify for.
Strategic advantage: An attacker may try to manipulate the output of an AI system to gain a strategic advantage over competitors, such as by influencing the outcome of an election or gaining an advantage in a business deal.
Political gain: An attacker may try to manipulate the output of an AI system to achieve political goals, such as by influencing public opinion or suppressing the vote of certain groups.
Sabotage: An attacker may try to manipulate the output of an AI system to cause damage or disruption to an organization, such as by causing a medical diagnosis system to produce incorrect diagnoses or causing a self-driving car to crash.
Personal gain: An attacker may try to manipulate the output of an AI system to achieve personal goals, such as by gaining access to sensitive information or causing harm to an individual or group.
Overall, the motivations of an attacker in a Bias Exploitation attack against AI can vary widely, and the potential gains from such an attack can be significant.
Real-world Example
One real-world example of the kind of bias these attacks exploit is the case of Amazon's AI-based hiring tool. In 2018, it was reported that Amazon had developed an AI system to help with its hiring process. The system was found to be biased against women because it had been trained on resumes submitted to Amazon over a 10-year period, which came predominantly from male applicants.
As a result, the AI system learned to favor male applicants over female ones, and even downgraded resumes that included words like "women's" or names of women's colleges. This was because the system was using past data to make predictions about future hiring decisions, and the past data was biased.
Strictly speaking, no outside attacker engineered this bias, but the case illustrates exactly the kind of data-driven skew a Bias Exploitation attack would deliberately create or amplify, and its consequences were significant. Amazon abandoned the AI-based hiring tool altogether because it could not produce fair and unbiased results, and the episode raised broader concerns about the use of AI in hiring and the potential for such systems to perpetuate existing biases in the workforce.
This example highlights the importance of using unbiased data to train AI systems, as well as the need for ongoing monitoring and auditing of such systems to ensure that they are not producing biased results.
How to Mitigate
Mitigating Bias Exploitation attacks against AI can be challenging, but here are some strategies that can help:
Use diverse and representative data: The first step in mitigating Bias Exploitation attacks against AI is to use diverse and representative data to train the system. This can help to reduce the biases in the training data and improve the accuracy and fairness of the AI system's output.
Regularly audit and update AI systems: AI systems should be regularly audited and updated to ensure that they are producing unbiased and accurate results. This can involve monitoring the system's output for bias and updating the algorithms and training data as needed (a simple audit check is sketched after this list).
Use multiple sources of data: AI systems should be trained on multiple sources of data to reduce the risk of bias. This can include data from different geographic regions, different time periods, and different demographic groups.
Include ethical considerations in the design process: Ethical considerations should be included in the design process for AI systems, with a focus on fairness, transparency, and accountability.
Educate users and stakeholders: Users and stakeholders should be educated about the potential for Bias Exploitation attacks against AI and how to identify and report such attacks.
Overall, mitigating Bias Exploitation attacks against AI requires a multi-faceted approach that involves careful attention to the design, development, and implementation of AI systems. It also requires ongoing monitoring and auditing to ensure that AI systems are producing unbiased and accurate results.
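As one example of what the auditing step can look like in practice, here is a minimal sketch that compares the model's favorable-outcome rate across groups and raises an alert when the gap is large. The decision log, column names, and the 0.2 threshold are illustrative assumptions (the threshold loosely echoes the common "four-fifths rule" heuristic, not a legal standard).

```python
# Sketch: a periodic fairness audit over recent model decisions.
# The decision log below is synthetic; plug in real prediction records.
import pandas as pd

def approval_rate_gap(decisions: pd.DataFrame, group_col: str, outcome_col: str) -> float:
    """Largest difference in favorable-outcome rate between any two groups."""
    rates = decisions.groupby(group_col)[outcome_col].mean()
    return float(rates.max() - rates.min())

recent = pd.DataFrame({
    "gender":   ["F", "F", "F", "F", "M", "M", "M", "M"],
    "approved": [0,    0,   1,   0,   1,   1,   1,   0],
})

gap = approval_rate_gap(recent, "gender", "approved")
if gap > 0.2:
    print(f"ALERT: approval-rate gap of {gap:.2f} across gender groups - investigate for bias or poisoning")
```

Run on a schedule and compared against a baseline, a check like this turns "audit the system" from a policy statement into a measurable control.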
How to monitor/What to capture
To detect a Bias Exploitation attack against AI, several things should be monitored, including:
Input data: The input data provided to the AI system should be monitored to ensure that it remains diverse and representative of the population being served. If an attacker is manipulating the input data to exploit biases in the AI system, unusual patterns or sudden shifts in the input distribution can reveal it (a simple drift check is sketched after this list).
Output data: The output data produced by the AI system should be monitored for bias and accuracy. If an attacker is exploiting biases in the AI system, this could be detected by monitoring the output data for patterns that are inconsistent with the expected results.
Training data: The training data used to develop the AI system should be monitored to ensure that it is diverse and representative of the population being served. If an attacker is manipulating the training data to exploit biases in the AI system, this could be detected by monitoring the training data for unusual patterns.
System logs: The system logs of the AI system should be monitored for unusual activity, such as a sudden increase in traffic or unexpected changes in the system's configuration. These could be signs of an attacker attempting to exploit the system.
User feedback: User feedback should be monitored to identify any patterns of bias or inaccuracies in the AI system's output. This could include feedback from users who believe they have been unfairly treated by the system or who have identified biases in the system's output.
Overall, monitoring the input and output data, training data, system logs, and user feedback can help to detect Bias Exploitation attacks against AI and enable organizations to take action to mitigate them.
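As a concrete example of monitoring the input stream, here is a minimal sketch that compares the group mix of recent requests against a baseline window and flags large shifts, the kind of unusual pattern described above. The category values, window sizes, and the 0.10 threshold are illustrative assumptions; the same idea extends to feature distributions and to the training data itself.

```python
# Sketch: flag categories whose share of incoming requests has shifted
# sharply versus a baseline window (possible input manipulation or skewed feed).
import pandas as pd

def share_drift(baseline: pd.Series, recent: pd.Series) -> pd.Series:
    """Absolute change in each category's share between two windows."""
    b = baseline.value_counts(normalize=True)
    r = recent.value_counts(normalize=True)
    idx = b.index.union(r.index)
    return (r.reindex(idx, fill_value=0.0) - b.reindex(idx, fill_value=0.0)).abs()

baseline = pd.Series(["east"] * 50 + ["west"] * 50)   # e.g., last month's requests
recent   = pd.Series(["east"] * 90 + ["west"] * 10)   # e.g., today's requests

drift = share_drift(baseline, recent)
print(drift[drift > 0.10])  # categories whose share moved by more than 10 points
```

Feeding alerts like this into existing logging and SIEM pipelines keeps bias monitoring in the same workflow as the rest of an organization's detections.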
[Want to discuss this further? Hit me up on Twitter or LinkedIn]
[Subscribe to the RSS feed for this blog]
[Subscribe to the Weekly Microsoft Sentinel Newsletter]
[Subscribe to the Weekly Microsoft Defender Newsletter]
[Subscribe to the Weekly Azure OpenAI Newsletter]
[Learn KQL with the Must Learn KQL series and book]
[Learn AI Security with the Must Learn AI Security series and book]