Must Learn AI Security Part 17: Social Engineering Attacks Against AI

Chapter 17

Sep 28, 2023

This post is part of an ongoing series to educate about new and known security vulnerabilities against AI.

The full series index (including code, queries, and detections) is located here:

The book version (pdf) of this series is located here: https://github.com/rod-trent/OpenAISecurity/tree/main/Must_Learn/Book_Version

The book will be updated when each new part in this series is released.

What is a Social Engineering attack against AI?

A Social Engineering attack against AI refers to a situation in which an attacker manipulates or exploits the vulnerabilities of an artificial intelligence system by using psychological tactics and deception. The attacker usually aims to gain unauthorized access to sensitive data, manipulate the AI's behavior, or compromise its security measures.

Types of Social Engineering attacks

In the context of AI, social engineering attacks can include:

Input manipulation: Feeding false or misleading data to the AI system to compromise its decision-making capabilities or to make it behave in a way that benefits the attacker.
Reverse engineering: Analyzing the AI system to discover its underlying algorithms and use this knowledge to exploit its weaknesses.
Impersonation: Pretending to be an authorized user or system to gain access to the AI's data and resources.
Exploiting human vulnerabilities: Taking advantage of the human users or operators of an AI system, such as tricking them into revealing sensitive information or performing unauthorized actions.

How it works

A Social Engineering attack against AI works by exploiting the weaknesses in the AI system, its algorithms, or its human users. Attackers use psychological manipulation and deception techniques to achieve their goals. Here are some common steps in a social engineering attack against AI:

Research: The attacker gathers information about the target AI system and its users. This may involve studying the AI's capabilities, identifying potential vulnerabilities, and understanding the roles and responsibilities of the human operators.
Planning: The attacker devises a strategy to exploit the identified weaknesses. This plan may involve crafting a persuasive message, creating fake identities, or designing a false scenario to manipulate the AI system or its users.
Execution: The attacker carries out the planned attack, which may involve input manipulation, reverse engineering, impersonation, or exploiting human vulnerabilities. For example, the attacker might send a malicious email to a human user, pretending to be a trusted source, or feed the AI system with deceptive data that influences its decision-making.
Exploiting the results: Once the attacker has successfully manipulated the AI system or its users, they can gain unauthorized access to sensitive data, compromise the AI's behavior, or perform other malicious actions.
Covering tracks: In some cases, the attacker may attempt to erase any evidence of their actions or create a diversion to deflect suspicion from themselves.

Why it matters

A Social Engineering attack against AI can have several negative effects on the target system, its users, and the organization as a whole. Some of these effects include:

Compromised data: Unauthorized access to sensitive information, such as personal data, intellectual property, or trade secrets, can lead to data breaches, identity theft, or corporate espionage.
Manipulated behavior: The attacker may alter the AI's decision-making process or functionality to serve their objectives, leading to incorrect or harmful decisions that can affect the organization's operations and reputation.
Loss of trust: If users and stakeholders become aware of a successful social engineering attack against an AI system, they may lose trust in the system's reliability and security, which can have long-term consequences for the organization's reputation and customer relationships.
Financial losses: The costs associated with a successful attack can be significant, including potential legal fees, regulatory fines, and expenses related to recovering from the breach and implementing better security measures.
Disruption of operations: Depending on the nature of the attack, it may disrupt the normal functioning of the AI system, resulting in downtime or loss of productivity.
Human consequences: Social engineering attacks often target human users, and the psychological impact of being manipulated or deceived can lead to stress, guilt, or a sense of violation.

To minimize the negative effects of social engineering attacks, organizations should invest in robust security measures, ongoing monitoring of AI systems, user education and training, and incident response planning.

Why it might happen

An attacker may gain several benefits from a successful Social Engineering attack against AI, depending on their objectives and the nature of the target system. Some potential gains include:

Access to sensitive data: The attacker might obtain valuable information such as personal data, intellectual property, trade secrets, or financial records, which can be used for identity theft, corporate espionage, or financial gain.
Control over AI behavior: By manipulating the AI's decision-making process or functionality, the attacker can make the system serve their purposes, potentially causing harm to the target organization or its users.
Bypassing security measures: Social engineering attacks can help the attacker circumvent traditional security measures, such as firewalls and encryption, by exploiting human vulnerabilities or weaknesses in the AI system itself.
Disruption of operations: The attacker may aim to disrupt the target organization's operations by causing downtime or loss of productivity, either as an act of sabotage or to divert attention from another malicious activity.
Damage to reputation: A successful attack can damage the reputation of the target organization and erode trust in its AI systems, potentially leading to loss of customers, partners, or investors.
Financial gain: In some cases, the attacker may directly profit from the attack, such as by selling stolen data, demanding a ransom, or using the compromised AI system to carry out fraudulent activities.
Competitive advantage: The attacker, often a competitor, may use the information or control gained from the attack to gain a competitive edge in the market.

To protect against these potential gains for attackers, organizations should implement strong security measures, monitor AI system activities, educate users about potential risks, and have a robust incident response plan in place.

Real-world Example

One example of a social engineering attack against AI involves the manipulation of natural language processing (NLP) systems. In this case, the target AI is not compromised itself, but its output is influenced by the attacker's input, often referred to as "adversarial examples" or "adversarial attacks."

In 2020, researchers from the University of Maryland and the University of Texas at Dallas demonstrated a method to deceive popular NLP models such as BERT, RoBERTa, and XLNet. They used a technique called "TextFooler" to create adversarial examples by making minor changes to the input text while preserving its meaning. These modified inputs were able to mislead the AI models into producing incorrect predictions or classifications.

For instance, the researchers changed a sentence from "The characters, cast in impossibly contrived situations, are totally estranged from reality" to "The characters, cast in impossibly engineered circumstances, are fully estranged from reality." Although the meaning of the sentence remained the same, the AI model's sentiment analysis changed from negative to positive.

While this research was conducted in a controlled academic setting, it demonstrates the potential for attackers to manipulate AI systems in real-world applications, such as sentiment analysis, content moderation, or recommendation engines. A malicious actor could exploit such vulnerabilities to spread misinformation, generate fake reviews, or manipulate public opinion.

How to Mitigate

Mitigating social engineering attacks against AI involves a combination of technical and human-centric approaches to strengthen the security of AI systems and protect them from manipulation and exploitation. Some strategies to mitigate these attacks include:

Robust AI design: Develop AI systems with security and robustness in mind. Employ techniques such as adversarial training, which involves training the AI model on manipulated inputs to make it more resistant to adversarial attacks.
Input validation: Implement input validation and filtering mechanisms to detect and block malicious or suspicious inputs that may attempt to manipulate the AI system.
Continuous monitoring: Monitor the AI system's performance and user behavior to identify anomalies, potential attacks, or unauthorized access.
Secure authentication: Use strong authentication methods and access controls to prevent unauthorized access to the AI system and its data.
User education and training: Educate and train human users about social engineering tactics, the potential risks, and best practices to avoid falling victim to such attacks.
Regular security audits: Conduct regular security audits and vulnerability assessments to identify and address potential weaknesses in the AI system or its associated infrastructure.
Incident response planning: Develop a comprehensive incident response plan to detect, contain, and recover from social engineering attacks against AI systems.
Encourage responsible AI research: Support research into AI security and robustness, and collaborate with the research community to develop new defenses and mitigation strategies.

By implementing these strategies, organizations can reduce the risks associated with social engineering attacks against AI and ensure the security and reliability of their AI systems.

How to monitor/What to capture

Detecting a social engineering attack against AI requires monitoring various aspects of both the AI system and the behavior of its human users. Some key elements to monitor include:

AI system performance: Keep track of the AI system's performance metrics, such as accuracy, response time, and error rates. Significant deviations from the expected performance may indicate manipulation or an ongoing attack.
Input data: Monitor the data fed into the AI system to identify anomalies, unexpected patterns, or signs of tampering. This may help uncover attempts to manipulate the AI system through adversarial inputs or deceptive data.
User behavior: Track the actions of human users interacting with the AI system, such as login attempts, access to sensitive data, and configuration changes. Unusual behavior or access patterns may suggest a compromised user account or an attacker attempting to exploit the AI system.
Network traffic: Analyze network traffic to and from the AI system for signs of unauthorized access, data exfiltration, or other malicious activities.
System logs: Review logs generated by the AI system, its supporting infrastructure, and related applications to identify suspicious events or patterns that may indicate an attack.
Communication channels: Monitor communication channels, such as email, chat, and social media, for phishing attempts, social engineering tactics, or other indicators of an attack targeting the AI system's human users.
Sentiment and context analysis: Use AI-based techniques to analyze the content and context of conversations, messages, or user interactions to identify potential social engineering attempts.
Incident reporting: Encourage users to report any suspicious activities, unusual requests, or potential social engineering attacks they encounter while interacting with the AI system.

By monitoring these aspects and employing advanced analytics and machine learning techniques, organizations can improve their ability to detect social engineering attacks against AI and respond quickly to mitigate potential risks.

[Want to discuss this further? Hit me up on Twitter or LinkedIn]
[Subscribe to the RSS feed for this blog]
[Subscribe to the Weekly Microsoft Sentinel Newsletter]
[Subscribe to the Weekly Microsoft Defender Newsletter]
[Subscribe to the Weekly Azure OpenAI Newsletter]
[Learn KQL with the Must Learn KQL series and book]
[Learn AI Security with the Must Learn AI Security series and book]