This post is part of an ongoing series intended to educate readers about new and known security vulnerabilities that affect AI.
The full series index (including code, queries, and detections) is located here:
https://aka.ms/MustLearnAISecurity
The book version (pdf) of this series is located here: https://github.com/rod-trent/OpenAISecurity/tree/main/Must_Learn/Book_Version
The book will be updated when each new part in this series is released.
Periodically throughout the Must Learn AI Security series, there is a need to recap previous chapters and set the stage for upcoming ones. These Compendiums serve as juncture points for the series, even though they can also function well as standalone articles. So, welcome! This post is one of those compendiums. It will all make much more sense as the series progresses.
An on-prem LLM is a large language model that runs on an organization's own infrastructure using non-public data. A large language model is a type of artificial intelligence system that can generate natural language text based on a given input or prompt. An on-prem LLM can have several advantages, such as:
Privacy: An on-prem LLM can protect the data and the generated text from unauthorized access, modification, or leakage, as it is not exposed to the internet or to third-party services.
Security: An on-prem LLM can prevent or mitigate cyberattacks, such as data breaches, malicious injections, or supply chain attacks, as it does not depend on external components or services.
Performance: An on-prem LLM can optimize and improve the efficiency, accuracy, and scalability of language generation, as it can leverage the organization's own hardware and software resources.
An on-prem LLM can be used for various applications, such as:
Text summarization: An on-prem LLM can generate concise and informative summaries of long or complex texts, such as documents, reports, or articles.
Text generation: An on-prem LLM can generate original and creative texts, such as poems, stories, code, essays, songs, or parodies, based on a given topic or prompt.
Text completion: An on-prem LLM can complete or extend a given text, such as a sentence, a paragraph, or a document, by adding relevant and coherent words or sentences.
Text analysis: An on-prem LLM can analyze and extract useful information from a given text, such as names, dates, facts, opinions, or sentiments.
Text translation: An on-prem LLM can translate a given text from one language to another, while preserving the meaning and the style of the original text.
Text conversation: An on-prem LLM can engage in natural and interactive conversations with users, such as chatbots, voice assistants, or virtual agents.
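In practice, most of the applications above differ mainly in how the model is prompted, not in how it is hosted. As a minimal sketch (the task names and template wording here are hypothetical, not tied to any specific LLM runtime), a thin prompt-building layer in front of a locally hosted model might look like this:

```python
# Hypothetical prompt templates for common on-prem LLM tasks.
# A real deployment would tune this wording for the specific model served.
TASK_TEMPLATES = {
    "summarize": "Summarize the following text concisely:\n\n{text}",
    "complete": "Continue the following text coherently:\n\n{text}",
    "analyze": "Extract names, dates, facts, and sentiments from:\n\n{text}",
    "translate": "Translate the following text into {target}:\n\n{text}",
}

def build_prompt(task: str, text: str, target: str = "French") -> str:
    """Return a task-specific prompt string to send to a local model."""
    template = TASK_TEMPLATES.get(task)
    if template is None:
        raise ValueError(f"Unsupported task: {task}")
    return template.format(text=text, target=target)
```

The string returned by `build_prompt` would then be passed to whatever inference endpoint the organization hosts internally.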
To secure an on-prem LLM, you need to follow some best practices, such as:
Use secure development practices: You need to ensure the quality and integrity of the code that powers the LLM, and avoid any errors, bugs, or vulnerabilities that could compromise its functionality or security. You also need to ensure the transparency, explainability, and accountability of the LLM, and adhere to the ethical principles and standards of your organization and industry. You can use code reviews, testing, debugging, documentation, code analysis, code obfuscation, and encryption methods to achieve this.
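One concrete piece of the secure-development picture is integrity checking: verifying that the code or model artifacts you deploy match known-good values from your build pipeline. Here is a minimal sketch using SHA-256 (the function names are illustrative; the standard-library calls are real):

```python
import hashlib
import hmac
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Compute the SHA-256 digest of a file, streamed in chunks
    so large model artifacts do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_artifact(path: Path, expected_hex: str) -> bool:
    """Compare a file's digest against a known-good value (e.g. one
    recorded at build time), using a constant-time comparison."""
    return hmac.compare_digest(sha256_of(path), expected_hex)
```

A deployment script could call `verify_artifact` before loading model weights or dependencies, refusing to start if the check fails.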
Use secure data practices: You need to protect and preserve the data that is used to train, test, and run the LLM, and ensure that it is authentic, reliable, and relevant. You also need to ensure the confidentiality and privacy of the data, and prevent any unauthorized access, modification, or leakage. You can use encryption, hashing, tokenization, backup, recovery, data cleansing, normalization, transformation, anonymization, and pseudonymization methods to achieve this.
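To make the pseudonymization idea concrete, here is a small sketch that replaces PII fields in a training record with keyed HMAC-SHA256 tokens (the field names and key handling are hypothetical; in practice the key would live in a secrets manager, not in code):

```python
import hashlib
import hmac

def pseudonymize(value: str, key: bytes) -> str:
    """Replace an identifier with a keyed HMAC-SHA256 token.
    The same value always maps to the same token (so joins across
    records still work), but the mapping cannot be reversed
    without the secret key."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()

def pseudonymize_record(record: dict, pii_fields: set, key: bytes) -> dict:
    """Return a copy of a record with the listed PII fields tokenized
    and all other fields left untouched."""
    return {
        k: pseudonymize(v, key) if k in pii_fields else v
        for k, v in record.items()
    }
```

Because the tokens are deterministic per key, pseudonymized datasets remain joinable for training while keeping raw identifiers out of the LLM pipeline.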
Use secure access practices: You need to control and manage the access rights and permissions of the users and entities that interact with the LLM, and ensure that only authorized parties can access and use it. You also need to prevent or mitigate any exploitation or abuse of the LLM by malicious actors. You can use authentication, authorization, role-based access control, attribute-based access control, secure communication channels, auditing, logging, monitoring, and reporting methods to achieve this.
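As a minimal sketch of the role-based access control piece (the role names, actions, and permission sets below are hypothetical), a gatekeeper in front of the LLM's endpoints might look like this:

```python
# Hypothetical role-to-permission mapping for an on-prem LLM service.
ROLE_PERMISSIONS = {
    "admin": {"generate", "fine_tune", "view_logs"},
    "analyst": {"generate", "view_logs"},
    "guest": {"generate"},
}

def is_allowed(role: str, action: str) -> bool:
    """Check whether a role may perform an action against the LLM.
    Unknown roles get an empty permission set (deny by default)."""
    return action in ROLE_PERMISSIONS.get(role, set())

def authorize(role: str, action: str) -> None:
    """Raise if the role lacks permission; call before serving a request."""
    if not is_allowed(role, action):
        raise PermissionError(f"role {role!r} may not perform {action!r}")
```

The deny-by-default lookup means a misconfigured or unrecognized role can never silently gain access to fine-tuning or logs.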
[Want to discuss this further? Hit me up on Twitter or LinkedIn]
[Subscribe to the RSS feed for this blog]
[Subscribe to the Weekly Microsoft Sentinel Newsletter]
[Subscribe to the Weekly Microsoft Defender Newsletter]
[Subscribe to the Weekly Azure OpenAI Newsletter]
[Learn KQL with the Must Learn KQL series and book]
[Learn AI Security with the Must Learn AI Security series and book]
[Join the Microsoft Security Copilot community: https://aka.ms/SCPCommmunity]