Using Microsoft Purview for Data Classification and Labeling to Secure Generative AI

Cats and Labels

Nov 20, 2023

Generative AI is a branch of artificial intelligence that can create new and original content, such as text, images, audio, or video, based on a given input or prompt. Generative AI has many potential applications and benefits, such as enhancing creativity, productivity, and customer experience. However, Generative AI also poses significant security risks and challenges, such as data breaches, malicious attacks, ethical issues, and regulatory non-compliance.

Therefore, it is essential to ensure that the data used to train, test, and run Generative AI systems is secure, trustworthy, and resilient. One of the key steps to achieve this is data classification and labeling: methods of categorizing and tagging data based on its attributes, characteristics, and sensitivity. Data classification and labeling can help to:

Improve data quality and relevance: Data classification and labeling can help to ensure that the data is authentic, reliable, and relevant, and that it does not contain any errors, noise, or bias that could affect the Generative AI system’s performance or security. Data classification and labeling can also help to identify and remove any redundant, obsolete, or trivial data that is not needed or useful for the Generative AI system.
Protect data privacy and confidentiality: Data classification and labeling can help to protect the data’s privacy and confidentiality, and prevent any unauthorized access, modification, or leakage of the data. Data classification and labeling can also help to comply with the data protection and governance policies and regulations of the organization and the industry, such as the General Data Protection Regulation (GDPR) or the California Consumer Privacy Act (CCPA).
Manage data access and usage: Data classification and labeling can help to control and manage the access rights and permissions of the users and entities that interact with the data and the Generative AI system. Data classification and labeling can also help to monitor and audit the data’s usage and activities and detect and prevent any misuse or abuse of the data and the Generative AI system.

One of the tools that can help to implement data classification and labeling effectively and efficiently is Microsoft Purview, which is a unified data governance service that provides a comprehensive view of the data estate across hybrid and multi-cloud environments. Microsoft Purview can help to automate and manage the metadata, classification, and labeling of the data that is used for Generative AI, and provide insights and intelligence to improve the data’s security and reliability.

How to Use Microsoft Purview for Data Classification and Labeling

To use Microsoft Purview for data classification and labeling, the following steps are recommended:

Register and scan data sources: The first step is to register and scan the data sources that contain the data that is used for Generative AI, such as databases, storage accounts, or data lakes. Microsoft Purview supports various data sources, such as Azure SQL Database, Azure Data Lake Storage, Azure Cosmos DB, Azure Synapse Analytics, and more. Registering and scanning data sources can help to discover and catalog the data assets, and extract and store the metadata, such as the data schema, structure, and properties.

See: Understand data classification in the Microsoft Purview governance portal | Microsoft Learn
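To make the idea of a scan more concrete, here is a minimal sketch of what "discover the assets and extract the schema" means. It does not call Microsoft Purview at all. It assumes a local SQLite database as a stand-in data source, and the function name scan_source and the sample tables are hypothetical.

```python
import sqlite3


def scan_source(conn):
    """Return {table_name: [{"name": column, "type": declared_type}, ...]}."""
    catalog = {}
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # Each PRAGMA table_info row is (cid, name, type, notnull, dflt_value, pk).
        columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        catalog[table] = [{"name": col[1], "type": col[2]} for col in columns]
    return catalog


# Hypothetical stand-in for a registered data source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT, feedback TEXT)")
conn.execute("CREATE TABLE prompts (id INTEGER PRIMARY KEY, body TEXT)")

catalog = scan_source(conn)
for table, columns in catalog.items():
    print(table, [c["name"] for c in columns])
```

The point of the sketch is that a scan collects metadata (table names, column names, types) rather than copying the data itself, and that metadata is what later classification and labeling steps build on.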

Classify and label data assets: The next step is to classify and label the data assets based on their attributes, characteristics, and sensitivity. Microsoft Purview provides more than 200 built-in system classifications, such as Credit Card Number, Social Security Number, or Email Address, and the ability to create custom classifications, such as Employee ID, Product Name, or Customer Feedback. Classifying and labeling data assets can help to categorize and organize the data and apply consistent and standardized tags or classes to the data.

See: How to use the Microsoft data classification dashboard | Microsoft Learn
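As a rough illustration of pattern-based classification, the sketch below tags a column with a classification when enough of its sampled values match a regular expression. This is not Purview's implementation or API. The patterns are simplified, the "Employee ID" format and the 60% threshold are assumptions chosen for the example, and classify_column is a hypothetical helper.

```python
import re

# Simplified, illustrative patterns; real classifiers are considerably stricter.
RULES = {
    "Email Address": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "Social Security Number": re.compile(r"\d{3}-\d{2}-\d{4}"),
    "Employee ID": re.compile(r"EMP-\d{6}"),  # assumed custom format
}


def classify_column(values, min_match_ratio=0.6):
    """Return the classifications whose pattern matches enough sampled values."""
    non_empty = [v for v in values if v]
    if not non_empty:
        return []
    found = []
    for name, pattern in RULES.items():
        matches = sum(1 for v in non_empty if pattern.fullmatch(v))
        if matches / len(non_empty) >= min_match_ratio:
            found.append(name)
    return found


# Hypothetical sampled column values.
sample = {
    "contact": ["ana@example.com", "li@example.org", "n/a"],
    "staff_ref": ["EMP-004211", "EMP-000917", "EMP-118230"],
    "notes": ["call back Tuesday", "prefers email"],
}
classifications = {col: classify_column(vals) for col, vals in sample.items()}
print(classifications)
```

Using a match ratio rather than a single hit keeps one stray value from tagging a whole column, while still tolerating a few placeholders such as "n/a".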

Apply sensitivity labels: The next step is to apply sensitivity labels to the data assets based on their level of confidentiality and protection. Microsoft Purview integrates with Microsoft Purview Information Protection (formerly Microsoft Information Protection), which provides a set of predefined sensitivity labels, such as Confidential, Highly Confidential, or Public, and the ability to create custom sensitivity labels, such as Internal, Restricted, or Secret. Applying sensitivity labels can help to protect and encrypt the data and enforce the appropriate access and usage policies and rules on the data.

See: Unified Data Governance with Microsoft Purview | Microsoft Azure
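One way to think about the relationship between classifications and sensitivity labels is as a policy table, where the most restrictive match wins. The sketch below is an assumption-laden illustration, not how Purview resolves labels: the label ordering, the mapping, the "Internal" default, and the label_for helper are all hypothetical.

```python
# Assumed ordering, least to most restrictive.
LABEL_ORDER = ["Public", "Internal", "Confidential", "Highly Confidential"]

# Assumed policy: which label each classification calls for.
CLASSIFICATION_TO_LABEL = {
    "Email Address": "Confidential",
    "Employee ID": "Internal",
    "Credit Card Number": "Highly Confidential",
    "Social Security Number": "Highly Confidential",
}


def label_for(classifications, default="Internal"):
    """Pick the most restrictive label implied by an asset's classifications."""
    labels = [CLASSIFICATION_TO_LABEL.get(c, default) for c in classifications]
    labels.append(default)  # assets with no classifications still get a label
    return max(labels, key=LABEL_ORDER.index)


print(label_for(["Email Address", "Credit Card Number"]))
print(label_for(["Employee ID"]))
print(label_for([]))
```

Defaulting to "Internal" rather than "Public" is a deliberate choice in this sketch: data that has not been classified yet is treated as not safe to expose.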

Search and browse data assets: The next step is to search and browse the data assets that are categorized and labeled in Microsoft Purview. Microsoft Purview provides a user-friendly and intuitive interface, called the Microsoft Purview governance portal, that allows users to search and browse the data assets by using keywords, filters, facets, or classifications. Searching and browsing data assets can help to find and access the relevant and useful data for Generative AI, and understand the data’s context, lineage, and quality.

Monitor and report on data assets: The final step is to monitor and report on the data assets that are categorized and labeled in Microsoft Purview. Microsoft Purview provides various tools and features, such as the data classification dashboard, the content explorer, the activity explorer, and the data insights, that allow users to monitor and report on the status and performance of the data assets, and the activities and events that occur in the data estate. Monitoring and reporting on data assets can help to oversee and communicate the data’s security and reliability and identify and address any issues or risks.
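To connect these last two steps back to Generative AI, here is a small sketch of how labels could gate which assets are allowed into a training or grounding set, along with a simple per-label count of the kind a dashboard would show. It runs on an in-memory list rather than on Purview. The asset names, the label ordering, and the helpers allowed_for_training and label_report are hypothetical.

```python
from collections import Counter

# Assumed ordering, least to most restrictive.
LABEL_ORDER = ["Public", "Internal", "Confidential", "Highly Confidential"]


def allowed_for_training(assets, max_label="Internal"):
    """Keep only assets whose label is at or below the allowed ceiling."""
    ceiling = LABEL_ORDER.index(max_label)
    return [a for a in assets if LABEL_ORDER.index(a["label"]) <= ceiling]


def label_report(assets):
    """Count assets per sensitivity label."""
    return Counter(a["label"] for a in assets)


# Hypothetical labeled assets.
assets = [
    {"name": "product_docs", "label": "Public"},
    {"name": "support_tickets", "label": "Confidential"},
    {"name": "style_guide", "label": "Internal"},
    {"name": "payroll", "label": "Highly Confidential"},
]

training_set = allowed_for_training(assets)
report = label_report(assets)
print([a["name"] for a in training_set])
print(dict(report))
```

The same filter can be tightened or loosened per use case, for example allowing only "Public" data for a customer-facing model.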

Generative AI is a powerful and promising technology that can create new and original content, but it also requires a high level of security and reliability for the data that is used to train, test, and run it. Data classification and labeling are essential methods to ensure the data’s security and reliability, and Microsoft Purview is a useful tool to implement data classification and labeling effectively and efficiently. By using Microsoft Purview for data classification and labeling, users can improve the data’s quality and relevance, protect the data’s privacy and confidentiality, manage the data’s access and usage, and gain insights and intelligence to enhance the data’s security and reliability.

