In today’s data-driven world, managing and leveraging data effectively is crucial for any organisation. Microsoft Purview Unified Data Catalog is designed to help businesses achieve comprehensive data governance, ensuring data is secure, compliant, and valuable.
In this blog, Enterprise Architect Team Lead, Paul Westlake, will guide you through understanding the Microsoft Purview Unified Data Catalog. He will cover its key components, including the data map, data products, data assets, data quality, and health management, providing insights on how to leverage these features for effective data governance and management.
How can Microsoft Purview Unified Data Catalog help businesses?
Microsoft Purview Unified Data Catalog is a centralised platform that integrates data governance, management, and discovery. It enhances data discovery with its data map feature and improves data quality for business value creation. This unified system simplifies data governance for data consumers, stewards, and owners.
It streamlines data access by organising data into data products, and it offers comprehensive health management to ensure data remains secure and usable. Additionally, it supports federated governance, balancing centralised policy development with decentralised data management, ultimately making data a strategic asset for business growth.
Data map
The data map in Microsoft Purview is a foundational component that enables organisations to discover, classify, and understand their data landscape. It provides a visual representation of data assets across the enterprise, helping users to locate and manage data efficiently. The data map supports federated governance, allowing for centralised policy development while enabling self-service access and maintenance.
The Data Map supports a wide variety of data sources, ensuring comprehensive coverage of an organisation’s data estate. Here are some key types of data sources that can be included:
- Microsoft Fabric: Integrates seamlessly with Microsoft Fabric, allowing users to govern their entire data estate, including items such as Power BI reports, data pipelines, and data warehouses.
- Cloud storage: Includes services like Azure Blob Storage and Amazon S3 for storing large amounts of unstructured data.
- Databases: Covers relational databases such as Azure SQL Database and Oracle, as well as NoSQL databases like Azure Cosmos DB and MongoDB.
- Big data and analytics: Supports big data solutions like Azure Data Lake Storage and Google BigQuery for large-scale data analytics.
- Data warehouses: Includes cloud-based data warehousing solutions like Azure Synapse Analytics and Snowflake.
- File systems: Encompasses file storage services such as Azure Files and HDFS for distributed file systems.
- Enterprise applications: Integrates with enterprise systems like SAP HANA and SAP Business Warehouse for comprehensive data management.
By supporting a diverse range of data sources, Microsoft Purview Data Map ensures that you can comprehensively map your organisation’s data landscape, regardless of where the data resides. This holistic view enables better data governance, improved data quality, and more efficient data management.
Data products
Data products in the Unified Data Catalog are logical groupings of related data assets created for specific business purposes. These products provide context and practical use cases for data consumers, making it easier to find and utilise relevant data. By organising data into products, you can streamline data access and enhance the overall data experience for your users.
What a data product might include:
- Business glossary: A collection of business terms and definitions to ensure a common understanding of the data.
- Objectives and key results (OKRs): Trackable business objectives tied to the data product, such as increasing sales by 10% or reducing support cases by 3%. OKRs help align data products with business goals and measure their impact.
- Critical Data Elements (CDEs): Key data elements that are essential for decision-making and require high-quality governance. Examples include customer IDs or financial metrics. CDEs help focus governance efforts on the most impactful data.
- Data assets: Individual components that make up the data product, such as tables, files, reports, and more. These assets are categorised and tagged to ensure they are easily discoverable and manageable, providing a structured approach to understanding and utilising the data.
By including these components, a data product in the Unified Data Catalog becomes a comprehensive and valuable resource for data consumers, driving better decision-making and business outcomes.
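To make the structure above concrete, here is a minimal sketch of a data product modelled as plain Python data classes. It is an illustration only: the class names, fields, and sample values are assumptions made for this post, and it does not use or represent the Microsoft Purview API.

```python
from dataclasses import dataclass, field


@dataclass
class DataAsset:
    """One component of a data product (hypothetical model)."""
    name: str
    kind: str  # e.g. "table", "file", "report"
    tags: list[str] = field(default_factory=list)


@dataclass
class DataProduct:
    """A logical grouping of related assets with business context."""
    name: str
    purpose: str
    glossary: dict[str, str] = field(default_factory=dict)
    okrs: list[str] = field(default_factory=list)
    critical_data_elements: list[str] = field(default_factory=list)
    assets: list[DataAsset] = field(default_factory=list)

    def find_assets(self, tag: str) -> list[DataAsset]:
        """Return the assets that carry the given tag."""
        return [asset for asset in self.assets if tag in asset.tags]


# Illustrative values only.
product = DataProduct(
    name="Sales performance",
    purpose="Track progress against sales targets",
    glossary={"Customer": "A person or organisation that has placed an order"},
    okrs=["Increase sales by 10%"],
    critical_data_elements=["customer_id"],
    assets=[
        DataAsset("orders", "table", tags=["sales"]),
        DataAsset("monthly_sales", "report", tags=["sales", "finance"]),
    ],
)

finance_assets = [asset.name for asset in product.find_assets("finance")]
```

Grouping the glossary, OKRs, CDEs, and assets under one object mirrors the idea that a consumer should find the business context and the data in the same place.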
Data quality
Ensuring data quality is essential for reliable decision-making and AI-driven insights. Microsoft Purview Unified Data Catalog offers robust data quality features that help organisations assess and improve the quality of their data assets. Here are some key concepts:
- No-code/low-code rules: Create data quality rules without extensive coding. These rules ensure consistent data quality across the organisation.
- AI-powered data profiling: Automatically analyse data to identify patterns and anomalies. AI recommendations can be refined by users for improved accuracy.
- Data quality lifecycle: Involves steps like registering data sources, running data profiling, and applying quality rules. Regular scans help maintain data quality over time.
- Data quality scans: Review data assets based on quality rules and produce scores. These scores help assess and improve data health.
- End-to-end visibility: Monitor data quality continuously within each governance domain. Identify and resolve issues promptly.
- Actionable insights: Measure, monitor, and enhance data quality to support reliable AI-driven insights and decision-making.
By leveraging these data quality concepts, you can maintain high data standards, ensuring that your data is accurate, complete, and consistent. This supports better decision-making and more effective use of AI technologies.
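As a rough illustration of how quality rules turn into a score, the sketch below applies a simple completeness rule to a handful of rows and reports the share of checks that pass. The rule, the sample rows, and the scoring formula are assumptions chosen for clarity; Purview defines and computes its own rules and scores, and this code does not reproduce them.

```python
from typing import Callable

Row = dict[str, object]
Rule = Callable[[Row], bool]


def not_empty(column: str) -> Rule:
    """Build a rule that passes when the column holds a non-blank value."""
    def check(row: Row) -> bool:
        value = row.get(column)
        return value is not None and str(value).strip() != ""
    return check


def quality_score(rows: list[Row], rules: list[Rule]) -> float:
    """Percentage of (row, rule) checks that pass; 100.0 when nothing is checked."""
    total = len(rows) * len(rules)
    if total == 0:
        return 100.0
    passed = sum(1 for row in rows for rule in rules if rule(row))
    return round(100.0 * passed / total, 1)


# Hypothetical sample data: two of the eight checks fail.
rows = [
    {"customer_id": "C001", "email": "a@example.com"},
    {"customer_id": "C002", "email": ""},
    {"customer_id": None, "email": "c@example.com"},
    {"customer_id": "C004", "email": "d@example.com"},
]
rules = [not_empty("customer_id"), not_empty("email")]

score = quality_score(rows, rules)
print(f"Completeness score: {score}%")
```

Running the same rules on a schedule and comparing scores over time is the essence of the regular scans described above.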
Data health management
Microsoft Purview’s data health management features include health controls to monitor data quality, governance, and compliance metrics. They assess data quality dimensions like accuracy and completeness through regular scans, track governance practices, and support regulatory compliance by monitoring sensitivity labels.
Detailed health management reports provide insights into data health, while end-to-end visibility allows continuous monitoring within each governance domain. These features offer actionable insights to identify issues and implement corrective measures, ensuring data remains accurate, secure, and usable.
Data lineage
Data lineage in Microsoft Purview Unified Data Catalog provides a detailed view of how data flows and transforms across an organisation’s data landscape. It captures the entire lifecycle of data, from its origin through various stages of processing and transformation to its final destination.
This feature is crucial for several reasons. Firstly, it enhances traceability by allowing you to track the flow of data, making it easier to understand where data originates, how it has been transformed, and where it is ultimately used. This is particularly useful for troubleshooting and root cause analysis in data pipelines.
Secondly, data lineage is essential for regulatory compliance, as it ensures that data handling practices meet legal and industry standards by providing a clear audit trail of data movement and transformations.
Additionally, it supports impact analysis by visualising how changes in one part of the data pipeline can affect downstream processes and reports, helping you to make informed decisions about data modifications.
Lastly, by tracking the lineage of data, you can identify and address data quality issues more effectively, providing insights into the sources of data anomalies and helping maintain high data standards.
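Impact analysis over lineage is, at heart, a graph traversal. The sketch below is an assumption-laden illustration rather than Purview's implementation: the asset names and edges are hypothetical, and the lineage is held in a plain dictionary instead of being read from the catalog.

```python
from collections import deque

# Hypothetical lineage edges: each key feeds the assets listed against it.
lineage = {
    "crm.customers": ["staging.customers"],
    "crm.orders": ["staging.orders"],
    "staging.customers": ["warehouse.customer_360"],
    "staging.orders": ["warehouse.customer_360", "warehouse.sales_daily"],
    "warehouse.customer_360": ["report.churn_dashboard"],
    "warehouse.sales_daily": ["report.sales_dashboard"],
}


def downstream(graph: dict[str, list[str]], start: str) -> list[str]:
    """Breadth-first walk returning every asset reachable from start."""
    seen = {start}
    queue = deque([start])
    impacted = []
    while queue:
        node = queue.popleft()
        for child in graph.get(node, []):
            if child not in seen:
                seen.add(child)
                impacted.append(child)
                queue.append(child)
    return impacted


# Which assets could be affected by a change to the CRM orders table?
impacted = downstream(lineage, "crm.orders")
for asset in impacted:
    print(asset)
```

Reversing the edges and running the same walk gives the upstream view, which is the one used for root cause analysis.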
Incorporating sensitivity labels into the data schema
Incorporating sensitivity labels into the data schema within Microsoft Purview Unified Data Catalog is a critical step in protecting sensitive information and ensuring data security. Sensitivity labels can be applied to various data assets such as files, database columns, and reports to indicate how sensitive the data is, for example public, confidential, or highly confidential.
Microsoft Purview supports automated labelling based on predefined rules and conditions, ensuring that sensitive data is consistently labelled across the organisation without manual intervention. The Data Map feature extends the use of sensitivity labels to assets stored in various data sources, including Azure, multi-cloud, and on-premises locations, maintaining a unified approach to data security.
Sensitivity labels also play a vital role in enforcing access control policies. For instance, data labelled as highly confidential can be restricted to specific users or groups, ensuring that only authorised personnel can access sensitive information.
Furthermore, sensitivity labels provide valuable insights for compliance reporting, allowing organisations to generate reports on how sensitive data is handled and ensuring that data protection practices align with regulatory requirements.
By incorporating sensitivity labels into the data schema, organisations can enhance data security, ensure compliance, and maintain control over sensitive information throughout its lifecycle.
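The relationship between labels and access can be pictured with a small model. In the sketch below, the column names, group names, and clearance mapping are all hypothetical, and the ordering of labels is an assumption for illustration; in practice labels and access policies are configured in Microsoft Purview rather than in application code.

```python
from enum import IntEnum


class Sensitivity(IntEnum):
    """Label levels named in the text, ordered from least to most sensitive."""
    PUBLIC = 0
    CONFIDENTIAL = 1
    HIGHLY_CONFIDENTIAL = 2


# Hypothetical labels applied to columns of a customer table.
column_labels = {
    "country": Sensitivity.PUBLIC,
    "email": Sensitivity.CONFIDENTIAL,
    "purchase_history": Sensitivity.CONFIDENTIAL,
    "payment_card": Sensitivity.HIGHLY_CONFIDENTIAL,
}

# Hypothetical mapping of user groups to the highest label they may read.
group_clearance = {
    "sales_analyst": Sensitivity.CONFIDENTIAL,
    "finance_admin": Sensitivity.HIGHLY_CONFIDENTIAL,
}


def visible_columns(group: str) -> list[str]:
    """Columns the group may read; unknown groups see public data only."""
    clearance = group_clearance.get(group, Sensitivity.PUBLIC)
    return [
        column
        for column, label in column_labels.items()
        if label <= clearance
    ]


analyst_view = visible_columns("sales_analyst")
guest_view = visible_columns("guest")
```

Defaulting unknown groups to the lowest clearance reflects the principle that access to sensitive data should be granted explicitly rather than assumed.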
Use case: onboarding a CRM data source
Imagine a company, Tech Services Inc, that uses a customer relationship management (CRM) system to manage customer interactions, sales data, and support tickets. The CRM system contains valuable information such as customer contact details, purchase history, and support interactions. To leverage this data for better decision-making and governance, Tech Services Inc decides to onboard its CRM system into Microsoft Purview.
By onboarding the CRM data source, Tech Services Inc can create a comprehensive Data Map that visualises all the data assets within the CRM system. This includes tables for customer information, sales transactions, and support tickets. The Data Map helps the company understand the structure and flow of data within the CRM, making it easier to manage and govern.
Creating a data product
Once the CRM data is onboarded, Tech Services Inc can create a data product to provide valuable insights to their sales and marketing teams. Let’s say they want to create a “customer insights” data product.
The “customer insights” data product would include various data assets from the CRM system, such as customer profiles, purchase history, and support interactions. By grouping these related data assets into a single data product, Tech Services Inc can provide a comprehensive view of customer behaviour and preferences.
This data product can be used by the sales team to identify high-value customers, track sales performance, and tailor marketing campaigns. The marketing team can use it to analyse customer segments, understand buying patterns, and improve customer engagement strategies.
Benefits of using Microsoft Purview data governance
By using Microsoft Purview to onboard the CRM data and create the “customer insights” data product, Tech Services Inc can achieve several benefits:
- Enhanced data discovery: The data map provides a visual representation of the CRM data, making it easier for users to discover and understand the data assets available to them.
- Improved data governance: With centralised data governance, Tech Services Inc can ensure that data policies are consistently applied across the CRM data, enhancing data security and compliance.
- Better decision-making: The “customer insights” data product provides valuable insights that help the sales and marketing teams make informed decisions, driving business growth.
- Streamlined data access: By organising data into a data product, users can easily access and utilise the relevant data assets without having to navigate through the entire CRM system.
Ultimately, using Microsoft Purview for data governance and lifecycle management helps your business leverage data more effectively.
Deploying Microsoft Purview Unified Data Catalog
To ensure a successful deployment of Microsoft Purview Unified Data Catalog, it’s important to prepare your organisation thoroughly. Start by providing comprehensive training specifically for data owners and data stewards, who play crucial roles in managing and governing data.
Next, establish clear data governance policies and procedures aligned with your business objectives and regulatory requirements. Engage data owners and data stewards in this process to ensure that policies are practical and enforceable. Their involvement will help maintain data quality and compliance across the organisation.
Ensure your IT infrastructure can support the platform, including necessary integrations with existing systems such as SQL, Azure SQL, Microsoft Fabric and beyond. This will help achieve a seamless implementation and optimal performance. Data owners and stewards should be involved in validating these integrations to ensure that data flows smoothly and securely.
Here at Advania, we’re well versed in managing, governing and securing your data. We can help you take these steps to ensure you are well-prepared to harness the full potential of Microsoft Purview Unified Data Catalog and drive meaningful business outcomes.