By Tejasvi Addagada.
Enterprises are modernizing their data platforms and associated tool sets to serve the fast-changing needs of data practitioners, including data scientists, data analysts, business intelligence and reporting analysts, and self-service-embracing business and technology personnel.
However, as the tool stack in most organizations is modernized, the variety of metadata it generates grows as well. As the volume of data increases every day, the metadata associated with it expands, and so does the need to manage it.
The first thought that strikes us when we look at a data landscape and hear about a catalog is, “It scans any database, from relational to NoSQL or graph, and gives out useful information.” That information typically includes:
- Modeled data-type
- Inferred data types
- Patterns of data
- Length with minimum and maximum thresholds
- Minimum and maximum values
- Other profiling characteristics of data like frequency of values and their distribution
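The profiling characteristics listed above can be sketched in a few lines of Python. This is only an illustration of what a catalog scanner might compute for one column, not how any particular catalog product works. The function names (`infer_type`, `pattern_of`, `profile_column`) and the simple type rules are assumptions made for this example.

```python
import re
from collections import Counter
from typing import List


def infer_type(value: str) -> str:
    """Guess a simple type for one raw string value (illustrative rules only)."""
    if re.fullmatch(r"[+-]?\d+", value):
        return "integer"
    if re.fullmatch(r"[+-]?\d+\.\d+", value):
        return "decimal"
    if re.fullmatch(r"\d{4}-\d{2}-\d{2}", value):
        return "date"
    return "string"


def pattern_of(value: str) -> str:
    """Reduce a value to its shape: letters become 'A', digits become '9'."""
    shape = re.sub(r"[A-Za-z]", "A", value)
    return re.sub(r"\d", "9", shape)


def profile_column(name: str, modeled_type: str, values: List[str]) -> dict:
    """Collect basic profiling characteristics for one column of raw strings."""
    present = [v for v in values if v != ""]  # empty strings are treated as missing
    lengths = [len(v) for v in present]
    return {
        "column": name,
        "modeled_type": modeled_type,  # type declared in the schema
        "inferred_types": Counter(infer_type(v) for v in present),
        "patterns": Counter(pattern_of(v) for v in present),
        "min_length": min(lengths, default=0),
        "max_length": max(lengths, default=0),
        "min_value": min(present, default=None),  # compared as strings
        "max_value": max(present, default=None),
        "frequencies": Counter(present),  # value distribution
        "missing_count": len(values) - len(present),
    }


profile = profile_column("branch_code", "VARCHAR(4)", ["AB12", "CD34", "", "7"])
```

A gap between the modeled type and the inferred types, or an unexpected pattern, is often the first hint that a column holds something other than what its name suggests.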
What Are the Basic Benefits of Metadata Managed in Catalogs?
1. Increased availability of intelligence about data, which brings better context to insights
2. Reduced turnaround time to find answers during analysis
3. Increased efficiency of subject matter experts in producing information for impact analysis
4. Less ambiguity in relationships among data in the landscape
5. Simpler views of data through meaning, identified redundancy, and relationships
The uses of metadata have multiplied over the last year, much of this attributable to technological advancements and public policy changes. Most enterprises are using catalogs for several use cases, such as the ones listed below.
1. Data Discovery
- Associated with the doctrine of data democratization
- Answers questions such as “Where does data exist physically?”, down to objects in schemas and elements in instances
- Searching for data across one or more application systems of record and systems of reference such as lakes or warehouses
2. System Privacy Profiling
- Related to the convention of data protection and privacy management
- Identifying elements that are private to data subjects even when their modeled names do not suggest it
- Helps establish the risk categorization of applications, including their logs in SOC operations
3. Controlling Access to Data
- Analogous to the principle of data security
- Identifying data entitlements and handling them in a single repository
- Managing user groups, users, data access policies, owners who can grant/revoke access to data
4. Data Administration
- Connected with the aspects of managing data and governing it
- Curating and identifying processes related to data creation and processing
- People information, such as owners of data, business, and process, and the personnel stewarding data
- Finding commonality in ownership of data across an organization to manage context
5. Data Context
- Associated with the principle of “interpreting data in the right sense”
- Definitions of what data means to a specific situation and person
- Collecting contextual descriptions and arriving at a single, common one based on how data is applied in processes
6. Data Usage
- Corresponding to the principle of interoperability of data within a firm and beyond
- Means of usage, including reports, dashboards, artificial intelligence models
- Frequency of usage, vintage of artifacts using the specific data
7. Classifying Data
- For better management — correlated to principles of availability of data
- The pace of change of data and applying it — master, reference, transaction data
- Privacy classifications — private, sensitive, special category, behavioral data
- Labels — national identifiers, address, names, card-related data, health information
- Transformation classifications — native, derived, or transformed data
8. Canonical Management
- Logical groups, names, canonical modeling attribute names, other standard modeling names, class associations in BIAN, MISMO, etc.
9. Rules Operations
- Related to principles of interoperability and coverage
- An integral part of business metadata often ignored in operational metadata processes, orchestrated for operations
- Classifying rules more precisely as business rules, policy enforcement rules, derivation and transformation rules, and Data Quality rules, along with rule execution statistics
- Maintaining business rules is an excellent enabler for impact analysis, data analysis, and needs analysis
- Managing relationships between data makes it easier to find the rules that data is a party to
10. Data Operations
- Extends the principle of data distribution management
- Assists with the understanding of data usage, derived/native characteristics, vintage, last used, pipelines, archival and destruction policies, partitions, jobs, schedules.
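Two of the use cases above, privacy profiling and classifying data with labels, lend themselves to a small rule-based sketch. The example below assigns labels to a column from either its name or the shape of its sampled values, so a private element can still be found when the modeled name does not suggest it. The rule set, the regular expressions, and the 0.8 threshold are hypothetical and deliberately simplistic; a production classifier would need far more careful rules.

```python
import re
from typing import List

# Hypothetical label rules: (column-name hint, pattern a whole value must match).
LABEL_RULES = {
    "card-related data": (
        re.compile(r"card_?(number|no|num)", re.I),
        re.compile(r"\d{13,19}"),
    ),
    "email address": (
        re.compile(r"e-?mail", re.I),
        re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+"),
    ),
    "national identifier": (
        re.compile(r"ssn|national_id", re.I),
        re.compile(r"\d{3}-\d{2}-\d{4}"),  # US SSN shape, used only as an example
    ),
}


def classify_column(name: str, samples: List[str], threshold: float = 0.8) -> List[str]:
    """Return labels whose name hint matches or whose value pattern fits most samples."""
    labels = []
    non_empty = [s for s in samples if s]
    for label, (name_hint, value_pattern) in LABEL_RULES.items():
        name_match = bool(name_hint.search(name))
        hits = sum(1 for s in non_empty if value_pattern.fullmatch(s))
        value_match = bool(non_empty) and hits / len(non_empty) >= threshold
        if name_match or value_match:
            labels.append(label)
    return labels


# A column with an uninformative name is still labeled from its values.
labels = classify_column("ref_no", ["123-45-6789", "987-65-4321"])
```

Labels produced this way are suggestions; under the governance model described in this article, a data steward would still confirm them before they are published in the catalog.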
How Does Governance Enable Metadata?
Metadata Management also requires analysts to put information into a catalog at the right stage of a change. This can be done by including the right stakeholders, consistently, throughout the lifecycle of data. Data has a lifecycle of its own, POSMAD (Plan, Obtain, Store and Share, Maintain, Apply, Dispose), that helps bring out the lineage.
Even enriched agile management models like Scrum, Kanban, DAD, and FDD can benefit from curating and using the institutional knowledge on data, in projects, for accelerated delivery of features. Data Governance can balance the hosting and serving of metadata so that metadata works for most use cases.
For example, for a business term or data element:
- Where does it come from?
- Which processes does it apply to?
- Who uses the business term?
- Which systems leverage the data element for storing, sharing, transforming, and decaying?
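One way to make these questions operational is to treat each as a field on the catalog entry and report which ones are still unanswered. The sketch below assumes a hypothetical `BusinessTermRecord` structure; the field names are illustrative and do not come from any specific catalog tool.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class BusinessTermRecord:
    """Hypothetical catalog entry for one business term or data element."""
    name: str
    origin: str = ""                                     # where it comes from
    processes: List[str] = field(default_factory=list)   # processes it applies to
    users: List[str] = field(default_factory=list)       # who uses the term
    # Systems keyed by what they do with the element, e.g. "storing", "sharing".
    systems_by_stage: Dict[str, List[str]] = field(default_factory=dict)

    def open_questions(self) -> List[str]:
        """List the governance questions this record cannot answer yet."""
        questions = []
        if not self.origin:
            questions.append("Where does it come from?")
        if not self.processes:
            questions.append("Which processes does it apply to?")
        if not self.users:
            questions.append("Who uses the business term?")
        if not any(self.systems_by_stage.values()):
            questions.append("Which systems leverage the data element?")
        return questions


term = BusinessTermRecord(
    name="Customer Address",
    origin="Onboarding form",
    systems_by_stage={"storing": ["CRM"]},
)
remaining = term.open_questions()
```

A report of open questions per term gives stewards a concrete worklist at each stage of a change.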
As governance formalizes active management of metadata through a specific operating rhythm and processes, it becomes much easier to integrate it into project lifecycles that plan for data changes or usage. A Data Governance function provides the latitude to put up guide rails or guardrails that help assess, direct, and monitor the management of metadata, in support of the organization's goals for managing data and metadata.
Moreover, managing metadata requires a standard framework that can channel personnel toward capturing the information associated with data.
Some Questions That Commonly Arise While Managing Catalogs
1. Have you democratized the catalog so that any personnel in the organization can contribute the information they know about data? Are there identified data stewards who can give direction on baselining the information the organization can use?
2. Is metadata from sources fueling the management of schema-drifts in data lakes and warehouses? How often should metadata from sources be scanned or pushed into the catalog?
3. Are you looking at business terms used in varied contexts with specialized names, and capturing their synonyms or common names to ease usage globally?
4. Is Metadata Management bridging the gap between regional operations, global operations, and IT? Business analysis is positioned to enable this communication, but Metadata Management comes with enablers that can accelerate it.
5. Do you have a forum today where all the relevant stakeholders can bring a common understanding or enrichment of what they know of data before publishing?
6. How are you planning to support business self-service with the information that has been captured in the metadata repository? Is the catalog too technical for business users to adopt?
7. Is compliance consuming most of your effort while giving little direction on why you are doing this in the first place?
8. Does your meta-model take the various use cases into consideration, such as how many data owners a business term can have, how many contributors and viewers it can have, and how multiple systems of truth can be defined?
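The second question, on using source metadata to manage schema drift, can be illustrated with a minimal comparison of two scans of the same table. The sketch assumes each scan is available as a simple mapping of column name to data type; the function name `detect_schema_drift` and the sample snapshots are made up for this example.

```python
from typing import Dict, List


def detect_schema_drift(previous: Dict[str, str], current: Dict[str, str]) -> Dict[str, List[str]]:
    """Compare two column-name to data-type snapshots of the same table."""
    added = sorted(set(current) - set(previous))
    removed = sorted(set(previous) - set(current))
    retyped = sorted(
        col for col in set(previous) & set(current) if previous[col] != current[col]
    )
    return {"added": added, "removed": removed, "retyped": retyped}


# Hypothetical snapshots from two consecutive catalog scans of a source table.
last_scan = {"id": "INT", "name": "VARCHAR(50)", "dob": "DATE"}
this_scan = {"id": "BIGINT", "name": "VARCHAR(50)", "email": "VARCHAR(100)"}

drift = detect_schema_drift(last_scan, this_scan)
```

How often such a comparison runs is the scheduling decision the question raises: the longer the gap between scans, the longer a lake or warehouse may be loading data against a stale schema.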
Business metadata does not generate itself; it requires every responsible stakeholder to contribute consistently to definitions and other administrative information. Such metadata, if actively managed, enables organizations to govern data better while becoming more efficient.