Click to learn more about author Mike Brody.
Big Data keeps getting bigger. We’re generating so much information these days that we’re running out of sustainable places to put it and have begun tinkering with the idea of storing it in space.
Yes, outer space.
With this much information on our hands, it’s no wonder the enterprise Metadata Management market is projected to hit USD 9.34 billion by 2023. If there’s one thing my time in Business Intelligence has taught me, it’s that enterprises underestimate the importance of Data Governance in all its forms. Garbage in, garbage out: it’s cliché because it’s true.
Metadata Management, “the administration of data that describes other data,” helps organizations locate and track data assets as they are manipulated — but that doesn’t say much about the features to look for in a Metadata Management Solution (MMS). After you’ve done your cost-benefit analysis and decided to spring for an MMS, keep an eye out for these five capabilities. Some are essential and near-universal while others, like data enrichment, are nice-to-haves that might make a world of difference depending on your business needs.
1. Data Inventory
What data do we have, and where is it?
A data inventory lists your data assets and their locations — it’s crucial to any MMS offering. Your inventory should include information about the repository, the type of content it contains, and whether that content includes any personally identifiable information (PII).
Data inventories are sometimes referred to as “data maps” but should not be confused with data catalogs. April Reeve, Principal at Reeve Consulting, describes data catalogs as menus “from which a user selects and, if access approved, data is provisioned.” Data inventories, by contrast, list all data assets for a given organization. While any MMS should provide a data inventory, some also let administrators view individual users’ data catalogs. This is particularly useful when it comes to managing security around PII.
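To make the idea concrete, here is a minimal sketch of what one inventory record might hold, based on the three attributes mentioned above (repository, content type, PII flag). The class name, field names, and sample assets are all hypothetical; a real MMS will have its own schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InventoryEntry:
    asset: str          # hypothetical asset name
    repository: str     # where the asset lives
    content_type: str   # the kind of content it holds
    contains_pii: bool  # whether any PII is present


# Made-up sample entries, for illustration only.
inventory = [
    InventoryEntry("customers", "crm_db", "customer records", True),
    InventoryEntry("web_logs", "log_bucket", "clickstream events", False),
    InventoryEntry("invoices", "billing_db", "billing records", True),
]


def pii_assets(entries):
    """Return the names of assets flagged as containing PII."""
    return [entry.asset for entry in entries if entry.contains_pii]


print(pii_assets(inventory))
```

Even a flat list like this answers both halves of the question: what we have, and where it is. Filtering on the PII flag is the kind of query a security review would start with.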
2. Data Lineage
Where has our data been, and what happened to it along the way?
Data lineage is all about trust and accountability. Heavily regulated industries such as healthcare and finance know the compliance risks of losing track of information, but there are also operational costs to consider. Something as simple as assuming a column of prices to be in US dollars (USD) can wreak havoc on a business if in fact the values were previously converted to Canadian dollars (CAD). NASA’s Mars Climate Orbiter was famously lost in 1999 due to a unit conversion error, costing the organization anywhere from roughly $125M for the spacecraft itself to more than $300M for the broader mission program, depending on how you do the math. And these are just conversion errors! Database administrators know there’s a practically infinite number of ways for records to go off the rails.
Whether keeping an eye on your data’s Chain of Custody for HIPAA compliance or monitoring a series of ETL jobs, an MMS with data lineage tracking will make it easier to locate and troubleshoot data quality and compliance issues before they spiral out of control.
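As a rough illustration of the currency scenario, the sketch below keeps a lineage log alongside a column of prices so that every conversion is recorded with the job that performed it. The `TrackedColumn` class, the job name, and the exchange rate are assumptions made up for this example, not features of any particular MMS.

```python
from dataclasses import dataclass, field


@dataclass
class TrackedColumn:
    name: str
    unit: str
    values: list
    lineage: list = field(default_factory=list)

    def convert(self, rate, new_unit, job):
        """Apply a conversion and record the step in the lineage log."""
        self.lineage.append(f"{job}: {self.unit} -> {new_unit} at rate {rate}")
        self.values = [value * rate for value in self.values]
        self.unit = new_unit
        return self


prices = TrackedColumn("price", "USD", [10.0, 20.0])
prices.convert(1.25, "CAD", job="nightly_etl")  # illustrative rate, not a real quote

print(prices.unit)
print(prices.lineage)
```

Anyone who picks up this column later can check `unit` and `lineage` instead of assuming the values are still in USD.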
3. Data Enrichment
What data can we add to our data?
Sometimes you want the ability to add new metadata to your inventory, particularly as your business requirements change. The General Data Protection Regulation (GDPR), for example, made European residency an important data point for companies around the world. All those either processing or storing EU residents’ PII were, as of May 25, 2018, required to track and manage that data per the regulation or suffer heavy fines. An MMS with tagging capabilities makes it easier to note such characteristics, and some may even allow you to apply tags based on a pattern. In the GDPR example, the MMS might apply the GDPR tag to any records associated with addresses belonging to the EU and its territories, saving your management team a great deal of time and reducing the prevalence of human error.
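A pattern-based tagging rule of this kind might look something like the sketch below. The record layout, the `country` field, the function name, and the short list of country codes are assumptions for illustration; the code set is deliberately a small subset, not the full list of EU member states.

```python
# Illustrative subset only; a real rule would need the complete, current list.
EU_COUNTRY_CODES = {"DE", "FR", "IE", "ES", "IT"}


def apply_gdpr_tag(records):
    """Add a 'GDPR' tag to every record whose country code is in the EU set."""
    for record in records:
        tags = record.setdefault("tags", set())
        if record.get("country") in EU_COUNTRY_CODES:
            tags.add("GDPR")
    return records


records = [
    {"id": 1, "country": "FR"},
    {"id": 2, "country": "US"},
]
tagged = apply_gdpr_tag(records)

print(tagged)
```

The rule runs the same way over two records or two million, which is where the time savings and the reduction in human error come from.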
4. Business Rules
What does our data mean to us?
Semantics help data stewards agree on what an organization’s data actually means, and there are two important components an MMS can help catalog: definitions and business rules. A term or entity must first be defined before it can be related to other entities using rules.
Referring again to the GDPR compliance example, we might ask what constitutes an “address” for the purposes of the tagging rule. Should the rule be based on mailing addresses? Geographic addresses? Permanent addresses? You might settle on a rule/definition set like:
- Address: Mailing address at the time records were received.
- EU Resident: Anyone with an Address in EU territory.
- Erasure: The deletion of all PII associated with a particular individual for a specified time period.
- All EU Residents have the right to erasure.
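The "define first, then relate" ordering can be checked mechanically. The sketch below stores the definitions above in a glossary and flags any rule that references a term nobody has defined yet. The data layout and the `undefined_terms` helper are hypothetical, and the "Consent" rule is invented purely to show a failing case.

```python
definitions = {
    "Address": "Mailing address at the time records were received.",
    "EU Resident": "Anyone with an Address in EU territory.",
    "Erasure": "The deletion of all PII associated with a particular "
               "individual for a specified time period.",
}

rules = [
    {
        "text": "All EU Residents have the right to erasure.",
        "terms": ["EU Resident", "Erasure"],
    },
]


def undefined_terms(rules, definitions):
    """Return terms used by rules that have no entry in the glossary."""
    missing = []
    for rule in rules:
        for term in rule["terms"]:
            if term not in definitions and term not in missing:
                missing.append(term)
    return missing


print(undefined_terms(rules, definitions))

# An invented rule that uses a term the glossary does not contain.
draft_rule = {"text": "Consent must be recorded.", "terms": ["Consent"]}
print(undefined_terms([draft_rule], definitions))
```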
5. Data Connectivity
How can we import/export our metadata?
Connectivity underpins the rest of an MMS’s utility, as being able to load metadata from database management systems, data governance tools, ETL solutions, and files is critical. It’s equally important to be able to then pass that data to Business Intelligence and Analytics solutions for insight reporting. Look for the ability to connect and import/export data via an API, as this often affords maximum power and flexibility.
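Whatever the API looks like, import and export usually come down to moving metadata in a portable format. The sketch below round-trips a couple of made-up asset records through JSON using Python's standard library. The function names and the payload shape (a `version` number plus an `assets` list) are assumptions for illustration, not any vendor's actual format.

```python
import json


def export_metadata(assets):
    """Serialize asset metadata to a JSON string for another tool to consume."""
    return json.dumps({"version": 1, "assets": assets}, indent=2, sort_keys=True)


def import_metadata(payload):
    """Load asset metadata back from a JSON string."""
    return json.loads(payload)["assets"]


# Made-up sample metadata.
assets = [
    {"asset": "customers", "repository": "crm_db", "contains_pii": True},
    {"asset": "web_logs", "repository": "log_bucket", "contains_pii": False},
]

payload = export_metadata(assets)
restored = import_metadata(payload)

print(restored == assets)
```

A plain, documented format like this is also what makes it feasible to move your inventories and definitions to a different tool later.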
Not every organization is going to invest in an MMS, but those that do shouldn’t settle for a solution that will prove inadequate a few years down the line. In addition to the cost of the solution, there’s the time and effort required to establish inventories, catalogs, definitions, and rules. There’s no guarantee you’ll be able to easily port those assets into a new MMS, so put in the initial work to find something that will grow with your organization and its Data Management strategy over time.