“Metadata is hotter than ever,” said Donna Burbank, Managing Director at Global Data Strategy. “And there’s data to back up that assertion.” Speaking at the DATAVERSITY® Database Now! Online 2017 Conference, Burbank was referring to the survey findings of the research report Emerging Trends in Metadata Management. Eighty percent of survey respondents said that Metadata is as important as, or more important than, it was in the past. Although not surprising, Burbank said, “It’s nice to have documentation that this actually is a growing trend.” She also remarked that at least one participant asks about Metadata in every webinar or conference presentation that she does.
The simplest definition of Metadata is ‘data in context,’ she said. “It’s both the business and technical context around data.” Another way to look at it is the “who, what, why, where, and how” of data. Understanding the meaning of each term, each metric, each field in a database is important. Data can get mixed, transformed, matched, moved,
“But at some point, there is a human element to what this data means. Ultimately, the real reason we’re doing this is the business value, and what that means for organizations, regardless of technology.”
Now that more business users understand the role of good data in business success, not having Metadata, she said, just isn’t an option. “It’s like not having a data trail for your finance department. You get audited, so you have to.” Burbank’s experience with business users is that they are sometimes surprised when IT proposes a project to set up a Metadata Management system. She said that on one project,
“The finance users looked at us in shock, and said, ‘You mean you don’t do this already? We couldn’t get away with saying, I’m not sure where the money comes from. I just store it in bags in the back room.’ Without Metadata, an organization is at risk for making decisions based on the wrong data,” she said.
Seemingly minor mistakes caused by poorly managed Metadata can be catastrophic. Something as simple as a wrong address can cause serious order fulfillment problems, because the product never reaches the customer. Often some of the most critical Metadata lies with one individual who ‘just knows’ how things are done. Unless this institutional memory is captured in a Business Glossary, Metadata Repository, or Data Model, it can’t be shared, and it is lost when the one person who ‘knows’ is no longer available. “The better our Data Quality, and the better our Metadata is – that is where you start to get business value,” she said.
The Loss of the Mars Orbiter: A Metadata Problem
In 1999, NASA lost a $125 million Mars orbiter spacecraft, due to “pure Metadata issues,” because one team used English units while another used metric units, she said. “And because they didn’t have that Metadata documented, the system actually went off course. That’s a pretty big price tag for a very simple effort.”
NASA’s reputation was also affected, and opportunities for the orbiter’s intended research were lost. “I’m sure there was somebody that said, ‘I just know. We always use metric,’ or, ‘We always send it in English units.’” Metadata Management documents the knowledge that people ‘just know,’ so that everyone is on the same page.
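The unit mix-up described above can be illustrated with a short sketch in which every number carries its unit as Metadata. This is not NASA's software or anything Burbank presented; the `Measurement` class, the unit labels, and the choice of impulse units (pound-force seconds versus newton-seconds) are assumptions made for illustration.

```python
from dataclasses import dataclass

# Conversion factors to a common unit (newton-seconds).
# The unit labels are illustrative; 1 lbf is defined as 4.4482216152605 N.
TO_NEWTON_SECONDS = {
    "N*s": 1.0,
    "lbf*s": 4.4482216152605,
}


@dataclass(frozen=True)
class Measurement:
    """A value that cannot be separated from the unit it was recorded in."""
    value: float
    unit: str

    def to_newton_seconds(self) -> float:
        # Refuse to guess: an undocumented unit is an error, not a default.
        if self.unit not in TO_NEWTON_SECONDS:
            raise ValueError(f"Undocumented unit: {self.unit!r}")
        return self.value * TO_NEWTON_SECONDS[self.unit]


# One team reports in English units, another in metric; both convert safely.
english = Measurement(10.0, "lbf*s")
metric = Measurement(10.0, "N*s")
print(english.to_newton_seconds(), metric.to_newton_seconds())
```

Because the unit travels with the value, a consumer never has to ‘just know’ which system the sender used, and an unlabeled unit fails loudly instead of being silently misread.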
“If we could have better usage of our data, think of all the insights we could have instead of just trying to manage it,” she said.
Trends in Use Cases & Metadata Types
The Emerging Trends survey found that business leaders expect decreasing opportunities for Data Warehousing, BI reporting, and software development to give way to Data Science and Big Data Analytics in the future, with opportunities for Master Data Management, Data Quality improvement, and Data Governance to remain constant.
Even with predictions for fewer Data Warehouses in the future, “They still exist, and they’re still a big part of the business,” she said. One interesting discovery she noted: the top five current Metadata use cases ranged from 30 percent (for software development) to 60 percent (for Data Governance), yet the predictions for future use cases ranged between 32 percent and 42 percent, showing a more even spread across use cases. “There are a lot of equally important things, which makes our job a little more complicated,” she said.
Metadata sources respondents expected to remain constant are Business Glossaries and Data Warehouses, with increasing reliance on Big Data platforms, Data Quality tools, and ETL tools, and decreasing reliance on BI tools, Data Models, and relational database sources.
Metadata & Enterprise Data Management
“The reason you manage Metadata is to get business insights,” she said, so when Burbank works with clients, she starts with their business strategy. “What are we looking to do and how can Metadata support that? How can data support that?” Organizations are implementing enterprise-wide Data Management initiatives like Data Governance and Master Data Management, she said, “And to really get that right, you need not only the technical stuff, but the business meaning around it as well.”
Creating a multi-focal Data Strategy is required to manage inter-related data sources, processes, and goals across the organization, she said. “If I’m trying to get that 360 view of ‘customer,’ and ‘customer’ is in 17 different systems in different formats, that’s when Metadata comes into play.” A successful Data Strategy serves as a way to align business strategy with governance, and clarifies how people, policies, and culture around data should be managed in line with the goals of the business.
Strategic management and coordination of disparate data sources includes leveraging tools for MDM, BI, Analytics, Data Quality, and Data Modeling. “Metadata could be the circle around all of this, because it really links technology with the business, which is one of the values of it,” she said.
Types and Sources of Metadata
“Metadata isn’t just for relational databases anymore,” Burbank said. Although relational databases are still a big part of business, they are being augmented by a variety of other tools and systems.
“What do we mean by a database anymore? It’s not like everything now is in a structured format. Is it a bunch of media files I have out in an AWS bucket? Is it my Data Lake? Is it my Data Warehouse? Is it all of them?”
Burbank provided what she called a “whirlwind tour” of 19 sources and types of Metadata, from relational databases, ETL tools and Data Models to COBOL copybooks, images and application code.
Image Credit: Global Data Strategy
She said that management of data stored in a relational database was challenging enough, but the scale and the volume have grown along with the types of data and the ways to manage it, “And they’re evolving so there’s no ‘one size fits all’ answer.” A centralized Metadata Repository is a common way to manage Metadata, becoming “that single view of the truth,” but that’s not the only option – there are many new ways to manage it, she said.
Getting full lineage of how data was created, what it means, and how it’s been transformed can be a challenge, but there are a lot of new tools that can help with that, she said. Many tools have their own Metadata: Data Modeling tools, Business Glossaries, BI tools, etc., and some have built-in Data Quality statistics and reporting tools. There are Metadata exchanges and registries as well.
“If I’m sharing information with other organizations and I want to have a common JSON, or XML schema, I’m going to have that through a common Metadata Registry,” she said.
Metadata Discovery, Lineage and Matching
Burbank shared some of the technical innovations and best practices for managing Metadata, including:
- Creating matching rules such as ‘database columns are the same if they have the same name and data type.’ “This can get a little complicated, but it also has a lot of value, because once you get this right, you do have full lineage, and you can avoid some redundancies.”
- Using AI for recognition of patterns in data values, such as NNN-NN-NNNN is likely to be a Social Security number. “The tools to do this have gotten pretty robust.”
- Using signal and image processing to identify byte-level patterns, such as those of a bank check or a service contract, in documents that don’t necessarily have structured metadata the way a database does.
- Using keywords for human-defined tagging of information, similar to how photos can be tagged on social media. “Tools like AWS can do tagging as well.”
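The first two practices in the list above can be sketched in a few lines. This is a minimal illustration, not a tool Burbank described: the `Column` class, the function names, and the 90 percent threshold are hypothetical, and a plain regular expression stands in for the AI-based pattern recognition she mentioned.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Column:
    """Hypothetical description of one database column."""
    system: str
    table: str
    name: str
    data_type: str


def match_columns(columns):
    """Apply the rule 'columns are the same if name and data type match'.

    Comparison is case-insensitive. Only groups with more than one
    column are returned, since those are the candidate duplicates.
    """
    groups = {}
    for col in columns:
        key = (col.name.lower(), col.data_type.lower())
        groups.setdefault(key, []).append(col)
    return {key: cols for key, cols in groups.items() if len(cols) > 1}


# The NNN-NN-NNNN shape from the article, written as a regular expression.
SSN_SHAPE = re.compile(r"\d{3}-\d{2}-\d{4}")


def looks_like_ssn(values, threshold=0.9):
    """Return True if most non-empty values have the NNN-NN-NNNN shape.

    The threshold is an assumed tuning knob, not a published figure.
    """
    non_empty = [v for v in values if v]
    if not non_empty:
        return False
    hits = sum(1 for v in non_empty if SSN_SHAPE.fullmatch(v))
    return hits / len(non_empty) >= threshold


columns = [
    Column("crm", "customer", "email", "varchar"),
    Column("billing", "accounts", "Email", "VARCHAR"),
    Column("crm", "customer", "id", "int"),
]
print(match_columns(columns))
print(looks_like_ssn(["123-45-6789", "987-65-4321"]))
```

A name-and-type rule like this is deliberately naive: it will miss columns that hold the same data under different names, which is part of why, as Burbank noted, matching “can get a little complicated.”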
Metadata: More Important Than Ever
Interest in Metadata is high as types and sources of data are changing, but Business Glossaries and Data Warehouses remain constants. “There are some of these new use cases that we just need new tools for, because that concept of ‘database’ is evolving.” Metadata’s superpower is that it links technology to the business. “Some folks might say that because we have these new technologies, we don’t need it. I say, because we have these technologies, we need it even more.”
Check out Database Now! Online at http://databasenow.com/