Metadata sits at the nexus of Data Governance: it facilitates the management of data sources, role-based access to valuable data assets, and sustainable data integration. Metadata Management is considerably enhanced by Metadata cataloguing, which enables organizations to understand all of their data, and how it relates to business processes, regardless of source, vendor platform, or location.
“Fundamentally, if you cannot govern, measure or manage something, then you can’t optimize it,” remarked Nidhi Aggarwal, Tamr’s Global Head of Strategy, Operations and Marketing. “It can’t be done top down, though. That’s our philosophy. People have thought that governance can be done top down; it has to be done bottom up.”
Overcoming Dark Data
Cataloguing Metadata in a unified, centralized manner is critical to actually using one's data assets and to overcoming the Dark Data phenomenon, in which organizations leverage only a fraction of their data because of common limitations concerning its:
- Meaning
- Access
- Relation to business processes
Tools such as Tamr’s Metadata Catalogue (part of the larger Tamr Data Unification Platform, but recently made available as a free download) provide a means to catalogue enterprise Metadata regardless of source, platform, or location. As Aggarwal noted, “When we talk to our customers, some of them have 25,000 databases; some of them 90,000 databases. There’s not a single place where we can actually list all of the Metadata off of those databases.” Metadata cataloguing mechanisms, however, can provide a comprehensive overview of that Metadata, along with source owners, tags, profiling metrics, registers, filters, and more, so that business users (not just IT) can determine how the data can inform their jobs.
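To make the idea of a catalogue with owners, tags, profiling metrics, and filters more concrete, here is a minimal sketch. The `CatalogEntry` schema, the `filter_entries` helper, and the sample sources are all hypothetical illustrations, not Tamr's data model or API.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """One catalogued data source (hypothetical schema, for illustration only)."""
    source: str                # e.g. a database or table name
    owner: str                 # the source owner
    tags: set = field(default_factory=set)
    row_count: int = 0         # a simple profiling metric
    null_ratio: float = 0.0    # share of empty values, another profiling metric


def filter_entries(entries, tag=None, owner=None, max_null_ratio=None):
    """Return entries matching every filter that is set (None means 'any')."""
    result = []
    for e in entries:
        if tag is not None and tag not in e.tags:
            continue
        if owner is not None and e.owner != owner:
            continue
        if max_null_ratio is not None and e.null_ratio > max_null_ratio:
            continue
        result.append(e)
    return result


# Hypothetical sample catalogue.
catalog = [
    CatalogEntry("crm.accounts", "sales-ops", {"customer"}, 120_000, 0.02),
    CatalogEntry("erp.vendors", "procurement", {"supplier"}, 8_500, 0.10),
    CatalogEntry("web.signups", "marketing", {"customer"}, 45_000, 0.30),
]

for e in filter_entries(catalog, tag="customer", max_null_ratio=0.05):
    print(e.source, e.owner)   # prints: crm.accounts sales-ops
```

A business user asking "which well-populated sources describe customers?" becomes a filter over catalogue Metadata rather than a query against each of thousands of databases.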
Machine Learning: Mapping and More
Effective Metadata cataloguing is also instrumental in mapping different data sources to the applications through which data supports business processes. Once users define the domains (customers, partners, products, suppliers, etc.) to which certain Metadata is relevant, Machine Learning algorithms can augment and expedite that process, which further enhances the end-user experience. The more data these Machine Learning technologies encounter, the better they become at mapping it to relevant domains, uses, applications, and more. According to Aggarwal:
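The learn-from-user-labels loop can be sketched with a deliberately simple token-frequency classifier over column names. This is an assumed stand-in for the Machine Learning the article describes, not Tamr's algorithm; `DomainMapper`, `tokens`, and the sample columns are hypothetical.

```python
import re
from collections import Counter, defaultdict


def tokens(column_names):
    """Split column names like 'cust_email' or 'SupplierID' into lowercase tokens."""
    out = []
    for name in column_names:
        # Break camelCase, then split on anything that is not a letter or digit.
        name = re.sub(r"([a-z])([A-Z])", r"\1 \2", name)
        out.extend(t.lower() for t in re.split(r"[^A-Za-z0-9]+", name) if t)
    return out


class DomainMapper:
    """Learns token frequencies per domain from user-labeled sources (hypothetical)."""

    def __init__(self):
        self.counts = defaultdict(Counter)

    def learn(self, column_names, domain):
        # Each labeled source adds evidence; more labels sharpen later suggestions.
        self.counts[domain].update(tokens(column_names))

    def suggest(self, column_names):
        """Return the domain whose learned tokens best overlap this source, or None."""
        toks = tokens(column_names)
        scores = {}
        for domain, counter in self.counts.items():
            total = sum(counter.values())
            # Normalize so domains with many examples do not win by volume alone.
            scores[domain] = sum(counter[t] for t in toks) / total if total else 0.0
        best = max(scores, key=scores.get, default=None)
        return best if best is not None and scores[best] > 0 else None


mapper = DomainMapper()
mapper.learn(["customer_id", "email", "first_name"], "customer")
mapper.learn(["supplier_id", "vendor_name", "payment_terms"], "supplier")
print(mapper.suggest(["CustomerEmail", "first_name"]))  # prints: customer
```

A production system would use richer features (values, profiles, usage) and a real model, but the shape is the same: users label a few sources, and the system proposes domains for the rest.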
“What you have at the end, then, is not just a list of your data sources. You have your data sources categorized by what are the real world entities they create, who are their source owners, and what data sources are most often used. It becomes a valuable tool to find information about how your company’s using the data, who’s using it, for what purpose, what questions are being asked of that data, and that way it becomes a management and optimization tool.”
Beyond automating mapping, Machine Learning technologies can detect similarities between different sources and data types and suggest a variety of actions to take with them, including transformations.
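As a rough illustration of flagging similar sources, the sketch below scores schema overlap with Jaccard similarity on normalized column names. Note that this is a plain rule-based measure swapped in for simplicity, not a learned model; the function names, the 0.5 threshold, and the sample schemas are assumptions.

```python
from itertools import combinations


def column_set(columns):
    """Normalize column names so 'Email' and 'email ' compare equal."""
    return {c.strip().lower() for c in columns}


def jaccard(a, b):
    """Jaccard similarity: size of the intersection over size of the union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def similar_sources(schemas, threshold=0.5):
    """Yield (source_a, source_b, score) pairs whose column sets overlap enough."""
    normalized = {name: column_set(cols) for name, cols in schemas.items()}
    for a, b in combinations(sorted(normalized), 2):
        score = jaccard(normalized[a], normalized[b])
        if score >= threshold:
            yield a, b, round(score, 2)


# Hypothetical schemas.
schemas = {
    "crm.accounts": ["id", "name", "email", "phone"],
    "web.signups": ["ID", "Name", "Email", "signup_date"],
    "erp.vendors": ["vendor_id", "vendor_name", "terms"],
}
print(list(similar_sources(schemas)))  # prints: [('crm.accounts', 'web.signups', 0.6)]
```

A flagged pair is a natural candidate for a suggested action, such as mapping both sources into the same customer domain or transforming one schema to match the other.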
Data Quality, Role-Based Access
Cataloguing tools can also detect aspects of Data Quality and group data according to quality standards. Still, the foundation of the data attributes that cataloguing provides is Metadata, which points back to the data sources and distinguishes them accordingly. Cataloguing also lets users actually see the Metadata, which is critical for facilitating role-based access to various data sources. “That’s actually important from a governance standpoint because you do not want the data available to anybody,” Aggarwal said.
Cataloguing platforms can present Metadata visually, which organizations can use to determine who can access which data; those access decisions are themselves governed “by more checks and balances,” according to Aggarwal. Cataloguing even provides a way to determine which Metadata is visible and which is not, since in some instances the Metadata itself reveals valuable information that governance policies require to be restricted.
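One way to picture role-based visibility, including restricted Metadata, is the minimal policy sketch below. The `SourceMetadata` fields, the `view_for` helper, and the rule "unrestricted Metadata is visible to everyone, restricted Metadata only to roles allowed to access the data" are hypothetical assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceMetadata:
    source: str
    owner: str
    columns: tuple
    allowed_roles: frozenset           # roles that may query the underlying data
    metadata_restricted: bool = False  # if True, even the column list is sensitive


def view_for(role, entries):
    """Return what a given role may see in the catalogue (hypothetical policy)."""
    visible = []
    for e in entries:
        can_access_data = role in e.allowed_roles
        if e.metadata_restricted and not can_access_data:
            continue  # hide the entry entirely: its Metadata is itself sensitive
        visible.append({
            "source": e.source,
            "owner": e.owner,
            "columns": e.columns,
            "can_access_data": can_access_data,
        })
    return visible


# Hypothetical catalogue entries.
catalog = [
    SourceMetadata("crm.accounts", "sales-ops", ("id", "name", "email"),
                   frozenset({"sales", "analyst"})),
    SourceMetadata("hr.salaries", "hr", ("employee_id", "salary"),
                   frozenset({"hr"}), metadata_restricted=True),
]

for row in view_for("analyst", catalog):
    print(row["source"], row["can_access_data"])  # prints: crm.accounts True
```

Separating "may see the Metadata" from "may access the data" lets people discover that a source exists and whom to ask, while sensitive sources stay invisible to unauthorized roles.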
Visualizations, Self-Service
One of the pivotal aspects of competitive cataloguing solutions is that they come with an array of visualizations that let end users examine the Metadata and the data sources it points back to. Tamr Catalogue relies on Tree Map (which Aggarwal stated was “invented for visualizing large sets of data in a small scale”) to visualize both numerous data attributes and their sources. Attributes can include Data Quality, uniqueness, source owners, and other relevant facets of the data’s use. For example, such tools can enable a CIO to determine which data sources pertain to a specific customer and how to integrate them based on their Data Quality and other relevant attributes.
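The core idea of a treemap, nested rectangles whose areas are proportional to size, can be sketched with the classic slice-and-dice layout. This is not a description of how Tamr Catalogue draws its Tree Map; the `slice_and_dice` function and the sample hierarchy (sources grouped by owner, sized by row count) are illustrative assumptions.

```python
def total(node):
    """Size of a leaf (a number) or the sum of a subtree (a dict)."""
    return node if isinstance(node, (int, float)) else sum(total(c) for c in node.values())


def slice_and_dice(node, x, y, w, h, depth=0, path=()):
    """Return (path, x, y, w, h) rectangles for every leaf.

    Even depths split the rectangle along the x axis, odd depths along the
    y axis, so each leaf's area is proportional to its size.
    """
    if isinstance(node, (int, float)):
        return [(path, x, y, w, h)]
    rects = []
    size = total(node)
    offset = 0.0  # fraction of the parent rectangle already used
    for name, child in node.items():
        share = total(child) / size if size else 0.0
        if depth % 2 == 0:
            rects += slice_and_dice(child, x + offset * w, y, share * w, h,
                                    depth + 1, path + (name,))
        else:
            rects += slice_and_dice(child, x, y + offset * h, w, share * h,
                                    depth + 1, path + (name,))
        offset += share
    return rects


# Hypothetical hierarchy: owner -> source -> row count.
data = {"sales-ops": {"crm.accounts": 60, "crm.leads": 20},
        "marketing": {"web.signups": 20}}

for path, x, y, w, h in slice_and_dice(data, 0, 0, 100, 100):
    print("/".join(path), round(w * h))
```

Color could then encode a second attribute such as Data Quality, which is what lets a single screen convey several facets of many sources at once.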
Most importantly, this information is visualized so that the end user can absorb most of it at a glance on a single screen, with different data elements distinguished by color, shape, and other visual cues. As a result, cataloguing and visualization further the self-service movement in Data Management. “You can actually switch from having IT and the coding people be the only people who can interact with data, to the business people who do not know coding and do not want to know coding, but want to ask simple questions,” Aggarwal noted.
Vendor Lock-In
Metadata cataloguing takes on particular importance when organizations deal with a multitude of platforms, tools, and vendors. Aggarwal observed that enterprises commonly find that, because they use products from certain vendors, they have difficulty integrating that data outside of those vendors’ solutions. Moreover, when an organization’s Metadata is combined with a vendor’s solution so that it now contains that vendor’s proprietary Metadata, the organization’s ability to view its entire Metadata catalogue can be limited. According to Aggarwal:
“People have all of this data, and vendors promise you a catalogue as long as you buy all of your Data Management databases from them. By their own admission, each of [these vendors] has 10 to 12 percent of an organization’s data, and now you can’t move the Metadata between these vendors’ solutions. Overall, that is one of the main reasons why enterprises today do not have a catalogue.”
Data Lakes and More
Metadata cataloguing applies to a variety of enterprise needs. Listing the relevant elements of both Metadata and the sources it points to is valuable for integration and for maintaining disparate databases. It is also useful when enterprises rely on a single repository, such as Hadoop, for the majority of their data access and integration needs. In such data lakes, Metadata cataloguing can provide a critical layer of governance that helps organizations maintain privacy and security.
Regardless of the application or use case for which an organization leverages Metadata cataloguing, it provides a critical means of control in a Data Management landscape that seems to grow more complicated each day. As Aggarwal put it: “Our mission is to enable people to be data driven: to manage their data more effectively, to be able to use analytics more effectively. And that has to be scalable.”