Build a Data-Driven Culture with a Data Catalog


Humans can be very particular about things. Some little kids simply won’t eat spinach. Adolescents might refuse to wear clothing that isn’t imprinted with cool brand logos. And adults may not like the cultural change that accompanies the effort to become a data-driven enterprise.

Recent research addresses this in the context of Big Data and AI. Seventy-seven percent of respondents said that “business adoption” of Big Data and AI initiatives continues to represent a challenge for their organizations. These senior executives believe that the problem is people, not technology. Cultural challenges predominantly stand in the way (95 percent); only 5 percent relate to technology obstacles. Even so, 71 percent of firms have yet to forge a data culture.

Commenting on the study, Stephanie McReynolds, VP of Marketing at data catalog vendor Alation, observed that there has been little innovation in changing human attitudes. Companies make the technology investment, but there’s no groundwork for a data culture to drive the business value of that technology. “They’re missing the glue to leveraging data culture to have a positive business impact,” she said.

The first phase of data-driven transformation has been industries’ focus on where to store data, how to model it, and how to make it available. Now it’s time to bring together what technology such as AI can do and how the population of knowledge workers approaches data-driven decision-making.

Machine, Meet Human. Human, Meet Machine

AI won’t take over the world, so to speak, but it does have value in accelerating the process of data-driven human discovery. “Machines are really good at parsing data and looking for patterns,” McReynolds said, but they can’t innovate with those patterns. Rather, humans are able to use the pattern-matching that machines provide as the underpinning for innovation.

That’s an example of closing the division between technology and data culture. “Humans can find answers, and Machine Learning and data science prompt them to move in the right direction,” said McReynolds. “It’s collaboration versus a computer being so smart that it just gives an automated recommendation.”

Alation’s data catalog technology builds on that idea. The collaborative catalog automates data inventory, promotes proper data usage, and provides guidelines and recommendations that change how data is consumed in the organization, she said. “There are real-time prompts to end users as they go in and explore data. Why companies want data catalogs at the enterprise level is that they want an application for driving cultural change.”

One example is the catalog’s Trust Check feature, which verifies data sets that are trustworthy and flags those that are not. eBay uses this feature to embed its policies in the data catalog, and as an analyst interacts with data, “those policies are triggered as recommendations that pop up. It might say that joining these two tables is a violation of privacy policy, describing why the analyst can’t run the query,” McReynolds explained.
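To make the idea of a policy that fires on a table join concrete, here is a minimal sketch in Python. It is not Alation’s Trust Check implementation or API. The `JoinPolicy` class, the `check_query_tables` function, and the table names are all hypothetical, chosen only to illustrate how a catalog might turn a stored policy into a warning shown to the analyst.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JoinPolicy:
    """Hypothetical rule: the listed tables must not appear together in one query."""
    tables: frozenset
    reason: str


def check_query_tables(tables_in_query, policies):
    """Return one warning per policy whose tables all appear in the query."""
    used = set(tables_in_query)
    return [
        f"Joining {' and '.join(sorted(p.tables))} is not allowed: {p.reason}"
        for p in policies
        if p.tables <= used  # every table named in the policy is used by the query
    ]


# Illustrative policy and query; the table names are assumptions.
policies = [
    JoinPolicy(frozenset({"customers", "payment_events"}), "violates privacy policy"),
]
warnings = check_query_tables(["customers", "payment_events", "orders"], policies)
for message in warnings:
    print(message)
```

The point of the sketch is that the warning carries its reason with it, so the analyst learns why the query is blocked rather than just seeing it fail.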

Equally important, the features of the data catalog can be extended to a broader audience. At eBay, for instance, product managers who traditionally have not written queries and who don’t consider themselves expert data people can use Alation as a learning platform to help them become comfortable with getting their hands on data.

“Alation also suggests query snippets to new users that they can accept as positive recommendations. So now they are comfortable with querying their own data directly to solve business problems,” she said.

That’s another behavioral change mechanism to create a strong data culture.

A research paper that McReynolds co-authored, “Learning by Doing versus Learning by Viewing: An Empirical Study of Data Analyst Productivity on a Collaborative Platform at eBay,” discussed how a data catalog can help speed up the learning process and the productivity of individuals using data in an organization. The other authors of the paper were Yue Yin of Northwestern University; Itai Gurvich of Cornell Tech; Debora Sys of eBay; and Jan A. Van Mieghem of Northwestern University. The research found a 20 percent increase on average in learning productivity.

Seeing such a productivity jump, an organization that may have started a small data catalog project, perhaps with 20 or so users, could easily become convinced to release the catalog to a larger audience of data specialists in the organization, McReynolds said.

“So, within six to nine months they can move from a small footprint to an entire department or more departments, and from there move forward to users who aren’t data specialists.”

The Role of Data Catalogs in Collaboration for Transformation

A data catalog shouldn’t be looked at as just a technical inventory of data assets, according to McReynolds. It should be seen as a living catalog that combines an inventory of data, all the context around that data, and the ability to make recommendations.

“That value starts changing the culture of an organization. That catalog becomes the heart of not just data culture but of allowing different groups to align behind a transformation project in the enterprise.”  

Boundaries must be taken into account when it comes to collaboration, of course. A data catalog should ensure that access rights defined in the database, file system, or BI tool where the data resides are mirrored in the catalog. Individuals who have different levels of access to data can then contribute to collaboration with the assurance that rules and policies are maintained.
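The mirroring described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not how any particular catalog product works. The `source_permissions` mapping, the `visible_assets` and `can_annotate` functions, and the asset names are hypothetical. The sketch assumes permissions have already been copied from the source system into a simple user-to-assets mapping.

```python
def visible_assets(user, catalog_entries, source_permissions):
    """Return only the catalog entries the user is allowed to see in the source system."""
    allowed = source_permissions.get(user, set())
    return [entry for entry in catalog_entries if entry["asset"] in allowed]


def can_annotate(user, asset, source_permissions):
    """A user may comment on an asset only if the source system grants access to it."""
    return asset in source_permissions.get(user, set())


# Hypothetical permissions, as they might be copied from a database or BI tool.
source_permissions = {
    "analyst": {"sales.orders"},
    "steward": {"sales.orders", "hr.salaries"},
}
catalog_entries = [
    {"asset": "sales.orders", "note": "Certified order data"},
    {"asset": "hr.salaries", "note": "Restricted payroll data"},
]

analyst_view = [e["asset"] for e in visible_assets("analyst", catalog_entries, source_permissions)]
steward_view = [e["asset"] for e in visible_assets("steward", catalog_entries, source_permissions)]
print(analyst_view, steward_view)
```

Because the catalog reads the same permissions as the source system, both users can collaborate on the shared asset while the restricted one stays hidden from the analyst.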

As an example of collaboration with the help of a data catalog, McReynolds pointed to Alation customer Pfizer. The company uses geographically dispersed teams of doctors, data scientists, and other experts in data and medical fields, who work together to find a way to help identify a rare heart disease (one that affects under one percent of patients) so that affected people receive the appropriate treatment.

“It’s a brand new innovation again, where multiple parties can communicate through a data catalog because they are all using the same terms and have the same leads,” she said. 

McReynolds noted that enterprise Data Governance teams appreciate that, in addition to maintaining policies, Alation’s solution encourages collaboration by surfacing data requests to different audiences to expand their usage of data. “We have an active way to register requests but also to provide the Data Governance team with some analytics on data usage across the organization.” The team can see which users use which types of data most, and use that to inform its own asset strategy.
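As a rough illustration of the kind of usage analytics described above, the following Python sketch counts which type of data each user queries most. The query log, the user names, and the `top_type_per_user` function are hypothetical and are not taken from Alation’s product.

```python
from collections import Counter

# Hypothetical query log: one (user, data type) pair per query that was run.
query_log = [
    ("ana", "sales"),
    ("ana", "sales"),
    ("ben", "marketing"),
    ("ana", "marketing"),
    ("ben", "marketing"),
]


def top_type_per_user(log):
    """Map each user to the (data type, query count) pair they use most."""
    counts = Counter(log)
    best = {}
    for (user, data_type), n in counts.items():
        # Keep the most-used data type seen so far for this user.
        if user not in best or n > best[user][1]:
            best[user] = (data_type, n)
    return best


usage = top_type_per_user(query_log)
print(usage)
```

A governance team could use a summary like this to decide which data sets to curate or document first.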

Through the rest of the year, expect Alation to add even more features that make it easier to curate data, including help with organizing a Data Stewardship program and taking a more agile approach to Data Governance.

