Alation’s Data Catalog: Enterprise Level Data Curation Moves Forward


Your employees need to be able to easily search for business data assets relevant to their specific needs, trust that what they are pointed to is useful and of high quality, and then use that information as necessary, whether that means creating a visualization, building a model, or anything else. The Enterprise Data Catalog is a type of application designed to meet those requirements.

“People who actually use data every day – analysts or Data Scientists or whomever – need to be able to find the data assets relevant to their needs and be able to trust them,” says Aaron Kalb, the Head of Product at Alation. “They then need to be able to use them in whatever application they’re employing, whether that’s creating a data product or a visualization or doing modeling.”

The Alation Data Catalog is designed to help them reach those goals by automatically indexing a business’s data by source and accumulating knowledge about it. At its launch in 2015, Alation’s technology foundation was based on inventorying data and enriching that inventory by parsing and analyzing database and query logs to catalog human interaction and behavior around data usage.
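To make the idea of mining query logs for usage knowledge concrete, here is a minimal Python sketch. It is not Alation's parser: the log entries, the table names, and the `table_usage` helper are all hypothetical, and the regular expression is a deliberately naive stand-in for real SQL parsing.

```python
import re
from collections import Counter

# Hypothetical query log: (user, SQL text) pairs. Real logs carry far more detail.
QUERY_LOG = [
    ("ana", "SELECT * FROM sales.customers c JOIN sales.orders o ON c.id = o.customer_id"),
    ("ben", "SELECT segment, COUNT(*) FROM sales.customers GROUP BY segment"),
    ("ana", "SELECT * FROM staging.customers_tmp"),
]

# Toy pattern: the identifier that follows FROM or JOIN. A real SQL parser
# would be needed for subqueries, CTEs, quoted names, and dialect quirks.
TABLE_REF = re.compile(r"\b(?:FROM|JOIN)\s+([A-Za-z_][\w.]*)", re.IGNORECASE)


def table_usage(log):
    """Count how many logged queries reference each table."""
    counts = Counter()
    for _user, sql in log:
        # Use a set so a table named twice in one query is counted once.
        counts.update({name.lower() for name in TABLE_REF.findall(sql)})
    return counts


usage = table_usage(QUERY_LOG)
```

In this toy log, `usage` ends up crediting `sales.customers` with two queries and the other tables with one each, which is the kind of signal a catalog could attach to each inventoried table.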

“If you’re an end-user of the data, a consumer of the data, you want clues when there are a hundred different tables that you could get customer segment information from,” says Stephanie McReynolds, VP, Marketing. But which one is the most trustworthy and accurate for particular needs? “Which one applies to the use case and the series of questions that you’re asking? There’s probably a reason for all one hundred of those tables to exist and they’re not all going to be a fit for the answer that you’re trying to get at,” she says.

To help with that issue, Alation crawls existing data sources and then creates a graph of the relationships between business analysts and data assets. The result is a centralized, user-accessible catalog of enterprise data that answers search queries with tables, files, queries, articles or workbooks. “All of our customers have multiple sources and having a single place, single search box across all of those is a huge asset to them,” says Kalb.

These search query responses leverage social signals, or what Kalb calls behavioral input, to help sort the wheat from the chaff for a particular user’s purposes – in much the same way that Amazon delivers purchase recommendations to browsing consumers based on what others with similar buying needs have chosen. While Alation applies what it has learned in the consumer space about using Machine Learning algorithms to automate as much of this work as possible, it leaves the last mile – which it defines as curations – to the humans.
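The notion of ranking search results by behavioral input can be illustrated with a short Python sketch. Everything here is an assumption made for illustration: the `CatalogEntry` fields, the sample numbers, and the scoring rule (breadth of use first, then query volume) are hypothetical and do not describe Alation's actual algorithm.

```python
from dataclasses import dataclass


@dataclass
class CatalogEntry:
    name: str
    source: str          # the system the table lives in
    query_count: int     # how often it appears in query logs
    distinct_users: int  # how many different people queried it


def search(entries, keyword):
    """Return entries whose name contains the keyword, most widely used first."""
    hits = [e for e in entries if keyword.lower() in e.name.lower()]
    # Assumed scoring: rank by number of distinct users, then by query volume.
    return sorted(hits, key=lambda e: (e.distinct_users, e.query_count), reverse=True)


# Hypothetical catalog spanning several sources behind a single search box.
CATALOG = [
    CatalogEntry("customer_segments", "teradata", 420, 35),
    CatalogEntry("customer_segments_tmp", "hive", 900, 2),
    CatalogEntry("orders", "redshift", 1500, 60),
    CatalogEntry("customer_segments_2014", "oracle", 12, 3),
]

results = search(CATALOG, "customer_segment")
```

Under this scoring, the table queried by many different people ranks above a scratch table that one or two users hit heavily, which mirrors the "what did others like me choose" logic the article compares to Amazon recommendations.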

“Curations are about where the humans can actually add their knowledge to what the machine has automated in the creation of this catalog,” says McReynolds.

“If you think about data, I find in the course of my work as an analyst if I am interpreting the data in a certain way, if I am manipulating the data in a certain way to get to a result, the details of the steps I went through to get to the end result are actually super useful for other people in the organization. If I can share that through software, through a platform that’s going to keep all that knowledge in one place as reference materials while people work, there’s extreme value on that and it can be made a very engaging and social experience – just like looking at projects on Pinterest and clicking a single button to show your network.”

Taking the Next Steps with the Self-Service Data Catalog

The Alation Connect technology in version 4.0, which was introduced this past fall, extended the solution’s ability to synchronize metadata, sample data, and query logs from data storage systems – the Hive Metastore on Hadoop and databases from Teradata, IBM, Oracle, SQL Server, Redshift, Vertica, SAP HANA and Greenplum – and to parse SQL syntax more deeply, by adding connectivity for popular SQL query processing engines over Hadoop. This includes new connectivity to Spark SQL and Presto as well as others, the company has reported. In March, it also announced that Alation’s 4.6 version seamlessly catalogs MicroStrategy, joining Alation’s existing support for Tableau Business Intelligence and Analytics software.

As it advances its technology, Alation continues to build a name for itself in an increasingly important arena: self-service usage and analysis of data by non-technical business users. “There is the trend of an uptake in self-service usage and the demand for self-service access to data by a less technical audience” as businesses consider how they can truly become data-driven organizations, says McReynolds. “They know there’s competitive advantage in that.” She has written about Forrester research that shows that insights-driven firms are 69 percent more likely to report year-over-year revenue growth of 15 percent or more.

The big trick is to gain the benefit without falling prey to some dangers. Alation’s deep integration with tools like Tableau and now MicroStrategy, Kalb has written, provides visibility into the complete data pipeline: from storage through visualization. Layered on top of its original capabilities, these integrations should help democratize data-driven decision-making while decreasing the risk of inaccurate analysis and misinterpretation.

Prepping for intelligent self-service processes sets organizations up for next-stage smarts.

“The process of ideating around data and having it be an open communication around all the aspects of data brings the entire organization up to another level of data literacy so that we can really find useful solutions rather than get stuck in our own little silo,” McReynolds says.


