The movement towards self-service data access, integration, and analytics is at the core of the growing trend of empowering business users and utilizing data-centric processes to influence business and operations across vertical industries.
Consequently, analysts and business users have more tools, databases, and data sources to examine in order to inform decision making and help achieve organizational objectives.
Although a number of options utilize Natural Language Processing and Cognitive Computing to assist with these goals, end users are still commonly required to handle tasks such as:
- Writing code
- Knowing the underlying data source of data sets
- Prioritizing data and their sources to verify quality, timeliness, and relevance of data
These tasks can be formidable for even the most experienced data-savvy business user, while training individuals in the lengthy documentation and disparate platforms involved in such a process is potentially costly and time consuming, particularly for larger organizations.
However, a careful amalgamation of some of today's most salient Data Management technologies, including aspects of Machine Learning, Data Discovery, and Data Governance, can help to fuel the self-service process in a well-governed fashion that ultimately makes data much more viable to the end user and his or her job.
According to Satyen Sangani, CEO of Alation (which offers a data access platform by the same name):
“To do self-service appropriately, it’s not about opening up a gate and allowing people to do anything they want with the data… To really do self-service well, you’ve got to arm people not just with the data but with the knowledge of how to use the data.”
Contextual Machine Learning
One of the most relevant aspects of integrating disparate data sources or even of simply leveraging different data from a lone database is understanding those data’s context. Such context should be considered in relation to business objectives, other data, and previous decisions that were made with or influenced by that data. In this respect, the predictive analytics of Machine Learning are particularly useful for determining those sorts of relationships that help to provide context for data. Access platforms such as Alation—which provide a central point for determining germane data for particular queries—utilize Machine Learning algorithms to catalogue an organization’s data, and determine which are most beneficial for a particular use case.
“When you consume data, you have a set of questions like: who used this data before, what’s it connected to, where does it come from, how high quality is it, what does the data actually look like, what are the queries people have written in the past, [and] who are the experts on it,” Sangani observed.
Machine Learning can help provide those answers, and provide them faster, through an in-depth analysis of the systems that store the data, which includes “crawling” through previous query logs and past usages of data. As with all Machine Learning processes, initial results inform later ones with a snowball effect: with each use and each additional context of a data set, the algorithms are able to determine additional relevance even faster. Contextual results are ‘published’ in what Sangani called a “Wikipedia-like page about the data” so that users can understand its previous use and how that can inform present or future uses.
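The log "crawling" described above can be illustrated with a minimal sketch. This is not Alation's implementation; it only assumes a hypothetical query log of (user, SQL) pairs and uses a naive regular expression to find table names, counting how often each table is queried and by whom. The log contents, the function names, and the pattern are all assumptions made for illustration.

```python
import re
from collections import Counter, defaultdict

# Hypothetical query log: (user, SQL text) pairs. Real logs differ by database.
QUERY_LOG = [
    ("ana", "SELECT region, SUM(amount) FROM sales.orders GROUP BY region"),
    ("ben", "SELECT * FROM sales.orders o JOIN crm.customers c ON o.cust_id = c.id"),
    ("ana", "SELECT id, name FROM crm.customers WHERE active = 1"),
    ("ana", "SELECT COUNT(*) FROM sales.orders"),
]

# Naive assumption: a table name is whatever identifier follows FROM or JOIN.
# A production crawler would use a real SQL parser instead.
TABLE_PATTERN = re.compile(r"\b(?:FROM|JOIN)\s+([A-Za-z_][\w.]*)", re.IGNORECASE)


def build_usage_profile(log):
    """Count queries per table and queries per (table, user)."""
    usage = Counter()
    users = defaultdict(Counter)
    for user, sql in log:
        # set() so a table referenced twice in one query counts once
        for table in set(TABLE_PATTERN.findall(sql)):
            name = table.lower()
            usage[name] += 1
            users[name][user] += 1
    return usage, users


def top_expert(users, table):
    """Return the user who has queried the table most often."""
    return users[table].most_common(1)[0][0]


usage, users = build_usage_profile(QUERY_LOG)
print(usage.most_common())
print(top_expert(users, "sales.orders"))
```

Counts like these are the kind of raw material a catalogue page could draw on to answer "who used this data before" and "who are the experts on it"; each new query added to the log refines the profile.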
Data Discovery—Understanding Underlying Data Sources
End users frequently encounter situations in which they are unsure which data sources to query, or even how best to tell data sets apart. Access platforms and certain integration hub solutions are able to provide those answers quickly by utilizing the kind of search capabilities found in many Data Discovery options. These capabilities increase the utility of conventional self-service Business Intelligence tools by helping them determine where to direct their queries which, when there are multiple databases or sources, is no easy feat. According to Sangani:
“In many cases people don’t know which table to use because there are maybe 20 tables that actually look exactly the same and are in different databases. They don’t know how to use the tables because there are all these rules that nobody has told them about and are documented in totally unrelated places from the actual database.”
However, the aforementioned search capabilities, working in concert with the Machine Learning process that combs through an organization’s entire range of databases and catalogues its data sets, are able to aggregate those processes and quickly issue results. Analyzing those results frequently requires one of the many available self-service BI options.
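As a rough illustration of how search over a catalogue might separate look-alike tables in different databases, the sketch below ranks entries by keyword matches and breaks ties by past usage. The `CatalogEntry` fields, the sample entries, and the scoring rule are hypothetical; they are not taken from any particular product.

```python
from dataclasses import dataclass


@dataclass
class CatalogEntry:
    database: str
    table: str
    description: str
    query_count: int  # hypothetical: times the table appeared in past queries


# Hypothetical catalogue with two tables that share a name.
CATALOG = [
    CatalogEntry("warehouse", "revenue_daily", "daily booked revenue by region", 340),
    CatalogEntry("legacy_dw", "revenue_daily", "daily revenue snapshot, deprecated", 12),
    CatalogEntry("crm", "accounts", "customer accounts and owners", 95),
]


def search(catalog, text):
    """Rank entries by number of matching search terms, then by past usage."""
    terms = text.lower().split()
    scored = []
    for entry in catalog:
        haystack = f"{entry.table} {entry.description}".lower()
        hits = sum(1 for term in terms if term in haystack)
        if hits:
            scored.append((hits, entry.query_count, entry))
    # Sort on the two numeric fields only, highest first.
    scored.sort(key=lambda item: (item[0], item[1]), reverse=True)
    return [entry for _, _, entry in scored]


for entry in search(CATALOG, "daily revenue"):
    print(entry.database, entry.table, entry.query_count)
```

Here both `revenue_daily` tables match the search equally well, so the heavily used one in `warehouse` is listed ahead of the rarely used copy in `legacy_dw`.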
The self-service facet of data access platforms and certain integration hubs is reinforced by their ability to simplify the seemingly esoteric code writing process and make it as easy as issuing a search on a Web-based search engine. In addition to requiring knowledge of code (such as SQL in many cases), issuing queries requires knowledge of the underlying database that contains the relevant data. A platform such as Alation is able to significantly accelerate the query-writing process for laymen by not only providing the database that the query can run against, but also by offering an explanation for why such a database is the best choice (the latter incorporates aspects of Machine Learning, and such explanations for query results are becoming much more common with advanced analytics and self-service options). It also assists with writing the query by transforming the natural language in which business users think and communicate into code. As Sangani noted:
“The analogy is sort of Google where if you think about writing a query, the notion of writing a query is similar to search. If you write the word San Francisco you might be prompted to subsequent terms like dining or attractions or restaurant. In the same way that Google does that prompting, we do that prompting but in tremendously complicated SQL statements where you need to know a lot of information to write a SQL query.”
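The search-style prompting in this analogy can be sketched as prefix completion over catalogue names, ranked by how often each name has been used before. The table names, the popularity counts, and the `suggest` function are hypothetical and only illustrate the general idea, not how any specific platform generates its prompts.

```python
# Hypothetical popularity counts: table name -> appearances in past queries.
POPULARITY = {
    "sales.orders": 120,
    "sales.order_items": 45,
    "sales.returns": 10,
    "crm.customers": 80,
}


def suggest(prefix, popularity, limit=3):
    """Return up to `limit` names starting with `prefix`, most used first."""
    prefix = prefix.lower()
    matches = [name for name in popularity if name.lower().startswith(prefix)]
    # Most popular first; alphabetical order breaks ties.
    matches.sort(key=lambda name: (-popularity[name], name))
    return matches[:limit]


print(suggest("sales.o", POPULARITY))
print(suggest("c", POPULARITY))
```

As the user types more of a name the candidate list narrows, much as a search engine narrows its suggested terms.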
Access Platform Wins
Data access platforms, and certain data hubs with capabilities similar to those discussed in this article, are able to effectively democratize employees’ knowledge of data by using Machine Learning to surface how data sets were used in the past. They provide contextual information about data, which can make data’s impact on decision making more effective, and are able to greatly reduce the complexity associated with using data across different sources in a timely fashion. In fact, the ability to access data across a broader range of resources much faster than was possible in conventionally centralized, IT-led BI and analytics paradigms is the chief benefit of this technology.
The other, unequivocally, is cost—especially as related to the entire querying process. Sangani commented on this reality:
“If you can figure out a way to understand and explain the knowledge [of data] without having to go through some massive, human led documentation exercise, then you can really decrease the cost of asking a question.”
The overall importance of the varying means to integrate and access data, especially through dedicated access platforms, is that they demystify a number of critical facets of utilizing data-driven processes. By simplifying various facets of code writing, data source analysis, information cataloging, and data contextualization, such platforms make self-service BI tools more powerful while furthering the self-service movement in general. Without such options, Sangani cautioned, organizations are much more likely to become mired in a process that can now be automated and expedited:
“We all kind of assume that all of these data tools enable people to find things faster. But when you’ve got 100 different tools in the organization one, the answer may already exist in plain sight but you just can’t find it. Two, even when you get the answer you have to learn whether or not it’s right because the definition of revenue in the report may be not the one that you need to answer the question that your boss is asking. All of these kinds of knowledge management questions are tribal knowledge. And what we essentially have observed is look, tribal knowledge is where you spend 90 percent of your time.”