By Kelly Stirman.
Roger Magoulas introduced us to the term Big Data back in 2005. Never did we imagine that data would grow in volume so fast that the term itself would become almost irrelevant and Big Data would become just ‘data’. Big Data also brought with it concepts like the four V’s of Big Data, as well as the Data Lake: several sources of data available in the same repository. For many organizations, however, this concept rapidly became an Enterprise Data Swamp. The reason is that, by nature, Data Lakes accept any type of data; without cataloging, curation, governance, security, and proper oversight, they turn into costly data chaos, adding to the $3 trillion that bad data costs the US each year.
How Did We Get Here?
Back in the early days of modern Data Analytics, Data Warehousing was introduced as a new concept. Only a very small number of people had access to enterprise data. Those who did were data engineers and architects, who would place data into formats that made it somewhat accessible through complex SQL interfaces. This was a compelling method; SQL was, and still is, a strong language. However, the complexity of this scenario hardly made it the most user-friendly solution.
In an attempt to solve the issue, many companies took a step toward developing data abstraction layers that would make data more accessible to its consumers. These abstraction layers sit on top of the data and not only shield users from very complex query languages and interfaces but also give them the opportunity to make sense of the data in terms of facts and dimensions.
Next Stage of Data Infrastructure
As we moved into the self-service age, we came across some unplanned results. Abstraction and semantic layers became an almost mandatory element of Data Warehousing architectures as business users sought more control over their Analytics. The time invested in creating semantics paid off and made everyone want one. However, since every BI tool on the market needed different semantics to understand the data, the investment had to be made multiple times. To alleviate the issue, many data consumers started developing their own versions of these abstraction layers, using their own terminology and business rules. The result was chaos, since each of these abstractions would provide a different answer to the same queries.
Traditional approaches to accessing, curating, accelerating, and securing data are costly and complex. Let’s take a look at the key needs that a Data-as-a-Service solution should address to become an integral part of a robust Data Architecture.
Data Cataloging provides a unified view of the data while allowing users to increase their trust in it. Performing Analytics on a Data-as-a-Service platform that offers data cataloging capabilities allows business users to quickly find the data they need to analyze. Users can search for data, or find trusted datasets prepared by others, and get on with their reporting or decision-making process much more quickly than waiting for someone in IT to point them in the right direction.
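To make the idea concrete, here is a minimal sketch of what a searchable catalog of tagged, steward-approved datasets could look like. All class names, fields, and datasets are illustrative assumptions, not the API of any real Data-as-a-Service product.

```python
# Hypothetical dataset catalog: business users search by keyword and can
# filter to datasets a data steward has marked as trusted, instead of
# waiting for IT to point them at the right source.
from dataclasses import dataclass, field

@dataclass
class CatalogEntry:
    name: str
    owner: str
    tags: set = field(default_factory=set)
    trusted: bool = False  # curated/approved by a data steward

class DataCatalog:
    def __init__(self):
        self._entries = []

    def register(self, entry: CatalogEntry):
        self._entries.append(entry)

    def search(self, keyword: str, trusted_only: bool = False):
        keyword = keyword.lower()
        return [
            e for e in self._entries
            if (keyword in e.name.lower()
                or keyword in {t.lower() for t in e.tags})
            and (e.trusted or not trusted_only)
        ]

catalog = DataCatalog()
catalog.register(CatalogEntry("sales_2023", "finance", {"revenue", "sales"}, trusted=True))
catalog.register(CatalogEntry("raw_clickstream", "web", {"events"}))

hits = catalog.search("sales", trusted_only=True)
print([e.name for e in hits])  # ['sales_2023']
```

A real catalog would add lineage, descriptions, and access requests on top of this basic search-and-trust mechanism.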
Curation helps make data more useful for the business. Different systems generate different formats of data that need to be stored in a variety of repositories. Data Curation allows users to organize the data in a way that is appropriate for their own needs. It also integrates all these different sources into blended datasets that are more valuable than their separate parts.
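As a toy example of that blending step, the snippet below joins records from two hypothetical systems that name their keys differently, producing one dataset that neither source could provide alone. The field names and values are made up for illustration.

```python
# Blending two sources with inconsistent schemas: a CRM export keyed by
# "customer_id" and a billing system keyed by "cust". Curation here means
# normalizing the join key and combining them into one useful dataset.
crm = [
    {"customer_id": 1, "name": "Acme Corp"},
    {"customer_id": 2, "name": "Globex"},
]
billing = [
    {"cust": 1, "total": 1200.0},
    {"cust": 2, "total": 450.0},
]

# Index billing totals by the normalized key, then blend.
totals = {row["cust"]: row["total"] for row in billing}
blended = [
    {"customer_id": c["customer_id"], "name": c["name"],
     "total_billed": totals.get(c["customer_id"], 0.0)}
    for c in crm
]
print(blended[0])  # {'customer_id': 1, 'name': 'Acme Corp', 'total_billed': 1200.0}
```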
Time is a valuable resource. In Data Analytics, every second we spend in the decision-making process can translate into exponential costs down the road. This is why a key element of Data-as-a-Service technologies is data acceleration. Faster time to insight is possible when the platform provides optimized access that removes costly and complex computing resource overhead.
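One common acceleration technique is materializing an expensive aggregate once so that repeated queries are served from the precomputed result rather than rescanning raw data. The sketch below illustrates the idea with made-up data; real platforms do this at the storage and query-engine level.

```python
# Toy sketch of data acceleration via a materialized aggregate: compute
# the per-region totals once, then answer queries from the cached result
# instead of repeating the full scan. All names are illustrative.
raw_events = [{"region": r, "amount": a}
              for r, a in [("us", 10.0), ("eu", 5.0), ("us", 7.5)]]

def scan_aggregate(events):
    """Full scan: the slow path a query would otherwise repeat."""
    totals = {}
    for e in events:
        totals[e["region"]] = totals.get(e["region"], 0.0) + e["amount"]
    return totals

materialized = scan_aggregate(raw_events)  # computed once, reused many times

def revenue_for(region):
    return materialized.get(region, 0.0)  # fast lookup, no rescan

print(revenue_for("us"))  # 17.5
```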
Data insights are valuable; they are the foundation of any decision-making process. Access to and manipulation of these insights by the wrong parties can cause a great deal of damage. By providing fine-grained security and governance, Data-as-a-Service technologies can guarantee that decisions are based on trustworthy and well-managed data.
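"Fine-grained" typically means controls below the level of whole tables, such as column-level masking by role. Here is a minimal sketch of that idea; the roles, fields, and policy are invented for illustration and stand in for the governance rules a real platform would enforce.

```python
# Illustrative column-level security: the same dataset yields only the
# fields a given role is allowed to see. Roles and fields are hypothetical.
ROLE_VISIBLE_FIELDS = {
    "analyst": {"region", "revenue"},             # aggregate-level fields only
    "finance": {"region", "revenue", "account"},  # may also see account IDs
}

def apply_policy(rows, role):
    allowed = ROLE_VISIBLE_FIELDS.get(role, set())
    return [{k: v for k, v in row.items() if k in allowed} for row in rows]

data = [{"region": "EMEA", "revenue": 9100.0, "account": "ACCT-0042"}]
print(apply_policy(data, "analyst"))  # [{'region': 'EMEA', 'revenue': 9100.0}]
```

An unknown role gets an empty field set, so the default is deny rather than allow.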
Data-as-a-Service allows business users to self-serve within a framework, performing the analytics they want using the tools they are already familiar with, such as BI and Data Science tools. It lets business users focus on gaining value from their data while giving data engineers the opportunity to focus on maintaining a proper data platform. While the solution seems simple, it is important for enterprises to pick a tool that takes care of everything needed to keep a Data Lake from turning into a Data Swamp.