by John Joseph
I spoke last year with leading data analytics and business intelligence (BI) analysts from Gartner and Forrester, and the verdict was unanimous: they all hated the term “Big Data”. The reason is that the name implies this class of problems is only about high data volume, but that’s not the case. Most Big Data projects were being done on smaller volumes of data, but involved data that was typically not stored in a traditional data warehouse. So while a Big Data challenge, solution, or situation can be about big data volume, it’s even more likely to involve a great variety of data types, data that is rapidly changing and/or moving at high velocity, or some combination of these three traits (volume, variety, velocity).
In fact, most of our work with clients across industries like telecommunications, energy, financial auditing, and manufacturing leads us to believe a more appropriate trifecta of traits is volume, variety, and volatility, because changing data is such a common occurrence. In our experience, which was confirmed by input from those Gartner and Forrester analysts, most Big Data projects today focus on solving the variety and volatility problems and less so on the volume dimension.
That’s not to say volume is a simple thing to deal with, especially when trying to process vast amounts of data for analytics and true real-time insights. But the reality is that greater volume has already been addressed to a large degree, and a more pressing concern today is making sure analytic models can account for business complexity and the rate at which data and business situations change.
First, let’s look at variety, which can be interpreted a couple of different ways. For energy utilities, for example, the sheer number of different types of data is very high because so many different types of devices and sensors make up the grid. But an even bigger challenge in Big Data environments like this is that it is extremely hard to join all of these different data types together in one unified analytic. Not only are meters pumping out large amounts of data, but there are many different meter types, each with characteristics that complicate the analytics. Data sources are also quite often separated by different systems in different departments, each with its own processes, and unifying fractured data stores can compound Big Data complexity problems immensely. No matter where you look, we live in a heterogeneous data world. There is no escape.
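To make the variety problem concrete, here is a minimal sketch of unifying two departmental data sources that describe the same meters in different formats. The schemas, meter IDs, and values are all illustrative assumptions, not data from any real utility:

```python
import csv
import io
import json

# Hypothetical example: one department exports meter readings as CSV,
# another keeps meter metadata as JSON. Both formats and fields are
# invented for illustration.
csv_data = "meter_id,kwh\nM1,42.0\nM2,17.5\n"
json_data = '[{"id": "M1", "region": "north"}, {"id": "M2", "region": "south"}]'

# Normalize each source into a dictionary keyed by meter ID.
readings = {row["meter_id"]: float(row["kwh"])
            for row in csv.DictReader(io.StringIO(csv_data))}
regions = {rec["id"]: rec["region"] for rec in json.loads(json_data)}

# Join the two sources into one unified view suitable for analysis.
unified = [{"meter_id": mid, "kwh": kwh, "region": regions.get(mid)}
           for mid, kwh in readings.items()]
print(unified)
```

Even this toy case needs a normalization step before the join; multiply that by dozens of meter types and departmental systems and the cost of variety becomes clear.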
Now let’s consider volatility. I think volatility is a more appropriate trait to describe the challenges of Big Data analytics because it puts the emphasis on how quickly the data changes. In Big Data environments, it is common for data to change almost constantly, and if this is not accounted for, any analytic results may be invalid the moment they are produced. This is especially true in industries where real-time intelligence is key, such as the stock market, or for a telecom company where call data records remain relevant for only one day.
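One simple way to account for volatility is to discard records that have outlived their relevance window before running any analysis. The sketch below uses the telecom example above; the timestamps and the one-day window are illustrative assumptions:

```python
from datetime import datetime, timedelta

# Hypothetical call data records (CDRs) with invented timestamps.
now = datetime(2024, 1, 2, 12, 0)
records = [
    {"caller": "A", "ts": datetime(2024, 1, 2, 9, 0)},    # within one day
    {"caller": "B", "ts": datetime(2023, 12, 30, 8, 0)},  # already stale
]

# Keep only records still inside the relevance window, so analytic
# results are not skewed by data that has effectively expired.
window = timedelta(days=1)
fresh = [r for r in records if now - r["ts"] <= window]
print([r["caller"] for r in fresh])
```

The harder part in practice is that the window itself varies by business context, which is exactly why analytic models have to be built with change in mind.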
Solving both the variety and volatility challenges requires faster, more agile analytics/BI approaches, such as those represented by today’s data discovery tools. The path to greater flexibility can’t go exclusively through a data warehouse, where the procedures and processes used to manage the data are slow moving and costly. That makes it too expensive to integrate dissimilar data, and it’s even worse if that data changes rapidly: by the time you could integrate it, the data would be too stale to matter. Data analysis software, including data visualization tools, has been helping companies make data exploration and unification a much faster process by being completely agnostic to data format and location. Ultimately, taming variety and volatility is key to making the most of Big Data analytics.