With all the hullabaloo around Big Data, I’ve been a little surprised that there hasn’t been more discussion of how to actually consume the vast petabytes everyone keeps mentioning…until I realized that there are really two Big Data problems out there!
Roughly speaking, the two primary ways in which data scales are by adding depth and by adding breadth. The first is what most people mean when they refer to Big Data. Want to run analytics on every single transaction that Wal*Mart has done over 10 years to analyze trends? THAT is vertical scale. Technically, you can characterize it as having lots and lots of similarly structured data. That is where technologies like Hadoop and column-based data storage make a big difference.
Horizontal Big Data, on the other hand, is like the Linked Data Cloud. It has all kinds of random information that ranges from highly structured and numeric to highly unstructured. Significantly, it tends to change quite a bit over time with increasing heterogeneity. That’s a completely different kind of scale, and one that is not well solved by using highly structured, vertically scaling technologies.
With Horizontal Big Data (maybe HBD will start catching on!), the problem isn’t how to crunch lots of data fast. Instead, it’s how to rapidly define a working subset of information to help solve a specific need. The really interesting and hard part is that 100 different people will require 101 different slices. Companies see this all the time when individuals in different departments, or across firewalls, need to share some of the information they’re working on, but not all. I need a little bit of finance, a little bit of research, and, "oh yeah, this trend information that I found on Wikipedia." Data marts were supposed to solve this specific kind of problem, but they are impossible to maintain because of their overly restrictive data modeling requirements. You end up with isolated silos, no bridges between them, and knowledge workers who are unsatisfied with their access to data.
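To make the idea of a "slice" a bit more concrete, here is a minimal sketch in Python. It is only an illustration under some loud assumptions: the `Statement` shape, the source labels, and every value in the sample data are invented for this example and do not come from any real system or Semantic Web library. The point is simply that when facts from different sources share one loose shape, each person can describe the subset they need without anyone remodeling a schema first.

```python
from typing import Callable, Dict, Iterable, List, NamedTuple, Set


class Statement(NamedTuple):
    """One loosely structured fact, tagged with where it came from."""
    source: str      # hypothetical source label, e.g. "finance"
    subject: str
    predicate: str
    value: object    # numbers, strings, whatever the source provides


# Invented sample data standing in for heterogeneous sources.
STATEMENTS = [
    Statement("finance", "product:widget", "quarterly_revenue", 1200000),
    Statement("finance", "product:widget", "unit_cost", 4.25),
    Statement("research", "product:widget", "customer_sentiment", "positive"),
    Statement("research", "product:gadget", "customer_sentiment", "mixed"),
    Statement("wikipedia", "product:widget", "market_trend", "growing"),
    Statement("hr", "employee:42", "salary", 90000),
]


def make_slice(wanted: Dict[str, Set[str]]) -> Callable[[Statement], bool]:
    """Build a filter from a mapping of source -> predicates of interest."""
    def keep(statement: Statement) -> bool:
        return statement.predicate in wanted.get(statement.source, set())
    return keep


def working_subset(statements: Iterable[Statement],
                   keep: Callable[[Statement], bool]) -> List[Statement]:
    """Return only the statements that belong to the requested slice."""
    return [s for s in statements if keep(s)]


# "A little bit of finance, a little bit of research, and this trend
# information that I found on Wikipedia."
analyst_slice = make_slice({
    "finance": {"quarterly_revenue"},
    "research": {"customer_sentiment"},
    "wikipedia": {"market_trend"},
})

subset = working_subset(STATEMENTS, analyst_slice)
for s in subset:
    print(s.source, s.subject, s.predicate, s.value)
```

A different person would pass a different mapping to `make_slice` and get a different working subset from the same pool of statements, which is the 101st slice in miniature. A real deployment would more plausibly express this as a query over a graph store than as an in-memory filter.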
Said another way, while we’ve made great strides in solving the Vertical Big Data problem, we have done very little to solve the Horizontal Big Data problem.
I don’t have to tell regular readers of SemanticWeb.com why Semantic Web Technology is key to solving the problems brought on by this other axis of data scale, with the current alternative being never-ending, one-off ETL products using relational technology. I will say, however, that this seems like a great marketing opportunity for our community in general that we’re not taking advantage of.
The Big Data problem is not fully solved by vertical scale technologies. We have a role to play.