How Big is Big Data?

May 6, 2011

by Angela Guess

Philip Howard discusses some misconceptions about what constitutes Big Data in a recent article: “There is a lot of confusion around ‘big data’. People naturally assume that big data means lots of data. Which is true. But it isn’t any old data or, at least it doesn’t have to be. The reason I bring this up is because just last week I heard about a company investigating the possibility of using Hadoop to store and support the analysis of several years’ worth of transactional data. Now, it is possible to think of reasons to use Hadoop for this purpose: it might be cheaper or you might prefer Java programmers to SQL developers but this is not the sort of environment where Hadoop would naturally spring to mind as an application. Moreover, I don’t care how large your organisation is, you won’t need huge quantities of disk for a few years of transactions. This isn’t, relatively speaking, ‘big’ data, it’s actually pretty small data but if you are used to storing only 3 months’ worth of data then maybe it looks big.”

Howard continues, “So we need to be clear about what we mean by big. Generally speaking we are at least talking about hundreds of terabytes and more often petabytes. The next thing to think about big data is what sort of data it is. Hadoop and MapReduce are particularly useful when it comes to analysing semi-structured and unstructured data, while traditional data warehouses, using traditional analytic techniques, are not. On the other hand, you can do things with structured data in a conventional data warehouse that would be much more difficult to do using MapReduce. So there is a good case for treating Hadoop and MapReduce on the one hand and data warehousing on the other, as complementary. However, if you are going to start looking at using Hadoop for transactional data then they become competitive, which is something else entirely.”

Read more here.

Creative Commons License photo credit: Lauren Manning
