Size Doesn’t Matter. Or Does It? A Rant on Big Data Terms


By Karen Lopez


Last week at Enterprise Data World, I gave a lightning talk (a strictly enforced, time-limited, five-minute presentation).  Since I can barely manage to fit meaningful presentations in one hour given how much I talk, I decided to go for a rant.  That may not come as a surprise to you.  If it does, welcome, new reader.

Last year I also gave a lightning talk on the Myths of Normalization, which I’ve turned into a blog series here, too.  That was also a rant.  I’m seeing a pattern here.  You should probably envision me holding a martini and my tablet while I ranted.  It will put you in the mood. I didn’t actually hold a martini in my hand.  It was on the table beside me.

Since I got good feedback, I decided to share my script for my talk here. By the way, those of you snickering about this rant, know that I’m working on a similar one for the RDBMS area as we speak.  Look for it at a NoSQL or Big Data event near you.

Size Doesn’t Matter.  Or Does It?

I’m @Datachick. I think a lot about data.  Today I’m on a rant.  I know.  SHOCKING!!!

I’m a huge fan of Big Data and NoSQL. Really.  A really, really big fan.  Get it?  BIG DATA.  Today I want to share with you some of my more snarky observations about BIG DATA.  By the way, every single one of these rants is totally unfair, cherry picked and irreverent.  I know. It’s shocking.

Let’s start with the basics: What is Big Data?  I’m here to tell you that nobody really knows.  The good thing about Big Data is just that.  So it can be anything you want it to be. Really.  Just like that nice friendly woman who wanted you to buy her a drink last night.

Here’s a nice definition from Wikipedia, the ultimate source of knowledge for the human race.  But that’s a whole ‘nother rant.

In information technology, big data consists of data sets that grow so large that they become awkward to work with(1)

What the heck kinda definition is that?  Data that’s so big it’s awkward?  I can’t wait to be in that meeting with the CEO, CIO and friends.

Big Data vs. data

One of the things I noticed about Big Data is that it is always capitalized when it’s written.  I’m not sure why, because it really isn’t a proper noun.  We don’t capitalize DATA, so why should Big Data be?  I’m pretty sure that capitalization is a way to spot the birth of a silver bullet.  Remember that when the next big thing, HUGE DATA, then GINORMOUS DATA, is announced at next year’s EDW. I guess then big data will lose its title caps.


Hadoop is one of the many technologies that has come from the Big Data religion…er…movement…no…solutions.  The great thing about Hadoop is that everything that makes up Hadoop is named Hadoop.  Really.  You can’t make this stuff up.

What do you think the mascot is named?  Yep. You got it.  Not Harvey or Harry, but Hadoop.  Isn’t it just like the new crowd to not worry about giving everything its own distinctive name?  Eventually everything becomes consistent.

Okay, not everything. Other Hadoop-related projects at Apache include:

Remember when technologies had names we could say in front of business people without making them think we were idiots? COBOL. BASIC. SQL Server.  Try saying your project is late because your elephant driver needs to be tuned to work with your pig and ZooKeeper.  I want to watch.


One of the great things about Big Data is that usually we don’t know ahead of time what data we are going to get or what questions we need to answer.  Yes, Big Data often means a design that is just a big heap of THINGS related to THINGS.  Makes data modeling easy.  Sort of.  Not really. See, the problem with this is that the schema or the design is embedded in with the “real” data.  So you can add new data in an instant. Often just as the data arrives.   Get ready to sprint your data designs. And by sprint, I mean model at the speed of light.  I hope you are in training now.  Also, be prepared to see autocorrect change schemaless to many different words than the one you meant. Go ahead.  But try it at home, not at work.
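For the data modelers in the room, here is a rough sketch of what "the schema is embedded in the data" looks like in practice.  It is only an illustration: the three-record feed and the observed_schema helper are made up for this post, not taken from any real product.  Each record names its own fields, so the only "model" you have is whatever has shown up so far.

```python
import json

# Hypothetical feed: every record carries its own field names.
feed = [
    '{"id": 1, "name": "Ada"}',
    '{"id": 2, "name": "Grace", "twitter": "@grace"}',
    '{"id": 3, "martini": {"olives": 2}}',
]


def observed_schema(lines):
    """Collect every field name seen so far, with the types of its values."""
    schema = {}
    for line in lines:
        record = json.loads(line)
        for field, value in record.items():
            schema.setdefault(field, set()).add(type(value).__name__)
    return schema


schema = observed_schema(feed)
for field in sorted(schema):
    # Nobody declared these fields up front; they simply arrived.
    print(field, sorted(schema[field]))
```

Notice that nothing stops record four from inventing a brand new field.  That is the flexibility, and that is also your sprint.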

Eventual Consistency

I used this term previously.  It’s okay, I’m being consistent.  Unlike most Big Data technologies.  See, there’s this concept of Eventual Consistency that says that data has controlled duplication across nodes.  Well, sorta controlled.  See, in the Big Data world, it’s okay that the query you run can produce different values than when I run it.  Eventually, at some point, we will get the same result.  Just like a broken clock is right twice a day.  Except in Europe.
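If you want to see that in slow motion, here is a toy sketch.  Big caveat: ToyStore, Replica and the sync step are invented for this illustration and do not reflect how any particular Big Data product replicates data.  A write lands on one node, the other copies lag behind, and everyone agrees only after the copies catch up.

```python
class Replica:
    """One copy of the data, with a version counter."""

    def __init__(self):
        self.value = None
        self.version = 0


class ToyStore:
    """Hypothetical store: writes hit one replica, sync() spreads them later."""

    def __init__(self, n):
        self.replicas = [Replica() for _ in range(n)]

    def write(self, value, node=0):
        replica = self.replicas[node]
        replica.version = max(r.version for r in self.replicas) + 1
        replica.value = value

    def read(self, node):
        return self.replicas[node].value

    def sync(self):
        # Copy the newest version to every replica.
        newest = max(self.replicas, key=lambda r: r.version)
        value, version = newest.value, newest.version
        for r in self.replicas:
            r.value, r.version = value, version


store = ToyStore(3)
store.write(100, node=0)   # opening balance
store.sync()               # all three copies agree
store.write(40, node=0)    # a cheque clears on node 0 only
fresh = store.read(0)      # the query you run
stale = store.read(2)      # the query I run, before the copies catch up
store.sync()               # "eventually"
converged = [store.read(i) for i in range(3)]
print(fresh, stale, converged)
```

Between the second write and the second sync, you and I are looking at two different balances.  That window is the "eventual" part.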

I don’t know about you, but I want to know that my version of my bank account balance is the same one that the bank is using to process my cheques.

And don’t even get me started on the people who say that “Eventually the customer will call and ask us to correct the data if it is important to him.”  Seriously?  What world does this guy live in? Talk about living in the clouds.

All I can say is Consistent my ASCII.


I’ve been snarky here.  But there really isn’t a reason to think that Big Data, NoSQL and the like are competitors of traditional database technologies.  We need to be using the right tools for the right job.  Size Doesn’t Matter.

Schemaless is perfect for designing data solutions where you don’t know or really care about perfect data integrity.  Think about getting a data feed from an external source where you have no control of what they send you. That flexibility works.

Eventual consistency is just fine for many applications. Who really cares whether everyone sees your Facebook update at the same time? Or whether your iTunes receipt shows up hours later? They don’t do exchanges or refunds anyway.

I suggest you read up on Big Data, attend some talks like the ones here, and find out what applications are using Hadoop and other non-relational technologies.  They will need data experts and you want to be ready when they need us.

And they involve data.  Love your data by using the right solutions.

(1) Wikipedia contributors. “Big data.” Wikipedia, The Free Encyclopedia. Wikipedia, The Free Encyclopedia, 1 May 2012. Web. 1 May 2012.

(2) “Welcome to Apache Hadoop.” 1 May 2012. Web. 1 May 2012.
