Hadoop or Relational – Wrong Question

by Gil Allouche

Big data is on the rise. More businesses than ever are implementing it, and cloud-based big data services make it even more accessible by giving businesses of all sizes affordable, quality access. This popularity and accessibility make big data attractive to many different businesses, which is a great thing.

Too often, however, people are quick to ditch current technology in favor of something new and relatively unknown. Big data can help almost any business that adopts it, but that doesn't mean it should automatically replace every system a company already uses. Big data excels at many things, but other systems do some things better.

Comparing Hadoop with relational databases (often called simply "SQL," after the language used to query them) is a good example of this. Business leaders often abandon SQL in favor of Hadoop without understanding the differences between the two. Conversely, a startup may adopt Hadoop thinking it will take care of all its needs, when it would be much better served by both Hadoop and SQL. The two have strengths that complement each other.

Hadoop

Unstructured Data

The most obvious difference between Hadoop and a relational database is the type of information they gather and analyze. Hadoop is best suited to unstructured or semi-structured data such as text, social media posts and web pages. Its MapReduce processing model converts this unstructured information into key-value pairs that can then be analyzed.
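To make the idea of "pairs" concrete, here is a minimal, single-machine sketch of the classic MapReduce word count in Python. It does not use Hadoop itself, and the function names (`map_phase`, `shuffle`, `reduce_phase`) are illustrative, not Hadoop APIs. The map step turns free text into (word, 1) key-value pairs, the shuffle groups pairs by key, and the reduce step sums each group.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Turn one line of unstructured text into (word, 1) key-value pairs."""
    for word in re.findall(r"[a-z']+", line.lower()):
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group values by key, as the framework does between map and reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Collapse all values for one key into a single result."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


posts = [
    "Loving the new phone!",
    "The new phone battery is great",
    "phone arrived late",
]
print(word_count(posts)["phone"])  # 3
```

In a real Hadoop job the map and reduce functions run on many machines and the framework handles the shuffle, but the per-record logic can stay this simple.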

Amount of Data

When dealing with extremely large data sets, Hadoop tends to be preferred over relational databases. It runs on clusters of commodity servers and scales by adding more machines, so once data volumes become extremely large it is both more cost-effective and more efficient than a relational database. There's a reason it's called big data.
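The scale argument rests on splitting a large data set into blocks that are processed independently and then merged. The sketch below mimics that pattern on one machine. The thread pool is only a stand-in for cluster nodes, and the block size and helper names (`split_into_blocks`, `count_block`, `distributed_count`) are hypothetical choices for illustration, not Hadoop settings or APIs.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Iterable, List


def split_into_blocks(records: List[str], block_size: int) -> List[List[str]]:
    """Cut the input into fixed-size blocks (HDFS splits files into blocks in a similar spirit)."""
    return [records[i:i + block_size] for i in range(0, len(records), block_size)]


def count_block(block: Iterable[str]) -> Counter:
    """Work done independently on one block: count words locally."""
    counts: Counter = Counter()
    for record in block:
        counts.update(record.lower().split())
    return counts


def distributed_count(records: List[str], block_size: int = 2, workers: int = 4) -> Counter:
    blocks = split_into_blocks(records, block_size)
    # Blocks are processed concurrently; threads stand in for separate machines.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(count_block, blocks))
    # Merge the partial results, the way reducers combine per-node output.
    total: Counter = Counter()
    for partial in partials:
        total.update(partial)
    return total


logs = ["GET /home", "GET /cart", "POST /cart", "GET /home", "GET /checkout"]
print(distributed_count(logs)["get"])  # 4
```

Because each block is handled without looking at the others, adding machines adds capacity, which is the property that makes this approach attractive for very large data sets.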

Flexibility

Hadoop also gives users more flexibility in the results they can get from data analysis and in how those results are produced. Hadoop processes information with MapReduce, and managed services such as Amazon Elastic MapReduce (EMR) make it possible to run that processing quickly and efficiently in the cloud. This works differently from a traditional SQL database: MapReduce is a general-purpose model in which the user writes the processing logic, which gives it the leeway to deliver different outcomes depending on a company's needs and preferences. As a result, Hadoop gives companies more flexibility not only in the information they gather, but also in the results they deliver.
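The flexibility comes from the framework fixing only the map, group and reduce pipeline while the user supplies the logic. In this hypothetical single-process sketch, `run_job` is an illustrative stand-in for a MapReduce engine (not a Hadoop API), and the same raw posts answer two different questions simply by swapping the mapper and reducer.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Iterable, List, Tuple

Mapper = Callable[[Any], Iterable[Tuple[Any, Any]]]
Reducer = Callable[[Any, List[Any]], Any]


def run_job(records: Iterable[Any], mapper: Mapper, reducer: Reducer) -> Dict[Any, Any]:
    """A toy MapReduce engine: apply the mapper, group by key, apply the reducer."""
    groups: Dict[Any, List[Any]] = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}


posts = [
    {"user": "ana", "text": "Trying #hadoop and #sql today"},
    {"user": "ben", "text": "#sql still wins for reports"},
    {"user": "ana", "text": "Big logs, so #hadoop it is"},
]

# Job 1: how often each hashtag appears.
hashtags = run_job(
    posts,
    mapper=lambda p: [(w.lower(), 1) for w in p["text"].split() if w.startswith("#")],
    reducer=lambda _key, values: sum(values),
)

# Job 2: average post length (in words) per user, from the same raw data.
avg_len = run_job(
    posts,
    mapper=lambda p: [(p["user"], len(p["text"].split()))],
    reducer=lambda _key, values: sum(values) / len(values),
)

print(hashtags)  # {'#hadoop': 2, '#sql': 2}
print(avg_len)   # {'ana': 5.5, 'ben': 5.0}
```

Nothing about the input had to be declared up front; each job decides how to interpret the raw text only when it runs.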

SQL

Structured Data

Relational databases are more efficient at processing structured data. Structured data includes things like names, dates and birthdays, and is stored in tables of rows and columns, much like a spreadsheet. Using a predetermined schema and with one end result in mind, SQL can quickly process and analyze this information. If companies are going to work mostly with structured data, there's no reason to abandon relational databases in favor of Hadoop. While Hadoop offers greater data-gathering abilities, it tends to be clunkier and less natural when working with structured data.
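For contrast, a relational workflow declares the schema before any data arrives and then asks a precise question. This sketch uses Python's built-in `sqlite3` module with an in-memory database; the `customers` table, its columns and its rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The schema is fixed up front: every row has the same columns.
conn.execute(
    """
    CREATE TABLE customers (
        id         INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        birth_date TEXT NOT NULL,   -- ISO 8601 date, e.g. '1985-04-12'
        city       TEXT NOT NULL
    )
    """
)
conn.executemany(
    "INSERT INTO customers (name, birth_date, city) VALUES (?, ?, ?)",
    [
        ("Ana Silva", "1985-04-12", "Lisbon"),
        ("Ben Okoro", "1992-11-03", "Lagos"),
        ("Chen Wei", "1979-07-21", "Lisbon"),
    ],
)

# One question, one answer: customers in Lisbon born before 1990.
rows = conn.execute(
    "SELECT name FROM customers WHERE city = ? AND birth_date < ? ORDER BY name",
    ("Lisbon", "1990-01-01"),
).fetchall()
print(rows)  # [('Ana Silva',), ('Chen Wei',)]
```

Storing dates as ISO 8601 strings keeps them comparable with a plain `<`, which is why the filter above works without any date parsing.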

Real-time

Relational databases are especially effective for real-time, interactive queries. Because the data is structured, indexed and usually smaller, SQL can return answers fast enough for real-time analysis. Much of what is done on Hadoop is batch processing geared toward high volume, so it tends to deliver results more slowly.
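Much of that speed comes from indexes, which let the database jump straight to matching rows instead of scanning the whole table. A minimal sketch, again with `sqlite3` and a hypothetical `orders` table: `EXPLAIN QUERY PLAN` reports whether SQLite will use the index, though the exact wording of that report varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_email TEXT, total REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer_email, total) VALUES (?, ?)",
    [(f"user{i}@example.com", i * 1.5) for i in range(10_000)],
)

# An index on the lookup column turns a full table scan into a direct search.
conn.execute("CREATE INDEX idx_orders_email ON orders (customer_email)")

query = "SELECT total FROM orders WHERE customer_email = ?"
plan = conn.execute("EXPLAIN QUERY PLAN " + query, ("user42@example.com",)).fetchall()
# The human-readable detail is the last column of each plan row.
plan_text = " ".join(str(row[-1]) for row in plan)
print(plan_text)  # mentions idx_orders_email; exact wording depends on the SQLite version

print(conn.execute(query, ("user42@example.com",)).fetchone())  # (63.0,)
```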

Specific

SQL is also the better fit when companies have structured information and are looking for a single, specific result. This is where SQL's analytical capabilities are best used. With relational databases, companies get the benefit of both real-time analytics and pinpoint answers. That specificity, however, is also one of the reasons people turn to Hadoop when they need more flexibility: with SQL, the user has to define the schema and the desired result before the analysis begins.

Together

Hadoop and SQL are best used together. Because of the nature of business today, most companies need both structured and unstructured data analytics. With both, you can efficiently get the results you need.
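A common way the two meet is that a Hadoop-style job distills unstructured text into a small, structured result, which is then loaded into a relational database and joined with existing business data. The sketch below compresses both halves into one Python script: a plain `Counter` stands in for the MapReduce job, `sqlite3` stands in for the relational side, and the product table, SKU pattern and review text are invented for illustration.

```python
import re
import sqlite3
from collections import Counter

# Unstructured side: free-text reviews (stand-in for data a Hadoop job would process).
reviews = [
    "The X100 camera is excellent",
    "Returned my Z20, battery died",
    "X100 again, second purchase",
]
known_skus = {"X100", "Z20", "Q7"}

# "Map/reduce" step: extract product mentions and count them.
mentions = Counter(
    token
    for text in reviews
    for token in re.findall(r"[A-Z]\d+", text)
    if token in known_skus
)

# Structured side: a relational table of products with a fixed schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (sku TEXT PRIMARY KEY, name TEXT, price REAL)")
conn.executemany(
    "INSERT INTO products VALUES (?, ?, ?)",
    [("X100", "Camera", 499.0), ("Z20", "Phone", 299.0), ("Q7", "Tablet", 199.0)],
)

# Load the distilled result into the database and join it with the business data.
conn.execute("CREATE TABLE mention_counts (sku TEXT PRIMARY KEY, n INTEGER)")
conn.executemany("INSERT INTO mention_counts VALUES (?, ?)", mentions.items())

report = conn.execute(
    """
    SELECT p.name, p.price, COALESCE(m.n, 0) AS mentions
    FROM products AS p LEFT JOIN mention_counts AS m ON m.sku = p.sku
    ORDER BY mentions DESC, p.name
    """
).fetchall()
print(report)  # [('Camera', 499.0, 2), ('Phone', 299.0, 1), ('Tablet', 199.0, 0)]
```

The `LEFT JOIN` with `COALESCE` keeps products that were never mentioned, so the relational side still answers questions about the full catalog.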


  2 comments for “Hadoop or Relational – Wrong Question”

  1. PJ
    July 2, 2014 at 12:14 pm

    Fairly simplistic, but it is true that most companies will have both. However, even unstructured data has structure to it, and structure usually has to be applied to get anything meaningful out of it. Additionally, Hadoop can also do relational work through tools such as Hive, Impala and Drill. Furthermore, there will likely be a need to correlate data from old-school relational and new-school Hadoop, so the data is going to have to end up somewhere. Companies need to think much more broadly than just whether they should be using Hadoop or relational, and have a proper data strategy that delivers the value they need.

    • Gil Allouche
      July 2, 2014 at 1:36 pm

      PJ, I completely agree with your last comment. I wrote this post mainly as a response to the articles proclaiming the death of the relational database or a battle between relational and Hadoop. There are a lot of false assumptions out there about what Hadoop can or cannot do, so the emphasis on choosing between the two definitely misses the mark.
