
What is Hadoop, and What Can It Do?

February 26, 2015

by Angela Guess

Matt Asay recently wrote in ReadWrite, “‘What is Hadoop?’ is a simple question with a not-so-simple answer. Using it successfully is even more complex. Even its creator Doug Cutting offered an accurate-but-unsatisfying “it depends” response when I asked him last week at the Strata+Hadoop World conference to define Hadoop. He wasn’t being coy. Despite serving as the poster child for Big Data, Hadoop has grown into a complicated ecosystem of complementary and sometimes competitive projects. Which is precisely what makes it so interesting and powerful. As Cutting went on to tell me, Hadoop can fill a myriad of different roles within an enterprise. The trick to getting real value from it, however, is to start with just one.”

Asay goes on, “Hadoop, avers Cutting, is much like Linux. ‘Linux, properly speaking, is the kernel and nothing more,’ he notes, channeling his inner Richard Stallman. ‘But more generally, it’s an ecosystem of projects. Hadoop is like that.’ But this wasn’t always the case. Hadoop started as a new way to process (MapReduce) and store (Hadoop Distributed File System, or HDFS) data. Ten years later, Hadoop has become a motley assembly of oddly-named projects, including Pig, Hive, YARN, HBase, and more. Already a far-ranging ecosystem, Hadoop is the largest galaxy in an ever-growing universe of Big Data (though people often use Hadoop to mean Big Data). Ultimately, says Cutting, Hadoop expresses a certain ‘style’ of thinking about data, one that centers on scalability (commodity hardware, open source, distributed reliability) and agility (no need to transform data to a common schema on load but rather load it and then improvise on schema as you go along).”
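The “schema on read” agility Cutting describes can be sketched in plain Python: raw records are stored exactly as they arrive, and a schema is applied only at read time, not on load. This is a minimal illustrative sketch, not Hadoop code; the field names, sample records, and the `read_with_schema` helper are hypothetical.

```python
import json

# Raw, schema-less records as they might land in storage (hypothetical sample).
raw_lines = [
    '{"user": "alice", "bytes": "1024"}',
    '{"user": "bob", "bytes": "2048", "region": "eu"}',
]

def read_with_schema(lines, schema):
    """Apply a schema at read time: pick fields, coerce types, default missing values."""
    for line in lines:
        record = json.loads(line)
        yield {field: cast(record.get(field, default))
               for field, (cast, default) in schema.items()}

# The schema is decided now, at query time -- not when the data was written.
schema = {"user": (str, ""), "bytes": (int, 0), "region": (str, "unknown")}
rows = list(read_with_schema(raw_lines, schema))
```

Because the schema lives with the query rather than the storage, a different consumer can read the same raw lines tomorrow with a different schema, which is the “load it and then improvise” style the quote refers to.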

Read more here.

