by Angela Guess
Peter Wayner recently wrote a white paper on how Hadoop improves Big Data processing. The paper begins, “It’s been a big year for Apache Hadoop, the open source project that helps you split your workload among a rack of computers. The buzzword is now well known to your boss but still just a vague and hazy concept for your boss’s boss. That puts it in the sweet spot when there’s plenty of room for experimentation. The list of companies using Hadoop in production work grows longer each day, and it probably won’t be long before ‘Hadoop cluster’ takes over the role that the words ‘crazy supercomputer’ used to play in thriller movies. The next version of the WOPR is bound to run Hadoop.”
Wayner continues, “The area is flourishing as the core project attracts a wide collection of helper projects that organize the workload and make it simpler to manage a collection of jobs to run at particular times. There’s HDFS, a standard file system that can organize the data spread out around the cluster; Hive, a data warehousing layer for making sense of this data; Mahout, a collection of routines for trying to learn something from said data; and ZooKeeper, a tool for keeping all of the balls in the air. At least a half-dozen or more other open source tools live in a stable orbit around Hadoop.”
photo credit: Hadoop