The Power and Limits of Hadoop


by Angela Guess

Gregory Mone has written an insightful look beyond Hadoop for Communications of the ACM. Mone begins, “Pandora will not discuss exactly how much data it churns through daily, but head of playlist engineering Eric Bieschke says the company has at least 20 billion thumb ratings. Once every 24 hours, Pandora adds the last day’s data to its historical pool—not just thumbs, but information on skipped songs and more—and runs a series of machine learning, collaborative filtering, and collective intelligence tasks to ensure it makes even smarter suggestions for its users. A decade ago this would have been prohibitively expensive. Four years ago, though, Bieschke says Pandora began running these tasks in Apache Hadoop, an open source software system that processes enormous datasets across clusters of cheap computers. ‘Hadoop is cost efficient, but more than that, it makes it possible to do super large-scale machine learning,’ he says. Pandora’s working dataset will only grow, and Hadoop is also designed for expansion. ‘It’s so much easier to scale. We can literally just buy a bunch of commodity hardware and add it to the cluster’.”

Mone continues, “Bieschke is hardly alone in his endorsement. In just a few years, Hadoop has grown into the system of choice for engineers analyzing big data in fields as diverse as finance, marketing, and bioinformatics. At the same time, the changing nature of data itself, along with a desire for faster feedback, has sparked demand for new approaches, including tools that can deliver ad hoc, real-time processing, and the ability to parse the interconnected data flooding out of social networks and mobile devices. ‘Hadoop is going to have to evolve,’ says Mike Miller, chief scientist at Cloudant, a cloud database service based in Boston, MA. ‘It’s very clear that there is a need for other tools.’ Indeed, inside and outside the Hadoop ecosystem, that evolution is already well under way.”

Read more here.

photo credit: Pandora
