
What Does the Future of Apache Spark Look Like?

By Jonathan Buckley  /  December 16, 2015

Learn more about Jonathan Buckley.

Businesses have been buzzing about what Apache Spark could bring to their respective industries. Perhaps this shouldn’t be a total surprise considering the open source nature of Spark, one of the most active of all Apache open source projects. Because it is open source, Spark is constantly being updated and improved as businesses tailor it to current and future needs. Even so, fads in the tech community come and go, so the excitement surrounding Apache Spark deserves a bit of skepticism along with the optimism. After all, how certain are we that Spark will be around in a decade, or even in a few years? Will something better come along? Predicting the future is a tricky task for anyone, but a quick look at what Apache Spark brings to the table suggests that it does have staying power. If anything, the future looks bright for Spark, and there are many reasons why.

Perhaps the most significant reason Apache Spark is likely to have sustained success for many years to come is the way so many tech companies have embraced the technology. These companies have recognized the potential of Spark and have quickly worked to support it. A short list of these companies includes Intel, ZoomData, Qubole, Cloudera, Altiscale, and more. And let’s not forget IBM’s own contributions to the developing technology. In June of this year, IBM announced it would invest $300 million over a period of several years to help develop Apache Spark, placing some of its own data and programming experts (3,500 of them, to be more precise) on the open source project. If that weren’t significant enough, in October of this year, IBM announced even further support for Spark. In the announcement, IBM reaffirmed its commitment to Apache Spark and its goal of helping to advance the technology. This included providing Spark-as-a-Service on its own platforms and its intention to rewrite some IBM applications to better incorporate Spark.

The true significance of IBM’s announcements and of other companies pledging their support is that Spark now has solid legs to stand on. Without the dedication of the companies involved, Apache Spark would have serious trouble catching on. It’s important to note that Spark’s success and likely future growth are not so much a product of good marketing as of its own usefulness as a big data tool. In fact, Spark’s many useful attributes have made it an almost indispensable tool for any company hoping to use big data to improve its operations, reach new customers, and discover new insights. Big data analytics is becoming central to how businesses operate, so a tool designed to take advantage of that is likely to have a lot of success. Many experts are even saying that Apache Spark is the future of enterprise data, and that if businesses truly want to unlock big data’s potential, they’ll need Spark to do so. With so much revolving around big data these days, it only makes sense to gravitate toward a tool designed to make the most of it.

Spark’s future is a bright one in part because it compares so favorably to other existing big data tools. For example, Apache Spark can run some workloads up to 100 times faster than Hadoop MapReduce when the data is held in memory, and it remains faster even when working from disk, conditions that would typically favor the older technology. Spark also has a dynamic relationship with Hadoop, one that has yet to be settled. While some experts believe that Spark will eventually replace Hadoop, others say the two actually work well together, complementing each other’s abilities. Some companies already offer this Hadoop-Spark combination, but more work needs to be done, as both technologies still need to define their roles and components before they can function together as a well-oiled machine. If anything is clear, it’s that the days of Hadoop working on its own are likely gone.

Due to its versatile nature, focus on big data insights, and support from many different businesses, Apache Spark looks set to achieve new levels of success in the years to come. The future of Spark is one of major proliferation, where businesses of many types and sizes use it for their own big data purposes. In fact, Apache Spark may become a must-have big data tool that’s available through cloud applications and built into other tools that businesses already use. In any case, Spark likely has years of success ahead of it.

About the author

Jonathan Buckley is a Silicon Valley serial entrepreneur with a career focus on bringing highly disruptive B2B technologies to market in the enterprise data and security-related spaces. With a background in econometric modeling and business strategy from Arthur Andersen LLC, Jonathan has led award-winning marketing teams at many notable companies, ranging from a co-founded IoT startup (since acquired) to a NASDAQ 100 networking company (since acquired) where revenues for his product grew from $60M to $222M per year under his leadership. After launching one of the world’s first enterprise cloud storage companies in 2007, Jonathan founded his own consultancy in 2008, The Artesian Network, LLC, specializing in bringing disruptive technologies to market using lean startup techniques. Qubole joined the roster of Artesian clients in early 2015, and later in the year Jonathan and members of his team joined Qubole full-time to devote their full attention to the company’s hypergrowth.
