Why Some of the Biggest Names in Big Data are Open Source

by Angela Guess

Hannah Augur recently wrote in Dataconomy, “The biggest names in data are open source. Many of them are even part of the same Apache family: Spark, Hadoop, Kafka, Cassandra. RapidMiner and Orange are there for data mining, and open source databases are chipping away at Oracle. Though closed source databases are still incredibly popular, open source alternatives are growing at rapid speed. It is very clear that, if they keep growing, those closed source databases won’t be big for much longer. Solid IT co-founder Matthias Gelbmann describes several database management systems in one blog post, noting that, ‘we often see, that once Redis is installed for caching, and people experience its speed and reliability, they start moving more and more functionality there.’ Redis, an open source database management system, has continued to grow despite the company, in their own words, having small business resources and no ‘intentional’ marketing.”

Augur goes on, “There are several reasons for the growth of these open source systems, one of which is the way it allows different people in different areas to effectively work together. When companies share their work and allow others to contribute, the result is outside eyes finding new holes and new possibilities. Deep learning technology owes a lot to big players like Google and Facebook, who actively give their data and resources back to the community. Technology appears to develop very quickly, but it is not an instantaneous process. If companies were to attempt to tackle big data software on their own, with no input or help from open-source softwares, it would be a painfully slow process. There is a serious need to keep up with the times, and big data is a rapidly growing field.”
