
The Open Source Analytics Invasion

By Jelani Harper  /  July 15, 2014

Open Source analytics have transformed the analytics space in Data Management in much the same way that Hadoop (another Open Source technology) has transformed Big Data.

Open Source analytics have almost single-handedly transfigured the way:

  • Commercial and proprietary Business Intelligence (BI) and analytics providers offer script support for, and integration options with, R.
  • Some commercial analytics and BI providers package their products (including their pricing models).
  • Predictive Analytics is becoming a mainstream technology for both business end users and Data Scientists.
  • Core tools such as BI and analytics are delivered to customers, with tiers that span open source and proprietary BI, advanced analytics, and conventional analytics platforms.

In its wake, the prevalence of Open Source analytics has left organizations with some key questions that may require a drastic reallocation of resources such as:

  • Is it better to dedicate finances to personnel and training or to technology?
  • Is it more advantageous to foster an environment conducive to innovation or one geared toward support and ease of use?
  • Is it necessary to give up commercial analytics functionality for the low licensing fees of open source options?
  • Is there any way to reap the benefits of both sides of all of the aforementioned scenarios and not sacrifice anything?

Perhaps. But there is no denying the impact of Open Source in general. With its modest licensing costs, extreme mutability, agility, and growing community support, it is having an effect not just in the data sphere but on IT as a whole. Just look at entities such as Apache, Linux, Android, and Mozilla, to name a few.

At the forefront of that impact is Open Source analytics' integral relationship with Big Data. According to David Smith, Revolution Analytics Chief Community Officer and co-author of An Introduction to R:

“We’ve had a Big Data revolution and a lot of that Revolution has been built on open source. And not just Hadoop but also data management platforms like Python and in particular open source R, which has really been a source of innovation for applying advanced analytics to Big Data sets for such a long time.”

Advanced Analytics

The viability of Open Source analytics is perhaps most visible in terms of advanced analytics and how it is wedded to Big Data technologies. Gartner’s Lisa Kart, Alexander Linden, and Gareth Herschel note that advanced analytics is responsible for the fact that, “Many organizations are moving beyond traditional BI reporting, descriptive analytics and diagnostic analytics to advanced analytics, such as predictive modeling, clustering, affinity analysis and optimization.”

Open Source analytics are at the forefront of the technologies used by the major players in the advanced analytics platform sphere, which includes vendors such as Revolution Analytics and RapidMiner. With a reliance on R, they may come closest to delivering all of the benefits raised in the questions above. In addition to giving Data Scientists the tools to design analytics that suit the needs of specific (Big Data) applications, products offered by Revolution Analytics, for example, illustrate the fact that, as noted by Gartner, “much of this analysis is predictive in nature, although elements of descriptive analytics are not uncommon.”

In addition to providing an array of customer support options, a robust community of users and an array of partnerships with proprietary BI and analytics providers, Revolution Analytics’ reliance on R facilitates an agility and degree of customization that (for a fraction of the price) surpasses that of commercial analytics and BI providers.

Integrating BI with R

This agility also explains the ubiquity of R among proprietary analytics and BI providers, and the widespread support for the language among eminent vendors such as SAP, Tableau, QlikView, MicroStrategy, and others. There are several points of interest in utilizing Open Source analytics through commercial vendors. Although users are still responsible for paying substantial licensing fees, they get greater out-of-the-box functionality from the commercial analytics capabilities of the product they have obtained. R is then used to augment that functionality, which can deliver more comprehensive utility sooner, with less need for difficult-to-find Data Scientists.

Users should be wary of the integration capabilities of commercial vendors with R; as is the case with all products, some are better than others at granting accessibility between the language and their own data streams. The more competitive offerings have a point-and-click GUI that produces R code and can embed R calculations within its various functions. Still, the fact that most vendors are extending support for R points to the reality that, as indicated by Forrester, “open source R is by far the most ubiquitous predictive analytics platform.”

Open Source BI

The trend towards Open Source analytics has also paved the way for Open Source BI options; some of the most salient include products from Jaspersoft and Pentaho, as well as offerings from Actuate and Jedox. Like most Open Source platforms, these tools are available in both free and licensed versions. The free versions come with limited functionality, which is greatly increased in the licensed versions and accompanied by substantial support. In addition to offering Agile environments that foster innovation, one of the principal boons of Open Source BI is that it enables users to embed analytics in applications where the data resides, which is a benefit of most Open Source technology and is particularly useful with Big Data initiatives. It is possible to embed analytics in applications with proprietary vendors, although there is typically greater freedom to do so with Open Source options.

Open Source Drawbacks and Fixes

Despite their comparatively low licensing fees, flexibility, and virtually unlimited range of uses, adoption rates for Open Source analytics and Business Intelligence options are somewhat hampered by their need for highly trained personnel (Data Scientists) and a lengthy time to production, especially for organizations that are just getting started with them. The customer support and wide range of features that proprietary analytics and BI offer are difficult for Open Source options to match (although Open Source options have recently increased their levels of support).

Cloud options for Open Source analytics, however, can provide correctives for a number of these issues. Several commercial vendors offer Cloud analytics services that incorporate R and Hadoop and not only reduce the costs associated with on-premise versions and the skill requirements for Open Source analytics, but also provide an amount of customer support that enterprises are comfortable with. An article from Gartner’s Daniel Yuen reveals that “Commercial business analytics vendors are trying to compete with open-source BI by offering their own low-cost solutions.”


Regardless of how Open Source analytics are accessed, their true importance transcends the facilitation of advanced and predictive analytics, low cost licensing fees, and an environment of agility and creativity. The real significance of this movement pertains to the way in which it is effectively restructuring the IT landscape—most importantly by reconfiguring the role of infrastructure—and solidifying the trends towards Big Data, the Cloud, and analytics.

More than anything else, Open Source analytics has revamped the enterprise’s conception of infrastructure, which primarily consists of hardware, applications, application platforms, and platform management personnel. Open Source analytics tools such as R can readily supply the applications for analytics, the Cloud can provide the hardware, and Open Source Hadoop supplies the platform for Big Data (and other data). That leaves only personnel requirements, with skill requirements varying widely depending on which Open Source analytics option is chosen.

Open Source analytics and BI have not overtaken commercial analytics and BI just yet. Rather, they are providing a critical means for reshaping how organizations implement infrastructure, and are aiding the increasing reliance on Big Data, predictive analytics, and the Cloud to support business needs.
