The Relevance of Open Source (Advanced) Analytics

By Jelani Harper

There are few limitations to the ways in which the confluence of Data Science and advanced analytics can aid the enterprise.

The predictive capabilities of the latter can enable forecasting in Business Intelligence tools, accurately presage the needs, services, and advertising products most sought by consumers, or power applications that proactively placate disgruntled customers by acting on social media data and close to real-time feedback.

Perhaps most significantly, a growing body of evidence indicates that the most meaningful way to access predictive analytics and enhance the reputation of Data Science is through open source analytics, which hinges largely on the free, open source programming language R. The benefits of open source analytics include:

  • Pricing: Because R is free, open source analytics vendors charge substantially lower licensing fees for their products (typically for added performance capabilities) or base their fees mainly on support services.
  • Flexibility: R is a programming language designed for innovation and flexibility, which makes it highly adaptable for integrating with other systems and building new capabilities.
  • Community: There are very few aspects of advanced analytics that are not shared and supported by the growing open source community of R users, including models, code, and best practices.

And, perhaps most importantly, the delivery of open source analytics from vendors such as Revolution Analytics (which was created approximately seven years ago to deliver R to the enterprise) is starting to alter the way conventional software vendors deliver their advanced analytics solutions. According to Revolution Analytics Chief Community Officer David Smith:

“Just about every proprietary software vendor out there has announced some kind of connection with open source R, really on the basis that their users want access to R on the platforms they’re using. To us this just means more validation of the utility of R in organizations, more people using R, and in particular for us as a company, more people coming across R’s limitations that we address with Revolution R Enterprise.”

Revolution R Enterprise

R is practically the de facto programming language of choice for Data Scientists. It is the language most frequently taught in university Data Science programs, and it provides a valuable point of commonality across analytics, Data Science, BI, Big Data, and app building. It gives Data Scientists virtually all of the capabilities they need to access, manage, visualize, and model data, and it adapts readily to a variety of platforms, warehouses, and data stores. Moreover, many innovations in Data Management methodologies are accompanied by R code, which makes the language a dynamic means of transforming the contemporary data landscape.

However, because R was designed primarily for research and development, and specifically for in-memory processing, its scalability and performance are limited, particularly for Big Data applications. Open source analytics products such as Revolution R Enterprise address these limitations with Big Data algorithms and multi-threaded processing that let analyses scale to massive quantities of data in close to real time.
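The product's own algorithms are not described here, but the general idea behind many scalable algorithms can be sketched: rather than loading a whole dataset into memory, compute small summary statistics one chunk at a time and combine them at the end. The following is a minimal Python sketch, assuming a simple linear regression with a single predictor; `PartialStats` and `read_chunks` are hypothetical names, and this is not Revolution R Enterprise's implementation.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, List, Tuple

Row = Tuple[float, float]  # one (x, y) observation


@dataclass
class PartialStats:
    """Running sums sufficient to fit y = intercept + slope * x (hypothetical helper)."""
    n: int = 0
    sx: float = 0.0
    sy: float = 0.0
    sxx: float = 0.0
    sxy: float = 0.0

    def update(self, chunk: Iterable[Row]) -> "PartialStats":
        # Fold one chunk of rows into the running sums; only the
        # current chunk ever needs to be held in memory.
        for x, y in chunk:
            self.n += 1
            self.sx += x
            self.sy += y
            self.sxx += x * x
            self.sxy += x * y
        return self

    def merge(self, other: "PartialStats") -> "PartialStats":
        # The sums are additive, so statistics computed by separate
        # workers (threads, processes, or nodes) combine exactly.
        return PartialStats(
            self.n + other.n,
            self.sx + other.sx,
            self.sy + other.sy,
            self.sxx + other.sxx,
            self.sxy + other.sxy,
        )

    def fit(self) -> Tuple[float, float]:
        # Closed-form ordinary least squares from the accumulated sums.
        denom = self.n * self.sxx - self.sx ** 2
        if denom == 0:
            raise ValueError("need at least two distinct x values")
        slope = (self.n * self.sxy - self.sx * self.sy) / denom
        intercept = (self.sy - slope * self.sx) / self.n
        return intercept, slope


def read_chunks(rows: List[Row], size: int) -> Iterator[List[Row]]:
    # Hypothetical stand-in for reading a large file or table block by block.
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


rows = [(float(x), 2.0 + 3.0 * x) for x in range(1000)]
stats = PartialStats()
for chunk in read_chunks(rows, size=100):
    stats.update(chunk)
print(stats.fit())  # (2.0, 3.0)
```

Because the partial sums are additive, the same `merge` step lets chunks be processed on separate threads or machines. This naive sum-of-squares formulation can lose precision on very large or poorly scaled data, so a production version would typically use a numerically stable update.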

In addition to provisioning open source R and an engine designed for it, the product offers a web service API to help IT deliver R-based analytics to other applications, a host of descriptive, predictive, and prescriptive analytics, and valuable simulation capabilities that enable organizations to get realistic depictions of data they don’t have for predictive purposes. Smith noted that:

“The key to making predictive analytics useful in an organization is being able to deliver the results that Data Scientists do through their programming with the R language directly into the spreadsheets, or the Business Intelligence tools, or the internet pages or the applications that the line of business is already using so they can get the results of those Data Scientists’ methods, and don’t need to know the R language.”

Use Cases

Because R is a language Data Scientists use to create and tailor analytics and to parse myriad types of data, it may be difficult for end users and the business to get excited about it. Its use cases, however, tell a different story:

  • Advertising: R played a significant role in the Web 2.0 revolution and in Google’s domination of the online advertising market. The search and multimedia conglomerate used several R deployments to exploit the predictive analytics and Big Data markets with Google Ads, which not only delivered ads that online users wanted to see but also accurately gauged how much traffic such advertising would generate. In this respect, Google serves as one of the best models for online advertising.
  • Insurance: One of the most prudent applications of predictive analytics and Big Data is in the insurance industry, where companies calculate a host of factors, including many types of data on natural disasters, to help determine premiums. By modeling the frequency of events such as hurricanes, tornadoes, and floods in a particular region, and factoring in other variables such as property value, they can more accurately assess the level of risk behind a particular resident’s premium.
  • Business Intelligence: The predictive capabilities of most BI applications stem from advanced analytics. Without such analytics, users can analyze historical and even real-time data with discovery tools. With advanced analytics, however, they can determine best-case and worst-case scenarios and the likelihood of each occurring.
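The insurance and scenario-analysis ideas above can be made concrete with a small Monte Carlo simulation. The Python sketch below assumes that storms strike a region as a Poisson process and that each storm destroys an exponentially distributed fraction of a property’s value. The rates, property value, and function names are illustrative assumptions, not figures from the article or from any insurer. The simulated expected loss is the kind of number that could inform a premium, while the 5th and 95th percentiles and the probability of exceeding a threshold play the roles of the best case, worst case, and likelihood described for BI.

```python
import math
import random
from statistics import mean, quantiles
from typing import Dict, List


def poisson(rng: random.Random, lam: float) -> int:
    """Draw a Poisson count with Knuth's multiplication method (fine for small lam)."""
    limit = math.exp(-lam)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def simulate_annual_losses(event_rate: float, property_value: float,
                           mean_damage_fraction: float, years: int,
                           seed: int = 42) -> List[float]:
    """Hypothetical frequency-severity model of one property's yearly storm losses."""
    rng = random.Random(seed)
    losses = []
    for _ in range(years):
        total = 0.0
        # Frequency: number of storms hitting the region this simulated year.
        for _ in range(poisson(rng, event_rate)):
            # Severity: fraction of the property's value destroyed, capped at 100%.
            fraction = min(rng.expovariate(1.0 / mean_damage_fraction), 1.0)
            total += fraction * property_value
        losses.append(total)
    return losses


def scenario_summary(losses: List[float], threshold: float) -> Dict[str, float]:
    """Expected loss, best/worst case (5th/95th percentiles), and P(loss > threshold)."""
    cuts = quantiles(losses, n=20)  # 19 cut points: 5%, 10%, ..., 95%
    return {
        "expected_loss": mean(losses),
        "best_case_p5": cuts[0],
        "worst_case_p95": cuts[-1],
        "prob_exceeds_threshold": sum(loss > threshold for loss in losses) / len(losses),
    }


losses = simulate_annual_losses(event_rate=0.3, property_value=250_000,
                                mean_damage_fraction=0.1, years=10_000)
for name, value in scenario_summary(losses, threshold=50_000).items():
    print(f"{name}: {value:,.4f}")
```

A fixed seed keeps runs reproducible; in practice the event rate and damage distribution would be estimated from historical disaster and claims data rather than assumed.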

It is difficult to dispute the value that predictive analytics adds to the BI user whose tools have been sufficiently modified by Data Scientists or relevant IT personnel. Smith reflected that:

“There’s this whole new generation of Data Scientists who are exploring these brand new data stores that companies have never collected before, and they’re generating these brand new applications that companies have never thought of. Now they’re able to identify—through someone’s Twitter stream—whether they’re upset with a product so that customer service can proactively reach out. This is based on Big Data and analytics, but it’s an application that we just never would have thought of three or four years ago.”

Lack of Vendor Lock-in

Ultimately, the transition to open source analytics reflects the larger transformation in Data Management that began in earnest with the advent of Big Data, the emergence of Data Science, and the practicality of NoSQL databases and other Big Data storage options such as Hadoop. These innovations have fundamentally changed the role that data plays in daily business and operational processes, as well as the way data is accessed and used. The result is greater utility, agility, and scalability for data-driven processes and applications.

Open source analytics helps facilitate this trend in Data Management by providing the adaptability, applicability, and longevity required of the best technologies in this discipline. With R at its core, open source analytics provides the proverbial bridge across the traditional IT/business divide, and its effects are being felt in the traditional analytics marketplace as proprietary vendors adjust their prices and platforms to account for this technology.

This final aspect of open source analytics may very well be its most significant one. Beyond access to a growing community of open source aficionados, attractive pricing, and mutability, open source analytics provides a crucial alternative to the typical way products and services are offered within Data Management, much as NoSQL options do. Smith observed that:

“Investing in R, whether from the point of view of an individual Data Scientist or a company as a whole is always going to pay off because R is always available. If you’ve got a Data Scientist new to an organization, you can always use R. If you’re a company and you’re putting your practice on R, R is always going to be available. And, there’s also an ecosystem of companies built up around R including Revolution Enterprise to help organizations implement R into their mission-critical production processes.”
