The Big Data hype will dissipate as quickly as it arrived, unless organizations find a way to produce business value from these new technologies. Doing so requires more than sandboxes for Data Scientists, real-time analytics, or Cloud-based storage and scalability.
It requires the development and production of business-driven applications that readily convert data into monetary action.
According to Chris Wensel, Chief Technology Officer of Concurrent, it requires doing so more consistently, thoroughly, and swiftly than the competition does:
“You have the opportunity now to differentiate yourself purely through your innovation – not because you have more money than the other guy, but because you’re smarter than the other guy. You have great engineers. You now need great tools to get those products to market much, much quicker. That’s the difference.”
Concurrent recently unveiled its latest tool, Driven, to assist in the production and monitoring of Big Data applications; the company touts it as the world’s first performance management product built specifically for Big Data. Driven was released to its user community as a Cloud-based service on February 4. It was designed to complement Cascading, the company’s popular open source framework for developing Big Data applications, which runs principally on Apache Hadoop.
Driven directly impacts business value produced by Big Data apps by:
- Reducing Development Time: Driven provides a means of visualizing the processes for enterprise apps (both comprehensively and individually) via a detailed user interface, which considerably reduces the time spent in development and production and gets apps to the marketplace quicker.
- Strengthening App Reliability: Due to the ease of visualizing apps during the development phase, engineers will be able to identify problems and even readily anticipate them so that they can build more dependable apps before the product reaches the market. The capacity to visualize also assists with determining the most appropriate algorithms and application metrics to ensure accurate and consistent results.
- Pinpointing Failures and Optimization: The visualization capacity of Driven is ideal for scrutinizing apps at the user level – including which data sets are manipulated, what the workloads are, and which apps are running. As a result, identifying points of failure in apps is considerably faster, enabling operations personnel to find slow periods and optimization opportunities in a matter of minutes rather than days or weeks. The enterprise version of Driven 1.0 (which is projected for a second quarter release and carries an annual fee) includes notifications that tell developers which app has failed, where the failure is, and who owns the app.
When used in conjunction with the framework provided by Cascading, the management features of Driven help organizations to ensure that they are exploiting Big Data to derive maximum value by building applications that reinforce business objectives.
As Wensel noted, such action transcends mere insight: “You’re not just asking data a question, you’re actually building products, models, or whatever it is out of the data, and you’re using that to make your business better.”
Driven is compatible with apps created through Cascading, although subsequent versions are projected to support Pig and Hive. Cascading’s popularity – approximately 6,000 production deployments across a range of industries and companies such as Visa, CBS, and Best Buy, as well as over 130,000 downloads from the open source community each month – is attributed in part to its support for MapReduce and Hadoop (support for Tez and Spark is planned for later in the year).
Additionally, Cascading includes an ANSI SQL JDBC driver, which enables users to create Hadoop apps in languages that are compatible with both Java and SQL, and to use any third-party tool that supports them (such as Business Intelligence or analytics platforms). With Driven added, developers can design apps in the language they are most familiar with and visualize each statement while doing so. The combination allows them to use one of the most popular frameworks for accessing Big Data (Hadoop) while visualizing the particulars of the app that relate directly to business concerns: how great the load is on the Hadoop cluster, what data is being accessed, and how much data is being consumed.
Best of all, Driven provides a degree of transparency which facilitates an innate ease of use. Once users initially connect its included plug-in, all telemetry data (such as metadata and other descriptions of what each particular app is doing or is used for) becomes accessible to the app performance management platform, which then enables users to visualize it. Concurrent CEO Gary Nakamura observed that:
“The plan for Cascading was to make it easy for the rest of the world to build data applications on top of fabrics like Hadoop. What the framework does is it abstracts the complexity away so you can just think in terms of business logic and also separate things like data integration so you can isolate each of the problems. Then at one time Cascading brings them together and runs them on Hadoop.”
Although the objective of a Big Data initiative largely depends on the industry and business goals of the organization, one of the most common use cases for Big Data is to leverage its technologies for increased advertising revenue. The plethora of sentiment data sources, however, calls for particular algorithms and analytics, which in turn require purpose-built applications to process the data.
Concurrent’s partnership with Twitter began in part due to the latter’s need to pair users with the most relevant advertising, based on a variety of data pertaining to advertising content and trending topics on its site. Twitter’s revenue department used Cascading to create an API that reduced the complexity of defining workloads and testing data sources while integrating user functionality with a domain-specific query language. Now, revenue personnel can readily analyze the most relevant data for advertisers to target consumers.
Wensel pointed out the necessity of application building for Big Data:
“The point of Big Data is to apply different algorithms or computations that just aren’t expressible in SQL or in other Hadoop languages. What you really want is the full power of Java to leverage Java engineers to solve problems like if you’ve got a better stream matching algorithm if you’re a gene sequencing company. Or you’re actually building a recommender engine, creating a scoring model and getting that deployed to a website so people buy more. There are no tools that do that.”
Smart Application Management
More than anything, the release of Driven and the burgeoning success of Cascading indicate that a Big Data initiative is only as successful as its applications are at generating business. Such applications are the real reason that analytics are necessary and that algorithms are such an integral component of many data-driven processes today. Business-focused apps enable organizations to refine products and services and, with tools such as Driven and Cascading, to do so before the competition does. A growing community of open-source users realizes this; the sooner more enterprises do as well, the sooner mainstream adoption of Big Data will flourish. Nakamura commented on this point:
“Businesses have made tremendous amounts of investments in their Hadoop requirements, so accelerating the time it takes them to deploy Cloud applications on their Hadoop clusters and get yield from it are extremely impactful to the business. Making sure that they are reliable is also very important to them, as is being able to optimize applications when needed based on business requirements.”