In essence, Unravel Data makes processing Big Data easier. The platform was designed to resolve the complicated and disconcerting problems that emerge when processing Big Data. These applications can become confusing and difficult to operate. New challenges arise with chronic regularity, leaving teams constantly struggling with issues such as resource allocation, scheduling, and debugging. These recurring issues slow down the actual processing and can make it difficult to use Big Data Applications effectively, or even to profit from them. Unravel Data streamlines the process by detecting, and correcting, the hidden defects that degrade application performance and undermine reliable analytics.
Big Data can no longer be considered an experimental side project. NoSQL, Spark, Kafka, and Hadoop are gradually becoming core IT elements for businesses large and small. Organizations are now running various types of Big Data Applications and providing their customers with much more useful information. The uses and practices of Big Data are evolving remarkably fast, and businesses that avoid it risk being left behind.
Kunal Agarwal is the CEO at Unravel Data. He has a computer engineering background and handled sales and implementation for Oracle along the East Coast. He met Shivnath Babu at Duke University when Shivnath was working on a research project called Starfish. He said, “Starfish was the first, and only, auto-tuning system for Hadoop, which meant you could feed it a Hadoop app, and it would come out with the most optimized settings.”
As Kunal approached various enterprises using Hadoop with the intent of promoting Starfish, he heard customers describe what they really wanted. They explained that when a problem with Hadoop came up, they had no way of resolving it: there was no visibility into the system when a problem presented itself, and they lacked the expertise to fix the problem even if they knew what it was.
According to Mr. Agarwal,
“Shivnath Babu and I saw that managing the chaos and complexity of Big Data Applications was taking up the majority of the time spent by Big Data professionals, versus working to deliver results to the business from this Big Data stack. We also saw that these problems weren’t unique to one organization, but were common in companies employing Big Data technology. Complicating matters, the Big Data ecosystem is expanding at such a rapid pace that practitioners are unable to keep up. Lack of expertise is often cited as one of the primary reasons for Big Data projects failing or slowing down.”
Solving the Problem
Creating Unravel required a broad spread of scientific knowledge and industry experience. With this in mind, Shivnath and Kunal put together a team of creative researchers. Together, the Unravel team used their combined experience to develop a program capable of presenting analytics-based solutions in simple English, and allowing users of any skill level to resolve their problems quickly.
Unravel Data eliminates the need for multiple tools to handle the problems of Big Data. Because Unravel maintains an overview of the clusters, the operations staff can offer their users a realistic service level agreement (SLA) that guarantees certain levels of performance.
Kunal Agarwal describes the situation, saying:
“Companies move more workloads to production grade on Big Data stacks. Then they realize, Wow, we never thought about these problems, and we need something to help us out with these problems. I’ll give you an example. Box has been a customer of Unravel for about a year and a half. When they started, it was just like any other company. When they started their Big Data practice, they had five guys running it, they had four or five developers, one operations guy, and they were running twenty to thirty jobs or Big Data Applications every day. Then they figured out, Wow, this can help solve x, y, and z use cases that we didn’t think it would solve. That let them increase the number of users, and the number of jobs, or applications permitted on this cluster.”
Fast forward two years. Box is now processing fifteen thousand to twenty thousand jobs every day. They have about fifty to seventy end users on any given day submitting those jobs. But guess what? They still have one operations guy. And it’s not because they can’t pay for another one. “There’s no Hadoop operations guy out there that understands all the internal workings of Hadoop and Spark and things like that who can effectively manage these systems. There is a lack of knowledge within the entire industry.”
“So that guy was becoming the bottleneck. The end-users may only know NoSQL. They may not understand how Spark works.” The end-users get stuck. They used to crowd around the operations guy’s desk and ask, “Why is my Big Data Application freezing? Why is it moving so slowly?” He had his own problems to figure out, like how to meet all those separate SLAs. He was becoming the bottleneck, with the entire organization slowing down and a backlog of problems that were not getting solved.
“That is why they chose Unravel,” Kunal said during the interview.
Operational Transparency and Efficiency
Unravel provides a “monitoring and optimization solution” focused squarely on the Big Data Application stack. It offers full usage analytics, describing what is happening within the system and automatically resolving a number of common problems. Additionally, it performs root cause analysis and recommends a remedy for the problem. Unravel:
- Sends automatic alerts about production issues
- Identifies a root cause in less than a minute
- Monitors the complete stack and locates bottlenecks
- Automatically locates errors and inefficiencies in production applications
- Offers insight into individual applications
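Unravel’s actual analysis engine is proprietary, but the basic shape of a monitoring-and-alerting loop like the one described above can be sketched in a few lines. Everything here is an illustrative assumption on my part, not Unravel’s implementation: the metric names, the thresholds, and the “slowest stage” heuristic for pointing at a root cause.

```python
# Toy sketch of threshold-based monitoring for batch jobs.
# Metric names, thresholds, and the slowest-stage heuristic are
# illustrative assumptions, not Unravel's actual algorithm.

def find_bottlenecks(jobs, max_duration_s=600, max_mem_gb=32):
    """Flag jobs whose runtime or memory use exceeds a threshold, and
    report the slowest stage of each flagged job as the likely root cause."""
    alerts = []
    for job in jobs:
        if job["duration_s"] > max_duration_s or job["mem_gb"] > max_mem_gb:
            slowest = max(job["stages"], key=lambda s: s["duration_s"])
            alerts.append({"job": job["name"], "suspect_stage": slowest["name"]})
    return alerts

jobs = [
    {"name": "daily_etl", "duration_s": 900, "mem_gb": 12,
     "stages": [{"name": "extract", "duration_s": 100},
                {"name": "join", "duration_s": 700},
                {"name": "load", "duration_s": 100}]},
    {"name": "report", "duration_s": 120, "mem_gb": 4,
     "stages": [{"name": "aggregate", "duration_s": 120}]},
]

print(find_bottlenecks(jobs))
# Flags daily_etl and points at its "join" stage as the suspect
```

A real system would, of course, pull these metrics continuously from the cluster rather than from a hand-built list, and apply far richer analysis than a fixed threshold.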
Previously, this kind of visibility into a Big Data Application was simply not available. The situation was made more confusing by the fact that a problem within an application could come from anywhere inside the stack: inefficient data partitioning, bad code, mismatched system configuration settings, or infrastructure issues can all cause problems. Running multiple applications adds the further problem of “prioritization and resource contention” effects, which risk slowing both individual applications and the system’s overall performance. Unravel Data helps by automatically pinpointing and correcting these problems.
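One of the causes listed above, inefficient data partitioning, is easy to see in miniature: a parallel stage finishes only when its largest partition does, so a single skewed partition stalls the whole job. The following framework-free sketch illustrates the idea; the record counts, the per-record cost, and the round-robin rebalance are my own illustrative assumptions, not a description of how any particular engine or Unravel itself repartitions data.

```python
# Illustration of data skew: a parallel stage is only as fast as its
# largest partition, so redistributing records evens out the runtime.

def stage_time(partitions, secs_per_record=0.001):
    """A stage finishes when its biggest partition finishes."""
    return max(len(p) for p in partitions) * secs_per_record

def rebalance(partitions, n):
    """Redistribute all records round-robin across n partitions."""
    records = [r for p in partitions for r in p]
    return [records[i::n] for i in range(n)]

# Skewed input: one "hot" partition dominates the stage.
skewed = [list(range(90_000)), list(range(5_000)), list(range(5_000))]
balanced = rebalance(skewed, 3)

print(stage_time(skewed))    # 90 seconds, dominated by one partition
print(stage_time(balanced))  # ~33.3 seconds for the same total work
```

The total amount of work is identical in both cases; only its distribution changes, which is why partitioning problems are so easy to miss without per-partition visibility.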
Unravel and the Cloud
The Cloud is becoming quite a popular place for processing Big Data, offering nearly unlimited storage and on-demand services for processing it. Some large companies have hesitated to move to the Cloud, though, resulting in slow, piecemeal migrations. Having invested heavily in on-premises software and hardware, companies tend to waffle over abandoning their on-premises systems entirely and taking on yet another learning curve in the move to the Cloud. One option that lets them keep a foot in both worlds is the hybrid model, which provides a gradual transition to the Cloud.
“Large companies,” Kunal said, “with on-premises data clusters that are already on Big Data stacks are thinking of expanding their use cases on the Cloud. That’s where the hybrid model comes in.” Companies already running a number of Big Data Applications can now more easily adopt Cloud technologies through new, more efficient models.
“Now if you are not one of these big companies, and have been sleeping for the last four years, and now you’ve woken up, and you’re like, Hey, we should try this Big Data thing, then you figured that, with all these added capabilities, security, and Big Data Applications, being able to run on the Cloud, why do I need to go with an on-premises model?”
Unravel is designed as a multi-platform system. It supports systems such as Spark, Hadoop, and Kafka, whether on-premises or in the Cloud. With Unravel Data, teams can solve ongoing operations problems. The system provides actionable insights, allowing applications to run faster using fewer resources.
“Big Data systems are extremely complex, and getting them to work efficiently is considered a black art,” said Kunal Agarwal. “Setting up a Hadoop or Spark cluster is just the beginning. To truly derive business value, these systems and the applications running on them must be high-performing and easy enough for everyone to use.” Unravel Data is comprehensive, solving the interoperability and performance challenges of multiple application engines in a complex Big Data Application environment. “That’s the value we bring to the market.”
YP.com and Autodesk are two companies already using Unravel to manage their Big Data platforms. Charlie Crocker, Autodesk’s Product Analytics Director, commented,
“Unravel Data improves reliability and performance of our Big Data Applications and helps us identify bottlenecks and inefficiencies in our Spark and Oozie workloads. It also helps us understand how resources are being used on the cluster and forecasts our compute requirements.”
The world of Big Data is complex, but with new platforms available to handle those complexities more easily, Big Data is starting to seem not quite as big as it used to.