How to Dive into Data Lakes and Not Drown


Kyvos Insights has a question for Enterprise Analytics executives: How are your Data Lakes working out for you?

For many who embarked on journeys to stock up Hadoop-based Data Lakes with everything from structured transaction records to non-relational data such as log files, Internet clickstream records, sensor data, social streams, and so on, the answer has been mixed. On the plus side, data from multiple places that could never have been combined before because of the burdens of size and scale could now have a home.

On the downside, being able to query data and run reports in a reasonable amount of time was a challenge, especially for large customers in industries such as financial services, telco, and technology. A financial services organization, for example, looking to do risk analysis across all its asset classes for a holistic view of its exposure, may have tried solutions such as Hive, Impala and Spark SQL, but wound up still struggling for a reasonable response time given the massive number of rows of data accumulated.

“Using standard SQL queries on data inside Hadoop wasn’t working,” says Ajay Anand, Kyvos Vice President of Products. Such outcomes are disappointing to those who went into Hadoop and dived into Data Lakes with a lot of excitement about their potential for the average user, he says, only to find that “it’s really hard to get insights into that data once you got it in.”

Data Scientists or other technically-oriented employees with the patience and the skills to learn new ways of dealing with these systems and writing complex programs to help expedite processes were getting the most value out of the data. But “closing that last mile to the business user was a problem. As a result Data Lakes were not delivering on the promised ROI and a lot of Hadoop projects were languishing,” he says.

Kyvos recently delivered a new version of its massively scalable, self-service analytics solution to help make Data Lakes more accessible to the average business analyst. Its system enables the building of multidimensional online analytical processing (OLAP) cubes on Hadoop. It extends an idea that rose to prominence in the 1990s, when OLAP cubes appeared “as one of the first pervasive forms of analytical visualization, enabling a dataset to be depicted in a multi-dimensional manner in a cube format, for slicing and dicing, to see more granular detail,” according to a research spotlight by 451 Research. The same report notes that OLAP is now being championed by startups applying cube building to the issue of interactively analyzing data in Hadoop.

Data Lakes for All

Users organizing data into these cubes can build out all the aggregations, indices and so forth and store them on a Hadoop cluster in a distributed way to query and get responses instantly, Anand says. With what Kyvos calls its BI Consumption Layer atop all the data residing in Hadoop, users can use their existing BI tools to connect directly to the data in their Data Lakes without experiencing the performance drag on response times.

With all the information stored in cubes, users can drill down to the lowest level of detail interactively from their familiar tools, giving them lots of power without forcing them to learn to use new solutions or methods. Anand says that running queries with even hundreds of billions of rows of data won’t choke the process. “They can interactively look at different aspects of the data, query and quickly get insights from it that they couldn’t before,” he says.
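The general cube idea described above can be illustrated with a small sketch. It is not Kyvos's implementation, which the article does not describe; it is a toy, in-memory example that assumes a handful of made-up sales rows, and every name in it (ROWS, DIMENSIONS, build_cube, query) is hypothetical. The point it shows is that totals are computed once, for every combination of dimensions, so that a later query or drill-down is a lookup rather than a scan of the raw rows.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical dimensions and fact rows, invented purely for illustration.
DIMENSIONS = ("region", "product", "quarter")
ROWS = [
    {"region": "East", "product": "Bonds", "quarter": "Q1", "revenue": 100.0},
    {"region": "East", "product": "Bonds", "quarter": "Q2", "revenue": 150.0},
    {"region": "East", "product": "Equities", "quarter": "Q1", "revenue": 200.0},
    {"region": "West", "product": "Bonds", "quarter": "Q1", "revenue": 50.0},
    {"region": "West", "product": "Equities", "quarter": "Q2", "revenue": 300.0},
]


def build_cube(rows, dimensions, measure):
    """Pre-aggregate the measure for every subset of the dimensions."""
    cube = {}
    for size in range(len(dimensions) + 1):
        # combinations() keeps the original dimension order in each subset.
        for dims in combinations(dimensions, size):
            totals = defaultdict(float)
            for row in rows:
                key = tuple(row[d] for d in dims)
                totals[key] += row[measure]
            cube[dims] = dict(totals)
    return cube


def query(cube, **filters):
    """Answer a query by looking up a pre-computed total (no row scan)."""
    dims = tuple(d for d in DIMENSIONS if d in filters)
    key = tuple(filters[d] for d in dims)
    return cube[dims].get(key, 0.0)


cube = build_cube(ROWS, DIMENSIONS, "revenue")

# Drill down from the grand total to progressively finer detail.
print(query(cube))
print(query(cube, region="East"))
print(query(cube, region="East", product="Bonds"))
print(query(cube, region="East", product="Bonds", quarter="Q2"))
```

The trade-off in this sketch is the usual one for pre-aggregation: building the cube costs time and storage up front (one table per subset of dimensions, so eight tables for three dimensions), in exchange for query cost that no longer depends on the number of raw rows.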

With it in place, any user in an organization, not just Data Scientists, can get easy access to the data in the lakes and use it efficiently from the desktop with the tools of their choice. It’s data democratization at work, Anand says.

“With our Kyvos 2.0 solution in the middle now you can connect your daily business tools like Excel seamlessly, so users don’t even need to know it’s in Hadoop,” he says. Whether using Excel, Business Objects, IBM Cognos, MicroStrategy, Tableau or TIBCO Spotfire, the BI Consumption Layer provides them with all the advantages that Data Lakes promise, with instant response times, he says.

One technology company that Kyvos has worked with has been able to use the solution to combine data from its marketing and financial organizations with other customer information sources to provide the insight that its CMO and CFO were looking for.

“For the first time they were able to show interactively how to get the questions answered by bringing data into the Data Lake and running our solution on top, connecting via Tableau and Excel,” he says.

It is exciting to see Hadoop mature in this way and start delivering value, so that people can see the fruits of their big data strategy, he says.

The need for interactivity and speed is critical for anyone trying to get to the bottom of anything. Imagine, he says, that a Web search took you ten minutes – while technically not a lot of time in the real world, waiting ten minutes to find a review of a product you’re interested in buying or the ten closest restaurants to where you are would completely change the user experience. “It’s the same thing for analytics,” he says. Having an instant response lets you “follow your chain of thought, and changes the whole experience for the business user.”

The New Norm

Anand says this is becoming the new normal in the analytics realm. Since Kyvos launched its solution last year, he says, other tools have come onto the market that change the expectations around query responses, though he believes Kyvos remains alone in building multidimensional OLAP cubes as it does. “That’s our IP,” he says, explaining that other vendors’ approaches to performing some levels of aggregation tend to be more manual, require more advance knowledge of the queries to come, and are harder for customers to grasp.

He also says that Kyvos’ approach differs from others in that response times are fast the first time a query is run, not only after its results have been cached. “You can’t anticipate what query they will run, so a consistent fast response time across whatever you want to analyze is important,” he says. The future for Kyvos includes expanding its presence in new markets and supporting more ways of exploring data in OLAP cubes, as well as real-time access to new data as it comes in.

Asked whether customers’ users have any resistance to taking the first step to an approach like this, Anand says no. He notes that many of them, even as they moved to Hadoop Data Lakes, continued to spend millions of dollars every year on data warehouses.

“They jumped into Hadoop as a possible way out, but they’d get started and not be able to completely migrate off the data warehouse, so now they were paying for that and the new structure,” he says. “How was money being saved? That’s where we come in, to help you migrate your processes from there to here.”
