Kinetica Aims to be The Data Science Company with its GPU-Accelerated Database


“I feel the need…the need for speed.” That quote could be as easily applied to business users eager for the power to move forward with Advanced Analytics as it could to generations of Naval pilots inspired by Tom Cruise in the 1986 movie Top Gun. With Version 6.0 of its GPU-accelerated database, released in January, Kinetica is ready to help them out.

The company’s technology was built upon the proposition of providing tens to hundreds of times faster workload performance, especially for speedy analytics requirements, with the help of GPUs featuring thousands of cores and purpose-built for parallel tasks. GPUs are well-suited to the types of vector and matrix operations found in Machine Learning and Deep Learning, the company says, and Kinetica makes it possible for Machine Learning and Data Science workloads to be performed in-database and to take advantage of the GPU.

“By way of parallelism you can do amazing amounts of work in a fraction of time,” says Eric Mizell, VP, Global Solutions Engineering, at the company.

The new version adds support for user-defined functions (UDFs) that can take advantage of the GPU’s parallel processing. That paves the way for business analysts to perform in-database analytics themselves, rather than selecting data, having it moved over to Data Science teams, and then waiting (and waiting) for results. “We say, ‘Why not democratize this?’” Mizell says.

To that end, why not let business users call Machine Learning and Artificial Intelligence libraries such as TensorFlow or BIDMach directly, for instance to run a Monte Carlo simulation on their own within the database? “They can make these calls and get results in a reasonable amount of time and be able to do analytics,” he says. “So you really are changing the way analytics can be done.”
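To make the Monte Carlo idea concrete, here is a minimal sketch in plain Python. It is not Kinetica's UDF API, and the function names, deal counts, win probability, and deal sizes are all hypothetical values chosen for illustration. The point it shows is that every trial is independent of the others, which is the property that lets this kind of workload spread across thousands of GPU cores.

```python
import random


def simulate_quarter_revenue(rng, n_deals=50, win_prob=0.3,
                             deal_size_mean=10_000.0, deal_size_sd=2_500.0):
    """Run one trial: total revenue from n_deals independent sales deals.

    All parameter values are hypothetical, chosen only for illustration.
    """
    total = 0.0
    for _ in range(n_deals):
        if rng.random() < win_prob:
            # Deal size is drawn from a normal distribution, floored at zero.
            total += max(0.0, rng.gauss(deal_size_mean, deal_size_sd))
    return total


def probability_of_hitting_target(target, n_trials=20_000, seed=42):
    """Estimate P(revenue >= target) as the fraction of trials that reach it."""
    rng = random.Random(seed)  # fixed seed so repeated runs agree
    hits = sum(simulate_quarter_revenue(rng) >= target for _ in range(n_trials))
    return hits / n_trials


if __name__ == "__main__":
    print(probability_of_hitting_target(150_000.0))
```

Because no trial depends on another, the loop inside `probability_of_hitting_target` could in principle be split into as many parallel pieces as the hardware offers; this sketch simply runs them one after another on a CPU.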

Data Science teams will still be needed for critical work such as finding the best algorithms for the business and performing data explorations. But it’s to everyone’s benefit if the business users who have a keen understanding of their own data can be directly exposed to Advanced Analytics so that they can move fast to drive business value. When it comes to regular workloads, Mizell says, “we need to move them up the stack [for analysis by] people who understand the data.”

These individuals may not have Data Science skills per se, but they’re talented enough SQL coders that if Advanced Analytics capabilities can become available to them through their normal avenues, “companies can move to the next level,” he says. “And that’s what they want.”

Location, Location, Location

It’s worth noting that Kinetica has deep roots in native geospatial analytics. Its position is that by combining GPU-accelerated database technology with UDFs, customers can gain a valuable edge in real-time, location-based analytics that will be a requirement in the new era of cognitive computing.

“Almost every company has location-based analytics problems that only grow when you add in mobile devices,” says Mizell. As an example, he points to the years of talk about targeting real-time ads to consumers’ smartphones when they are near a retailer that wants to make them an offer. Few companies do it, he adds, because the real-time analytics behind it have been too hard.

Reveal is a web-based interactive visualization platform that’s now included with Kinetica to support interactive real-time data discovery, including location-based analytics on massive datasets. That includes most Internet of Things (IoT) datasets, which have a spatial component. As more companies embrace the IoT, the need to perform analytics on spatial data grows.

“It’s not enough to know that something happened but when and where,” says Mizell. Now that business users can dive into Advanced Analytics, how can they easily and quickly explore billions of geo-located data points to perform tasks such as filtering and sorting? “Reveal is about exploring the data that’s in Kinetica,” he says, noting that one prime value of GPUs is that they’re so good at rendering real-time visualizations. “You can filter and render 4 billion data points and send it back to the UI in under 200 ms,” he says.
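The filter-and-sort task described above can be sketched in a few lines of plain Python. This is an illustration of the "what, when and where" query shape only, not Reveal or Kinetica code; the `Event` type, the function names, and the bounding-box approach are assumptions made for the example, and the box test does not handle areas that cross the antimeridian.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """A hypothetical geo-located record: where, when, and what."""
    lat: float
    lon: float
    timestamp: int  # seconds since the epoch
    label: str


def filter_by_bounding_box(events, min_lat, max_lat, min_lon, max_lon):
    """Keep only the events whose coordinates fall inside the box."""
    return [
        e for e in events
        if min_lat <= e.lat <= max_lat and min_lon <= e.lon <= max_lon
    ]


def recent_events_in_area(events, box, since):
    """Filter by place and time, then sort newest first.

    `box` is a (min_lat, max_lat, min_lon, max_lon) tuple.
    """
    in_box = filter_by_bounding_box(events, *box)
    recent = (e for e in in_box if e.timestamp >= since)
    return sorted(recent, key=lambda e: e.timestamp, reverse=True)
```

Each event is tested against the box independently of every other event, which is why a filter like this parallelizes well when the dataset grows to billions of points.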

Reveal features include enhanced mapping capability and integration with major mapping providers, including Google, ESRI, MapBox, and Bing, as well as the ability to make real-time dashboards by picking, dragging and dropping an assortment of analytical widgets.

Another Boost Included

Kinetica’s database now also includes a VRAM Boost Mode, which lets users prioritize their data tables and force chosen datasets to always sit in very fast, cluster-wide GPU video RAM (VRAM) for lightning-fast query performance, the company says. Customers get even better performance while still being able to leverage cluster-wide system RAM to scale up and scale out to multi-terabyte in-memory processing.

The boost, Mizell says, lies in the notion of pinning specific data sets into VRAM for ultra low-latency performance. A query could take 800 ms when data has to move from system memory to VRAM. “But if I pin it in VRAM it might take just 20 ms,” he says, thanks to completely eliminating the need to move the data.

That’s a dramatic decrease in latency and a good strategy to take for the 20% of data that 80% of queries run against in a typical company. “If I had 100 gigabytes of super important data and wanted the queries to be 20 ms, I could do it,” he says.
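The arithmetic behind that 80/20 argument can be worked through in a short sketch. The 800 ms and 20 ms figures are Mizell's illustrative numbers from above; treating every unpinned query as taking exactly 800 ms and every pinned one exactly 20 ms is a simplifying assumption made here, and the function names are hypothetical.

```python
def speedup(unpinned_ms=800.0, pinned_ms=20.0):
    """How many times faster a pinned query is than an unpinned one."""
    return unpinned_ms / pinned_ms


def average_latency_ms(pinned_fraction, pinned_ms=20.0, unpinned_ms=800.0):
    """Average query latency when a given fraction of queries hits pinned data."""
    if not 0.0 <= pinned_fraction <= 1.0:
        raise ValueError("pinned_fraction must be between 0 and 1")
    return pinned_fraction * pinned_ms + (1.0 - pinned_fraction) * unpinned_ms


if __name__ == "__main__":
    print(speedup())                # pinned vs. unpinned, single query
    print(average_latency_ms(0.0))  # nothing pinned
    print(average_latency_ms(0.8))  # 80% of queries hit pinned data
```

Under these assumptions a single pinned query is 40 times faster, and pinning the data behind 80% of queries brings the average down from 800 ms to roughly 176 ms. The remaining 20% of queries dominate that average, so the per-query gain on the hot data is the more meaningful number.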

Enterprise Class

Mizell also comments that Kinetica has taken a lot of steps in the latest version of its solution to tighten up enterprise gaps. Built-in and multi-level security and permission-based widgets, views and dashboards are part of Reveal, for instance.

Version 6.0 also includes full GPU-accelerated SQL-92 query support through certified JDBC and ODBC connectors, along with NVIDIA NVLink support for accelerating database performance. With NVLink, data can move between the GPU and CPU three times faster on average than over traditional PCI Express, the company says.

“We did a lot to close the enterprise gap this year,” Mizell says. “But we’ll be doing more and more.”

In the meantime, he thinks it’s important for potential users to consider the fact that Data Science has been something of a black box. But now it can be an open book. “When you put data in the hands of business users and they can take action on data with these frameworks, you will see things quickly advance.”
