by Angela Guess
A recent press release reports, “DataScience, Inc., today announced the release of Grunion, a patent-pending query optimization, translation, and federation framework built on top of Apache Calcite and integrated into Apache Spark. Designed to bridge the gap between data science and engineering teams by removing the need to manually translate code from one language to another, Grunion is the first project out of DataScience Labs, the company’s testing ground for experimental data science projects. Grunion limits the need for expensive and slow ETL processes by providing a unified query language and APIs to push down complex query operators, joins, functions, and aggregations into SQL and NoSQL databases. But Grunion’s most compelling feature is its ability to integrate with Spark SQL’s Catalyst optimizer, essentially turbocharging its capabilities.”
Jason Slepicka, senior data engineer at DataScience, commented, “Spark’s level of support for pushing down queries into data sources is limited… With Grunion, you can push down just about anything into a SQL or NoSQL database that the database supports, and at an accelerated speed. We tested Grunion on the TPC Benchmark™ DS, an industry standard for measuring performance in big data systems, and discovered that it can fully push down and parallelize Spark SQL queries against a relational database to achieve execution times 10 to 30 times faster than Spark can achieve alone.”
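For readers unfamiliar with the term, "pushing down" a query means letting the database do the filtering, joining, or aggregating, so that only the result travels back to the processing engine. The sketch below is a generic illustration of that idea, not Grunion's or Spark's API, neither of which the release describes in detail. It uses Python's built-in sqlite3 module, and the `sales` table and all function names are hypothetical.

```python
import sqlite3


def build_demo_db():
    """Create a small in-memory database (hypothetical demo data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [("east", 10), ("west", 20), ("east", 5), ("west", 7), ("north", 3)],
    )
    conn.commit()
    return conn


def totals_without_pushdown(conn):
    """Fetch every row, then aggregate on the client side."""
    rows = conn.execute("SELECT region, amount FROM sales").fetchall()
    totals = {}
    for region, amount in rows:
        totals[region] = totals.get(region, 0) + amount
    # Also return the row count to show how much data was transferred.
    return totals, len(rows)


def totals_with_pushdown(conn):
    """Push the aggregation into the database; only one row per group returns."""
    rows = conn.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region"
    ).fetchall()
    return dict(rows), len(rows)


if __name__ == "__main__":
    conn = build_demo_db()
    print(totals_without_pushdown(conn))
    print(totals_with_pushdown(conn))
```

Both functions produce the same totals, but the pushed-down version transfers one row per region instead of every row in the table. On large tables that difference in data movement is where the kind of speedup described above would be expected to come from.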
The release adds, “Grunion enhances DataScience’s enterprise platform, the DataScience Cloud, where users can deploy models built in their language of choice without rewriting code into a production stack language or PMML. The platform also allows notebooks, models, and other files to be grouped together in the same repository or project, regardless of the language they were written in.”
Read more at Marketwired.
Photo credit: DataScience