by Angela Guess
According to a new article from the company, “Snowflake Computing, the cloud data warehousing company, today announced Snowflake Data Source for Spark — a native connector that joins the power of Snowflake’s cloud data warehouse with Apache Spark. This tight integration provides Spark developers a ready-to-use platform for diverse data that offers advanced security, high concurrency, a robust ANSI SQL dialect, and exceptional performance at any scale of dataset or workload — all without cluster setup or database management and tuning complexities. Until now, developers using Spark had to plan and build an infrastructure to store and query all of their Spark data. This included the complexity of implementing a distributed or clustered infrastructure to scale capacity and support large datasets, incurring increased effort and complexity to provision, manage, secure, and govern that environment.”
The article continues, “The new Snowflake Data Source for Spark, which is built on Spark’s DataFrame API, provides developers a fully managed and governed warehouse platform for all their diverse data (such as JSON, Avro, CSV, XML, machine data, etc.) that offers a fast, higher level connection to data with Spark’s API. The results are increased developer productivity and a simple, agile, and easy-to-deploy platform that makes it significantly easier and faster to develop and execute successful Spark projects. Companies using Snowflake’s Data Source for Spark are able to concentrate on implementing Spark-based applications without creating unnecessary complexity and delays due to needing to secure and manage their Spark data storage infrastructure. In addition, Snowflake’s unique architecture and workload management provide a high level of concurrent query workload support for multiple Spark workgroups and the ability to fully query relational and nested Spark data stored in Snowflake.”
Photo credit: Snowflake