In 2002, internet researchers just wanted a better search engine, preferably an open-source one. Doug Cutting and Mike Cafarella set out to give them what they wanted, and they called their project “Nutch.” Hadoop was originally designed as part of the Nutch infrastructure and was presented in 2005. The […]
NVIDIA Accelerates Apache Spark, World’s Leading Data Analytics Platform
According to a recent press release, “NVIDIA today announced that it is collaborating with the open-source community to bring end-to-end GPU acceleration to Apache Spark 3.0, an analytics engine for big data processing used by more than 500,000 data scientists worldwide. With the anticipated late spring release of Spark 3.0, data scientists and machine learning […]
Case Study: Deriving Spark Encoders and Schemas Using Implicits
By Dávid Szakallas. In recent years, the size and complexity of our Identity Graph, a data lake containing identity information about people and businesses around the world, called for adding Big Data technologies to the ingestion process. We used Apache Pig initially, and then migrated to Apache Spark a […]
StreamSets Launches StreamSets Transformer
A recent press release states, “StreamSets, Inc., provider of the industry’s first DataOps platform for modern data integration, released today StreamSets® Transformer, a simple-to-use, drag-and-drop UI tool to create native Apache Spark applications. Designed for a wide range of users — even those without specialized skills — StreamSets Transformer enables the creation of pipelines for […]
Ten Myths About Data Science
By Daniel Jebaraj. Data Science is now being used as a competitive weapon. As with other technologies and processes that can transform the way companies operate, there’s a lot of contradictory information about it that’s causing considerable confusion. Most of today’s business leaders have heard that Data Science can […]
Datawatch Angoss Simplifies Data Science and Analytic Tasks on the Apache Spark Platform
A recent press release reports, “Datawatch Corporation today announced the general availability of Datawatch Angoss KnowledgeSTUDIO for Apache Spark, enabling organizations to act more confidently with their data and rely on consistent, trustful results in making better business decisions. In combination with its market-leading data visualization approach for building, exploring and segmenting data using patented […]
Databricks Introduces Global Partner Program
A recent press release reports, “Databricks, the leader in unified analytics and founded by the original creators of Apache Spark™, today launched the Accelerate Impact Partner Program. Through the program, Consulting and Systems Integrator partners can leverage Databricks’ Unified Analytics expertise, comprehensive training programs, and global team to empower customers. Over the last 12 months, […]
Talend Speeds Apache Spark and Machine Learning Implementations without Coding
A new press release reports, “Talend, a global leader in cloud integration solutions, today announced it will debut at the Strata Data Conference in New York City a new sandbox that brings sophisticated machine learning technologies to the hands of developers and data engineers so they can easily create smarter data pipelines. With the Talend […]
Paxata Announces Apache Spark-Powered Data Preparation Runtime Fabric
According to a new press release, “Paxata, the pioneer in self-service data preparation for analytics, today announced the general availability of its Fall ’18 release, the next major update to the company’s award-winning Adaptive Information Platform. The latest release includes a new Adaptive Workload Management capability, which delivers an elastic resource allocation service on a […]
Out in the Open: Where Big Data and Open Source Coincide
By Gilad David Maayan. Big Data is a term used to describe large volumes of data in disparate formats that stream into various organizational systems at high speed. Analyzing this data and deriving insights that can give businesses a competitive edge requires special tools. […]