Apache Spark is an open-source, distributed computing system that provides a fast and scalable framework for big data processing and analytics. The Spark architecture is designed to handle data processing tasks across large clusters of computers, offering fault tolerance, parallel processing, and in-memory data storage capabilities. Spark supports various programming languages, such as Python (via […]
Data Science Tools: Smart Assistants for Data Analytics
In a globally competitive world, businesses that haven’t invested in Data Science tools can have difficulty making timely decisions or acting on insights. Today, data scientists’ greatest assets are the technology platforms and tools they have access to. The tools of the trade can mine and analyze petabytes of data in seconds and […]
A Development Environment for Data with CI/CD
By Einat Orr. Data engineering is the science and art of producing good and timely data. Its goal is to deliver data to users, even more than to deliver applications. There are great methods and tools that help deliver applications with consistently high quality. What are the methods and tools […]
A Brief History of Open Source Data Technologies
Openly sharing information has been part of human culture since the beginning of civilization. Information was shared with the general community, and the practice has had a powerful impact on the development of tools and machinery. In opposition to this practice is the concept of ownership and control over new ideas and concepts, […]
Case Study: Deriving Spark Encoders and Schemas Using Implicits
By Dávid Szakallas. In recent years, the size and complexity of our Identity Graph, a data lake containing identity information about people and businesses around the world, called for adding Big Data technologies to the ingestion process. We used Apache Pig initially, and then migrated to Apache Spark a […]
Top Programming Languages for Data Science and Machine Learning
By Manan Ghadawala. Software developers love arguing about which programming language is the best. However, the criteria for what counts as “best” are unclear. When it comes to software development for machine learning and data science, the question remains as relevant as ever. Most useful programming languages […]
How Interactive Technology is Revolutionizing Data Analysis in 2019
By Pippa Edelen. The 21st century has seen some big developments in data analysis, including the birth of the cloud (arguably in 2000), the development of Big Data (in 2005), and the creation of technologies such as Hadoop (2006) and Spark (2014) that allowed computation on enormous data […]
Alteryx Acquires ClearStory Data to Accelerate Innovation in Data Science and Analytics
A recent press release states, “Alteryx, Inc., revolutionizing business through data science and analytics, today announced that it has acquired ClearStory Data, a privately held software company based in Menlo Park, Calif. ClearStory Data is an enterprise-scale, continuous intelligence analytics solution for complex and unstructured data. Since its founding in 2011, ClearStory Data has focused […]
Aerospike Unveils Aerospike Connect
According to a new press release, “Aerospike Inc., the developer of the world’s most trusted and reliable enterprise-grade, non-relational NoSQL database, today launched the Aerospike Connect family of add-on modules, as well as a new REST API, to make it even easier to integrate the Aerospike Database into both new and existing enterprise infrastructure systems. […]
Three Big Data Fears (And Why You Should Not Worry)
By Mathias Golombek. We live and work in a world that has seen Big Data come to the forefront of nearly every sector of our lives. From medicine and mechanics to technology and retail, data gathering is big business, and now more than ever, it’s shaping the way we live. […]