The Data Management industry has seen a significant rise in interest in data containers. As Cloud Computing has gained popularity, methods for transporting data and its processing instructions have been investigated, and data containers have emerged as a viable solution. Data containers solve the problem of getting software to run reliably, while […]
A Brief History of the Hadoop Ecosystem
In 2002, internet researchers just wanted a better search engine, preferably an open-source one. That was when Doug Cutting and Mike Cafarella decided to give them what they wanted, and they called their project “Nutch.” Hadoop was originally designed as part of the Nutch infrastructure and was presented in 2005. The […]
Deep Learning Demystified
The “deep” in deep learning refers to the number of hidden layers involved in the design. Deep learning is a way of training artificial intelligence (AI) to recognize specific data, such as speech or faces, and to make predictions based on previous experiences. Unlike traditional machine learning, which organizes and sends data through predefined algorithms, deep […]
So You Want to be a Data Manager?
A data manager develops and governs data-oriented systems designed to meet the needs of an organization or research team. Data Management includes accessing, validating, and storing data that is needed for research and day-to-day business operations. Currently, a wide array of organizations are using big data to gain insights into customer behavior and to provide […]
Case Study: Cox Automotive Solves Data Drift and ETL Challenges
According to Pat Patterson, Community Champion at StreamSets, “data drift” is such a problem now that “only about one fifth of a data analyst’s time is actually spent analyzing the data.” The remainder is spent “wrangling it into shape and getting it from where it is to the actual analysis platform.” Speaking at the Enterprise […]
Fundamentals of Self-Service Business Intelligence
The numerous vendor offerings now available show considerable recent market movement toward self-service business intelligence (SSBI). There is also a growing concern among the Data Science community that ordinary business users may misunderstand or misinterpret the available data, leading to incorrect results. Experienced data scientists have a tremendous ability to analyze, […]
Fundamentals of Robotic Process Automation and Data Management
Data Management software is essential to providing organizations with critical insights about their customers’ behavior. Robotic process automation (RPA) is a process in which software programs perform repetitive Data Management tasks, such as data validation, email responses, normalization, and metadata organization. Put another way, RPA automates the mundane. It does this by observing and imitating […]
Messy Data Shouldn’t Stop Machine Learning in Its Tracks
By Jon Reilly. Businesses are creating data at an incredible pace, and that pace will only accelerate. In fact, data storage company Seagate predicts that the yearly total will pass “163 zettabytes (ZB) by 2025. That’s ten times the amount of data produced in 2017.” Moore’s Law – the principle that […]
Closing the Data Science Skills Gap at Your Organization
By Itamar Ben Hamo. Data scientists are some of the most in-demand professionals on the market. A LinkedIn Workforce Report in 2018 found 151,000 unfilled data scientist jobs across the United States, with “acute” shortages in San Francisco, Los Angeles, and New York City. And the demand for data scientists […]
Scaling Machine Learning Applications
When the number of users of a predictive model grows, it is often wrongly assumed that machine learning-powered systems will scale automatically to keep up with this growth. If the system fails to scale, processing demand may outpace the system’s capacity. Using an example from a LinkedIn article, a sample recommender system fails to […]