In a data-driven business climate, data plays a key role in capturing market intelligence and “actionable insights” that augment business operations. As a result, Data Management platforms, tools, and associated technologies are drawing increasing global attention. Two Data Management technologies have stirred controversy and become topics of hot debate: the data lake and the data mesh. So just what are these technologies?
The data lake allows raw, structured, and unstructured data to reside in one repository and enables comprehensive analysis of big and small data from a single location. Data lakes are considered “high performance” and “low latency” systems with proven capacity for storing and analyzing huge volumes of data. A data lake also offers the flexibility of deferring data analysis to a later time. Its most useful purpose is storing huge amounts of data.
On the other hand, a data mesh, which initially focused on domain orientation and self-service and later moved toward “data as a product,” enables the collection, integration, and analysis of data from disconnected systems concurrently. There is no need to pull data from disparate systems into a single location and preprocess it for analysis. A data mesh is most suitable for a business environment where data needs to be integrated from many disconnected systems or processes for fast analysis. Here is a resource that explains the difference between the data warehouse, the data lake, and the data mesh: A Quick Primer for Business Success.
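The contrast between the two approaches can be illustrated with a minimal Python sketch. It is a toy model, not a real platform: the class names `DataLake` and `DataMesh`, the `ingest`, `register`, and `query` methods, and the in-memory lists standing in for storage are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List

# Hypothetical type aliases used only in this sketch.
Record = Dict[str, object]
Reader = Callable[[], Iterable[Record]]
Predicate = Callable[[Record], bool]


@dataclass
class DataLake:
    """Toy data lake: every source is copied into one central store."""
    records: List[Record] = field(default_factory=list)

    def ingest(self, source: str, rows: Iterable[Record]) -> None:
        # Copy each row into the central store, tagged with its origin.
        for row in rows:
            self.records.append({"source": source, **row})

    def query(self, predicate: Predicate) -> List[Record]:
        # Analysis happens later, against the single central copy.
        return [row for row in self.records if predicate(row)]


@dataclass
class DataMesh:
    """Toy data mesh: sources stay where they are and are read in place."""
    readers: Dict[str, Reader] = field(default_factory=dict)

    def register(self, domain: str, reader: Reader) -> None:
        # Each domain exposes a reader instead of handing over a copy.
        self.readers[domain] = reader

    def query(self, predicate: Predicate) -> List[Record]:
        # Fan the query out to every registered domain at query time.
        results: List[Record] = []
        for domain, reader in self.readers.items():
            for row in reader():
                tagged = {"source": domain, **row}
                if predicate(tagged):
                    results.append(tagged)
        return results
```

In this sketch, a row added to a source system after ingestion shows up in `DataMesh.query` right away, while the `DataLake` copy stays unchanged until `ingest` runs again. That is the trade-off the two paragraphs above describe: one consolidated location versus reading disconnected systems in place.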
Which One Should You Choose for Your Business?
If your business is just starting out on big data projects and already has a data warehouse, then you may use a data mesh to consolidate enterprise-wide data in a data lake. In this scenario, the data mesh and the data lake play complementary roles.
Otherwise, a data mesh is typically suitable for business environments that already have some infrastructure in place and no time to set up a new one. The upside of a data mesh is that it allows scalability and system integration. Here’s a data mesh case study.
If your business is just venturing into Data Management, then a data lake is by far the best option for creating a system from scratch. A data lake is also a great option for big data efforts, where very large amounts of data will have to be stored, prepared, and analyzed over a long period of time. However, data lakes are not the best choice for disconnected systems.
Additionally, you may benefit from this comparative study of the two technologies. A Solutions Review article critically compares the Data Management strategies of the data lake and the data mesh.
The Data Lake vs. The Data Mesh: Relative Advantages
Here are some common rules of thumb to apply when selecting one of the two technologies for highly specific business needs:
- If you have huge amounts of raw, structured, and unstructured data to store for later processing, go for a data lake
- If you have to archive massive amounts of raw data, select the data lake
- If you are looking for affordable storage space for big data, then a data lake is the answer
- If you need real-time insights, go for a data mesh
- If you need near real-time reporting support, go for the data mesh
- If you have to quickly gather data from many disconnected systems for instant processing, go for a data mesh
- When you need speed and faster response, a data mesh is usually the better choice
- If your business needs scalability, then you can choose either a data lake or a data mesh, based on your exact needs
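The rules of thumb above can be written down as a small decision helper. The following Python sketch is only one possible encoding: the `Needs` fields and the `recommend` function are hypothetical names, and returning "both" when lake-style and mesh-style needs overlap is an assumption based on the article's point that the two can play complementary roles.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Needs:
    """Hypothetical checklist of business needs taken from the list above."""
    store_or_archive_raw_data: bool = False   # bulk storage for later processing
    affordable_bulk_storage: bool = False     # low-cost space for big data
    real_time_insights: bool = False          # real-time or near real-time reporting
    many_disconnected_sources: bool = False   # data spread across separate systems


def recommend(needs: Needs) -> str:
    """Map a set of needs to a suggestion, following the rules of thumb."""
    wants_lake = needs.store_or_archive_raw_data or needs.affordable_bulk_storage
    wants_mesh = needs.real_time_insights or needs.many_disconnected_sources

    if wants_lake and wants_mesh:
        # Assumption: overlapping needs point to using the two together.
        return "both"
    if wants_lake:
        return "data lake"
    if wants_mesh:
        return "data mesh"
    # No distinguishing need (for example, scalability alone): either can fit.
    return "either"
```

For example, `recommend(Needs(real_time_insights=True))` suggests a data mesh, while `recommend(Needs())` returns "either", which mirrors the last rule in the list. A real selection would weigh cost, skills, and existing infrastructure as well.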
Review what a Forbes author thinks of the data mesh.
The Data Lake vs. The Data Mesh: Relative Disadvantages
- The biggest drawback of a data lake is that it can take weeks or months to deliver insights from raw data.
- A centralized data lake can become hard to scale as the number of data sources and consumers grows.
- A data lake is not well suited to real-time or near real-time analytics.
- A data lake may involve costly cloud software subscriptions.
- The downside of the data mesh is that cloud service providers do not offer it as a single ready-made product. It is an architecture that you have to assemble and operate yourself, whether on your on-premise servers or on hosted infrastructure.
Though the relative advantages and disadvantages of the two technologies have been shared here, the final decision rests with you, the business owner or operator. Make that decision carefully, after considering all the factors affecting your business. In the article Could the Data Mesh Solve Your Data Lake Scaling Issues?, the author tackles a common concern related to data lakes.
Can Data Lake and Data Mesh Complement Each Other?
Yes, they can in specific business situations. Here is an interesting post that takes a fresh look at Zhamak Dehghani’s 2019 paper “How to Move Beyond a Monolithic Data Lake to a Distributed Data Mesh.”
In a nutshell, a data lake is a good solution for storing vast amounts of data in a centralized location, while a data mesh is the best solution for fast data retrieval, integration, and analytics needs.
While a data lake is a one-stop repository for storing data from different business units or departments, a data mesh is great for integrating and analyzing that data. If a business already has data warehouses set up with many disparate or disconnected processes or systems, the data mesh can integrate data from those disconnected systems and store the data in a data lake.
Data mesh is the solution for combining all types of structured and unstructured data for analysis. Organizations can also set up purpose-specific data meshes, like one for all reporting needs and one for data sharing across the enterprise.
A data mesh is a terrific solution for quick data analytics and insights. A Forbes article discusses cloud-based data mesh infrastructure using Kubernetes and Terraform to deliver fast healthcare insights.
This insightful blog post makes a case for the use of a data mesh for big data projects in a “distributed domain-driven architecture.” This approach challenges the data lake technology in several ways:
- It resolves scalability issues
- It views data as a product
- It offers domain-based analytics with data responsibilities vested with domain owners
- It allows for data sharing among domain owners
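The ideas in this list (data as a product, responsibility vested with domain owners, and sharing among domains) can be sketched in a few lines of Python. This is an illustrative model only: the `DataProduct` class, its fields, and the `share_with` and `read` methods are hypothetical names and do not come from any particular data mesh tool.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class DataProduct:
    """Hypothetical data product owned by a single business domain."""
    name: str
    owner_domain: str
    rows: List[Dict[str, object]]
    shared_with: Set[str] = field(default_factory=set)

    def share_with(self, granting_domain: str, consumer_domain: str) -> None:
        # Only the owning domain decides who may consume its product.
        if granting_domain != self.owner_domain:
            raise PermissionError(
                f"{granting_domain} does not own data product {self.name}"
            )
        self.shared_with.add(consumer_domain)

    def read(self, requesting_domain: str) -> List[Dict[str, object]]:
        # The owner and explicitly granted domains can read the data.
        allowed = (
            requesting_domain == self.owner_domain
            or requesting_domain in self.shared_with
        )
        if not allowed:
            raise PermissionError(
                f"{requesting_domain} has no access to data product {self.name}"
            )
        # Return copies so consumers cannot modify the owner's rows.
        return [dict(row) for row in self.rows]
```

In this sketch, a sales domain could publish an `orders` product and call `share_with("sales", "finance")`, after which the finance domain can read it while other domains are refused. Keeping the access decision with the owning domain is the design choice that sets this apart from a single centrally managed repository.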
Here is an interesting debate about dethroning the data warehouse. See how much you agree or disagree with the author. Your next step might be a data lake or a data mesh for your business environment.