ThoughtWorks consultant Zhamak Dehghani created the concept of data mesh as a self-serve, domain-oriented design that later evolved into a data-as-a-product design. Because it lets data from disconnected systems be integrated and analyzed all at once, the data mesh architecture spares the organization from having to pull data out of multiple systems into one central location and preprocess it there.
In a traditional data architecture, such as a data warehouse or a data lake, data is collected, stored, cleaned, and processed in a single location for further analysis. In a data mesh, however, the data remains within the domains that produce it, and domain teams use their domain data to develop data products for their own needs and to sell those products to other consumers.
In this scenario, domain teams fully own their data infrastructures, data pipelines, and data products. The individual domain teams complete their data-processing tasks within the data mesh, relying on a centralized unit that controls the backend storage and compute for each domain.
Data mesh promotes distributing data across domains, self-service operations by non-IT staff, and domain-centric control of data. Its operating principles are domain-centric data control, self-service data access, data as a product (DaaP), and standardized governance across the data mesh framework.
Moreover, according to author Cameron Turner, the DaaP approach enables domain owners to move away from "centralized data lakes" and sell their domain data directly to other domains, creating new avenues for revenue generation.
When Should an Organization Use a Data Mesh Architecture?
If a business has sudden scaling needs that must be met in a short span of time, a data mesh may help. As data sources and data consumers keep multiplying, a single, centralized approach to Data Management can lead to unmanageable scaling issues.
In many cases, centralized control of enterprise data creates undesirable bottlenecks. In a data-first business ecosystem, enterprises now need data platforms that support scaling organically.
Data mesh promotes distributed Data Management, which naturally aligns with the distributed data creation patterns across the organization. As a general rule, if your company is interested in cloud migration, domain-driven development, or microservices, now is a good time to consider implementing the data mesh architecture.
The Benefits of Using a Data Mesh Architecture
Data mesh architectures can help businesses find quick solutions to day-to-day problems, discover better ways to manage their resources, and develop more agile business models. Here is a quick review of data mesh architecture benefits:
- The data mesh architecture is adaptable: it can evolve as the company scales, changes, and grows.
- The data mesh enables data from disparate systems to be collected, integrated, and analyzed all at once, eliminating the need to extract data from those systems into one central location for further processing.
- Within a data mesh, each domain becomes a mini-enterprise and gains the power to self-manage and self-serve all aspects of its Data Science and data-processing projects.
- A data mesh architecture allows companies to increase efficiency by no longer forcing all data through a single pipeline, while still protecting the system through centralized monitoring infrastructure.
- Domain teams can design and develop their own analytical and operational use cases while maintaining full control of all their data products and services.
- The data mesh architecture helps tackle Data Governance bottlenecks without sacrificing scale by distributing IT teams across domains, each with independent control over its data-related activities.
- The data mesh maintains centralized governance standards by requiring domain teams to adhere to all standards, policies, and regulations, while rewarding them with quick access to data, quick turnaround times, and tailored data solutions.
- The data mesh uses data catalogs to make the domain-driven data products and services discoverable.
- The infrastructure-as-a-platform (IaaP) present within the data mesh offers an automated approach to data standardization and data-product-lifecycle monitoring.
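To make the catalog and standardization points above more concrete, here is a minimal Python sketch, assuming a simple in-memory catalog. The names `DataProduct`, `DataCatalog`, `register`, and `discover` are hypothetical and do not refer to any particular catalog tool; the owner-and-description check merely stands in for whatever standards a real self-serve platform would enforce automatically.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataProduct:
    """Hypothetical descriptor a domain team publishes for one data product."""
    name: str
    domain: str
    owner: str
    description: str
    tags: frozenset = field(default_factory=frozenset)


class DataCatalog:
    """Hypothetical in-memory catalog: registration enforces shared standards,
    and consumers in any domain discover products by domain or tag."""

    def __init__(self):
        self._products = {}  # (domain, name) -> DataProduct

    def register(self, product: DataProduct) -> None:
        # Assumed standardization rule: a product needs an owner and a
        # description before it becomes discoverable by other domains.
        if not product.owner or not product.description:
            raise ValueError(f"{product.name}: owner and description are required")
        key = (product.domain, product.name)
        if key in self._products:
            raise ValueError(f"{product.domain}/{product.name} is already registered")
        self._products[key] = product

    def discover(self, domain=None, tag=None):
        # Filter by domain and/or tag; sort by (domain, name) for stable output.
        matches = [
            p for p in self._products.values()
            if (domain is None or p.domain == domain)
            and (tag is None or tag in p.tags)
        ]
        return sorted(matches, key=lambda p: (p.domain, p.name))


if __name__ == "__main__":
    catalog = DataCatalog()
    catalog.register(DataProduct("orders_daily", "sales", "sales-data-team",
                                 "Daily order totals", frozenset({"orders"})))
    catalog.register(DataProduct("shipments", "logistics", "logistics-team",
                                 "Shipment events", frozenset({"orders"})))
    print([p.name for p in catalog.discover(tag="orders")])  # ['shipments', 'orders_daily']
```

Keeping the catalog responsible for both the standards check and discovery reflects the idea that domains publish independently while the platform applies the same rules to everyone.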
Data Mesh Architecture Challenges
This section merely touches upon some technical and implementation challenges surrounding data mesh, which require further exploration:
- Adopting a data mesh forces many changes that users may resist, so implementation requires strong organizational support. Data engineers must be kept in the loop so that they do not work against the interests of the data mesh.
- Interoperability between different data systems is a prerequisite for a data mesh to work.
- While maintaining domain-specific control of data pipelines, domains still have to adhere to all standardized Data Governance protocols.
- In a data mesh, data access depends on appropriate permissions: consumers without the right permissions are denied access, so permissions must be managed consistently across domains.
- Data catalogs have to be kept updated for the data products to remain discoverable.
- In a data mesh, it is especially important to ensure a consistent approach to configuring the cloud infrastructure, clear standards for data products, and adherence to standardized governance protocols.
- Migration from monolithic data warehouses and data lakes to a data mesh requires more than technological and logistical preparation; it also demands cultural and mindset changes and a commitment to a more cross-functional approach to business domain modeling.
- The minimal shared governance between domains may pose its own challenges.
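The permissions challenge above can be illustrated with a deny-by-default access check. This is a hypothetical sketch: `AccessPolicy`, `read_product`, and the in-memory `store` dictionary are illustrative names only, and a real deployment would more likely delegate these checks to the platform's identity and access-management tooling.

```python
class AccessPolicy:
    """Hypothetical deny-by-default policy: a consumer may read a data product
    only after the owning domain has explicitly granted access."""

    def __init__(self):
        self._grants = set()  # (consumer, domain, product) triples

    def grant(self, consumer: str, domain: str, product: str) -> None:
        self._grants.add((consumer, domain, product))

    def revoke(self, consumer: str, domain: str, product: str) -> None:
        # discard() does nothing if the grant was never made.
        self._grants.discard((consumer, domain, product))

    def can_read(self, consumer: str, domain: str, product: str) -> bool:
        return (consumer, domain, product) in self._grants


def read_product(policy, consumer, domain, product, store):
    """Return the product's data, or raise PermissionError if no grant exists."""
    if not policy.can_read(consumer, domain, product):
        raise PermissionError(f"{consumer} may not read {domain}/{product}")
    return store[(domain, product)]


if __name__ == "__main__":
    store = {("sales", "orders_daily"): [120, 95, 143]}
    policy = AccessPolicy()
    policy.grant("marketing", "sales", "orders_daily")
    print(read_product(policy, "marketing", "sales", "orders_daily", store))  # [120, 95, 143]
    try:
        read_product(policy, "finance", "sales", "orders_daily", store)
    except PermissionError as err:
        print(err)  # finance may not read sales/orders_daily
```

Denying by default means a missing or stale grant fails safely, which is one reason permissions need careful, consistent management across domains.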
The data mesh architecture, while providing direct data access to domains, retains the flexibility of incorporating a data warehouse or a data lake within its framework if needed. In a data mesh environment, the IT and business teams collaborate to build data products or to provide DaaP services to other data consumers throughout the organization.
The biggest beneficiaries of a data mesh are organizations with many domains and disconnected systems. The domain teams generate and own domain-specific data for their daily needs, but they can also build their own data products and sell them (DaaP) to other domains or to external consumers.
Instead of relying on one central data-engineering team and a handful of data scientists for the entire Data Management operation, the data mesh architecture promotes an even distribution of IT (data) teams across organizational units. Despite the potential challenges, decentralized domain teams and IT teams get an excellent opportunity to work closely together to deliver value-added products and services.