Data contracts are formal agreements between a data provider and a data consumer. They describe the data in the abstract, along with the structure, format, schema, and characteristics of the exchange.
Why is a data contract required? Data contracts establish guidelines and rules for data sharing, storage, deletion, and archival. They also help ensure that the data is reliable, of high quality, and trusted by all parties involved. A contract can state whether the primary data consumer may share data with a secondary consumer, and it can set ethical boundaries and constraints around data processing, privacy, and security.
The concept of service level agreements (SLAs), also known as service contracts, is familiar to most business personnel. These written agreements outline what a customer expects from a service provider and what might happen if those expectations aren’t met.
What Is the Difference Between a Data Contract and a Service Contract?
A data contract defines the structure and format of the data exchanged between the data producer (client) and the data consumer (service).
A service contract defines the functionality and operations exposed by the service.
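One way to picture the distinction is a minimal Python sketch in which a dataclass plays the role of the data contract (the shape of what is exchanged) and a `Protocol` plays the role of the service contract (the operations on offer). The names `Order`, `OrderService`, and `InMemoryOrderService` are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Protocol


@dataclass
class Order:
    """Data contract (illustrative): the structure and format of the data."""
    order_id: int
    amount: float


class OrderService(Protocol):
    """Service contract (illustrative): the operations the service exposes."""

    def submit(self, order: Order) -> str:
        ...


class InMemoryOrderService:
    """A toy implementation that satisfies the OrderService protocol."""

    def __init__(self) -> None:
        self._orders: Dict[int, Order] = {}

    def submit(self, order: Order) -> str:
        # Store the order keyed by its identifier and acknowledge receipt.
        self._orders[order.order_id] = order
        return "accepted"


service: OrderService = InMemoryOrderService()
status = service.submit(Order(order_id=1, amount=9.5))
```

Changing a field on `Order` alters the data contract, while adding or removing a method on `OrderService` alters the service contract.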
Some important aspects to have in data contracts:
- What data is required for processing by the data consumer (e.g., application consumer, reporting analyst, data scientist, etc.)
- Definitions of data that is required for consumption
- Ingestion type and frequency of ingestion
- Data processing purpose and entitlements
- Means of delivery to databases on-premises, cloud, or hybrid multi-cloud
- Data owners, stewards, and other data personas involved, such as custodians
- Personal data classification, entitlements, and data access required
- Control requirements for security and governance (e.g., anonymization, masking, etc.)
- Validity period of the contract, and controls associated with storage, archival, and deletion
- Data quality requirements and acceptance thresholds for trustworthy consumption
- Schema definition and formats in which data is required
- Real-time or batch delivery of data to the consumption landscape
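The aspects listed above could be captured in a machine-readable form. The following is a minimal sketch, assuming a contract is modeled as a plain Python dataclass. Every field name and the `orders_contract` example are assumptions made for illustration, not a standard contract format.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List


@dataclass
class FieldSpec:
    """One field in the contracted schema (hypothetical structure)."""
    name: str
    dtype: str                  # e.g. "int", "float", "string"
    definition: str             # business definition of the field
    personal_data: bool = False
    control: str = "none"       # e.g. "masking", "anonymization"


@dataclass
class DataContract:
    """Illustrative container for the contract aspects listed above."""
    name: str
    owner: str
    steward: str
    consumers: List[str]
    purpose: str
    delivery_mode: str          # "batch" or "real-time"
    ingestion_frequency: str    # e.g. "daily"
    delivery_target: str        # e.g. "on-premises", "cloud", "hybrid"
    valid_from: date
    valid_until: date
    retention_days: int
    quality_thresholds: Dict[str, float]
    schema: List[FieldSpec] = field(default_factory=list)

    def is_valid_on(self, day: date) -> bool:
        """True if the contract is in force on the given day."""
        return self.valid_from <= day <= self.valid_until

    def personal_fields(self) -> List[str]:
        """Names of fields classified as personal data."""
        return [f.name for f in self.schema if f.personal_data]


orders_contract = DataContract(
    name="orders_daily",
    owner="sales-platform-team",
    steward="sales-data-steward",
    consumers=["reporting-analyst", "data-scientist"],
    purpose="revenue reporting",
    delivery_mode="batch",
    ingestion_frequency="daily",
    delivery_target="cloud",
    valid_from=date(2024, 1, 1),
    valid_until=date(2024, 12, 31),
    retention_days=365,
    quality_thresholds={"completeness": 0.98},
    schema=[
        FieldSpec("order_id", "int", "Unique order identifier"),
        FieldSpec("amount", "float", "Order total"),
        FieldSpec("customer_email", "string", "Contact email",
                  personal_data=True, control="masking"),
    ],
)
```

Keeping the validity dates and personal-data flags in the same object as the schema lets governance checks and pipeline checks read from a single definition.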
Is It Important to Define and Validate Contracts?
In an organization, a data engineering team must not only define interfaces between different functional areas but also demonstrate that they are functional and reliable.
- By using contracts, the interface author ensures that a change to the interface does not inadvertently cause quality drops or breakages downstream.
- The consumer of the interface relies on contracts to check that the interface is not and will not be broken.
Data contracts can serve as both a collaboration mechanism and a governance mechanism, encouraging teams to think about requirements early in the software lifecycle. They can also introduce non-functional requirements that are continuously validated on data pipelines. Managing data requirements is an often ignored discipline, yet it is necessary for efficient, high-quality data consumption in business processes, analytics, and reporting.
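Continuous validation on a pipeline could look something like the following sketch, which checks a batch of records against expected field types and a minimum completeness (share of non-null values) per field. The field names, thresholds, and the `validate_batch` helper are all assumptions made for illustration.

```python
from typing import Any, Dict, List

# Hypothetical contract terms: expected Python type per field, and the
# minimum share of rows in which the field must be non-null.
EXPECTED_TYPES = {"order_id": int, "amount": float, "customer_email": str}
MIN_COMPLETENESS = {"order_id": 1.0, "amount": 1.0, "customer_email": 0.5}


def validate_batch(rows: List[Dict[str, Any]]) -> List[str]:
    """Return a list of contract violations; an empty list means the batch passes."""
    if not rows:
        return ["batch is empty"]
    violations: List[str] = []
    for name, expected in EXPECTED_TYPES.items():
        values = [row.get(name) for row in rows]
        present = [v for v in values if v is not None]
        # Schema check: every non-null value must have the contracted type.
        wrong_type = [v for v in present if not isinstance(v, expected)]
        if wrong_type:
            violations.append(
                f"{name}: {len(wrong_type)} value(s) are not {expected.__name__}"
            )
        # Quality check: share of non-null values must meet the threshold.
        completeness = len(present) / len(rows)
        if completeness < MIN_COMPLETENESS[name]:
            violations.append(
                f"{name}: completeness {completeness:.2f} "
                f"below {MIN_COMPLETENESS[name]:.2f}"
            )
    return violations


good_batch = [
    {"order_id": 1, "amount": 9.5, "customer_email": "a@example.com"},
    {"order_id": 2, "amount": 3.0, "customer_email": None},
]
bad_batch = [
    {"order_id": "3", "amount": 1.0, "customer_email": None},
    {"order_id": 4, "amount": None, "customer_email": None},
]
```

A pipeline step could call `validate_batch` on each incoming batch and stop or raise an alert when the returned list is not empty.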
Are Data Contracts an Architectural Construct?
Organizations are modernizing their tool stacks and distributing their architectures around thriving data products. As the volume of data grows, so does the metadata associated with it, along with the need to define and manage that data. Data must be discovered and defined as it is created, in a central namespace called a catalog. Catalogs serve to create a common understanding of data, its meaning, and its usage in business processes. They discover data as it’s produced, and the people who produce and consume the data can define it.
One significant aspect of data architecture is therefore keeping the meaning of data consistent. The technology architecture describes the strategy and design of the technology components that work together to achieve these data capabilities. Layering data contracts on top of the engineered pipelines or APIs can result in better governance of the architecture. Data contracts are driven by semantics and can be an excellent governance mechanism for managing data architecture and the associated piping and reverse-piping of data.
Do Data Contracts Require a Culture of Data Democratization and Marketplaces to Thrive?
Defining data and making it available by continuously discovering the data that people and systems produce can help personnel create data contracts that are sustainable rather than a one-time success. Data democratization is a cultural change built on the idea of easy access to data. As data becomes more accessible and available, it becomes easier to monetize it directly and indirectly. This can be realized through an internal marketplace where data can be shopped for and acquired for various projects.
Further, a marketplace can help foster a culture of collaboration and innovation across the organization. It can encourage personnel to go to a portal, search for data, and request it. This, in turn, promotes sharing and reuse of data, which can lead to better insights and outcomes. Allowing business users to source and consume relevant data for immediate reporting or insight generation can significantly reduce the turnaround time traditionally involved in acquiring or sourcing data. Another advantage of democratization is that data consumers are kept apprised of newly acquired data and of changes to existing data.
Marketplaces let consumers shop for data while also putting together data requirements. Those requirements can then be translated into formal contracts and shared with engineering teams for fulfillment.
Benefits of Data Contracts
There are numerous benefits of using data contracts. Here are some of the most important:
- Data contracts make it possible to enhance code without breaking the integrity of the interface with downstream consumers.
- They also place ownership of data with the producers, while building collaboration on data quality between consumers and producers.
- As governance mechanisms are agreed upon between data producers and consumers, data contracts provide formality for the auditability of data management controls.
- By managing the data lifecycle using entitlements and time limits associated with storage, archiving, and deleting data, data contracts provide objective ways of managing data risks.
- Data contracts foster the responsibility of stewards for gathering data requirements, with the associated exchange structure, format, and characteristics of data elicited from business, operations, process, and product SMEs.
- Trust in data is instilled in consumers, because quality requirements can be defined publicly and globally and then reused by other consumers.
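The first benefit in the list above, enhancing code without breaking downstream consumers, can be illustrated with a small compatibility check between two schema versions. This is a sketch under stated assumptions: schemas are modeled as plain dictionaries from field name to type name, added fields are treated as non-breaking, and removed fields or changed types are treated as breaking. The `breaking_changes` helper and the sample schemas are hypothetical.

```python
from typing import Dict, List


def breaking_changes(old: Dict[str, str], new: Dict[str, str]) -> List[str]:
    """List changes in `new` that could break consumers relying on `old`."""
    problems: List[str] = []
    for name, old_type in old.items():
        if name not in new:
            # A consumer reading this field would fail or receive nulls.
            problems.append(f"removed field: {name}")
        elif new[name] != old_type:
            # A consumer parsing the old type may misread the new one.
            problems.append(f"type changed: {name} ({old_type} -> {new[name]})")
    return problems


v1 = {"order_id": "int", "amount": "float"}
v2 = {"order_id": "int", "amount": "float", "currency": "string"}  # additive
v3 = {"order_id": "string", "currency": "string"}                  # breaking

additive_result = breaking_changes(v1, v2)
breaking_result = breaking_changes(v1, v3)
```

A producer could run a check like this before releasing a schema change and negotiate a new contract version whenever the result is not empty.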