As some of you already know, I am dedicating these summer days to writing my new book, “99 Questions About Data Management,” which in a way follows on from “20 Things You Have to Know About Data Management.” Among the many questions I have received, one I find interesting to answer is this: What is the difference between a data marketplace and a data catalog?
Data Marketplace vs. Data Catalog
A data marketplace is a platform where data sets can be contracted and made available. It differs from a data catalog in its approach, since it presupposes “transactions” between parties, and these do not necessarily involve “external,” monetized data.
Whereas a data catalog makes an organization's data available, a data marketplace presupposes a series of controlled processes through which that data can be used.
While a data catalog usually deals with a single organization, even if it has different domains, a data marketplace can be deployed so that data assets can be made available to other organizations such as business partners or stakeholders.
The data is made available and can be tested. At the time of access, the proprietary nature of the data may require the user to sign a “contract.” The data owner must be able to track the use of the data at all times, and the processes involved must be fully traceable. The data owner grants permission to access the data, or a set of data, for a specific period of time. The owner can also grant permission to “subscribe” to the data and receive every change it undergoes over time.
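To make the idea of a time-limited, traceable permission more concrete, here is a minimal sketch in Python. The class `AccessGrant`, its fields, and the data set and consumer names are my own illustrative assumptions, not the API of any particular marketplace product.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class AccessGrant:
    """Hypothetical record of a data owner's permission on one data set."""
    dataset: str
    consumer: str
    valid_from: datetime
    valid_until: datetime
    subscription: bool = False  # True: the consumer also receives later changes
    audit_log: list = field(default_factory=list)

    def is_active(self, at: datetime) -> bool:
        # The grant only holds inside the agreed period.
        return self.valid_from <= at < self.valid_until

    def read(self, at: datetime) -> bool:
        """Record every access attempt so that usage stays traceable."""
        allowed = self.is_active(at)
        outcome = "granted" if allowed else "denied"
        self.audit_log.append((at, self.consumer, outcome))
        return allowed


start = datetime(2024, 1, 1)
grant = AccessGrant("sales_orders", "partner_a", start, start + timedelta(days=30))
inside = grant.read(start + timedelta(days=10))   # within the 30-day period
outside = grant.read(start + timedelta(days=45))  # after the grant has expired
```

Both attempts end up in `audit_log`, including the denied one, which is the kind of trail the data owner needs for tracking use.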
A Data Marketplace Uses “Data Sharing Agreements”
In a data marketplace the presence of “data sharing agreements” is essential. Data sharing agreements are contracts or legal agreements between the parties involved in the exchange of data. These agreements establish the conditions, terms, and responsibilities related to the use and sharing of data between participating organizations or entities.
Data sharing agreements are important because they define the rights and obligations of each party involved in data sharing. These agreements generally include the following elements:
Purpose and scope: Establishes the purpose and objectives of the data sharing, as well as its scope.
Definitions and terminology: Defines the key terms and definitions used in the agreement to ensure a common understanding between the parties. These terms should be compiled from a business glossary.
Responsibilities and obligations: Specifies the responsibilities and obligations of each party involved in the data exchange. This may include how data will be collected, stored, protected and shared, as well as the security measures and privacy practices to be followed.
Permissions and licenses: Determines the permissions and licenses required for the use and sharing of data. This may include copyright, intellectual property or other legal considerations.
Confidentiality and privacy: Establishes the terms related to confidentiality and privacy of the shared data. This may include clauses on non-disclosure of confidential information, compliance with data protection regulations, and management of privacy-related risks.
Data retention and disposal: Defines timelines and procedures for the retention and disposal of shared data, and governs its life cycle.
Regulatory and legal compliance: Establishes compliance with laws, regulations, and standards applicable to data sharing. This may include compliance with data protection legislation, information security, and other relevant legal requirements. Industry-specific compliance requirements such as RDA, IFRS17, IDMP, or ESG may also apply.
Conflict resolution: Establishes the mechanisms and measures for the resolution of conflicts that may arise in connection with data exchange.
Term and termination: Determines the duration of the agreement and the conditions under which it can be terminated by either party.
Traceability: At all times it must be possible to trace the use of data for compliance or audit purposes.
Data sharing agreements are fundamental to establishing a sound legal and contractual basis for data sharing, ensuring that regulations and legal obligations are met, and protecting the interests of all parties involved. These agreements also help establish trust and transparency between the organizations exchanging data.
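The elements listed above can be treated as a checklist that a marketplace verifies before an agreement is accepted. The following is only a sketch under my own assumptions: the section keys, the function `missing_sections`, and the sample draft are hypothetical and simply mirror the list in this article.

```python
# Section keys mirror the elements listed in this article (names are illustrative).
REQUIRED_SECTIONS = (
    "purpose_and_scope",
    "definitions",
    "responsibilities",
    "permissions_and_licenses",
    "confidentiality_and_privacy",
    "retention_and_disposal",
    "regulatory_compliance",
    "conflict_resolution",
    "term_and_termination",
    "traceability",
)


def missing_sections(agreement: dict) -> list:
    """Return the required sections that are absent or left empty."""
    missing = []
    for name in REQUIRED_SECTIONS:
        text = agreement.get(name)
        if not isinstance(text, str) or not text.strip():
            missing.append(name)
    return missing


draft = {
    "purpose_and_scope": "Share weekly sales figures with partner A for forecasting.",
    "definitions": "Terms taken from the business glossary.",
    "term_and_termination": "12 months; 30 days' notice by either party.",
    "traceability": "   ",  # present but blank, so it still counts as missing
}
gaps = missing_sections(draft)
```

A draft with a non-empty `gaps` list would be sent back to the parties instead of being published.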
Key Features of a Data Marketplace
For a data marketplace to be effective and attractive to users, it should have the following characteristics:
Data variety: It should offer a wide variety of data sets from different sources and in different formats. This will allow users to find the data that fits their specific needs. Ideally, semantic search, ontologies, taxonomies, or another form of querying should be available to facilitate effective searching for users.
Data quality: The data offered in the data marketplace should be of high quality and reliable. This implies that the data is current, accurate, complete, and relevant to its intended purpose. Ideally, the data marketplace should have a data verification and validation process to ensure data quality. In addition, users should be able to see how compliance with the quality dimensions agreed upon by the organization is trending over time.
Privacy and security: The protection of privacy and data security is fundamental. The data marketplace must have robust security measures in place to protect the stored data and ensure that only authorized parties can access it. In addition, it must comply with applicable data privacy regulations, such as the General Data Protection Regulation (GDPR) in the European Union. Access to the data must be legitimate at all times, yet security must not become an impediment to accessing the assets.
Ease of use: The platform must be easy to use and navigate for both buyers and sellers of data. It should have an intuitive interface that allows users to search, view, and compare data sets, as well as perform transactions in a user-friendly manner. As an example, it should resemble something like Spotify, where I find what I am looking for and where my future users receive suggestions based on their role and query history.
Discovery features: There should be advanced search and filtering features that allow users to quickly find the data they need. This can include the ability to search by categories, keywords, specific attributes, and other relevant criteria. It should even be able to recommend the most useful data sets based on the habits of the user or their department. As we have already said, the ideal is to use semantic search, ontologies, or taxonomies.
Transaction transparency: The data marketplace must be transparent about the terms and conditions of transactions. It should provide clear information on pricing, licensing, usage restrictions, and any other relevant details so that users can properly evaluate the data and its acquisition.
Comments and ratings: Users should have the option to leave comments and ratings on the datasets they have acquired. This helps maintain the quality and reputation of the data offered in the marketplace, and allows other users to make decisions based on past experiences.
Integration and compatibility: The platform must be compatible with different data formats and systems to facilitate integration with existing tools and applications. This will allow users to use the data efficiently in their projects and analysis.
Technical support: A good data marketplace should provide adequate technical support for users, whether in the form of documentation, tutorials, discussion forums, or personalized assistance. This will help resolve technical issues and answer questions related to the data and the platform. It must also provide an authorization flow and a ticketing system, so that questions and suggested changes reach the data owners and can be tracked.
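Several of these features, in particular discovery by category and keyword together with user ratings, can be illustrated with a few lines of Python. This is a deliberately simple sketch: plain keyword matching stands in for the semantic search, ontologies, or taxonomies mentioned above, and the `Listing` class, the `search` function, and the sample data are assumptions of mine.

```python
from dataclasses import dataclass


@dataclass
class Listing:
    """Hypothetical marketplace listing with just enough metadata to search."""
    name: str
    category: str
    keywords: frozenset
    rating: float  # average user rating, 0 to 5


def search(listings, keyword=None, category=None, min_rating=0.0):
    """Filter by category, keyword, and minimum rating; best rated first."""
    hits = [
        item for item in listings
        if (category is None or item.category == category)
        and (keyword is None or keyword.lower() in item.keywords)
        and item.rating >= min_rating
    ]
    return sorted(hits, key=lambda item: item.rating, reverse=True)


listings = [
    Listing("customer_master", "sales", frozenset({"customer", "crm"}), 4.6),
    Listing("web_sessions", "marketing", frozenset({"web", "customer"}), 3.9),
    Listing("order_history", "sales", frozenset({"orders", "customer"}), 4.2),
]
results = search(listings, keyword="Customer", category="sales")
names = [item.name for item in results]  # ["customer_master", "order_history"]
```

Sorting by rating is one possible ranking; a real marketplace could instead rank by the user's role or query history, as described under ease of use.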
A data marketplace and a data catalog are two related concepts, but with slightly different approaches and functions:
A data marketplace focuses on facilitating transactions and trading (albeit internal) of data by providing an environment where “sellers” can list their data and “buyers” can search, browse, and purchase or subscribe to it – as long as the authorization flow is met and the data owner grants permission.
A data catalog is a tool or system that acts as a centralized repository of metadata about the datasets available in an organization. It provides a description and inventory of available data, including information about its structure, content, quality, source, owner, permitted use, and other relevant details. The primary purpose of a data catalog is to facilitate data search and discovery by enabling users to find and understand what data exists within an organization and how they can access it. As highlighted in DAMA’s DMBoK2, the data catalog is the last step in making data assets available to the organization after the successful creation of both the data dictionary (basic metadata) and the business glossary (business metadata).
In summary, the main difference between a data marketplace and a data catalog lies in their primary focus. A data marketplace focuses on the transaction, while a data catalog focuses on providing an overview of the data available within an organization, facilitating its search and discovery. The two concepts can complement each other in a broader data ecosystem, where a data marketplace leverages a data catalog to organize and present the data sets it offers. For all of this to work, and to be embedded in the organization and rolled out incrementally, a Data Governance environment and a solid integration layer will always be necessary.
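The relationship between the two concepts can be sketched in code: the catalog entry holds only metadata, and the marketplace listing wraps it and adds the authorization flow. All class, method, and field names below are hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class CatalogEntry:
    """Metadata only: describes a data set so that people can find it."""
    name: str
    owner: str
    source: str
    permitted_use: str


@dataclass
class MarketplaceListing:
    """Wraps a catalog entry and adds the transaction on top of it."""
    entry: CatalogEntry
    requests: dict = field(default_factory=dict)  # consumer -> status

    def request_access(self, consumer: str) -> None:
        self.requests[consumer] = "pending"

    def approve(self, consumer: str, approver: str) -> None:
        # Only the data owner named in the catalog entry may grant permission.
        if approver != self.entry.owner:
            raise PermissionError("only the data owner can approve access")
        if self.requests.get(consumer) != "pending":
            raise ValueError("no pending request for this consumer")
        self.requests[consumer] = "approved"

    def can_read(self, consumer: str) -> bool:
        return self.requests.get(consumer) == "approved"


entry = CatalogEntry("claims_2023", "owner_claims", "claims system", "internal analytics")
listing = MarketplaceListing(entry)
listing.request_access("analyst_1")
before = listing.can_read("analyst_1")  # request is still pending
listing.approve("analyst_1", approver="owner_claims")
after = listing.can_read("analyst_1")   # owner has granted permission
```

In this sketch the catalog answers "what data exists and who owns it," while the listing answers "who has asked for it and who has been allowed to use it."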