A data catalog informs customers about the available data sets and metadata around a topic and helps users locate them quickly. A data catalog differs from a data dictionary in its ability to search and retrieve information.
Data catalogs “may have started as little more than repositories for database schema, sometimes accompanied by business documentation around the database tables and columns.” Now, however, “instead [of] looking up a table name and reading its description, users can search for business entities, then find data sets related to them, so they can quickly perform analysis and derive insights.”
While the business terms found in a data catalog can also be found in business glossaries, a data catalog looks more like a directory. Data catalogs assume users already know or have easy access to business definitions. Data catalogs' self-service capabilities make them valuable in Business Intelligence.
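The search-by-business-entity idea described above can be illustrated with a minimal sketch. Everything in it is hypothetical and for illustration only: the `DatasetEntry` and `DataCatalog` names, the sample data set names, and the simple matching rule are assumptions, not the design of any real catalog product.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetEntry:
    """One catalog record: a data set plus the metadata used to find it."""
    name: str
    description: str
    business_entities: set[str] = field(default_factory=set)


class DataCatalog:
    """Hypothetical in-memory catalog that is searched like a directory."""

    def __init__(self) -> None:
        self._entries: list[DatasetEntry] = []

    def register(self, entry: DatasetEntry) -> None:
        self._entries.append(entry)

    def search(self, term: str) -> list[str]:
        """Return names of data sets tagged with, or describing, the term."""
        needle = term.strip().lower()
        if not needle:
            return []
        return [
            e.name
            for e in self._entries
            if any(needle == ent.lower() for ent in e.business_entities)
            or needle in e.description.lower()
        ]


# Sample entries (made up for this sketch).
catalog = DataCatalog()
catalog.register(DatasetEntry(
    "crm.accounts", "Master list of customer accounts", {"Customer"}))
catalog.register(DatasetEntry(
    "billing.invoices", "Invoices issued to each customer", {"Invoice", "Customer"}))
catalog.register(DatasetEntry(
    "ops.street_repairs", "Street repair work orders", {"Work Order"}))

# The user starts from a business entity, not from a table name.
print(catalog.search("customer"))
```

Here the user never needs to know a table name in advance: searching for the business entity "customer" surfaces both the accounts and invoices data sets. A production catalog would add ranking, access control, and richer metadata, none of which this sketch attempts.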
Data Catalog Use Case Examples Include:
- Harvard Open Door Project (HODP), created “to increase transparency and solve problems on campus.”
- IBM Watson connected customer data and advertising information for an automotive company to better target the right audiences at the right time.
- Kansas City, MO used an open data catalog to “drive decisions to save money through more efficient repairs and maintenance of streets, water lines and other infrastructure.”
- Financial Industry Regulatory Authority (FINRA) created a data catalog “that stores technical metadata to support querying and data fixes. In addition, it features a UI that allows data scientists and other consumers to explore the data sets.”
(Image Reference: The Coastal and Marine Geology Data Catalog)
Other Definitions of Data Catalogs Include:
- “Business-oriented directories that help users find the data they need, quickly.” (Sokolovsky and Mahajan)
- “Solution designed for business users to solve data-centric issues that hold decisions, business processes and outcomes hostage.” (TechRepublic)
- Accessible data for self-service analytics and Data Science initiatives through a 360-degree view (IBM)
- A platform to share and discover otherwise hard-to-find data sets, while keeping ultimate control over the data in researchers’ hands (Health Sciences Library System, University of Pittsburgh)
- “A searchable and browsable online collection of data sets.” (NYU Health Sciences Library)
Businesses Need Data Catalogs to:
- “Utilize, enrich, manage, and value a company’s information.”
- Find and classify data at scale.
- Drive digital transformation initiatives such as Machine Learning and AI.
- Enhance marketing, sales, operations and just about every other area of an organization.
- “Improve data visibility and better enforce data security policies.”
- “Allow any users, from analysts to data scientists or developers, to discover and consume data sources.”