A data architect provides clear specifications, models, and definitions, translating a business’s Data Strategy into a Data Architecture and implementing this structure to align with an organization’s Data Governance. An architect is one who designs and advises on the construction of something. Data architects take an organization’s raw data and data assets and build a […]
What Is a Data Catalog?
A data catalog centralizes access to all of an organization’s available data assets through a metadata inventory. This repository facilitates dataset search and retrieval so that users and systems can easily find the information needed for business. A data catalog differs from a data dictionary in its ability to search and retrieve information. Data catalogs “may […]
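As a rough illustration of the "metadata inventory" idea, the sketch below keeps one metadata record per dataset and supports keyword search and retrieval. It is a minimal in-memory model, not how any particular catalog product works; the class names, fields, dataset names, and locations are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CatalogEntry:
    """Metadata about one dataset (the data itself lives elsewhere)."""
    name: str
    location: str          # hypothetical storage location
    description: str
    tags: List[str] = field(default_factory=list)


class DataCatalog:
    """Toy metadata inventory with search and retrieval."""

    def __init__(self) -> None:
        self._entries: Dict[str, CatalogEntry] = {}

    def register(self, entry: CatalogEntry) -> None:
        self._entries[entry.name] = entry

    def search(self, keyword: str) -> List[str]:
        """Return sorted names of datasets whose metadata mentions keyword."""
        kw = keyword.lower()
        return sorted(
            e.name
            for e in self._entries.values()
            if kw in e.name.lower()
            or kw in e.description.lower()
            or any(kw in t.lower() for t in e.tags)
        )

    def retrieve(self, name: str) -> CatalogEntry:
        return self._entries[name]


# Example inventory (all values are made up for illustration).
catalog = DataCatalog()
catalog.register(CatalogEntry(
    "sales_orders", "warehouse.sales.orders",
    "Daily order facts from the web shop", ["sales", "orders"]))
catalog.register(CatalogEntry(
    "customer_profiles", "warehouse.crm.customers",
    "One row per customer account", ["crm"]))
catalog.register(CatalogEntry(
    "web_clickstream", "s3://example-bucket/clickstream/",
    "Raw click events", ["marketing", "sales"]))

matches = catalog.search("sales")
```

Here `catalog.search("sales")` matches on names and tags alike, and `catalog.retrieve(...)` returns the full metadata record so that a user or system can locate the underlying data.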
What Is a Data Container?
A data container is a transportation solution for a database that must move from one computer system to another and keep running. A data container is a data structure that “stores and organizes virtual objects (a virtual object is a self-contained entity that consists of both data and procedures to manipulate the data).” This is similar to the packaging of a […]
What Is a Data Democracy?
A data democracy describes a methodological framework of values and actions that benefit the public or the typical user and minimize any harm to them. Organizations like Data for Democracy, initiated by Bloomberg and BrightHive, and projects like Data for Democracy, established by the University of Washington to help Myanmar transition to a data democracy, are spearheading […]
What Is a Data Dictionary?
A data dictionary is a description of data in business terms that also includes information about the data, such as data types, details of structure, and security restrictions. Unlike business glossaries, which focus on data across the organization, data dictionaries support data warehouses by defining how to use them. The content of the data dictionary often […]
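To make the ingredients named above concrete (a business description plus data type, structure, and security restrictions), here is a minimal sketch of what a couple of data dictionary entries might look like. The field names, columns, and values are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DictionaryEntry:
    column: str
    business_definition: str   # the description in business terms
    data_type: str
    structure: str             # where the column lives and its constraints
    security: str              # security restriction on the column


# Hypothetical entries; a real dictionary would be maintained with the warehouse.
DATA_DICTIONARY = {
    entry.column: entry
    for entry in [
        DictionaryEntry(
            "customer_email",
            "Primary contact address for a customer",
            "VARCHAR(254)",
            "customers table, nullable",
            "PII, restricted to support staff",
        ),
        DictionaryEntry(
            "order_total",
            "Amount charged for an order, including tax",
            "DECIMAL(10,2)",
            "orders table, not null",
            "internal",
        ),
    ]
}


def describe(column: str) -> str:
    """Render one dictionary entry as a single readable line."""
    e = DATA_DICTIONARY[column]
    return f"{e.column} ({e.data_type}): {e.business_definition} [{e.security}]"
```

Calling `describe("order_total")` returns a one-line summary that combines the business definition with the technical and security details.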
What Is a Data Engineer?
Data engineers build Data Architecture through infrastructures and foundations. A data engineer is tasked with designing and maintaining the architecture of data systems, which incorporates concepts ranging from analytic frameworks to data warehouses. Responsibilities also include configuring, managing, and scaling data pipelines. Data engineers: have a programming background (e.g., Java, Scala, or Python); emphasize distributed […]
What Is a Data Fabric?
A data fabric is “a distributed Data Management platform whose objective is to combine various types of data storage, access, preparation, analytics, and security tools in a fully compliant manner to support seamless Data Management.” This concept has gained traction as technologies, such as the Internet of Things, need to have a consistent way of […]
What Is a Data Lake?
A data lake is an environment where a vast amount of data, of various types and structures, can be ingested, stored, assessed, and analyzed. Data lake technologies can scale to massive volumes of data, and combining datasets is easy with data stored in a relatively raw form. A data lake architecture can centralize data over distributed storage, providing a scalable, […]
What Is a Data Lakehouse?
A data lakehouse is a data storage space containing unstructured and structured data. While data warehouses store structured data and data lakes are designed mainly for raw and unstructured data, the data lakehouse was developed to provide a resource for both types of data. Data lakehouses allow users to combine the structure and features of a data warehouse with the low-cost […]
What Is a Data Mart?
A data mart is a subset of a data warehouse designed to serve a specific business line or purpose. Data warehousing pioneer Ralph Kimball conceived of data marts to “begin with the most important business aspects or departments.” This bottom-up dimensional approach creates a user-friendly, flexible data schema that delivers reports rapidly, without having to […]
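The "subset of a data warehouse" idea can be sketched with Python's built-in `sqlite3` module: a single warehouse table holds rows for every department, and a view exposes only the rows and columns one business line needs. This is a simplified illustration, not Kimball's dimensional method; the table, view, and column names are hypothetical.

```python
import sqlite3

# In-memory database standing in for a warehouse (illustrative only).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE warehouse_orders (
        order_id   INTEGER PRIMARY KEY,
        department TEXT,
        region     TEXT,
        amount     REAL
    );
    INSERT INTO warehouse_orders VALUES
        (1, 'sales',   'EMEA', 120.0),
        (2, 'sales',   'APAC',  80.0),
        (3, 'support', 'EMEA',  15.0);

    -- The "mart": only the sales department's rows and relevant columns.
    CREATE VIEW sales_mart AS
        SELECT order_id, region, amount
        FROM warehouse_orders
        WHERE department = 'sales';
    """
)

# A report for the sales team queries the mart, not the full warehouse.
rows = conn.execute(
    "SELECT region, SUM(amount) FROM sales_mart GROUP BY region ORDER BY region"
).fetchall()
```

A production data mart is often a separate physical store rather than a view; the view is used here only to keep the sketch short.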