Metadata is a small amount of data that provides reference information about other data. For example, in 280 BC, the Great Library of Alexandria attached a small, dangling tag to the end of each scroll. The tags gave the title, subject, and author, so library users could learn what a scroll contained without unrolling it, and so scrolls could be returned to their proper location. Although the librarians of that time did not call the information on the tags “metadata,” it is one of the earliest recorded examples of metadata. The practice eventually evolved into the card catalogs (another form of metadata) used in libraries until a few decades ago. The first mention of metadata for computer systems comes from MIT’s Stuart McIntosh and David Griffel, who in 1967 described the need for a digital “meta language.”
A primary goal of metadata is to assist researchers in finding relevant information and discovering resources. Keywords used in the descriptions are called “meta tags.” Metadata is also used in organizing electronic resources, providing digital identification, and supporting the preservation and archiving of data.
Metadata supports discovery by describing resources against relevant criteria and recording where they are located. In digital marketing, metadata can be used to organize and display content, which increases brand visibility and improves “findability.”
Different metadata standards are used for different disciplines (such as digital audio files, websites, or museum collections). A web page, for example, may contain metadata describing the language it was written in, the tools used to create it, and the location of more information on the subject. A museum collection, on the other hand, would contain metadata describing the type of art, the artist’s name, and the date of its creation.
The Late 1900s
In 1979, the International Press Telecommunications Council (IPTC) defined metadata standards and attributes that could be inserted into images. The IPTC’s first standard, IPTC 7901, bridged the gap between teleprinters and computers. In the late 1980s, the IPTC began working on the Information Interchange Model (IIM), a file structure with metadata attributes that could be applied to images, text, and other media forms. It was completed in the early 1990s and expedited the exchange of news among national and international newspapers.
Metadata attributes and standards were advanced again in 1994, when Adobe developed a technique that actually embedded metadata “into” the digital image files (IPTC headers). Adobe adopted IPTC’s IIM metadata definitions, but did not adopt the overall IIM structure. Photos containing IPTC Headers appear to be normal TIFF or JPEG images.
Though there are a variety of metadata systems and standards, there are also specialized and well-accepted models for categorizing “types of metadata.” In 1994, Francis P. Bretherton and Paul T. Singley presented a paper titled Metadata: A User’s View, which distinguished two forms of metadata: guide metadata and structural/control metadata. Guide metadata helps researchers find specific items and is usually expressed as keywords (meta tags) in natural language. Structural metadata is the labeling of database objects (tables, columns, keys, and indexes).
In 2001, NISO (National Information Standards Organization), which focuses on creating industry standards for the information industry (publishers, libraries, and software developers), published the document Metadata Made Simpler: A Guide for Libraries. The manual states:
“There are several different types of metadata, including descriptive, administrative, and structural. Descriptive metadata describes a resource for purposes such as discovery and identification. It can include elements such as title, abstract, author, and keywords. Administrative metadata provides information to help manage a resource, such as when and how it was created, file type and other technical information, and who can access it. Rights management metadata is a form of administrative metadata dealing with intellectual property rights. Structural metadata indicates how compound objects are put together, for example, how pages are ordered to form chapters.”
In 2001, Adobe introduced the Extensible Metadata Platform (XMP). XMP represents the same types of metadata as IPTC, but serializes them using Extensible Markup Language (XML), a text-based markup format for structured data, and the Resource Description Framework (RDF), a general-purpose model for representing information.
An XMP-enabled application lets metadata be captured while content is being created and then embedded in the file, as well as stored in a content-management system. Useful descriptions, such as the title, the author, searchable keywords, and copyright information, are recorded in an easily understood format.
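To make the description above concrete, here is a minimal sketch of reading the title, author, keywords, and copyright fields from an XMP packet using Python’s standard library. The packet contents (title, creator, keywords, rights) are invented for illustration, and the helper name read_dublin_core is hypothetical. The sketch assumes the fields are stored as Dublin Core properties inside RDF/XML, which is a common XMP layout; real packets embedded in image files usually carry many more namespaces.

```python
import xml.etree.ElementTree as ET

# Hypothetical XMP packet; every value below is invented for illustration.
XMP_PACKET = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
           xmlns:dc="http://purl.org/dc/elements/1.1/">
    <rdf:Description rdf:about="">
      <dc:title>
        <rdf:Alt><rdf:li xml:lang="x-default">Harbor at dawn</rdf:li></rdf:Alt>
      </dc:title>
      <dc:creator>
        <rdf:Seq><rdf:li>A. Photographer</rdf:li></rdf:Seq>
      </dc:creator>
      <dc:subject>
        <rdf:Bag><rdf:li>harbor</rdf:li><rdf:li>sunrise</rdf:li></rdf:Bag>
      </dc:subject>
      <dc:rights>
        <rdf:Alt><rdf:li xml:lang="x-default">Example Agency</rdf:li></rdf:Alt>
      </dc:rights>
    </rdf:Description>
  </rdf:RDF>
</x:xmpmeta>"""

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
}
RDF_LI = f"{{{NS['rdf']}}}li"  # fully qualified tag name of rdf:li


def read_dublin_core(packet: str) -> dict[str, list[str]]:
    """Return the values of a few Dublin Core properties found in the packet."""
    root = ET.fromstring(packet)
    fields: dict[str, list[str]] = {}
    for name in ("title", "creator", "subject", "rights"):
        prop = root.find(f".//dc:{name}", NS)
        # Each property wraps its values in rdf:li items (Alt, Seq, or Bag).
        fields[name] = [] if prop is None else [
            li.text for li in prop.iter(RDF_LI) if li.text
        ]
    return fields


fields = read_dublin_core(XMP_PACKET)
print(fields["title"])    # ['Harbor at dawn']
print(fields["subject"])  # ['harbor', 'sunrise']
```

Because the packet is plain XML, any tool with an XML parser can read the same descriptions, which is part of what makes the format easy to exchange between applications.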
In 2002, Ralph Kimball, in his book, The Data Warehouse Toolkit, defined metadata as “all the information that defines and describes the structures, operations, and contents of the DW/BI system.” Kimball then described three types of metadata:
- Technical (structural/control) metadata is the information stored in your data source. It is the physical schema (columns, tables, and the stored data within those objects). This kind of metadata is often used to build the data dictionary. When the data dictionary and the metadata repository are compared, a gap analysis of any missing or incomplete data can be accomplished.
- Business metadata refers to the contents of a data warehouse, including the data that is available, where the data came from, and its relationship to other data.
- Process metadata is about the data warehouse’s operational results. Process metadata is information tied to the metrics captured when a system executes, including traceability, lineage, and auditing info. When did the system run? For how long?
In 2003, Priscilla Caplan, an Assistant Director for the Digital Library Services at the Florida Center for Library Automation, broke metadata schemas into distinct categories reflecting their key aspects of functionality, such as:
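The three types in the list above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Kimball’s method: the customer table, the data_dictionary entries, and all function names are hypothetical, and an in-memory SQLite database stands in for a real data source. Technical metadata is read from the physical schema, the data dictionary plays the role of business-facing descriptions, and process metadata records when a job ran and for how long.

```python
import sqlite3
import time
from datetime import datetime, timezone

# Hypothetical source table in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")


def technical_metadata(conn: sqlite3.Connection, table: str) -> dict[str, str]:
    """Read column names and declared types from the physical schema."""
    # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
    # The table name is interpolated directly, so pass trusted names only.
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return {row[1]: row[2] for row in rows}


# Hypothetical data dictionary: descriptions written for business users.
data_dictionary = {"id": "Surrogate key", "name": "Customer full name"}


def undocumented_columns(schema: dict[str, str], dictionary: dict[str, str]) -> list[str]:
    """Gap analysis: columns present in the schema but missing from the dictionary."""
    return sorted(set(schema) - set(dictionary))


def run_with_process_metadata(job):
    """Run a job and record when it started and how long it took."""
    started = datetime.now(timezone.utc)
    t0 = time.perf_counter()
    result = job()
    duration = time.perf_counter() - t0
    return result, {"started_at": started.isoformat(), "duration_seconds": duration}


schema = technical_metadata(conn, "customer")
print(schema)                                         # {'id': 'INTEGER', 'name': 'TEXT', 'city': 'TEXT'}
print(undocumented_columns(schema, data_dictionary))  # ['city']

row_count, run_info = run_with_process_metadata(
    lambda: conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]
)
print(row_count, run_info["started_at"])
```

The gap analysis here is a simple set difference; a production metadata repository would typically also compare types, descriptions, and lineage.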
- Descriptive metadata refers to discovery, identification and selection. Descriptive metadata can also include collocation and acquisition.
- Structural metadata describes internal organization. In the digital environment, logical resources are often made up of multiple physical files. Structural metadata relates physical files to one another and to the structure of logical objects.
- Administrative metadata provides information designed for the management of resources. This includes when and how objects were created, the person responsible for controlling access to it, and the control or processing activities performed in relationship to it.
- Rights Management metadata refers to intellectual property rights. In systems, rights must be checked against users’ profiles (proven by proper identification) to ensure the material is properly distributed and proper payments are being made to the rights holder.
- Preservation metadata is essentially about management. It contains information used to archive and preserve resources. Digital preservation describes a process designed to ensure a resource remains accessible.
- Technical metadata schemes are generally very large and detailed, because they are often used by IT, or for computer-to-computer communications. Technical metadata describes information about technology (ownership of the database, physical characteristics of a database, performance tuning, and more). It also covers the software and hardware needed to reproduce digital records, including video formats (MPEG) and PDF.
Metadata and Marketing
In 2007, Google shifted the way its search engine worked. Google had previously been based on a list of appropriate links (some were paid for, some were not). But at this time, Google expanded its search platform to include news, images, and video. As a result, new metadata was introduced to make websites and information searchable and relevant for SEO.
Metadata is a crucial tool for modern digital marketing. It helps people find a website and makes web content more searchable, and when used efficiently, it can increase the number of visits. Marketers can organize their metadata to maximize the content’s reach. Accurate, organized metadata is key to creating a website that can be easily found.
Additionally, metadata has had a substantial effect on Search Engine Optimization (SEO), as it is part of Google’s search process and is displayed on its Search Engine Results Page (SERP). Optimizing the metadata, making it keyword-rich and conversion-oriented, can increase traffic going to a site.
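As a rough illustration of the page-level metadata discussed here, the following sketch extracts the title and named meta tags from an HTML document using Python’s built-in html.parser module. The sample page and the class name MetaTagParser are hypothetical, and the sketch makes no claim about how any particular search engine reads or weighs these fields.

```python
from html.parser import HTMLParser

# Hypothetical page; the title and description are invented for illustration.
PAGE = """<html><head>
<title>Example page</title>
<meta name="description" content="A short summary shown to searchers.">
<meta name="author" content="Example Author">
</head><body><p>Body text.</p></body></html>"""


class MetaTagParser(HTMLParser):
    """Collect the <title> text and every <meta name=... content=...> pair."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            name = attributes.get("name")
            content = attributes.get("content")
            # Skip tags without both attributes (for example charset declarations).
            if name and content is not None:
                self.meta[name.lower()] = content

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


parser = MetaTagParser()
parser.feed(PAGE)
print(parser.title)                # Example page
print(parser.meta["description"])  # A short summary shown to searchers.
```

A check like this can be run across a site to spot pages with missing or duplicated descriptions before they are published.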
Octopai offers machine learning technology that automatically maps and manages metadata collected from an organization’s various information systems, while using a single searchable interface. Its three cofounders were frustrated with having to manually trace the journey of data each time they wanted to find specific metadata. It would take hours and was often inaccurate.
Amnon Drori, Gal Ziton, and Itai Kahalani channeled their frustrations into finding an efficient technology solution, and created an automated platform allowing BI researchers to efficiently discover shared metadata. Their platform dramatically increased productivity, shortened the time to market, and reduced risks caused by inaccurate data.
Metadata Management and GDPR compliance
On May 25, 2018, the General Data Protection Regulation (GDPR) became enforceable. The GDPR requires any EU customer data that allows identification of consumers to be made anonymous, or completely deleted. Essentially, data can remain available for purposes of Big Data research, but cannot be used to “stalk” an individual. To comply, businesses must achieve a level of awareness regarding their data that did not previously exist.
To gain a good understanding of the data a business possesses, one must access the associated metadata. Metadata Management shows where data came from, where it is located in different systems, and how it is being used. Metadata is used to govern data, and is necessary for becoming GDPR-compliant.
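A minimal sketch of this idea, assuming a hand-built metadata catalog: each entry records which system and table a column lives in, where the data came from, and whether it can identify an individual. The catalog entries, field names, and the helper personal_data_locations are all hypothetical. Real metadata management tools collect this information automatically, and flagging a column is only a first step toward compliance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ColumnMetadata:
    system: str          # system that stores the column
    table: str
    column: str
    source: str          # where the data came from
    personal_data: bool  # True if the column can identify an individual


# Hypothetical catalog entries, invented for illustration.
CATALOG = [
    ColumnMetadata("crm", "customer", "email", "signup form", True),
    ColumnMetadata("crm", "customer", "segment", "derived", False),
    ColumnMetadata("warehouse", "orders", "customer_name", "crm export", True),
    ColumnMetadata("warehouse", "orders", "total", "billing", False),
]


def personal_data_locations(catalog: list[ColumnMetadata]) -> list[str]:
    """List every system.table.column flagged as holding personal data."""
    return [
        f"{entry.system}.{entry.table}.{entry.column}"
        for entry in catalog
        if entry.personal_data
    ]


print(personal_data_locations(CATALOG))
# ['crm.customer.email', 'warehouse.orders.customer_name']
```

With locations listed this way, a deletion or anonymization request can be traced to every system that holds a copy of the data.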