
Datasets Addition Promising Extension For Schema.Org

July 11, 2012

A call for comments is out on a proposed ‘Datasets’ addition to schema.org, issued via the W3C’s Web Schemas task force, the group the schema.org project uses to collaborate with the wider community.

The proposal, which extends schema.org for describing datasets and data catalogs, introduces three new types with associated properties.
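To give a feel for what such a description can look like, here is a minimal Python sketch that builds one as JSON-LD. It rests on several assumptions not stated in this article: it uses the type and property names found in schema.org as later published (Dataset, DataCatalog, DataDownload, includedInDataCatalog, distribution, contentUrl, encodingFormat), which may differ from the draft under review; JSON-LD is just one possible serialization; and the helper name, dataset title, and example.org URL are hypothetical placeholders.

```python
import json


def dataset_jsonld(name, description, catalog_name, download_url, media_type):
    """Build a schema.org-style Dataset description as a JSON-LD dictionary.

    Assumption: term names follow schema.org as later published, not
    necessarily the 2012 draft proposal.
    """
    return {
        "@context": "https://schema.org",
        "@type": "Dataset",
        "name": name,
        "description": description,
        # The catalog that lists this dataset.
        "includedInDataCatalog": {
            "@type": "DataCatalog",
            "name": catalog_name,
        },
        # One downloadable form of the dataset.
        "distribution": [
            {
                "@type": "DataDownload",
                "contentUrl": download_url,
                "encodingFormat": media_type,
            }
        ],
    }


# Hypothetical example values, for illustration only.
example = dataset_jsonld(
    name="Example Air Quality Readings",
    description="Placeholder description of a public dataset.",
    catalog_name="Example Open Data Catalog",
    download_url="https://example.org/data/air-quality.csv",
    media_type="text/csv",
)

print(json.dumps(example, indent=2))
```

The point of the sketch is the shape: a dataset carries a few high-level fields, points to the catalog that lists it, and names one or more downloadable distributions.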

Writing at the Schema.org blog, Dan Brickley calls it a “small but useful vocabulary,” with particular relevance to open government and public sector data.

He also references this week’s post at Data.gov by Chris Musialek, the chief software architect for the site. Musialek writes that, following a review of the draft proposal, “we are comfortable with the current state of things,” and that any work left to do seems very resolvable.

“We’ve been watching the schema.org datasets schema space for a while now, as Data.gov is very interested in adding schema.org support for our listing of over 450,000 datasets. We think this will help the major search engines create better relevance rankings of Federal government data, where many searches begin,” Musialek says. And he notes later in the post that, “We’re really excited to see this schema move in the direction of official addition to schema.org. We really hope to see it be included in a schema.org release soon.”

The Tetherless World Constellation (TWC) at Rensselaer Polytechnic Institute, where Professor James A. Hendler is now head of the Department of Computer Science, has a demo available that uses the schema.org extension for datasets and data catalogs. It contains automatically generated dataset descriptions based on TWC’s International Dataset Search. A few weeks back, at the Semantic Technology & Business Conference in San Francisco, Hendler told The Semantic Web Blog in an interview that, while a vocabulary for describing datasets and data catalogs was not yet part of schema.org, efforts were underway to make that happen.

In that interview Hendler also disclosed that the number of open government datasets on the web has hit the million mark. In his schema.org blog posting, Brickley says the proposal is exciting because of the “huge number of datasets that have been made public in recent years. While each dataset may ultimately be expressed in detailed, domain-specific form (e.g. using specific scientific or statistical schemas), the Datasets proposal focuses on the high level common characteristics that are shared across thousands of otherwise diverse datasets.”

The proposal includes a table mapping Datasets extension types and properties (including supporting schema.org vocabulary) to and from their approximate equivalents in Data Catalog Vocabulary (DCAT), Asset Description Metadata Schema (ADMS), and VoID. The next step for the proposal is to get feedback from publishers of applicable datasets on whether the extension would be useful to them and is a good fit for their available metadata.
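A mapping table of this kind can be pictured as a simple two-way lookup. The Python sketch below is illustrative only: it is not the proposal’s table, it covers DCAT alone, and it assumes the type names from schema.org and DCAT as later published (Dataset, DataCatalog, DataDownload on one side; dcat:Dataset, dcat:Catalog, dcat:Distribution on the other). The `translate` helper is hypothetical.

```python
# Approximate type-level equivalents; an assumption for illustration,
# not a copy of the proposal's own mapping table.
SCHEMA_TO_DCAT = {
    "schema:DataCatalog": "dcat:Catalog",
    "schema:Dataset": "dcat:Dataset",
    "schema:DataDownload": "dcat:Distribution",
}

# Invert the table so terms can be translated in either direction.
DCAT_TO_SCHEMA = {dcat: schema for schema, dcat in SCHEMA_TO_DCAT.items()}


def translate(term, mapping):
    """Return the approximate equivalent of a term, or None if unmapped."""
    return mapping.get(term)


print(translate("schema:DataDownload", SCHEMA_TO_DCAT))
print(translate("dcat:Catalog", DCAT_TO_SCHEMA))
```

Returning None for unmapped terms reflects the “approximate” nature of such tables: not every term in one vocabulary has a counterpart in another.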

About the author

Jennifer Zaino is a New York-based freelance writer specializing in business and technology journalism. She has been an executive editor at leading technology publications, including InformationWeek, where she spearheaded an award-winning news section, and Network Computing, where she helped develop online content strategies including review exclusives and analyst reports. Her freelance credentials include being a regular contributor of original content to The Semantic Web Blog; acting as a contributing writer to RFID Journal; and serving as executive editor at the Smart Architect Smart Enterprise Exchange group. Her work also has appeared in publications and on web sites including EdTech (K-12 and Higher Ed), Ingram Micro Channel Advisor, The CMO Site, and Federal Computer Week.
