Yesterday The Semantic Link Podcast featured Karen Coyle, a library technology consultant whose clients have included esteemed institutions such as the Library of Congress. Coyle discussed libraries’ long history of creating and sharing metadata, including nearly 50 years of using the MARC (machine-readable cataloging) format. That history helps explain why libraries, she said, are looking at semantic web technology – but also why changes to established processes are huge undertakings. “The move toward Linked Data will be the most significant change in library data in these two centuries,” she said, one that requires moving from mainly textual data to using identifiers, rather than strings, for things and data.
Today, The Semantic Web Blog continues the discussion by sharing some perspectives on the topic from OCLC technology evangelist Richard Wallis. As noted in yesterday’s podcast, change has its challenges. “Getting the library community to get its head around Linked Data as a replacement for MARC … will be a bit of a challenge,” Wallis says. While more members of the library community are starting to “get” Linked Data, and what can be accomplished by extracting entities and linking between them, some still struggle with why change can’t just occur within the MARC format itself or its successor, Resource Description and Access (RDA), which provides atomistic, machine-actionable data and machine-interpretable relationships. RDA, Wallis reminds us, took a decade to go from inception to publication and a business model.
“The ramifications of turning into the Linked Data world are quite deep and meaningful but it will be a few years for that to be established in the library world,” Wallis says.
A start is Bibframe.org, a site that gives an overview of the beginnings of a vocabulary for the industry based around Linked Data terms. The Bibliographic Framework Transition Initiative by the Library of Congress includes determining a transition path from the MARC 21 exchange format to more Web-based, Linked Data standards, with Zepheira’s help in developing a Linked Data model, a vocabulary, and enabling tools and services. The proposed model is referred to as BIBFRAME, short for Bibliographic Framework.
The OCLC, which is the owner of WorldCat, a global catalog of more than 258 million library records and 1.8 billion-plus holdings in traditional library metadata format, joined with other major library industry figures in helping to flesh this out. The upshot so far includes a sample collection from the OCLC translated via the BIBFRAME pipeline. Other Linked Data collections are also available to help define and explore the BIBFRAME data model, which is meant to support the requirements of a flexible and extensible bibliographic framework. They include a collection of Library of Congress MARC records, representative of those found in a public library, that have been translated via the BIBFRAME pipeline, as well as sample collections from the British Library, the Deutsche Nationalbibliothek (DNB), the George Washington University Library, the National Library of Medicine and the Princeton Library. The OCLC, Wallis says, might also be able to help by analyzing all the MARC records it can get its hands on, listing in priority order which elements of the records are used most, and then working out how to describe the data normally held in those elements in the BIBFRAME format.
What to expect sooner – possibly this summer, if accepted by Schema.org – are the recommendations of the W3C Schema Bib Extend Community Group, of which Wallis is chair. Its charge is to prepare proposals for extending Schema.org schemas to improve the representation of bibliographic information in markup and sharing. The initiative is complementary to the BIBFRAME work, Wallis says: “You will never use schema as a vocabulary to run a library off of it. It won’t get deep and rich enough for all the subtleties in MARC data.” But what it will do is help on the search engine front, so that bibliographic data can be marked up in a way that search applications can understand and so use appropriately – and that’s what the average user probably will appreciate more.
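To make the idea of search-friendly markup concrete, here is a minimal sketch in Python of how a book description might be expressed with terms that already exist in the core Schema.org vocabulary (Book, Person, name, author, sameAs) and embedded in a web page as JSON-LD. It is not one of the community group's proposals, and the title, author name and example.org identifier are placeholders invented for illustration.

```python
import json


def book_jsonld(title, author_name, author_uri):
    """Describe a book with core Schema.org terms as a JSON-LD dictionary."""
    return {
        "@context": "https://schema.org",
        "@type": "Book",
        "name": title,
        "author": {
            "@type": "Person",
            "name": author_name,
            # Hypothetical identifier; a real page would point at an
            # actual authority record for the author.
            "sameAs": author_uri,
        },
    }


def as_script_tag(data):
    """Wrap the JSON-LD in the script element used to embed it in HTML."""
    body = json.dumps(data, indent=2)
    return '<script type="application/ld+json">\n' + body + "\n</script>"


# Placeholder values only; none of these describe a real book or person.
example = book_jsonld(
    "Example Title",
    "Example Author",
    "http://example.org/authority/person/1",
)
markup = as_script_tag(example)
print(markup)
```

A search application that reads this markup can tell that the page is about a book and that its author is a person with an identifier, rather than having to guess from the surrounding text.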
“The library community has the right intentions about creating a new data interchange format/standard to help them share with the wider web bibliographic data,” Wallis says. “So the ambitions are right but probably the average person not in the library community won’t get BIBFRAME when it’s finished.”
But for the library community, the move to Linked Data opens up a lot of intriguing possibilities that are begging to be explored. Like? Suggests Wallis, “Instead of having author details in every MARC record on the planet, if we have an authority set of author details and everyone can start linking to it, what does that do to the cataloguing process? We can start to ask these questions now.”
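Wallis's question can be illustrated with a small sketch, again in Python and under loudly stated assumptions: the authority set is a plain dictionary, the example.org identifier is hypothetical, and the titles and names are placeholders. It shows only the general idea of records linking to one shared author entry instead of each carrying its own copy of the author details. It is not how BIBFRAME or any OCLC service is implemented.

```python
# Hypothetical identifier standing in for an entry in a shared authority set.
AUTHOR = "http://example.org/authority/person/1"

# One authoritative description of the author, kept in a single place.
authorities = {
    AUTHOR: {"name": "Example, Author"},
}

# Bibliographic records hold a link to the author, not a copy of the name.
records = [
    {"title": "First Example Title", "author": AUTHOR},
    {"title": "Second Example Title", "author": AUTHOR},
]


def display_author(record):
    """Follow the record's link to look up the author's current name."""
    return authorities[record["author"]]["name"]


def titles_by(author_uri):
    """Find works by exact identifier match, not by comparing name strings."""
    return [r["title"] for r in records if r["author"] == author_uri]


print(titles_by(AUTHOR))

# Correcting the authority entry once is reflected in every linked record.
authorities[AUTHOR]["name"] = "Example, Author, 1900-1980"
print([display_author(r) for r in records])
```

In this toy model a cataloguer no longer retypes or corrects author details record by record, which is the kind of change to the cataloguing process that Wallis's question points at.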