The Library of Congress’ National Digital Information Infrastructure and Preservation Program works to catalyze and sustain a national network of digital preservation partners. From the beginning of the project, one of the key ideas has been that the partnership, now with 185 partner organizations across 45 states, needed to work toward a distributed architecture. To that end, NDIIPP has worked with its partners to connect different platforms for storage and verification, data and metadata management, and access and discovery of preserved digital materials.
This national network of partners has already preserved a substantial amount of our digital cultural heritage, including, among other things, geospatial information, web sites, audiovisual productions, images and text, and materials related to critical public policy issues. As this set of preserved materials continues to grow, it has become all the more apparent that we need tools that provide ways for users to discover and access those distributed collections.
To this end, NDIIPP has collaborated with Zepheira to create Recollection, a free platform for generating and customizing views (interactive maps, timelines, facets, tag clouds) that allow NDIIPP partners and their users to better explore and understand these growing digital collections.
The Recollection workflow
Recollection allows our partners to ingest collections from spreadsheets or MODS records, and in the near future through OAI-PMH. It then allows partners to augment that data, letting them generate points of latitude and longitude from plain text place names, derive ISO dates from textual expressions of date information, and break apart data in a field into lists based on simple patterns. From there it allows partners to create views of their data, including maps, timelines, and faceted navigation. Finally, the platform allows users to publish the views they create and embed them on their own sites. In the remainder of this article I will talk through how each stage of this workflow is tailored to the particular challenges presented in our work on digital preservation.
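To make the augmentation step concrete, here is a minimal Python sketch of the three augmentations described above. It is not Recollection's code. The field names (subjects, date, place), the list of date formats, and the tiny in-memory GAZETTEER are all assumptions made for illustration, and a real geocoder would query a place-name service rather than a hard-coded table.

```python
from datetime import datetime

# Hypothetical gazetteer for illustration only; coordinates are approximate.
GAZETTEER = {
    "madison, wisconsin": (43.0731, -89.4012),
    "washington, dc": (38.9072, -77.0369),
}

# Assumed set of textual date patterns to try, in order.
DATE_FORMATS = ("%B %d, %Y", "%d %B %Y", "%m/%d/%Y", "%Y-%m-%d")


def split_field(value, separator=";"):
    """Break one delimited field into a list of trimmed, non-empty items."""
    return [item.strip() for item in value.split(separator) if item.strip()]


def to_iso_date(text):
    """Return an ISO 8601 date string, or None if no known format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def geocode(place):
    """Look up (latitude, longitude) for a plain text place name, or None."""
    return GAZETTEER.get(place.strip().lower())


def augment(record):
    """Return a copy of the record with derived fields added.

    The input record is left untouched, mirroring the idea that partners
    do not have to change the metadata held in their own repositories.
    """
    augmented = dict(record)
    augmented["subjects"] = split_field(record.get("subjects", ""))
    augmented["iso_date"] = to_iso_date(record.get("date", ""))
    augmented["lat_long"] = geocode(record.get("place", ""))
    return augmented
```

Values that cannot be parsed or located come back as None rather than raising an error, so a partner could still build views from the records that did augment cleanly and review the rest later.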
Data Ingest and Augmentation: The Heterogeneity of Our Partners’ Data Is an Asset, Not a Problem
The library and archives community has established a variety of metadata schemas for digital content. Those metadata schemas are invaluable locally, but the 2003 Archive Ingest and Handling Test (AIHT) revealed that each partner institution employed a different grammar within the same schemas. Clay Shirky, technical advisor to the project, suggested that “The goal should be to reduce, where possible, the number of grammars in use to describe digital data and to maximize overlap, or at least ease of translation, between commonly used fields. But it should not be to create a common superset of all possible metadata” (Shirky, 2005). The last sentence of Clay’s remarks remains especially pertinent.
Attempts to work with our partners’ diverse sets of materials need to respect that local decisions about how to organize and keep data are uniquely tuned to partners’ circumstances. In this sense, the heterogeneity of our partners’ data is an asset, not a problem. This is a core design principle behind the Recollection platform. By importing partner data and allowing partners to augment that data and quickly create visual interfaces to their collections, the tool lets partners work with their data without requiring them to make any changes to the metadata or materials they store in their own repositories.
Recollection Needs to Appeal to Partners’ Local Needs
While we can (and do) appeal to various common goods that come from exposing these materials through a common set of interfaces, the only way that this kind of platform can succeed is if it provides functionality that partners feel is valuable to their users and supporters. To this end, once users have pulled their information into the system, it provides two valuable services: first, it allows partners to quickly generate interfaces for better understanding their collections, and second, it allows them to publish and embed those interfaces as a means of providing access to the materials they are preserving.
Building Views: Visual Interfaces for Understanding
We are increasingly hearing about the “data deluge” as one of the central problems of the digital age. Simply put, all manner of individuals and institutions are generating ever-increasing amounts of digital material. The first value that Recollection offers to our partners is the ability to quickly generate interfaces to their collections. The information inside these collections is often thought of in library contexts as records, that is, as individual entries of information. Through the import and augmentation process it is possible for partners to quickly treat these records as data and to see and explore the information from those records through views of that data.
For example, by displaying categorical information extracted from records in a weighted tag cloud or a facet with frequency information, it becomes very easy to get a sense of the relative frequency at which your categories are used. Further, it becomes easy to see problems in your classification that might suggest a need to remediate your data at the source (misspellings, typos, inconsistencies in capitalization or punctuation). Beyond this, the ability to augment plain text place names and dates into plottable points of latitude and longitude and ISO dates lets partners quickly place the information contained in their records on a map or timeline. In these cases, the visual interfaces which Recollection lets partners create are valuable tools for making sense of the data they are collecting and preserving in this deluge. By quickly allowing partners to visualize important components of their collections, the tool offers ways for them to better understand their records as data and make informed decisions about prioritizing remediation.
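The frequency information behind a facet or weighted tag cloud, and the way it surfaces inconsistent labels, can be sketched in a few lines of Python. This is an illustrative sketch rather than Recollection's implementation. The function names and the subjects field are assumptions, and the normalization used here only catches differences in capitalization, punctuation, and spacing. Misspellings would need fuzzier matching.

```python
from collections import Counter, defaultdict
import string


def facet_counts(records, field):
    """Count how often each value of a list-valued field occurs."""
    counts = Counter()
    for record in records:
        counts.update(record.get(field, []))
    return counts


def normalize(label):
    """Lowercase a label, drop ASCII punctuation, and collapse whitespace."""
    no_punctuation = label.lower().translate(
        str.maketrans("", "", string.punctuation)
    )
    return " ".join(no_punctuation.split())


def inconsistent_labels(counts):
    """Group labels that differ only in case, punctuation, or spacing."""
    groups = defaultdict(set)
    for label in counts:
        groups[normalize(label)].add(label)
    return {
        key: sorted(variants)
        for key, variants in groups.items()
        if len(variants) > 1
    }
```

The counts are what a tag cloud would use to weight each term, and the grouped variants are the kind of candidates a partner might review when deciding what to remediate at the source.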
Publishing and Embedding: Visual Interfaces for Discovery and Access
In the final stage of the Recollection workflow, a partner can publish the views they build and share the resulting published link to their collection. This is valuable in its own right, but more importantly, the tool allows our partners to embed the resulting interfaces to their materials on their own web pages. The final stage of this tool is thus the creation of a dynamic visual interface to the partner’s materials that they can bring back to their local context, to the sites they have built to serve their specific communities and constituents. If the platform is successful, it will result in the creation of a communal set of interfaces to these diverse collections. However, that success will only be possible if we can ensure that this kind of local value to the partners is present in the system.
Shirky, C. (2005, December). AIHT: Conceptual issues from practical tests. D-Lib Magazine, 11(12). http://www.dlib.org/dlib/december05/shirky/12shirky.html
Trevor Owens is an information technology specialist with the National Digital Information Infrastructure and Preservation Program (NDIIPP) in the Office of Strategic Initiatives at the Library of Congress. Before coming to the Library of Congress, Trevor was the community lead for the Zotero project at the Center for History and New Media, and before that he worked for the Games, Learning, and Society Conference in Madison, Wisconsin. Trevor received a BA in the History of Science from the University of Wisconsin-Madison, and an MA in History, with a focus on American history and digital history, from George Mason University. He is currently completing a PhD in Research Methods and Instructional Technology in the Graduate School of Education at George Mason University. Photo Credit: Barry Wheeler
Kathy MacDougall is a Partner at Zepheira, which provides solutions to effectively integrate, navigate and visualize data across personal, group and enterprise boundaries. Kathy has extensive experience leading enterprise-wide initiatives to help companies evaluate and leverage their corporate data to increase revenues and uncover new business intelligence. Successes during her 20-year tenure in this field include creating data-based and knowledge management solutions for companies ranging in size from $500M to $11B, including such names as General Electric and Sun Microsystems. At Sun Microsystems, Kathy and her team led the first known large-scale corporate implementation of Semantic Web technologies, which provides the foundation for dynamic delivery of product-related content from across the organization.