Earlier this week I spent an enjoyable hour on the phone, discussing the work done by a venerable world-class museum in making data about its collections available to a new audience of developers and app-builders. Much of our conversation revolved around consideration of obstacles and barriers, and the most intractable of those proved something of a surprise.
Reluctance amongst senior managers to let potentially valuable data walk out the door? Nope. In fact, not even close; managers pushed museum staff to adopt a more permissive license for metadata (CC0) than the one (CC-BY) they had been considering.
Reluctance amongst curators to let their carefully crafted metadata be abused and modified by non-professionals? Possibly a little bit, but apparently nothing the team couldn't handle.
A bean-counter's obsession with measuring every click, every query, every download, such that the whole project became bogged down in working out what to count and when (and, sadly, that really is the case elsewhere!)? Again, no. "The intention was to create a possibility" by releasing data. The museum didn't know what adoption would be like, and sees experimentation and risk-taking as part of its role. Monitoring is light, and there's no intention to change that.
Lawyers, doing their job by flagging every minuscule risk and potential liability until everyone is too scared to move forward? Yet again, no. Funded by the state, the museum liberally interpreted their government's open data mandates and used that interpretation to justify everything that followed.
The thorny problem that continues to perplex the team is, at least superficially, far simpler than any of these. It's the problem of names.
At one level, this problem is probably not surprising. The museum is in the Netherlands, where people speak Dutch. One of 100,000 freely available images and catalogue records is for a famous painting by Rembrandt, for example. It is catalogued in the language shared by artist, curator, funders and audience: "Officieren en andere schutters van wijk II in Amsterdam onder leiding van kapitein Frans Banninck Cocq en luitenant Willem van Ruytenburch, bekend als de ‘Nachtwacht’." Most readers of SemanticWeb.com might know the painting better by the English translation of Nachtwacht: Night Watch.
Potentially more serious than the issue of human-readable names is the hidden challenge of naming things so that software can reliably tie different statements together. In the world of museums, names, identifiers, and concepts are to be found everywhere you look. Museums give objects identifying numbers as they are accessioned. The Getty assigns labels to artists, styles, periods and more in widely used resources such as the Art & Architecture Thesaurus (AAT) and the Union List of Artist Names (ULAN). In the Netherlands in particular, Iconclass provides even more identifiers that are used to describe the iconography in paintings, photographs, and more.
In principle, at least, this wealth of identifiers should be a dream come true. Imagine the potential of a Linked Data-powered application, reaching across the web to pluck unambiguously identified content from museums, galleries, and auction houses around the globe. Every Rembrandt painting at the click of a mouse, with none forgotten, and nothing by Vermeer sneaking in by accident? Every painting by a Dutch Impressionist that was painted on canvas and featured the concept of Morality?
All theoretically possible, powered by persistent, unambiguous identifiers... and the humble link.
But the real picture is less rosy. Those museum-assigned identifiers for physical objects are often crammed with codes that relate to the status of the object (on loan, purchased, donated, off for conservation, etc.), the collection it's part of, and more. And then the museum buys an object that it had previously been exhibiting on loan from some rich European Duke or Wall Street banker... and the identifier changes.
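To make the failure mode concrete, here is a minimal Python sketch. Everything in it is hypothetical: the `CatalogueObject` class, the field names, and the identifier layout are invented for illustration and are not the Rijksmuseum's (or anyone's) real numbering scheme. It simply contrasts an identifier that bakes status into its text with an opaque one that keeps status as a separate, changeable attribute.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class CatalogueObject:
    """Hypothetical catalogue record, invented for this illustration."""
    object_id: str   # opaque identifier that carries no meaning of its own
    status: str      # e.g. "loan" or "purchased"; expected to change over time
    collection: str  # e.g. "PAINT"; also free to change


def status_laden_id(obj: CatalogueObject) -> str:
    """Build the fragile kind of identifier: status and collection are baked in."""
    return f"{obj.collection}-{obj.status[:1].upper()}-{obj.object_id}"


def acquire(obj: CatalogueObject) -> CatalogueObject:
    """The museum buys an object it previously held on loan."""
    return replace(obj, status="purchased")


on_loan = CatalogueObject(object_id="000123", status="loan", collection="PAINT")
owned = acquire(on_loan)

# The opaque identifier survives the purchase...
print(on_loan.object_id == owned.object_id)
# ...but the status-laden one does not, so any link built on it breaks.
print(status_laden_id(on_loan), "->", status_laden_id(owned))
```

The design choice being illustrated is simply that anything likely to change (ownership, location, collection) sits beside the identifier rather than inside it.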
And the museum is wary of creating web links that depend upon external sources (such as the Getty or Iconclass) which have so far failed to demonstrate a real and sustainable enthusiasm for web addressability. Our Dutch painter is unfortunately therefore represented in the metadata for his painting by the human-readable text 'Rembrandt Harmensz. van Rijn,' rather than by a more machine-friendly link to
http://www.getty.edu/vow/ULANFullDisplay?find=rembrandt&role=&nation=&prev_page=1&subjectid=500011051. The identifier (500011051) might be good, persistent, and trustworthy, but the URL it's part of is just too application-specific to stand much chance of surviving the Getty's next website refresh.
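The distinction between the durable identifier and the fragile URL wrapped around it can be sketched in a few lines of Python. This is only an illustration: the `STABLE_BASE` address below is a made-up placeholder on example.org, not a real Getty service, and the function names are my own. The sketch pulls the `subjectid` value out of the application-specific URL and shows the sort of short, implementation-neutral address one might hope to link to instead.

```python
from urllib.parse import urlparse, parse_qs

# The application-specific URL quoted above.
APP_URL = (
    "http://www.getty.edu/vow/ULANFullDisplay"
    "?find=rembrandt&role=&nation=&prev_page=1&subjectid=500011051"
)

# ASSUMPTION: a placeholder base address, invented for illustration only.
STABLE_BASE = "http://example.org/id/ulan/"


def extract_subject_id(url: str) -> str:
    """Return the subjectid query parameter, the only durable part of the URL."""
    params = parse_qs(urlparse(url).query)
    values = params.get("subjectid")
    if not values:
        raise ValueError(f"no subjectid parameter in {url!r}")
    return values[0]


def stable_uri(url: str) -> str:
    """Rebuild a link that depends on the identifier alone, not on the web application."""
    return STABLE_BASE + extract_subject_id(url)


print(extract_subject_id(APP_URL))
print(stable_uri(APP_URL))
```

Everything else in the original query string (`find`, `role`, `nation`, `prev_page`) describes how one particular search screen was reached, which is exactly the sort of detail a website refresh tends to sweep away.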
The Rijksmuseum is one of several museums around the world that are actively and enthusiastically working to open up their data, so that it may be used, enjoyed, and enriched by a whole new audience. But until some of the core infrastructure (the names, the identifiers, the terminologies, and the concepts) upon which this and other museums depend becomes truly part of the web, far too much of the opportunity created by big data releases such as the Rijksmuseum's will be wasted.