Earlier this year The Semantic Web Blog reported that the Getty Research Institute has released the Art & Architecture Thesaurus (AAT) as Linked Open Data. One of the external advisors to its work was Vladimir Alexiev, who leads the Data and Ontology Management group at Ontotext and works on many projects related to cultural heritage.
Ontotext’s OWLIM family of semantic repositories supports large-scale knowledge bases of rich semantic information as well as powerful reasoning over them. The company, for example, built the first working implementation of CIDOC CRM search; CIDOC CRM is one such rich ontology for cultural heritage.
We caught up with Alexiev recently to gain some insight into semantic technology’s role in representing the cultural heritage sphere. Here are some of his thoughts about why it’s important for cultural institutions to adopt Linked Open Data and semantic technologies to enhance our digital understanding of cultural heritage objects and information:
- When Linked Open Data is leveraged, everyone across institutional boundaries can openly refer to cultural concepts. Consider the need for a cross-collection search for a work of art or literature. The paintings of many artists, especially the Old Masters, are spread all over the world. Also, some manuscripts are split into parts that are kept in different libraries. Semantic technologies and ontologies like Open Annotation allow them to be virtually reunited on a shared canvas, he explains. Semantic technologies give you two good handles on the problem of comprehensively searching across these collections:
- Collections whose metadata has the richness of LIDO (Lightweight Information Describing Objects) or CDWA (Categories for the Description of Works of Art) schemas or the UK Spectrum standard can be semantically represented in RDF for easier integration and search across those integrated data sets.
- Additionally, concepts such as a person, place, animal or picnic in the painting will have a global URL, so you will know that something is a painting with, say, pigs or horses. For example, if you use the Getty URL or Library of Congress subject headings and the co-reference tables between them, then you can search for a concept and find paintings in any collection that feature a pig or a horse, because the classification is consistent.
- It becomes possible to easily find both hierarchical and associative relations. Using the hierarchical structure of thesauri, you’ll also find paintings featuring a horse or pig in a search for mammal, since mammal is a supertag (broader concept) of both horse and pig. This is an example of poly-hierarchy, he notes.
- Similarly, you can find paintings featuring ropemakers in a hierarchical search for occupation, but you can also discover associative relationships, such as the fact that ropemakers work in roperies, because of the “works/lives in” relationship.
- Semantic technology allows you to express precise relations. For example, the association of an object with a place could be: place of production of a book or work of art, place of finding (findspot), depicted or represented place, or place of major use of the object (for example, a crown in a particular kingdom). Likewise, the association of an object with a person could be: created by, ordered/sponsored by, influenced by (such as "in the manner of", "after", "in the style of"), person depicted or represented, and so on. Simpler metadata models like Dublin Core don't allow such semantic precision, he says.
- When you put out your cultural information as LOD, you also enable people outside the cultural institution to find problems in your data. Critically, you also allow others to inter-link their information to yours, creating a global network of cultural heritage information. That's the major reason people have been asking Getty to publish their vocabularies as LOD, Alexiev says, since they are reference items that others want to use in their data.
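The cross-collection and hierarchical searches described above can be sketched in a few lines of Python. This is a minimal illustration only, not how OWLIM or the Getty vocabularies work internally: the tiny thesaurus, the co-reference table, the collection identifiers and the painting titles are all invented for the example, and a real system would hold this data as RDF in a semantic repository rather than in in-memory dictionaries.

```python
# Hypothetical mini-thesaurus: each concept maps to its broader concepts.
BROADER = {
    "horse": ["mammal"],
    "pig": ["mammal"],
    "mammal": ["animal"],
    "ropemaker": ["occupation"],
}

# Hypothetical associative ("works/lives in") links between concepts.
RELATED = {"ropemaker": ["ropery"]}

# Hypothetical co-reference table: local identifiers -> shared concept.
SAME_AS = {
    "museumA:subject/equus": "horse",
    "libraryB:topic/swine": "pig",
}

# Hypothetical records from two collections with different local identifiers.
RECORDS = [
    {"title": "Horse in a Landscape", "collection": "Museum A",
     "depicts": "museumA:subject/equus"},
    {"title": "Farmyard with Pigs", "collection": "Library B",
     "depicts": "libraryB:topic/swine"},
    {"title": "The Rope Walk", "collection": "Museum A",
     "depicts": "ropemaker"},
]


def narrower_closure(concept):
    """Return the concept plus every concept that sits below it in the hierarchy."""
    found = {concept}
    changed = True
    while changed:
        changed = False
        for child, parents in BROADER.items():
            if child not in found and any(p in found for p in parents):
                found.add(child)
                changed = True
    return found


def search(concept):
    """Find titles in any collection that depict the concept or a narrower one."""
    wanted = narrower_closure(concept)
    return sorted(
        r["title"]
        for r in RECORDS
        # Resolve local identifiers to the shared concept before matching.
        if SAME_AS.get(r["depicts"], r["depicts"]) in wanted
    )


def related(concept):
    """Return concepts linked to this one by an associative relation."""
    return RELATED.get(concept, [])


if __name__ == "__main__":
    print(search("mammal"))
    print(search("occupation"))
    print(related("ropemaker"))
```

In this sketch, a search for mammal returns both the horse and the pig painting even though the two hypothetical collections use different local identifiers, and the associative table leads from ropemaker to ropery.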