Inventorying and managing cultural heritage data turns out to be a pretty complicated undertaking. The construction of a famous site may have lasted across different time periods, and its present location may span multiple districts. Buildings may be associated not only with famous architects but also with well-known residents. Or structures may have been constructed atop pre-existing entities.
Helping sort it all out is the work of The Arches Project, a collaboration between the Getty Conservation Institute (GCI) and World Monuments Fund (WMF). The Arches effort grew out of GCI’s and WMF’s work to develop MEGA-Jordan, a purpose-built geographic information system (GIS) to inventory and manage archaeological sites at a national level for that country. But for this more generic, open-source system, meant to accommodate any country, region or other institution worldwide responsible for the protection of immovable cultural heritage, the focus expanded from the geospatial to the semantic.
“We became very familiar with the CIDOC Conceptual Reference Model ontology,” says Alison Dalgity, who manages the Arches project on GCI’s side. The CRM provides definitions and a formal structure for describing the implicit and explicit concepts and relationships used in cultural heritage documentation. “We realized we needed something like that. Now, the GIS piece is only part of this – it’s nice to know where something is, but all the other relationships – the who, how, what and when and so on – have to be represented, too.”
Activities, actors, documents, corresponding heritage assets, materials and techniques, designations, descriptions and locations, measurements, names, the condition of a site as it changes over time, and the relationships among all these, were priority considerations about what the new project needed to accommodate. “If the data is entered and relationships are established, the system itself may make connections that no one else has made before between – that this piece of heritage is related to that one which is related to that person. That could be significant,” she says.
Helping with the project is Farallon Geographics. “We realized early on that to do this right, we weren’t going to be able to do it by defining a bunch of traditional database tables and filling them up with data,” says CEO Dennis Wuthrich. “We needed to think about building an app on the ontology that let people define the kinds of things they wanted to track and the relationships between the attributes of things.” With the help of English Heritage and the Flanders Heritage Agency, which were familiar with the ontology and had the domain expertise, Farallon developed a graph database that represents the relationship of a site to its name, period, location, actors, activities, architectural heritage, and so on. It’s now building the forms people need to create and manage that information.
“Think of it as depending on what kind of new entity you want to create – say, a piece of architectural heritage,” he says. “You have a pre-defined graph of how to relate that entity with the other minimum required data elements, and because they are mapped in accordance with the CIDOC CRM, they are essentially semantic. It is a semantic mapping.” So, if you have a building, for example, there is a semantically defined way of associating a name or set of names with it, and a semantically meaningful way of associating the suite of materials the building might be made of, its cultural periods, its location and the various ways you can define that location (geospatially, or relative to its address or an administrative area). “Basically, you have a system that knows how to track these semantic relationships and how to map between something like a physical feature in the field and the data elements defined as the necessary suite of information to manage these cultural artifacts.”
A relational database underpins the work, tracking the data relationships within one piece of cultural heritage but not necessarily across pieces. A Python app then builds the complete graph that fully describes one of those heritage assets and produces it as a JSON object, Wuthrich says. RDF and Linked Data aren’t in the picture yet, but that doesn’t mean they aren’t being thought about. “What we do know is the information within any one instance of Arches would be really a lot more valuable if it were made accessible and could participate in the LOD cloud,” he says. “We feel the precursor for doing that is there. Everything is mapped to the ontology with semantic relationships that define how these various entities are related.”
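To make the idea of a Python app assembling one asset's graph from relational rows more concrete, here is a minimal sketch. It is not Arches code: the `entities` and `relations` tables, the property labels such as `has_name`, and the function names are all hypothetical, chosen only to show how rows reachable from a single heritage asset could be gathered into one JSON object.

```python
import json
import sqlite3


def make_demo_db():
    """Build a tiny in-memory database. The schema is an assumption, not Arches' own."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE entities (id TEXT PRIMARY KEY, entity_type TEXT, value TEXT);
        CREATE TABLE relations (source_id TEXT, property TEXT, target_id TEXT);
    """)
    conn.executemany("INSERT INTO entities VALUES (?, ?, ?)", [
        ("asset-1", "heritage_asset", None),
        ("name-1", "name", "Old Mill"),
        ("mat-1", "material", "limestone"),
        ("place-1", "place", "District 4"),
        ("asset-2", "heritage_asset", None),   # a second, unrelated asset
        ("name-2", "name", "North Gate"),
    ])
    conn.executemany("INSERT INTO relations VALUES (?, ?, ?)", [
        ("asset-1", "has_name", "name-1"),
        ("asset-1", "made_of", "mat-1"),
        ("asset-1", "located_in", "place-1"),
        ("asset-2", "has_name", "name-2"),
    ])
    return conn


def build_asset_graph(conn, asset_id):
    """Walk outward from one asset and collect the nodes and edges it reaches."""
    nodes, edges = {}, []
    pending, seen = [asset_id], set()
    while pending:
        current = pending.pop()
        if current in seen:
            continue
        seen.add(current)
        row = conn.execute(
            "SELECT entity_type, value FROM entities WHERE id = ?", (current,)
        ).fetchone()
        if row is None:          # dangling reference: skip it rather than fail
            continue
        nodes[current] = {"type": row[0], "value": row[1]}
        outgoing = conn.execute(
            "SELECT property, target_id FROM relations "
            "WHERE source_id = ? ORDER BY target_id",
            (current,),
        ).fetchall()
        for prop, target in outgoing:
            edges.append({"source": current, "property": prop, "target": target})
            pending.append(target)
    return {"root": asset_id, "nodes": nodes, "edges": edges}


conn = make_demo_db()
print(json.dumps(build_asset_graph(conn, "asset-1"), indent=2))
```

Because the walk follows only relations that start at the chosen asset, the second asset in the demo data never appears in the output, which mirrors the point that relationships are tracked within one piece of heritage rather than across them.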
The current thinking is that, because Arches already uses Apache Lucene and Elasticsearch to flatten out the data for presentation to its users, it may be able to do something similar and publish data as RDF through some kind of data transformation process. “We think we could take what is in the current database and transform into a triple store,” says Wuthrich. “Someone then could decide either to connect a SPARQL endpoint to it or actually publish it as LOD.” But, he adds, it’s still early on in the project, so more thought has to go into how to properly carry out such an extension.
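As a rough illustration of what such a transformation could look like, the sketch below turns a small JSON-style graph into N-Triples text, one subject-predicate-object statement per line. Everything specific here is an assumption: the shape of the sample graph, the `http://example.org/arches/` base address and the property names are invented for the example, and the project had not settled on an approach at the time of writing.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# Hypothetical sample data; not an actual Arches export.
SAMPLE_GRAPH = {
    "nodes": {
        "asset-1": {"type": "heritage_asset", "value": None},
        "name-1": {"type": "name", "value": 'Old "Stone" Mill'},
    },
    "edges": [
        {"source": "asset-1", "property": "has_name", "target": "name-1"},
    ],
}


def literal(value):
    """Quote a string as an N-Triples literal, escaping special characters."""
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def graph_to_ntriples(graph, base="http://example.org/arches/"):
    """Flatten nodes and edges into N-Triples lines.

    Assumes node ids, type names and property names are already safe to
    place inside an IRI (no spaces or angle brackets).
    """
    lines = []
    for node_id, node in graph["nodes"].items():
        subject = f"<{base}resource/{node_id}>"
        # One triple stating what kind of thing the node is.
        lines.append(f"{subject} <{RDF_TYPE}> <{base}type/{node['type']}> .")
        # One triple carrying the node's text value, if it has one.
        if node.get("value") is not None:
            lines.append(f"{subject} <{RDFS_LABEL}> {literal(node['value'])} .")
    for edge in graph["edges"]:
        lines.append(
            f"<{base}resource/{edge['source']}> "
            f"<{base}property/{edge['property']}> "
            f"<{base}resource/{edge['target']}> ."
        )
    return "\n".join(lines) + "\n"


print(graph_to_ntriples(SAMPLE_GRAPH), end="")
```

Text in this form could in principle be loaded into a triple store, at which point the choice Wuthrich describes, a SPARQL endpoint or publication as LOD, would become available. A real mapping would use the CIDOC CRM's own class and property identifiers rather than the placeholder names used here.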
Moving in this direction would seem to fit with GCI’s and WMF’s hopes. “They see the value of exposing Arches data as LOD because it really meets their overall objectives: Getty wants to make it easier for people to understand the value of culturally significant things and to protect and preserve them, and WMF wants to promote publishing and academic research into these. And it’s easier to explore and link and understand data in these repositories.”
Dalgity concurs that the semantic web is very much on the mind. “We want to make sure when people put data into Arches there is an easy path to publish it as LOD,” she says.