At I-Semantics, the 7th International Conference on Semantic Systems, held in Austria in mid-September, awards went to several interesting semantic web projects.
One of them, DBpedia Spotlight, which annotates mentions of DBpedia resources in text to link unstructured information sources to the Linked Open Data cloud and which the Semantic Web Blog covered here, won the Best Paper Award. The open government data triplification track award went to John Erickson, Yongmei Shi, Li Ding, Eric Rozell, Jin Zheng and Professor Jim Hendler for “TWC International Open Government Dataset Catalog”, covered here.
And the open data triplification track award went to Daniel Garijo, Boris Villazón and Oscar Corcho for the contribution “A Provenance-Aware Linked Data Application for Trip Management and Organization.” (Corcho has also been involved in Fortunata, a tool that helps developers and graphic designers who aren’t well-versed in semantic web technologies create Internet applications that use and generate semantic data, covered here.) The Semantic Web Blog wanted to bring our readers up to date on this winning trip-management entry as well, and engaged in an email conversation with the developers to get better acquainted with the application, entitled El Viajero, for exploiting, managing and organizing Linked Data in the domain of news and blogs about travelling.
A Web n+1 project, it integrates content from newspapers and digital platforms belonging to the Prisa Digital Group in the domain of news and blogs about travel. Uploaded a couple of months ago to the CKAN open source data software portal, the dataset is available on the Linked Open Data Cloud. The current number of triples is 9,462,350, but it goes up every month or two, when the developers update the content with new guides.
Q: So let’s start off with what issue/problem you believe needs to be addressed in the online trip management environment. What’s inadequate today?
A: Blogs that provide information about places are numerous, and potential travelers face information overload about places to visit, hotels to stay at, restaurants to eat at, etc. In fact, blogs and bloggers often produce contradictory information, depending on the experiences of each traveler, which makes it even more difficult to determine the best options when planning a trip. In this Web context, travel guides created by professionals are sometimes not considered in the search for travel information. That is, they do not belong to the ‘blogosphere,’ and hence they are not found easily. However, many of them are useful. Trusting a professional travel guide writer rather than an unknown blogger could be an advantage.
Q: How does El Viajero aim to address this issue?
A: El Viajero is an accompanying supplement to El País [a Prisa Digital Group property], which contains such travel guides made by professionals.
In our application we make the information included in the guides available by geolocating it, so that it can be shown visually on a map, and also by covering trips in cases where the guides talk about several places. This is also connected to additional geographical information from IGN (Instituto Geográfico Nacional – National Geographic Institute), available as Linked Data, and DBpedia, which provides additional information about places to visit.
In addition to the travel guides being linked to GeoLinkedData and DBpedia, we are also creating links to GuiaSantillana, which contains hotels and restaurants in Spain. In this way we’ll provide more useful information to the user.
Now, all that geographical information is available as URIs (Uniform Resource Identifiers). For example, the Embalse del Atazar, a reservoir located in Madrid, has a common identifier to be used in all public datasets, which is this, and there we have interesting information about the reservoir, together with its location. The El Viajero guides often refer to certain locations around the world. Instead of creating new identifiers (URIs) for those locations, we looked for them on the Linked Open Data Cloud using semantic tools like SILK, and we reused the existing identifiers. By linking to these resources, we were able to extract additional knowledge from them (like the coordinates), which let us enhance our original information about the guides.
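To make the idea of reusing identifiers concrete, here is a minimal Python sketch. It is not SILK and not the developers' code: it swaps in a simple normalized-label lookup as a stand-in for a real link-discovery tool. The `example.org` URIs, the `EXTERNAL` dataset and the coordinate values are hypothetical placeholders invented for illustration.

```python
import unicodedata


def normalize(label):
    """Lowercase, strip accents and collapse whitespace so labels can be compared."""
    decomposed = unicodedata.normalize("NFKD", label)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(no_accents.lower().split())


# Hypothetical external dataset: existing URI -> properties.
# The URIs and coordinates are placeholders, not real published values.
EXTERNAL = {
    "http://example.org/resource/Embalse_del_Atazar": {
        "label": "Embalse del Atazar", "lat": 40.0, "long": -3.0,
    },
    "http://example.org/resource/Cordoba": {
        "label": "Córdoba", "lat": 38.0, "long": -5.0,
    },
}


def build_index(dataset):
    """Map each normalized label to the URI that already identifies it."""
    return {normalize(props["label"]): uri for uri, props in dataset.items()}


def link_and_enrich(guide, index, dataset):
    """Reuse an existing URI for the guide's place and copy its coordinates."""
    enriched = dict(guide)
    uri = index.get(normalize(guide["place"]))
    if uri is not None:
        enriched["place_uri"] = uri
        enriched["lat"] = dataset[uri]["lat"]
        enriched["long"] = dataset[uri]["long"]
    return enriched


index = build_index(EXTERNAL)
linked = link_and_enrich({"title": "A weekend by the water", "place": " embalse  del atazar"}, index, EXTERNAL)
unlinked = link_and_enrich({"title": "Somewhere else", "place": "Atlantis"}, index, EXTERNAL)
```

A guide whose place matches an existing label picks up the shared URI and its coordinates; a guide with no match is left unchanged rather than being given a freshly minted identifier. Exact label matching is the simplest possible rule, and real link discovery would normally use fuzzier comparisons.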
The ability to connect to other external data sources, so that more information can be added, is especially useful. Also important is that we are using standards for the representation of geographical information and of the provenance (what, where, how) of the information from the travel guides.
Q: Let’s talk about the provenance issue. Can you tell us more about the Open Provenance Model and its role in this work?
A: Its role is to specify the what, how and why of the pieces of information that are made available. The OPM is a provenance model that has been discussed extensively by the community in recent years. It has been widely used in scientific applications (workflow modeling), and many of the concepts currently discussed in the W3C Provenance Incubator Group come from its specification. Its role in our work is to model the creation processes of the guides and their references (links, images used). It also allows defining trips as interactive blogs, where we can explore every addition from the user afterwards. We model the trip’s dynamic creation process as a workflow.
OPM creates a provenance graph, where the nodes are artifacts (guides, images, videos, etc.), agents (users) and processes, and the edges are the relationships that connect them: usage, generation, control and derivation. Thus, now it is easy to navigate through the references of a guide or a post and check where they come from.
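The graph described here can be sketched in a few lines of Python. This is an illustrative toy, not the El Viajero implementation: the `ProvenanceGraph` class, the node names and the example guide are all hypothetical. The edge labels follow the OPM relationships named above (usage, generation, control, derivation), each pointing from an effect to its cause.

```python
from collections import defaultdict

# OPM-style edge labels, each pointing from effect to cause.
USED = "used"                          # process -> artifact
WAS_GENERATED_BY = "wasGeneratedBy"    # artifact -> process
WAS_CONTROLLED_BY = "wasControlledBy"  # process -> agent
WAS_DERIVED_FROM = "wasDerivedFrom"    # artifact -> artifact


class ProvenanceGraph:
    """Hypothetical in-memory provenance graph with typed nodes and labeled edges."""

    def __init__(self):
        self.kinds = {}                 # node id -> "artifact" | "process" | "agent"
        self.edges = defaultdict(list)  # source node -> [(label, target node)]

    def add_node(self, node, kind):
        self.kinds[node] = kind

    def add_edge(self, source, label, target):
        self.edges[source].append((label, target))

    def trace(self, start):
        """Return every node reachable from start by following edges toward causes."""
        seen, stack = [], [start]
        while stack:
            node = stack.pop()
            for _, target in self.edges[node]:
                if target not in seen:
                    seen.append(target)
                    stack.append(target)
        return seen


graph = ProvenanceGraph()
graph.add_node("guide", "artifact")
graph.add_node("photo", "artifact")
graph.add_node("write_guide", "process")
graph.add_node("editor", "agent")

graph.add_edge("guide", WAS_GENERATED_BY, "write_guide")
graph.add_edge("write_guide", USED, "photo")
graph.add_edge("write_guide", WAS_CONTROLLED_BY, "editor")
graph.add_edge("guide", WAS_DERIVED_FROM, "photo")

origins = graph.trace("guide")
```

Calling `graph.trace("guide")` walks back from the guide to the process that generated it, the photo it used and the editor who controlled the process, which is the kind of navigation through references the developers describe.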
Q: In what ways does this perhaps go beyond other approaches that have tried to leverage semantic web for trip planning purposes, such as TripBlox, TripIt? Is it complementary or how might you distinguish what you are doing from what has gone before?
A: This is complementary, and our approach could be combined well with those in order to provide more inter-links between information sources. Our focus is on professional travel guides, whereas those are more focused on personal travels and blogs, so a good combination of all of them could be very useful for users, who may decide which information to follow.
Q: What were some of the challenges in getting to this point?
A: The main challenge is how to deal with the heterogeneity of the data. Guides are in a special IPTC format, while the names of the locations are in CSV format and the blogs are mainly HTML! We have had to adapt existing parsers to each of the formats. Other challenges have been the creation of the URIs following Linked Data good practices (a lot of the identifiers have special characters) and getting familiar with the tools for exploiting the RDF.
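The special-character problem with identifiers can be illustrated with a short Python sketch. It assumes one common approach, turning each source identifier into an ASCII slug before appending it to a namespace; the `BASE` namespace and the `mint_uri` helper are hypothetical and are not the scheme the developers actually used.

```python
import re
import unicodedata
from urllib.parse import quote

# Hypothetical namespace, used only for illustration.
BASE = "http://example.org/elviajero/resource/"


def slugify(identifier):
    """Strip accents, then replace each run of non-alphanumerics with one hyphen."""
    decomposed = unicodedata.normalize("NFKD", identifier)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return re.sub(r"[^A-Za-z0-9]+", "-", no_accents).strip("-")


def mint_uri(kind, identifier):
    """Build a URI of the form BASE/kind/slug from a raw source identifier."""
    slug = slugify(identifier)
    if not slug:
        raise ValueError("identifier has no usable characters: %r" % identifier)
    return BASE + quote(kind) + "/" + quote(slug)


guide_uri = mint_uri("Guide", "Guía: Madrid & alrededores (2011)")
place_uri = mint_uri("Place", "Logroño")
```

One trade-off of this approach is that two different source identifiers can collapse to the same slug, so a real pipeline would need a collision check or a stable numeric suffix.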
Another challenge is how to present the information to the final user. The final user does not need to know anything about SPARQL or RDF. The application should be able to present the information to the user in a transparent way. Tools like map4rdf are useful for providing user-friendly interfaces.
Q: What are your future plans, such as going beyond providing visualization of data to more user interaction/contribution of data and adding more datasets as linked data? Can you give an update on the status of these efforts and why you think it’s important to add these features?
A: We have set up the infrastructure for adding user-created content dynamically to the endpoint. We are currently building a prototype for the Prisacom group with other partners of the Webenemasuno (Web n+1) project, and hopefully we’ll get some users from their platforms to try the beta product. We have also contacted the people in charge of LUF (Linked User Feedback), in order to add ratings to the trips or other resources available in El Viajero and to be able to provide trip recommendations.