SAP, the international enterprise software vendor, is just one of the many large companies whose core needs in a globally connected business environment can be addressed in part by the development of a multi-lingual semantic web.
Web services are the heart of SAP’s software, and its requirement to make those services searchable in different languages is one of the use cases for the new European project Monnet (Multilingual Ontologies for Networked Knowledge), whose goal is to provide a semantics-based solution for integrated information access across language barriers. As the Germany-based company’s development work expands further into parts of the world where German isn’t widely known (India now and probably China in the future), it’s important for programmers there to be able to access the descriptions of SAP’s web services in their native languages, or perhaps English. Making it possible to deal with information at the semantic level will help solve this challenge, allowing for more advanced and uniform integration, aggregation, querying and presentation of information across languages.
The need to deal with such global business issues is just one reason the tech industry is soon to have its First Workshop on the Multilingual Semantic Web, set for April. But it’s not the only one. International non-profit organizations such as the United Nations are seeing the benefits of moving their systems to semantic web technologies, and it’s very important to them that the semantic web be able to manage multi-lingual information so that they can communicate with users all over the world who have very different levels of comfort with the English language. “We have the special enterprise, and business intelligence, and certain international organizations that need not only to manage information in many different languages, but also need to customize the information for those different linguistic communities,” says Elena Montiel-Ponsoda of the Ontology Engineering Group, Departamento de Inteligencia Artificial, Universidad Politécnica de Madrid, in Spain. She is one of the members of the workshop’s organizing committee.
There is more to this than what goes on within the (admittedly) extended walls of an enterprise or other organization. Philipp Cimiano of the Semantic Computing Group, Cognitive Interaction Technology Excellence Cluster (CITEC), at Germany’s Bielefeld University, who is also a member of the workshop’s organizing committee, points this out: “The important issue is that if we look at the web we know today, the syntactic web of HTML, that web is inherently language specific. Either I opt to publish information in English or whatever, so there’s a need to translate if I want to publish that information in other languages, and that’s a bottleneck,” he says. “The semantic web presents the potential to support multiple languages in a more clever way. We can use the RDF data model to publish in a language-independent fashion.”
The requirements driving a multi-lingual semantic web extend from large organizations and those publishing to the web down to everyday users of the web. English has been the de facto language of the web to date, but 70 percent of Internet users now actually live in countries where that is not the official language (though it may be widely used or at least understood). Representing information in a language-independent way by means of RDF is a big step toward helping these individuals in their knowledge quests, but it doesn’t solve the initial interaction problem. “People query the web of data in their own language,” explains Paul Buitelaar, who is based in the Unit for Natural Language Processing at the Digital Enterprise Research Institute (DERI), National University of Ireland, Galway. He is also a member of the workshop’s organizing committee.
In principle at least, every language and the country it hails from carries the same weight in a multi-lingual semantic web, regardless of its wealth or status among the world community. “So there’s the possibility for people in undeveloped countries to participate in that web as much as other people participate,” says Cimiano. “In the sense that if information is language-independent, as it should be on the web of data, then all people need to do is map the data to their own language and gain access to all the data available.” In that sense, he says, the semantic web overcomes political barriers, and offers undeveloped countries access to information in other tongues that they could use to spur their own growth.
So, there are the various needs and the extraordinary promise. What are the challenges? “Someone publishes the data source, another publishes information on how to access the URIs in this data source in their language, and step by step we are constructing the multi-lingual web by adding more and more data, including lexical knowledge about how URIs are realized in different languages,” says Cimiano. “The key step is to create the infrastructure to do this, formalisms to represent lexical knowledge, and strong incentives for people to do this. Because at the end it’s something to be made by the people themselves. These are the key three challenges,” he says.
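The layering Cimiano describes can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how Monnet or any real system works: the `example.org` URIs, the `FACTS` and `LEXICON` tables, and the `query` function are all hypothetical, and a real deployment would use RDF stores and a lexical model rather than in-memory dictionaries. The point is that the facts are stored once, keyed by URIs, while each language community contributes its own labels separately.

```python
# Hypothetical, language-independent facts: (subject URI, predicate URI, object URI).
# All URIs below are made up for illustration.
EX = "http://example.org/"
FACTS = [
    (EX + "Spain", EX + "capital", EX + "Madrid"),
    (EX + "Germany", EX + "capital", EX + "Berlin"),
]

# Hypothetical lexical layer, published separately per language:
# how each URI is realized as a word in that language.
LEXICON = {
    "es": {
        EX + "Spain": "España",
        EX + "Germany": "Alemania",
        EX + "capital": "capital",
        EX + "Madrid": "Madrid",
        EX + "Berlin": "Berlín",
    },
    "de": {
        EX + "Spain": "Spanien",
        EX + "Germany": "Deutschland",
        EX + "capital": "Hauptstadt",
        EX + "Madrid": "Madrid",
        EX + "Berlin": "Berlin",
    },
}


def query(subject_label, predicate_label, lang):
    """Answer a question asked in `lang` against the shared facts."""
    labels = LEXICON[lang]
    # Reverse lookup: word in the user's language -> URI.
    to_uri = {label.lower(): uri for uri, label in labels.items()}
    subject = to_uri.get(subject_label.lower())
    predicate = to_uri.get(predicate_label.lower())
    if subject is None or predicate is None:
        return []
    # Match on URIs, then render the answers back in the user's language.
    # If a label is missing, fall back to the raw URI.
    return [
        labels.get(obj, obj)
        for (s, p, obj) in FACTS
        if s == subject and p == predicate
    ]


if __name__ == "__main__":
    print(query("Alemania", "capital", "es"))
    print(query("Spanien", "Hauptstadt", "de"))
```

Adding another language in this sketch means adding one more entry to `LEXICON`; the facts themselves are untouched, which is the property the quote is pointing at.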
And they are starting to be addressed. As an example, the Linked Open Data movement provides some strong incentives. More governments are on board with getting public data online, which provides an incentive to people to create bridges across languages to query data sets across countries (say, the rate of development in Spain compared to Germany). Buitelaar also points to other practical steps already taken, including his and Cimiano’s LexInfo, a model for connecting lexical linguistic information to ontologies that is being promoted for publishing linguistic information as linked data together with the linked data itself. Montiel-Ponsoda’s team is working on a similar model, the Linguistic Information Repository (LIR), specifically with respect to multilingual information. “Extending the lexical model to different languages in the Monnet Project will further combine and develop this,” Buitelaar says. “The Workshop will have further discussion on this, as there are potentially other models out there to compare and see the best way to go forward.”
How far out, then, is the multi-lingual semantic web? There’s optimism on the Workshop’s team. “I think that it could be rather a short-term thing because really the infrastructure is already there,” says Cimiano.
Adds Buitelaar, “It comes from the applications. And I get the impression that the connection between semantic web development and language technologies (text mining and things like that) seems much stronger than only a couple of years ago. People are talking about semantic search, they are talking about some combination of semantic web technologies and text mining. So if that is so, it makes sense that the multi-lingual semantic web will rapidly develop. If text mining is such an important part of semantic web development overall, then multi-lingual text mining will be an important part too.”