
CHAIN-REDS Project Enhances Semantic Search And Extends Reproducibility Of Scientific Data

By Jennifer Zaino  /  March 24, 2014

The CHAIN-REDS FP7 project, co-funded by the European Commission, aims to build a knowledge base of information covering well over half of the countries in the world. The information is gathered from dedicated surveys and from other web and document sources, and is presented to visitors through geographic maps and tables. Earlier this month, the project’s Knowledge Base and Semantic Search Engine, which explore the more than 30 million documents in its Open Access Document Repositories (OADR) and Data Repositories (DR), became available in a smartphone and tablet app. The results of the Semantic Search Engine are also now ranked according to the January 2014 Ranking Web of Repositories, so users conducting searches should see results from the highest-ranked repositories first.

The project has its roots in using semantic web technologies to correlate the data used to write scientific papers with the documents themselves, whenever possible, as well as with applications that can be used to analyse the information, says Prof. Roberto Barbera of the Department of Physics and Astronomy at the University of Catania. To these ends, the CHAIN-REDS consortium semantically enriched its repositories and built its search engine on the related Linked Data. Users in search of information can get papers and data and, if applications are available, can be redirected to them on the project’s cloud infrastructure to reproduce and reanalyze the data.

“There is a huge effort in the scientific world about the reproducibility of science,” says Barbera.

In many cases papers are presented with plots and graphs but not the data, which makes it impossible to extend the analysis or connect that information with other analyses. “We want to connect all this knowledge chain, from paper to data to app and to computers in the cloud where people can reproduce the data, the results of a paper and maybe extend those results to do more science,” he says. “In many cases the data is collected using public funds and at least in Europe there is a strong push from the European Commission to make data collected with public funds publicly available and reusable.”

Today the project’s 30 million documents reside across more than 3,000 repositories, and its Virtuoso RDF-compliant database contains over 600 million triples. That database offers a SPARQL endpoint, but the Semantic Search Engine better serves the needs of those who aren’t experts in Semantic Web technologies, he notes. Users just enter a keyword (geology, natural gas, cardiology, for example), and the engine generates a SPARQL query on the fly. “When you get the resources [in the results], you also get, if they exist, the links between the data and the documents. This was all web-based but we thought it would be nice to have it as a mobile app, too.” A RESTful API is also available so that other mobile and web apps can query the semantic database to conduct searches.
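To make the keyword-to-query step concrete, here is a minimal Python sketch of how a search term might be turned into a SPARQL query. It is not the project’s actual implementation: the function name build_keyword_query, the use of Dublin Core title and subject properties, and the result limit are all assumptions made for illustration, since the article does not describe the schema behind the Knowledge Base.

```python
def build_keyword_query(keyword: str, limit: int = 20) -> str:
    """Build a SPARQL query matching resources whose title or subject
    contains the keyword (case-insensitive).

    Assumption: resources are described with Dublin Core dc:title and
    dc:subject. The real CHAIN-REDS schema may differ.
    """
    # Escape characters that would otherwise break out of the
    # SPARQL string literal the keyword is placed in.
    escaped = (
        keyword.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f"""PREFIX dc: <http://purl.org/dc/elements/1.1/>
SELECT DISTINCT ?resource ?title
WHERE {{
  ?resource dc:title ?title .
  OPTIONAL {{ ?resource dc:subject ?subject . }}
  FILTER (
    CONTAINS(LCASE(STR(?title)), LCASE("{escaped}")) ||
    (BOUND(?subject) && CONTAINS(LCASE(STR(?subject)), LCASE("{escaped}")))
  )
}}
LIMIT {int(limit)}"""


if __name__ == "__main__":
    # One of the example keywords mentioned in the article.
    print(build_keyword_query("natural gas"))
```

The resulting string could then be sent to any SPARQL 1.1 endpoint. Sparing users from writing such queries by hand is the convenience Barbera describes.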

The Semantic Search Engine demonstrates the power of having access to this large number of repositories, Barbera says. “One advantage is that you can semantically combine, using the Linked Data paradigm, the data from our Knowledge Base to other kinds of Knowledge Bases and mesh the different sources,” he says. One such source is the Engage project platform, which contains open government data from EU countries. Engage also is funded under the EC FP7 program.

Plans are underway to also leverage Google Scholar to rank not only the repositories but the papers themselves in delivering results. “That will provide researchers with a quantitative weight of the documents they are looking for,” Barbera says, based on the number of citations retrieved by Google Scholar.
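As a rough illustration of how repository rank and citation counts could be combined, the Python sketch below orders hits by the rank of their hosting repository and breaks ties by citation count. The SearchHit fields, the rank_hits function and the tie-breaking rule are hypothetical. The article does not say how the two signals will be weighted, and the sample data is invented.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SearchHit:
    title: str
    repository_rank: int  # position of the hosting repository; 1 is best (assumed)
    citations: int        # citation count for the paper (assumed field)


def rank_hits(hits: List[SearchHit]) -> List[SearchHit]:
    """Order hits by repository rank, then by citations (highest first)."""
    return sorted(hits, key=lambda h: (h.repository_rank, -h.citations))


if __name__ == "__main__":
    # Invented sample data, for illustration only.
    sample = [
        SearchHit("Paper A", repository_rank=12, citations=40),
        SearchHit("Paper B", repository_rank=3, citations=5),
        SearchHit("Paper C", repository_rank=3, citations=18),
    ]
    for hit in rank_hits(sample):
        print(hit.title, hit.repository_rank, hit.citations)
```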

In addition to refining its semantic search features, the Consortium continues to work to collect more repositories and generate use cases, potentially including commercial ones. “There will be a huge amount of semantic Linked Data that can be used for so many purposes,” says Barbera. The project itself lasts until May of next year, but the semantic search capabilities will live on as the CHAIN-REDS consortium looks to get funds to reuse the same technology and tools in other projects.

About the author

Jennifer Zaino is a New York-based freelance writer specializing in business and technology journalism. She has been an executive editor at leading technology publications, including InformationWeek, where she spearheaded an award-winning news section, and Network Computing, where she helped develop online content strategies including review exclusives and analyst reports. Her freelance credentials include being a regular contributor of original content to The Semantic Web Blog; acting as a contributing writer to RFID Journal; and serving as executive editor at the Smart Architect Smart Enterprise Exchange group. Her work also has appeared in publications and on web sites including EdTech (K-12 and Higher Ed), Ingram Micro Channel Advisor, The CMO Site, and Federal Computer Week.
