Case Study: Semantic Web Ontologies and Geoscience Collaboration Helps the Planet

In the geoscience community, collaboration is critical. Specialists in different disciplines, such as engineering geologists, geochemists, and hydrologists, need to share their findings with each other to address big questions about the earth.

Take climate change. What factors contribute to it? What impact will it have? Oceanographers who study the dynamics of oceans work independently of atmospheric scientists who study the global dynamics of climate. A study done by one group may affect the work being done by the other, but neither will know that until they share their data. Sharing creates a better opportunity to make progress on the issue.

The Earth Science Information Partners (ESIP) facilitates making those connections across institutional and domain boundaries for just these reasons. Those collecting earth science data in government agencies, academia, and other institutions have a forum to connect with peers.

FAIR is the banner name for the mission of the nonprofit organization: Making data Findable, Accessible, Interoperable and Reusable on the web. ESIP is “the jungle gym for everyone to play in,” said Dr. Annie Burgess, Lab Director at ESIP. “There are broadly scientific domains, especially in earth science data, that are dealing with some of the challenges like collection, usage, and preservation of data.”

ESIP Makes a FAIR Difference

The FAIR data principles are crucial to the mission of the Monterey Bay Aquarium Research Institute (MBARI) in improving the dissemination and impact of its scientific knowledge and data in the oceanographic community and beyond, said Carlos A. Rueda, Senior Software Engineer. “They enhance my organization’s ability to communicate about our data systems and tools to a much wider scientific and resource management community.”

Before ESIP, it was a struggle to find a like-minded community of project managers, software developers, and data curators who cared about best practices for earth science data management, according to John Graybeal, Project Manager, Marine Metadata Interoperability Project. MMI promotes the exchange, integration, and use of marine data through enhanced data publishing, discovery, documentation, and accessibility.

“Communicating with and learning from that community was hard,” he said, and even more challenging was to effectively communicate the value of the group’s work to the larger scientific community. “ESIP plays a huge role in filling that gap, both to improve communications within and beyond the earth-science-data-oriented community, and to promote and increase adoption of better practices and solutions.”

Felimon Gayanilo, Systems Architect working at the Harte Research Institute (HRI) at Texas A&M University Corpus Christi and with the Gulf of Mexico Coastal and Ocean Observing System (GCOOS), notes that HRI and GCOOS were promoting data findability, accessibility, interoperability, and reusability before ESIP was created.

Harte is focused on science-driven solutions to advance the long-term sustainability of the Gulf of Mexico, and GCOOS is engaged in helping stakeholders such as governmental and non-governmental organizations combine their data to provide timely information about the Gulf.

Joining ESIP helped HRI and GCOOS advance their goals, making it easier to network and to exchange or promote ideas with like-minded colleagues. Previously, it had been hard for them to find a good venue to test their ideas. “Importantly, ESIP also provides a way to remain abreast of new or developing technologies in data and information science,” Gayanilo said.

Core Collaboration

ESIP, a nonprofit organization, doesn’t collect or house scientific data. But its community wanted a place to host its ontologies, so the ESIP Community Ontology Repository (COR) was born.

ESIP has hosted the web application and service for creating, updating, accessing, and mapping ontologies and their terms for about two years. COR offers the geoscience community a broad variety of earth science artifacts encoded according to semantic and linked data principles.

“This is really a resource for the community who wanted to build it out and treat it as an open source project with governance and a tech team,” said Burgess.

COR has given a new public life to the official repository for the Semantic Web for Earth and Environmental Terminology (SWEET) ontology and helped propel community improvements to it. Differences between SWEET and the Environment Ontology (ENVO), which provides concise, controlled descriptions of environments, are currently being evaluated to determine how to make the two ontologies interoperable.
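To make the idea of interoperability between two ontologies more concrete, here is a minimal sketch of how equivalent terms might be linked and looked up. It is an illustration only, not how COR or AllegroGraph work internally. The term IRIs under example.org and the `equivalents` helper are hypothetical; only the SKOS `exactMatch` property IRI is a real W3C identifier.

```python
# A minimal sketch of cross-ontology term mapping, using plain Python
# tuples as (subject, predicate, object) triples.
# ASSUMPTION: the example.org IRIs below are invented for illustration;
# real SWEET and ENVO term IRIs are different.

SKOS_EXACT_MATCH = "http://www.w3.org/2004/02/skos/core#exactMatch"

mappings = {
    ("http://example.org/sweet/Ocean", SKOS_EXACT_MATCH,
     "http://example.org/envo/ocean"),
    ("http://example.org/sweet/Glacier", SKOS_EXACT_MATCH,
     "http://example.org/envo/glacier"),
}


def equivalents(term, triples):
    """Return every term linked to `term` by exactMatch, in either direction."""
    found = set()
    for subject, predicate, obj in triples:
        if predicate != SKOS_EXACT_MATCH:
            continue
        if subject == term:
            found.add(obj)
        elif obj == term:
            found.add(subject)
    return found


if __name__ == "__main__":
    print(equivalents("http://example.org/sweet/Ocean", mappings))
```

Checking the mapping in both directions means a scientist who starts from a term in either vocabulary can reach its counterpart in the other, which is the practical payoff of aligning the two.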

Graybeal said, “Each year the earth science community improves the value of its hard-won research and data sets and grows its trove of reusable historical resources.”

Franz’s AllegroGraph knowledge graph technology is leveraged with COR to manage and exchange terms and vocabularies that assist scientists in publishing, discovering, and reusing data. “COR connects the individuals working on those projects in the larger effort of specifying actionable semantic information,” Rueda said.

Dr. Lewis McGibbney, data scientist at the Jet Propulsion Laboratory, California Institute of Technology, which counts among its duties the physical distribution of oceanography data related to its earth science missions, said in a statement that there is a critical mass of experts and organizations around the globe who realize the need for knowledge-intensive applications.

“The semantic technology stack is a crucial piece for building intelligent apps for knowledge-intensive use cases within the geoscience area,” said McGibbney, who is also co-chair of the NASA ESDSWG Search Relevance Working Group.

Data Curation and Citation

Regarding data governance, the ESIP Data Stewardship Committee has created data citation guidelines. It is also developing a uniform metric for assessing the state of curation that works across all types of earth science data repositories, so that users have an easier time determining what data fits their needs and repositories understand the current state of any given data set.

Data citation is becoming a common practice, said Gayanilo, especially now that data is shared more often among researchers. HRI maintains the GRIIDC information system and other data repositories. It includes ISO 19115-2 metadata files with all data it distributes, and it distributes data with a persistent identifier (Digital Object Identifier) to promote data citation.
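As a rough illustration of why a persistent identifier helps with citation, the sketch below builds a citation string that ends in a resolvable DOI link. The `Dataset` fields, the `format_citation` helper, the citation layout, and the sample DOI are all assumptions made for this example; they are not the ESIP guidelines or GRIIDC's actual format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    """Minimal citation metadata for a data set (hypothetical fields)."""
    authors: tuple
    year: int
    title: str
    publisher: str
    doi: str


def format_citation(ds):
    """Build a citation string ending in a DOI link.

    ASSUMPTION: the 'Authors (Year). Title. Publisher. DOI link' layout is
    one common pattern, chosen here only for illustration.
    """
    names = "; ".join(ds.authors)
    return (f"{names} ({ds.year}). {ds.title}. "
            f"{ds.publisher}. https://doi.org/{ds.doi}")


if __name__ == "__main__":
    # The data set and DOI below are made up for the example.
    sample = Dataset(
        authors=("Doe, J.",),
        year=2020,
        title="Sample Gulf salinity profiles",
        publisher="Example Repository",
        doi="10.5555/example",
    )
    print(format_citation(sample))
```

Because the DOI stays the same even if the data moves to a new server, a citation built this way keeps pointing readers to the data set.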

“ESIP’s support and efforts promoting FAIR data practices will make it easier for HRI and organizations like GCOOS to convince data providers or potential data providers to share their data with fewer or no conditions on their use and application,” Gayanilo said. “The more quality data we have that can be shared with no pre-conditions on their use will promote better science, and hence, better management decisions down the line.”
