The EPA Explores A Linked Data Ecosystem

By Jennifer Zaino  /  October 19, 2011

The Environmental Protection Agency is facing some tough challenges in the run-up to the presidential elections. The House wants to cut its funding by 18 percent, and it’s dealing with criticism from some corners that new regulatory proposals could impose another burden on businesses that may hurt economic recovery efforts.

Hands up if you think this makes an even stronger case for investigating the role of semantic technology and Linked Data projects in the government sector. Semantic technology doesn't favor any political party's agenda, of course. But Linked Data approaches do help the government (and the citizenry) do more with and get more out of data for less money, as Bernadette Hyland, co-chair of the W3C Government Linked Data Working Group and CEO of 3 Round Stones, explains here.

Add to that the possibility that its application could someday have a softening impact on regulatory requirements, too. While it's too early in the government's exploration of Linked Data for such scenarios to have played out, the idea seems to have merit. "I'm part of the EPA's IT organization so I'm not really involved in policy work, but there's huge potential there," says David G. Smith, Information Management Specialist at the U.S. EPA. Smith will be speaking at the upcoming Semantic Tech & Business Conference in Washington, D.C. about the work the EPA has underway related to Linked Data.

Consider, for example, that modeling, publishing, and mashing up linked government data sets – the EPA's, but other agencies', too – could lead to new insights into the cumulative regulatory burden on companies.

“Are there opportunities to reduce their reporting requirements since there is a lot of duplication, and can we build something like a semantic model of regulations and use that” to find double-whammies or even conflicting sets of regulations, Smith ponders. Or, are there ways the technology can be applied in rulemakings, so that regulatory requirements can be more focused in their application?
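
The semantic model of regulations Smith ponders can be illustrated with a minimal sketch. The regulations, predicate names, and reporting requirements below are entirely hypothetical, not anything the EPA has built; the point is just that once requirements are expressed as subject-predicate-object triples, finding the duplication he mentions becomes a simple query.

```python
# Hypothetical triples modeling which regulations require which reports.
# None of these regulation or element names are real.
triples = [
    ("RegA", "requires_report_of", "benzene_emissions"),
    ("RegA", "requires_report_of", "facility_location"),
    ("RegB", "requires_report_of", "benzene_emissions"),
    ("RegB", "requires_report_of", "wastewater_volume"),
]

def duplicated_requirements(triples):
    """Return data elements required by more than one regulation --
    candidates for consolidated reporting."""
    regs_by_element = {}
    for reg, predicate, element in triples:
        if predicate == "requires_report_of":
            regs_by_element.setdefault(element, set()).add(reg)
    return {elem: regs for elem, regs in regs_by_element.items()
            if len(regs) > 1}
```

Here `benzene_emissions` would surface as a double reporting requirement shared by two regulations, the kind of "double-whammy" a fuller model could flag at scale.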

Industries could gain in another way, too: “If [data is] made available as open linked data, then that gives the potential for the private sector and third parties to figure out better ways about how they can make sure they are in compliance, or do an analysis of how regulations impact them,” he says. “They can answer those questions more quickly with Linked Data models.”

While the EPA is still in the early adoption phase, the work so far linking together different data sets has generated "some really fantastic things," Smith says. The data sets involved include the Facility Registry System (FRS), with 2.7 million place-based pieces of information on everything from plants to brownfields to clean-up sites, the Substance Registry System (SRS) and the Toxic Release Inventory (TRI).

“The exciting thing there is we now can actually search seamlessly from my data at the FRS level to the TRI that talks about chemicals and quantities released into the air and water by some facilities. So we can drill into what is that substance, what do we know of that, does it have other names that it goes by,” he says. “We didn’t have that before. Before you had to have domain expertise to know, for example, if there is another name for this chemical this facility says it is releasing, and what are the health impacts. You had to take a few more steps to find that stuff out.”
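
The seamless drill-down Smith describes can be sketched in miniature. The identifiers and records below are illustrative stand-ins, not the agency's actual data or schema; they show how shared identifiers let one traversal run from a facility (FRS) through its releases (TRI) to every name a substance goes by (SRS), without the domain expertise previously needed to connect them.

```python
# Illustrative triples standing in for FRS, TRI, and SRS records.
# "CAS-71-43-2" is benzene's CAS number; the facility and property
# names are made up.
facts = [
    ("frs:Facility42", "tri:releases", "srs:CAS-71-43-2"),
    ("srs:CAS-71-43-2", "srs:preferredName", "benzene"),
    ("srs:CAS-71-43-2", "srs:synonym", "benzol"),
    ("srs:CAS-71-43-2", "srs:synonym", "phenyl hydride"),
]

def objects(facts, subject, predicate):
    """All objects of triples matching (subject, predicate, ?)."""
    return [o for s, p, o in facts if s == subject and p == predicate]

def released_substance_names(facts, facility):
    """Every name (preferred and synonyms) of each substance
    a facility reports releasing."""
    names = []
    for substance in objects(facts, facility, "tri:releases"):
        names += objects(facts, substance, "srs:preferredName")
        names += objects(facts, substance, "srs:synonym")
    return names
```

A query for `frs:Facility42` walks from the facility record to the substance record and back out with all three names, the lookup that previously took "a few more steps" and specialist knowledge.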

The project has exposed some issues around data quality – not surprising given that there is still a lot of manual entry of reporting data and errors tend to go hand-in-hand with that. Semantic technology presents an upside opportunity here, too, says Smith. "Instead of having someone type in information about dealing with chemical X, [we could] actually query that on the fly, embed some technology right into the data entry front end." Think Google Auto-complete, with a semantically enriched back end to query against. He envisions, for example, a visual prompt at data entry indicating that a certain chemical property is of a specific type or goes by a certain technical name. That additional prompting could drive better-quality data starting right at the front end. "That's a big challenge with the FRS. It's a huge universe of data from 32 federal databases and 57 state or territory databases, and data quality is hit or miss from one system to the next," he says. "The more we can leverage this technology and clean it up right as it's being entered is a huge opportunity."
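
The autocomplete idea reduces to a lookup against a synonym index. This sketch uses a handful of hand-written entries in place of the semantically enriched back end Smith imagines; in practice the index would be generated from a substance registry rather than typed in, but the front-end behavior is the same: a typed prefix resolves to canonical names before the record is ever saved.

```python
# Toy synonym index mapping known names to canonical substance names.
# Entries are illustrative, not drawn from any EPA registry.
SYNONYM_INDEX = {
    "benzol": "benzene",
    "benzene": "benzene",
    "quicksilver": "mercury",
    "mercury": "mercury",
}

def suggest(prefix):
    """Canonical substance names for every known name matching the
    typed prefix -- the backend call behind an autocomplete widget."""
    prefix = prefix.lower()
    return sorted({canonical
                   for name, canonical in SYNONYM_INDEX.items()
                   if name.startswith(prefix)})
```

Typing "benz" would surface `benzene` whether the data-entry clerk knows the substance as benzene or benzol, steering entries toward one canonical form at the point of capture.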

In addition to a future ecosystem of linked government agency data sets for speeding analysis and insights, Smith sees potential in leveraging general-purpose vocabularies such as DBpedia and GeoNames. "If everyone talks about a community in the same way using the same vocabulary, that is a great way to quickly pull together a whole lot of information about a community. You can see that it has this power plant, or water treatment facility – we have some information on that. And you can kind of quickly build a dynamic snapshot of what goes on in a given place."
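
The "dynamic snapshot" becomes straightforward once every data set keys its records to the same place identifier, which is exactly what a shared vocabulary like GeoNames provides. The identifiers and facility records below are made-up examples, not EPA data; the point is that the snapshot is then just a grouping step over whatever records share the place key.

```python
# Hypothetical records from different sources, each keyed to a
# GeoNames-style place identifier. Facility names are invented.
records = [
    ("geonames:1000001", "epa:facility", "Riverside power plant"),
    ("geonames:1000001", "epa:facility", "Eastside water treatment works"),
    ("geonames:2000002", "epa:facility", "Harbor clean-up site"),
]

def place_snapshot(records, place_id):
    """Everything known about one place across all sources --
    possible only because each record uses the shared identifier."""
    return [obj for subj, _pred, obj in records if subj == place_id]
```

Without the common vocabulary, each source might call the same community by a different name, and assembling this view would mean fuzzy matching rather than a simple lookup.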

Put EPA linked data into the hands of corporate sustainability officers, and they can drive value internally in their companies, too. “I would anticipate some of these companies that are more forward-thinking would have a corporate dashboard to track internal metrics, plus they can pull in data from us on how the EPA says their performance is, and have it all integrated into CXO-level dashboards for tracking performance. Some industries would welcome that,” he says.

All that said, Smith has funding pressures on his mind, too. “We’re definitely feeling the crush of rapidly shrinking budgets and trying to deal with a lot of legacy technology,” he says, pointing out that the EPA is still using some processes built on Oracle scripts from the late ’90s. “There’s still a lot of labor-intensive work we have been doing with traditional data warehouse approaches. And there’s the recognition that we either have to innovate quickly and find some better ways to do things or be painted into a corner with barely enough money to maintain the basic O&M (operations and maintenance).”

For more about the upcoming Semantic Tech & Business Conference on Nov. 29 to Dec. 1, visit this page.


About the author

Jennifer Zaino is a New York-based freelance writer specializing in business and technology journalism. She has been an executive editor at leading technology publications, including InformationWeek, where she spearheaded an award-winning news section, and Network Computing, where she helped develop online content strategies including review exclusives and analyst reports. Her freelance credentials include being a regular contributor of original content to The Semantic Web Blog; acting as a contributing writer to RFID Journal; and serving as executive editor at the Smart Architect Smart Enterprise Exchange group. Her work also has appeared in publications and on web sites including EdTech (K-12 and Higher Ed), Ingram Micro Channel Advisor, The CMO Site, and Federal Computer Week.
