The revolution is coming. The spreadsheet revolution, that is.
Last fall, senior enterprise architect Brand Niemann of the Environmental Protection Agency issued a challenge to the semantic web industry: Who will step forward and show how to take the reams of government data currently locked away in spreadsheets to the semantic web? This spring, at the Semantic Technology Conference, May 18-22 in San Jose, Calif., Niemann and Lee Feigenbaum, VP of technology and standards at Cambridge Semantics Inc., will demonstrate a solution to that challenge.
Cambridge has developed technology that, according to Niemann, lets you build an application or put data in a spreadsheet, change the numbers, and perform operations on them, and then have that information and those changes automatically show up as a web display of the data on a web site, in an RDF data cloud on the web, and even as a real-time alert sent to a device such as a cell phone.
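In broad strokes, the spreadsheet-to-RDF step can be sketched as turning each spreadsheet row into a set of triples. The sketch below is illustrative only: the namespace, column names, and N-Triples serialization are assumptions for the example, not Cambridge Semantics' actual schema or product.

```python
# Minimal sketch: each spreadsheet row becomes RDF triples
# (subject, predicate, object), serialized here as N-Triples.
# The namespace and columns are hypothetical.
import csv
import io

NS = "http://example.org/epa/"  # hypothetical namespace

def rows_to_ntriples(csv_text, key_column):
    """Turn each row into triples keyed on one identifying column."""
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        subject = f"<{NS}{row[key_column].replace(' ', '_')}>"
        for column, value in row.items():
            if column == key_column:
                continue  # the key names the subject, not a property
            triples.append(f'{subject} <{NS}{column}> "{value}" .')
    return "\n".join(triples)

sheet = "region,lead_ppb\nRegion 1,12.4\nRegion 2,9.8\n"
print(rows_to_ntriples(sheet, "region"))
```

Once the data is in triple form, republishing it on a web page, in an RDF cloud, or as an alert is a matter of pointing different consumers at the same store.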
“When this paper was reviewed, the reaction was that this could revolutionize the spreadsheet industry to the benefit of the semantic web and the semantic technology community,” says Niemann.
The technology has been tried with federal statistical data and with the EPA’s Children’s Health Data, and it will be put to the test at the conference. Niemann has taken the information off the conference’s web site and put it in a spreadsheet; Feigenbaum will make changes to that spreadsheet as they happen in real time (from venue changes to speaker cancellations), and that information will then be reflected conventionally on the web site, as RDF data, and in real-time alerts.
“That’s where most people are trying to go with this,” says Niemann.
What makes the spreadsheet such an important application to semantify? Put it down to a couple of things. The first is that lots of government data is stored in spreadsheets, inaccessible to the Google crawlers of the world. The second is that the data is there because government end users, like people in business, love their spreadsheets — they love the macros they have developed, they love their familiarity with the technology, and they’re not necessarily interested in spending weeks learning to develop a semantic web application and the ins and outs of marking up their information and building an ontology. They may see the value of making their information more useful, reusable, and accessible, but they want that value without having to do much, if any, extra work. This technology means they can continue working in the application they love while developing semantic web applications at the same time.
“Essentially, the idea is you enable a familiar front end, you enable it with the semantic web back-end infrastructure, and it all happens more or less without much additional work,” Niemann says.
For example, the Census Bureau collects 1,500 spreadsheets from across the government every year, Niemann says, including some 60-plus from the EPA. The Census Bureau isn’t funded to do more with the data than that, but this technology opens up the possibility of sharing the data, simply by connecting the spreadsheet to the semantic middleware and checking off the ways they want the data to be replicated.
“That’s a no-brainer,” he says. “Because it’s all connected with semantic web middleware, it can show up in three ways automatically: [conventionally] on the census web page, as RDF data, and as an alert that a spreadsheet update is available now.”
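The "shows up in three ways automatically" behavior Niemann describes is, in essence, a publish-subscribe fan-out: one spreadsheet update, several downstream consumers. The sketch below illustrates that pattern only; the channel names are made up for the example and say nothing about how the actual middleware is built.

```python
# Illustrative observer-pattern sketch of the fan-out: the web page,
# the RDF store, and the alert channel all subscribe to the same
# spreadsheet-update event. Channel names are hypothetical.
class SpreadsheetFeed:
    def __init__(self):
        self.listeners = []

    def subscribe(self, handler):
        self.listeners.append(handler)

    def publish(self, update):
        # Every subscriber sees the same update, once.
        return [handler(update) for handler in self.listeners]

feed = SpreadsheetFeed()
feed.subscribe(lambda u: f"web page refreshed: {u}")
feed.subscribe(lambda u: f"RDF triples regenerated: {u}")
feed.subscribe(lambda u: f"alert sent: {u} updated")

for result in feed.publish("lead_ppb column"):
    print(result)
```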
The spreadsheet idea makes the semantic web accessible to the masses, and Niemann sees possibilities for other “mass” applications to follow in its footsteps. “PowerPoint is next, and email,” he believes. “It’s marching through these conventional applications or desktop applications, and connecting them to semantic technologies and standards. PowerPoint is the hardest one, because it’s sort of structured and unstructured, but one people use a lot.”
As for email, Niemann says there’s some testing underway with Radar Networks’ Twine, using a standard set of non-classified emails, to explore its ability to work with unstructured information in that capacity and make some sense of it. He sees many exciting applications for this, such as dealing with the hundreds of thousands of emails the EPA receives. The EPA leads government agencies in e-rulemaking, and the public is by law allowed to comment on proposals. The idea would be to automate responses to those emails as much as possible.
“Twine holds the prospect of letting us build a semantic graph of all those emails, and from that graph we can tell which emails we received are asking the same general kinds of questions and which are truly (unique). Then we can be more efficient, constructing a general response for 90 to 95 percent of the general emails and then only have to deal with the 5 or 10 percent that would require human involvement in carefully constructing a response to very unique questions,” he says.
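The grouping step Niemann describes — finding which emails ask the same general question — can be illustrated with a much simpler stand-in: represent each email as a set of extracted terms and cluster emails whose term overlap crosses a threshold. Twine’s actual semantic-graph extraction is far richer; this sketch, with hypothetical term sets, only shows the comparison-and-grouping idea.

```python
# Illustrative stand-in for comparing semantic graphs: emails are
# reduced to term sets, compared by Jaccard similarity, and greedily
# grouped. Real graph comparison would be far more sophisticated.
def jaccard(a, b):
    """Overlap between two term sets, from 0.0 (disjoint) to 1.0."""
    return len(a & b) / len(a | b)

def group_similar(emails, threshold=0.5):
    """Greedily cluster term sets by similarity to each group's first member."""
    groups = []
    for terms in emails:
        for group in groups:
            if jaccard(terms, group[0]) >= threshold:
                group.append(terms)
                break
        else:
            groups.append([terms])
    return groups

emails = [
    {"permit", "deadline", "extension"},   # hypothetical extracted terms
    {"permit", "deadline", "fee"},
    {"wetlands", "mapping", "data"},
]
print(len(group_similar(emails)))  # → 2: the permit questions cluster together
```

A general response could then be drafted once per large cluster, leaving only the singleton groups for individual human attention.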
“Once we prove that 1) you can extract a semantic graph that is useful from one or one million emails or documents and 2) that you can identify useful things from comparing semantic graphs, then that opens up a tremendous market, we think,” Niemann says. “Just like we think there’s a tremendous market for spreadsheets.”