At a session on open data on the web at the Semantic Technology and Business Conference last week, W3C eGov consultant Phil Archer said that in his mind, and in the minds of the semantic web technology business people gathered at the event, “Open data is strongly associated with Linked Data, but the world doesn’t necessarily agree with us.”
What the rest of the world is thinking about is something else. “JSON and CSVs are the kings,” he said. “If you look at open data portals, CSVs [which get converted to JSON files] outweigh Linked Data by a mile.” Religious wars between those who see the world as triples and those who see it as CSVs won’t be good for anyone, he added. “If we keep telling the public sector to aim for 5-star data, vs. CSV 3-star data, we are in danger of the whole open data movement collapsing.”
No one wants that. To address the big picture of realizing the promise of open data, the Open Data on the Web workshop took place in April. It was organized by the W3C; the Open Data Institute, founded by Sir Tim Berners-Lee and Professor Nigel Shadbolt; and the Open Knowledge Foundation (OKF).
The OKF is responsible for the CKAN platform that the U.S. open government data portal, data.gov, now incorporates. “CKAN,” Archer said during his presentation, “is a really important platform and basically it’s about publishing CSVs, and it spits out a bit of RDF data.” He also noted that Dr. Rufus Pollock, founder and co-director of the Open Knowledge Foundation, has proposed a new standard for a data package that includes CSV and JSON. Frictionless Data, now in alpha, counts the use of web-native formats like JSON among its principles. It defines a data package for the delivery, installation and management of datasets, with a Simple Data Format (SDF) at its heart. SDF’s key features are CSV for the data; a single JSON file (datapackage.json) that describes the dataset, including a schema for the data files; and the reuse, wherever possible, of existing work, including other Data Protocols specifications.
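The package layout described above (a CSV file for the data plus one datapackage.json that describes it and carries a schema) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Frictionless Data reference implementation: the descriptor keys used here (`name`, `resources`, `path`, `schema`, `fields`, `type`) are assumed and may not match the alpha proposal exactly, the helper names `write_package` and `load_package` are hypothetical, and the rows are made-up placeholder values.

```python
import csv
import json
from pathlib import Path

# Placeholder rows, invented purely for illustration.
EXAMPLE_ROWS = [
    {"item": "alpha", "count": "3"},
    {"item": "beta", "count": "5"},
]

# Assumed mapping from schema type names to Python casts.
CASTS = {"string": str, "integer": int, "number": float}


def write_package(root: Path) -> Path:
    """Write data.csv plus a datapackage.json descriptor into root."""
    with (root / "data.csv").open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["item", "count"])
        writer.writeheader()
        writer.writerows(EXAMPLE_ROWS)

    # Descriptor keys are an assumption about the package layout.
    descriptor = {
        "name": "example-package",
        "resources": [
            {
                "path": "data.csv",
                "schema": {
                    "fields": [
                        {"name": "item", "type": "string"},
                        {"name": "count", "type": "integer"},
                    ]
                },
            }
        ],
    }
    descriptor_path = root / "datapackage.json"
    descriptor_path.write_text(json.dumps(descriptor, indent=2), encoding="utf-8")
    return descriptor_path


def load_package(descriptor_path: Path) -> list:
    """Read the first resource's CSV and cast each column per the schema."""
    descriptor = json.loads(descriptor_path.read_text(encoding="utf-8"))
    resource = descriptor["resources"][0]
    casts = {
        field["name"]: CASTS[field["type"]]
        for field in resource["schema"]["fields"]
    }
    csv_path = descriptor_path.parent / resource["path"]
    with csv_path.open(newline="", encoding="utf-8") as f:
        return [
            {name: casts[name](value) for name, value in row.items()}
            for row in csv.DictReader(f)
        ]
```

A consumer would point `load_package` at the descriptor and get typed rows back, which is the "grab the package, open it up" workflow in miniature: the CSV stays a plain CSV, and the JSON file supplies just enough metadata to interpret it.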
“He sees data as an egg,” Archer said, raising a little model egg, in contrast to the W3C RDF interconnected icon model he’d held up earlier, “a hermetically sealed thing… Pollock says to grab the data and metadata in [a] package, open it up and make something of it.”
During his presentation, Archer noted that Pollock had argued at the workshop that RDF is not web-native. As recorded in the Open Data on the Web report here, Pollock said, “RDF isn't natural — and therefore is barely used — by the average Web developer or data wrangler. CSV, by contrast, is. And you are going to need to win the hearts and minds of those folks for whatever approach is proposed.” While some may disagree with that description (see this discussion), Archer pointed out that Pollock’s position represents a very large community. “Ignore people like him at your peril,” he said.
No Limits For Linked Data Either
At SemTechBiz, Archer noted that the Open Data on the Web workshop also bore witness to exciting use cases of Linked Open Data (LOD). Among other things, he pointed to work that NXP Semiconductor is doing to publish product information catalogs as Linked Data for internal and possibly external use, and to a vision of future product Linked Data from global standards body GS1 that includes a URI for every product and class of product.
To bring the various open data constituencies together, Archer discussed efforts including a CSV on the Web Working Group, part of the W3C Semantic Web Activity, that the W3C’s Ivan Herman will be spearheading. Its stated mission is to provide technologies whereby data-dependent applications on the Web can achieve higher interoperability when working with datasets in the CSV format. Archer is also developing a Data Best Practices Working Group, likewise part of the W3C Semantic Web Activity, to provide guidance to publishers that will improve consistency in the way data is managed, thus promoting the re-use of data.
The goal is to close the circle “so that publishers can see that data is being used and application developers can see [it is quality data] and it is being updated,” Archer said at the event. It’s about building “an ecosystem [in which open data operates] or else there’s the danger of data disappearing.” That ecosystem has to work for both open and closed data, given how commonly organizations use the former to augment the latter, he noted. “We have to talk about that closed data too and how it is handled in a way to make some sense,” Archer said.