It cannot be denied that Stephen Wolfram knows data. As the person behind Mathematica and Wolfram|Alpha, he has been working with data — and the computation of that data — for a long time. As he said in his blog yesterday, “In building Wolfram|Alpha, we’ve absorbed an immense amount of data, across a huge number of domains. But—perhaps surprisingly—almost none of it has come in any direct way from the visible internet. Instead, it’s mostly from a complicated patchwork of data files and feeds and database dumps.”
The main topic of Wolfram’s post is a proposal about the form and placement of raw data on the internet. In the post, he proposes that .data be created as a new generic Top-Level Domain (gTLD) to hold data in a “parallel construct.”
Wolfram writes, “But what would be the point? For me, it’s about highlighting the exposure of data on the internet—and providing added impetus for organizations to expose data in a way that can efficiently be found and accessed.”
He continues, “My concept for the .data domain is to use it to create the “data web”—in a sense a parallel construct to the ordinary web, but oriented toward structured data intended for computational use. The notion is that alongside a website like wolfram.com, there’d be wolfram.data.”
Not surprisingly, this approach has garnered some strong early responses, several from leaders in the world of Semantic Technologies and Linked Data.
Paul Miller, SemanticWeb.com columnist and SemanticLink Podcast host, wrote on his blog, cloudofdata.com, a post titled, “Top Level Domain for data answers the wrong question.” Miller writes, “Whilst wholly in favour of Wolfram’s stated aim, I can’t help feeling that his suggested solution is at best unnecessary and at worst a worrying segregation of data from the ‘proper’ web that everyone else will continue to exploit.”
Paul points out that, “At the end of the day, the machines don’t actually care. The existing data.open.ac.uk-type sites are human conveniences, not machine enablers. The computers, and the software they run, are quite capable of crawling the public web and finding accessible data wherever it lies on a site. There are plenty of reasons to continue embedding little snippets of data inside human readable web pages, regardless of whether you have a data.wolfram.com or a wolfram.data site. Content negotiation is becoming increasingly capable, such that there really is no need for what Wolfram calls a ‘parallel construct to the ordinary web’ at all. A human being arriving at a web site sees human readable content, whilst various software tools would automatically be presented with very different data or functions, optimised to their capabilities and requirements.”
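To make Miller’s point about content negotiation a little more concrete, here is a minimal Python sketch of how a single address could hand a browser a human-readable page and hand a software client structured data, based on the HTTP Accept header the client sends. It is an illustration only, not anything taken from Miller’s or Wolfram’s posts: the names `REPRESENTATIONS`, `parse_accept`, and `negotiate`, along with the sample "Example Corp" content, are hypothetical, and the matching is deliberately simplified (it handles `*/*` but ignores partial wildcards such as `text/*`).

```python
# Hypothetical sketch of server-side content negotiation.
# All names and sample payloads here are assumptions made for illustration.

# One resource, several representations keyed by media type.
REPRESENTATIONS = {
    "text/html": "<html><body><h1>Example Corp</h1></body></html>",
    "application/json": '{"name": "Example Corp"}',
    "text/turtle": '<http://example.com/#org> <http://schema.org/name> "Example Corp" .',
}


def parse_accept(header):
    """Return the media types in an Accept header, highest q-value first."""
    prefs = []
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        fields = part.split(";")
        media_type = fields[0].strip().lower()
        q = 1.0  # a missing q parameter means full preference
        for param in fields[1:]:
            name, _, value = param.strip().partition("=")
            if name.strip() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        prefs.append((media_type, q))
    # sorted() is stable, so types with equal q keep their header order.
    ranked = sorted(prefs, key=lambda pref: -pref[1])
    return [media_type for media_type, q in ranked if q > 0]


def negotiate(accept_header, default="text/html"):
    """Pick the best available representation for the given Accept header."""
    for media_type in parse_accept(accept_header):
        if media_type in REPRESENTATIONS:
            return media_type, REPRESENTATIONS[media_type]
        if media_type == "*/*":
            return default, REPRESENTATIONS[default]
    # Nothing matched: fall back to the human-readable page.
    return default, REPRESENTATIONS[default]


if __name__ == "__main__":
    # A typical browser-style header versus a data-oriented client.
    print(negotiate("text/html,application/xhtml+xml,*/*;q=0.8")[0])
    print(negotiate("text/turtle, application/json;q=0.5")[0])
```

In this sketch the browser-style request resolves to the HTML page while the data-oriented client receives Turtle from the same address, which is the behavior Miller describes as removing the need for a separate “parallel construct.”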
Another intriguing rebuttal was not written in response to Wolfram’s post at all. It is a document Tim Berners-Lee wrote in 2004, titled “New Top Level Domains .mobi and .xxx Considered Harmful.” In it, Berners-Lee addresses many of the basic concerns that apply to creating gTLDs such as .data.
Several Semantic Web thought leaders have commented on Wolfram’s original post, including Kingsley Idehen, Mark Montgomery, Martin Hepp, and Danny Ayers. There are others, and I expect that we will hear more on this topic as ICANN rolls out the new gTLD program.