Today the Web celebrates its 25th birthday, and we celebrate the Semantic Web’s role in that milestone. And what a milestone it is: As of this month, the Indexed Web contains at least 2.31 billion pages, according to WorldWideWebSize.
The Semantic Web Blog reached out to the World Wide Web Consortium’s current and former semantic leads to get their perspectives on the roads the Semantic Web has traveled and the value it has so far brought to the Web’s table: Phil Archer, W3C Data Activity Lead coordinating work on the Semantic Web and related technologies; Ivan Herman, who last year transitioned roles at the W3C from Semantic Activity Lead to Digital Publishing Activity Lead; and Eric Miller, co-founder and president of Zepheira and the leader of the Semantic Web Initiative at the W3C until 2007.
While the Semantic Web came to the attention of the wider public in 2001, with the publication in Scientific American of The Semantic Web by Tim Berners-Lee, James Hendler and Ora Lassila, Archer points out that “one could argue that the Semantic Web is 25 years old,” too. He cites Berners-Lee’s March 1989 paper, Information Management: A Proposal, which includes a diagram showing relationships that are immediately recognizable as triples. “That’s how Tim envisaged it from Day 1,” Archer says.
One of the Semantic Web’s own landmarks harkens back to 1995 and PICS (Platform for Internet Content Selection), the first fully designed-at-W3C spec, as Archer terms it, “which was a metadata standard for Web resources, which became PICS-NG, which became RDF.” Miller wrote an introduction to the Resource Description Framework back in 1998 for D-Lib Magazine; only the year before, the W3C had formed the Metadata Activity Group, which included the RDF Working Group, and announced the first public working draft of the framework. That draft cited the use of XML as the transfer syntax, in order to leverage the other tools and code bases being built around XML.
“The Zeitgeist at the time was XML for everything,” Archer says. “I was at a Sem Web meetup … on the day [this February] that RDF 1.1 became a Rec – and got a laugh when I apologized for RDF/XML.” The main issue with RDF/XML, Herman believes, is that it was not a “very elegant XML vocabulary for RDF, [and] it mixed up, in many people’s minds, RDF as a data model (which is, I claim, not really that complex) with the syntax. The complexity of the latter did cast a negative image on the former, and we are still paying the price for this.”
Still, the March 1999 W3C proposed recommendation of the first RDF schema specification must be given its due as a starting point, as must the 2004 publication of RDF and OWL, Herman says. The model-theoretic approach taken in both RDF and OWL was, for many, very complex, and it has been the subject of much discussion and even controversy, Herman acknowledges. “Nevertheless, these were clearly historical starting points, the first ‘big bang’ in terms of standard technologies. It was followed by many others. SPARQL is probably the most important one among those, but it all started there.” (Add last year’s SPARQL 1.1 Update language for RDF graphs to that list, too.)
Regarding the standardization of the first RDF specification, Miller adds that “the caliber of the individuals and companies from all over the world involved in the process was impressive. Intersecting this with the diversity of business interests and objectives even more so. Addressing these requirements in a generalized data model, grounded in the Web as a platform was an important milestone in helping jump start the process.”
Today, RDF 1.1 uses Turtle, a textual syntax that allows an RDF graph to be completely written in a compact and natural text form, with abbreviations for common usage patterns and datatypes. Archer says that the widespread use of Turtle, many years before it became a standard, was an important step for the Semantic Web.
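As a hypothetical illustration (the example.org URIs and names here are invented), the abbreviations Archer refers to are what make Turtle readable: prefixed names stand in for full URIs, `a` abbreviates `rdf:type`, and a semicolon lets one subject carry several triples:

```turtle
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix ex:   <http://example.org/> .

# Five triples about two resources, written compactly.
ex:alice a foaf:Person ;
    foaf:name  "Alice" ;
    foaf:knows ex:bob .

ex:bob foaf:name "Bob" ;
    foaf:age 34 .
```

The same five statements in RDF/XML would take roughly twice the space and bury the graph structure in element syntax, which goes some way toward explaining Turtle’s grassroots adoption long before standardization.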
A Bigger Stage…
Berners-Lee’s ‘Linked Data Principles’, published in 2006, was the start of a whole ’nuther Semantic Web Wave. The Linking Open Data project, like the SWAD-Europe project that led to FOAF and the Redland RDF libraries, was “among the many that have been funded by the European Commission, funding that continues to this day,” says Archer. The LOD project, he comments, gave us the LOD cloud diagram and DBpedia, which extracts structured information from Wikipedia and makes it available on the Web.
The creation of DBpedia in 2007, Herman says, “was the start for the Linked Open Data movement. It was the seed for moving the Semantic Web out of the labs to practical publishing and usage of data on the Web.” Think of IBM’s Watson, which partially relies on the facts made available by DBpedia, he says. It beat Jeopardy! champions Ken Jennings and Brad Rutter; and IBM envisions a major role for it in healthcare, among other vertical sectors, this year launching the IBM Watson Group to create an ecosystem around Watson cloud-delivered apps and services.
The next step, according to Herman, was the creation of schema.org in 2011. “Although it was hugely controversial (and still is, I guess), it was indeed the next step, after the LOD, to get absolutely massive adoption of data on the Web which is, in practice, bona fide Semantic Web data.” According to Google Fellow Ramanathan V. Guha, as of November last year, more than 5 million sites were using schema.org.
More recently, this past January, JSON-LD became an official W3C recommendation; the JSON-based format to serialize Linked Data, Manu Sporny told the Semantic Web Blog at the time, is designed “to make Linked Data accessible to Web developers that had not traditionally been able to keep up with the steep learning curve associated with the Semantic Web technology stack.” It’s another power tool in an increasingly powerful toolset. Think, says Herman, of the fact “that we have an incredible amount of data encoded in Web pages, using microdata, JSON-LD, or RDFa, that are bona fide Semantic Web data; I believe we have not yet begun to exploit the possibilities that are made available by those (beyond what the schema.org partners do, of course). I do not think that seven years ago, when I took over the job from Eric, many people had hopes to have billions of such data sets around; I definitely did not.”
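A rough sketch of why JSON-LD lowers that learning curve (the person and URL below are made up): a JSON-LD document is ordinary JSON that any Web developer can process with standard tools, while the `@context` quietly maps its plain keys onto Linked Data vocabulary terms such as those of schema.org:

```python
import json

# A minimal JSON-LD document: plain JSON plus an "@context" that
# interprets the short keys ("name", "url") as schema.org terms.
doc = """
{
  "@context": "http://schema.org",
  "@type": "Person",
  "name": "Jane Doe",
  "url": "http://example.org/jane"
}
"""

# To a Web developer this is just JSON...
data = json.loads(doc)
print(data["name"])  # Jane Doe

# ...but the context makes each key a globally unambiguous term, so the
# same document doubles as a graph of RDF-style statements about a Person.
print(data["@context"], data["@type"])  # http://schema.org Person
```

The design choice is the point: nothing about the RDF data model intrudes on the developer until (and unless) a JSON-LD processor is asked to expand the document into triples.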
Application areas heavily influenced by these technologies run the gamut from life sciences to libraries, which worldwide “are now looking at Semantic Web technologies as a new way of organizing their services and catalogues,” says Herman. While discussions continue in that community, “I think that it is now reasonable to believe that in the foreseeable future Semantic Web will provide a linked set of information on humanity’s cultural, [and] literary heritage.”
Miller looks at the focus now on a Web of Data as “exactly the right one and reflects (at least my) vision of the Semantic Web. Making the simple things simple and the complex things possible was an early mantra of RDF,” he says. “It’s incredible to see the potential that Linked Data now has for many businesses, governments, NGOs, etc.” While he says that he had hoped to see more verticals adopt these Web data principles as a way of accelerating their industry, such examples as life sciences, libraries, museums, and archives are “great indicators as to what is just around the corner.”
…But Struggles To Raise The Curtain All The Way Up
Archer names many other moments of significance to the Semantic Web’s development:
- Adoption of RDF by the Dublin Core Metadata Initiative;
- The start of the Open Data movement, which Archer dates to The Guardian’s 2006 call to free public data whose collection was funded by UK taxes; have a look at how things have progressed across local and national governments and other organizations at The Open Knowledge Foundation’s CKAN portal; and
- The growing acceptance of URIs as persistent identifiers for things that may not be Web pages (for instance, Archer notes, the European Commission’s INSPIRE Directive now allows URIs as identifiers in the geospatial world, whereas before it insisted on UUIDs).
But there have been other things, beyond RDF/XML, that have not gone as well as could be hoped. Herman bemoans “the failure to define a simple, understandable, easy-to-implement standard rule language.” Notation 3 (N3), he says, “is largely ignored by the community; Rule Interchange Format (RIF) has been, I can say, a failure; there have been attempts like SPIN, SWRL, etc., but none are widely used.” As a result, he sees the loss of the possibility of easy inferencing, “one of the major potential advantages of using Semantic Web technologies.” A simple, standard rule language, he believes, would have helped bridge the schism in the Semantic Web community between the camp that uses RDF(S) and SPARQL only, and the camp that relies on possibly complex OWL (mostly OWL DL) ontologies and the necessary heavy-duty inference engines, technologies and approaches that many ignore because of their inherent complexity. Says Herman, “In the absence of simple inferencing the value proposition of RDF and company is way weaker than it should be.”
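To make the loss concrete, the kind of rule Herman has in mind is easy to state in, say, N3 (the example.org vocabulary below is invented for illustration): a single implication from which a rule engine could derive triples nobody had to write down:

```n3
@prefix ex: <http://example.org/> .

# If ?x has a parent ?y, and ?y has a parent ?z,
# then ?x has a grandparent ?z.
{ ?x ex:parent ?y . ?y ex:parent ?z . } => { ?x ex:grandparent ?z . } .
```

Absent a widely adopted standard for rules like this, that inference either doesn’t happen or happens in engine-specific dialects, which is precisely the fragmentation Herman describes.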
Archer is concerned by the division between different technology camps of another sort. “Ask a typical Web developer to write a SPARQL query and s/he’ll run a mile,” he says. “Thinking in graphs is harder than thinking in tables, and to write SPARQL you need to think in graphs. The perception is that RDF is hard, that we had to re-brand it Linked Data having failed the first time.”
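The graph-shaped thinking Archer describes looks like this in SPARQL (a hypothetical query over FOAF data): each line of the WHERE clause is a triple pattern, and the query matches wherever that small graph occurs in the data:

```sparql
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

# Find the names of everyone known by a person named "Alice".
# The WHERE clause is itself a little graph to be matched.
SELECT ?name
WHERE {
  ?alice  foaf:name  "Alice" .
  ?alice  foaf:knows ?friend .
  ?friend foaf:name  ?name .
}
```

Where a SQL developer reasons about rows joined across tables, the SPARQL author reasons about paths through a graph; the variables (`?alice`, `?friend`) bind across patterns much as join keys do, but the mental model shift is real, and it is the hurdle Archer is pointing at.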
Outreach, outreach, outreach to the “traditional” web developers’ community is what’s needed, Herman says, but it isn’t what’s happened in the past. “The Semantic Web crowd evolved following its own logic, largely influenced by a (very exciting!) academic community, and it did not pay enough attention to the huge evolution of Web Applications, of Web 2.0,” he says.
“As a result, Semantic Web technologies have difficulties being adopted outside a small circle (that is really an understatement…). Some of the latest evolutions (schema.org, JSON-LD) are steps in the right direction, but the jury is still out whether the SW technologies will ever be adopted (in some form) by the Web developers.” With the focus now on data as a valuable resource on the Web, he’s worried that “a parallel set of technology will be developed, possibly reinventing the wheel here and there, by a different community. I regard that as the biggest danger for the long term development of the Semantic Web.”
What The Future May Hold
While Archer would disagree with those who have characterized the Semantic Web as a solution looking for a problem, he also understands that it is not a panacea. “In reality, the areas where it is most useful are where it has gained enormous strength and is being used to great effect,” he says. “There are other areas where it doesn’t offer obvious advantages or there are better solutions – and that’s OK.” The W3C’s Data Activity aims to resolve any animosity that some parties may hold toward the Semantic Web, and he’d like that to start by getting people to use persistent URIs as identifiers for everything from people to places, and from rock types to traffic counts, he says.
Herman is buoyed, too, by a number of applications he calls “staggering and exciting, even if they are/were not necessarily in the shape and manner we envisaged.”
So, what can we expect of – or at least hope for – the Semantic Web when it officially turns 25? Says Archer, “That it will be seen as the normal and natural way to get a lot of stuff done.”
Adds Miller, “That we stop calling it that and it becomes simply the Web.”
Here are some W3C resources and events to enjoy on this special day:
* Opening of webat25.org, the hub of the W3C’s activities this year, with many birthday greetings on the site and around the Web, including in various social media channels using #web25, and
* Tim Berners-Lee will participate in a Reddit “Ask Me Anything” at 3pm ET today.