As we prepare to greet the New Year, we take a look back at the year that was. Some of the leading voices in the semantic web/Linked Data/Web 3.0 and sentiment analytics space give us their thoughts on the highlights of 2013.
Phil Archer, Data Activity Lead, W3C:
The completion and rapid adoption of the updated SPARQL specs, the use of Linked Data (LD) in life sciences, the adoption of LD by the European Commission, and governments in the UK, The Netherlands (NL) and more [stand out]. In other words, [we are seeing] the maturation and growing acknowledgement of the advantages of the technologies.
I contributed to a recent study into the use of Linked Data within governments. We spoke to various UK government departments as well as the UN FAO, the German National Library and more. The roadblocks and enablers section of the study (see here) is useful IMO.
Bottom line: Those organisations use LD because it suits them. It makes their own tasks easier and allows them to fulfill their public tasks more effectively. They don’t do it to be cool, and they don’t do it to provide 5-Star Linked Data to others. They do it for hard-headed and self-interested reasons.
Christine Connors, founder and information strategist, TriviumRLG:
What sticks out in my mind is the resource market: We’ve seen more “semantic technology” job postings, academic positions and M&A activity than I can remember in a long time. I think that this is a noteworthy trend if my assessment is accurate.
There’s also been a huge increase in attention from the librarian community, thanks to long-time work at the Library of Congress, from leading experts in that field and via schema.org.
Bob DuCharme, Director of Digital Media Solutions at TopQuadrant and author of O’Reilly’s “Learning SPARQL”:
RDF and related technologies such as SPARQL are moving out from under the Semantic Web and Linked Data umbrellas to find more popularity in projects that didn’t set out to be Semantic Web or Linked Data projects. Companies like Yarcdata, MarkLogic, and IBM (with their RDF support in DB2) are finding that the flexibility of the RDF data model gives customers great ways to address challenges such as the Variety aspect of the 3Vs of Big Data without worrying too much about semantics, ontologies, or the four principles of Linked Data.
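The schema flexibility DuCharme describes is easy to sketch: because every RDF fact is a (subject, predicate, object) triple, records of different shapes coexist in one store and new kinds of facts can be added without anything like a schema migration. The minimal sketch below uses invented data and plain Python tuples rather than a real RDF library, purely for illustration.

```python
# A tiny illustration of the triple model's schema flexibility.
# All identifiers and data below are invented for this example.
triples = [
    ("ex:alice", "foaf:name", "Alice"),
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:bob",   "foaf:name", "Bob"),
    ("ex:bob",   "ex:employer", "ex:acme"),  # Bob has a property Alice lacks
]

# Adding a brand-new kind of fact needs no ALTER TABLE equivalent:
triples.append(("ex:acme", "ex:ticker", "ACME"))

def objects(subject, predicate):
    """Return all objects for a subject/predicate pair (a tiny query)."""
    return [o for s, p, o in triples if s == subject and p == predicate]

print(objects("ex:bob", "foaf:name"))    # ['Bob']
print(objects("ex:acme", "ex:ticker"))   # ['ACME']
```

This heterogeneity, where different subjects carry different properties in the same graph, is exactly the "Variety" pressure point the quote refers to.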
Seth Grimes, industry analyst, consultant and organizer of the Sentiment Analysis Symposium:
2013 saw continued slow adoption of Linked Data and expansion of the Semantic Web, far outpaced by the development of private knowledge graphs and focused search and query systems (often affording external access) from the likes of Facebook, Google, Wolfram Research, and Apple (Siri). A set of solution providers, as varied as NetBase, Digital Reasoning, and DataSift, are bringing similar capabilities, based on data mined from online, social, and enterprise sources, to government and corporate users.
The biggest thing I saw was the growth rate of schema.org. Structured data markup has picked up quite a bit, and there are a whole bunch of new and interesting apps (like adding schema.org markup to Gmail and Pinterest’s schema.org-powered Rich Pins). We went from 2011, when people were asking why structured data at all, to 2012, when they were asking which vocabulary, and now we are slowly reaching the stage where [the focus is more about] what is required of schema.org to make that work. We’re not there yet but that’s kind of heartening.
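To make the kind of markup being discussed concrete: one common way to embed schema.org terms in a page is a JSON-LD script block. The property names below (headline, datePublished, author) are real schema.org terms, but the values are invented; the sketch just builds and prints such a block with Python’s standard json module.

```python
import json

# A minimal schema.org description of an article, serialized as JSON-LD.
# The @context points at the schema.org vocabulary; the values are
# invented for illustration.
doc = {
    "@context": "http://schema.org",
    "@type": "Article",
    "headline": "2013: The Year in Semantic Tech",
    "datePublished": "2013-12-30",
    "author": {"@type": "Person", "name": "Jane Example"},
}

markup = json.dumps(doc, indent=2)
print('<script type="application/ld+json">\n%s\n</script>' % markup)
```

A crawler that understands schema.org can read the block without parsing the surrounding page layout, which is what makes features like Rich Pins practical.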
James Hendler, Tetherless World Senior Constellation Professor and Director, Rensselaer Institute for Data Exploration and Applications, Department of Computer Science and Cognitive Science Department, Rensselaer Polytechnic Institute (RPI):
The growing acceptance and use of schema.org in search is clearly an important trend. To me, the most exciting is not just the basic schemas released in 2011/12, but that a number of communities have proposed their own use of these, with the schema.org folks starting to take on more and more. Communities like genealogy, TV and radio, data sharing, civic services and other groups have been approved as extensions, and many more have been proposed.
This “lightweight” semantics is also seen in some of the work with JSON-LD, Open Graph Protocol and others. Many of these communities were never involved in the early Semantic Web efforts, nor were they seen as key players, but all of them are offering millions of pages marked up with machine-readable metadata. I also was pleased to see Google hiring some people with Semantic Web backgrounds to help with Knowledge Graph and to see hiring increasing across this area – my students with linked data and/or semantic web backgrounds have been hired at big companies, startups, and national labs – so we know we’ll be seeing more in the future.
Elisa Kendall, principal, Thematix Partners:
There has been so much hype around “Big Data,” and some of its infrastructure, including NoSQL technologies, has received lots of attention over the last several years. The lines between NoSQL and traditional database technologies seem to be converging, though, and supporting infrastructure is maturing. The challenge now is in getting access to and organizing the content flexibly and robustly (which from my perspective has always been the challenge). Recognition that we need more sophisticated ways of slicing, dicing, organizing, searching, and re-purposing large quantities of data, which may require rule engines and reasoners, recommender systems that build on those engines, etc., is evident on a number of fronts. Watson has moved out of IBM Research to become the commercial backbone for a number of very large initiatives in healthcare (e.g., Cleveland Clinic and Memorial Sloan-Kettering for healthcare decision support), insurance (e.g., Wellpoint for analysis of treatment options), and financial analysis applications, for example.
A number of standards initiatives related to infrastructure also were either kicked off or received increasing visibility this year, including Linked Data platform and API standards coming out of the W3C, APIs for Knowledge Bases and the Ontology Interoperability (OntoIOp) activities at OMG among them, with heavier-weight vendor support. IBM and Oracle are the primary authors of the Linked Data platform work at W3C, for example.
This year there also have been a number of revisions to and expansion of schema.org, and we’ve seen more and more examples of commercial usage. I believe this will continue into next year, with Google beginning to integrate even more vertical content into schema.org, and other developments in critical domains. There are a number of domain-specific ontologies in the works in the standards communities for various verticals, such as the Financial Industry Business Ontology (FIBO) and Financial Industry Global Identifier (FIGI) coming out of the OMG. The first building block of the FIBO effort was adopted as a standard just last week at OMG, in fact, and the pipeline with respect to financial instruments (securities, derivatives, equities, debt …) is growing with increasing participation from the banks and other stakeholders.
Semantic Web research is now firmly in sync with more established research areas such as NLP, machine learning, information retrieval and databases. On the business side, a reasonable number of practitioners are now in the process of evaluating and developing semantic web solutions in their day-to-day work.
Bill Roberts, CEO, Swirrl:
There was a growing appreciation that Linked Data works best in conjunction with other existing technologies: its strength is web-scale data integration, but end users of data often want CSV or JSON. Luckily, it’s quite easy to produce those from Linked Data. The W3C Data on the Web working group set up in 2013 could help take these ideas forward.
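Roberts’s point about serving CSV or JSON from Linked Data can be sketched with nothing but the standard library: group triples by subject into flat, row-shaped records, then emit them in either format. The triples and property names below are invented for illustration.

```python
import csv
import io
import json

# Invented example triples: (subject, predicate, object).
triples = [
    ("ex:s1", "name", "Station One"),
    ("ex:s1", "pm10", "21"),
    ("ex:s2", "name", "Station Two"),
    ("ex:s2", "pm10", "34"),
]

# Pivot triples into one record per subject -- the flat shape
# most end users of data actually want.
records = {}
for s, p, o in triples:
    records.setdefault(s, {"id": s})[p] = o
rows = list(records.values())

# JSON for developers...
print(json.dumps(rows, indent=2))

# ...and CSV for spreadsheet users.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["id", "name", "pm10"])
writer.writeheader()
writer.writerows(rows)
print(buf.getvalue())
```

In practice the pivot step runs against a SPARQL endpoint rather than an in-memory list, but the shape of the transformation is the same: graph in, tables out.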
Amit Sheth, LexisNexis Ohio Eminent Scholar, Kno.e.sis Center, Wright State University:
While it is true that the Semantic Web is not the “hottest” technology on the globe, and, yes, it has not enjoyed the success of Machine Learning or the hype of Big Data lately, my view is contrary to the view that the Semantic Web has failed.
Before I present my views on its continued broad-based progress, let me identify two main reasons its progress has not been as fast as it could be. The first is scalability – my view that semantics can be scaled up is not yet widely accepted. We need to succeed in convincing more people that we can indeed create domain models and background knowledge as fast as they can train machine learning algorithms, that semantic techniques (e.g., for finding meaningful patterns, paths, and subgraphs) can be applied to deal with the volume and velocity of Big Data, and that no other approach deals with variety as well. The second is a lack of trained personnel who have the expertise to deal with the right part of the Semantic Web – the part that focuses on the Web of Data or Linked Data. The number of applications of the logic end of the Semantic Web is still very small.
Now let me present why and how we are making smart progress at multiple levels – from small-scale to large-scale applications and impact.
- First, there are plenty of small, real wins, such as this effort to improve eCommerce.
- Second, there are a growing number of products that improve upon or address the deficiencies of widely-used techniques in IR, NLP, and ML (the importance of background knowledge has been widely recognized, e.g., see the discussion on Data Alone is Not Enough). One rapidly growing example is this product, which requires understanding of clinical notes.
- Finally, systemic changes are coming to Web-scale systems where semantics is the primary differentiator. What better examples are there than the applications where the money is made: search and advertising. The role of semantics for search/personalization/targeting/advertisement has been known and demonstrated in a commercial setting since around 2000 (see this interview, this patent, this talk, and this paper), but in line with the rule of thumb I have experienced – that technology maturation and scale-out often take 15 years – semantic search and advertising are receiving a full-court press and are now indeed coming to the average consumer.
Nova Spivack, technology futurist, serial entrepreneur, angel investor and CEO, Bottlenose:
Google’s Knowledge Graph improvements and the hiring of Ray Kurzweil both signal significant interest at Google in semantics and machine intelligence as part of its core offerings. Kurzweil has stated recently that Google is going to read every Web page and book, using machine intelligence, to begin to change the paradigm of search to more of a dialogue experience. This clearly points towards a longer-term goal at Google of becoming an intelligent assistant. I think Apple’s acquisition of Topsy showed further interest in this same theme. Perhaps Apple plans to use Topsy to learn in the wild in order to make Siri and other apps smarter. Both companies seem to be showing signs of pushing forward towards proactive intelligent assistance.