Even as semantic web concepts and tools underpin revolutionary changes in the way we discover and consume information, people with even a casual interest in the semantic web have difficulty understanding how and why this is happening. One of the most exciting application areas for semantic technologies is online publishing, yet thousands of small-to-medium sized publishers find the unfamiliar concepts too intimidating to grasp how relevant these technologies are to them. This three-part series is part of my own journey to better understand how semantic technologies are changing the landscape for publishers of news and information.
The 2010 World Cup was a notable first not only for Spain, but also for publishing and the BBC, because the BBC’s coverage of the tournament marked a dramatic evolution in the way content can be delivered online. The team of architects that created the new system, including Jem Rayfield and Paul Wilton, labeled it dynamic semantic publishing (DSP). DSP was soon defined as “utilizing Linked Data technology to automate the aggregation and publication of interrelated content objects.”
The use of DSP may be undetectable to the viewer, because it is more about the gathering and organization of information than about its presentation (the “plumbing”, so to speak, rather than the interface). Its implications, however, are astounding. Imagine every event of the Olympics broadcast live on 24 HD streams, all accessible over the internet, with live, dynamic data and statistics on athletes. That is exactly what the BBC is planning to do using its evolving DSP architecture for the 2012 Summer Olympics.
How is this possible? In part, because DSP enables wider, richer, and more varied coverage by reducing the overhead necessary to organize content and decide what goes where. According to technical architect Paul Wilton, the BBC gained “efficiencies in the Journalist workflow by freeing up journalists to do what they do best and author content, and letting the semantic automation choose what content gets onto a page.” (To learn more about how news organizations approach Dynamic Semantic Publishing, check out this interview with Ontoba’s Paul Wilton.)
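To make “letting the semantic automation choose what content gets onto a page” a little more concrete, here is a minimal sketch of the idea in Python. It is not the BBC's implementation: the identifiers, the `about` predicate, and the `build_pages` function are all invented for illustration, and a real DSP system would use a triple store and queries rather than a Python list. The point is only that once journalists tag each article with the things it is about, pages can be assembled from those tags automatically.

```python
from collections import defaultdict

# Hypothetical (subject, predicate, object) statements; every identifier
# here is made up for illustration and is not taken from the BBC's data.
TRIPLES = [
    ("article:1", "about", "team:Spain"),
    ("article:1", "about", "player:Iniesta"),
    ("article:2", "about", "team:Netherlands"),
    ("article:3", "about", "team:Spain"),
    ("player:Iniesta", "memberOf", "team:Spain"),
]


def build_pages(triples):
    """Group article identifiers by the concept they are tagged with."""
    pages = defaultdict(list)
    for subject, predicate, obj in triples:
        # Only "about" statements decide which page an article lands on.
        if predicate == "about":
            pages[obj].append(subject)
    return dict(pages)


pages = build_pages(TRIPLES)
print(pages["team:Spain"])  # ['article:1', 'article:3']
```

Nobody had to place the two articles on the Spain page by hand. Tagging a new article with `team:Spain` would be enough for it to appear there the next time the pages are built.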
But wait, there’s more! One of the newest not-so-secret weapons in the BBC’s DSP publishing arsenal is fluid Operations’ Information Workbench. Information Workbench supports, in the words of Senior Architect for R&D Michael Schmidt, “the whole data interaction process.” For high-volume publishers such as the BBC, this tool supports the authoring, curation and publishing of ontology and instance data following an editorial workflow. The Information Workbench also helps other publishing houses to automatically generate content using both structured and unstructured data, enrich content using metadata, and analyze and visualize data from different sources. If you are thinking, “Um, what?”, don’t worry; read on.
More simply put: it is, just as the name suggests, a workbench to play with information. Combined with other technologies, it works as though a virtual, extraordinarily intelligent librarian were living inside it, telling you how the information you have is related to everything else you’ve published as well as all Linked Open Data on the Web, plus data (structured or not) in databases that you specify. “Well, I see that you have written about the high jump competition, which is related to track and field, and here are links to everything you’ve published about the high jump, and here is what Wikipedia says about each competitor, and here is all the recent data published about the athlete from Greece… Would you like to make a graph showing how her jumps have improved over time?” …and so on and so on ad (almost) infinitum. Then, the virtual librarian also offers up, “and while we are at it, I will annotate your content now so it can be linked to all of this other stuff in the future.” The best part (for beginners) is that you don’t need to know much about RDF, SPARQL, OWL, or other semantic languages to use it. (Check out this interview with fluid Operations’ Michael Schmidt to learn more about how Information Workbench works in the context of DSP.)
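The librarian's trick of knowing that the high jump “is related to track and field” can be sketched in a few lines of Python. This is only an illustration of the general idea of following “broader than” links between concepts; it says nothing about how Information Workbench is actually built, and the names `BROADER`, `TAGS`, `expand`, and `related_articles` are my own inventions.

```python
# Hypothetical concept hierarchy: each concept points to its broader concept.
BROADER = {
    "event:HighJump": "sport:TrackAndField",
    "sport:TrackAndField": "games:Olympics",
}

# Hypothetical tags that an editor (or a tool) has attached to each article.
TAGS = {
    "article:10": ["event:HighJump"],
    "article:11": ["sport:TrackAndField"],
}


def expand(concept, broader):
    """Return the concept followed by every broader concept reachable from it."""
    chain = []
    seen = set()
    # The 'seen' set guards against accidental cycles in the hierarchy.
    while concept is not None and concept not in seen:
        chain.append(concept)
        seen.add(concept)
        concept = broader.get(concept)
    return chain


def related_articles(concept, tags, broader):
    """Articles tagged with the concept itself or with any narrower concept."""
    return sorted(
        article
        for article, concepts in tags.items()
        if any(concept in expand(c, broader) for c in concepts)
    )


print(related_articles("sport:TrackAndField", TAGS, BROADER))
# ['article:10', 'article:11']
```

The high jump article was never tagged with track and field, yet it shows up there because the hierarchy says the high jump belongs to track and field. Asking for `event:HighJump` instead returns only `article:10`.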
The most exciting development offered by Information Workbench is the ability to “plug in” one’s own preferred information sources for analysis, and to easily create widgets and APIs to mash up, visualize and publish content and data. These capabilities will likely be in high demand as journalists (with and without math degrees) struggle to extract meaning from the vast and growing volume of open, linked data becoming available on the internet.
Many companies offer tools that use natural language processing and metadata extraction to help publishers categorize and organize content, pull in related linked data and information, and improve search. Zemanta goes one step further, helping small-to-medium sized publishers syndicate content as well as augment their own publications with related content from across the web. Each of these tools has its own specific use cases. I look forward to learning more about them, and about the organizations that are using them, at the 2012 Semantic Tech and Business conference in San Francisco next week, and to sharing what I’ve learned in Part 2 of this series.