Even as semantic web concepts and tools are underpinning revolutionary changes in the way we discover and consume information, people with even a casual interest in the semantic web have difficulty understanding how and why this is happening. One of the most exciting application areas for semantic technologies is online publishing, yet for thousands of small-to-medium-sized publishers, unfamiliar semantic concepts are intimidating enough to obscure the relevance of these technologies. This three-part series is part of my own journey to better understand how semantic technologies are changing the landscape for publishers of news and information. Read Part 2.
So far we’ve looked at the “cutting edge” of dynamic semantic publishing (BBC Olympics) and we’ve seen what tools large publishers such as the New York Times, Associated Press, and Agence France-Presse are using to semantically annotate their content.
And we’ve learned how semantic systems help publishers “Do More With Less” (that is, automate a lot of the work of organizing content and identifying key concepts, entities, and subjects) and “Do More With More” (combine their content with related linked open data and present it in different contexts).
You may still be asking at this point, “What makes this so novel and cool?” We know that semantic tools save time and resources. And some people say semantic publishing is about search optimization, especially after the arrival of Google’s Knowledge Graph. But the implications of semantic publishing are about oh so much more than search. What semantic systems are really designed for, to use the phrase attributed to Don Turnbull, is “information discovery” and, if semantic standards and tools are widely adopted in the publishing world, this could have huge implications for content and data syndication.
Remember the “virtual librarian” analogy from Part 1? Consider that perhaps someday every publisher will have their own “virtual librarian” systems capable of both spitting out and bringing in semantically linked data and content - then your stuff could be discoverable by any publisher (“You are writing about water quality in the Great Lakes over the past decade? I see Kristen Milhollin has made a video about water quality in Lake Erie over the same time frame, do you want to link to it or embed it or provide an excerpt?”). The mere act of using this system to link my data/content has just made me a one-person syndicator.
Now, if I want to keep it open -- that is, I want anyone to be able to see and use it -- then it will be given the widest reach possible -- it could just turn up automatically when someone creates related content. If I want to limit access -- for example to be paid for my content -- perhaps I could establish a direct relationship with a publisher that gives their virtual librarian automatic access to my content, or maybe one day a built-in browser payment system will enable all publishers to see my content, but pay me directly to use it in their publications. This is what could be possible if semantic tools and standards become more widely adopted: publishers would get much better at finding, connecting, and helping people to understand very specific, relevant bits of information from the unbelievably huge store available on the internet.
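To make the “virtual librarian” idea a little more concrete, here is a toy sketch of the matching step: given a draft and a catalogue of other people's content, suggest the items that share a subject and cover an overlapping time frame. Everything in it is an assumption made for illustration (the Item fields, the suggest function, the year ranges and the sample titles); a real system would match on linked data URIs and would know, for instance, that Lake Erie is one of the Great Lakes, rather than comparing plain labels as this does.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    """A piece of content described by a few semantic annotations (hypothetical fields)."""
    title: str
    author: str
    subjects: frozenset  # concept labels; a real system would use linked data URIs
    years: tuple         # (first_year, last_year) covered by the item


def overlaps(a, b):
    """True when two (start, end) year ranges share at least one year."""
    return a[0] <= b[1] and b[0] <= a[1]


def suggest(draft, catalogue):
    """Return catalogue items sharing a subject and a time frame with the draft, best first."""
    scored = []
    for item in catalogue:
        shared = draft.subjects & item.subjects
        if shared and overlaps(draft.years, item.years):
            scored.append((len(shared), item))
    # More shared subjects means a stronger suggestion.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [item for _, item in scored]


# Illustrative data only: the years and titles are made up for this sketch.
draft = Item("Great Lakes water quality over the past decade", "A. Publisher",
             frozenset({"water quality", "Great Lakes"}), (2002, 2012))
catalogue = [
    Item("Water quality in Lake Erie", "Kristen Milhollin",
         frozenset({"water quality", "Lake Erie"}), (2002, 2012)),
    Item("Olympic venues explained", "Someone Else",
         frozenset({"Olympics"}), (2012, 2012)),
]

for item in suggest(draft, catalogue):
    print(f"Related: {item.title} by {item.author}")
```

The point of the sketch is only that once content carries machine-readable subjects and dates, finding related material becomes a simple comparison rather than an editorial hunt.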
Already companies such as NewsCred are taking in all kinds of content, applying semantics, and presenting it in an extraordinarily organized fashion so publishers can drill down to locate and use very specific kinds of content to augment or supplement their own.
The downside is that most of us small-scale publishers aren’t interesting to NewsCred, and we are using WordPress, Drupal, Joomla, or maybe even Movable Type if we are fancy. We don’t have the money to pay for an elaborate semantic publishing system, nor do we have the development budget or staff to configure it to work with our CMSs. That is unfortunate because, did I mention, we are producing most of the popular content (created within content management systems) on the internet, and a lot of it is even useful/interesting.
Whatever are we to do while we wait for the dynamic semantic publishing system for the 99%?
As it turns out, there’s a lot we can do to make use of semantic publishing tools already out there, or even cobble together our own from open source and/or free APIs available to us. This list is by no means comprehensive, but is a good place to start:
Zemanta is a tool that uses natural language processing (NLP) to extract entities within the text of your blog and enrich it with related media and articles from Zemanta’s broad user base. What’s best about it is that it is really simple to use (check out this demo here) and is easy to integrate with most content management systems (such as Drupal and WordPress). Because it cannot help you re-order your own content, this is not full-scale dynamic semantic publishing, but it does use semantic tools to enhance content. If you want to know more about Zemanta, check out this interview with CTO Andraz Tori.
OpenCalais: Developed by Thomson Reuters, OpenCalais is the best known open semantic analysis tool. Compared to similar tools such as Alchemy, Zemanta, and Evri, OpenCalais shines when you want to understand what’s happening in the text with respect to facts and events, which often reveal the relationships between entities identified in text. For news publishers, this is important, as is Reuters' longstanding commitment to keeping OpenCalais free (for up to 50,000 articles a day) and well-maintained. The latest upgrade, scheduled to be released next month, promises to do a better job recognizing people, organizations, new entities, facts, and relationships relevant to elections, politics, war and conflict.
OpenPublish: For publishers willing and able to jump onto a new platform, Reuters teamed up with Phase:// Technology (pronounced "Phase 2") to create OpenPublish, a Drupal distribution with OpenCalais built in that works pretty much “out of the box” and which can be easily skinned and customized. OpenPublish makes it easy to automatically generate topic hubs from concepts and entities extracted using Calais. If the project chooses to adopt Drupal’s more recent modules on the multimedia end -- especially for video -- it could prove to be a truly powerful dynamic semantic publishing tool for any kind of media.
Drupal: By building RDFa into the core modules of Drupal 7, Drupal has distinguished itself as the semantic-web-friendly choice among widely used CMSs. Here is a step-by-step guide to building a “semantic Drupal” site using the RDF and other modules available in Drupal 7.
WordPress Plugins: The most popular semantic web plugins for WordPress appear to be Zemanta (mentioned above), the OpenCalais “Tagaroo” plugin, and the PoolParty plugin, which allows a blogger to import a SKOS thesaurus and automatically provide inline definitions (as a mouse-over effect) and links to thesaurus terms in a text. Whether any of these can create a dynamic semantic publishing platform is debatable, but they are each useful in their own right.
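For readers who want a feel for what the thesaurus-based tools in this list do to a post, here is a minimal sketch of the inline-definition idea: find the first mention of each thesaurus term and wrap it in a link whose mouse-over title is the term's definition. This is not how the PoolParty plugin is actually implemented; the THESAURUS dictionary, the annotate function, and the example.org URIs are all hypothetical stand-ins, and a real SKOS thesaurus would supply the labels and definitions (skos:prefLabel and skos:definition). It also assumes the input is plain text that is safe to drop into a page as-is.

```python
import html
import re

# Hypothetical thesaurus: preferred label -> (definition, concept URI).
THESAURUS = {
    "water quality": ("The chemical, physical and biological condition of water.",
                      "http://example.org/concepts/water-quality"),
    "Lake Erie": ("One of the five Great Lakes of North America.",
                  "http://example.org/concepts/lake-erie"),
}


def annotate(text, thesaurus):
    """Link the first mention of each thesaurus term, with its definition as a mouse-over title."""
    by_lower = {label.lower(): entry for label, entry in thesaurus.items()}
    # Longest labels first so multi-word terms win over shorter ones.
    labels = sorted(by_lower, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(label) for label in labels) + r")\b",
        re.IGNORECASE,
    )
    seen = set()

    def link(match):
        key = match.group(0).lower()
        if key in seen:
            # Only the first mention gets a link; later ones are left alone.
            return match.group(0)
        seen.add(key)
        definition, uri = by_lower[key]
        return '<a href="{}" title="{}">{}</a>'.format(
            html.escape(uri, quote=True),
            html.escape(definition, quote=True),
            match.group(0),
        )

    return pattern.sub(link, text)


print(annotate("Water quality in Lake Erie matters. Water quality varies.", THESAURUS))
```

Doing the replacement in a single pass keeps terms that happen to appear inside a definition from being linked a second time.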
Because this list is not comprehensive, it would be great to hear about additional open source (or just free) semantic publishing tools that are accessible to smaller publishers, and especially your opinion about their effectiveness and sustainability. For example, I’ve been reading a lot about the Interactive Knowledge Stack (IKS) Project to encourage the wider adoption of Semantic Content Management Systems based on Apache Stanbol, but can find few reports by people who have used such a CMS. Any takers?
Next up in this series... Dynamic Semantic Video.