At the Semantic Web Business and Technology Conference in San Francisco in 2011, a company called Pragmatech presented a prototype of its CTRL semantic engine. Now, a little more than a year later, it’s launching products and services, as well as an API, for businesses and the general public.
The Daily Star, an English-language news publication in the Middle East, is one of the early adopters of CTRL for its web site. The semantic technology lets the news site surface topically related stories, article summaries, and the entities extracted from each article. Soon, readers will also be able to follow topics related to articles.
“Many semantic technologies do entity extraction at a shallow level,” says Dr. Walid Saba, who leads the R&D team at Pragmatech. “We go deeper.” For example, readers of The Daily Star who want to explore stories by following a key topic (say, a particular world figure in his or her role as a diplomat, rather than in a past role as a businessperson) will be directed to stories specific to that topic. Within the first couple of weeks of deployment, the news site more than tripled user engagement, Saba says.
There are many semantic engines that support the key requirements of entity extraction, text analysis to identify key terms or key phrases, summarization, and semantic comparison across texts. But what Saba says sets CTRL apart is its focus on key topics: “What we do that no one had done, and why we can do topic relevancy or topic comparisons of text even in different languages, is to extract or identify and then analyze and determine key topics. We don’t mean phrases. A topic for us is like a concept but a complex semantic concept that describes the subject matter.”
As an example, take a phrase like global warming. Enter it into the engine, Saba says, and it will return documents that never mention global warming but do mention climate change, and in which climate change is actually a key topic. “That is the ultimate goal of semantic technology – that I describe subject matter in my own words, and it gets me documents that are the key topic expressed using a different set of words.” That applies to documents in the same language or in another one. “Because our engine is truly semantic, all we need is a decent translator,” he says. “Because words are not the key things, we just need to understand the gist of a story and a decent translator can do that for us.”
Pragmatech’s API will be free initially, the idea being that enterprises can leverage it as they see fit. The company is also making available a plug-in for Office environments: users reading or creating content in a Word document, for example, will be able to select a piece of text and have semantically relevant content from CTRL’s collection served up. That content can include videos (Pragmatech is currently scaling its video collection up from a few hundred thousand) and images as well as text documents. The company has also developed a Chrome extension that lets users select text on a web page and retrieve topically relevant documents. Other end-user tools and services to be released include a PowerPoint add-in and a content enrichment tool for bloggers. The database behind its semantic search demo will keep growing as the company crawls more websites each day.
The team at Pragmatech says it can envision many other applications, for end users and enterprises alike, that use the power of semantic technology to create and follow content. Some verticals, such as news agencies and media monitoring services, require good semantic technology for comparing content semantically, summarizing it intelligently, and extracting key topics, Saba says. The company has also seen interest from a content management system provider which, according to Saba, told Pragmatech that its metatag extraction, with its ability to identify a document’s key topics on its own, is of great value to content producers.
Another aspect that Saba says is critical is performance. “Semantic engines suffer a lot from doing deep analysis for every word, meaning and phrase,” he says. But CTRL is getting around that; its team of software engineers, he says, “did technology, not at the linguistic level, to make it run like a bullet.”