Are you starting to hear more about patents that relate to the Semantic Web space? Erik Sherman, for instance, had an interesting discussion of Facebook’s patent for automatic search curation and how it feeds the company’s semantic search ambitions.
Generally speaking, in fact, patents are big in the news, with the passage last week by the Senate of the Patent Reform Bill, which has among its goals getting patents issued sooner — but which also is spurring concern, especially in the tech industry, about its impact on patent infringement actions.
Against this backdrop, and perhaps flying a bit more under the radar, was a U.S. patent (No. 7,882,055) granted to Digital Reasoning for its distributed system of intelligent software agents for discovering the meaning in text. Company CEO Tim Estes calls what the vendor has applied to its Synthesys technology a “bottom-up” patent.
Specifically, it covers the mechanism of measurement and the applications of algorithms to develop machine-understandable structures from patterns of symbol usage, the company says, as well as the semantic alignment of those learned structures from unstructured data with pre-existing structured data — a necessary step in creating enterprise-class entity-oriented systems.
So, in plain(er) English, it’s about using algorithms to bootstrap the creation of semantic models from large-scale unstructured data with minimal a priori information – in other words, to let the data speak for itself. It aims at being a fast route to entity-oriented analytics for harvesting critical facts and relationships across a spread of information in documents.
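To make "letting the data speak for itself" a little more concrete, here is a minimal sketch of one generic unsupervised technique: counting which words appear near each other and comparing those usage patterns. This is not Digital Reasoning's patented method, which the article does not detail. The function names, the window size, and the toy corpus are all assumptions made for illustration.

```python
from collections import Counter, defaultdict
from math import sqrt


def cooccurrence_vectors(sentences, window=2):
    """Map each token to a Counter of the tokens seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, token in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[token][tokens[j]] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = sqrt(sum(c * c for c in a.values()))
    norm_b = sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Toy corpus (hypothetical): no labels, no ontology, no list of known entities.
corpus = [
    "the doctor prescribed the drug to the patient",
    "the physician prescribed the drug to the patient",
    "the bank reported the risk to the regulator",
]

vectors = cooccurrence_vectors(corpus)
sim_related = cosine(vectors["doctor"], vectors["physician"])
sim_unrelated = cosine(vectors["doctor"], vectors["regulator"])
print(sim_related > sim_unrelated)
```

In this toy corpus, "doctor" and "physician" are used in the same contexts, so they score as more similar than "doctor" and "regulator", even though nothing told the program that the two words are related. A production system would need far more than this, but the bootstrapping idea is the same.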
“We assume there’s no time or luxury for doing a lot of domain modeling,” Estes says, referring to approaches in which human resources are dedicated to reading and tagging documents, or to extracting entities they already know they’re looking for from the data. Parties that have tried to leverage unsupervised learning have not been able to make it work at the entity or concept level, much less at scale, he says. And on top of that, it’s the associations that you don’t know to look for in data ahead of time that can be the most valuable.
That’s important in the defense and intelligence community, which accounts for much of Digital Reasoning’s current customer base. But Estes expects that financial services, health care, and other enterprises dealing with multiple terabytes of unstructured data in emails, office productivity documents, and the like are going to need an efficient, Watson-like way to make connections across those uncharted terrains, perhaps to better understand who has important insight on specific issues, or even who might be talking about something they shouldn’t to someone they shouldn’t.
“Enterprises now are held accountable for corporate-level concerns, but they are composed of individuals communicating all the time, which can be a source of enormous gain if utilized correctly, or of risk,” Estes says. Document-centric tools can only point users to piles of documents that might have some relationship to the information they seek. They can’t, for instance, give actual answers to questions many companies must resolve, such as which top risks a public company should list, in order of priority, on its 10-K filing.
“Those risks should be in order of priority, but how do you know they really are?” Estes says. And could that come back to bite you if litigation arises and it turns out that people in the company knew that what was listed as risk #5 was really risk #1, and said so to each other in email or other unstructured formats, but that information never reached the C-suite? “You need a system designed to answer questions for mission-critical decision makers and to work at large, large scales,” he says.
He also points to the opportunities: say, knowledge discovery in medical device and drug testing, where the first warnings about potential issues show up not in structured data but in doctors’ notes about patients, which are kept for billing and record-keeping rather than for knowledge discovery. “You can leverage those notes for knowledge discovery to understand the real efficacies of drugs, devices or other things, and that’s pretty untapped right now,” Estes notes.
As interesting as what Synthesys does and what the patent covers is the new ecosystem it lives in. It’s built on Hadoop, the open source platform for consolidating, combining and understanding large-scale data, and on the open source Cassandra NoSQL distributed data store. This enables the level of parallelization necessary for analytics at horizontal scale, and for dealing more cheaply and efficiently with sparse data, where any individual of interest is usually connected to just a few dozen or a few hundred words across an enterprise corpus.
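Why does sparsity matter for cost? A rough sketch, under assumptions of my own: if each person is tied to only a handful of words, storing just the pairs that actually occur is far smaller than a full person-by-vocabulary grid. The names, the sample mentions, and the 100,000-word vocabulary below are hypothetical, and this is plain Python rather than anything specific to Hadoop, Cassandra, or Synthesys.

```python
from collections import defaultdict


def build_sparse_index(mentions):
    """Count (entity, term) pairs, storing only the pairs that actually occur."""
    index = defaultdict(dict)
    for entity, term in mentions:
        index[entity][term] = index[entity].get(term, 0) + 1
    return index


# Hypothetical (person, word) associations pulled from an email corpus.
mentions = [
    ("alice", "merger"),
    ("alice", "audit"),
    ("bob", "audit"),
    ("alice", "merger"),
]

vocabulary_size = 100_000  # assumed size of the enterprise vocabulary

index = build_sparse_index(mentions)
stored_cells = sum(len(terms) for terms in index.values())
dense_cells = len(index) * vocabulary_size

print(stored_cells, dense_cells)
```

The sparse index keeps one cell per observed pair, while a dense table would reserve a cell for every person and every word in the vocabulary, almost all of them empty.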
The company’s philosophy is to apply for patents only around key elements of its core proposition (it has filed for only two in the last decade or so), and it’s equally focused on sharing its learnings with the open source community. Even as steps are being taken to speed processing of patent applications, it’s important to strike a balance if you hope to interact with thought leaders in the community, for their benefit and your business’s benefit, too. Says Estes, “Talk openly, execute effectively on engineering and product, and find the things that are really yours and enhance and protect those.”