Have you been thinking about the concept of “weakly inferred meanings (WIMs)” – a machine-tractable representation of meaning from text – of late? An in-the-works organization, OpenWIMS.org, has.
OpenWIMS describes itself as both an organization for defining standards in semantic text processing and an open source implementation of a semantic analyzer whose output is a collection of WIMs. The system, it further explains, is a semantic text analysis engine that relies on a syntactic-semantic lexicon and expects as input a set of syntactic dependencies for the target text.
“The system takes natural language text and does its best to pull that text down to machine-tractable meaning representations that can then be auto-reasoned upon by computerized agents, and made available for data mining,” says Benjamin Johnson, one of the three people behind the emerging organization. “It’s how to get free-form text into something very well-defined and structured and that can be reasoned upon.”
As he notes, disambiguation is the big bugaboo in processing text automatically – Mary may have had a little lamb, but does that mean she is a little girl with a pet sheep, a young lady who ate lamb chops for dinner, or a newly minted mother ewe? “These are radically different interpretations of this same string of text, so you need to know which is correct or you are apt to make errors if you get it wrong,” he says.
Accuracy of 70 or 80 percent has been the norm, but it can improve with knowledge-based methods like the organization’s own approach to semantic analysis, Johnson says. That won’t happen, however, unless the knowledge acquisition bottleneck is overcome. The project seeks to do so by providing a community-driven, moderated and open knowledge base designed specifically for use in textual semantic analysis, including word sense disambiguation, semantic relation detection and so on.
Jesse English, also one of the founding project members, says the roots are in work that the team has been involved with in the past, including developing a large-scale semantic text analyzer whose theory is academically available and open but whose carefully structured knowledge base is not. “It’s the knowledge that all the theory lives on that is not available and that is a monumental feat to construct,” he says. “You can read how great a system works and say, ‘I want to do it, too,’ but if there’s no knowledge to back it up it won’t do anything.”
So far, the project has what English says are some very rough, rudimentary knowledge resources as a starting point. Its base consists of an ontology, which provides a common language and context for the semantic representations of text and for reasoning about a WIM, and a lexicon, which allows the WIM processor to translate an input text into WIMs by applying macro-theories. Both can be updated and customized.
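To make the roles of the two resources concrete, here is a minimal sketch of how an ontology and a lexicon could turn syntactic dependencies into candidate meaning frames. It is not the OpenWIMS implementation: the concept names, the lexicon layout, the dependency labels (`nsubj`, `dobj`) and the function names are all assumptions made for illustration.

```python
# Hypothetical toy ontology: concept -> parent concept (not the OpenWIMS ontology).
ONTOLOGY = {
    "HUMAN": "ANIMAL", "SHEEP": "ANIMAL", "ANIMAL": "OBJECT",
    "MEAT": "FOOD", "FOOD": "OBJECT", "OBJECT": None,
    "OWN": "EVENT", "INGEST": "EVENT", "EVENT": None,
}

# Hypothetical lexicon: word -> senses. A verb sense maps a dependency
# label to (semantic role, concept type the filler must descend from).
LEXICON = {
    "have": [
        {"concept": "OWN",
         "roles": {"nsubj": ("AGENT", "HUMAN"), "dobj": ("THEME", "ANIMAL")}},
        {"concept": "INGEST",
         "roles": {"nsubj": ("AGENT", "ANIMAL"), "dobj": ("THEME", "FOOD")}},
    ],
    "Mary": [{"concept": "HUMAN"}],
    "lamb": [{"concept": "SHEEP"}, {"concept": "MEAT"}],
}


def is_a(concept, ancestor):
    """Walk up the parent links to test whether concept descends from ancestor."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = ONTOLOGY.get(concept)
    return False


def candidate_frames(head, deps):
    """Return every reading of `head` whose role constraints the ontology allows.

    `deps` is a list of (head, label, dependent) triples, assumed to come
    from an external syntactic parser.
    """
    frames = []
    for sense in LEXICON[head]:
        frame = {"EVENT": sense["concept"]}
        for dep_head, label, dependent in deps:
            if dep_head != head or label not in sense["roles"]:
                continue
            role, required = sense["roles"][label]
            fillers = [s["concept"] for s in LEXICON[dependent]
                       if is_a(s["concept"], required)]
            if not fillers:
                frame = None  # this verb sense is ruled out
                break
            frame[role] = fillers[0]  # simplification: keep the first fitting sense
        if frame is not None:
            frames.append(frame)
    return frames


deps = [("have", "nsubj", "Mary"), ("have", "dobj", "lamb")]
for frame in candidate_frames("have", deps):
    print(frame)
```

With this toy knowledge, “Mary had a lamb” still yields two candidate frames, one for owning a sheep and one for eating meat. That is the lamb ambiguity Johnson describes: the constraints prune impossible readings, but richer knowledge or context is needed to choose between the survivors.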
Everyone Into The Knowledge-Building Pool
For the vision to really take off, however, “one of the big things that has to happen is to open knowledge up in such a way that the community of those interested in this can access this knowledge and edit it in a collaborative way, in a community-moderated driven way, very Wiki style,” English says. The aim is to do it with an eye on ontologically grounded, lexically sensitive, computational-linguistics-savvy descriptions that help determine the possible meanings of words in the context in which they appear.
“Our primary goal, or one of them, is to fuel all this, to have a system available that lets people who want to help this go on and add to it,” English notes. “Hopefully that is a scalable way of knowledge acquisition.”
Schema.org and Freebase are projects of equal magnitude that seek to solve different problems and have seen success in the community, says Benjamin Bengfort, the third member of the trio behind OpenWIMS. “We are hopeful we’ll get the same sort of contribution and also come up with automatic ways of having knowledge bases generated by crawlers,” he notes. Another rung on the knowledge acquisition ladder is an intelligent strategy for helping contributors work smart by focusing on the places where knowledge acquisition is most desperately needed.
“We want strategies and automatic methods to help direct people to the hotbeds of where we can gain as much ground as quickly as possible,” says Johnson. One aim is for the system, when it lacks the knowledge to disambiguate a lexical token, to report that it is unsure of the right meaning rather than guess, and to automatically alert the community to the context in question.
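The report-rather-than-guess behavior can be sketched as follows. The article does not say how OpenWIMS measures uncertainty, so the sense scores, the `min_margin` threshold and the `KnowledgeGap` record below are hypothetical choices made only to illustrate the idea.

```python
from dataclasses import dataclass


@dataclass
class KnowledgeGap:
    """Hypothetical record that could be queued for community contributors."""
    token: str
    context: str
    candidates: list


def resolve(token, context, scored_senses, min_margin=0.2):
    """Return (sense, None) when one sense clearly wins, else (None, KnowledgeGap).

    `scored_senses` maps a candidate sense name to an assumed confidence score.
    """
    ranked = sorted(scored_senses.items(), key=lambda kv: kv[1], reverse=True)
    if not ranked:
        return None, KnowledgeGap(token, context, [])
    if len(ranked) == 1 or ranked[0][1] - ranked[1][1] >= min_margin:
        return ranked[0][0], None
    # Too close to call: abstain and describe the gap instead of guessing.
    return None, KnowledgeGap(token, context, [name for name, _ in ranked])


sense, gap = resolve("lamb", "Mary had a little lamb",
                     {"SHEEP": 0.5, "MEAT": 0.45})
print(sense, gap)
```

Returning the gap as data rather than raising an error lets a caller collect unresolved tokens across many documents and surface the most frequent ones to contributors first.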
The short-term priorities, the founders say, are making knowledge publicly available and setting things up in a way that isn’t super-scary, so that anyone can feel confident contributing to it. Longer term, the possibilities are vast once computers can raise their accuracy in semantic text processing and analysis by drawing on enriched meaning repositories.
“The future utility is almost everything,” English says, “if we could use natural language and have our computers give us answers they can seek out because they dig through unbelievable volumes of information in a small amount of time to give you what you are looking for.”