Applying Ontology, NLP and Semantic Search To Enterprise Collaboration

February 3, 2009

This paper describes how advanced semantic web and natural language techniques can be used within the context of enterprise collaboration to solve concrete user problems.

Our mission is to significantly improve enterprise collaboration. In the last few years, enterprise collaboration has advanced significantly with the introduction of wikis, discussion forums, document repositories, etc. Our objective is to take these collaboration tools one step further by making them very easy to use and by making them “smart”. By “smart”, we mean that they leverage meta-data such as historic user behavior, semantic information, and language analysis to make the tools easier and more efficient to use. We measure efficiency in terms of:

  • How well information and experiences within an organization are reused and applied
  • How fast organizations create and agree on specifications, analysis, decisions, etc.
  • How efficiently questions and issues are resolved
  • How this information is reused and re-applied when the same or similar issues reoccur. 

In order to take wikis, discussions, files, etc. to the next level in terms of efficiency, we have selectively applied advanced techniques such as natural language analysis, semantic web techniques and collective intelligence techniques. In this paper we discuss why we think our approach makes sense and may be more effective for enterprise collaboration.

The majority of discussions on semantic search, semantic web and collective intelligence are centered on the application of these techniques to the consumer market. These applications generally target very wide and general usage, such as general-purpose semantic search. They also assume hundreds of thousands or millions of users, which some collective intelligence techniques require in order to work efficiently.

In contrast, enterprise collaboration may involve anywhere from ten to tens of thousands of users, and the applied techniques must work over the whole range. In addition, we can make assumptions about the area of discourse within business collaboration. For example, we can apply ontologies for the area of collaboration in a way that is not possible for the general universe. Since we operate in a business environment, we can make certain assumptions about participation and also expect users to contribute to the collaboration and the creation of meta-data.  We can also help influence the meta-data as the user population is contained within an enterprise or customer community.

Key Problems in Enterprise Collaboration

Enterprises already facilitate collaboration both internally and with other enterprises. Enterprise collaboration is, however, still mainly done through email, phone conferences, file servers, instant messaging and the like. As an example, information can be created locally and passed around using email for comments and enhancements. Decisions and supporting information are stored in email inboxes, making the information practically inaccessible to all but the directly involved users. The information also evolves over several unconnected email chains, making it difficult to establish context and see every piece of information involved in a decision or question.

So a key problem is the lack of mechanisms for capturing and connecting all information that is generated through collaboration, as well as for efficiently creating collaborative information. The information explosion makes it difficult for employees to track what is going on and what they should focus on. There are few effective mechanisms for keeping them informed and involved without participating in an excessive number of email threads and messaging conversations. The increased complexity of decisions requires more people to participate in and be informed about the decision-making. Tools commonly used today make this process inefficient and put a heavy burden on users. Deliverables such as decisions, specifications, offers, and contracts must be produced efficiently, often involving many different people in an organization; this is cumbersome with current practices. All these issues are magnified by an increasingly distributed work force.

With new collaboration tools, organizations can now efficiently create and organize information. The next problem is how individuals find the information that is relevant to them and that helps them solve their everyday business tasks. In addition, the systems must help users stay current and informed without being overwhelmed by the information. Good collaboration tools go beyond the information. They also provide a social angle, helping users seek out other users who may know what they need, or find information based on similar interests and profiles. GroupSwim is built to address the fundamental collaboration need and to excel in helping users organize and find resources that help them in their everyday tasks. The resources can be files, questions, answers, wikis, groups of people or individual users.

In the following sections we discuss some concrete problems and how we apply semantic and natural language technologies to provide useful functionality.

Ontology and NLP Assisted Categorization

The Problem and the Function

Tags have emerged as a very popular way of categorizing information in so-called folksonomies. These categorizations rely on the assumption that many users categorize the content, and that the common wisdom of these users will surface the best and most appropriate categorization of information. This works well for large sites with many users categorizing information, but not as well for services with fewer users, such as enterprise sites with perhaps just tens or a few hundred users. With fewer users, assigned tags may not represent the common categorization of content as reliably as a large number of user-assigned tags would.

In addition, tags are generally used by relatively avid Internet users who understand how tags will help them find information at a later time. Within an enterprise, we want to encourage all users to help categorize content. In order to do this, we need to proactively help the user in this task.

With these problems and needs in mind, we have designed a function that makes it very easy for users to assign appropriate tags. It analyzes the user provided content in real-time looking for appropriate tags, and it uses site-specific meta information to help streamline and make categorization more consistent and applicable to the topic areas of a site.

The Challenge and the Implementation

The function suggests appropriate tags in real time for a discussion post, wiki or file. The suggestion is based on language analysis of the content and guided by the ontology and preferences defined at the site level. Language analysis can find appropriate terms and words based on text categorization techniques. This usually provides a good result when you consider an individual post. When you need to put the post into context and relate it to other information, considering only the individual post is not sufficient. To address this, we allow a site to create a context that describes semantic relations between terms used within the specific context of the site. Consider the picture below.

Figure 1: Koister

The picture describes a simple ontology for a fictitious site. It expresses that the term United is an airline, and likewise for SAS and American. It also states that these airlines are all competitors. A site can also specify synonyms and preferred terms. As an example, a site may state that it prefers the term Enterprise rather than Company. All this meta-data is considered by the tag suggestion function.


Combining natural language processing, site-specific ontology and semantic lexicons to guide the suggestion and application of tags makes it easy for users to find appropriate tags. It also makes tags more consistent with the terminology, semantics and usage within a site.
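
As a rough illustration of how site meta-data can steer tag suggestions, the following minimal Python sketch ranks terms by frequency, maps synonyms to the site's preferred term, and adds the broader ontology term for known instances. It is not the actual implementation: the dictionaries, the stopword list, the frequency ranking and the function name suggest_tags are all assumptions made for this example.

```python
import re
from collections import Counter

# Hypothetical site-level meta-data, modeled on the examples in the text.
PREFERRED_TERMS = {"company": "enterprise"}  # synonym -> preferred term
IS_A = {"united": "airline", "sas": "airline", "american": "airline"}
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is",
             "for", "with", "on", "our", "that"}


def suggest_tags(text, max_tags=5):
    """Rank candidate tags by term frequency, guided by site meta-data."""
    counts = Counter()
    for word in re.findall(r"[a-z]+", text.lower()):
        if word in STOPWORDS:
            continue
        # Normalize synonyms to the term the site prefers.
        term = PREFERRED_TERMS.get(word, word)
        counts[term] += 1
        # Each mention of an instance also supports its broader category.
        parent = IS_A.get(term)
        if parent:
            counts[parent] += 1
    # Highest count first; ties are broken alphabetically for stable output.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:max_tags]]


tags = suggest_tags(
    "Our company compared United and SAS fares. The company prefers SAS."
)
print(tags)
```

In this sketch a post that never mentions the word "airline" can still be tagged with it, and "company" is replaced by the preferred "enterprise", which is the kind of consistency the site-specific meta-data is meant to provide.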

Ontology Aided Search

The Problem and the Function

The general search problem is an enormous challenge. There are many different approaches, ranging from the statistical approach of Google to the natural language approach of Powerset. These approaches try to solve the general search problem, which is challenging due to the lack of contextual information available to guide the search.

In contrast, our aim is to make search within the context of a site as efficient as possible. We can leverage the fact that we know things about the site, such as the general topic area, preferred terms, semantic relationships between terms, etc.

The function we provide assists the user while conducting a search. Let's say a user searches for “Airline” with limited results. We can then derive that, in the context of this site, “United” is an airline and present content about United as a result to the user. Our search function enables users to widen or narrow searches based on the semantic relationships that are defined for the site.

The Challenge and the Implementation

We have implemented the ontology and semantic information as an RDF database. Users can add semantic information to their site-specific ontology, and they can specify synonyms and preferred tags. We also import certain general models, such as WordNet, to provide some base data that we believe is applicable to all sites.
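
To make the idea of widening a search through semantic relationships concrete, here is a minimal Python sketch. It stands in for the RDF database with an in-memory set of (subject, predicate, object) triples mirroring the airline ontology of Figure 1. The predicate names, the whitespace tokenization and the functions narrower_terms, broader_terms and search are assumptions for illustration, not the product's API.

```python
# Hypothetical triples (subject, predicate, object); a real deployment
# would query an RDF store instead of a Python set.
TRIPLES = {
    ("United", "is_a", "Airline"),
    ("SAS", "is_a", "Airline"),
    ("American", "is_a", "Airline"),
    ("United", "competitor_of", "SAS"),
    ("United", "competitor_of", "American"),
    ("SAS", "competitor_of", "American"),
}


def narrower_terms(term):
    """Terms the ontology declares to be instances of `term`."""
    return sorted(s for s, p, o in TRIPLES if p == "is_a" and o == term)


def broader_terms(term):
    """Categories the ontology assigns to `term`."""
    return sorted(o for s, p, o in TRIPLES if p == "is_a" and s == term)


def search(query, documents, widen=False):
    """Keyword match; with widen=True, also match the query's narrower terms."""
    terms = {query}
    if widen:
        terms.update(narrower_terms(query))
    lowered = {t.lower() for t in terms}
    # Naive whitespace tokenization keeps the sketch short.
    return [doc for doc in documents if lowered & set(doc.lower().split())]


docs = [
    "United raised baggage fees",
    "Quarterly budget review",
    "Which airline flies to Oslo",
]
print(search("Airline", docs))              # plain keyword search
print(search("Airline", docs, widen=True))  # also finds the United post
```

Narrowing could work the same way in the other direction, for example by offering the entries of narrower_terms("Airline") as refinements when a query returns too many results.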

The big challenge is to assist users in building these site ontologies. We are working on a solution where we weave the questions into the interaction model, allowing the user to build the ontology as part of the general interaction. For some topic areas there are existing ontologies that could be imported. We have also prototyped interaction with semantic databases such as Freebase for collaborative building of ontologies.


We have found that, within the context of enterprise collaboration, using ontologies, language semantics, etc. can greatly improve search, which is a critical function for driving productivity. The reason is that a site-specific ontology and its semantic information are more likely to be relevant to the topics of the site.

Real-time Discovery of Related Information

The Problem and the Function

I am sure everybody has seen different people send out emails asking the same basic questions. Collaboration tools are supposed to address some aspects of this, but people still tend to ask questions without first checking forums and collaboration tools. In our never-ending quest to simplify finding relevant information for users, we have developed a function that automatically finds relevant information in real time. The function takes a snapshot of what the user has typed so far, as well as any available context information. It uses this to find related information in the GroupSwim site and presents it back to the user. The user can then choose to skip posting and instead reply to or augment an existing post, or simply conclude that she has found the answer.

The Challenge and the Implementation

The implementation is challenging since similarity and recommendation techniques normally require processing over a complete dataset. This would not be a viable solution for a real-time recommendation function. We therefore do an approximate similarity analysis over a selection of elements. We then expand and deepen the search around the elements that are most similar.

We use clustering techniques for preexisting data. We then do real-time similarity calculation for all median elements at a specific level in the clustering hierarchy. We pick the clusters with the median elements that are closest to the post being written. Then we reapply the algorithm at that level in the hierarchy.
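
The descent through the clustering hierarchy can be sketched as follows. This is a simplified Python illustration under stated assumptions, not the production algorithm: posts are represented as term-weight dictionaries, similarity is cosine similarity, each cluster carries one representative vector standing in for its median element, and the Cluster class, the beam parameter and find_related are hypothetical names.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Cluster:
    representative: dict                           # term -> weight of the central element
    children: list = field(default_factory=list)   # sub-clusters, if any
    items: list = field(default_factory=list)      # (title, vector) pairs in a leaf


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = (math.sqrt(sum(w * w for w in a.values()))
            * math.sqrt(sum(w * w for w in b.values())))
    return dot / norm if norm else 0.0


def find_related(query, clusters, beam=1, top_n=3):
    """Compare the draft only with representatives, level by level."""
    frontier = list(clusters)
    leaves = []
    while frontier:
        # Keep the `beam` clusters whose representative is closest to the draft.
        frontier.sort(key=lambda c: cosine(query, c.representative), reverse=True)
        best = frontier[:beam]
        leaves.extend(c for c in best if not c.children)
        # Reapply the same step one level down, inside the chosen clusters.
        frontier = [child for c in best for child in c.children]
    # Exact ranking happens only over the few items in the selected leaves.
    candidates = [item for leaf in leaves for item in leaf.items]
    candidates.sort(key=lambda item: cosine(query, item[1]), reverse=True)
    return [title for title, _ in candidates[:top_n]]


baggage = Cluster(
    {"baggage": 1, "airline": 1},
    items=[("Baggage fees on United", {"baggage": 1, "united": 1, "fees": 1}),
           ("Lost baggage claim", {"baggage": 1, "lost": 1, "claim": 1})])
delays = Cluster(
    {"flight": 1, "delay": 1},
    items=[("Flight delay policy", {"flight": 1, "delay": 1, "policy": 1})])
travel = Cluster({"airline": 1, "baggage": 1, "flight": 1}, children=[baggage, delays])
finance = Cluster({"budget": 1, "invoice": 1},
                  items=[("Q3 budget", {"budget": 1, "q3": 1})])

print(find_related({"baggage": 1, "fees": 1}, [travel, finance]))
```

The trade-off in this sketch is that the number of similarity computations grows with the depth of the hierarchy and the beam width rather than with the size of the site, at the price of possibly missing a related post that sits in a cluster whose representative looked dissimilar.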

In addition, we use common language analysis techniques to identify questions to further filter the information. If a user alters the text during the interaction, the algorithm must be reinitiated.
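
The text does not say which language analysis techniques are used to identify questions. As a deliberately crude stand-in, the Python sketch below flags a sentence as a question if it ends with a question mark or starts with a common interrogative or auxiliary word. The word list and the functions is_question and questions_in are assumptions for this example; a real system would likely rely on proper linguistic analysis.

```python
import re

# Hypothetical list of words that commonly open an English question.
QUESTION_STARTERS = {
    "who", "what", "when", "where", "why", "how", "which",
    "is", "are", "can", "could", "does", "do", "did",
    "should", "would", "will",
}


def is_question(sentence):
    """Heuristic: trailing '?' or an interrogative first word."""
    text = sentence.strip()
    if not text:
        return False
    if text.endswith("?"):
        return True
    first = re.match(r"[A-Za-z']+", text)
    return bool(first) and first.group(0).lower() in QUESTION_STARTERS


def questions_in(text):
    """Split text into sentences and keep those that look like questions."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s for s in sentences if is_question(s)]


draft = "We fly SAS to Oslo. How do I expense baggage fees? Thanks."
print(questions_in(draft))
```

Restricting the similarity search to the sentences returned by questions_in is one way such a filter could focus the related-information lookup on what the user is actually asking.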


This feature is new in our system, and we do not yet know how well our first release is perceived by users. We do think it will help users considerably by reducing the time needed to find relevant information and by reducing the amount of redundancy in a site. Rather than having many posts on the same topic, we hope this will cluster information so that it becomes easier to find all relevant information concerning a specific question or topic area.


We are applying natural language and semantic technologies to enterprise collaboration. By taking advantage of the fact that we deal with narrower topic areas and a motivated user base, we can build functions that are more powerful than is possible for the general use case. We have illustrated three functions that take advantage of semantic techniques and models in order to improve the efficiency and usefulness of enterprise collaboration services.

