OpenText Takes Next Steps In Automatic Content Classification

by Jennifer Zaino

OpenText yesterday made its secure file sharing and synchronization product, Tempo Box, available for free to customers using its OpenText Content Suite enterprise information management tool.

“A lot of our customers have major concerns about employees sharing documents with cloud tools like Dropbox,” says Lubor Ptacek, VP of strategic marketing. Employees want their documents to be available, synced and shareable across all their devices, but using such services can create security and compliance problems. By deploying Tempo Box on top of their existing infrastructure, at no charge for internal employees and any external parties they need to share content with, companies get a seamless and cost-effective way to share files in the cloud without compromising security, records management requirements or storage optimization, he says – “the things that enterprise customers care about, especially those operating in regulated environments.”

Among those capabilities is applying automatic content classification, which is usually required for records management reasons – for example, helping companies determine if a document is an employee record they must keep for five years or a tax record they have to hold for seven years. That under-the-hood classification engine is an outgrowth of OpenText’s acquisition a few years back of text mining, analytics and search company Nstein. Since the acquisition, says Ptacek, the company has been looking at ways to apply the technology to specific business problems and make it part of its applications.
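As a rough illustration of the retention decision described above, the sketch below maps a document class to a retention period. The five- and seven-year figures come from the example in the text; the keyword rules, class labels and function names are hypothetical stand-ins for what a real classification engine would infer from content, and are not OpenText's implementation.

```python
from datetime import date

# Retention periods in years; both figures come from the example in the text.
RETENTION_YEARS = {"employee_record": 5, "tax_record": 7}

# Hypothetical keyword rules standing in for a trained classifier.
KEYWORDS = {
    "employee_record": ("performance review", "employment contract", "payroll"),
    "tax_record": ("tax return", "tax assessment", "withholding"),
}


def classify(text):
    """Return the label whose keywords occur most often, or None if none match."""
    lowered = text.lower()
    scores = {
        label: sum(lowered.count(keyword) for keyword in keywords)
        for label, keywords in KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def retain_until(text, created):
    """Return the date until which the document must be kept, or None if unclassified."""
    label = classify(text)
    if label is None:
        return None
    target_year = created.year + RETENTION_YEARS[label]
    try:
        return created.replace(year=target_year)
    except ValueError:
        # created was Feb 29 and the target year is not a leap year
        return created.replace(year=target_year, day=28)
```

A document that matches no rule returns None here, which in practice would mean routing it to a person for manual review rather than guessing a retention period.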

It has launched, for example, its Semantic Navigation product to target website content and automatically recommend related articles to visitors, to keep them at the company’s website. Its Auto-Classification for Records Management product helps organizations deal with retention issues, litigation risks and storage and eDiscovery costs, removing the burden on business users to manually identify records and apply classifications to content including unstructured information.

Ptacek says that with automatic classification companies can achieve an 80 to 90 percent accuracy rate, well above the 60 to 65 percent they can expect to see if they manage to get their employees to abide by manual practices.

InfoFusion, which replaces a variety of individual information applications (and their associated indexes, connectors, hardware, and support) with a common information management platform, also uses the content classification engine. “It helps you plug into multiple repositories and navigate access to their content, otherwise for every search query you’ll have a lot of results,” he says. The classification engine automatically and dynamically groups content based on patterns discovered in result sets, such as extracted entities (bringing together all documents from connected repositories that mention a certain individual’s name), document size or other parameters. “That helps people to do something meaningful with content from a search-based paradigm,” he says.
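The grouping idea can be sketched in a few lines. The example below assumes entity extraction has already run and attached a list of person names to each search hit; the field names, repository names and sample data are all hypothetical, and the sketch says nothing about how InfoFusion itself is built.

```python
from collections import defaultdict


def group_by_entity(results):
    """Group search hits from any repository by each person name extracted from them."""
    groups = defaultdict(list)
    for hit in results:
        for name in hit["people"]:
            groups[name].append(hit["id"])
    return dict(groups)


# Hypothetical hits returned from three connected repositories.
results = [
    {"id": "crm-17", "repo": "CRM", "people": ["Ana Ruiz"]},
    {"id": "mail-04", "repo": "Email", "people": ["Ana Ruiz", "Tom Lee"]},
    {"id": "dms-92", "repo": "Content Suite", "people": ["Tom Lee"]},
]

grouped = group_by_entity(results)
```

A hit that mentions several people appears in each of their groups, so one long result list becomes a handful of per-person clusters that span repositories.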

Expect to see more specific product announcements around InfoFusion later this year. Ptacek says clients are showing a lot of interest in initiatives such as voice of the customer – they want, for example, to use the technology to apply sentiment metrics to content and commentary pulled in from many different online sources. “If we aggregate all that together, they ask can we apply the engine to measure the voice of the customer, and the answer is an absolute yes,” says Ptacek.