A New Take on a Data Science Platform for the Unstructured Data World

by Jennifer Zaino

There is a new Data Science system for unstructured data, designed to help businesses uncover insight to drive performance and growth. Sensai co-founder and CEO Jonas Lamis would agree that that’s probably not the first time you’ve heard that phrase used by a vendor. But this time, he says, with the Sensai platform, things really are different.

Sensai co-founder and CTO Monica Anderson, whose resume highlights specialization in Artificial Intelligence and Cognitive Science, designed the solution to incorporate its new language, dubbed Content Description Language (CDL), atop an in-memory platform. CDL is a linguistic data flow language that users can leverage to investigate unstructured data in hundreds of data sources through precision faceted search, regular expressions, and Machine Learning algorithms in a single query, and from there obtain precise, structured analysis results. Data sets around which to build investigations can include proprietary company information, reference data, social data, or hundreds of millions of public documents, web feeds, and the like; Sensai itself provides reference data that users can draw on as part of the solution.

As Lamis sees it, many things help Sensai stand out from other Data Science platforms – the fact that it isn’t focused on uncovering results from a specific data type, such as Twitter data (a heavily commoditized market), and the fact that its multi-data source Data Science solution doesn’t require an army of consultants or millions of dollars to deploy. It’s meant to give the Global 2000 – or even the Global 10,000 – a more modestly priced entrance, in the $100,000 to $250,000 per year range. But beyond that, it is one that he says will “short cycle the process as an off-the-shelf solution that does 80 percent of what they need, and also lets them write their own investigations in CDL.”

Short cycling the Data Science investigation process is accomplished by Sensai building in many capabilities Data Scientists and related users will want so that they don’t have to create them or source them themselves. In addition to reference data, such as lists of public companies or words that can be associated with positive or negative sentiments, the platform provides pre-built investigations based on CDL (such as momentum and trends) and report templates. According to Lamis, it also wraps in the ability:

“To make it easy for a company to connect its internal data with other sets in Sensai, and it has the right collaborative sharing and data security in the middle of it, and the right visualizations on the backend.”

The company also continuously evaluates new data sources, reference lists, and Machine Learning algorithms to include in CDL. It makes what it deems to be the best new AI algorithms available to users, saving Data Scientists at companies the trouble of building similar algorithms themselves or digging through the increasing number of powerful algorithms flooding the market.

“With Sensai we evaluate those algorithms and release the most interesting ones into our language, so that our customers can turn them on and use them in the pipeline mechanism without having to do the heavy lifting and evaluation process,” he says. At some point Sensai may even build some of its own algorithms, but for now he sees its role as being “almost a marketplace for which ones make the most sense for the kinds of problems our customers are solving.”

For example, one algorithm it offers is a clustering algorithm that determines which documents returned in response to a query are most like each other, and then attempts to label each cluster based on words found within its documents. Data Scientists can help an analyst in the hedge fund market who follows Google, for example, by taking advantage of that algorithm within the context of a query to find documents related to the company and the concept of acquisitions. The analyst can then subscribe to that investigation so that it runs in the background all the time and sends alerts when something changes, like a growth spurt in content around the topic.

That, Lamis says, “might be the leading indicator for the kind of analysis you report on or stocks you trace, so you might want to investigate further.”
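The clustering idea described above can be illustrated with a small sketch. This is not Sensai's algorithm or CDL syntax, neither of which the article specifies; it assumes a simple greedy grouping by Jaccard word overlap, and the function names, stopword list, and threshold are all hypothetical choices made for illustration.

```python
from collections import Counter

# Hypothetical stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "the", "of", "to", "in", "and", "for", "on", "is"}


def tokenize(text):
    """Lowercase the text and return its set of non-stopword tokens."""
    words = (w.strip(".,;:!?\"'()").lower() for w in text.split())
    return {w for w in words if w and w not in STOPWORDS}


def jaccard(a, b):
    """Share of words two token sets have in common (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_documents(docs, threshold=0.25):
    """Greedily place each document in the most similar existing cluster,
    or start a new cluster if nothing reaches the (assumed) threshold."""
    token_sets = [tokenize(d) for d in docs]
    clusters = []  # each cluster is a list of document indices
    for i, tokens in enumerate(token_sets):
        best, best_sim = None, threshold
        for cluster in clusters:
            sim = max(jaccard(tokens, token_sets[j]) for j in cluster)
            if sim >= best_sim:
                best, best_sim = cluster, sim
        if best is None:
            clusters.append([i])
        else:
            best.append(i)
    return clusters, token_sets


def label_cluster(cluster, token_sets, top_n=2):
    """Label a cluster with the words that appear in most of its documents."""
    counts = Counter(w for i in cluster for w in token_sets[i])
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [w for w, _ in ranked[:top_n]]


docs = [
    "Google acquires robotics startup",
    "Google to acquire robotics company",
    "Quarterly earnings beat analyst estimates",
    "Earnings beat estimates this quarter",
]
clusters, token_sets = cluster_documents(docs)
labels = [label_cluster(c, token_sets) for c in clusters]
```

With these four invented headlines, the two acquisition stories fall into one cluster and the two earnings stories into another, each labeled by its most widely shared words. A production system would more likely use weighted term vectors and a proper clustering method, but the shape of the result is the same: groups of similar documents plus a short word-based label for each.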

Smoothing the Data Scientist’s Path

CDL was constructed with the idea that tech-savvy business analysts could be a market, as it aims to be usable by anyone comfortable with handling Microsoft Excel macros. But among its early customers, it tends to be the Data Science team that gets on board first. They’re the ones chartered with building ways to gain insight from unstructured data, which can be slow and costly, or finding the technology and algorithms to best support their user communities, Lamis says.

Providing so many pieces of the puzzle they need to accomplish this in an off-the-shelf fashion is valuable, he explains – especially when you consider that there are more Data Science jobs open now, and will be for the next five years, than there are highly trained practitioners available. “All that has been driving slightly less skilled people into the business,” he says, and while salaries for Data Scientists are going up, project lead times stretch as these workers try to grow their expertise and business frustration grows.

“A platform like ours is the natural evolution of where the market needs to go – make things more out-of-the-box, available to a slightly less-skilled Data Scientist. Productize the mystery data scientists have been brewing,” he says.

And, with 80 percent of their issues addressed, they can focus on the more strategic 20 percent of Data Science considerations – being able to allocate more time to deciding what data sets are going to be interesting to analyze, to finding and conversing with the business experts who can explain important questions to ask, to exploring important patterns that result from queries. “They can take a more strategic role as a Data Scientist vs. having to code everything and applying different modules of open source technology to do it,” Lamis says.

He adds that the product also can help smaller companies address the issues they have in competing with big names to hire top Data Science talent. Many times, they’re just not going to win. But many of these companies do have on staff very technically literate analysts who really understand their business – and those individuals are strong contenders to become users of the product a couple of years down the road, he believes.

“Hopefully, as our roadmap and product matures, we most likely will deliver more powerful analytics tools directly to the end analysts, and maybe even have more visual ways to create investigations.”

Sensai in Use Today

Initial adopters of Sensai include names in the financial services and industrials sectors, including engineering firm Siemens, Swiss financial services company UBS, and asset management firm WorldQuant.

At Siemens, the product was deployed in its Munich data center with an initial focus on compliance and audit use cases. In the last decade or so, Lamis reports, the company has aggregated millions of documents related to thousands of audits it conducted worldwide. That’s terabytes of internal content, with final reports living in a legacy repository. “They were not searchable or analyzable and there was no way to get structured information out of them,” he says. The data was moved into Sensai, and Sensai’s consultants trained Siemens’ Data Science team to create investigations using CDL. At a basic level, for example, Siemens now can leverage faceted search to look for insight about auditing and compliance efforts across divisions, geographies, or projects.

“The business value exists by finding specific statements with documents – like commitments made to improve operations at a division. Without a solution like Sensai, identifying and quantifying those commitments is like finding a needle in a haystack,” he says.

At a large investment bank that Sensai is working with, the focus is on proprietary data it’s licensed from a feed of content about public activities on the Web, such as chatter about companies. Its desire, he says, is to apply “quantamental analysis” to this data. “Financial services companies see a competitive advantage if they can apply quant-type strategies to fundamental data, but in the past they’ve had no tools to do it,” he says. With Sensai, for example, these companies can find specific patterns in such data sources that indicate a business is about to announce a new product, or an executive is being promoted, or some other early signal that competitors haven’t yet tracked, never mind acted upon. All of those things can be counted, reported on, and trended in Sensai, he says, and an overall momentum indicator can compare activity over the last 24 hours with the average of the previous 14 days.
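The momentum comparison Lamis describes, the last 24 hours against the average of the previous 14 days, can be worked through in a few lines. The sketch below is an assumption about how such an indicator might be computed, not Sensai's actual formula: it treats momentum as the ratio of mentions in the last 24 hours to the average daily mentions over the 14 days before that, and the function name and parameters are hypothetical.

```python
from datetime import datetime, timedelta


def momentum(timestamps, now, baseline_days=14):
    """Ratio of mentions in the last 24 hours to the average daily
    mentions over the preceding `baseline_days` days.

    Returns None when the baseline period has no mentions, since the
    ratio is undefined in that case.
    """
    window_start = now - timedelta(hours=24)
    baseline_start = window_start - timedelta(days=baseline_days)

    recent = sum(1 for t in timestamps if window_start <= t < now)
    baseline = sum(1 for t in timestamps if baseline_start <= t < window_start)

    baseline_daily_avg = baseline / baseline_days
    if baseline_daily_avg == 0:
        return None
    return recent / baseline_daily_avg


# Invented example data: two mentions per day for 14 days,
# then six mentions in the most recent 24 hours.
now = datetime(2015, 1, 15)
history = [now - timedelta(days=d, hours=6) for d in range(1, 15) for _ in range(2)]
latest = [now - timedelta(hours=h) for h in range(1, 7)]

score = momentum(history + latest, now)
```

In this made-up example the baseline averages two mentions per day and the last 24 hours contain six, so the score is 3.0, meaning activity is running at three times its recent norm. A value near 1.0 would indicate business as usual.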

Sensai has raised $900,000 in seed funding from investors including Andreessen Horowitz, Formation8, and others. The solution can be deployed on premises or as a Cloud service.
