Big Money and Big Rich Data: Why Wall Street Should Approach NoSQL Carefully

by Robert Greene

Wall Street is getting all sentimental on us. Not in the sense that it is suddenly pining for the past. What I mean is that new trading strategies that look beyond quantitative ticker data to incorporate information from sources like RSS feeds, breaking news, and “unstructured” Twitter data are increasingly the Wall Street norm.

But these new strategies quickly run headlong into some serious high-performance computing challenges, which is exactly why Wall Street is taking a shine to NoSQL.

Essentially, what financial institutions are trying to do is add elements of sentiment analysis to existing trading strategies and algorithms. The thought is that if they can correlate information from things like news feeds with current data about the market value of a position or asset, they can create a “network of networks” data model that lets them see how changes in one system affect another. For example, rising geopolitical concerns near the Strait of Hormuz can be correlated with the risk of holding oil-related assets, and investment strategies can be adjusted automatically in real time.
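
To make that concrete, here is a deliberately crude sketch of what such a correlation might look like in its simplest form. Everything in it (the keyword-based risk score, the function names, the numbers) is hypothetical, invented purely to show the shape of the idea, not how any real trading desk does it:

```python
# Hypothetical sketch: blending a headline-derived risk signal into a
# position-sizing decision. All names and thresholds are invented for
# illustration; this is not any real trading system's logic.

def geo_risk_score(headlines):
    """Crude risk proxy: fraction of recent headlines that mention the region."""
    keywords = ("strait of hormuz", "persian gulf", "tanker seized")
    hits = sum(any(k in h.lower() for k in keywords) for h in headlines)
    return hits / max(len(headlines), 1)

def adjusted_weight(base_weight, risk, max_cut=0.5):
    """Scale back an oil-related position as the headline risk rises."""
    return base_weight * (1.0 - min(risk, 1.0) * max_cut)

headlines = [
    "Tensions escalate near the Strait of Hormuz",
    "Tech earnings beat expectations",
]
# An 8% oil allocation gets trimmed as regional headlines pile up.
print(adjusted_weight(0.08, geo_risk_score(headlines)))
```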

Doing this is not trivial, which is why I’ve heard a lot lately – and will hear more tomorrow at the High Performance Computing on Wall Street event – about how it is changing the analytics and database technologies financial firms use to build such applications. Many attempts to construct these “network of networks” data models have tried to stuff the new information streams into the model and then rely on existing mathematical algorithms to make predictions. To put it bluntly, that doesn’t really work for Wall Street’s needs. Those older algorithms are based purely on sampling and statistical analysis of large amounts of quantitative data.

But taking the qualitative sentiment of whole populations, transforming it into something that can be measured quantitatively, and then correlating that with lots of financial data requires a completely new class of algorithms that don’t just work on sampled sets of data. It requires using the full sets and, because this is Wall Street after all, processing all that data in a meaningful timeframe (read: near-real time).
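
As a toy illustration of what “quantifying sentiment over the full set” means, consider a running polarity score computed over every message in a stream rather than a statistical sample. The word lists and scoring rule below are placeholders (real systems use far richer language models); the point is simply that the signal updates as each message arrives rather than waiting for a batch or a sample:

```python
# Toy illustration: turn qualitative text into a quantitative signal over the
# full stream, updating after every message (no sampling). The lexicons and
# scoring rule are placeholders, not a real sentiment model.

POSITIVE = {"beat", "surge", "rally", "upgrade"}
NEGATIVE = {"miss", "plunge", "downgrade", "default"}

def polarity(message):
    """Net count of positive minus negative words in one message."""
    words = message.lower().split()
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

def stream_sentiment(messages):
    """Running average polarity across every message seen so far."""
    total = count = 0
    for m in messages:
        total += polarity(m)
        count += 1
        yield total / count  # refreshed per message, i.e. near-real time

for score in stream_sentiment(["Oil futures surge on supply fears",
                               "Refiner shares plunge after an earnings miss"]):
    print(score)
```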

This is, of course, the type of situation where, as I’ve mentioned before, NoSQL databases start to look extremely appealing versus more traditional information management systems like relational databases. But a word of warning is still in order for larger enterprises, especially those on Wall Street, that need to think not just about large volumes of simple data and raw speed, but also about very rich data models and properties like concurrency.

In the last five years or so, a great deal has been learned about how these “network of networks” data models actually behave. In particular, analysis of things like Google’s web-spidering algorithms has shown that the structure is almost exactly the same for an extremely high percentage of all networks. Armed with that structural knowledge, when an analytic question is asked, a large share of the data in these networks can be identified immediately as irrelevant to the analysis and ignored, which is how you get fast answers to the kinds of questions Wall Street wants to ask (a toy sketch of the idea appears at the end of this post). That creates a new set of programming and information management requirements, and it makes many first-generation NoSQL technologies unsuitable for these advanced, complex analytics applications in an industry where speed and accuracy are of equal, and intense, significance. Wall Street should be discerning in how it approaches NoSQL. Stay tuned for feedback on how NoSQL is received at the event.
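
For the technically curious, here is the promised sketch of that pruning idea: start from the entities an analytic question actually touches, expand only a few hops outward through the network, and let everything else stay out of the computation. The graph, entity names, and hop limit are all invented for illustration and are not drawn from any particular product or algorithm:

```python
# Hypothetical sketch of query-time pruning in a "network of networks":
# only the neighborhood reachable from the question's seed entities is
# considered; unrelated parts of the graph never enter the analysis.
from collections import deque

# A made-up correlation graph linking regions, commodities, and sectors.
graph = {
    "strait_of_hormuz": ["crude_oil"],
    "crude_oil": ["oil_majors", "airlines"],
    "oil_majors": [],
    "airlines": [],
    "tech_sector": ["semiconductors"],  # unrelated to this question
    "semiconductors": [],
}

def relevant_subgraph(graph, seeds, max_hops=2):
    """Breadth-first expansion from the seed entities, capped at max_hops."""
    seen = set(seeds)
    frontier = deque((s, 0) for s in seeds)
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return seen

# Asking about Hormuz-related risk never touches the tech nodes at all.
print(relevant_subgraph(graph, ["strait_of_hormuz"]))
```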
