by Angela Guess
The Wall Street Journal recently reported on the value of mining Big Data across industry. Citing the McKinsey Global Institute, the article states, “The definition MGI uses for big data is deliberately vague and not based on a specific number. It refers instead to sets of data which are too large for current conventional database tools to capture, store, manage and analyze. It also takes account of the differences between sectors in the type of data and software available.”
The article goes on, “The ability to process this vast flow of data in real time will become a business imperative. In a traditional environment, information is gathered, put in a database, which is stored on a disc, and then indexed. Queries are then run against the database. But traditional databases are simply not up to the task of storing and handling the sheer quantity of data. It requires new types of databases able to span tens, hundreds, even thousands of servers. Yahoo, for example, runs a database cluster that spans 40,000 servers. Advanced processing power is necessary in an environment where decisions are made in nanoseconds. At the very least, databases have to be stored in memory, demanding massive increases in server power and storage.”
It continues, “At its most extreme is a process called ‘complex event processing’, which instead uses the flow of raw data and matches the query against it, looking for patterns. Typical uses of CEP include high-frequency financial trading, but other examples include sending a game player a special offer at exactly the right moment in a game, persuading them to make an in-game purchase. This is a point picked up by Kristian Segerstrale, co-founder of social games company Playfish, who says: ‘Getting the information is important, but even more important will be the ability to react in real time to data and structure experiments to learn empirically from user behavior.’”
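The article does not name a particular CEP engine. As a rough illustration only, the Python sketch below shows the core idea of matching a standing query against events as they arrive, instead of storing them in a database first and querying later. It is loosely modeled on the in-game offer example. The event format, the `level_failed` event type, the "three failures within 60 seconds" rule and the function name `detect_offer_moments` are all hypothetical. Production CEP systems use dedicated engines and query languages.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class Event:
    player: str
    kind: str          # hypothetical event types, e.g. "level_failed", "level_passed"
    timestamp: float   # seconds since the session started


def detect_offer_moments(
    events: Iterable[Event], failures: int = 3, window: float = 60.0
) -> Iterator[tuple[str, float]]:
    """Hypothetical continuous query: yield (player, time) whenever a player
    fails `failures` times within `window` seconds.

    Events are consumed one at a time as they arrive. Only a small sliding
    buffer of recent failure times is kept per player, not the full stream.
    """
    recent: dict[str, deque] = {}
    for ev in events:
        if ev.kind != "level_failed":
            continue
        buf = recent.setdefault(ev.player, deque())
        buf.append(ev.timestamp)
        # Drop failures that have fallen out of the sliding time window.
        while buf and ev.timestamp - buf[0] > window:
            buf.popleft()
        if len(buf) >= failures:
            # Pattern matched: this is the "right moment" to send an offer.
            yield ev.player, ev.timestamp
            buf.clear()  # fire once per burst, then start counting again


if __name__ == "__main__":
    stream = [
        Event("alice", "level_failed", 0),
        Event("bob", "level_failed", 5),
        Event("alice", "level_failed", 20),
        Event("alice", "level_passed", 25),
        Event("alice", "level_failed", 40),
        Event("bob", "level_failed", 100),
    ]
    print(list(detect_offer_moments(stream)))  # [('alice', 40)]
```

Because the function is a generator, it can sit on an unbounded feed and react the moment a pattern completes. Bob's two failures are 95 seconds apart, so the older one drops out of the window and no offer fires for him.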