Many operational big data applications have predictive analytics at their core. So, given how much of the business may be riding on a predictive analytics infrastructure, business professionals have to ask: how stable is this infrastructure under the chaotic, complex, dynamic conditions that I find in many operational environments?
Predictive models are usually built around specific scenarios with well-defined dependent and independent variables in expected distributions. Predictive models thrive on linear relationships among variables – the sort that can be most effectively defined using regression modeling. But what happens when the underlying reality being modeled becomes non-linear – when seemingly inconsequential new events, neither expected nor modeled explicitly, render formerly powerful predictive models impotent?
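To make that concrete, here is a minimal sketch (plain NumPy, with purely illustrative numbers) of the failure mode: a linear regression fitted on a well-behaved regime keeps extrapolating a straight line after the underlying relationship has turned non-linear.

```python
import numpy as np

rng = np.random.default_rng(0)

# Regime the model was trained on: an approximately linear relationship.
x_train = rng.uniform(0, 1, 200)
y_train = 2.0 * x_train + rng.normal(0, 0.05, 200)

# Fit a simple linear regression model.
slope, intercept = np.polyfit(x_train, y_train, 1)

# The underlying reality shifts into a non-linear regime the model never saw:
# a quadratic term kicks in beyond the training range.
x_new = rng.uniform(1, 3, 200)
y_new = 2.0 * x_new + 4.0 * (x_new - 1) ** 2

pred = slope * x_new + intercept
rmse = np.sqrt(np.mean((pred - y_new) ** 2))
print(f"out-of-regime RMSE: {rmse:.2f}")  # far larger than the ~0.05 in-regime noise
```

The model is not "wrong" in its original regime; it simply has no representation of the new non-linear term, so its error grows without any warning signal from the model itself.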
That’s the famed “butterfly effect” of chaos theory. It’s technically defined as “the sensitive dependence on initial conditions, where a small change at one place in a deterministic nonlinear system can result in large differences to a later state.” There are many mathematical techniques for modeling non-linear relationships, but, given that these are highly specialized and often unfamiliar to business-oriented data scientists, you probably haven’t incorporated any of them into the sorts of predictive models that drive your big data applications.
The next frontier in operational big data applications is the Internet of Things (IoT), which is likely to become a deepening vortex of butterfly effects waiting to happen. That's because non-linear effects are likely to be far more prevalent in IoT environments – such as smart grids and real-time distributed process monitoring – than in traditional B2C-oriented big data applications. The chief causes of these effects will be the continued expansion in new IoT endpoints and the growth in those endpoints' generation and consumption of a wider range of messages under a broader range of operational scenarios. If nothing else, the sheer combinatorial explosion in IoT interaction patterns is a recipe for chaotic traffic loads.
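The combinatorial point can be made with a back-of-the-envelope calculation: even counting only the possible pairwise device-to-device links – ignoring multi-device interaction patterns entirely – the count grows quadratically with the number of endpoints.

```python
from math import comb

# Possible pairwise interaction links among n endpoints: n * (n - 1) / 2.
for n in (10, 100, 10_000, 1_000_000):
    pairs = comb(n, 2)
    print(f"{n:>9} endpoints -> {pairs:>15,} pairwise links")
```

A million endpoints yields roughly half a trillion potential pairwise links; counting larger co-active subsets of devices pushes the number of interaction patterns into exponential territory.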
Think about it. Every new sensor, gadget, system, and other device that enters the IoT becomes yet another butterfly, and every new piece of data it emits or action it takes becomes another flapping of the butterfly’s wings. Throughout the world, as more of these butterflies come online, produce and consume more data, and cavort in countless combinations in every possible circumstance, the non-linear effects are almost certain to intensify. How can we do effective predictive analysis under those conditions?
These thoughts came to me as I read a recent article by Geoffrey West. He focused on the accelerating complexity of distributed big data systems and called for a “big theory” to encompass it all and enable better prediction of complex behaviors. While reading this article, though, it occurred to me that we already have such a theoretical framework: the chaos and complexity theories developed by IBM’s Benoit Mandelbrot and others. It seems to me that we can’t truly harness and control the coming global IoT if we don’t revisit these theories with a renewed emphasis on prediction under chaotic conditions.
The vision of planet-wide optimization depends on keeping the butterfly effect under control in operational IoT clouds. But how will that be possible?
One key approach will be to deploy federated IoT clouds. In this scenario, which I expect to see first in autonomic smart-grid applications in energy and utilities, each cloud is a distinct domain (business unit, region, application, etc.) with its own dedicated predictive infrastructure that ensures continuous, closed-loop local optimization. In addition, the IoT cloud domains would be loosely coupled with each other, lessening the likelihood that anomalous non-linear events (i.e., "butterfly effects") in one or more of them will trigger chain reactions that cascade across them all.
Potentially, big data could supply the solution side of this vision in the form of an event management bus shared by the federated IoT clouds. Non-linear predictive models and associated rules that leverage the pooled real-time event data on this shared bus could act as shock absorbers that prevent the butterflies from running riot globally.
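As a purely illustrative sketch – every class, parameter, and threshold below is hypothetical, not an existing product or API – one such shock-absorber rule could be a rolling-baseline filter on the shared bus: events that deviate sharply from the recent baseline are quarantined in their local domain instead of being propagated to all federated clouds.

```python
from collections import deque

class ShockAbsorberBus:
    """Hypothetical event bus that holds back anomalous events
    rather than forwarding them across federated IoT domains."""

    def __init__(self, window=20, threshold=3.0):
        self.history = deque(maxlen=window)  # rolling baseline of recent values
        self.threshold = threshold           # z-score cutoff for quarantine

    def publish(self, value):
        """Return True if the event is forwarded across domains."""
        if len(self.history) >= 5:
            mean = sum(self.history) / len(self.history)
            var = sum((v - mean) ** 2 for v in self.history) / len(self.history)
            std = var ** 0.5 or 1e-9
            if abs(value - mean) / std > self.threshold:
                return False  # quarantine: handle locally, don't cascade
        self.history.append(value)
        return True

bus = ShockAbsorberBus()
for v in [10, 11, 9, 10, 12, 10, 11, 500]:
    forwarded = bus.publish(v)
print(forwarded)  # False: the 500 spike is held back
```

A real deployment would of course need far more sophisticated non-linear models than a z-score rule, but the design point is the same: damping anomalies at the federation boundary so a local butterfly stays local.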
What do you think?