by Sue Geuens
We all know that data has been “big” for many years, so why is there now such a concerted effort and discussion around it? I believe it is because organisations are beginning to realize that they hold information about almost everything they do – information that could give them better insight into how to do things more effectively, improve their chances of a bigger profit, and deliver the competitive advantage we all talk about.
That being said, it’s all very well to keep storing the data – but what now? The price of hardware and storage has dropped substantially over the years, and the capability of storage to manage huge data sets has improved drastically – BUT – the tools to analyze and extract meaningful information from this multi-level department store of data are just not there. Yes, we have the traditional analytical tools, BUT in many instances they are based on using smallish data sets that we already know and understand (to a certain extent), adjusting the data to fit our idea of quality, and then extracting the information into a valuable format and displaying it in some kind of dashboard.
Big Data, by its very nature, is not likely to be as well organized as the relational or hierarchical data we are all used to and fond of. We are going to have to start thinking differently. Firstly, we are going to need to stop thinking that we can “fix” this data – improve its quality. That would be like trying to clean the Challenger space shuttle with a toothbrush: there is just too much to clean.
What we are going to have to do is analyze these large data sets in ways that ensure errors and impurities don’t affect the result in a meaningful way. We are going to have to move away from expecting perfection and towards allowing assumptions. We are also going to have to do more results-based thinking rather than the current design thinking. I am not saying we can’t and won’t design – but we will need to design from the end backwards rather than from the beginning forwards. What are we looking to see as the outcome? Knowing this will allow us to mine the data for that information. It is going to be impossible to analyze such enormous data sets just to “see what is there” – our current traditional approach. We are going to have to assume that the information we want to obtain is there. We are also going to have to assume it’s fairly accurate, and that any errors, omissions or anomalies are not enough to skew the outcome significantly.
So why are we not doing this already? Why all the talk but not much action? I believe we don’t quite yet have the tools we need to turn this data into information. I also believe that we haven’t yet got our minds around what exactly we need to get out of it. We are still in the process of beginning to manage our data in the way the DM profession recommends: governance, policies, controls, standardization. I believe we may have to throw some of that rulebook away for Big Data. We may have to rethink our preconceived idea that we can control this beast. We may have to let the beast reveal itself fully before we find ways and means to work together and create something useful, trusted and valuable. I am hoping the journey will end up being more along the lines of the Transformers with Optimus Prime than Godzilla or King Kong.