by Ian Rowlands
I’m wrestling with what started out as a simple question, but one that grows more complex every day as I talk to people about it. As the Big Data phenomenon increasingly shapes the way Information Technology supports business, what does that mean for Data Governance?
I thought I had this clear in my mind, and was getting some kind of consensus from the people I was talking to. Recently, however, two themes have started to emerge that are, at the very least, making me think harder about it. I thought I’d share the issues and see what the Dataversity community might have to say about them.
Where I started was with the notion that, once the dust has settled, Data Governance would still be a single discipline, with its fundamentals not much disturbed by the arrival of the fascinating new classes of data parked under the “Big Data” umbrella. (Actually, “single discipline” might better be expressed as “meta-discipline”, incorporating data stewardship, issue management, lifecycle management, privacy and security management, metadata management, master data management, data quality management, data integration, and business process management.)
The first theme to disrupt this comfortable view is the discovery that a lot of the Big Data specialists I’ve been talking to couldn’t care less about Data Governance. The argument seems to run along the lines of “we trust the algorithms that we run the data through, and so we trust the conclusions – so who needs Governance?” The counter-proposition, of course, is that it will be critical to know where the data comes from and how authentic it is, how it is going to be used, which decisions it drives, and exactly what data was used to drive each decision. But what do you think? Does Big Data not need Governance? Will “data” increasingly be processed with “Big Data” technologies, and will the convergence of these two issues eliminate Data Governance altogether?
The second theme is the recognition that “Big Data” is not just bigger than “data”, but qualitatively different from it. Big Data is stored in its primal state, uncleansed and untransformed. The notion of “data quality” meaning “data accuracy” shifts more explicitly to a notion of “data fitness for purpose”. That means that data quality is much more multi-dimensional than it used to be. (Actually, a better analogy might be the difference between scalar and vector quantities.) Perhaps that has a beneficial impact on Data Governance, pushing it towards being more relevant to business users, and less of a technical ghetto?
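To make the scalar-versus-vector idea a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption made up for illustration: the four quality dimensions, the scores, the two "purposes" and their thresholds, and the function names. It does not come from any governance tool or standard. It only shows how the same data set can be fit for one purpose and unfit for another, a distinction that a single quality number hides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QualityProfile:
    """Hypothetical quality 'vector': one score (0.0 to 1.0) per dimension."""
    accuracy: float
    completeness: float
    timeliness: float
    provenance: float


def scalar_score(profile: QualityProfile) -> float:
    """Collapse the profile into one number (the old 'scalar' view)."""
    values = (profile.accuracy, profile.completeness,
              profile.timeliness, profile.provenance)
    return sum(values) / len(values)


def fit_for_purpose(profile: QualityProfile, requirements: dict) -> bool:
    """True if every dimension a purpose cares about meets its minimum."""
    return all(getattr(profile, dimension) >= minimum
               for dimension, minimum in requirements.items())


# Illustrative numbers only: a raw, uncleansed clickstream feed.
clickstream = QualityProfile(accuracy=0.70, completeness=0.95,
                             timeliness=0.99, provenance=0.60)

# Two hypothetical purposes, each with its own minimum per dimension.
trend_analysis = {"completeness": 0.90, "timeliness": 0.90}
regulatory_reporting = {"accuracy": 0.99, "provenance": 0.95}

print(fit_for_purpose(clickstream, trend_analysis))        # True
print(fit_for_purpose(clickstream, regulatory_reporting))  # False
```

In this sketch the averaged score for the feed is the same whichever purpose you have in mind, while the per-dimension check gives different answers for trend analysis and for regulatory reporting. That is the sense in which "fitness for purpose" behaves more like a vector than a scalar.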
To be clear, I don’t buy the death of Governance, and I think implementing Big Data solutions without Governance is likely to lead to trouble. And none of this changes my (admittedly self-interested) perspective that Data Governance is impossible without metadata management. But what do you think?