by Sunil Soares
I had the privilege of participating in a panel on Big Data Governance. The other participants were Maria Villar, Global Vice President, Data Governance and Management at SAP, and Eric Chacon, Global Head of Data Standards, Chief Data Office at Citi. Tony Shaw from Dataversity did an excellent job moderating the panel. I have just published my book on Big Data Governance, so this panel was timely.
I will summarize a few of the topics that were discussed on the panel:
- Definition of Big Data
There seemed to be a consensus that the “Big” in “Big Data” is misleading, because Big Data is characterized by velocity and variety as well as volume. There was some discussion about less traditional data types such as clickstream data, social media, smart meter readings, content analytics and telematics.
- Timing of Big Data initiatives
Most organizations tend to believe that Big Data is an initiative that will be addressed in the “out years.” However, on further examination, most organizations already deal with Big Data today.
Technologies like NoSQL and Hadoop have made it cost-effective for organizations to analyze large datasets.
There was some discussion about the privacy impact of new datasets such as utility smart meter readings and telecommunications GPS data.
- Role of Data Governance
Data Governance Councils will need to assume responsibility for governing Big and Small data in a combined manner. Data Stewards will need to become well versed in the data quality, metadata and privacy issues relating to new data types like social media. The integration of MDM and social media will raise new challenges relating to entity matching and privacy.
Big Data Governance is an emerging sub-discipline within the broader discipline of Data Governance. Many of the questions have not yet been answered, but they make data governance programs even more relevant today.