
Data Governance vs. Big Data Governance

February 20, 2017


Introducing the Big Data & Brews video blog series, presented by Stefan Groschupf, founder of Datameer. The series covers hot topics in the business of Big Data, Analytics, Internet of Things, Machine Learning, Cloud Computing, Modern BI, NoSQL and Next Generation Technologies.

In today’s video blog, Stefan Groschupf discusses the difference between Data Governance and Big Data Governance. The reality is that new technologies in the Hadoop ecosystem are not yet mature enough to support strong Data Governance, which is critical if Hadoop is to become a centerpiece of the enterprise.

About the author

Stefan Groschupf, founder of Datameer, is a big data veteran and serial entrepreneur with strong roots in the open source community. He co-founded Datameer in 2009 after several years of architecting and implementing distributed big data analytics systems for companies such as Apple, Hoffmann-La Roche and the European Union. Stefan served as Datameer’s CEO until 2016, leading the company through a period of triple-digit growth. He was one of the very few early contributors to Nutch, the open source project from which Hadoop was spun out; Hadoop is expected to be a $50 billion business by 2020. Technologies designed and coded by Stefan run at all of the Fortune 20 companies, and innovative open source technologies such as Kafka, Storm, Katta and Spark all rely on technology Stefan designed a decade ago. Stefan is a frequent conference speaker and a contributor to industry publications and books, holds patents, and advises startups on scaling go-to-market strategies and product development.

  • George Firican

    I also think that with big data governance you now need to define and measure not only accuracy, completeness, validity, consistency and integrity, as with traditional structured data, but also: timeliness (Does the data arrive on time? Does it meet its refresh schedule? What is the time interval from collection to processing?); readability (are the content and format easy to understand?); authorization (do you have the right to use the data?); structure (can unstructured data be transformed into structured data with the available technology?); and, of course, credibility and context.

    • Mary Y

      Something else to consider is data retention. How long should a given piece of data be kept? Does everything that is created need to be retained forever?

  • In an ideal world, governance for big data and for other data would be exactly the same. Too often, however, the two are treated as if they were different. Modernizing the technologies used for governance could help narrow that divide.
