Data Risk in 2018: Data Management and Data Integrity Necessities

By John Felahi  /  February 6, 2018

Once again, businesses sat up and took notice of a number of massive consumer information leaks over the past year. The data breach headlines of 2017 were led by Yahoo, Uber, Arby’s, and Equifax, to name just a few.

These events exposed various levels of personally identifiable information (PII) belonging to millions of consumers, along with analytics-backed details of their behaviors and interests, such as book buying habits, financial spending behaviors, and gourmet cooking interests. Fraudsters typically have to research this bonus information themselves; having it handed to them ups their odds of winning the trust of their intended victims.

These events spotlight the obligations of corporations to protect consumers’ privacy and security – especially in the modern era of Big Data and analytics-driven businesses.

The cracks appear when businesses think of managing Big Data as a sprint instead of a marathon. Big Data moving through a modern enterprise has a long, complex journey from the moment it’s produced or acquired, through interim preparation or storage stops, to its final resting place for consumption by business users, analysts, and Data Scientists. Everywhere along this path Data Security, Data Governance, and enterprise-grade Data Management practices are essential.

Data Preparation: Where Data Comes Together – or Apart.

Stand-alone Data Preparation tools, or “Wranglers,” make up a crowded category that a number of organizations fall under. They provide a necessary function and a series of benefits for data, including Data Visualization, Predictive Analytics, Advanced Modeling, data transformations, aggregations, and more. These capabilities work well against data that is already clean, well-organized, and governed.

However, there are many steps in the data journey not addressed by Wranglers, and it is here that enterprise-for-scale providers, which manage data throughout its entire lifecycle, stand out. Why? Because every step matters.

Data Integrity Begins at Ingest

Attention to data details on ingest, through automated data validation and profiling, covers a spectrum of critical checkpoints that Wranglers take for granted, yet benefit from, with their last-mile toolset. Enterprise-for-scale providers check up front for data errors, incorrect formatting, and other idiosyncrasies common in mainframe and legacy Big Data sources.
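
To make the idea of validation and profiling on ingest concrete, here is a minimal Python sketch. It is an illustration only, not how any particular vendor's product works: the field names, the validation rules, and the sample records are all hypothetical. It checks each incoming record against simple format rules, separates clean rows from rejected ones, and keeps a per-field tally of what went wrong.

```python
import csv
import io
from collections import Counter
from datetime import datetime


def _is_int(value: str) -> bool:
    """Return True if the value parses as an integer."""
    try:
        int(value)
        return True
    except ValueError:
        return False


def _is_iso_date(value: str) -> bool:
    """Return True if the value is a date in YYYY-MM-DD form."""
    try:
        datetime.strptime(value, "%Y-%m-%d")
        return True
    except ValueError:
        return False


# Hypothetical schema (an assumption for this sketch): field -> format check.
SCHEMA = {
    "customer_id": _is_int,
    "signup_date": _is_iso_date,
    "email": lambda v: "@" in v and not v.startswith("@"),
}


def validate_and_profile(csv_text: str):
    """Split rows into accepted and rejected, and profile each field."""
    accepted, rejected = [], []
    profile = {name: Counter() for name in SCHEMA}
    for row in csv.DictReader(io.StringIO(csv_text)):
        bad_fields = []
        for name, is_valid in SCHEMA.items():
            value = (row.get(name) or "").strip()
            if not value:
                profile[name]["missing"] += 1
                bad_fields.append(name)
            elif not is_valid(value):
                profile[name]["malformed"] += 1
                bad_fields.append(name)
            else:
                profile[name]["ok"] += 1
        if bad_fields:
            rejected.append((row, bad_fields))
        else:
            accepted.append(row)
    return accepted, rejected, profile


# Made-up sample feed: one clean row, one bad date, one bad ID and no email.
SAMPLE = """customer_id,signup_date,email
1001,2017-03-14,ana@example.com
1002,14/03/2017,bob@example.com
abc,2017-05-01,
"""

accepted, rejected, profile = validate_and_profile(SAMPLE)
for row, bad_fields in rejected:
    print("rejected:", dict(row), "problem fields:", bad_fields)
```

Rejecting or quarantining bad rows at this point, rather than in a downstream tool, is the design choice that matters here: every later step in the data journey can then assume the formats it receives have already been checked.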

Further Data Preparation and data safeguarding need to continue where Wranglers do not: under a single consolidated catalog of all data, as described in this report by 451 Research. This is new territory for Wranglers because their tools were not built for cataloging data and Data Governance; they were built for data manipulation. The Wranglers have now realized this, which is why you see them attempting to extend their reach along the data path into ingestion, orchestration, preparation, governance, and exploration of data in a variety of modern and traditional repositories.

Enterprise-grade, automated Data Management has been proven to reduce blind spots in Data Security. And this can be achieved without compromising business agility and productivity, providing secure self-service access to data at every step in the journey.

Without doubt, breaches of this magnitude will happen again, and it’s not all on the shoulders of the Wranglers. Company data delivery teams, charged with empowering a growing army of Data Scientists and Business Analysts with expanded access to Big Data, need to expand their thinking and awareness to consider the whole data marathon, not just the final mile.


About the author

John Felahi is a product and strategy executive with more than 25 years’ experience in high-growth organizations. His tenure in high-growth technology companies makes him unique in his ability to analyze consumer preferences and needs, anticipate trends, and develop market-leading products and solutions. Prior to joining Podium, John founded JGF Strategies, a technology consulting firm focused primarily on cognitive computing, cloud transitions, information access, search, and machine learning-based solutions. He has served in executive and senior leadership positions at Microsoft, Fast Search & Transfer (now Microsoft), Sun Microsystems (now Oracle), and other companies, managing product strategy, product management, engineering, and marketing teams. John was a member of the honors program and holds a BA in Economics and Philosophy from Boston College.
