Data Risk in 2018: Data Management and Data Integrity Necessities


Click to learn more about author John Felahi.

Once again, businesses sat up and took notice of a number of massive consumer information leaks over the past year. The data breach headlines of 2017 were led by Yahoo, Uber, Arby’s, and Equifax, to name just a few.

These events exposed various levels of personally identifiable information (PII) for millions of consumers, along with their analytics-derived behaviors and interests, such as book-buying habits, financial spending behaviors, and gourmet cooking interests. Fraudsters typically have to research this bonus information themselves; obtaining it in a breach improves their odds of winning the trust of their intended victims.

These events spotlight the obligations of corporations to protect consumers’ privacy and security – especially in the modern era of Big Data and analytics-driven businesses.

The cracks appear when businesses think of managing Big Data as a sprint instead of a marathon. Big Data moving through a modern enterprise has a long, complex journey from the moment it’s produced or acquired, through interim preparation or storage stops, to its final resting place for consumption by business users, analysts, and Data Scientists. Everywhere along this path, Data Security, Data Governance, and enterprise-grade Data Management practices are essential.

Data Preparation: Where Data Comes Together – or Apart.

Stand-alone Data Preparation tools, or “Wranglers,” make up a crowded category that a number of organizations fall under. They provide a necessary function and a series of benefits for data, including Data Visualization, Predictive Analytics, Advanced Modeling, data transformations, aggregations, and more. These capabilities work well against data that is already clean, well-organized, and governed.

However, there are many steps in the data journey not addressed by Wranglers, and it is here that enterprise-for-scale providers, which manage data throughout its entire lifecycle, stand out. Why? Because every step matters.

Data Integrity Begins at Ingest

Attention to data details on ingest, through automated data validation and profiling, covers a spectrum of critical checkpoints that Wranglers take for granted, yet benefit from, with their last-mile toolset. Enterprise-for-scale providers check up front for data errors, incorrect formatting, and other idiosyncrasies common in mainframe and legacy Big Data sources.
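To make the idea of validation and profiling at ingest a little more concrete, here is a minimal Python sketch. It is an illustration only, not any vendor's product or method: the column names, the `SCHEMA` mapping, and the `validate_and_profile` function are all hypothetical, and a real ingest pipeline would check far more than three fields.

```python
import csv
import io
from collections import Counter
from datetime import datetime

# Hypothetical schema (an assumption for this sketch):
# column name -> parser that raises ValueError on badly formatted input.
SCHEMA = {
    "customer_id": int,
    "signup_date": lambda s: datetime.strptime(s, "%Y-%m-%d"),
    "email": str,
}


def validate_and_profile(csv_text):
    """Split rows into accepted and rejected, and count valid, invalid,
    and missing values per column as a simple profile."""
    accepted, rejected = [], []
    profile = {col: Counter() for col in SCHEMA}
    for row in csv.DictReader(io.StringIO(csv_text)):
        errors = []
        for col, parse in SCHEMA.items():
            raw = (row.get(col) or "").strip()
            if not raw:
                profile[col]["null"] += 1
                errors.append(f"{col}: missing")
                continue
            try:
                parse(raw)
                profile[col]["valid"] += 1
            except ValueError:
                profile[col]["invalid"] += 1
                errors.append(f"{col}: bad format {raw!r}")
        if errors:
            rejected.append((row, errors))  # keep the reasons for review
        else:
            accepted.append(row)
    return accepted, rejected, profile


# Made-up sample data: one clean row, one badly formatted row, one with a gap.
SAMPLE = """customer_id,signup_date,email
101,2017-03-04,a@example.com
abc,2017-13-40,b@example.com
103,,c@example.com
"""

accepted, rejected, profile = validate_and_profile(SAMPLE)
for row, errors in rejected:
    print(row["email"], errors)
```

The point of the sketch is the placement of the check rather than its sophistication: bad rows are caught and explained before they reach any downstream preparation tool, and the per-column counts give a first profile of the source.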

And further Data Preparation and data safeguarding need to continue where Wranglers do not, under a single consolidated catalog of all data as described in this report by 451 Research. This is new territory for Wranglers because their tools were not built for cataloging data and Data Governance; they were built for data manipulation. The Wranglers are now realizing this, which is why you see them attempting to move upstream from the last mile of the data path into ingestion, orchestration, preparation, governance, and exploration of data in a variety of modern and traditional repositories.

Enterprise-grade, automated Data Management has been proven to reduce blind spots in Data Security. And this can be achieved without compromising business agility and productivity, by providing secure self-service access to data at every step in the journey.

Without doubt, breaches of this magnitude will happen again, and the responsibility is not all on the shoulders of the Wranglers. Company data delivery teams, charged with empowering a growing army of Data Scientists and Business Analysts with expanded access to Big Data, need to expand their thinking and awareness to consider the whole data marathon, not just the final mile.


Photo Credit: Podium Data

