Data Governance and Data Quality have been around for quite a long time, but there has recently been a renewed focus on these essential Data Management practices. In a recent DATAVERSITY® interview, Harald Smith, Director of Product Management at Syncsort, gave his perspective on this resurgence and where the future is heading for Data Governance and Data Quality.
According to their website, “Syncsort is a leader in solutions for Big Iron to Big Data.” Syncsort’s focus was on high-performance sorting on mainframe computers, said Smith:
“Evolving into high performance movement and transformation of data, and sorting is a key part of that. But really, looking at that larger issue of ‘how do I deal with moving data efficiently?’”
A little over a year ago, Syncsort acquired Trillium Software, which, according to Smith, has now given Syncsort “a very broad Data Management portfolio,” and the acquisition of core legacy data has emerged as a central theme. Trillium’s origins were specifically in the field of Data Quality, with a focus on addressing the challenges of cleaning, standardizing, and de-duplicating core data.
With the acquisition and newly expanded portfolio, “We’re now providing products to about 6,000 organizations globally. We’ve been very prominent in financial services, but also in areas such as retail and hospitality industries, where there’s a lot of focus” on finding insight from customer data, Smith said.
The Revival of Data Quality and Data Governance
Smith sees two recent primary drivers that have brought renewed interest into these core Data Management practices: regulatory compliance and a desire to compete better in the marketplace. Financial services have been dealing with regulatory compliance for much of the last decade since the financial crisis, but Data Privacy regulations in Europe – specifically, GDPR implementation – are having a significant impact throughout all industries, he said.
Although Smith doesn’t see a trend toward similar regulations in the US in 2018, “Any organization that’s dealing with things at a global level they have to address it. You don’t really want to be in a position where it’s just reactive,” he said. There are tools that can identify, monitor, and, if requested, delete types of information on demand, even if it’s in “places you’re not necessarily expecting it to be,” he remarked.
“You want that understanding of the data. So that’s certainly a strong driver for you to put in place tools that help you understand your Big Data environment and, really, across your entire data landscape, and also be able to monitor that on an ongoing basis. You know, with rules that might help you identify where that customer data is. So that’s the obvious driver.”
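As a rough illustration of the kind of rule Smith describes, the sketch below scans records for values that look like customer data and reports which fields they turn up in. It is a minimal Python example, not a description of any Syncsort or Trillium product: the rule names, the patterns, and the sample records are all assumptions made up for illustration, and real tools use far more thorough detection.

```python
import re

# Hypothetical rules: each label maps to a pattern that suggests personal data.
# These simplified patterns are illustrative only and will miss many real cases.
RULES = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "uk_postcode": re.compile(r"\b[A-Z]{1,2}\d[A-Z\d]?\s?\d[A-Z]{2}\b"),
}


def find_personal_data(records):
    """Return a sorted list of (field, rule) pairs where any value matched a rule."""
    hits = set()
    for record in records:
        for field, value in record.items():
            if not isinstance(value, str):
                continue
            for label, pattern in RULES.items():
                if pattern.search(value):
                    hits.add((field, label))
    return sorted(hits)


# Made-up sample: personal data hiding in a free-text field, the sort of
# place you might not expect it to be.
sample = [
    {"order_id": "1001", "notes": "Contact jane@example.com before delivery"},
    {"order_id": "1002", "notes": "Leave at side door", "ship_to": "SW1A 1AA"},
]

for field, rule in find_personal_data(sample):
    print(f"{field}: looks like {rule}")
```

Run on a schedule against new data, a check like this is one simple way to keep monitoring where customer data lives on an ongoing basis.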
Keith Kohl, VP of Product Management at Syncsort, also shared some predictions to go along with Smith’s interview. He agrees that regulation and privacy are driving renewed interest in Data Governance and Data Quality.
“As organizations get more sophisticated in how they use Big Data, it’s not just a sandbox anymore where anybody can join in – it’s real workload that needs governance controls around it. It’s mandatory: there are processes that need to be put into place, people that need to execute them, and the technology itself has to support the required controls and audits.”
A less obvious driver, although one Smith hears from an increasing number of customers, is about competing better. “I want to drive more revenue in my business. I want to understand how I can do things more efficiently, more effectively, and that means I need to be able to work with data I can trust that has the right content.”
He shared an example of a company doing online sales wanting to understand where their customers in the UK were by looking at IP addresses and attempting to map them out:
“There are some requests coming from up north, some from the southwest, but a lot of those are driven by poor geo-location information, based on a lot of centralized information that hasn’t been standardized, and not really verified in terms of the address-based content. And when you apply some quality tools into that picture, it changes the entire landscape view.”
Suddenly the company discovers that a majority of their customers and queries are coming from London and the immediate counties around it, requiring a change in focus for their marketing efforts.
The Intersection of Data Quality and Data Governance
Smith sees a trend toward a growing understanding of the role of Data Quality in business success:
“People are beginning to make the connection that good business decisions require good quality data, and that really changes the equation in terms of why they want Data Quality in place. Then you can begin to say, ‘How do I get to that point?’ Well, I need Data Governance processes in play so that I can monitor this, I can measure this, and I can really track this closely.”
Data Governance creates a culture around Data Quality so that “all lines of business understand how critical” Data Quality is to informed business decisions. Kohl added,
“You can’t do any of that unless you have proper Data Governance policies – including Data Quality controls and monitoring – in place, and that’s going to continue to be a pressing issue in 2018.”
Data Quality and Data Governance: No Longer Optional
“Data Governance and Data Quality have always been important,” said Smith, and that was true even 20 years ago. “There wasn’t as much data, but the fact is that it doesn’t matter how much Big Data you have. If you’re not governing it and it doesn’t have quality, it’s not going to do you any good.”
Smith stressed that foundational Data Management practices, concepts, and technologies aren’t going anywhere, and if anything, “They’re even becoming more prominent. To be able to do Data Science with these data sets, you have to know what you’re doing or else you’re just going to waste huge amounts of time and money.” Not understanding where the data comes from, or whether it’s even appropriate for the intended purpose, has a cost impact. Add to that the exponentially increasing volume of data from an expanding array of sources.
“Coming in faster and faster, and you can’t do that at a human scale. So, you have to start applying a quality monitoring approach – a measurement approach – which is all part of what you would emphasize from a Data Governance process,” he said.
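One way to picture the measurement approach Smith mentions is an automated completeness check that runs on every incoming batch. The Python sketch below is only an assumption about what such a check could look like: the field names, the thresholds, and the sample batch are hypothetical, and completeness is just one of many possible quality measures.

```python
def completeness(records, field):
    """Fraction of records whose `field` is present and non-empty."""
    if not records:
        return 0.0
    filled = sum(1 for r in records if r.get(field) not in (None, ""))
    return filled / len(records)


def check_batch(records, thresholds):
    """Return {field: (score, passed)} for each monitored field."""
    results = {}
    for field, minimum in thresholds.items():
        score = completeness(records, field)
        results[field] = (score, score >= minimum)
    return results


# Made-up batch: two of the four records have no usable postcode.
batch = [
    {"customer_id": "C1", "postcode": "SW1A 1AA"},
    {"customer_id": "C2", "postcode": ""},
    {"customer_id": "C3", "postcode": "M1 1AE"},
    {"customer_id": "C4"},
]

# Hypothetical governance policy: minimum acceptable completeness per field.
report = check_batch(batch, {"customer_id": 1.0, "postcode": 0.9})
for field, (score, passed) in report.items():
    print(f"{field}: {score:.0%} complete, {'ok' if passed else 'BELOW THRESHOLD'}")
```

The thresholds stand in for the policy side of Data Governance, while the scores are the measurements that can be tracked over time without anyone inspecting records by hand.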
Smith said there have been recent studies showing that many Data Scientists spend close to 80 percent of their time just finding and preparing data. Considering the personnel costs, he said, that’s not a good investment.
“You’re paying them to really work with your data and develop new insights, and new models that can help you assess different hypotheses about your customers or your industry, but that’s not what they’re spending their time doing.”
There is still room for experimentation, but repeatable procedures created from Data Governance policies have to be in place.
“You still have to be able to do it in a way that provides a rigor for the next person and say, ‘Well, this is where I got this data source.’ I think we’re going to see extensive evolution of that over the next five years or so as industries wrestle with these concepts and try to manage those volumes. That’s where the practices and tooling around Data Quality and Data Governance are really going to evolve.”
The increased need to stay competitive and a growing awareness of the role technologies like Analytics and Machine Learning can play in success are driving a change in culture, Smith said. “I think one of the other key factors is how to shift an entire organization to have literacy around data – to understand what data is.” How do organizations work with that? What are some of the tools that are available to enterprises to help with that process? It’s not enough to just put the tools in place. Organizations have to do it in a way that lets people understand what they can do once the tools are in place.
Smith believes this shift in culture is critical for organizations as they wrestle with the volume and velocity of change.
Underpinning New Technologies: Data Governance and Data Quality
Kohl sees a movement toward getting access to information more quickly, and organizations may be concerned that they aren’t evolving quickly enough. “If you don’t have real-time access to your analytics, it’s not too late, but you need to start now.” He predicts an expansion of the use of Machine Learning beyond Analytics.
“People are increasingly understanding that AI can be applied to everything in their lives, from making their jobs easier to helping inform decisions. Syncsort predicts that Machine Learning and AI will be much more prevalent in 2018 across all kinds of technology, from products to analytics to Data Quality and Governance. It applies to everything.”
Smith also predicts that this increased use of new technologies will facilitate “better practices and tooling around Data Quality and Data Governance” over the next five years. As volumes of data arrive from varying sources, Data Governance policies must be in place so that “everything is marked and recorded in a way that you can understand the contents, where it came from, and how complete it is, so that you’re not [making] biased business decisions.”
Facilitating wise business decisions based on good data, said Smith: “That’s where they are really going to evolve.”
Photo Credit: LeoWolfert/Shutterstock.com