By Glen Martin
Although the arrival of the European Union’s General Data Protection Regulation (GDPR) in May 2018 has generated much of the recent buzz around compliance, staying in line with myriad legal and industry regulations is a 365-day-a-year task that reaches far beyond Europe. These days, it touches nearly every organization’s digital footprint.
Whether performing a routine audit or investigating a violation or other incident, regulators expect timely access to relevant information from the companies they oversee. And most regulators aren’t known for their patience. Deadlines for delivering compliance-related information can be as short as 48 hours.
Do you know where or how to find the information you’ll need to deliver?
Classification at Scale
A robust Data Catalog can find and classify information at scale. For example, it can infer from examples that a record containing the term “Rhode Island” is probably of interest to regulators in the Ocean State, and then apply a corresponding metadata tag to every record or file that mentions Rhode Island. Filters can also screen out information that regulators shouldn’t see.
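As a rough illustration of tagging and filtering, the Python sketch below attaches a jurisdiction tag to any record that mentions Rhode Island and strips screened fields before export. It is a minimal sketch under stated assumptions: the tag name `jurisdiction:US-RI`, the `TAG_RULES` and `SCREENED_FIELDS` tables, and the `Record` type are all hypothetical, and simple keyword rules stand in for the example-driven inference a real catalog product would perform.

```python
from dataclasses import dataclass, field

# Hypothetical keyword rules: a tag is attached when any trigger term appears.
TAG_RULES = {
    "jurisdiction:US-RI": ["rhode island", "providence"],
}

# Hypothetical fields that should be withheld from any regulator export.
SCREENED_FIELDS = {"internal_notes"}


@dataclass
class Record:
    record_id: str
    fields: dict
    tags: set = field(default_factory=set)


def tag_record(record: Record) -> Record:
    """Attach every tag whose trigger terms appear in any field value."""
    text = " ".join(str(value) for value in record.fields.values()).lower()
    for tag, terms in TAG_RULES.items():
        if any(term in text for term in terms):
            record.tags.add(tag)
    return record


def export_for_regulator(records, tag):
    """Return the fields of records carrying `tag`, minus screened fields."""
    return [
        {k: v for k, v in r.fields.items() if k not in SCREENED_FIELDS}
        for r in records
        if tag in r.tags
    ]


records = [
    Record("r1", {"address": "12 Main St, Providence, Rhode Island",
                  "internal_notes": "escalated complaint"}),
    Record("r2", {"address": "5 Elm St, Hartford, Connecticut"}),
]
for rec in records:
    tag_record(rec)

print(export_for_regulator(records, "jurisdiction:US-RI"))
# [{'address': '12 Main St, Providence, Rhode Island'}]
```

Keeping the screening step separate from tagging means the same tags can serve several exports, each with its own list of withheld fields.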
A Data Catalog can search and tag intelligently at a volume no team of humans could match. For GDPR purposes, it can scan historical records for mentions of cities, countries, phone numbers, ID numbers, and other details that indicate EU residents. Using machine learning, it can infer that Bydgoszcz is in Poland and tag residents of that city as EU residents. With techniques like these, it can locate EU-resident individuals and the data stored about them, the very information at the heart of GDPR’s privacy-protection objectives.
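The sketch below shows one simple way such an EU-resident check could work, assuming a small hand-built gazetteer (`CITY_TO_COUNTRY`), a partial table of international dialing prefixes, and an illustrative subset of EU member states. All tables and function names are hypothetical, and a dictionary lookup is swapped in here for the machine-learning inference described above; a real catalog would rely on a full geographic database or a trained entity-recognition model.

```python
import re

# Hypothetical gazetteer; a real catalog would use a full geographic database
# or a trained entity-recognition model rather than a hand-built dictionary.
CITY_TO_COUNTRY = {
    "bydgoszcz": "PL",
    "lyon": "FR",
    "porto": "PT",
    "boston": "US",
}

# Partial table of international dialing codes (E.164 country codes).
# "+1" is shared by several North American countries; it maps to "US" here
# only to keep the sketch short.
PHONE_PREFIX_TO_COUNTRY = {
    "+1": "US",
    "+33": "FR",
    "+48": "PL",
    "+49": "DE",
    "+351": "PT",
}

# Illustrative subset of EU member states (ISO 3166-1 alpha-2 codes).
EU_COUNTRIES = {"DE", "FR", "PL", "PT"}


def countries_mentioned(text):
    """Return country codes implied by known city names and phone prefixes."""
    found = set()
    for word in re.findall(r"\w+", text.lower()):
        if word in CITY_TO_COUNTRY:
            found.add(CITY_TO_COUNTRY[word])
    for match in re.finditer(r"\+\d+", text):
        digits = match.group()
        # Longest-prefix match, so "+351" is preferred over any shorter code.
        for length in range(len(digits), 1, -1):
            prefix = digits[:length]
            if prefix in PHONE_PREFIX_TO_COUNTRY:
                found.add(PHONE_PREFIX_TO_COUNTRY[prefix])
                break
    return found


def is_likely_eu_resident(text):
    """Heuristic: True if any implied country is in the EU subset above."""
    return bool(countries_mentioned(text) & EU_COUNTRIES)


print(is_likely_eu_resident("Anna Nowak, ul. Długa 5, Bydgoszcz"))      # True
print(is_likely_eu_resident("J. Smith, Boston, tel. +1 617 555 0100"))  # False
print(is_likely_eu_resident("Contact: +351 22 123 4567"))               # True
```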
Equally important, a Data Catalog can flag information that shouldn’t be kept. Many regulations, for example, specify expiration dates for customer records. Keeping those records beyond the mandated retention periods not only exposes an organization to unnecessary liability but can also result in fines. Metadata tags make it easier for organizations to manage the lifecycle of the data they collect and retain.
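A retention check can be as simple as comparing each record’s collection date plus its retention period against today’s date. The Python sketch below assumes hypothetical record types and retention periods (`RETENTION_PERIODS`); actual periods depend on the regulations and contracts that apply to a given organization, and the day counts here ignore leap years.

```python
from datetime import date, timedelta

# Hypothetical retention periods per record type; real periods depend on the
# regulations and contracts that apply. 365-day years ignore leap days.
RETENTION_PERIODS = {
    "customer_account": timedelta(days=365 * 7),
    "marketing_lead": timedelta(days=365 * 2),
}


def retention_expiry(record):
    """Date after which the record should no longer be kept."""
    return record["collected_on"] + RETENTION_PERIODS[record["type"]]


def flag_expired(records, today):
    """Return the IDs of records held past their retention period."""
    return [r["id"] for r in records if retention_expiry(r) < today]


records = [
    {"id": "c-1", "type": "customer_account", "collected_on": date(2010, 3, 1)},
    {"id": "m-1", "type": "marketing_lead", "collected_on": date(2017, 1, 15)},
    {"id": "m-2", "type": "marketing_lead", "collected_on": date(2015, 6, 30)},
]

print(flag_expired(records, today=date(2018, 5, 25)))  # ['c-1', 'm-2']
```

In a catalog, the computed expiry date would typically be stored as a metadata tag on the record, so the check becomes a query over tags rather than a recalculation.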
Creators of privacy-focused regulations are drawing stricter lines around what information can be kept for legitimate business purposes, and the rules aren’t always clear-cut. Under GDPR, for example, a retailer may collect a customer’s email and mailing address to deliver a purchased product and to send follow-up communications about that product, but without explicit consent it may not use that information for general marketing. In other cases, personal data may be used for research or market segmentation, but only in a form that is not tied to an individual’s identity.
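For the research and segmentation case, one common approach is pseudonymization: dropping direct identifiers and replacing the customer ID with a keyed hash (HMAC-SHA-256 here). The sketch below is illustrative only; the key, the field names, and the `pseudonymize` function are hypothetical. Under GDPR, pseudonymized data generally still counts as personal data, because whoever holds the key can re-link it, so fully anonymizing a dataset requires stronger measures than this.

```python
import hashlib
import hmac

# Hypothetical secret key; in practice it would be held in a key-management
# system, separate from the dataset shared with analysts.
PSEUDONYM_KEY = b"example-secret-key"

# Hypothetical set of fields that directly identify a customer.
DIRECT_IDENTIFIERS = {"customer_id", "name", "email", "mailing_address"}


def pseudonymize(record):
    """Drop direct identifiers and add a stable keyed-hash pseudonym."""
    token = hmac.new(PSEUDONYM_KEY, record["customer_id"].encode("utf-8"),
                     hashlib.sha256).hexdigest()
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    cleaned["pseudonym"] = token
    return cleaned


customer = {
    "customer_id": "C-1001",
    "name": "Jane Doe",
    "email": "jane@example.com",
    "mailing_address": "1 Main St",
    "age_band": "35-44",
    "region": "PL",
}

segment_row = pseudonymize(customer)
print(sorted(segment_row))  # ['age_band', 'pseudonym', 'region']
```

A keyed hash, unlike a plain hash of the ID, keeps analysts without the key from re-deriving pseudonyms for known customer IDs, while the same customer still maps to the same pseudonym across datasets.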
A Data Catalog can inform users about, and help prevent, unintended misuse of personally identifiable information (PII), as well as provide documentation in case of an audit. Intelligent tagging applies clear definitions for factors such as how information can lawfully be used, whether customers consented to that use, and when the organization’s legal right to use that information expires. In short, a Data Catalog provides a way to label data consistently with the relevant regulations.
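This kind of tagging can be sketched as a small data structure plus a check that runs before data is used. In the hypothetical Python example below, a `UsageTag` records the purposes covered by a lawful basis other than consent (such as fulfilling an order), the purposes the customer explicitly consented to, and the date the right to use the data expires; every decision is appended to an `audit_log` for later documentation. The tag fields, purpose names, and functions are assumptions for illustration, not any particular catalog’s API, and the rules are a deliberately simplified reading of the retailer example above.

```python
from dataclasses import dataclass
from datetime import date


# Hypothetical tag attached by the catalog to a data element.
@dataclass(frozen=True)
class UsageTag:
    lawful_purposes: frozenset     # covered by a non-consent basis, e.g. a contract
    consented_purposes: frozenset  # purposes the customer explicitly opted into
    expires_on: date               # last day the organization may use the data


# Every decision is recorded so it can be produced during an audit.
audit_log = []


def check_use(element, tag, purpose, on):
    """Decide whether `element` may be used for `purpose` on date `on`, and log why."""
    if on > tag.expires_on:
        allowed, reason = False, "right to use has expired"
    elif purpose in tag.lawful_purposes:
        allowed, reason = True, "covered by a non-consent lawful basis"
    elif purpose in tag.consented_purposes:
        allowed, reason = True, "customer consented"
    else:
        allowed, reason = False, "no lawful basis or consent recorded"
    audit_log.append((on.isoformat(), element, purpose, allowed, reason))
    return allowed


email_tag = UsageTag(
    lawful_purposes=frozenset({"order_delivery", "product_followup"}),
    consented_purposes=frozenset(),
    expires_on=date(2020, 12, 31),
)

followup_ok = check_use("customer.email", email_tag, "product_followup", date(2018, 6, 1))
marketing_ok = check_use("customer.email", email_tag, "general_marketing", date(2018, 6, 1))
print(followup_ok, marketing_ok)  # True False
```

Mirroring the retailer example, the email address passes the check for product follow-up but fails it for general marketing, and both decisions land in the audit log with a stated reason.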
International Data Corp. (IDC) expects the volume of data that organizations create and store annually to grow tenfold by 2025. No one expects regulatory deadlines to adjust in kind. The only realistic way to get a handle on that volume of data within the time frames auditors and enforcers demand is a Data Catalog that scans and classifies data automatically.