By Ben Kelley
Back in 2006, UK mathematician Clive Humby coined the phrase, “Data is the new oil.” While the analogy has been controversial to some, the statement foretold how business has evolved in the last decade. Today, companies in all sectors rely on customer data to augment or otherwise enable their business. Whether your company is a merchant collecting billing information from customers or a service provider logging usage of your platform, data aggregation is becoming standard practice. While the rise in customer data collection has created new opportunities for business, it has also introduced new risks that must be considered and mitigated where possible.
Customer information is both an asset and a liability. As more consumer data is collected for business purposes, more attention is being paid to the enforcement of standards for storage, transmission, and retention. Laws such as the General Data Protection Regulation (GDPR) in the European Union and the California Consumer Privacy Act (CCPA) in the United States outline rules for handling the data of customers in those regions, as well as penalties for failing to handle that data appropriately. Beyond legal ramifications, the loss or misuse of personally identifiable information (PII) can also cause irreparable damage to the trust relationship between a company and its customers.
When designing a system that will collect and use customer data, it pays to plan for the security of that collection and retention during the design phase. The following guiding principles should help ensure security and privacy are fundamental aspects of your system design.
1. Understand your data sources
Not all data is created equal, so not all data can be protected in the same way. The billing information an eCommerce vendor collects from customers is more sensitive than the connection data in its web server logs, and the data protection methods for those two sources should differ accordingly.
Understanding your data sources via proper data categorization helps you build a model for security that is appropriate for the kind of data you are protecting.
This categorization process helps you to identify the right “bucket” for your data to live in. Some examples may be:
- Customer Data: data about a customer that is collected during a transaction.
- Proprietary Data: data purchased or otherwise obtained through partnerships with other businesses.
- Technical Data: data derived from logging, usage monitoring, etc.
Appropriately categorizing your data can simplify the process of deciding how that data can be stored, how it should be protected, and how long it should be retained.
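One way to make categorization actionable is to attach handling rules to each bucket. The following is a minimal sketch in Python, not a prescribed implementation: the three categories come from the list above, while the `HandlingPolicy` fields and every retention value are hypothetical placeholders that you would replace with your own requirements.

```python
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    """The three example buckets described in the text."""
    CUSTOMER = "customer"
    PROPRIETARY = "proprietary"
    TECHNICAL = "technical"


@dataclass(frozen=True)
class HandlingPolicy:
    """Hypothetical handling rules attached to a category."""
    encrypt_at_rest: bool
    retention_days: int


# Illustrative values only; real values depend on your business and legal needs.
POLICIES = {
    Category.CUSTOMER: HandlingPolicy(encrypt_at_rest=True, retention_days=365),
    Category.PROPRIETARY: HandlingPolicy(encrypt_at_rest=True, retention_days=730),
    Category.TECHNICAL: HandlingPolicy(encrypt_at_rest=False, retention_days=30),
}


def policy_for(category: Category) -> HandlingPolicy:
    """Look up the handling rules for a data category."""
    return POLICIES[category]
```

Keeping the rules in one table means a new data source only needs to be assigned a category to inherit its storage and retention defaults.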
2. Be purpose-driven
Data collection should be driven by a clear business purpose. That purpose could be as trivial as gathering statistics on the kinds of web browsers used to visit your company homepage or as regulated as processing billing information after a purchase. Whatever the case may be, that purpose will help guide decisions about how that data is stored and protected.
Clearly defining your purpose for collecting and retaining data also helps determine the appropriate timelines for purging information that is no longer needed. Data does not stay relevant forever, but storing it always comes at a cost. Data kept beyond its usefulness not only costs the company money to retain but also increases its liability should a breach happen. When scoping a data retention policy, it is important to determine the purpose that data is serving and retain it only long enough to serve that purpose.
In addition to data retention considerations, being purpose-driven in your data collection helps scope which teams will require access to that information. Employing role-based access control (RBAC) ensures that only employees with a legitimate business need can access the data. Controlling access by job role, rather than by individual, eases the burden of access control management while ensuring that an individual can only access those systems that are within their scope of work.
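The role-based idea can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than a production access control system: the role names, user names, and permission strings below are all hypothetical.

```python
# Permissions are granted to roles, never directly to individuals.
# All role, user, and permission names here are hypothetical.
ROLE_PERMISSIONS = {
    "billing_analyst": {"customer_data:read"},
    "site_reliability": {"technical_data:read", "technical_data:write"},
}

# Each user is assigned one or more roles based on their job.
USER_ROLES = {
    "alice": {"billing_analyst"},
    "bob": {"site_reliability"},
}


def can_access(user: str, permission: str) -> bool:
    """Return True if any of the user's roles grants the permission."""
    roles = USER_ROLES.get(user, set())
    return any(permission in ROLE_PERMISSIONS.get(role, set()) for role in roles)
```

When someone changes jobs, only their entry in `USER_ROLES` changes; the permissions attached to each role stay put, which is where the reduced management burden comes from.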
3. Security is a feature, not an afterthought
You wouldn’t build a bank vault by allowing people to deposit money and then figuring out ways to keep that money safe. You’d want to have a plan for keeping that money safe before it was your responsibility to protect it. The same principle applies to data storage: building in security from the ground up helps ensure that potential gaps are addressed prior to implementation.
As a general rule, steps should be taken to ensure that customer data is protected both “in transit” (the process of collecting the information from the source) and “at rest” (the final storage point for the data). The extent to which you need to protect that data, the ways in which you implement that protection, and the length of time you retain that data should all be purposefully designed based on your data categorization and use case.
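For the “in transit” half, one common approach is to require TLS with certificate verification on every connection that carries customer data. The sketch below uses Python's standard `ssl` module; the function name `strict_client_context` and the choice of TLS 1.2 as the minimum version are assumptions for illustration, and protection “at rest” is not covered here.

```python
import ssl


def strict_client_context() -> ssl.SSLContext:
    """Build a client-side TLS context for sending data in transit.

    create_default_context() already verifies server certificates and
    hostnames; the minimum version is pinned explicitly as an assumed policy.
    """
    ctx = ssl.create_default_context()
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx
```

A context like this can be passed to standard-library clients that accept one, for example the `context` argument of `urllib.request.urlopen`.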
Data Protection vs. Business Value
Once you have developed an understanding of the data you are collecting and the requirements around its collection, you can start to define how you plan on implementing the necessary protections. This is where being purpose-driven will help, as the way you protect data depends on the business need that data fulfills. Just as inadequately protecting data can lead to data theft or spillage, overly securing data can also inhibit the usability of that source for your business.
The solution you ultimately choose to implement should balance security and usability. For instance, while one-way hashing is great for security, you sacrifice any ability to get back to the original plaintext data. Depending on what you intend to do with that data, this may or may not be a feasible solution. Likewise, SHA-512 always produces a hash that is 512 bits (64 bytes) long, regardless of the size of the input. If your input data was very small before processing, you may end up storing considerably more data than you started with. Each data use case is unique, and the methods of protection must weigh keeping the information safe against keeping the information usable.
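The size trade-off is easy to see with Python's standard `hashlib` module. This is a small sketch for illustration only: the helper name `sha512_hex` and the sample value are hypothetical, and it demonstrates output size, not a complete protection scheme.

```python
import hashlib


def sha512_hex(value: str) -> str:
    """Return the SHA-512 digest of a string as hexadecimal text."""
    return hashlib.sha512(value.encode("utf-8")).hexdigest()


# Hypothetical short input: a six-character customer identifier.
customer_id = "C-1042"
digest = sha512_hex(customer_id)

raw_bytes = len(customer_id.encode("utf-8"))    # size of the original value
digest_bytes = hashlib.sha512().digest_size     # fixed size of a SHA-512 digest
hex_chars = len(digest)                         # size if stored as hex text
```

Here a 6-byte value becomes a 64-byte digest, or 128 characters if stored as hexadecimal text, and there is no way to recover `customer_id` from `digest` other than guessing inputs and hashing them.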
Companies should also consider their policy for data retention. While it can be tempting to retain records indefinitely, that information will likely grow stale, and its value will diminish with time. In order to both keep data relevant and to keep the scope of data you are protecting reasonable, companies should consider aging off data that is no longer needed to fulfill a specific business purpose.
This is another process that goes more smoothly thanks to the work you did when framing your requirements. Understanding your data and being purposeful about why you are collecting it are the key factors in determining a reasonable retention policy. You may only need to retain the data for a short time (such as technical logs used for debugging purposes), or you may need to retain it in anticipation of a future event (such as billing records until the next tax season). Whatever the case may be, the goal should be to keep the data for the minimal time necessary to fulfill its reason for collection.
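Aging off data can be expressed as a simple rule per record type. The following Python sketch assumes records are dictionaries with hypothetical `kind` and `collected_at` fields, and the retention windows shown are placeholders rather than recommendations.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical retention windows, keyed by record type.
RETENTION = {
    "technical_log": timedelta(days=30),
    "billing_record": timedelta(days=365),
}


def is_expired(kind: str, collected_at: datetime, now: datetime) -> bool:
    """A record is expired once it is older than its retention window."""
    return now - collected_at > RETENTION[kind]


def purge(records: list[dict], now: datetime) -> list[dict]:
    """Return only the records that are still within their retention window."""
    return [
        r for r in records
        if not is_expired(r["kind"], r["collected_at"], now)
    ]
```

In practice a job like this would run on a schedule against the real data store; passing `now` in explicitly keeps the rule easy to test.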
While legal regulations, such as the GDPR, do not provide specifics for how long different types of information can be retained, they do expect organizations to be able to prove what purpose that data serves.
Data Redundancy and Availability
While less related to the privacy of data storage, data redundancy and availability should also be considered a part of data security. Just as organizations should take steps to mitigate the risk of data theft or spillage, they should also consider the risk of data loss in the event of a catastrophic failure. If the server hosting your customer transaction database failed tomorrow, would you be able to recover that data? How long would it take to transition that functionality to a new server and get back to business as normal?
Fortunately, modern cloud providers make it simpler than ever to ensure data redundancy. We’ve engineered our platform to leverage Amazon S3, which provides us with both ease of access and inherent redundancy across AWS availability zones.
Whether you use AWS, Azure, other cloud providers, or an on-premise solution, taking steps to ensure the redundancy and availability of your data is critical for business continuity.
In today’s world, data security is a marker of good business operations rather than an optional add-on. It’s an ongoing commitment to protect your customers and partners both now and in the future. The protection and handling of customer data should be a cornerstone of every business.