Cloud-Based Data Management Platforms in the Age of Privacy


Click to learn more about author Keith D. Foote.

Many businesses now use public clouds for their Data Management. Data Management platforms (DMPs) started becoming popular during the late 1990s and the early 2000s. At that time, the internet services offered by Amazon and Salesforce appealed to a large number of businesses because they reduced in-house maintenance costs while supporting the additional flexibility needed for a constantly evolving business environment.

Also, the use of cloud-supported AI has made storing and processing massive amounts of data much more efficient. It is important to recognize that Data Management platforms rely significantly on cookie technology to identify customer behaviors. Steve Touw, CTO and Co-Founder of Immuta, said this about using cloud-based Data Management platforms:

“The cloud enables organizations to store and process large amounts of information at a dramatically lower cost than they could in their own data centers, transforming business and IT operations. Today’s cloud-driven teams are leaner, faster, safer, and more productive. As companies continue to amass more data, they’re now turning to the cloud not just for data storage but for new ways to process, compute, and analyze data. The days of deploying on-premise big data infrastructure — which can take years to budget for, acquire, build, and optimize — are numbered. Modern cloud data and analytics platforms provide data teams with scalability, flexibility, cost savings, and performance, enabling organizations to do more with data to make faster decisions, share and monetize data, and build data-driven products.”

However, concerns are growing about new data privacy regulations, such as Europe’s GDPR (General Data Protection Regulation) and the CCPA (California Consumer Privacy Act), and their impact. For example, Safari, Firefox, and Google Chrome have announced they will phase out their use of third-party cookies by the year 2022. Data taken from cookies about individuals is often collected from all over the internet. This data builds psychological profiles and can be used to entice online shoppers into purchasing something they want. Unfortunately, these profiles can also be used to target and manipulate people by promoting misinformation (or lying). Fortunately, many public clouds have taken steps to balance privacy with understanding the consumer base. It’s important to research clouds that offer solutions for replacing the data previously supplied by cookies.

Data Management in the Cloud

Essentially, everything a customer does on a business’s website or mobile app creates data. This data can then be used to develop marketing strategies and increase the customer base. Merging a customer’s personal data with the purchases they made with other online organizations can be quite useful. This kind of data provides a more thorough understanding of individual customers and shows how shopping behavior varies across different customer segments. To accomplish this, cloud-supported Data Management platforms support the following processes:

  • Storing gathered data in one place
  • Using third-party data to discover new markets
  • Gaining customer insights
  • Budgeting marketing expenses effectively

A Data Management platform stores digital data, such as customer data (taken from cookie IDs or mobile identifiers) and marketing campaign data. This kind of information can help marketers and advertisers find patterns of behavior for the customer base as a whole and for individual customers. These patterns of behavior include demographic data, previous browsing behaviors, the devices customers use, and more.

Immuta’s Steve Touw went on to discuss security concerns about DMPs in the cloud, saying:

“As companies increasingly move to the cloud, the biggest issue Data Management teams face is how to automate and scale cloud data access without compromising security and privacy protection. The blessing (and curse) of the cloud is that it stores all data centrally, usually in a cloud data lake or data warehouse. While this brings massive cost savings, it also potentially exposes companies to material legal risk and contractual violations if data is hacked, breached, or gets into the wrong hands. Simply put, the cloud makes it much more difficult to control enterprise-wide data access, particularly when companies are adopting multiple cloud data and analytics platforms such as Snowflake, Databricks, Azure Synapse, and others. Immuta’s recent Data Engineering Survey revealed that the majority of companies plan to adopt multiple cloud compute technologies within the next two years. While these platforms bring massive power at lower cost, they require data teams to adopt new access control and governance solutions that bridge security, legal, compliance, and business teams to ensure timely access to critical business data while mitigating risk.”

The Different Sources and Types of Data

Data comes from a variety of sources and is categorized in different ways. How data is used depends on what kind of data it is. Data Management platforms generally get their data from three types of sources:

  • First-Party Data: This is data that is collected and owned by the organization itself. For example, the organization may collect website data, mobile application data, and customer relationship management (CRM) data (Salesforce, for instance, is used to manage an organization’s interactions and relationships with its customers and potential customers).
  • Second-Party Data: This data has been collected by another business and then sold. It includes online campaign data, as well as customer journey data.
  • Third-Party Data (Cookies): This is data delivered by data aggregators, meaning it is purchased from sources that aren’t the original collectors. Large data aggregators pull data from a variety of other platforms and websites, and they purchase data from publishers and other data sources.

From these first-, second-, and third-party sources, three main types of data are collected and loaded into Data Management platforms. They are:

  • Observed Data: Generally speaking, this is the digital footprint left by internet users, such as a search history (say, for winter boots) or the type of web browser being used.
  • Inferred Data: This type of data is based on conclusions drawn from a user’s internet behavior. To a large degree, it is guesswork (and can be irritating when the guess is wrong).
  • Declared Data: This data is provided (originally) by the users filling out online forms or applications.
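
The two classifications above (where data comes from, and what kind of data it is) can be sketched as a small data model. This is only an illustration: the names `Source`, `Kind`, `DataPoint`, and `group_by_source` are hypothetical and do not come from any particular DMP product.

```python
from dataclasses import dataclass
from enum import Enum


class Source(Enum):
    """Where the data came from (hypothetical labels)."""
    FIRST_PARTY = "first-party"
    SECOND_PARTY = "second-party"
    THIRD_PARTY = "third-party"


class Kind(Enum):
    """What type of data it is (hypothetical labels)."""
    OBSERVED = "observed"
    INFERRED = "inferred"
    DECLARED = "declared"


@dataclass(frozen=True)
class DataPoint:
    user_id: str
    attribute: str
    value: str
    source: Source
    kind: Kind


def group_by_source(points):
    """Return a dict mapping every Source to the list of points from it."""
    grouped = {source: [] for source in Source}
    for point in points:
        grouped[point.source].append(point)
    return grouped


# Made-up sample data for illustration only.
points = [
    DataPoint("u1", "search", "winter boots", Source.FIRST_PARTY, Kind.OBSERVED),
    DataPoint("u1", "interest", "hiking", Source.THIRD_PARTY, Kind.INFERRED),
    DataPoint("u2", "email_opt_in", "yes", Source.FIRST_PARTY, Kind.DECLARED),
]
grouped = group_by_source(points)
```

Tagging each record with both its source and its type makes it straightforward to treat, say, inferred third-party data differently from declared first-party data.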

Data Management Platforms and Privacy Laws

DMPs have helped digital marketers find new customers using third-party data. The GDPR and CCPA, on the other hand, make it harder to obtain third-party data. Previously, Data Management platforms processed third-party data, and existing laws didn’t require the user’s consent to gather it. The GDPR, however, states that personal data, including the data collected by cookies, can be used only with the user’s consent. This means gathering third-party data will become harder for businesses and may even become illegal. As a result, DMPs will have to rely more on first- and second-party data.
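
A minimal sketch of the consent requirement, assuming each record carries a simple yes/no consent flag. The `Record` type, its `consented` field, and `usable_records` are hypothetical, and real compliance involves far more than a single flag.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    user_id: str
    source: str      # "first-party", "second-party", or "third-party"
    consented: bool  # hypothetical flag: True if the user agreed to this use


def usable_records(records):
    """Keep only the records whose user has given consent."""
    return [record for record in records if record.consented]


# Made-up sample data for illustration only.
records = [
    Record("u1", "first-party", True),
    Record("u2", "third-party", False),
    Record("u3", "second-party", True),
]
usable = usable_records(records)
```

In this sample, the third-party record for `u2` is dropped because no consent was recorded, which mirrors why cookie-based third-party data becomes harder to use.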

Immuta has developed an alternative response to privacy laws and the loss of cookies, combining metadata and external catalogs with AI and machine learning in an integrated and balanced solution. Its de-identification and auditing features replace manual processes, empowering researchers to deliver valuable data analytics quickly and efficiently. Steve Touw commented:

“2020 was a significant year for Immuta, as we made significant product investments in bringing our automated Data Governance platform to the most popular cloud data platforms. We recently announced native support for Databricks, Snowflake, and Starburst, which, according to our survey, are among the most heavily adopted platforms. For companies with two or more of these platforms, Immuta provides a single solution to automate cross-platform data access control, discovery and classification, and privacy protection — significantly improving productivity, unlocking more data for more data consumers, and minimizing the risk of data leaks or breaches.”

The Future of DMPs

As businesses shift their focus to first- and second-party data and direct relationships with consumers, the usefulness of third-party cookies comes into question. Third-party data will be based on aggregated and de-identified reports. As a consequence, Data Management platforms will be adjusted and modified to work with first-party, consented data, and somewhat limited second-party data.
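
One way to picture an aggregated, de-identified report is a count per customer segment, with user IDs dropped and very small segments suppressed. The sketch below assumes one row per user; the field names (`age_band`, `region`) and the `MIN_GROUP_SIZE` threshold are hypothetical choices, not a standard.

```python
from collections import Counter

MIN_GROUP_SIZE = 3  # hypothetical threshold; smaller segments are suppressed


def aggregate_report(rows, min_group_size=MIN_GROUP_SIZE):
    """Count users per (age_band, region) segment.

    User IDs never appear in the output, and segments with fewer than
    min_group_size users are left out so individuals are harder to single out.
    """
    counts = Counter((row["age_band"], row["region"]) for row in rows)
    return {segment: n for segment, n in counts.items() if n >= min_group_size}


# Made-up sample data for illustration only (one row per user).
rows = [
    {"user_id": "u1", "age_band": "25-34", "region": "CA"},
    {"user_id": "u2", "age_band": "25-34", "region": "CA"},
    {"user_id": "u3", "age_band": "25-34", "region": "CA"},
    {"user_id": "u4", "age_band": "25-34", "region": "CA"},
    {"user_id": "u5", "age_band": "35-44", "region": "NY"},
]
report = aggregate_report(rows)
```

Here the single-user segment is suppressed, so the report describes groups rather than individuals.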

These new DMPs will be designed to capture, manage, and securely match up consented personally identifiable information (PII). The amount of first-party data will increase significantly, and so will the ability to securely match first-party PII with second-party data. With these changes, the need for data techs will increase. Steve Touw predicts:

“In 2021, we anticipate an acceleration in the shift to the cloud and the emergence of  ‘data as product,’ with DataOps emerging as a new, crucial function and discipline for cloud Data Management, availability, quality, governance, and security. While today, the number of DataOps professionals counts in the thousands, we anticipate an explosion in this field — analogous to what occurred with DevOps for cloud app deployment and management. DataOps teams will bring the same agile principles to managing data as a critical asset for organizations, whether it’s used for BI, Data Science, or to power data-driven products and user experiences. As DataOps emerges, we’ll see a significant rise in demand for data engineers, data architects, data platform owners, and other data professionals — those who can manage data across multiple cloud data platforms while protecting and controlling access to sensitive information.”
