Inflation is on everyone’s mind, with consumer prices up 7.9% versus a year ago, according to the most recent consumer price index (CPI), released March 10. While the mounting cost of raw materials may not be the culprit, enterprises are simultaneously watching the cost of data start to rise as well.
What do I mean by the rising cost of data? I’m talking about the increasing enterprise spend associated with collecting, using, managing, storing, and securing data. Gartner predicts that more than half of enterprise IT spending will shift to the cloud by 2025, with more than $1.3 trillion moving to the cloud in 2022 alone.
First of all, data use is exploding. While structured data is growing quickly but relatively linearly, unstructured data is growing exponentially, and we’re just beginning to harness it. We are only at the dawn of the IoT age, and already our refrigerators, thermostats, and washing machines produce data, our cameras recognize faces, and the apps on our wearable devices and phones produce mountains of data. The amount of data generated by IoT devices is expected to reach 73.1 zettabytes (ZB) by 2025. We are also in the early days of 5G networking, artificial intelligence, and autonomous vehicles, all of which add to the explosion of data. Experts predict that a single autonomous taxi may produce 60 to 450 terabytes of data a day.
Enterprises are spending billions to handle massive amounts of data. Some of that spend is necessary, effective, and even profit-driving – the virtuous side – for data that has a purpose. New technologies may generate data with marvelous benefits, but data has to go somewhere. A surprising proportion of enterprise data growth lacks cohesive governance, planning, and strategy, and as a result, many are paying more for data than they need to while spending more on governance and management. It isn’t just more devices accelerating data proliferation; it’s also human behavior.
Sources of Data Inflation
Data inflation occurs when spending on data rises without a proportional increase in the enterprise value derived from that spending. Surprisingly, digital transformation and application modernization have created fertile ground for data inflation to run rampant. As enterprises refactor applications without carefully managing ever-expanding datasets, data sprawl follows. Moving to the cloud to deliver more capability and use can inadvertently lead to data inflation.
Often, a dataset is helpful across multiple areas of a business. Different development groups or people with unrelated objectives might make numerous copies of the same data. They often change a dataset’s taxonomy or ontology for their software or business processes, making it harder for others to identify it as a duplicate. This occurs because the average data scientist trying to home in on a particular data insight has different priorities than the data engineers responsible for pipelining that data and creating new features. And the typical IT person has little visibility into the use of the data at all. The result is that the enterprise pays for many extra copies without getting any new value – a core driver of data inflation.
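One way to surface these hidden duplicates is to compare datasets by content rather than by name, since a relabeled copy hashes to the same digest as the original. The sketch below is a minimal illustration of that idea; the function name and the idea of scanning a flat list of file paths are assumptions for the example, not part of any particular governance product.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def find_duplicate_datasets(paths):
    """Group files by SHA-256 of their contents.

    Renamed or re-labeled copies still collide on the same digest,
    so each returned group is a set of byte-identical files the
    enterprise is paying to store more than once.
    """
    groups = defaultdict(list)
    for p in paths:
        digest = hashlib.sha256(Path(p).read_bytes()).hexdigest()
        groups[digest].append(str(p))
    # Only digests shared by two or more files indicate duplication.
    return [files for files in groups.values() if len(files) > 1]
```

A real deployment would also need fuzzy matching (a copy with one changed column header is no longer byte-identical), which is exactly why taxonomy changes make duplicates so hard to spot.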
A lack of long-term planning in cloud architecture can also increase data inflation. Cloud migrations certainly take data gravity into account, placing applications and data as close to each other as possible. But as datasets grow larger and larger, moving data around to various applications becomes more cumbersome and expensive. Massive amounts of data generated by cloud applications can multiply capacity requirements, inflate costs, and set an enterprise up for sticker shock from egress charges.
Data egress fees are in fact a significant driver of data inflation and a common complaint about the cloud among enterprise CIOs. The primary public cloud providers – AWS, Microsoft, and Google – let companies move data into their clouds for free (“ingress”) but charge egress fees when data leaves their network for an external location, whether your applications write data out to your network or you repatriate data back to your on-premises environment.
It’s notable that while the cost of compute has fallen precipitously over the past 16 years, the cost of data transfer out of the cloud has “barely moved” relative to the underlying cost structure. It’s not that the cost to serve that data out hasn’t declined – it has. But whether to create a sort of moat against data leaving, or simply to maintain stellar margins, egress prices have scarcely budged.
Egress is certainly not the only sort of transfer fee around data. There are costs to move zone to zone, region to region, and even between different networks in the same region (whether for organizational reasons or for partnerships outside an organization). Apple reportedly spent $50 million in AWS data transfer charges in a single year, as much as 6.5% of its bill. Even after the substantial discounts for large enterprise spend, these costs can still add up.
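The arithmetic behind those numbers is worth making explicit. The sketch below is a back-of-envelope estimate only: the flat $0.09/GB rate is an illustrative assumption, not any provider's actual tiered price list, and the implied-bill figure simply inverts the article's 6.5% ratio.

```python
def egress_cost_usd(gb_transferred, rate_per_gb=0.09):
    """Estimate an egress charge at a flat hypothetical rate.

    Real providers use tiered, region-dependent pricing; 0.09 USD/GB
    is an illustrative placeholder, not a quoted price.
    """
    return gb_transferred * rate_per_gb


# If $50M in transfer charges was 6.5% of the total bill, the
# implied annual cloud spend is 50e6 / 0.065, roughly $769M.
implied_total_bill = 50_000_000 / 0.065
```

Even at a discounted rate, a petabyte leaving the cloud each month adds up quickly, which is why transfer fees dominate so many CIO complaints.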
Staving Off Data Inflation
Data egress fees vary considerably, as each cloud has its own fee structure, so enterprises should reserve clouds with higher egress fees for the workloads that genuinely require that specific cloud’s capabilities. When planning multi-cloud data access, they need to consider not only how to minimize real-time latency and keep data secure but also how to access their data from anywhere in the most efficient way while minimizing fees. While the drive for low latency might lead some toward colocation data centers, a multi-cloud data service that feels much like any other cloud service can be substantially less expensive and less risky.
From the top down, enterprises must set policies for what information is saved, based on how the business uses it. Not every piece of data generated must be preserved. Before making tactical decisions on each dataset or application rollout, it’s important to establish data governance and define data retention and acquisition policies carefully. Policies should specify where data may be stored, who is permitted to make copies, and how long data is retained. Enforcing them requires a set of tools adopted and mandated across the enterprise. This planning can extend into broader data governance: defining required schemas and metadata, which formats will be used for data under what circumstances, and more.
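Such a policy can be expressed as data rather than prose, which is what makes it enforceable by tooling. The sketch below is a minimal, hypothetical shape for the three rules named above (allowed locations, permitted copy holders, retention window); the class and field names are invented for illustration and real governance platforms use much richer schemas.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class RetentionPolicy:
    """Hypothetical policy record mirroring the three rules above."""
    dataset_class: str    # e.g. "telemetry", "financial"
    allowed_regions: set  # where this class of data may be stored
    copy_owners: set      # teams permitted to hold a copy
    retention_days: int   # delete after this many days


def violations(policy, region, owner, created):
    """Return a list of human-readable policy breaches for one copy."""
    issues = []
    if region not in policy.allowed_regions:
        issues.append(f"region {region} not allowed")
    if owner not in policy.copy_owners:
        issues.append(f"{owner} may not hold a copy")
    if date.today() - created > timedelta(days=policy.retention_days):
        issues.append("past retention window")
    return issues
```

Running a check like this across an inventory of dataset copies is one concrete way a tooling mandate turns top-down policy into deletions and savings.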
According to Gartner, by 2024, two-thirds of organizations will use a multi-cloud strategy to reduce vendor dependency. One of Gartner’s biggest cautions about multi-cloud, however, is complexity. Yet driving simplicity in a multi-cloud architecture is not an oxymoron. Leveraging a multi-cloud data service can mean a unified access methodology across clouds, a single governance and metadata schema, and a single system for identity and access management. These are the big challenges that the pejorative term “cloud sprawl” pokes at, but a multi-cloud data service can eliminate them with a single copy of data that is accessible from all clouds. This has the added benefit of easing the pain of transfer costs: if the data sits on a cloud-adjacent platform that doesn’t charge egress, you can move it freely while still benefiting from proximity to the cloud.
In summary, even as enterprises drive and benefit from massive data-driven innovation, datasets grow unwieldy and costly. Because every business function demands application modernization and rapid data insights, datasets are rarely centrally managed, resulting in duplicated costs and effort and artificially inflating the price of data. To tackle data inflation, enterprises are not only emphasizing stronger cloud data governance but also using multi-cloud data service platforms to reduce cost and complexity.