Technologies are sometimes categorized as stateful or stateless. The terms can apply to applications or communication protocols, for example. A stateful application saves data generated by each client session and uses it the next time the client makes a request. A stateless application doesn’t save client data from one session to the next.
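The distinction can be sketched with a toy counter service. This is a minimal illustration, not any particular product's design; all the names here are invented for the example:

```python
# Stateful: the server remembers each client's running total between requests.
class StatefulCounter:
    def __init__(self):
        self.totals = {}  # per-client state kept on the server

    def add(self, client_id, amount):
        self.totals[client_id] = self.totals.get(client_id, 0) + amount
        return self.totals[client_id]


# Stateless: the server keeps nothing; the client supplies its state each time.
def stateless_add(current_total, amount):
    return current_total + amount


stateful = StatefulCounter()
stateful.add("alice", 5)
print(stateful.add("alice", 3))  # the server remembered alice's earlier 5

print(stateless_add(5, 3))       # same arithmetic, but the caller supplied 5
```

Either design produces the same answer; the difference is who carries the memory between requests.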
We can also refer to data as stateful or stateless. Traditionally we think of data as stateful: once it’s created, it always exists. But organizations are discovering that data increasingly needs to be stateless, for a couple of reasons.
First, data has a limited lifespan: depending on its context, it may hold value only for a specific period of time. Take a fire alarm. To the fire commissioner, data from a fire alarm has value for months or years, because it might reveal fire trends or other insights. But to the fire company responding to the call, that same data has value for only a matter of minutes.
Second, data is affected by its geolocation. For starters, data no longer resides in a single place, as portions of a dataset can be generated in multiple locations. In addition, whether data is useful often depends on whether it can be acted on.
These are more than academic questions. The agile nature of today’s data is creating significant challenges for how organizations source, aggregate, transform, analyze, and act on information. The good news is that technologies and strategies now exist to create a stateless data architecture that can optimize the value organizations realize from their increasingly agile data.
So Much Data, So Little Time
Managing agile data involves four key challenges. First is the massive volumes of data organizations must deal with today. Data feeds from business processes, customer inputs, edge sensors, and other sources have grown exponentially. And organizations have become wary of throwing any of it away, because they know it could one day yield valuable insights.
Next is the increasingly real-time nature of data. More of the data organizations generate must be captured, understood, and acted on almost immediately. An example is data that indicates a cyberattack, which is most valuable during the brief period when you can take action to prevent a data breach.
Third is the problem of latency. Data is often spread over vast geographies. How do you perform real-time analytics across feeds from San Francisco, Chicago, and New York? Do you transmit the data to a central location? Can you aggregate and analyze the data close to its sources?
The final challenge is deep analytics. Cyberattack data can later help you recognize anomalous activity to better protect systems in the future. But there’s also value in doing deep analytics on real-time data. For instance, real-time analysis of sensor data can help equipment or vehicles operate optimally in changing conditions.
Achieving a Stateless Data Architecture
To conquer the challenges of agile data, organizations will need to implement a stateless data architecture. Achieving that objective requires three key components:
1. Global data mesh. Organizations need to move beyond thinking in terms of individual databases, data warehouses, and data lakes, and instead think in terms of a global data network, or data mesh. A global data mesh abstracts away the complexity of managing data and applications separately so that data becomes stateless.
A data mesh allows you to intelligently move data close to compute and analysis, or move compute and analysis close to data. That gives you great flexibility in how you perform data analysis or deploy applications.
But a data mesh isn’t about simply processing data at the edge. Rather, it enables you to process data anywhere and everywhere. It connects data across physical boundaries like data centers, clouds, or geographies, and across logical boundaries like company departments or supply chain partners.
A data mesh lets you simply tell the system you want to perform analytics on a set of data. The system will then find the data, aggregate it, and place the analytics as close to the data as possible. You no longer have to keep asking, “Where’s the data? Is it in the cloud? Is it at the edge? Has it been replicated?” That’s all handled by the data mesh.
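The dispatch idea can be sketched in a few lines. This is a conceptual toy, not a real data mesh API; the class and method names (`DataMesh`, `register`, `analyze`) are invented for illustration:

```python
# Toy sketch: a mesh registry tracks where each dataset fragment lives, and
# analytics are dispatched to the data rather than the data being hauled to
# a central store. The caller never asks "where is the data?"
class DataMesh:
    def __init__(self):
        self.fragments = {}  # dataset name -> {location: rows}

    def register(self, name, location, rows):
        self.fragments.setdefault(name, {})[location] = rows

    def analyze(self, name, fn):
        # Run fn against each fragment "in place" (here, per location),
        # and return the partial results; the mesh resolves locations.
        return {loc: fn(rows) for loc, rows in self.fragments[name].items()}


mesh = DataMesh()
mesh.register("sensor_temps", "san_francisco", [18, 19, 21])
mesh.register("sensor_temps", "chicago", [25, 27])
print(mesh.analyze("sensor_temps", max))  # per-location maxima
```

A real mesh would add discovery, replication awareness, and placement of compute, but the caller-facing contract is the same: name the dataset and the analysis, and let the system do the rest.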
Just as important, a data mesh makes your data fungible, because it can transform the data into an API. Once the data is an API, you can connect it to clouds, applications, dashboards, workflows, and more.
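A minimal sketch of what "data as an API" means in practice, with invented names: consumers talk to a query interface, never to the underlying storage, so the same dataset can plug into dashboards, workflows, or cloud apps unchanged.

```python
import json


# Toy illustration: wrap a dataset behind a query API. The storage detail
# (here, a plain list) is hidden; consumers only see the interface.
class DatasetAPI:
    def __init__(self, records):
        self._records = records

    def query(self, **filters):
        """Return records matching all filters, serialized as JSON."""
        hits = [r for r in self._records
                if all(r.get(k) == v for k, v in filters.items())]
        return json.dumps(hits)


api = DatasetAPI([
    {"city": "Chicago", "load_mw": 410},
    {"city": "New York", "load_mw": 620},
])
print(api.query(city="Chicago"))  # JSON any consumer can ingest
```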
2. Distributed framework. A stateless data architecture also requires a distributed framework. Today, most organizations send data to the cloud, because that’s where their compute is. But to manage agile data, you need to run analytics, machine learning (ML), and artificial intelligence (AI) locally, where data has value.
A distributed framework involves a set of data services that are stateful and that take responsibility for managing state with consistency, accuracy guarantees, and time-bound replication guarantees. The data services abstract state away from the applications that process the data. That way, when you perform analytics on geographically dispersed datasets, the data appears to be stateless.
3. Global data protection and sharing. The final element of a stateless data architecture involves data protection and secure data sharing. Your organization likely manages data in jurisdictions with financial and privacy regulations that prevent you from removing certain information from its country of origin. But you can use metadata representations of that information. You can tokenize the data and remove the tokens from those jurisdictions.
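The tokenization idea can be sketched as follows. This is a simplified illustration (the `TokenVault` name is invented): the mapping from tokens back to values stays in the country of origin, while the opaque tokens themselves can travel.

```python
import secrets


# Toy sketch: sensitive values stay in a vault inside their jurisdiction;
# only opaque tokens are exported for processing elsewhere.
class TokenVault:
    def __init__(self):
        self._vault = {}  # token -> original value; never leaves the jurisdiction

    def tokenize(self, value):
        token = "tok_" + secrets.token_hex(8)  # random, reveals nothing
        self._vault[token] = value
        return token  # safe to export

    def detokenize(self, token):
        return self._vault[token]  # only callable in-jurisdiction


vault = TokenVault()
token = vault.tokenize("DE89 3704 0044 0532 0130 00")  # e.g. an IBAN
# `token` can be shipped to analytics abroad; the IBAN itself cannot.
assert vault.detokenize(token) == "DE89 3704 0044 0532 0130 00"
```

Production systems add format-preserving tokens, access controls, and audited detokenization, but the separation is the same: the token is stateless and portable, the value is not.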
In addition, you probably share data with various stakeholders. Increasingly, you’ll want to expose only portions of your datasets to each stakeholder. You can achieve that through confidential computing, which uses the CPU to set aside a protected region of memory as a secure enclave. Data in the enclave is encrypted in such a way that it remains encrypted even while in use. So you can permit stakeholders to query specific aspects of a dataset without seeing all aspects.
Use Cases as Limitless as Your Data
The technology to achieve a stateless data architecture is available today, and the use cases are virtually limitless. Here’s one example from public sector and critical infrastructure. California is grappling with a changing climate that’s resulting in more spikes in energy use. It’s also home to a growing fleet of electric vehicles that further strain an electric grid designed for smaller loads with fewer peaks.
Real-time analysis of streaming data can enable grid operators to follow and better predict electricity-use patterns in real time. They could map exactly when and where industrial, commercial, and residential customers are switching on air-conditioning, charging vehicles, turning on stoves, turning off lights, and so on. Operators of power plants, transmission lines, and distribution networks could then precisely match demand with supply, dynamically routing electricity in predictable quantities in real time.
Any enterprise that needs to capture and analyze agile data in multiple locations – organizations in financial services, manufacturing, retail, transportation, life sciences, public sector, and so on – can benefit from a stateless data architecture. You’re probably already imagining how your organization could too.