Where Does the Data Live?

By William Peterson.

Where does the data live? A simple question that can have a very complex answer. But the reality may be that with a data fabric the answer doesn’t matter. Why? As we pointed out in the first blog in this series:

“The next generation of Data Management will effectively handle the diversity of data types, data access, and ecosystem tools needed to manage data as an enterprise resource regardless of the underlying infrastructure and location.”

The idea is that if we can abstract away the complexity of determining where data is located and how to access it, we can manage data across silos, workloads, tools, and applications through a data fabric without ever needing to know exactly where the data lives.
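
To make that abstraction concrete, here is a minimal Python sketch of a location-transparent read. It is a toy under stated assumptions, not how any particular data fabric product works: the `DataFabric` class, its method names, the two in-memory "stores," and the logical paths are all hypothetical and exist only for illustration.

```python
from typing import Callable, Dict


class DataFabric:
    """Toy namespace: maps logical names to whichever backend holds the data.

    Hypothetical illustration only; not a real product API.
    """

    def __init__(self) -> None:
        self._backends: Dict[str, Callable[[str], bytes]] = {}
        self._catalog: Dict[str, str] = {}  # logical name -> location

    def register_backend(self, location: str, reader: Callable[[str], bytes]) -> None:
        """Register a function that can read a named object at a location."""
        self._backends[location] = reader

    def place(self, logical_name: str, location: str) -> None:
        """Record (or update) where a logical name currently lives."""
        if location not in self._backends:
            raise KeyError(f"unknown location: {location}")
        self._catalog[logical_name] = location

    def read(self, logical_name: str) -> bytes:
        """Callers pass only the logical name; the fabric resolves the location."""
        location = self._catalog[logical_name]
        return self._backends[location](logical_name)


# Two stand-in stores; in practice these might be an edge device and a cloud bucket.
edge_store = {"/sensors/today": b"edge bytes"}
cloud_store = {"/reports/q1": b"cloud bytes"}

fabric = DataFabric()
fabric.register_backend("edge", edge_store.__getitem__)
fabric.register_backend("cloud", cloud_store.__getitem__)
fabric.place("/sensors/today", "edge")
fabric.place("/reports/q1", "cloud")
```

The application only ever calls `fabric.read("/sensors/today")`. If that data later moves from the edge to the cloud, updating the catalog entry is enough, and the calling code does not change.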

Let’s look at just a few of the components we must get right for a data fabric to provide this abstraction layer and give enterprises a consistent way to manage, secure, govern, and protect data.

Linear Scalability Without Limits

The term scalability takes on many meanings and is often confused with performance. To be clear, here we mean the ability of a data fabric to maintain performance levels under an increasing load. For example, in a fraud detection scenario where the SLA is a sub-10-millisecond response time, the data fabric must maintain that SLA as user, bandwidth, and storage loads increase. The data fabric should also scale linearly across all three location components: edge, on-premises, and cloud.
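
One way to picture "maintaining the SLA under increasing load" is to check a tail-latency percentile at each load level. The Python sketch below is only an illustration: the choice of the 99th percentile, the function names, and the idea of keying samples by load level are assumptions, and only the sub-10-millisecond target comes from the fraud detection example above.

```python
import math


def percentile(samples, q):
    """Nearest-rank percentile: smallest sample with at least a fraction q at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(q * len(ordered)))
    return ordered[rank - 1]


def sla_holds(latencies_by_load, sla_ms=10.0, q=0.99):
    """Return {load_level: True/False} for whether the q-th percentile stays under sla_ms.

    latencies_by_load maps a load level (e.g. requests per second) to the
    response times in milliseconds observed at that level.
    """
    return {
        load: percentile(samples, q) < sla_ms
        for load, samples in latencies_by_load.items()
    }
```

Scalability in the sense used here means the result stays `True` at every load level as load grows, rather than only at the lowest one.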

Architected for Scale, Performance, and Consistency to Simplify Development and Management

Storage tiering, snapshots, mirroring, load balancing, and multi-tenancy support are the key data services needed to ensure exabyte scale and high performance while providing data protection, disaster recovery, security, and management services for disparate data types within the data fabric. The data fabric must also support open APIs and containerization to ensure broad distributed application access and seamless portability of applications across disparate environments.

Data and Metadata Are Distributed to Eliminate Bottlenecks and Points of Failure

The data fabric must handle the complexity of data across locations and hardware infrastructures, from on-premises to the cloud to the edge, as well as in containers. Achieving this allows the data and metadata to be distributed, ensuring that if one physical location in the infrastructure (e.g., an edge node) goes down, the application or workload can still execute on the other infrastructure components in the data fabric.
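
The failover behavior described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not a description of any real data fabric: `Location`, `ReplicatedFabric`, the replica count of two, and the hash-based placement are all hypothetical. Placement is computed from the key itself, so there is no single metadata server to become a bottleneck or point of failure. The sketch does not model writes during an outage or resynchronization afterward.

```python
import zlib
from dataclasses import dataclass, field


@dataclass
class Location:
    """A stand-in for one physical location (edge node, data center, cloud region)."""
    name: str
    up: bool = True
    store: dict = field(default_factory=dict)


class ReplicatedFabric:
    """Toy fabric that keeps each key on several locations (hypothetical design)."""

    def __init__(self, locations, replicas=2):
        self.locations = list(locations)
        if not 1 <= replicas <= len(self.locations):
            raise ValueError("replicas must be between 1 and the number of locations")
        self.replicas = replicas

    def _replica_set(self, key):
        # A stable hash picks the starting location, so any client can compute
        # placement on its own without asking a central metadata service.
        start = zlib.crc32(key.encode("utf-8")) % len(self.locations)
        return [
            self.locations[(start + i) % len(self.locations)]
            for i in range(self.replicas)
        ]

    def put(self, key, value):
        """Write the value to every location in the key's replica set."""
        for location in self._replica_set(key):
            location.store[key] = value

    def get(self, key):
        """Read from the first replica that is up and holds the key."""
        for location in self._replica_set(key):
            if location.up and key in location.store:
                return location.store[key]
        raise LookupError(f"no live replica holds {key!r}")


fabric = ReplicatedFabric(
    [Location("edge"), Location("on-premises"), Location("cloud")]
)
fabric.put("txn-42", "approved")
```

If the first location in the replica set for `"txn-42"` is marked as down, `fabric.get("txn-42")` still returns the value from the second replica. Only when every replica is down does the read fail.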

What’s Next?

The next entry in this blog series will answer the question: What do I know about the data I have?
