Zero-Copy Integration: How Small Data Practices Will Replace Big Data

Read more about author Brian Platz.

The future of data is small.

As organizations grapple with ever-increasing amounts of data, the limits of the big data movement are becoming clear. Over the last two decades, big data has provided benefits in the form of cutting-edge software that made the generation, collection, and amalgamation of data widely available to organizations. These positive impacts are both wide-reaching and apparent, from optimal route planning in aviation to fraud detection and risk management in the financial sector and even tracking infectious diseases on a federal level.

But today, the work of storing, cleaning, preparing, and structuring data has begun to outstrip our ability to glean what we’d like from all this information. Big data can sometimes be too big for us to actually analyze and leverage at the pace of real-time business.

Turning Big Data into Small Data

Perhaps the solution to this problem is small data. Small data is information that is more user-friendly and accessible and supplies measurable benefits. The goal for small data is to provide analysts with just the data they need, at the right time, for them to make the most well-informed and timely decisions.

There are a few different routes companies can take when looking to convert big data into small data. The most straightforward is to launch a company with a data-centric philosophy, built on the foundational understanding that data is as important as any other asset in the company.

What this means in practical terms is creating one set of data for each category required, and developing policies that force employees to extract what they need from the data and return to their respective departments with actionable, accurate information.

This may require organizational groups to reorganize the data they take from the central, “golden record” data. However, only a few specialists tasked with maintaining a dataset’s integrity are authorized to alter the organization’s primary sets of data.

Meanwhile, most other organizations have inadvertently complicated matters by copying datasets, altering the copies, and failing to maintain the integrity of a “golden record” dataset.

Though copying and altering all these datasets originally helped organizations achieve whatever goal lay before them in the short term, the consequences today include siloed datasets that make it impossible for machines to communicate with and extract relevant information from these banks of data.

In a data-centric architecture, operations purposefully revolve around the data. It also means that security and governance protocols may be inserted into the data itself, so it’s able to defend itself.
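One way to picture data that carries its own governance is a record that travels with its access policy attached. The sketch below is only an illustration under assumed names: `GovernedRecord`, `read_policy`, `view_for`, and the roles `analyst` and `steward` are hypothetical and do not come from any particular product.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class GovernedRecord:
    """A record that carries its own access policy (hypothetical structure)."""

    values: dict
    # Maps each field name to the set of roles allowed to read that field.
    read_policy: dict = field(default_factory=dict)

    def view_for(self, role):
        """Return only the fields this role is permitted to read."""
        return {
            name: value
            for name, value in self.values.items()
            if role in self.read_policy.get(name, frozenset())
        }


record = GovernedRecord(
    values={"customer_id": 17, "email": "pat@example.com", "balance": 1200},
    read_policy={
        "customer_id": frozenset({"analyst", "steward"}),
        "email": frozenset({"steward"}),
        "balance": frozenset({"analyst", "steward"}),
    },
)

analyst_view = record.view_for("analyst")  # the email field is withheld
```

Because the policy sits alongside the values, any application that receives the record applies the same rules, and a field with no policy entry is hidden by default.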

The unfortunate truth in today’s private and public sectors, however, is that the vast majority of companies and organizations are not in a position to abruptly shift to becoming data-centric. Those that do shift to this strategy benefit from the ability to grow and scale from the ground up.

The Zero-Copy Integration Solution

Optimally, problems that stem from duplicated datasets would be solved with zero-copy integration – the on-demand integration of data without having to copy or otherwise physically move it.

This process pulls data together without pasting it into data-storage units such as pools, lakes, and warehouses. This allows for federated queries across multiple datasets, where analysts can leverage golden records (sources of truth) without having to copy them over into another data silo. 
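A federated query of this kind can be sketched on a very small scale with Python's built-in `sqlite3` module, whose `ATTACH DATABASE` statement lets one connection join tables that live in separate database files. The two source files, their table names, and the sample rows are assumptions made up for illustration; a production system would federate across networked sources, not local files.

```python
import os
import sqlite3
import tempfile


def make_source(path, ddl, insert_sql, rows):
    """Create one stand-in source system holding its own golden record."""
    conn = sqlite3.connect(path)
    conn.execute(ddl)
    conn.executemany(insert_sql, rows)
    conn.commit()
    conn.close()


def federated_totals(crm_path, billing_path):
    """Join the two sources where they live; nothing is copied into a new store."""
    conn = sqlite3.connect(":memory:")  # the query engine holds no tables itself
    conn.execute("ATTACH DATABASE ? AS crm", (crm_path,))
    conn.execute("ATTACH DATABASE ? AS billing", (billing_path,))
    rows = conn.execute(
        """
        SELECT c.name, SUM(i.amount) AS total
        FROM crm.customers AS c
        JOIN billing.invoices AS i ON i.customer_id = c.id
        GROUP BY c.id, c.name
        ORDER BY c.name
        """
    ).fetchall()
    conn.close()
    return rows


# Hypothetical sources: a CRM system and a billing system, each in its own file.
workdir = tempfile.mkdtemp()
crm_path = os.path.join(workdir, "crm.db")
billing_path = os.path.join(workdir, "billing.db")

make_source(
    crm_path,
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)",
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "Acme"), (2, "Globex")],
)
make_source(
    billing_path,
    "CREATE TABLE invoices (customer_id INTEGER, amount INTEGER)",
    "INSERT INTO invoices VALUES (?, ?)",
    [(1, 100), (1, 250), (2, 75)],
)

totals = federated_totals(crm_path, billing_path)
```

The analyst gets a combined answer, yet each system remains the single source of truth for its own table, and no third copy exists to drift out of sync.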

Zero-copy integration also allows for “data clean rooms,” where sensitive data from different sources can be compared and analyzed without ever revealing the actual data. This can be done using cryptography that does not share data yet still is able to analyze it and identify relevant bits for multiparty computation.

For example, an industry regulator might want to learn how many customers a number of companies have in common. The customers own the data and are able to adhere to privacy and compliance practices. But using cryptographic technology, the regulator can get the answer without any individual records being shared.
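The article does not name a specific technique, but one well-known way to count shared customers is a Diffie-Hellman-style private set intersection. Each company hashes its customer IDs and raises them to a secret exponent, and the other company then applies its own secret. Because exponentiation commutes, a customer both companies hold ends up with the same doubly blinded value, while the raw IDs are never exchanged. The sketch below is a toy under loud assumptions: the modulus is far too small for real use, both parties run inside one function for brevity, and all function names are hypothetical.

```python
import hashlib
import math
import secrets

# A Mersenne prime used as the modulus. Toy-sized: NOT a secure parameter choice.
P = 2**127 - 1


def new_secret():
    """Pick a private exponent invertible mod P-1, so blinding is one-to-one."""
    while True:
        k = secrets.randbelow(P - 3) + 2
        if math.gcd(k, P - 1) == 1:
            return k


def to_group(customer_id):
    """Hash an ID to a nonzero number modulo P."""
    digest = hashlib.sha256(customer_id.encode("utf-8")).digest()
    return 1 + int.from_bytes(digest, "big") % (P - 1)


def blind(customer_ids, secret):
    """First pass: the data holder hashes each ID and applies its own secret."""
    return [pow(to_group(c), secret, P) for c in customer_ids]


def reblind(values, secret):
    """Second pass: the other party applies its secret to already-blinded values."""
    return {pow(v, secret, P) for v in values}


def shared_customer_count(ids_a, ids_b):
    """Count common customers by comparing only doubly blinded values."""
    a, b = new_secret(), new_secret()  # in practice each company keeps its own
    double_a = reblind(blind(ids_a, a), b)
    double_b = reblind(blind(ids_b, b), a)
    return len(double_a & double_b)


company_a = ["alice@example.com", "bob@example.com", "carol@example.com"]
company_b = ["bob@example.com", "carol@example.com", "dave@example.com"]
overlap = shared_customer_count(company_a, company_b)
```

In a real deployment each company would run its half on its own systems, shuffle the blinded values before sending them, and use a vetted cryptographic library with properly sized parameters.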

In recent years, companies across industries have spent tens of millions of dollars and many hours of labor attempting to reorient their Data Management systems to be more efficient, less error-prone, and capable of providing true insights. But the process is unavoidably slow and expensive.

Zero-copy integration capabilities soon will be among the primary types of fuel companies use to scale and remain competitive. Those who adopt the approach suddenly boast a market differentiator. Those who ignore the problem simply will be left behind, likely ceasing to operate.

But buy-in must occur at the executive-team level. Chief information officers understand zero-copy integration is the future. But they need their C-suite colleagues to share that vision.

Without an organization’s top leaders recognizing the need for this type of shift and supplying the resources to enable change, a smooth transition to new and improved systems will be impossible.

Bringing the Future into the Now

The financial costs of transforming to zero-copy integration Data Management systems likely still will deter many companies from taking the leap. Organizations are aware of the competitive advantage zero-copy integration offers, but if the cost exceeds the budget, the pace of change will be slow.

Innovators similar to those who were early adopters of the internet will be the drivers behind making zero-copy integration a reality. These are people with extremely strong motivations to share data and collaborate to bring about huge innovation leaps. 

Academic researchers – including those working with cancer data and other life-changing projects – would fall into this group as well, alongside leaders in the big data movement.

But just as initial hesitation about the internet gave way to widespread acceptance, time will tell how zero-copy integration and data-centric architectures become a critical part of companies’ plans as they look to maintain a competitive edge.

Financial technology companies already are using semantic graph technologies to implement zero-copy integration, and international supply chain companies have recognized the incentives for optimizing their operations by becoming data-centric.

Once the benefits gained by the early adopters of this strategy become obvious, zero-copy integration will shake up how business is done – just like big data did only a few short years ago.