By Andrew Sohn and Avi Kalderon.
There’s a lot of talk in the data community that data needs to be managed as a corporate asset and treated like the other assets in a corporation’s portfolio. Assets a corporation typically manages include product parts, real estate, human capital, furniture, computer equipment, and finished goods for sale.
Common principles are used to manage each of these types of assets. An inventory list is created and maintained that associates each item with important attributes such as location, age, usage, financial information, and supplier information. As an item is logged into the inventory system, these critical attributes are captured, and they are subsequently updated at each step in the item’s supply chain.
The assets are typically managed in place. Product parts and finished products are managed in their designated warehouse locations; you would not ship all of them to a central location just to pick the desired items and act on them. The inventory system has a holistic view of all the assets and the information about them. It identifies the location(s) of the required items, sends instructions to the warehouse to ship only the items of interest, and moves those items through the next steps of the process. Likewise, it’s not practical to move buildings or people around just to perform reporting or other actions on those assets.
Most importantly, at most corporations, assets are the responsibility of individuals to manage. An asset manager, warehouse manager, parts manager, or similar role is put in charge of all the processes and procedures required to manage those assets and is ultimately responsible for the overall inventory, both its count and its value.
To be effective in managing data as a corporate asset, you must follow the same type of procedures as you would with other assets. One of the most critical components is an inventory of the corporate data, including sufficient metadata. Not surprisingly to data professionals, while appointing a CDO or equivalent is an emerging trend, with mixed success even in the more mature enterprises, most companies do not have a person whom they hold accountable for the comprehensive accounting of their data assets. Only a few have a program in place to create and maintain an inventory of their data assets in a manner that can be utilized effectively. Addressing that gap gets into the practice of data governance, which is beyond the scope of this article.
Once a sufficient, usable subset of the inventory is available, you gain the ability to act upon it, either passively, for reporting and analysis purposes, or actively, creating competitive advantage or supplying it to downstream consumers within and outside of the organization. Given the unique nature of data compared to other assets, as well as past technology limitations, the practice of acting on data has differed from that of the other assets, and it introduces considerable additional complexity and risk.
Data, unlike physical assets, can be, and often is, duplicated and moved from its primary origination system to other locations. It’s not uncommon for data to be duplicated many times before it is consumed. In a well-managed data ecosystem, each duplication and move requires complex and time-consuming systems to ensure the authenticity and accuracy of the replicated data. In companies where data controls are less mature, it’s almost guaranteed that each movement will degrade data quality and introduce trust issues that take significant effort and cost to resolve.
To address this problem, the maturing technology stack known as Data Virtualization can be very useful. Data Virtualization’s main value proposition is that it allows data housed across several different locations to appear as a single, integrated source. The minimum amount of data is moved, and any replicated data remains under the management of the data virtualization system. (This scenario is also referred to as a logical data warehouse.)
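The federated-view idea can be illustrated with a minimal sketch. This is not any particular DV product; it uses Python’s built-in sqlite3, with two attached in-memory databases standing in for separate source systems (the `crm` and `billing` names and tables are invented for illustration). A single view presents their rows as one integrated source while the data stays in its originating databases:

```python
import sqlite3

# One connection plays the role of the virtualization layer; the two
# attached in-memory databases stand in for separate source systems.
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH ':memory:' AS crm")      # hypothetical CRM source
conn.execute("ATTACH ':memory:' AS billing")  # hypothetical billing source

conn.execute("CREATE TABLE crm.customers (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE billing.customers (id INTEGER, name TEXT)")
conn.execute("INSERT INTO crm.customers VALUES (1, 'Acme')")
conn.execute("INSERT INTO billing.customers VALUES (2, 'Globex')")

# The 'virtual' integrated source: consumers query one view, while the
# rows remain in their originating databases and no copies are made.
conn.execute("""
    CREATE TEMP VIEW all_customers AS
    SELECT id, name, 'crm' AS source FROM crm.customers
    UNION ALL
    SELECT id, name, 'billing' AS source FROM billing.customers
""")

rows = conn.execute("SELECT * FROM all_customers ORDER BY id").fetchall()
print(rows)
```

A real DV platform does far more (query pushdown, caching, heterogeneous connectors), but the consumer-facing contract is the same: one logical view, many physical locations.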
Having a holistic view of the data that matters to the company’s business processes and value generation is a powerful tool. It makes it possible to identify the organization’s critical data elements and put the necessary controls on them at a centralized location, with visibility and transparency into the available inventory. It also makes it easier for users to find and get access to that data.
Modern Data Virtualization (DV) platforms also contain other capabilities necessary to ensure data adheres to enterprise-wide standards and policies. By provisioning data through a DV layer, a company can ensure that a consumer only gets the data from the approved source for a request; only gets access to authorized data; does not receive personally identifiable information if not required; and a host of other functions.
Data Virtualization is not a magic bullet and may need to be combined with other tools to meet your organization’s data management requirements. There are many factors to consider in deciding where to apply Data Virtualization; performance, latency, and impact on production systems are among the common issues. While Data Virtualization toolsets can perform simple data transformation and data quality processing, it is common to see them implemented in tandem with more sophisticated Master Data Management and ETL tools.
Managing data as a corporate asset requires an organization to take its best practices for other assets and apply them to data. Besides a robust set of operational practices and procedures, it also requires a sophisticated toolset. Data Virtualization is one you should explore.