Every public and private sector organization has a demand for information. The business information demand is an organization’s continuously increasing, constantly changing need for current, accurate, integrated information, often on short or very short notice, to support its business activities. It’s a very dynamic demand for information to support a business that constantly changes. The problem in most organizations is that the business information demand is not being met because of a disparate data resource.
Disparate data are data that are essentially not alike, or are distinctly different in kind, quality, or character. They are unequal and cannot be readily integrated to meet the business information demand. They are low quality, defective, discordant, ambiguous, heterogeneous data. A disparate data resource is a data resource that is substantially composed of disparate data that are dis-integrated and not subject oriented. It’s in a state of disarray, where its low quality does not, and cannot, adequately support an organization’s business information demand.
Comparate data are data that are alike in kind, quality, and character, and are without defect. They are concordant, homogeneous, nearly flawless, nearly perfect, high-quality data that are easily understood and readily integrated. A comparate data resource is a data resource composed of comparate data that adequately support the current and future business information demand. It’s an integrated, subject oriented, business driven data resource that is the official record of reference for the organization’s business.
How does an organization go about resolving its disparate data resource and creating a comparate data resource? The answer is formal data resource integration.
Data resource integration is the thorough understanding of existing disparate data within a common data architecture, the designation of preferred data, and the development of a comparate data resource based on those preferred data. The data resource integration concept is to resolve the disparate data and produce a comparate data resource that meets the current and future business information demand. An integrated data resource is a data resource where all data are integrated within a common context and are appropriately deployed for maximum use supporting the current and future business information demand.
The problem is that most data integration approaches integrate data temporarily for a specific purpose but do not permanently integrate the organization’s data resource. Data integration is the merging of data from multiple disparate sources, usually based on some record of reference, to provide a single output, such as an interim database or report. It does not resolve existing data disparity, may further increase data disparity, and is seldom done within a common context.
The common context for data resource integration is a common data architecture. The Common Data Architecture is a single, formal, comprehensive, organization-wide data architecture that provides a common context within which all data are understood, documented, integrated, and managed. It provides the concepts, principles, and techniques for properly managing all data in an organization’s data resource. A common data architecture (not capitalized) represents the actual common data architecture built by an organization for its data resource, based on the Common Data Architecture.
Traditional data integration is usually either semantic or structural. Formal data resource integration includes both semantic and structural data integration. A choice between semantics and structure does not need to be made. Within the concept of a common data architecture, both semantics and structure are treated equally.
Traditional data integration usually relies on a system of reference, record of reference, or system of record. A single system of reference is identified and that system of reference becomes the comparate data resource. All other sources are then discarded, and miraculously, data integration is complete. However, an organization’s disparate data resource is usually so entangled with databases, applications, bridges, feeds, and so on, that it’s nearly impossible to find a single system of reference. Data resource integration, by contrast, considers all existing data to determine the best source for a comparate data resource.
Traditional data integration often includes two actions that are expressly forbidden in data resource integration. The first is the brute-force-physical approach where people jump right into the physical changes to the databases. Adjustments are made to the databases and to the applications in an attempt to resolve data disparity. However, little progress is made toward data resource integration, and the result is often worse than the initial situation. The second is the suck-and-squirt approach where data from a system of reference are sucked out of one database, pushed through some superficial data cleansing tool, and squirted into another database under the assumption that the result is comparate data. Again, little progress is made toward formal data resource integration, and the result is often worse than the initial situation.
Traditional data integration often refers to data migration. Migration is a movement to change location periodically, especially by moving seasonally from one region or country to another. It’s wandering without a long-term purpose, or wandering with only current objectives in mind, like nomadic wandering or bird migration. It implies the lack of a permanent settlement, especially as a result of seasonal or periodic movement. Data resource transition is the transition of an organization’s data resource from a disparate data resource state, through a formal data resource state and a virtual data resource state, to a comparate data resource state. It’s a transition, not a migration.
The disparate data resource state is the current state of a disparate data resource in an organization and is outside the context of a common data architecture. The formal data resource state is a necessary state where the disparate data are readily understood within the context of a common data architecture. The virtual data resource state is an interim state between the formal data resource state and the comparate data resource state where real-time data transformation is performed to produce interim comparate data according to formal data transformation rules. The comparate data resource state is the desired state where disparate data have been substantially and permanently transformed to comparate data and the disparate data are substantially gone from the organization’s data resource. It’s a persistent state where the data are subject oriented according to the organization’s perception of the business world.
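The four states can be pictured as an ordered sequence. The Python sketch below is illustrative only: the state names, the `NEXT_STATE` table, and the assumption that each step moves forward exactly one state are ours for this example, not part of the Common Data Architecture itself.

```python
from enum import Enum

class DataResourceState(Enum):
    """Hypothetical labels for the four data resource states."""
    DISPARATE = 1   # outside any common data architecture
    FORMAL = 2      # understood within a common data architecture
    VIRTUAL = 3     # interim comparate data produced by real-time transformation
    COMPARATE = 4   # persistent, subject-oriented comparate data

# Assumed rule for this sketch: the transition moves forward one state at a time.
NEXT_STATE = {
    DataResourceState.DISPARATE: DataResourceState.FORMAL,
    DataResourceState.FORMAL: DataResourceState.VIRTUAL,
    DataResourceState.VIRTUAL: DataResourceState.COMPARATE,
}

def can_transition(current: DataResourceState, target: DataResourceState) -> bool:
    """True only if target is the state immediately after current."""
    return NEXT_STATE.get(current) is target

def transition_path(start: DataResourceState = DataResourceState.DISPARATE) -> list:
    """List every state visited from start through the comparate state."""
    path = [start]
    while path[-1] in NEXT_STATE:
        path.append(NEXT_STATE[path[-1]])
    return path

print([s.name for s in transition_path()])
# ['DISPARATE', 'FORMAL', 'VIRTUAL', 'COMPARATE']
print(can_transition(DataResourceState.DISPARATE, DataResourceState.COMPARATE))
# False: under this sketch's assumption, the intermediate states cannot be skipped
```

The comparate state has no successor in the table, which reflects its description as a persistent state rather than another stop along the way.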
Data resource integration includes four formal phases: data inventory, data cross-referencing, preferred data architecture designation, and data transformation.
The data inventory concept is that all data at the organization’s disposal will be completely and comprehensively inventoried, and documented in one location that is readily available to anyone in the organization, so that the organization at large understands the content, meaning, and quality of those data. The data inventory objective is to identify, inventory, and document all data that currently exist in the organization’s data resource or are readily available to the organization so that those data can be readily understood and used to support the current and future business information demand.
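A minimal Python sketch of the "one location" idea, assuming a hypothetical `InventoryEntry` record that holds the source, the data name as found, a description, and a quality note. A real data inventory would document much more (structure, integrity, ownership). The field set and the `DataInventory` class are assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class InventoryEntry:
    """One inventoried data item (hypothetical field set)."""
    source: str             # database, file, or feed where the item was found
    data_name: str          # name exactly as it appears in the source
    description: str        # what the item appears to mean
    quality_note: str = ""  # known defects or uncertainty

class DataInventory:
    """A single shared catalog holding every inventoried data item."""

    def __init__(self) -> None:
        self._entries = {}  # keyed by (source, data_name)

    def add(self, entry: InventoryEntry) -> None:
        key = (entry.source, entry.data_name)
        if key in self._entries:
            raise ValueError(f"{key} is already inventoried")
        self._entries[key] = entry

    def search(self, term: str) -> list:
        """Case-insensitive search of data names and descriptions."""
        term = term.lower()
        return [e for e in self._entries.values()
                if term in e.data_name.lower() or term in e.description.lower()]

    def __len__(self) -> int:
        return len(self._entries)

inv = DataInventory()
inv.add(InventoryEntry("PayrollDB", "EMP_BDATE", "Employee birth date",
                       "some values are 00/00/00"))
inv.add(InventoryEntry("HR_Feed", "BirthDt", "Date of birth of the employee"))
inv.add(InventoryEntry("HR_Feed", "DeptCd", "Department code"))
print([e.source for e in inv.search("birth")])  # ['PayrollDB', 'HR_Feed']
```

Searching descriptions as well as names matters here: `EMP_BDATE` is found only through its description, which is the kind of understanding the inventory is meant to make available to everyone.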
The data cross-reference concept is that the inventoried disparate data are cross-referenced to a common data architecture to further increase the understanding of those disparate data within a common context. The initial understanding gained during data inventorying is increased through a cross-referencing of the inventoried disparate data to a common data architecture. The data cross-reference objective is to thoroughly understand the content, meaning, structure, and integrity of all data at the organization’s disposal within the context of a common data architecture.
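As an illustration only, cross-referencing can be sketched as a mapping from each inventoried data item to a data characteristic in the common data architecture. The source names and characteristic names (such as `Employee.Birth Date`) are hypothetical. Grouping by characteristic exposes redundant variations of the same fact, and items with no cross-reference show where understanding is still incomplete.

```python
from collections import defaultdict

# Hypothetical inventoried items: (source, data name as found in the source)
inventory = [
    ("PayrollDB", "EMP_BDATE"),
    ("HR_Feed", "BirthDt"),
    ("HR_Feed", "DeptCd"),
    ("Benefits", "DOB"),
    ("Benefits", "FILLER_7"),
]

# Cross-references from disparate data items to common data architecture
# characteristics (illustrative names, not from any real architecture).
cross_reference = {
    ("PayrollDB", "EMP_BDATE"): "Employee.Birth Date",
    ("HR_Feed", "BirthDt"): "Employee.Birth Date",
    ("Benefits", "DOB"): "Employee.Birth Date",
    ("HR_Feed", "DeptCd"): "Department.Code",
}

def variations_by_characteristic(xref):
    """Group the disparate variations cross-referenced to each characteristic."""
    grouped = defaultdict(list)
    for item, characteristic in xref.items():
        grouped[characteristic].append(item)
    return dict(grouped)

def unreferenced(items, xref):
    """Inventoried items not yet understood within the common context."""
    return [item for item in items if item not in xref]

groups = variations_by_characteristic(cross_reference)
print(len(groups["Employee.Birth Date"]))        # 3 variations of one fact
print(unreferenced(inventory, cross_reference))  # [('Benefits', 'FILLER_7')]
```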
The preferred data architecture concept is that the redundancy and variability of disparate data will be resolved through the designation of a preferred data architecture and the transformation of disparate data to comparate data according to that preferred data architecture. The preferred data architecture objective is to designate the preferred representation of all data at the organization’s disposal so those data can be readily understood and shared within and outside the organization. The objective is to take a common data architecture that was enhanced to cover the data cross-references and designate preferred components that will become a pattern for transforming disparate data to comparate data.
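A hedged sketch of designation, simplified by assuming the cross-referenced variations of one characteristic differ only in format. The designation itself is recorded rather than computed, on the assumption that people who understand the business choose the preferred form; the code only identifies which variations must then be transformed. `Variation`, `preferred_designations`, and `needs_transformation` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Variation:
    """One disparate variation of a common data characteristic (hypothetical)."""
    source: str
    name: str
    data_format: str

# Variations of one characteristic found during cross-referencing (illustrative).
birth_date_variations = [
    Variation("PayrollDB", "EMP_BDATE", "MM/DD/YY"),
    Variation("HR_Feed", "BirthDt", "YYYY-MM-DD"),
    Variation("Benefits", "DOB", "DD.MM.YYYY"),
]

# Assumed designation: a recorded business decision, not derived by the code.
preferred_designations = {
    "Employee.Birth Date": Variation("Preferred", "Employee Birth Date", "YYYY-MM-DD"),
}

def needs_transformation(characteristic, variations, designations):
    """Variations whose format differs from the preferred designation."""
    preferred = designations[characteristic]
    return [v for v in variations if v.data_format != preferred.data_format]

to_transform = needs_transformation(
    "Employee.Birth Date", birth_date_variations, preferred_designations)
print([v.source for v in to_transform])  # ['PayrollDB', 'Benefits']
```

Real variations can also differ in name, code set, meaning, and precision; comparing formats alone keeps the example small.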
The data resource transformation concept states that all data transformation, whether disparate data to comparate data or comparate data to disparate data, will be done within the context of a common data architecture, using the preferred data architecture designations, according to formal data transformation rules. The best existing disparate data are extracted and transformed to comparate data to create a single, high-quality version of truth about the business. The data resource transformation objective is to transform the best of the existing disparate data to a high-quality comparate data resource so it can support the current and future business information demand. It’s a precise, detailed, and very rigorous process that creates a high-quality comparate data resource. The data transformation process includes a formal Extract-Transform-Load sequence that consists of three extract processes, five transform processes, and three load processes.
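The following Python sketch reduces the Extract-Transform-Load sequence to one extract, one transform, and one load step; it does not reproduce the three extract, five transform, and three load processes, whose details are not given here. It assumes hypothetical formal rules, one per source format, that convert birth dates to the preferred YYYY-MM-DD form, and it flags conflicting values for review rather than silently choosing one.

```python
from datetime import date, datetime

# Hypothetical formal transformation rules: one per source format.
def from_mm_dd_yy(value: str) -> date:
    # Assumed rule, stated explicitly: two-digit years belong to the 1900s.
    month, day, yy = (int(part) for part in value.split("/"))
    return date(1900 + yy, month, day)

def from_dd_mm_yyyy(value: str) -> date:
    return datetime.strptime(value, "%d.%m.%Y").date()

def from_iso(value: str) -> date:
    return date.fromisoformat(value)

TRANSFORMATION_RULES = {
    "MM/DD/YY": from_mm_dd_yy,
    "DD.MM.YYYY": from_dd_mm_yyyy,
    "YYYY-MM-DD": from_iso,
}

def extract(sources):
    """Extract: yield (employee id, raw value, source format) from every source."""
    for source_format, records in sources.items():
        for employee_id, raw in records:
            yield employee_id, raw, source_format

def transform(extracted):
    """Transform: apply the formal rule for each value's source format."""
    for employee_id, raw, source_format in extracted:
        yield employee_id, TRANSFORMATION_RULES[source_format](raw).isoformat()

def load(transformed, target):
    """Load: store preferred values, flagging conflicts instead of overwriting."""
    conflicts = []
    for employee_id, value in transformed:
        if employee_id in target and target[employee_id] != value:
            conflicts.append((employee_id, target[employee_id], value))
        else:
            target[employee_id] = value
    return conflicts

sources = {
    "MM/DD/YY": [("E1", "03/14/62"), ("E2", "11/02/75")],
    "DD.MM.YYYY": [("E2", "02.11.1975"), ("E3", "30.06.1980")],
    "YYYY-MM-DD": [("E1", "1962-03-15")],
}
comparate = {}
conflicts = load(transform(extract(sources)), comparate)
print(comparate)   # {'E1': '1962-03-14', 'E2': '1975-11-02', 'E3': '1980-06-30'}
print(conflicts)   # [('E1', '1962-03-14', '1962-03-15')]
```

Writing the two-digit-year rule as an explicit, named function keeps an assumption such as "all years fall in the 1900s" visible and reviewable. In a fuller treatment, the conflict for `E1` would be resolved by the designated best source rather than left for review.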
Organizations need to seriously consider formal data resource integration, rather than transient data integration, if they want to fully support their current and future business information demand.
* Material extracted from:
Brackett, Michael. Data Resource Integration: Understanding and Resolving a Disparate Data Resource. New Jersey: Technics Publications, LLC, 2012. (technicspub.com)