Data Integrity Rules

By Michael Brackett / June 5, 2014

Developing formal business rules is an excellent practice that both business professionals and data management professionals must follow to ensure high quality processes and data. However, business rule development is often oriented toward processes, procedures, and policies rather than toward data resource quality. Formal business rules must also be developed to ensure a high quality data resource and high quality information from that data resource to support an organization’s business information demand.

A rule is an authoritative, prescribed direction for conduct; a usual, customary, or generalized course of action or behavior; a statement that describes what is true in most or all cases; or a standard method or procedure for solving problems. To provide a foundation for developing formal data rules for an organization’s data resource, business rules are divided into six categories representing the six columns of the Zachman Framework: data rules, process rules, network rules, people rules, time rules, and motivation rules. The first three categories are referred to as architectural rules and the last three as behavioral rules. (Adapted from Data Resource Design.)

The emphasis for data resource design and development is on data rules. Data rules are the subset of business rules that deal with the data column of the Zachman Framework. They specify the criteria for maintaining data resource quality. They are not the same as data requirements, just as business rules are not the same as business requirements and network rules are not the same as network requirements, following the principle of independent architectures.

Data rules are divided into five groups: data integrity rules, data source rules, data extraction rules, data translation rules, and data transformation rules. Data integrity rules are important for ensuring data resource quality during data resource design, development, and use. The other four groups are important for data resource integration and will not be described in this article.

Data integrity is one component of the relational model, but no formal criteria were provided for specifying that integrity. The data integrity that was developed was largely oriented toward the physical development and implementation of database management systems and seldom toward logical data integrity from a business perspective. The physical data integrity was usually placed on the data structure, since the structure and integrity components were used for database development and the manipulative component was used for database use. However, that practice often overloaded the data structure, which led to paralysis by analysis, brute force physical development, and brute force implementation.

Precise data integrity rules need to be developed to resolve these problems. Integrity is the state of being unimpaired, the condition of being whole or complete, or the steadfast adherence to strict rules. Data integrity is a measure of how well the data are maintained in the data resource after they are captured or created. It indicates the degree to which the data are unimpaired and complete according to a precise set of rules. Data integrity rules specify the criteria that need to be met to ensure that the data resource contains the highest quality data necessary to support the current and future business information demand.

Precise means clearly expressed, definite, accurate, correct, and conforming to proper form. Precise data integrity rules denotatively specify the criteria for high quality data and reduce or eliminate errors in the data resource. Precise data integrity rules are short statements about constraints that need to be applied or actions that need to be taken on the data when entering the data resource or while in the data resource.

Precise data integrity rules are separate from comprehensive data definitions, although they must be in sync with the data definitions. A data definition explains the data; it is not a data rule, and treating a data definition as a data rule directly impacts the development of comprehensive data definitions. Similarly, precise data integrity rules are separate from proper data structures, although they must be in sync with the data structure.

Precise data integrity rules do not state or enforce accuracy, precision, scale, or resolution. They can only verify the validity of the data. Data accuracy is a measure of how well the data values represent the business world at a point in time or for a period of time. Data precision is how precisely a measurement was made and how many significant digits are included in the measurement. Scale is the ratio of a map distance to the corresponding real world distance. Resolution is the degree of granularity of the data, indicating how small an object can be represented with a specific scale and precision. Data accuracy, precision, scale, and resolution are documented either in the comprehensive data definition or as data values.

Precise data integrity rules do not pertain to completeness or suitability.  Data completeness is a measure of how well the scope of the data resource meets the scope of the current and future business information demand.  Data suitability is how suitable the data are for a specific purpose, which varies with the use of the data.  Data completeness and suitability cannot be documented in data definitions, data structure, or data integrity rules.

Precise data integrity rules do not pertain to volatility or currentness. Data volatility is a measure of how quickly data in the business world change. Data currentness is a measure of how well the data values remain current with the business. Note that currentness is used rather than currency to prevent any confusion with monetary amounts. Data volatility and currentness are best stated in comprehensive data definitions rather than as precise data integrity rules.

Precise data integrity rules are formally and uniquely named according to the data naming taxonomy and supporting vocabulary (A Suitable Descriptive Component). A data integrity rule name is designated with an exclamation mark (!), such as Student. Name, Change!. Data integrity rule versions are designated with left and right angle brackets, such as Student. Name, Change! <1990 – 1998>.

Data integrity rules are normalized to the data resource component which they represent or on which they take action.  Formally naming data integrity rules requires that the data integrity rules be normalized, the same as formally naming data requires that they be normalized.  For example, an account balance data integrity rule takes action on the balance, not on the individual transactions, such as Account. Balance, Derivation!.  When the data are properly normalized and formally named, the development of normalized data integrity rules is relatively easy.
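
As a brief illustration, using the naming conventions described above (the subjects and characteristics here are hypothetical and chosen only for this example), each rule is normalized to the single data characteristic it constrains or derives, not to the data it draws from:

    Account. Balance, Derivation! (acts on the derived balance)
    Account Transaction. Amount, Change! (acts on an individual transaction amount)

The derivation rule belongs to the balance even though it reads the individual transaction amounts, and the change rule belongs to the transaction amount it constrains.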

Data integrity rules can be stated in narrative form or as a formal notation.  The narrative form is often unclear and could easily lead to connotative interpretation and low quality data.  Also, data integrity rule engines are becoming more prominent and a specific format is needed so those engines can process data integrity rules.  Therefore, precise data integrity rules must be stated with a formal notation that provides a denotative interpretation and can be processed by data integrity rule engines.

The When – Then notation with no Else condition is used for precise data integrity rules because it is more acceptable to business professionals. The If – Then notation often implies a mathematical structure, and it also implies an Else condition, which could lead to low quality data because it may not check for all possible situations. The When notation should state every possible condition, and the Else notation should be used only as an error condition for situations that are not valid.
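
As a hypothetical sketch of this structure (the data names, conditions, and wording are invented for illustration and do not follow any published notation exactly), a rule governing changes to a student birth date might read:

    Student. Birth Date, Change!
    When the new birth date is a valid calendar date and is not later than the current date,
    Then replace the existing Student. Birth Date value.
    Else the new value is in error and the existing Student. Birth Date value is retained.

The When clause states the valid situation explicitly, and the Else clause serves only as the error condition for values that do not meet it.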

The symbols used in a formal notation must be acceptable to business professionals and data management professionals. The notation must be based on accepted mathematical and logic notation, and must use symbols readily available on a standard keyboard, such as < for less than, > for greater than, = for equals, <> for not equal to, >< for must equal, & for logical and, | for logical or, || for concatenation, { } for a set, and so on. A special symbol table should not be needed to create precise data integrity rules.
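
A hypothetical example using these keyboard symbols (again, the data names and limits are invented for illustration):

    Student. Credit Hours, Change!
    When Student. Credit Hours > 0 & Student. Credit Hours < 25,
    Then the new value is accepted.
    Else the new value is rejected as an error.

A business professional can read the rule directly from the keyboard symbols, and the same statement is in a consistent form that a data integrity rule engine could process without a special symbol table.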

Precise data integrity rules use a set of common words, much like the common words for formal data names.  A few of the common words are Cardinality! for specifying data cardinality, Change! for specifying actions when data values change, Derivation! for specifying the data derivation algorithm, Domain! for specifying a domain of valid values, Need! for specifying the need, such as Required, Optional, or Prevented, and so on.
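
A few hypothetical examples show how these common words might appear in precise data integrity rules (the data names, values, and cardinalities are invented for illustration, and >< with a set is read here as must equal one of the listed values):

    Student. Class Standing, Domain!
    When Student. Class Standing >< {Freshman, Sophomore, Junior, Senior},
    Then the value is valid.
    Else the value is in error.

    Student. Name, Need!
    When a Student is added to the data resource,
    Then a Student. Name value is Required.

    Student. Enrollment, Cardinality!
    When a Student is active,
    Then the Student is related to one or more Enrollments.

Each rule is formally named with the appropriate common word, normalized to the data it acts on, and stated in the When – Then notation described above.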

Data management professionals must keep data rules separate from other business rules.  They may need business decisions about the development of data rules, but that is not the same as developing other business rules. They must recognize the difference between the types of data rules, and must keep data integrity rules separate from other data rules.  They must understand the need for precise data integrity rules as one of the primary components of the organization’s data resource, and must develop and use precise data integrity rules to ensure high quality data in the organization’s data resource.
