Uniquely identifying people is not an easy task, but it is far from impossible. It is important in both private sector and public sector organizations. However, it is more important in the public sector because public sector organizations share a larger volume of data with one another and engage in a wider range of business activities.
Generally, private sector organizations don’t readily share data about people. They tend to keep their data more protected for competitive reasons, and their data about people is relatively simple due to the narrower range of business activities. Health care is a notable exception because data can be readily shared between many different health care organizations. Public sector organizations regularly share their data, and deal with a wider variety of people across a wider range of business activities. Also, private sector organizations generally have a relatively limited set of identifiers, while the public sector has a much wider range of identifiers.
The designation of unique identifiers for people begins with a definition of a person. A Person is a human being, of any age or size, of any race or ethnicity, of any mental capacity, of any gender, of any physical appearance, living or deceased, including an unborn fetus or a still-born fetus. The definition applies to a wide variety of business activities from law and justice to business licensing, from driver’s licenses to welfare, from health care to taxation, and so on. An organization does not use all components of the definition, but collectively organizations use all components of the definition.
Two primary key designations are useful for developing person identifiers.* Primary key meaning indicates whether or not the primary key is meaningful or meaningless to the business. A meaningful primary key is a primary key that is meaningful to the business. A meaningless primary key is a primary key that has no meaning to the business. Note that the terms meaningful and meaningless are used, rather than intelligent and non-intelligent, because primary keys cannot possess intelligence.
Primary key origin indicates whether the primary key is inherent to the data occurrences or was assigned within the organization and is not inherent to the data occurrences. A natural primary key is a primary key that is an inherent feature of the data occurrences. It is usually assigned outside the organization and is inherited by the organization. A natural primary key is usually, though not always, a meaningful primary key. An artificial primary key is a primary key that is arbitrarily assigned to the data occurrences by the organization to support its management of the data. An artificial primary key is usually, though not always, a meaningless primary key.
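The two designations are independent of each other, so a primary key can be described by a pair of values. The following is a minimal Python sketch of that idea, not a prescribed implementation. The class names, the `is_typical` helper, and the example keys with their meaning values are illustrative assumptions.

```python
from dataclasses import dataclass
from enum import Enum


class KeyMeaning(Enum):
    MEANINGFUL = "meaningful"
    MEANINGLESS = "meaningless"


class KeyOrigin(Enum):
    NATURAL = "natural"
    ARTIFICIAL = "artificial"


@dataclass(frozen=True)
class PrimaryKeyDesignation:
    """Hypothetical record describing a key by both designations."""
    name: str
    meaning: KeyMeaning
    origin: KeyOrigin

    def is_typical(self) -> bool:
        # The usual pairings are natural with meaningful and
        # artificial with meaningless; other pairings are possible.
        return (self.origin is KeyOrigin.NATURAL) == (
            self.meaning is KeyMeaning.MEANINGFUL
        )


# Example values are assumptions chosen only to exercise each pairing.
fingerprint = PrimaryKeyDesignation(
    "Fingerprint", KeyMeaning.MEANINGFUL, KeyOrigin.NATURAL
)
row_number = PrimaryKeyDesignation(
    "Internal Row Number", KeyMeaning.MEANINGLESS, KeyOrigin.ARTIFICIAL
)
license_number = PrimaryKeyDesignation(
    "Driver's License Number", KeyMeaning.MEANINGFUL, KeyOrigin.ARTIFICIAL
)
```

Keeping the two designations as separate fields allows the less common pairings, such as an artificial key that has become meaningful to the business, to be recorded rather than forced into one category.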
Uniquely identifying people in the public sector is difficult because there are many possible choices, including both artificial and natural identifiers. A person can have many artificial identifiers and can have many variations of the same artificial identifier, and those identifiers often vary across organizations. For example, driver’s license number, passport number, business identifiers, welfare recipient identifiers, state inmate identifiers, social security or other social numbers, and so on, are artificial identifiers that vary across organizations.
Natural identifiers are more closely tied to a person. Friction ridges include all skin ridge detail, such as fingerprints, palm prints, toe prints, and footprints. They display a number of unique characteristics known as minutiae. Fingerprints are the most common friction ridge and have been used for more than a century to identify people. Fingerprint analysis is widely used today, particularly in law enforcement, as conclusive identification of a person. The Automated Fingerprint Identification System (AFIS) is used internationally to uniquely identify people.
DNA profiling, also called DNA typing and genetic fingerprinting, is a biometric process that uniquely identifies people. The resulting DNA profile is an encoded set of numbers reflecting a person’s DNA, and is more accurate than traditional fingerprinting. About 99.9% of human DNA is identical, but the remaining 0.1% is highly variable and uniquely identifies each person, except in the case of identical twins.
Retinal scans are a biometric identification technique that uses the unique patterns on a person’s retina. The retina is a thin tissue at the back of the eye that has a complex structure of capillaries supplying blood. That capillary structure is unique to each person, including identical twins. The retina generally remains unchanged during a person’s life, except in the cases of diabetes, glaucoma, or retinal degenerative disorders.
A retinal scan is performed by casting a low-energy infrared light into a person’s eye. The blood vessels absorb more light than the rest of the retina. That pattern of variations is converted to a unique code that can be stored and used to identify a person.
Retinal scans are not the same as iris scans or iris recognition. The iris is a circular structure in the front of the eye for controlling the size of the pupils, which controls the amount of light reaching the retina. The iris is colored, typically green, blue, or brown, but can be hazel, gray, violet, or pink. The iris color does not uniquely identify a person. Some people believe that a study of the iris, called iridology, can determine a person’s health. However, such studies do not uniquely identify a person.
Voice print analysis is a biometric process that uses the spectrogram of a person’s voice to identify that person. It is also known as speaker verification or speaker authentication. Voice prints can be used to analyze whether a person’s response to a question is truthful or deceptive. They are the basis for psychological stress evaluation (PSE) that is often used by law enforcement instead of a polygraph.
Voice prints are the recognition of a speaker, not the recognition of the speech produced by a speaker. Speech recognition is a different process that determines what has been said and converts it to text. Speaker recognition identifies who is speaking, independent of what is being said. The term voice recognition is confusing because it includes both speaker recognition and speech recognition. Speaker diarization is a process used to identify when the same speaker is speaking and is used to simplify translating speech to text based on the recognition of a person’s voice.
Facial recognition is another biometric process that identifies or verifies a person based on a digital image of their face. Selected facial features are encoded and compared to known digital images for a person. Facial recognition has typically been a two-dimensional comparison of a person’s facial features. However, as technology evolves, three-dimensional sensors can be used to identify the shape of a person’s face, such as eye sockets, nose, chin, cheek bones, and so on. Three-dimensional features offer improved accuracy over two-dimensional features.
Biometric identification technology is evolving rapidly. What used to take months now takes only days or hours. As the technology continues to evolve, rapid tests can be conducted to quickly identify an individual. These tests can be used for a wide range of business activities, from identifying suspects to identifying customers in a bank, welfare recipients, and so on. These tests can be used on site in real time for rapid and unique identification.
It’s not feasible for public sector organizations to share all data about people. The volume of data is far too large, even for today’s computer speed and storage capability. The best approach to sharing data about people is to develop a matrix of identifiers that includes both natural and artificial identifiers. Data sharing is done first by sharing only the identifiers. When a match is found, additional data are shared based on each organization’s needs.
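The two-step exchange just described, identifiers first and additional data only on a match, can be sketched in Python as follows. This is a minimal illustration under stated assumptions. The `Organization` class, its methods, the `exchange` function, and all sample values are hypothetical and do not represent any real system.

```python
from dataclasses import dataclass, field

# An identifier is a pair: (identifier type, identifier value).
Identifier = tuple[str, str]


@dataclass
class Organization:
    name: str
    # Full record held for each identifier; only identifiers are shared at first.
    records: dict[Identifier, dict[str, str]] = field(default_factory=dict)

    def published_identifiers(self) -> set[Identifier]:
        """Step 1: expose only the identifiers, not the full records."""
        return set(self.records)

    def details_for(self, identifier: Identifier, needed: list[str]) -> dict[str, str]:
        """Step 2: release only the fields the requesting organization needs."""
        record = self.records[identifier]
        return {name: record[name] for name in needed if name in record}


def exchange(requester: Organization, holder: Organization,
             needed: list[str]) -> dict[Identifier, dict[str, str]]:
    """Compare identifiers, then fetch additional data only for matches."""
    matches = requester.published_identifiers() & holder.published_identifiers()
    return {ident: holder.details_for(ident, needed) for ident in matches}


# Made-up sample data for two organizations.
licensing = Organization("Licensing", {
    ("Driver's License", "D1234567"): {"address": "1 Example St", "status": "valid"},
})
welfare = Organization("Welfare", {
    ("Driver's License", "D1234567"): {"case": "open"},
    ("Driver's License", "D7654321"): {"case": "closed"},
})

shared = exchange(welfare, licensing, needed=["status"])
```

In this sketch only one identifier matches, and only the requested `status` field is released for it. The address held by the licensing organization is never transferred.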
For example, a Person can have many Person Identifiers. Each Person Identifier is qualified with Person Identifier Type, such as Driver’s License, Social Security Number, DNA Profile, Retinal Scan, and so on. The names a person uses, though not unique, could also be added to the matrix. The names would be qualified as Birth Name, Married Name, Legal Name Change, Alias Name, Street Name, and so on. Other non-unique identifiers that might help uniquely identify a person could also be added, such as birth date, iris color, blood type, distinguishing features, and so on.
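One way to picture a Person with many qualified Person Identifiers is the following Python sketch. It is an assumption-laden illustration rather than a recommended schema. The field names (`identifier_type`, `value`, `unique`, `qualifier`), the helper methods, and the sample values are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class PersonIdentifier:
    identifier_type: str              # e.g. "Driver's License", "DNA Profile", "Name"
    value: str
    unique: bool = True               # False for names, birth date, iris color, ...
    qualifier: Optional[str] = None   # e.g. "Birth Name", "Alias Name"


@dataclass
class Person:
    identifiers: list[PersonIdentifier] = field(default_factory=list)

    def add(self, identifier: PersonIdentifier) -> None:
        self.identifiers.append(identifier)

    def of_type(self, identifier_type: str) -> list[PersonIdentifier]:
        """All identifiers of one Person Identifier Type."""
        return [i for i in self.identifiers if i.identifier_type == identifier_type]

    def unique_identifiers(self) -> list[PersonIdentifier]:
        """Identifiers that are expected to identify one person on their own."""
        return [i for i in self.identifiers if i.unique]


# Made-up sample values.
person = Person()
person.add(PersonIdentifier("Driver's License", "D1234567"))
person.add(PersonIdentifier("Name", "Pat Example", unique=False, qualifier="Birth Name"))
person.add(PersonIdentifier("Name", "P. Sample", unique=False, qualifier="Alias Name"))
person.add(PersonIdentifier("Birth Date", "1980-01-01", unique=False))
```

Storing identifiers as typed rows rather than as fixed columns lets a person carry several values of the same type, such as multiple names, and lets new identifier types be added without restructuring the data.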
The identifier matrices may be sparse because not all the possible identifiers are available for all people. Also, comparisons may need to be made between matrices to uniquely identify a person. Initial matrices may be developed in different organizations based on their perception of a unique person. Comparing these initial matrices between organizations may identify the same person, at which time the matrices are combined.
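The comparison and combination of sparse matrices might look like the following Python sketch. Each matrix is modeled as a dictionary from identifier type to a set of values, so identifiers that are not available are simply absent. The matching rule used here, at least one shared value for a type treated as unique, is an assumption made for illustration. The `UNIQUE_TYPES` set, the function names, and the sample values are hypothetical.

```python
# A sparse identifier matrix: identifier type -> set of known values.
Matrix = dict[str, set[str]]

# Assumption: only these types are trusted to identify a person on their own.
UNIQUE_TYPES = {"Driver's License", "Social Security Number",
                "DNA Profile", "Retinal Scan"}


def same_person(a: Matrix, b: Matrix) -> bool:
    """True if both matrices share a value for at least one unique type."""
    for id_type in UNIQUE_TYPES.intersection(a, b):
        if a[id_type] & b[id_type]:
            return True
    return False


def combine(a: Matrix, b: Matrix) -> Matrix:
    """Merge two matrices for the same person without altering the inputs."""
    merged = {id_type: set(values) for id_type, values in a.items()}
    for id_type, values in b.items():
        merged.setdefault(id_type, set()).update(values)
    return merged


# Made-up matrices from three organizations.
agency_a = {"Driver's License": {"D1234567"}, "Birth Date": {"1980-01-01"}}
agency_b = {"Driver's License": {"D1234567"},
            "Social Security Number": {"000-00-0000"}}
agency_c = {"Birth Date": {"1980-01-01"}}

combined = combine(agency_a, agency_b) if same_person(agency_a, agency_b) else None
```

Here `agency_a` and `agency_b` match on the driver’s license and are combined, while `agency_c` shares only a birth date with `agency_a`, which is not unique and therefore is not sufficient for a match under this rule.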
Uniquely identifying people can be used for a wide variety of purposes from providing common public services to an individual, to appropriate medical treatment, to preventing fraud. Any organization managing data about people and involved with uniquely identifying people, particularly when sharing data across multiple organizations, must be aware of all the possible artificial and natural identifiers and use those identifiers appropriately.
* Resolving the Primary Key Fiasco, Dataversity.net, August 28, 2012.