Data management is a very lexically challenged discipline. A major part of that lexical challenge is the terms data, information, and knowledge. These three terms are often misused, abused, and used interchangeably to the point that their real meaning is often unclear. These three terms must be formally defined and consistently used to begin resolving the lexical challenge and creating a formal data management profession.
Data are the individual facts that are out of context, have no meaning, and are difficult to understand. They are often referred to as raw data. The term data is plural, equivalent to facts, while datum is singular, equivalent to a fact. Although some people continue to use the term data as singular, a comprehensive, denotative definition of data in the singular form, beginning with “Data is …”, is not available. Most definitions of data in the singular are really definitions of a data resource.
Data could be considered an irregular noun, like deer or sheep, where the meaning is in the context. Data could be used to represent an individual fact the same as datum, and data could be used to represent a set of facts. However, the data management discipline has enough lexical challenge without treating data as an irregular noun. Therefore, datum is singular and data is plural.
Data in context are individual facts that have meaning and can be readily understood. They are the raw facts wrapped with meaning, but they are not yet information. Datum in context is a single fact wrapped with meaning.
Information is a set of data in context with relevance to one or more people at a point in time or for a period of time. Information is more than data in context—it must have relevance and a time frame. Information is considered to be singular.
Knowledge is cognizance, cognition, the fact or condition of knowing something with familiarity gained through experience or association. It’s the acquaintance with or the understanding of something, the fact or condition of being aware of something, or apprehending truth or fact. Knowledge is information that has been retained with an understanding about the significance of that information. Knowledge includes something gained by experience, study, familiarity, association, awareness, and/or comprehension.
Knowledge can be either tacit or explicit. Tacit knowledge, also known as implicit knowledge, is the knowledge that a person retains in their mind. It’s relatively hard to transfer to others and to disseminate widely. Explicit knowledge, also known as formal knowledge, is knowledge that has been codified and stored in various media, such as books, magazines, tapes, presentations, and so on, and is held for mankind, such as in a reference library or on the web. It is readily transferable to other media and capable of being readily disseminated.
Organizational knowledge is information that is of significance to the organization, is combined with experience and understanding, and is retained by the organization. It’s information in context with respect to understanding what is relevant and significant to a business issue or business topic – what is meaningful to the business. It’s analysis, reflection, and synthesis about what information means to the business and how that information can be used. It’s a rational interpretation of information that leads to business intelligence.
Knowledge management is the management of an environment where people generate tacit knowledge, render it into explicit knowledge, and feed it back to the organization. The cycle forms a base for more tacit knowledge, which keeps the cycle going in an intelligent learning organization. It’s an emerging set of policies, organizational structures, procedures, applications, and technology aimed toward increased innovation and improved decisions. It’s an integrated approach to identifying, sharing, and evaluating the organization’s information. It’s a culture for learning where people are encouraged to share information and best practices to solve business problems.
Some people have misperceptions of information. One misperception is that information is the same as data in context: whenever raw data are wrapped with meaning, those data become information. However, if information is considered to be data in context, then the question becomes: what are the terms for information that is relevant and timely and for information that is not relevant and timely?
The answer might lead to relevant information and non-relevant information. However, only relevant information leads to knowledge; non-relevant information does not. Therefore, raw data are wrapped with meaning to become data in context, which can become either relevant or non-relevant information. Only relevant information can become knowledge.
Another misperception is that information is any summary data or derived data. That misperception is not valid because whether data are primitive or derived, they are still data. They have not yet become relevant or timely and, therefore, are not yet information.
If data in context are not relevant or timely, then they are not information. However, data may not be relevant or timely to one person, but could be relevant and timely to another person. Therefore, the definition of information can be expanded. Specific information is a set of data in context that is relevant and timely to one or more people at a point in time or for a period of time. General information is a set of data in context that could be relevant to one or more people at a point in time or for a period of time.
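The distinction between specific information, general information, and data in context that is not information can be sketched as a small decision rule. This is only an illustration of the definitions above, not a formal method: the function name classify and its three boolean parameters are hypothetical, and in practice relevance and timeliness are judgments made by a person at a point in time, not stored flags.

```python
def classify(relevant: bool, timely: bool, could_be_relevant: bool) -> str:
    """Classify a set of data in context for one recipient at one point in time.

    All three parameters are hypothetical inputs standing in for a
    recipient's judgment; they are not part of the original definitions.
    """
    # Specific information: relevant AND timely to the recipient.
    if relevant and timely:
        return "specific information"
    # General information: not relevant and timely now, but could be relevant.
    if could_be_relevant:
        return "general information"
    # Otherwise the data in context are not information for this recipient.
    return "data in context"


# The same data in context can be classified differently for different people.
for_analyst = classify(relevant=True, timely=True, could_be_relevant=True)
for_visitor = classify(relevant=False, timely=False, could_be_relevant=False)
print(for_analyst)
print(for_visitor)
```

The sketch takes the recipient's judgment as input on purpose: the same set of data in context can be specific information to one person and not information at all to another.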
Now that these terms are defined, the data-information-knowledge cycle can be defined. The data-information-knowledge cycle is the cycle from data, to data in context, to relevant information (specific or general), to knowledge, and back to data when that information or knowledge is stored, as shown in the diagram below.
Many people want to belabor the issue, but when information and knowledge are stored, they become part of the organization’s data resource and are managed according to formal data resource management concepts, principles, and techniques. Whether those data were once raw data, specific or general information, or knowledge makes no difference. Everything stored is part of the organization’s data resource, is considered data, and is formally managed as data.
When specific information and general information are stored, they become part of the data resource, are treated as data, and are managed like any other data. Those data will only become information again when they become relevant and timely. The same is true for knowledge. Stored knowledge becomes data and is managed like any other data. Those data will only become knowledge again when they are extracted as information, combined with experience, and retained.
A book on the shelf, a document on a server, raw data, a stored form or document, a stored report, and so on, are all considered data and managed as part of the organization’s data resource. Stored information or knowledge is still data to other people, and may or may not become information or knowledge to those people.
Looking at the situation the other way around, all information and knowledge were data at one time, whether or not they were stored in the organization’s data resource. By becoming relevant and timely, those data became information. By being combined with business experience and retained, that information became knowledge.
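The data-information-knowledge cycle described above can be read as a small state machine. The sketch below is one possible way to model it, under stated assumptions: the stage names follow the text, but the event names (wrap_with_meaning, become_relevant_and_timely, retain_with_experience, store) and the functions advance and run are hypothetical labels chosen for illustration. Only the transitions named in the text are modeled.

```python
from enum import Enum


class Stage(Enum):
    DATA = "data"
    DATA_IN_CONTEXT = "data in context"
    INFORMATION = "information"
    KNOWLEDGE = "knowledge"


# Allowed transitions, keyed by (current stage, event). Event names are
# hypothetical; they paraphrase the steps described in the text.
TRANSITIONS = {
    (Stage.DATA, "wrap_with_meaning"): Stage.DATA_IN_CONTEXT,
    (Stage.DATA_IN_CONTEXT, "become_relevant_and_timely"): Stage.INFORMATION,
    (Stage.INFORMATION, "retain_with_experience"): Stage.KNOWLEDGE,
    # Storing information or knowledge returns it to the data resource.
    (Stage.INFORMATION, "store"): Stage.DATA,
    (Stage.KNOWLEDGE, "store"): Stage.DATA,
}


def advance(stage: Stage, event: str) -> Stage:
    """Return the next stage, or raise ValueError for an unmodeled transition."""
    try:
        return TRANSITIONS[(stage, event)]
    except KeyError:
        raise ValueError(f"no transition from {stage.value!r} on {event!r}") from None


def run(events, start: Stage = Stage.DATA) -> Stage:
    """Apply a sequence of events in order and return the final stage."""
    stage = start
    for event in events:
        stage = advance(stage, event)
    return stage


# One full pass around the cycle ends where it began: as data.
full_cycle = [
    "wrap_with_meaning",
    "become_relevant_and_timely",
    "retain_with_experience",
    "store",
]
print(run(full_cycle).value)
```

In this model, stored knowledge cannot jump straight back to knowledge. It must pass through data in context and information again, which mirrors the statement that stored data only become knowledge again when they are extracted as information, combined with experience, and retained.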
Based on these definitions, there is no information resource, because timeliness and relevancy cannot be managed or stored. There can be information resources (plural), which are the set of resources used to produce information from data and present that information to the business. The knowledge resource is the tacit and explicit knowledge within an organization or available to the organization, and most of that knowledge is stored in the human resource.
Information overload is a misused term that is part of the lexical challenge because it is unclear. Information assimilation overload occurs when information is coming too fast for a person to absorb and understand. A certain amount of time is needed for information to be assimilated, and the delivery needs to match that assimilation rate.
Disparate information is any information that is disparate with respect to the recipient. It could result from information acquired from different sources that are organized differently, from information created from disparate data that provide conflicting information, or from information that is itself conflicting.
Information paranoia is the fear of not knowing everything that is relevant or could be relevant at some point in time. It’s a situation where a person is obsessed with gaining information for information’s sake.
Non-information is a set of data in context that is not relevant or timely to the recipient. Data overload is a deluge of data or data in context that comes at a recipient but is not relevant and timely. It’s a deluge of non-information that is not wanted by the recipient.
Data management professionals must establish proper terms that are comprehensively and denotatively defined, and must use them properly. The development and proper use of basic terms is one step toward resolving the lexical challenge in data resource management and creating a formal data management profession.