Data Management is the organization of data, the steps used to make that organization efficient, and the process of gathering intelligence from that data. As a concept, Data Management began in the 1960s, when ADAPSO (the Association of Data Processing Service Organizations) began offering Data Management advice, with an emphasis on professional training and quality assurance metrics.
Data Management should not be confused with Data Governance or with Database Management. Data Governance is a set of practices and concepts for prioritizing and organizing data and for enforcing policies around it, while complying with applicable regulations and curtailing poor data practices.
Data Governance is essentially one part of the greater whole of Data Management. Database Management, on the other hand, focuses on the tools and technology used to create and alter the underlying data stores, rather than on the overall system used to organize the data. Database Management is also a subdivision of Data Management.
To gain a better understanding of Data Management, consider the following example. Each airport has outgoing flights. Each passenger has a destination, and reaching each destination requires one or more flights. Additionally, each flight carries a certain number of passengers. This information could be stored hierarchically, but that approach has a major problem: the data can be organized around flights, passengers, or destinations, but not all three simultaneously. Keeping three separate hierarchies requires storing the data redundantly, which quickly becomes expensive. Updating the data in three separate files is also more difficult than updating it in one, and all three hierarchies must be kept in sync to avoid confusion. A network data model, which is much more flexible, provides a better solution. Good Data Management is key to a successful business.
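The airport example can be sketched as a small network of linked records. In this illustration, Python object references stand in for the record links of a network (CODASYL-style) database; the `Airport`, `Flight`, and `Passenger` classes and the `add_flight` and `book` helpers are hypothetical. Each fact is stored once, yet the data can be navigated starting from a flight, a passenger, or an airport.

```python
from dataclasses import dataclass, field

# Python object references stand in for the record links of a network database.
@dataclass(eq=False)
class Airport:
    code: str
    flights: list = field(default_factory=list, repr=False)  # outgoing flights

@dataclass(eq=False)
class Flight:
    number: str
    origin: Airport = field(repr=False)
    destination: Airport = field(repr=False)
    passengers: list = field(default_factory=list, repr=False)

@dataclass(eq=False)
class Passenger:
    name: str
    destination: Airport = field(repr=False)
    flights: list = field(default_factory=list, repr=False)

def add_flight(number: str, origin: Airport, destination: Airport) -> Flight:
    flight = Flight(number, origin, destination)
    origin.flights.append(flight)  # link airport -> flight
    return flight

def book(passenger: Passenger, flight: Flight) -> None:
    # One booking links passenger and flight in both directions; nothing is copied.
    passenger.flights.append(flight)
    flight.passengers.append(passenger)

jfk, lax, sfo = Airport("JFK"), Airport("LAX"), Airport("SFO")
f100 = add_flight("F100", jfk, lax)
f200 = add_flight("F200", lax, sfo)
ana = Passenger("Ana", destination=sfo)
book(ana, f100)
book(ana, f200)

# The same records answer flight-, passenger-, and airport-centered questions.
print([p.name for p in f100.passengers])   # ['Ana']
print([f.number for f in ana.flights])     # ['F100', 'F200']
print([f.number for f in lax.flights])     # ['F200']
```

Changing a passenger's itinerary touches one link on each side, rather than three separately maintained hierarchies.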
The management of data first became an issue in the 1950s, when computers were slow, clumsy, and required massive amounts of manual labor to operate. Several computer-oriented companies used entire floors to warehouse and “manage” the punch cards storing their data, and used other floors to house sorters, tabulators, and banks of card punches. Programs of the time were set up in binary or decimal form and were read from on/off toggle switches on the front of the computer, from magnetic tape, or even from punch cards. This form of programming was originally called Absolute Machine Language (and was later renamed First Generation Programming Languages).
Second Generation Programming Languages
Second Generation Programming Languages (also known as Assembly Languages) were an early method for organizing and managing data. These languages became popular in the late 1950s and used letters of the alphabet for programming, rather than long strings of ones and zeros. Because of this, programmers could use assembly mnemonics, which made the codes easier to remember. These languages are now largely antiquated, but they made programs much more readable for humans and freed programmers from tedious, error-prone calculations.
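To make the idea of mnemonics concrete, here is a toy assembler in Python for an invented 8-bit instruction set. The mnemonics (`LOAD`, `ADD`, `STORE`, `HALT`) and their opcodes are hypothetical and do not belong to any real machine; the point is only that a programmer writes readable names and the assembler produces the ones and zeros.

```python
# Hypothetical 8-bit instruction set: opcode in the high 4 bits, operand in the low 4 bits.
OPCODES = {"LOAD": 0b0001, "ADD": 0b0010, "STORE": 0b0011, "HALT": 0b1111}

def assemble(source: str) -> list:
    """Translate lines such as 'ADD 4' into 8-bit binary strings."""
    words = []
    for line in source.strip().splitlines():
        parts = line.split()
        mnemonic = parts[0].upper()
        operand = int(parts[1]) if len(parts) > 1 else 0
        if not 0 <= operand <= 15:
            raise ValueError(f"operand out of range: {line!r}")
        word = (OPCODES[mnemonic] << 4) | operand  # pack opcode and operand
        words.append(format(word, "08b"))
    return words

program = """
LOAD 3
ADD 4
STORE 9
HALT
"""
for word in assemble(program):
    print(word)  # 00010011, 00100100, 00111001, 11110000 (one per line)
```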
High Level Languages
An understanding of foundational languages can help in creating a new web service or application.
High Level Languages (HLLs) are programming languages designed to be easy for humans to read. Many of the early ones are still popular; others have faded. They allow a programmer to write generic programs that are not completely dependent on a specific kind of computer. While these languages emphasize ease of use, from a Data Management perspective their main role has been to organize and manage data. Different High Level Languages come with different strengths:
- FORTRAN was originally created by IBM during the 1950s for engineering and science applications. It is still used for numerical weather prediction, finite element analysis, computational fluid dynamics, computational physics, crystallography and computational chemistry.
- Lisp (historically, LISP) was originally described in 1958 and quickly became a favorite programming language for AI research. It was unusual in that it made no distinction between data and code: Lisp programs are written as lists, the same structure the language uses for data. It was also one of the first programming languages to introduce a number of ideas in computer science, such as automatic storage management, dynamic typing, and tree data structures. Lisp could also be extended in ways its designers had never anticipated. (Lisp's use has since declined.)
- COBOL (Common Business-Oriented Language) was developed by CODASYL in 1959 as part of a U.S. Department of Defense effort to create a “portable” programming language for data processing. It is an English-like programming language designed primarily for business, finance, and administrative systems. The COBOL 2002 standard added support for object-oriented programming.
- BASIC (Beginner’s All-purpose Symbolic Instruction Code) describes a family of general-purpose programming languages designed to be user-friendly. It was designed in 1964 at Dartmouth College. (BASIC is not used much these days.)
- C was invented at Bell Labs in the early 1970s, and the UNIX operating system was rewritten in it. Because UNIX was written in C rather than in assembly language, it could be ported to other computer systems far more easily. (C continues to be one of the most popular programming languages in the world.)
- C++ (pronounced “C plus plus”) is based on C and is a general-purpose programming language that keeps C's facilities for low-level memory manipulation while adding features such as classes. It is used for desktop applications, among many other kinds of software, and compilers are available for a wide variety of platforms. (It is still widely used, and its popularity appears to be growing.)
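Lisp's blurring of code and data, mentioned in the list above, can be loosely imitated in Python, which is not itself homoiconic. Below, a Lisp-style expression is an ordinary nested list (a tree): the same structure is first evaluated as a program and then edited as data. The `evaluate` function is a hypothetical toy that supports only `+` and `*`.

```python
import operator
from functools import reduce

# Only two operators are supported in this toy evaluator.
OPS = {"+": operator.add, "*": operator.mul}

def evaluate(expr):
    """Evaluate a nested-list expression such as ["+", 1, ["*", 2, 3]]."""
    if isinstance(expr, list):
        op, *args = expr
        return reduce(OPS[op], (evaluate(a) for a in args))
    return expr  # numbers evaluate to themselves

program = ["+", 1, ["*", 2, 3]]   # (+ 1 (* 2 3)) in Lisp notation
print(evaluate(program))           # 7

# Because the program is just data, it can be rewritten like any other list.
program[0] = "*"                   # now (* 1 (* 2 3))
print(evaluate(program))           # 6
```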
Online Data Management
Online Data Management systems, such as travel reservations and stock market trading, must coordinate and manage data quickly and efficiently. In the late 1950s, several industries began experimenting with online transactions. Today, Online Data Management systems can process healthcare information (think efficiency), or measure, store, and analyze as many as 7.5 million weld sessions per day (think productivity). These systems allow a program to read files or records, update them, and send the updated information back to the online user.
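The read-update-send cycle of an online system can be sketched with SQLite through Python's standard `sqlite3` module. The `flights` table, the `reserve_seat` function, and the seat counts are hypothetical; the sketch wraps the read and the update in one explicit transaction, so a failure leaves the record unchanged. Real reservation systems involve far more (many servers, auditing, payment), none of which is shown here.

```python
import sqlite3

def reserve_seat(conn: sqlite3.Connection, flight: str) -> int:
    """Read a flight record, decrement its free seats, and return the new count."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before reading
    try:
        row = conn.execute(
            "SELECT seats_free FROM flights WHERE number = ?", (flight,)
        ).fetchone()
        if row is None:
            raise KeyError(f"unknown flight {flight}")
        if row[0] <= 0:
            raise RuntimeError(f"flight {flight} is full")
        conn.execute(
            "UPDATE flights SET seats_free = seats_free - 1 WHERE number = ?",
            (flight,),
        )
        conn.execute("COMMIT")
        return row[0] - 1  # the updated value sent back to the user
    except Exception:
        conn.execute("ROLLBACK")  # undo any partial work
        raise

# isolation_level=None: autocommit mode, so transactions are managed explicitly above.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE flights (number TEXT PRIMARY KEY, seats_free INTEGER)")
conn.execute("INSERT INTO flights VALUES ('F100', 2)")

print(reserve_seat(conn, "F100"))  # 1
print(reserve_seat(conn, "F100"))  # 0
```

`BEGIN IMMEDIATE` asks SQLite for a write lock before the read, so two connections to the same database file cannot both read the last free seat and then both sell it.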
SQL (Structured Query Language) was developed at IBM during the 1970s by Donald Chamberlin and Raymond Boyce, building on Edgar F. Codd's relational model. It focused on relational databases, providing consistent data processing and reducing the amount of duplicated data. The language is also fairly easy to learn, because its commands use English-like keywords. The relational model allows large amounts of data to be processed quickly and efficiently. SQL was first standardized by ANSI in 1986.
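A small example can show how a relational design reduces duplicated data. The schema below (`airports`, `flights`, `bookings`) is hypothetical and runs on SQLite via Python's standard `sqlite3` module; each fact is stored in one row, and a `JOIN` query reassembles the combined view when it is needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE airports (code TEXT PRIMARY KEY, city TEXT);
    CREATE TABLE flights  (number TEXT PRIMARY KEY,
                           origin TEXT REFERENCES airports(code),
                           destination TEXT REFERENCES airports(code));
    CREATE TABLE bookings (passenger TEXT, flight TEXT REFERENCES flights(number));

    INSERT INTO airports VALUES ('JFK', 'New York'), ('LAX', 'Los Angeles');
    INSERT INTO flights  VALUES ('F100', 'JFK', 'LAX');
    INSERT INTO bookings VALUES ('Ana', 'F100'), ('Ben', 'F100');
""")

# Each city name is stored once; a join reassembles the view a user needs.
rows = conn.execute("""
    SELECT b.passenger, f.number, a.city
    FROM bookings AS b
    JOIN flights  AS f ON f.number = b.flight
    JOIN airports AS a ON a.code = f.destination
    ORDER BY b.passenger
""").fetchall()
print(rows)  # [('Ana', 'F100', 'Los Angeles'), ('Ben', 'F100', 'Los Angeles')]

# Renaming a city is a single-row update, visible to every query.
conn.execute("UPDATE airports SET city = 'LA' WHERE code = 'LAX'")
```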
Relational models represent both relationships and subject matter in a uniform way. A characteristic of relational data models is their use of a single, unified language for navigating, manipulating, and defining data, rather than separate languages for each task. Relational “algebra” is used to process record sets as a group, with “operators” applied to whole record sets. Combining relational data models with these operators produces shorter and simpler programs.
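The relational-algebra operators described above can be sketched in a few lines of Python. In this illustration a relation is modeled as a list of dictionaries that share the same keys (a simplification; formally a relation is a set of tuples), and `select`, `project`, and `natural_join` are hypothetical helper names for the classic σ, π, and ⋈ operators. Each operator takes whole relations and returns a new relation, which is why they compose.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, object]
Relation = List[Row]  # simplified representation: rows share the same keys

def select(rel: Relation, pred: Callable[[Row], bool]) -> Relation:
    """Sigma: keep the rows that satisfy a predicate."""
    return [row for row in rel if pred(row)]

def project(rel: Relation, attrs: Tuple[str, ...]) -> Relation:
    """Pi: keep only the named attributes, dropping duplicate rows."""
    seen, out = set(), []
    for row in rel:
        key = tuple(row[a] for a in attrs)
        if key not in seen:
            seen.add(key)
            out.append(dict(zip(attrs, key)))
    return out

def natural_join(r: Relation, s: Relation) -> Relation:
    """Join: combine rows that agree on every attribute the relations share."""
    if not r or not s:
        return []
    common = set(r[0]) & set(s[0])
    return [{**a, **b} for a in r for b in s
            if all(a[c] == b[c] for c in common)]

flights = [{"flight": "F100", "dest": "LAX"}, {"flight": "F200", "dest": "SFO"}]
bookings = [{"passenger": "Ana", "flight": "F100"},
            {"passenger": "Ben", "flight": "F100"},
            {"passenger": "Cy", "flight": "F200"}]

# Who is flying to LAX? Three whole-relation operations, no explicit loops.
to_lax = project(select(natural_join(bookings, flights),
                        lambda row: row["dest"] == "LAX"), ("passenger",))
print(to_lax)  # [{'passenger': 'Ana'}, {'passenger': 'Ben'}]
```

The nested-loop join is the simplest correct strategy; database engines choose among faster join algorithms automatically.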
The relational model presented some unexpected benefits. It turned out to be very well suited to parallel processing, client-server computing, and GUIs (graphical user interfaces). Additionally, a relational database management system (RDBMS) allows multiple users to access the same database simultaneously.
The primary purpose of NoSQL is the processing and research of big data. Early NoSQL systems were essentially search engines with some additional management features, and were “not” relational databases; today's NoSQL platforms are much more advanced. While structured data can be used during research, it is not required. NoSQL’s true strength is its capacity to store and filter huge amounts of structured and unstructured data. Data managers can choose from a variety of NoSQL databases, each with its own specific strengths.
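The schema-free storage described above can be illustrated with a toy document store. `DocumentStore` and its `put`, `get`, and `find` methods are hypothetical and are not the API of any real NoSQL product; the point is that records with different fields live side by side, and queries simply skip documents that lack a field.

```python
import json
from typing import Any, Dict, Iterator, Optional, Tuple

class DocumentStore:
    """Hypothetical in-memory document store: any JSON-serializable dict can be saved."""

    def __init__(self) -> None:
        self._docs: Dict[str, Dict[str, Any]] = {}

    def put(self, key: str, doc: Dict[str, Any]) -> None:
        # No schema check: documents in the same store may have different fields.
        self._docs[key] = json.loads(json.dumps(doc))  # store a detached copy

    def get(self, key: str) -> Optional[Dict[str, Any]]:
        return self._docs.get(key)

    def find(self, **criteria: Any) -> Iterator[Tuple[str, Dict[str, Any]]]:
        """Yield documents whose fields equal every given value; missing fields never match."""
        for key, doc in self._docs.items():
            if all(k in doc and doc[k] == v for k, v in criteria.items()):
                yield key, doc

store = DocumentStore()
store.put("u1", {"name": "Ana", "city": "Boston"})
store.put("u2", {"name": "Ben", "tags": ["vip"], "notes": "free-form text"})
store.put("u3", {"name": "Cy", "city": "Boston", "age": 41})

print([key for key, _ in store.find(city="Boston")])  # ['u1', 'u3']
```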
NoSQL's efficiency comes from its flexible, largely schema-free design, which trades some consistency for speed and agility. This style of architecture supports horizontal scalability (adding more machines rather than bigger ones), and it has allowed very large data operations, such as those of Amazon, Google, and the CIA, to process vast amounts of information. In short, NoSQL is well suited to processing big data.
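Horizontal scalability usually means spreading records across many machines. One simple way to do that, shown here purely as an illustration, is hash partitioning: hash each key and take the result modulo the number of nodes. The `shard_for` and `distribute` helpers are hypothetical; real NoSQL systems use their own, more elaborate partitioning schemes.

```python
import hashlib
from collections import defaultdict

def shard_for(key: str, num_nodes: int) -> int:
    """Pick a node by hashing the key; every client computes the same answer."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes

def distribute(keys, num_nodes: int) -> dict:
    """Group keys by the node that would store them."""
    nodes = defaultdict(list)
    for key in keys:
        nodes[shard_for(key, num_nodes)].append(key)
    return dict(nodes)

keys = [f"user:{i}" for i in range(10_000)]
placement = distribute(keys, num_nodes=4)
for node in sorted(placement):
    print(node, len(placement[node]))
```

With 10,000 keys and four nodes, each node should receive about a quarter of the keys. A known drawback of plain modulo hashing is that changing the number of nodes moves most keys to a different node; consistent hashing is a common way to reduce that reshuffling.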
The term “NoSQL” was first used in 1998 by Carlo Strozzi, but the concept did not begin to gain popularity until after 2005. Doug Cutting and Mike Cafarella's open-source web crawler, Nutch, led to Hadoop (now Apache Hadoop), which, as free open-source software, quickly became quite popular.
Data Management in the Cloud
Cloud Data Management is fast becoming an additional responsibility for in-house data managers. Though the concept of cloud computing dates back to the 1960s, it did not become a reality until 1999, when Salesforce began delivering applications through its website. Amazon followed in 2002 with internet-based (cloud) services that included storage. Renting applications and services over the internet quickly became a popular way of handling large and unusual projects. As comfort with these services grew, many organizations began shifting the bulk of their storage and processing activities to the cloud, and a number of cloud start-ups formed.
The cloud now provides organizations with dedicated Data Management resources as needed. The benefits of managing data in the cloud include:
- Access to cutting-edge technology.
- The reduction of in-house system maintenance costs.
- Increased flexibility in meeting the changing needs of business.
- The processing of big data.
SLAs (Service Level Agreements) are contracts that define the guarantees a service provider makes to its customers, such as availability targets. Because the architecture of cloud providers varies, it is in the data manager’s best interest to investigate providers and select the best fit for the organization’s needs. A cloud's security and its access to storage are both crucial concerns for a cloud data manager, and should be researched thoroughly.
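Availability guarantees in an SLA are easier to compare once they are converted into a downtime budget. The sketch below assumes a 30-day period for simplicity, and `allowed_downtime_minutes` is a hypothetical helper; actual SLAs define their own measurement windows, exclusions, and remedies, so the figures are only a rough guide.

```python
# Assumption: a 30-day month (43,200 minutes); real SLAs define their own periods.
MINUTES_PER_30_DAY_MONTH = 30 * 24 * 60

def allowed_downtime_minutes(availability_pct: float,
                             period_minutes: int = MINUTES_PER_30_DAY_MONTH) -> float:
    """Minutes of downtime an availability guarantee permits over one period."""
    if not 0 <= availability_pct <= 100:
        raise ValueError("availability must be a percentage between 0 and 100")
    return period_minutes * (100 - availability_pct) / 100

for target in (99.0, 99.9, 99.99):
    print(f"{target}% -> {allowed_downtime_minutes(target):.1f} min/month")
```

Under this assumption, 99% availability permits 432 minutes of downtime per month, 99.9% permits about 43.2 minutes, and 99.99% permits about 4.3 minutes.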
Artificial Intelligence and Data Management
Within the next ten years, AI is likely to help organize and sort through huge amounts of stored data and to make routine decisions about basic procedures. It should become increasingly valuable as an assistant to the data manager. Some examples include:
- Processing, managing, and storing unstructured data.
- Discarding irrelevant data.
- Maximizing data integration for research and info queries.
- Determining the value of data, and the best location to store it.
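Decisions such as discarding irrelevant data or choosing where to store it are often made today with fixed rules. The sketch below is a simple rule-based stand-in, not an AI model: the `DatasetStats` record, the `choose_tier` function, the tier names, and the thresholds are all hypothetical, and an AI-assisted system might instead learn such thresholds from usage patterns. It only flags old data for review rather than deleting it.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class DatasetStats:
    name: str
    last_access: date
    reads_last_30_days: int

def choose_tier(stats: DatasetStats, today: date) -> str:
    """Assign a storage tier using hypothetical, hard-coded thresholds."""
    idle_days = (today - stats.last_access).days
    if stats.reads_last_30_days >= 100:
        return "hot"        # fast, expensive storage
    if idle_days <= 90:
        return "warm"
    if idle_days <= 365:
        return "cold"       # cheap archival storage
    return "review"         # candidate for discarding, pending human approval

today = date(2024, 1, 1)
for ds in [DatasetStats("orders", date(2023, 12, 31), 450),
           DatasetStats("logs_q3", date(2023, 10, 15), 3),
           DatasetStats("legacy_crm", date(2021, 6, 1), 0)]:
    print(ds.name, choose_tier(ds, today))  # hot, warm, review respectively
```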
Artificial intelligence has great potential for assisting data managers in developing and managing a highly functional Data Management program.