Historically speaking, a simple definition of Analytics is “the study of analysis.” A more useful, more modern description would suggest “Data Analytics” is an important tool for gaining business insights and providing tailored responses to customers. Data Analytics, sometimes abbreviated to “Analytics,” has become increasingly important for organizations of all sizes. The practice of Data Analytics has gradually evolved and broadened over time, providing many benefits.
The use of Analytics in business can be traced back to the 19th century, when Frederick Winslow Taylor initiated time management exercises. Another early example is Henry Ford measuring the speed of assembly lines. In the late 1960s, Analytics began receiving more attention as computers came to be used as decision-making support systems. With the development of Big Data, Data Warehouses, the Cloud, and a variety of software and hardware, Data Analytics has evolved significantly. Data Analytics involves the research, discovery, and interpretation of patterns within data. Modern forms of Data Analytics have expanded to include:
- Predictive Analytics
- Big Data Analytics
- Cognitive Analytics
- Prescriptive Analytics
- Descriptive Analytics
- Enterprise Decision Management
- Retail Analytics
- Augmented Analytics
- Web Analytics
- Call Analytics
Statistics and Computers
Data Analytics is based on statistics. It has been surmised that statistics were used as far back as Ancient Egypt for building pyramids. Governments worldwide have used census-based statistics for a variety of planning activities, including taxation. Once the data has been collected, the work of discovering useful information and insights begins. For example, an analysis of population growth by county and city could determine the location of a new hospital.
The development of computers and the evolution of computing technology have dramatically enhanced the process of Data Analytics. In 1880, prior to computers, it took over seven years for the U.S. Census Bureau to process the collected information and complete a final report. In response, inventor Herman Hollerith produced the “tabulating machine,” which was used in the 1890 census. The tabulating machine could systematically process data recorded on punch cards. With this device, the 1890 census was finished in 18 months.
Relational Databases and Non-Relational Databases
Relational Databases were invented by Edgar F. Codd in the 1970s and became quite popular in the 1980s. Relational Database Management Systems (RDBMSs), in turn, allowed users to write queries in SQL (Structured Query Language, originally named SEQUEL) and retrieve data from their database. Relational Databases and SQL provided the advantage of being able to analyze data on demand, and are still used extensively. They are easy to work with, and very useful for maintaining accurate records. On the negative side, RDBMSs are generally quite rigid and were not designed to handle unstructured data.
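To make "analyzing data on demand" concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table, column names, and rows are hypothetical and exist only for illustration; any relational database that accepts SQL would follow the same pattern.

```python
import sqlite3


def customers_per_city(rows):
    """Load (name, city) rows into a fixed-schema table and count per city."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    # The schema is declared up front; this is the "rigid" part of an RDBMS.
    conn.execute(
        "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)"
    )
    conn.executemany("INSERT INTO customers (name, city) VALUES (?, ?)", rows)
    # An on-demand question, expressed as a SQL query.
    result = conn.execute(
        "SELECT city, COUNT(*) FROM customers GROUP BY city ORDER BY city"
    ).fetchall()
    conn.close()
    return result


# Hypothetical sample data.
sample = [("Ada", "Austin"), ("Grace", "Boston"), ("Edgar", "Austin")]
print(customers_per_city(sample))
```

The query can be changed at any time without touching the stored records, which is the flexibility the paragraph above describes. The schema, by contrast, has to be known before the first row is inserted.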
During the mid-1990s, the internet became extremely popular, but relational databases could not keep up. The immense flow of information combined with the variety of data types coming from many different sources led to non-relational databases, also referred to as NoSQL. A NoSQL database can translate data using different languages and formats quickly and avoids SQL’s rigidity by replacing its “organized” storage with greater flexibility.
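The flexibility of non-relational storage can be illustrated with a small sketch. This is not a real NoSQL database, only plain Python dictionaries standing in for "documents"; the record types, field names, and the find helper are all hypothetical.

```python
import json

# Hypothetical documents: each record carries its own set of fields,
# so no shared table schema has to be declared in advance.
documents = [
    {"type": "tweet", "user": "ada", "text": "Hello"},
    {"type": "order", "user": "grace", "items": ["jeans", "belt"], "total": 59.5},
    {"type": "click", "user": "ada", "url": "/pricing"},
]


def find(docs, **criteria):
    """Return the documents whose fields match every given key/value pair."""
    return [d for d in docs if all(d.get(k) == v for k, v in criteria.items())]


# Records of different shapes can be stored and queried side by side.
print(json.dumps(find(documents, user="ada"), indent=2))
```

A relational table would need a column for every field that any record might use. Here, a record that lacks a field simply does not match a query on that field.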
The development of NoSQL was followed by changes on the internet. Larry Page and Sergey Brin designed Google’s search engine to search a specific website, while processing and analyzing Big Data in distributed computers. Google’s search engine can respond in a few seconds with the desired results. The primary points of interest in the system are its scalability, automation, and high performance. A 2004 white paper on the topic of MapReduce inspired several engineers and attracted an influx of talent to focus on the challenges of processing Big Data (Data Analytics).
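The MapReduce idea can be sketched in a few lines. The example below runs on a single machine and only imitates the pattern with the classic word-count task; a real framework would distribute the map and reduce steps across many computers. All function names here are illustrative and are not taken from any particular library.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group emitted values by key, as a framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce over a list of text documents."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


print(word_count(["big data", "Big analytics", "data data"]))
```

Because each document is mapped independently and each key is reduced independently, both phases can be split across machines, which is where the scalability mentioned above comes from.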
In the late 1980s, the amount of data being collected continued to grow significantly, in part due to the lower costs of hard disk drives. During this time, the architecture of Data Warehouses was developed to help in transforming data coming from operational systems into decision-making support systems. Data Warehouses are normally part of the Cloud, or part of an organization’s mainframe server. Unlike relational databases, a Data Warehouse is normally optimized for a quick response time to queries. In a data warehouse, data is often stored using a timestamp, and operation commands, such as DELETE or UPDATE, are used less frequently. If all sales transactions were stored using timestamps, an organization could use a Data Warehouse to compare the sales trends of each month.
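The monthly comparison described above can be sketched as follows. The sales records are hypothetical, and a real Data Warehouse would run an equivalent aggregation as a query rather than in application code; this only shows why storing a timestamp with each transaction makes the comparison straightforward.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical timestamped sales records: (ISO 8601 timestamp, amount).
sales = [
    ("2023-01-05T10:15:00", 120.0),
    ("2023-01-20T16:40:00", 80.0),
    ("2023-02-03T09:05:00", 150.0),
    ("2023-02-17T13:30:00", 100.0),
    ("2023-03-01T11:00:00", 90.0),
]


def monthly_totals(records):
    """Sum sale amounts per calendar month, keyed as 'YYYY-MM'."""
    totals = defaultdict(float)
    for timestamp, amount in records:
        month = datetime.fromisoformat(timestamp).strftime("%Y-%m")
        totals[month] += amount
    return dict(sorted(totals.items()))


for month, total in monthly_totals(sales).items():
    print(month, total)
```

Records are only appended and read, never updated or deleted, which matches the access pattern the paragraph describes.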
The term Business Intelligence (BI) was first used in 1865, and was later adapted by Howard Dresner at Gartner in 1989, to describe making better business decisions through searching, gathering, and analyzing the accumulated data saved by an organization. Using the term “Business Intelligence” as a description of decision-making based on data technologies was both novel and far-sighted. Large companies first embraced BI in the form of analyzing customer data systematically, as a necessary step in making business decisions.
Data Mining began in the 1990s and is the process of discovering patterns within large data sets. Analyzing data in non-traditional ways provided results that were both surprising and beneficial. The use of Data Mining came about directly from the evolution of database and Data Warehouse technologies. The new technologies allow organizations to store more data, while still analyzing it quickly and efficiently. As a result, businesses started predicting the potential needs of customers, based on an analysis of their historical purchasing patterns.
However, data can be misinterpreted. Someone in the trades, having purchased two pairs of blue jeans online, probably won’t want to buy jeans for another two or three years. Targeting this person with blue jean advertisements is both a waste of time and an irritant to the potential customer.
In 2005, Roger Magoulas gave Big Data its name. He was describing a large amount of data that seemed almost impossible to cope with using the Business Intelligence tools available at the time. In the same year, Hadoop, which could process Big Data, was developed. Hadoop’s foundation was based on another open-source software framework called Nutch, which was then merged with Google’s MapReduce.
Apache Hadoop is an open-source software framework that can process both structured and unstructured data, streaming in from almost all digital sources. This flexibility allows Hadoop (and its sibling open-source frameworks) to process Big Data. During the late 2000s, several open-source projects, such as Apache Spark and Apache Cassandra, came about to deal with this challenge.
Analytics in the Cloud
In its early form, the Cloud was a phrase used to describe the “empty space” between the user and the provider. Then, in 1997, Emory University professor Ramnath Chellappa described Cloud Computing as a new “computing paradigm where the boundaries of computing will be determined by economic rationale, rather than technical limits alone.”
In 1999, Salesforce provided a very early example of how to use Cloud Computing successfully. Though primitive by today’s standards, Salesforce used the concept to develop the idea of delivering software programs by way of the internet. Programs (or applications) could be accessed or downloaded by any person with internet access. An organization manager could purchase software in a cost-effective, on-demand method without leaving the office. As businesses and organizations gained a better understanding of the Cloud’s services and usefulness, it gained in popularity.
The Cloud has evolved significantly since 1999, with customers “renting the services,” rather than acquiring hardware and software for the same purpose. Vendors are now responsible for all the trouble-shooting, backups, administration, capacity planning, and maintenance. For many business projects, the Cloud is simply easier and more efficient to use. The Cloud now offers large amounts of storage, simultaneous access for multiple users, and the ability to handle multiple projects.
Predictive Analytics is used to make forecasts about trends and behavior patterns. It uses several techniques taken from statistics, Data Modeling, Data Mining, Artificial Intelligence, and Machine Learning to analyze data and make predictions. Predictive models can analyze both current and historical data to understand customers, purchasing patterns, and procedural problems, and to predict potential dangers and opportunities for an organization.
Predictive Analytics first started in the 1940s, as governments began using the early computers. Though it has existed for decades, Predictive Analytics has now developed into a concept whose time has come. With more and more data available, organizations have begun using Predictive Analytics to increase profits and improve their competitive advantage. The continuous growth of stored data, combined with an increasing interest in using data to gain Business Intelligence, has promoted the use of Predictive Analytics.
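As a minimal illustration of forecasting from historical data, the sketch below fits a straight line to past monthly sales with ordinary least squares and extends it one month ahead. The figures are hypothetical, and a linear trend is only the simplest of the statistical techniques mentioned above; real predictive models are usually far richer.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def forecast(xs, ys, next_x):
    """Predict y at next_x by extending the fitted line."""
    slope, intercept = fit_line(xs, ys)
    return slope * next_x + intercept


# Hypothetical history: month number and sales for that month.
months = [1, 2, 3, 4, 5]
monthly_sales = [100.0, 110.0, 120.0, 130.0, 140.0]
print(forecast(months, monthly_sales, 6))
```

The forecast is only as good as the assumption that the past trend continues, which is why such models are normally checked against held-back historical data before being trusted.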
Most organizations deal with unstructured data, and making sense of it is not something humans can easily do. Cognitive Analytics merges a variety of applications to provide context and answers. Organizations can collect data from several different sources, and Cognitive Analytics can examine the unstructured data in depth, offering decision-makers a better understanding of their internal processes, customer preferences, and customer loyalty.
Augmented Analytics provides automated Business Intelligence (and insights) by using Natural Language Processing and Machine Learning. It “automates” Data Preparation and enables data sharing. Augmented Analytics provides clear results and access to sophisticated tools, allowing researchers, managers, and other decision-makers to gain insights quickly and make daily decisions with a high degree of confidence.
Ultimately, Augmented Analytics attempts to reduce the work of Data Scientists by automating the steps used in gaining insights and Business Intelligence. An Augmented Analytics engine will automatically process an organization’s data, clean the data, analyze it, and then produce insights leading to instructions for executives or salespeople.