Data Literacy is essentially the ability to read and understand data, much as one might read and understand a magazine article. The primary advantage of a staff made up largely of data literate people is a reduced need for data scientists, because people on staff can handle many of the data issues that arise.
Staff members can critically assess data, to varying degrees, and fix problems or find business insights. This is why it is important for most people in the organization, not just the data analysts, to have access to data and a basic ability to read and use it. Putting data to use, rather than just collecting it, can be crucial to the future of a business. (Some suggest that "everyone" working for an online business should be data literate, but this could be considered an extreme position.) Data Literacy, especially where big data research is concerned, requires some knowledge of statistics and mathematics.
In a data literate organization, staff would know how to use data in their day-to-day activities and in supporting big-picture decisions. Used correctly, a Data Literacy training program can help every team member achieve their work goals and work more efficiently, and can provide a common language, all while adding to the organization's overall performance. It is generally believed that when everyone is given access to the data, the organization becomes more streamlined and efficient. Certainly, it is important for team members to be data literate.
The Changing Meaning of Data
The word data can be traced back to the Greek mathematician Euclid, around 300 B.C. He wrote a book titled Data, which was a collection of geometrical propositions. After evolving into the word "date" (a time reference including the day, month, and year), the word reappeared in English scientific texts in the mid-1600s. The historian Daniel Rosenberg wrote about how the term's meaning shifted in scientific contexts during the 1700s:
“At the beginning of the century, ‘data’ was especially used to refer either to principles accepted as the basis of argument or to ‘facts gleaned from scripture’ that were unavailable to questioning. By the end of the century, the term was most commonly used to refer to facts in evidence determined by experiment, experience, or collection.”
By 1900, the term data was used to describe the results of statistical observations and was considered common ground for scientists in reaching their conclusions and in defending them. "Computers," statisticians, and scientists followed clear rules when producing data, which made it reliable information for further study. (There were human computers before the electronic ones came along, doing the same work, only slower.) Additionally, the use of the Bible as a source of unquestionable principles became "questionable," and the Bible was rarely cited as a source of data. The concept of "data" as the result of scientific experiments, observations, and statistics became established shortly before the computer revolution.
In the mid-1900s, the meaning of data was expanded to include information (primarily mathematical information) used by electronic computers. Extending the term data to cover the statistical information used by early electronic computers was a fairly simple shift. However, the overall meaning of the word shifted subtly in the process. Before this, with the exception of biblical sources, data had only been generated by humans using scientific methodology. For some, this philosophical shift caused great anguish, because now data could be generated by a machine. (Could it possibly be as accurate as human research?)
Data Literacy Becomes a Reality
The concept of Data Literacy took a while to form. Originally, computers acted as calculators, performing mathematical calculations far faster than humans could. Then, computers began to be used for statistical research. Although the concept of Data Literacy would take years to materialize, its origins lie in data analysis, and in statisticians understanding and "reading" statistical information. In 1962, in The Future of Data Analysis, John W. Tukey wrote:
"For a long time, I thought I was a statistician, interested in inferences from the particular to the general. But as I have watched mathematical statistics evolve, I have had cause to wonder and doubt. I have come to feel that my central interest is in data analysis… Data analysis, and the parts of statistics which adhere to it, must take on the characteristics of science rather than those of mathematics… data analysis is intrinsically an empirical science. How vital and how important is the rise of the stored-program electronic computer?"
Tukey promoted a shift in thinking. He suggested that understanding statistical data was more important than blindly following statistical equations.
This was followed by Peter Naur's Concise Survey of Computer Methods in 1974. The book surveyed the data processing methods used at the time and was organized around the idea that data represent facts or ideas in a formalized way, and that these facts or ideas can be communicated or manipulated. Naur paired an understanding of data processing methods with making that understanding accessible to large numbers of people and providing a common language.
In 1996, Usama Fayyad, Gregory Piatetsky-Shapiro, and Padhraic Smyth authored the paper From Data Mining to Knowledge Discovery in Databases. As the title suggests, they focused on the idea of businesses shifting from the "more limited" data mining to knowledge discovery in databases (KDD). In the paper, they criticized the blind, mindless application of data mining methods and promoted a much broader, multistep process that emphasizes understanding the results and using them intelligently.
In an article titled Mining Data for Nuggets of Knowledge, Jacob Zahavi stated, "Special data mining tools may have to be developed to address web-site decisions." Zahavi's call for specialized data mining tools initiated research into the field, which, after a decade of development, resulted in models and charts that made understanding the data far easier.
In 2009, Kirk D. Borne and his colleagues submitted a paper to the Astro2010 Decadal Survey titled The Revolution in Astronomy Education: Data Science for the Masses. They wrote, "Non-specialists require information literacy skills as productive members of the 21st century workforce, integrating foundational skills for lifelong learning in a world increasingly dominated by data." In essence, Borne and his colleagues were announcing to the world that researchers outside the data science field would need to become data literate in order to use the tools of Data Science, or even to communicate with data scientists and analysts.
Becoming a Data Literate Organization
Ideally, a startup would embrace the philosophy of Data Literacy when hiring its staff. Keeping this philosophy in mind during the hiring process would avoid the need to train the entire staff later. A business that is serious about hitting the ground running, and working efficiently, will need to evaluate new employees for their Data Literacy. (Some training in specific tools and software will probably be needed, but a data literate individual should catch on quickly.)
The questions below can provide some insight into how well current staff or potential employees understand data:
- Can this person interpret straightforward statistics, such as averages and correlations?
- Can they build a business case using accurate and relevant numbers?
- Can they explain the results of their data systems or processes?
- Can they explain how their machine learning algorithms work?
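To make the first question more concrete, the sketch below computes two of the "straightforward statistics" it mentions: an average and a Pearson correlation coefficient. The figures are hypothetical, invented purely for illustration, and the helper name `pearson_correlation` is likewise hypothetical. It is written out by hand to show the formula, although Python 3.10+ also offers `statistics.correlation`.

```python
import math
from statistics import mean


def pearson_correlation(xs, ys):
    """Hypothetical helper: Pearson correlation of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mx, my = mean(xs), mean(ys)
    # Sum of co-deviations from each mean (unnormalized covariance).
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    # Square roots of the sums of squared deviations for each series.
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined when a series is constant")
    return cov / (sx * sy)


# Hypothetical monthly figures, made up for illustration only.
ad_spend = [10, 12, 15, 18, 20]
sales = [100, 110, 130, 150, 160]

r = pearson_correlation(ad_spend, sales)
print(f"average ad spend: {mean(ad_spend):.1f}")
print(f"average sales: {mean(sales):.1f}")
print(f"correlation: {r:.3f}")
```

Here the correlation comes out close to 1, a strong positive linear relationship. A data literate reader should also recognize that this alone does not show that ad spend causes sales. That kind of interpretation is what the question is probing.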
For organizations that already have staff in place and want to educate them in Data Literacy, there are many options available:
- The Data Literacy Project
- Rutgers University
- Qlik Data Literacy Framework
- National Library of Medicine
- DATAVERSITY Training Center
- Centre for Humanitarian Data
- Partners in Data Literacy
Knowledge Sharing is Continuous
Becoming a data literate organization involves building a culture, and knowledge sharing should be an important part of that culture. Data Literacy cannot be achieved through classes alone; it must be a continuous process of communication within the business. While workshops can lay a foundation of knowledge, environments where staff are both empowered and encouraged to educate one another are best for building Data Literacy. This kind of peer training can be one-on-one or in the form of workshops.
Workshops have the additional benefit that they can be recorded. Sometimes a staff member won't be able to attend a workshop because of conflicting project deadlines, but they can view the recording. While not a replacement for "live" interaction, recorded workshops can help update and educate staff who couldn't attend.