To the wider business community, a Data Scientist is one of those “data magicians” who can acquire disparate masses of data from diverse business functions; clean, massage, organize, and prepare the data; and then apply their skills in mathematics, statistics, and Machine Learning to uncover hidden business insights and intelligence.
The data used by a Data Scientist can be both structured (metrics or raw numbers) and unstructured (e-mails, images, videos, or social data). As Data Scientists often have to create algorithms to extract insights from such complex data, these “data magicians” are expected to be equipped with a variety of skills and experience levels. Data Science skill development has been an issue of ongoing debate and continues to raise questions about the appropriate training programs available in Data Science.
A recent RJMetrics study of LinkedIn profiles reveals that the number of Data Scientists has doubled in the last four years. According to Glassdoor, Data Scientists enjoy a median salary of $113,000. A Datanami article titled The Future of Data Science argues that the continued impact of Data Science on the global business world will depend primarily on advanced analytics, especially predictive or “real-time” analytics used to achieve predetermined objectives such as superior products and services, micro-market products, improved customer experience, and reduced operating costs.
Thus, advanced analytics and Machine Learning skills are high on any industry leader’s list of desired skills when hiring a Data Scientist. The future business environment will also expect Data Science projects to be executed faster, so future Data Scientists will be expected to have not only traditional math, statistics, and Machine Learning skills, but also sound knowledge of and experience with data productivity tools, such as those that automate data cleaning or data modeling.
A Typical Day in a Data Scientist’s Life
The organization Masters in Data Science recently wrote about a typical day in a Data Scientist’s life, which according to them revolves around a number of particular tasks, including:
- Conducting research to create useful questionnaires about existing industry practices.
- Studying, extracting, and collecting data from both internal and external sources to address various problems.
- Using a combination of Machine Learning, advanced analytics, and statistical methods to prepare data for use in data modeling.
- Cleansing the data for accuracy and relevance. How Much Time Do Data Scientists Spend Cleaning Data is an interesting review of this task.
- Examining and studying data to discover hidden trends, anomalies, or weaknesses.
- Developing algorithms to solve problems and using tools to automate the work.
- Communicating the findings and insights to the business leaders or managers through the use of data visualizations and reports.
- Making effective recommendations for operational changes.
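As a rough illustration of the cleansing and examining steps in the list above, here is a minimal Python sketch that uses only the standard library. The sample records, the field names (`region`, `revenue`), and the median-based outlier rule are all assumptions made for this example; none of them come from the Masters in Data Science article.

```python
from statistics import median

# Hypothetical raw records; field names and values are invented for illustration.
raw_records = [
    {"region": "north", "revenue": "1200"},
    {"region": "south", "revenue": "1150"},
    {"region": "east", "revenue": ""},        # missing value
    {"region": "west", "revenue": "n/a"},     # unparseable value
    {"region": "north", "revenue": "1250"},
    {"region": "south", "revenue": "9800"},   # suspiciously large
    {"region": "east", "revenue": "1180"},
]


def clean(records):
    """Keep only records whose revenue parses as a number."""
    cleaned = []
    for record in records:
        try:
            revenue = float(record["revenue"])
        except ValueError:
            continue  # drop rows with missing or malformed revenue
        cleaned.append({"region": record["region"], "revenue": revenue})
    return cleaned


def find_anomalies(records, threshold=5.0):
    """Flag records far from the median, measured in median absolute deviations."""
    values = [r["revenue"] for r in records]
    center = median(values)
    mad = median(abs(v - center) for v in values)
    if mad == 0:
        # At least half the values equal the median, so this rule cannot
        # scale the deviations; report nothing rather than divide by zero.
        return []
    return [r for r in records if abs(r["revenue"] - center) / mad > threshold]


cleaned_records = clean(raw_records)
anomalies = find_anomalies(cleaned_records)
```

With this sample data, five of the seven records survive cleaning and only the 9800 record is flagged. A median-based rule is used here because, in a sample this small, a single extreme value would distort a mean and standard deviation; real projects would choose a rule suited to their own data.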
Observations on Data Science Skill Development
A late 2014 article at KDNuggets explored the most important skills needed in Data Science, which included advanced degrees, knowledge of SAS or R, Python coding, knowledge of the Hadoop platform, knowledge of SQL, and Cloud Computing. All have been identified as essential skills for an aspiring Data Scientist breaking into the industry. A 2016 DataStax survey added NoSQL expertise to that list.
Along with the above skills, a Data Scientist also needs strong communication skills to articulate problems, business or domain knowledge, and intellectual curiosity in order to do the job successfully.
Quora Analyzes Data Science Skill Development
When Quora analyzed the issue of finding qualified employees trained in Data Science, the most critical skills needed to become a successful Data Scientist were identified as a curiosity about data and a deep interest in the domain. The ideal Data Scientist, according to the article, must demonstrate a passion for the subject and an intellectual inquisitiveness about the data to be able to make an impact on the job. The soft skill that Quora promotes is “story-telling” ability, which is required to extract the story behind the numbers and figures.
A recent article from Business2Community lends additional support to the above observation: the author points out that the most sought-after skills in the Data Science industry fall in the area of statistics, which includes data visualization. It is interesting to note that this article does not include programming in its list of the top 10 skills required to succeed as a Data Scientist.
What Do Employers Want in Their Data Professionals?
An article from DeZyre concerning Hadoop and Data Science jobs provides the following insights:
- The emerging Data Scientist needs to develop business domain knowledge. Frequently, candidates with stellar academic records struggle on the job because they cannot apply their knowledge to real-world situations.
- Candidates must demonstrate quick aptitude for Data Science tools like R, Python, Hadoop, or SAS.
- Successful Data Scientists are usually convincing storytellers. They ought to be able to communicate the “story” hidden behind their findings, and visualization makes that story easier to understand.
Information Week discussed Gartner’s Magic Quadrant, in which Gartner predicted that by 2020, predictive and prescriptive analytics capabilities will attract 40% of enterprise investments in BI and analytics technologies. This prediction is a good indicator that the Data Scientist of the future will increasingly be expected to have superior predictive and prescriptive analytics skills. Global interest in advanced analytics has risen meteorically in recent years with the emergence of Big Data and Machine Learning technologies.
When Gartner discussed advanced analytics platforms as a comprehensive data modeling solution, the implication was that future Data Scientists must show proficiency in using these platforms to their best advantage.
The Bloomberg article Black Belts in Data observed that because the US will face a data talent shortage by 2018, it is imperative that industry and academia jointly explore solutions to close this gap between demand and supply in Data Science.
The KDNuggets article discussed above notes that 85% of Data Science jobs posted on KDNuggets in 2014 originated from the US, and the rest originated in countries such as Germany, the UK, Switzerland, Canada, China, and India. The desired skills that featured consistently in these job postings were research acumen, design knowledge, business analytics, Machine Learning, statistics, and data mining.
A Gartner news piece published some years back predicted that by 2015, 4.4 million Information Technology jobs would revolve around Big Data, and that every Big Data-related job created in the U.S. would trigger employment possibilities for three people outside of IT. The piece also points out that a serious lack of talent in the data industry will leave almost two-thirds of Data Science jobs unfilled.
The Burtch Works Data Scientist Salary Study contains extensive, useful information on Data Science salaries and demographics.
The Data Scientist in a Team Environment
PwC offered an interesting debate about whether it is really possible to find such a wide variety of skill sets in a single person. The article raised the question of whether an employer should look for all the desirable skills in a single candidate, or whether it is preferable to develop a strong team made up of several Data Scientists with distinct skills. Many companies have shown that the best Data Science projects result from teamwork. Building an ideal Data Science team involves numerous elements.
In conclusion, the wide variety of expert findings collectively suggests that Data Science skill development ought to be treated as a team effort, rather than expecting one candidate to be a “Jack of all trades.”