The Innovative Data Scientist: Overcoming the Big Data and Data Management Divide

In an ideal world, a Data Scientist is expected to demonstrate a rare combination of technical, analytical, and communication skills that makes them sought-after data innovators. In the real world, Data Scientists arrive from a wide variety of academic majors like mathematics, statistics, economics, operations research, or computer science.

Additionally, the roles they assume in the workplace range from Data Analyst to Data Engineer, with many other possible job titles in the middle. While Data Scientists must demonstrate deep knowledge of mathematics and statistics, Data Engineers rely more on engineering skills, and Data Analysts leverage their communication skills and domain expertise.

The Data Science community is working to guide digital transformation through the enablement of business assets and effective Data Management practices. Such digital enablement is helping to transform businesses into data-centric and data-driven entities thriving on competitive Business Intelligence. Big Data is a major part of that initiative, and Data Scientists must seize this golden opportunity to drive digital innovations by leveraging both Big Data and legacy data. Such a task is proving to be easier said than done.

One of the principal challenges facing today’s Data Scientists is the shortage of requisite skills and experience to transform businesses into analytics-driven enterprises. Data Scientists have a long way to go before they can extract timely, competitive intelligence from the proverbial avalanche of data. To excel in today’s business environment, Data Science professionals need to evolve as statisticians and Machine Learning experts to unlock the possibilities of all an enterprise’s data assets.

What Are Industry Leaders Thinking and Doing About the Data Science Skill Shortage?

In the podcast titled How Advanced Analytics Can Drive Productivity, two senior partners of McKinsey conducted an interview of Cecilia Ma Zecha to discuss how business data can be used in Advanced Analytics to further the business goals of B2B and B2C companies. What comes out of this insightful interview is that the success of Business Analytics depends to a large degree on the availability of the right talent, the implementation of best practices, and a focus on the desired data-driven results.

In the present scenario, business productivity is driven by data and analytics. Bill Wiseman of McKinsey notes that the current preoccupation of many businesses is neither Data Science nor Analytics, but Data Governance and the legal compliance of data activities, because compliance issues are forcing enterprises to focus more on traditional Data Management within a complex business environment. Does that imply that today’s Data Scientists should make Data Governance the top priority of their ongoing training agenda? It’s an open-ended conversation in many enterprises today.

Spotlight on Data Science Skills

The Forbes blog post titled Become a Data Scientist confirms that Data Science job postings have shown a 57% YOY growth and the demand for Data Scientists also grew about 73.5%. In this period, the Data Scientist job was ranked the most sought-after job for wide career prospects, high earning potential, and available positions. In this post, the 2017 Data Science skill trends are listed in order of priority. In the listing of required skills, data mining and statistical analysis retained the second spot, while data presentation skills feature in the top 10 for the first time.

In KDnuggets’ 10 Must Have Skills for Data Scientists, the author updates the original must-have skill list provided by Linda Burtch of Burtch Works in 2014. The author makes a very interesting point that the quintessential “unicorn” Data Scientist is more a myth than a reality. In the real world, Data Scientists may be covering a wide range of roles like Machine Learning Scientist, Data Engineer, Data Analyst, Hadoop expert, and MBAs with specialization in Analytics. In fact, the field of Data Science has opened up valid job roles for various technical, management, and business graduates with strong domain knowledge, communication, and interpersonal skills. Moreover, the people who are filling Data Science positions come with computer science, physics, statistics, mathematics, or engineering majors.

As business data continue to grow in volume and complexity, a clear understanding of data processing frameworks will become vital to succeeding in any Data Science role in the coming years. In addition, the traditional database skills that experts used to manage structured data will no longer be good enough. The explosive growth of unstructured data is making it mandatory for Data Scientists to understand unstructured data.

The Most Comprehensive Data Science Learning Plan for 2017 claims that a detailed learning plan for Data Science, Machine Learning, or Deep Learning can easily convert followers into students. The new learning plan takes the confusion out of the learning process.

Evolution of New Skills for Data Scientists

Newer technologies such as Cloud Computing, Big Data, Hadoop, and Data Visualization have empowered Data Scientists to “wrestle with data variety and volume.” But many articles indicate that though statistics, Machine Learning, or computer science are germane to Data Scientist roles, softer skills like data presentation, communication, and marketing may be equally important for winning business cases.

Here are some real issues and questions facing the Data Scientists moving forward:

  • While Data Science graduates ease into the various roles of Data Scientist, Data Analyst, or Data Engineer, are there enough academic programs with focused training for each of these roles?
  • While academic degrees in Data Science, Computer Science, or Statistics may be significant for success in jobs, who will provide training for Big Data, Machine Learning, Hadoop, or MapReduce?
  • Gaps in training are directly related to performance gaps. Unless academic programs address those issues, how will the industry receive productive Data Scientists from day one?
  • The Data Scientist’s professional journey has to be balanced with ongoing learning opportunities. Are the global industries prepared to offer that learning to working Data Scientists?

Ideally, academia and industry will collaborate and find solutions to the ongoing talent gap and performance gap facing the Data Science community. Also, review the DATAVERSITY® article titled Comprehensive Review of Skills Required for Data Scientist Jobs.

Data Science Training Trends

The article Data Science Predictions for 2017 seems to indicate that in the last decade, business operators have just scratched the surface of data technologies. McKinsey predicted in 2013 that the global business community would suffer an acute shortage of Data Science professionals around 2018, specifically a shortage of “1.5 million analysts” who routinely capture rare insights and competitive intelligence from vast amounts of data. The technological barriers that have so far limited the power of Data Science are gradually disappearing, and this year the global Data Management industry can expect some major upheavals in Data Science training practices.

This year will likely also witness a big boost in Big Data investments, with cheaper storage and processing options on the horizon. Hadoop will remain a mainstay of Data Analytics activities as it is ideal for ETL, Data Mining, Data Warehousing, IoT, and Predictive Analytics. Big Data, Hadoop, and the Internet of Things will continue to evolve through 2017. Thus, Data Scientists can look forward to newer and better growth opportunities.

The post titled Data Science Skills states that CrowdFlower examined almost 3500 Data Scientist job postings for every level on LinkedIn. After reviewing basic skills common to all posts, they concluded that along with both traditional database and NoSQL skills, the new generation of Data Scientists will need exposure to MapReduce, Hive, Pig, and related technologies for getting the best from their jobs moving forward.

The KDnuggets post titled Data Science Trends for 2017 notes that as Big Data and IoT invade the entire digital landscape, businesses will continue to derive direct benefits from their data assets. While giants like Google, Amazon, and Facebook continue to race for Artificial Intelligence (AI), this year many enterprises may induct fresh Machine Learning talent into their Data Science departments. Data Science and Artificial Intelligence are coming closer together, so Data Scientists with AI knowhow or some AI training may become top bets for prospective employers. Another golden career prospect for Data Scientists is the healthcare industry. Data Scientists interested in working with healthcare or clinical data will suddenly find themselves in the midst of many job openings.

How America Is Coping with the Data Science Talent Gap

Datanami’s Tracking Data Science Talent Gap takes stock of the talent gap situation in the Data Science industry and makes the following comments based on observations made by industry experts:

  • American academia is busy developing PhD level programs in Data Science. American universities are already offering dozens of computer science programs with concentrations in Data Science.
  • Many American campuses have been agile in offering master’s level programs in Data Science as well. A good example is the University of Michigan, which announced its five-year plan to invest $100 million for launching the Michigan Institute for Data Science (MIDAS).
  • Many Data Science boot camps lasting 12 weeks have sprung up to counter the current shortage of Data Scientists. One example is the 12-week-long DS12 program offered by a Los Angeles-based Data Science consultancy firm.
  • Some accelerators are also offering focused crash courses in specific knowledge areas to help fill the available jobs.

In Gartner Says the Age of Citizen Data Scientist Is Dawning, the article discusses how by 2020 the high proportion of automated tasks will enable citizen Data Scientists to bridge the gap between Self-Service Analytics and Advanced Analytics. One can hope that this trend will successfully mitigate the vacuum created by the lack of Data Science talent by empowering mainstream business users with automated Advanced Analytics capabilities.

Echoing these sentiments, Gartner 2017 Prediction for Analytics Strategy and Technology predicts that increased automation of Data Analytics will make both professionals and citizen Data Scientists more productive. Though this trend signals progress in overcoming the current skill gap in data technologies, it also signals a need for more Data Governance, Data Quality, and other traditional Data Management practices within enterprises. Otherwise, all the Data Science in the world will not help them overcome their existing data challenges.

Photo Credit: Khakimullin Aleksandr/
