Machine Learning Transformed: Data Quality and Operational Necessities

Machine learning elicits mixed reactions. On the one hand, some consider machine learning a company’s new superpower that has “swept enterprise technology, using mass amounts of data and algorithms to make predictions.” On the other hand, it has been dismissed as an overhyped fad, a supposed panacea that fails to deliver. Both views can be true, yet companies still need to factor in machine learning when automating their business.

IDC estimates that worldwide “spending on cognitive and artificial intelligence (AI) systems will reach $19.1 billion in 2020, an increase of 54.2 percent over the amount spent in 2017.” In addition, “by 2021, 40+ percent of digital transformation initiatives will use AI services.” Even nations plan to boost their AI and machine learning capabilities. Australia will devote “$29.9 million in funding over four years for projects that make use of the technologies.” Ignoring machine learning technologies because they are overhyped comes at a steep price: lost innovation and, eventually, lost business.

How then can businesses effectively use machine learning? Hollywood provides some initial insights in WarGames, released in 1983. In the film, Professor Stephen Falken creates Joshua, a machine learning AI, to help the US Air Force improve its military strategies. The plot thickens when a teenager hacks into the system to play a game of Global Thermonuclear War, only to find out that the game could start an actual nuclear war. WarGames provides a framework for evaluating successful and failed machine learning use cases:

  • Start small with machine learning. (The Air Force jumped ahead, letting Joshua control nuclear missiles and try to win a nuclear war.)
  • Computers need quality data sets. (Joshua’s initial data set had holes to help the computer learn. When Professor Falken introduces the game tic-tac-toe to Joshua, the AI is able to see the futility of a nuclear war and stop the countdown.)
  • Finally, the machine learning program needs to suit the context. (Joshua did not have the decision-making capability to launch missiles and, luckily, could learn not to launch, since a nuclear war is a no-win scenario.)

Start Small with Machine Learning

Seth Deland states that decision-makers must have a technological understanding of machine learning technologies. Current machine learning technologies excel at finding patterns and detecting insights when the work breaks down into discrete steps. Projects that make the best use of this strength succeed.

For example, Senegal’s project to sterilize male tsetse flies with gamma rays stemmed the spread of sleeping sickness. “Machine learning pushed the fly population down by 98 percent with a concomitant fall in sleeping sickness.” Specific lighting characteristics distinguish male from female flies. From a concrete set of images, the algorithms learned how to sort huge numbers of male flies quickly, simplifying the male tsetse sterilization.

Because the tsetse project applied AI technologies’ strengths to a specific, discrete goal (determining each fly’s sex), the machine learning approach succeeded, all the more so because categorizing tsetse flies by hand is time-consuming and labor-intensive.

On the other hand, it is easy to launch a machine learning project with a broad goal without accounting for everything needed to make the technology work. Terry Moon, an Information Architect at McCormick, ran a feasibility study on using machine learning for food quality.

A year into the project, McCormick froze the project’s data set, recognizing that the machine learning effort was too costly and time-consuming to maintain. Moon changed gears and considered how to handle the extensive variety of data.

After trying to develop some APIs to solve McCormick’s data problem, Moon searched for a vendor with a platform to help. She connected with Ravi Shankar and used the Denodo Platform to connect McCormick’s data (in real time at one point) and make it more accessible. By addressing this problem, McCormick was able to continue with its machine learning projects.

As of March 2018, McCormick was live with machine learning technologies, with plans to expand the technology over the following three years. From her experience, Moon advises companies to take their time with data virtualization implementations and to address issues such as which policies should govern connecting to one set of source systems rather than another.

Machine Learning Must Have Data Quality to Succeed

Machine learning requires quality data: data that is accurate and complete. As reported by Paramita Ghosh, “It takes a lot of manual effort to clean and run that data and add some business intelligence on top of it.” Companies that already have quality data sets at their fingertips, from past projects, programs, or on-hand applications, may find those an easier place to start when applying machine learning.

For example, RR Donnelley (now RRD), a Fortune 500 company, added a logistics division to figure out how best to ship print materials. RRD employees and universities wrote algorithms that analyzed already reliable and available geographic, traffic, and weather information from drivers’ mobile phones. These programs learned and updated their programming, recommending on-the-fly changes to shipping routes. Mobile GPS data provides reliable information based on fixed standards.

RRD’s results noted that net sales were up 3.7 percent, in part due to logistics. The lesson: if you have easy access to a standard, trustworthy data set, consider using that first for a machine learning project.
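One way to judge whether a data set is the standard, trustworthy kind is to measure how complete it is before any model sees it. The snippet below is a minimal sketch in Python, not RRD’s method: the function name missing_rate, the field names, and the sample records are all hypothetical and exist only for illustration.

```python
from typing import Dict, List, Optional

# A record maps field names to values; None (or an absent key) means "missing".
Record = Dict[str, Optional[object]]


def missing_rate(records: List[Record], fields: List[str]) -> Dict[str, float]:
    """Return, for each field, the fraction of records where it is absent or None."""
    if not records:
        raise ValueError("need at least one record")
    rates: Dict[str, float] = {}
    for field in fields:
        # dict.get returns None for absent keys, so both cases count as missing.
        missing = sum(1 for record in records if record.get(field) is None)
        rates[field] = missing / len(records)
    return rates


# Hypothetical delivery records; the fields and values are made up.
sample = [
    {"lat": 41.88, "lon": -87.63, "weather": "rain"},
    {"lat": 41.90, "lon": None, "weather": "clear"},
    {"lat": 41.95, "lon": -87.70},  # "weather" key absent
    {"lat": 42.00, "lon": -87.80, "weather": None},
]

rates = missing_rate(sample, ["lat", "lon", "weather"])
for field, rate in rates.items():
    print(f"{field}: {rate:.0%} missing")
```

A field with a high missing rate is a signal to clean or drop that field first, or to start the project with a different data set.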

Ignoring the quality of the data fed to algorithms will doom a machine learning project, especially a chatbot. James Mickens explains this eloquently at the 27th USENIX Security Symposium. He uses the example of Tay, a chatbot created by Microsoft to converse with and learn from people on the internet. Within a day, Tay went from tweeting encouraging messages to praising Hitler and spewing racist and misogynist comments. Tay was removed the next day, and Microsoft apologized.

Facebook has also had issues with its chatbots Alice and Bob, which developed their own language while conversing with each other, missing the point of talking with people. Amazon’s Echo device, Alexa, tried to order dollhouses for some Californians after it mistook a morning news show comment for a command. Businesses will embarrass themselves without good data quality that guides the machine to its purpose, especially when the machine learns through chatbots.

Machine Learning Algorithms Are Specific to the Use Case

A machine learning algorithm may succeed in one area but fail in another, and a style that fails in one setting may succeed elsewhere. As described in 10 Machine Learning Algorithms You Should Know, how decisions are made differs from algorithm to algorithm.

IBM’s Watson succeeded in helping KPMG LLP handle taxes for corporate research and development departments. Watson learns through natural language processing, which can draw on Hidden Markov Models (HMMs). An HMM treats words as states and calculates the probability of transitions between them, which is very helpful with legal language. After training, Watson got the tax treatment right three out of four times.

This allows corporations to take better advantage of the federal research and development tax credit, resulting in higher-quality documentation for the IRS and savings in work time. Watson was able to keep up with the many variations across regulations, laws, and court cases to provide good tax results for this R&D stimulus.
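To make the idea of transition probabilities between words concrete, here is a minimal Python sketch. It is deliberately simpler than a real HMM: it estimates a first-order Markov chain over the observed words from bigram counts, whereas a full HMM would add hidden states and emission probabilities. It is not a description of how Watson works, and the three-sentence corpus and the function name transition_probabilities are invented for illustration.

```python
from collections import Counter, defaultdict
from typing import Dict, List


def transition_probabilities(sentences: List[List[str]]) -> Dict[str, Dict[str, float]]:
    """Estimate P(next word | current word) from counts of adjacent word pairs."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for words in sentences:
        # Pair each word with its successor: (w0, w1), (w1, w2), ...
        for current, following in zip(words, words[1:]):
            counts[current][following] += 1

    probabilities: Dict[str, Dict[str, float]] = {}
    for current, followers in counts.items():
        total = sum(followers.values())
        probabilities[current] = {word: n / total for word, n in followers.items()}
    return probabilities


# Made-up, tax-flavored sentences; not real regulatory text.
corpus = [
    "the credit applies to qualified research".split(),
    "the credit applies to wages".split(),
    "the taxpayer claims the credit".split(),
]

probs = transition_probabilities(corpus)
print(probs["the"])  # {'credit': 0.75, 'taxpayer': 0.25}
print(probs["to"])   # {'qualified': 0.5, 'wages': 0.5}
```

In this toy corpus, "credit" follows "the" in three of four cases. Regularities of that kind are plausibly one reason formulaic legal language suits transition-based models.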

Watson’s tax work contrasts with its work on a broad cancer project. IBM’s Watson Health project was not useful in diagnosing and treating cancer. In February 2018, the M. D. Anderson Cancer Center in Houston lost $39 million on a machine learning project budgeted at $2.4 million.

The project broke down in part because of unrealistic expectations that Watson, which adapts by making small changes to its machine learning algorithms, could reason about how gene variations play out in cancer. Had the project been limited to Watson’s learning strengths, say, identifying a specific cancer from data sets of tumor images, it might have been more successful.


Machine learning use cases show us that the technology works best with concrete goals, good data sets, and an understanding of the algorithm’s strengths and weaknesses. Machine learning needs further technological innovation to be effective for other goals. Dr. Pierre-Yves Oudeyer, an AI researcher at Inria, the French national institute for computer science in Paris, suggests that machines need curiosity to learn. Given that some business goals go beyond current machine learning breakthroughs, companies need to keep up to date with technologies and use cases before applying machine learning.
