Many people remember IBM’s artificial intelligence supercomputer, Watson, when it appeared as a contestant on the popular game show Jeopardy! in February 2011. Planning for the computer’s appearance began as far back as 2008, when IBM representatives contacted Jeopardy! executives to arrange for Watson to play against two of the show’s most successful contestants. The project would demonstrate the ability of a computer not only to participate in a game with human beings but also, as IBM engineers hypothesized, to beat its human competitors.
And it finally happened. After the first round against the two highest earners in Jeopardy!’s history, Brad Rutter and Ken Jennings, Watson won its first match, accruing $35,734 against Rutter’s $10,400 and Jennings’ $4,800. In the second round, Watson did it again, winning $77,147 against Rutter’s $21,600 and Jennings’ $24,000. Watson won the final round and ultimately the $1,000,000 award for first place; IBM donated half of the prize money to World Vision International and the other half to World Community Grid.
But how does Watson really work? Unlike computers that follow relatively simple algorithms to play games like chess or poker, Watson must attempt to understand the rules of language, which are far more complex.
Watson’s essential function was to parse keywords against a massive amount of stored data. Built with C++, Java, and Prolog and running on Apache’s Unstructured Information Management Architecture (UIMA), Watson was able to draw on a wide spectrum of information, specifically “encyclopedias, dictionaries, thesauri, newswire articles, [and] literary works” installed by IBM. During the Jeopardy! Challenge, Watson would be offline, so it was important for IBM to provide the computer with as much information as possible. For the Jeopardy! Challenge, Watson also had to parse all of that information within 3 seconds.
To meet this need, Watson was provided with 200 million pages of structured and unstructured content consuming four terabytes of disk storage; these included “databases, taxonomies, and ontologies” such as “DBPedia, WordNet, and Yago” as well as the full text of Wikipedia. In order to process this information quickly, Watson used “90 IBM Power 750 servers using 15 terabytes of RAM and 2,880 processor cores” to play the game.
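The basic idea of matching a clue’s keywords against supporting passages and ranking candidate answers by confidence can be illustrated with a toy sketch. This is purely hypothetical code, not Watson’s actual DeepQA pipeline (which used UIMA annotators, hundreds of evidence scorers, and massive parallelism); the corpus, clue, and scoring function here are invented for illustration only.

```python
# Toy illustration of keyword-based candidate ranking.
# All names and data are hypothetical, not drawn from Watson itself.

def tokenize(text):
    """Split text into a set of lowercase words, stripping punctuation."""
    return {w.strip(".,!?").lower() for w in text.split()}

def score_candidates(clue, corpus):
    """Rank candidate answers by keyword overlap between the clue
    and each candidate's supporting passage."""
    clue_words = tokenize(clue)
    scored = []
    for candidate, passage in corpus.items():
        overlap = clue_words & tokenize(passage)
        # Normalize by clue length to get a rough confidence in [0, 1].
        confidence = len(overlap) / len(clue_words)
        scored.append((candidate, confidence))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# A two-entry stand-in for Watson's 200 million pages of content.
corpus = {
    "Toronto": "Toronto is the capital city of the province of Ontario.",
    "Chicago": "Chicago is a major city on Lake Michigan with two airports.",
}

ranking = score_candidates(
    "Its largest airport is a major hub city on Lake Michigan", corpus
)
```

A real system would weigh many independent evidence scores and learn how to combine them from training data; the point here is only the shape of the task: retrieve passages, score each candidate, and answer with the highest-confidence option.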
IBM explained to DATAVERSITY™:
“Watson’s predictive analytics capabilities help organizations analyze all of this data to spot trends that would otherwise go unnoticed. Watson’s natural language processing ability helps organizations understand the meaning of information in a particular context. The data explosion is happening now and Watson benefits organizations by helping them extract actionable insights from unprecedented types and quantities of data.”
But many outside of the Big Data game might ask “So what?” upon learning that a supercomputer can win a game of Jeopardy! For the world of Big Data, Watson’s win means that humans are getting better not only at understanding artificial intelligence, but also at storing, processing, and analyzing massive amounts of unstructured data very quickly. This ultimately means that projects that require Big Data solutions – everything from predicting epidemics to analyzing intelligence data – are becoming more efficient. According to the IBM Watson team, “This ability to ‘think’ means Watson can benefit organizations through its unparalleled ability to essentially ‘learn’ from information generated from a variety of sources.”
This is precisely what was discussed in a June 2012 presentation given by Tony Pearson, IBM’s Master Inventor and Senior Managing Consultant, titled “IBM Watson: How it Works and What it Means for Society Beyond Winning Jeopardy!” While extremely powerful, Watson had its limits. As Pearson noted, Watson worked in English only, served a single questioner per system instance, had a 3-second response time over static content, relied on unstructured text, and required training data. Nevertheless, Pearson foresees the machine serving multiple, varied users with more dynamic content updates, varied training data and response times, and additional languages.
One of the societal problems outlined by Pearson involved some of the rather complex challenges faced by the healthcare industry. In the presentation, Pearson noted that medical information is doubling every 5 years, much of it unstructured. Further, 81% of physicians report spending 5 hours or less per month reading medical journals. Combined with estimates that 1 in 5 diagnoses are inaccurate or incomplete, contributing to the 44,000 to 98,000 Americans who die each year from preventable medical errors in hospitals, Pearson believes a computer like Watson could help mitigate these problems. Indeed, what would happen if Watson could leverage every medical journal that exists against a single patient’s medical history?
The possible outcomes are too important to pass up. As of March 2012, Watson is working diligently in New York City’s Sloan-Kettering Cancer Center, “absorbing the latest knowledge in oncology research from one of the top cancer hospitals in the country.” This is part of the required training data the computer needs before it begins to handle the petabytes of cancer research human beings have been collecting for years. According to reports, “Watson will be fed past and recent cancer research and – with permission – individual medical records. Then it will be tested with more and more complicated cancer scenarios and assessed with the help of an advisory panel […] expected to speedily suggest diagnoses and treatments, ranking several alternatives.”
One of Watson’s first tests involved a rare eye problem that can sometimes result from Lyme disease. Doctors at the research center jumped at the opportunity to test Watson, and during a demonstration, “read Watson a case about a patient with eye problems and history of arthritis.” Watson returned a diagnosis of Lyme disease with 73% certainty. In less than an hour, Watson was 73% sure of something that had taken doctors and medical researchers years to figure out.
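The notion of answering with a ranked list of diagnoses, each carrying a certainty score rather than a single yes/no verdict, can be sketched in a few lines. This is a deliberately simplified, hypothetical illustration; the disease profiles and the scoring rule below are invented and bear no relation to real medical criteria or to Watson’s actual clinical models.

```python
# Hypothetical sketch: rank candidate diagnoses by how many of a
# condition's known features appear in the patient's findings.
# Profiles and findings are invented for illustration only.

def rank_diagnoses(findings, profiles):
    """Return (diagnosis, certainty) pairs sorted by certainty, where
    certainty is the fraction of a profile's features that are present."""
    ranked = []
    for disease, features in profiles.items():
        matched = findings & features
        certainty = round(len(matched) / len(features), 2)
        ranked.append((disease, certainty))
    return sorted(ranked, key=lambda pair: pair[1], reverse=True)

profiles = {
    "Lyme disease": {"eye inflammation", "arthritis", "tick exposure", "rash"},
    "Rheumatoid arthritis": {"arthritis", "joint stiffness", "fatigue"},
}

findings = {"eye inflammation", "arthritis", "rash"}
ranking = rank_diagnoses(findings, profiles)
```

The key design point mirrors the demonstration in the text: instead of committing to one answer, the system surfaces alternatives with explicit confidence levels, leaving the final judgment to the physician.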
On 17 May 2012, the University of Rochester announced the three winners of a competition that challenged students to harness the Watson technology “to solve daunting societal and business challenges.” After months of researching and tinkering with what Watson could do, three teams of students placed in the contest and will receive research awards through IBM’s partnership with the university.
First place was awarded to a team of students that used Watson technology to analyze weather data to help “organizations better prepare for a crisis administration and allocate resources accordingly.” Second place went to a team that used Watson to explore “profit margin, consumption rates and opportunities for exploration of oil, gas and mineral reserves,” combining that data to identify optimal exploitation areas while weighing environmental impacts against regulatory and safety information. Third place was awarded to a group of students interested in marketing, who developed a case study called “Unpacking Big Data Improves Travel Experience” that used Watson to “quickly analyze massive amounts of unstructured information in order to enhance security, reduce wait times and improve the travel experience in airports while taking the guesswork out of the customs process.”
In March 2012, Bloomberg reported that Citigroup, the third-largest U.S. lender, became Watson’s very first banking client. IBM is extremely interested in putting Watson to work in the financial industry to test its ability to read complex financial and economic data that change rapidly – work that also stands to generate billions in revenue for IBM. Watson’s work with Citigroup is projected to “identify risks, rewards and customer wants mere human experts may overlook.”
IBM understands that the financial industry is more interested in outcomes than in understanding how Watson works or why its software and functionality are so important in Big Data analysis. As such, IBM markets the technology as a superhuman forecaster. Manoj Saxena, the IBM manager tasked with finding work for Watson, told Bloomberg:
“Watson offers a ‘more global’ picture by looking beyond financial data. For example, Watson can comb 10-Ks, prospectuses, loan performances and earnings quality while also uncovering sentiment and news not in the usual metrics before offering securities portfolio recommendations. It can also monitor trading, news sources and Facebook to help a treasurer manage foreign exchange risk.”
Market analysts expect Watson technology to provide Citigroup with “$2.65 billion in revenue in 2015, adding 52 cents of earnings per share.”
Experts on the IBM Watson team told DATAVERSITY that:
“Specifically, Citi will evaluate ways that IBM Watson technologies can help analyze customer needs and process vast amounts of up-to-the-minute financial, economic, product and client data. Hypothetical scenarios include, “How much money do I need to retire?” or “Should I reshuffle my investments given the volatility of the world markets?” Imagine getting an expert, personalized response in just a few seconds time. Though that scenario is not possible today, it could be in the not-too-distant future with the help of Watson.”
As Watson continues to learn and its technology evolves, global industries across the board might soon be using the platform to analyze, diagnose, solve, or forecast major issues of importance. As with any new technology, the ethics of how to use Watson will likely be debated, since the computer can be harnessed to identify health problems and save lives, but also to maximize a business’s profitability by calculating how many employees can be let go. As Watson’s shortfalls are worked out over time, the technology might just be able to analyze the vast amount of unstructured data collected over the last three decades, providing humanity with an image of a world we have yet to completely visualize or truly understand.