Click to learn more about author Kevin W. McCarthy.
Not to date myself, but by my childhood timeline, we should be about 15 years past flying cars, and just about coming up on the robot apocalypse. There was a time when everyone assumed that automation would eventually take over everyone’s lives, and as you see people walking into traffic staring at their phones, you might think we’ve almost arrived. It’s true, we’ve come a long way with “thinking machines” (for example, I still get creeped out when I get in my car and my iPhone correctly tells me how long it will take to reach my destination, BUT I NEVER TOLD IT WHERE I WAS GOING), but there are still some misconceptions around the definitions, the uses, and the all-important fuel that drives these processes.
Let’s start with getting our terms straight. I’m not a PhD in quantum mechanics, but I can give you the layman’s definitions of Artificial Intelligence and Machine Learning. Artificial Intelligence (AI) is the umbrella term for “smart” machines: technology that appears to make logical decisions and choices based on a series of factors and a pre-defined set of business rules. If your paycheck is direct deposited into your checking account, with a certain amount put into savings, another amount put into your 401k, and taxes and healthcare taken out, all on an automated basis each pay period, you are seeing Artificial Intelligence at work. The system is making those calculations and deciding what amount goes where based on the amount you are paid and the rules that have been set up to distribute the funds. And yes, this type of processing has been going on for years, but only in the last decade did we start to label these automated decisions as Artificial Intelligence. At its core, it’s the replacement of human decisions with machine decisions.
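To make the paycheck example concrete, here is a minimal sketch of that kind of rule-driven decision in Python. Everything in it is an assumption made up for illustration: the function name `distribute_paycheck`, the `RULES` table, and every rate and amount. A real payroll system has far more rules than this.

```python
from decimal import Decimal

# Hypothetical business rules. Every rate and amount here is an
# illustrative assumption, not a real payroll configuration.
RULES = {
    "tax_rate": Decimal("0.20"),         # share of gross pay withheld for taxes
    "retirement_rate": Decimal("0.05"),  # share of gross pay sent to the 401k
    "healthcare": Decimal("150.00"),     # flat healthcare deduction per period
    "savings": Decimal("200.00"),        # flat transfer to savings per period
}


def distribute_paycheck(gross, rules=RULES):
    """Split one paycheck into buckets using the pre-defined rules."""
    cents = Decimal("0.01")
    taxes = (gross * rules["tax_rate"]).quantize(cents)
    retirement = (gross * rules["retirement_rate"]).quantize(cents)
    healthcare = rules["healthcare"]
    savings = rules["savings"]

    # Whatever is left after every rule has fired lands in checking.
    checking = gross - taxes - retirement - healthcare - savings
    if checking < 0:
        raise ValueError("deductions exceed gross pay")

    return {
        "taxes": taxes,
        "retirement": retirement,
        "healthcare": healthcare,
        "savings": savings,
        "checking": checking,
    }


if __name__ == "__main__":
    for bucket, amount in distribute_paycheck(Decimal("2000.00")).items():
        print(f"{bucket:>10}: {amount}")
```

No human decides where each dollar goes on payday; the rules do, and the same input always produces the same split.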
Machine Learning, on the other hand, is a specific avenue of AI (like science fiction is a specific avenue of literature). Machine Learning (ML) is the process of using data (in most cases, a lot of data) to understand the widest variety of scenarios and predict the potential and likely outcomes. The best example I can think of for ML is the 1983 film WarGames. Joshua (the computer) is about to unleash the U.S. missile arsenal on the Russians, when it starts using data to run through all the potential attack and counter-attack scenarios. As it’s flying through hundreds and thousands of the possible orchestrations of global thermonuclear war, it ends up stopping the attack after it realizes that all the outcomes lead to the same result: the destruction of the world. It’s a great scene, a great message, and a great example of what ML is supposed to do for us. ML should use as much data as possible to calculate as many failures as possible and filter down to the rare occasions of success, so that humans can simply focus their time on those. In practical terms, ML is the reason Amazon asks if you’d like to buy a set of mittens to go with your coat and hat (as opposed to a kayak).
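The mittens-versus-kayak idea can be sketched with a toy example that simply counts which items show up alongside the ones already in your cart. This is a deliberately simplified stand-in, not how Amazon or any real recommender works; the `recommend` function and the sample baskets are made up for illustration.

```python
from collections import Counter


def recommend(baskets, cart, top_n=1):
    """Suggest the items most often bought together with anything in the cart."""
    cart = set(cart)
    counts = Counter()
    for basket in baskets:
        items = set(basket)
        # Only past purchases that share something with the cart are relevant.
        if items & cart:
            for item in items - cart:
                counts[item] += 1
    return [item for item, _ in counts.most_common(top_n)]


if __name__ == "__main__":
    # Made-up purchase history, purely illustrative.
    history = [
        ["coat", "hat", "mittens"],
        ["coat", "mittens"],
        ["hat", "mittens", "scarf"],
        ["kayak", "paddle"],
        ["coat", "scarf"],
    ]
    print(recommend(history, ["coat", "hat"]))
```

The suggestion comes entirely from the purchase history. Nobody wrote a rule saying that coats go with mittens, which is also why the quality of that history matters so much.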
Despite what the movies might have you believe, the science is relatively straightforward: Set up rules, process data, validate outcomes. You’re now in the AI/ML game! There’s just one problem, and it’s a problem that most people devising these systems aren’t thinking about. How are you fueling your process? What is the data that’s going to drive the outcomes, and, more importantly, can you trust it? Companies are setting up elaborate programs to take advantage of the hype around AI and ML, and pumping them full of erroneous data! Bad data is like putting sand in the gas tank of these machines: eventually, it’s going to cause them to stall, or worse. Worse means not just stalling, but producing outcomes that aren’t valid and that could spell the kind of disaster Joshua comes to recognize. That is the ultimate fear of AI/ML. What if the machines are wrong?
Data Quality needs to be a factor in any discussion around AI or ML, as the data is the single most important factor in the end results of the program. As much as you hear about Data Preparation for Data Migrations, or for Analytics, preparing data for AI/ML is going to be just as important. Especially as we move from structured data (balances, timestamps, sensor input) to unstructured data (social media posts, comments, descriptions), the ability to manipulate, extract, and trust the information gathered is going to come under much more scrutiny. And that means that a strong data management program, with both the tools and the talent in place to handle massive volumes of data, is fundamental to success. The adage, “Garbage in, garbage out” takes on new meaning in the hyper-drive world of AI/ML. Even a little garbage in is going to taint your results, and it can easily go unnoticed in the Big Data oceans that companies are throwing at their AI/ML initiatives.
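As a rough illustration of what checking the fuel might look like, the sketch below screens records before they reach an AI/ML process. The field names (`balance`, `timestamp`, `comment`) and the three checks are assumptions chosen to echo the structured and unstructured examples above; a real Data Quality program would cover far more.

```python
import math
from datetime import datetime


def validate_record(record):
    """Return a list of problems found in one record (empty means clean)."""
    problems = []

    # Structured field: the balance must be a real, finite number.
    balance = record.get("balance")
    if (
        isinstance(balance, bool)
        or not isinstance(balance, (int, float))
        or not math.isfinite(balance)
    ):
        problems.append("balance is missing or not a finite number")

    # Structured field: the timestamp must parse as an ISO 8601 string.
    try:
        datetime.fromisoformat(record.get("timestamp"))
    except (TypeError, ValueError):
        problems.append("timestamp is missing or not ISO 8601")

    # Unstructured field: the comment must contain some actual text.
    comment = record.get("comment")
    if not isinstance(comment, str) or not comment.strip():
        problems.append("comment is missing or empty")

    return problems


def split_by_quality(records):
    """Separate clean records from rejected ones, keeping the reasons."""
    clean, rejected = [], []
    for record in records:
        problems = validate_record(record)
        if problems:
            rejected.append((record, problems))
        else:
            clean.append(record)
    return clean, rejected


if __name__ == "__main__":
    # Made-up sample records, purely illustrative.
    sample = [
        {"balance": 120.5, "timestamp": "2024-01-15T09:30:00", "comment": "Great service"},
        {"balance": "N/A", "timestamp": "yesterday", "comment": "   "},
    ]
    clean, rejected = split_by_quality(sample)
    print(f"{len(clean)} clean, {len(rejected)} rejected")
    for record, problems in rejected:
        print(record, "->", problems)
```

Keeping the reasons alongside each rejected record is a deliberate choice: it lets someone fix the source of the bad data instead of silently dropping it.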
Even if we’re not in flying cars yet, there’s still an undeniable amount of progress that has been made with technology even in the past 10 years, and AI/ML is going to accelerate that innovation even more. By focusing on the data, or the fuel, for these initiatives, we can do a better job making sure the robots don’t break down on their way to saving the world!