The history of Machine Learning stretches back decades, though for many industries and businesses, it remains an untapped technology. As nearly every sector and company finds itself increasingly awash in growing volumes of data, however, the likelihood that more organizations will be testing the Machine Learning waters grows.
“We are getting to the point where almost every organization will need to get more familiar with Machine Learning,” says Pedro Domingos, a professor of Computer Science at the University of Washington. Mr. Domingos, who also conducts Coursera classes on Machine Learning, recently authored The Master Algorithm: How the Quest for the Ultimate Learning Machine Will Remake Our World.
The technology has already proven its worth in fields like finance, with neural networking carving out a slot starting in the 1980s to better predict foreign exchange and stock fluctuations, and improve credit scoring, he points out. And of course, big name Internet companies – Google, Facebook, Microsoft, and Amazon – have turned the technology into a huge competitive advantage, driving more sales with recommendation systems and adding revenue with smarter advertising.
Mr. Domingos spoke with DATAVERSITY® about Machine Learning opportunities that lie ahead, and how companies – even those just beginning to figure out the landscape – can get off on the right foot.
DATAVERSITY (DV): How is it that some companies/industries got so far ahead of the Machine Learning curve?
Pedro Domingos (PD): Different organizations are at different stages. Some are really using Machine Learning for just about everything and sometimes in very advanced ways.
Finance provides a really good example of why an industry got way ahead. It meets all the criteria that make it a great candidate for using Machine Learning. It’s a very data-driven industry to start with and also extremely competitive. So, being able to construct non-linear models with neural networks versus assuming linear regression models can be the difference between winning and losing. Also, the industry has lots of money.
Web companies also meet the conditions: They have vast amounts of data to mine and Machine Learning helps them use that data to personalize advertising and make money that way. And again, the money was there to invest in this.
On the reverse side, an example of an industry where Machine Learning has made some inroads – but not as much as it could have so far – is [physical] retail. Walmart was a pioneer in using Machine Learning to decide what products to stock and how to lay them out on the shelves. But, compared to finance or the Web, retail has small margins, and you’re also dealing with physical spaces where things move more slowly and it’s harder to adapt. It’s harder to [use what you learn from Machine Learning] in a physical store than in an online one, because it could mean changing store layouts, which is more difficult, costly, and slow.
DV: As more industries get on board, or move beyond the early steps they’ve taken, what should they know to make their efforts successful?
PD: There are many things to know that often are not found in textbooks and are not obvious. And if you don’t know about them, they can really send you down the rabbit hole. I talked about these in an article a few years ago.
The first thing to keep in mind is that in Machine Learning it’s generalization that counts. It’s about generalizing from what you have seen to predicting what you have not seen; it’s not about summarizing the past. The test of whether it’s working is that when a new customer comes in that you haven’t seen before, or an existing customer comes in with new things to buy, you can predict what they will do – or when you see a new email, you can predict whether it is spam or not.
The temptation is to test Machine Learning by how well an algorithm modeled the data you already have, but that’s just a matter of storing data on disk and retrieving what you remember. The whole question you want to answer is whether you can predict new things.
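To make that point concrete, here is a minimal sketch in Python, on made-up synthetic data, contrasting a “memorizer” – which perfectly stores what it has seen – with a simple rule that generalizes to inputs it has never seen. The task, threshold, and data are all invented for illustration.

```python
import random

random.seed(0)

# Hypothetical task: the true rule is y = 1 when x > 0.5.
def make_data(n):
    xs = [random.random() for _ in range(n)]
    return [(x, 1 if x > 0.5 else 0) for x in xs]

train, test = make_data(100), make_data(100)

# A "memorizer": perfect on the data it has seen, clueless otherwise.
memory = dict(train)
train_acc = sum(memory[x] == y for x, y in train) / len(train)
mem_test_acc = sum(memory.get(x, 0) == y for x, y in test) / len(test)

# A model that generalizes: a single threshold rule.
threshold = 0.5  # in practice this would be fit to the training data
test_acc = sum((1 if x > threshold else 0) == y for x, y in test) / len(test)

print(train_acc, mem_test_acc, test_acc)
```

The memorizer scores perfectly on its own training set but near chance on new points; the threshold rule carries over to data it has never seen.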
Another pitfall is overfitting. It’s very easy to think you are doing very well when all your Machine Learning model has really done is hallucinate patterns that don’t actually exist – patterns that are just artifacts of the particular random sample you have. A lot of Machine Learning methods are aimed at combating this problem.
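As a sketch of how convincing hallucinated patterns can look, the following hypothetical example trains a 1-nearest-neighbor classifier on labels that are pure random noise. It fits the training data perfectly while predicting no better than chance on held-out data – exactly the gap between fitting and generalizing.

```python
import random

random.seed(1)

# Pure noise: the labels have no relationship to the inputs at all.
xs = [random.random() for _ in range(200)]
ys = [random.randint(0, 1) for _ in range(200)]
train_x, test_x = xs[:100], xs[100:]
train_y, test_y = ys[:100], ys[100:]

def knn1(x):
    # 1-nearest-neighbor prediction: the label of the closest training point.
    i = min(range(len(train_x)), key=lambda j: abs(train_x[j] - x))
    return train_y[i]

train_acc = sum(knn1(x) == y for x, y in zip(train_x, train_y)) / 100
test_acc = sum(knn1(x) == y for x, y in zip(test_x, test_y)) / 100
print(train_acc, test_acc)  # perfect "fit", chance-level prediction
```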
Another issue is that people don’t always understand that data is not enough. Yes, Machine Learning is about data, but you have to think about what additional assumptions you are making and are they the right ones or the wrong ones for your problem. It’s heartbreaking how often people subscribe to a particular school of thought about a Machine Learning technique and try to solve their problem using that technique because they were sold on it. They may stay with it over months or even years before they admit that it is the wrong one to have chosen for their problem.
I try to teach students what the main techniques are and the pros and cons of each so they can make the right choice for their problem. At the end of the day, if you care about the problem, you have no choice but to try different techniques, but at least you can try based on good knowledge.
DV: Can you tell us more about how Machine Learning techniques break down?
PD: Of course, there is supervised and unsupervised learning, but the best way to break things down is to look at Machine Learning in two dimensions. First, there are five main paradigms:
- Connectionism, which includes neural networks and deep learning algorithms
- Symbolism, which includes rule-learning and decision tree algorithms
- Evolutionary computation, which uses genetic algorithms to simulate evolution
- Bayesian learning, which uses probabilistic inference algorithms
- Analogy-based learning, which uses techniques like nearest neighbor and support vector machines
There are already thousands of Machine Learning algorithm variations for each of these, and hundreds more are published each year. It’s easy to get lost in that jungle. So the second dimension is this: a good way to make sense of the lay of the land is to consider that any algorithm has only three components. Representation is the formal language used to represent what was learned; evaluation is how you measure or score different models to decide which one is better; and optimization is the search process by which you find the highest-scoring model.
So, now if you have a Machine Learning application in mind, what you have to figure out is just which representation, which evaluation and which optimization to use. There are only a small number of major choices for each of those, rather than thousands.
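A toy illustration of that decomposition, with all data invented: here the representation is a family of single-threshold rules, the evaluation is plain accuracy, and the optimization is a simple grid search. Each piece could be swapped out independently – that is the point of the decomposition.

```python
import random

random.seed(2)

# Hypothetical data: the true rule is y = 1 when x > 0.6.
data = [(x, 1 if x > 0.6 else 0) for x in (random.random() for _ in range(200))]

# Representation: the model family -- threshold rules "predict 1 if x > t".
def model(t):
    return lambda x: 1 if x > t else 0

# Evaluation: how a candidate model is scored (plain accuracy here).
def accuracy(predict, points):
    return sum(predict(x) == y for x, y in points) / len(points)

# Optimization: the search for the highest-scoring model (grid search here).
best_t = max((t / 100 for t in range(101)),
             key=lambda t: accuracy(model(t), data))
print(best_t)
```

Swapping accuracy for squared error, or grid search for gradient descent, gives a different algorithm with the same skeleton.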
DV: Just generally speaking, can you give us some examples of what Machine Learning approaches might work best for some use cases?
PD: Sure. Take recommender systems for movies or books. The best approach there is analogy-based learning and nearest-neighbor algorithms. I recommend a movie to you that you haven’t seen by finding people whose tastes are similar to the ones you have exhibited and seeing what movies they rated highly that you haven’t seen.
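A minimal nearest-neighbor recommender along those lines, using a tiny hypothetical ratings table (all user and movie names are invented, and the similarity measure is a simple stand-in for cosine or Pearson correlation):

```python
# Hypothetical ratings: user -> {movie: rating on a 1-5 scale}.
ratings = {
    "ann":  {"Alien": 5, "Heat": 4, "Up": 1},
    "bob":  {"Alien": 5, "Heat": 5, "Brazil": 4},
    "carl": {"Up": 5, "Frozen": 4, "Heat": 1},
}

def similarity(a, b):
    # Average agreement on co-rated movies (a toy stand-in for cosine/Pearson).
    common = set(ratings[a]) & set(ratings[b])
    if not common:
        return 0.0
    return sum(4 - abs(ratings[a][m] - ratings[b][m]) for m in common) / len(common)

def recommend(user):
    # Nearest neighbor: the most similar other user.
    neighbour = max((u for u in ratings if u != user),
                    key=lambda u: similarity(user, u))
    # Suggest the neighbour's highest-rated movie the user hasn't seen.
    unseen = {m: r for m, r in ratings[neighbour].items()
              if m not in ratings[user]}
    return max(unseen, key=unseen.get)

print(recommend("ann"))
```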
For stock market prediction, neural networks are good because the market is very noisy and neural networks let you capture some of the nonlinearity. In retail, it’s usually rule-based: put these two products next to each other because, if you buy this one, the probability is high that you will buy that one as well.
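That retail rule boils down to estimating a conditional probability from transaction data. Here is a sketch, with invented baskets, of the “confidence” of a rule chips -> salsa, i.e. P(salsa in basket | chips in basket):

```python
# Hypothetical transaction baskets; the products are made up.
baskets = [
    {"chips", "salsa"},
    {"chips", "salsa", "beer"},
    {"chips", "beer"},
    {"salsa"},
    {"chips", "salsa"},
]

def confidence(a, b):
    # P(b in basket | a in basket): the confidence of the rule a -> b.
    with_a = [t for t in baskets if a in t]
    return sum(b in t for t in with_a) / len(with_a)

print(confidence("chips", "salsa"))  # 3 of the 4 chips baskets contain salsa
```

A rule-learning system would scan for all product pairs whose confidence (and support) clears some threshold.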
DV: How can an organization best set itself up to leverage Machine Learning?
PD: That’s a very important question. If you can hire good talent, you should – but good talent is very scarce, and demand exceeds supply. The good news is that there is a lot of low-hanging fruit out there. You can put fairly basic Machine Learning techniques to work in a project that runs for about six months with a couple of people and still see a huge payoff.
For most organizations, particularly those just getting started, it’s important not to get seduced by the latest and greatest Machine Learning concepts you hear about, like Deep Learning. Yes, it’s good, but it’s hard to use, it requires experts, and you probably don’t need it yet. A decision-tree algorithm can be fine and still get you a big payoff. And once you see the return, you will have the confidence to move beyond the low-hanging fruit to more complex types of Machine Learning if you need them.
But for most people, even if they just walk through every process of their organization and think about applying simple Machine Learning to it, they would never run out of things to do. Naïve Bayes classifiers or nearest-neighbor might not be very sexy algorithms, but they don’t take a lot of time to get going and often they are as good as anything else.
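As a sketch of how little code such a baseline takes, here is a minimal Naïve Bayes spam classifier with add-one smoothing, trained on a handful of made-up messages (the messages and labels are invented for illustration):

```python
import math

# Hypothetical labelled training messages: (text, label) with spam=1, ham=0.
train = [
    ("win money now", 1),
    ("free money offer", 1),
    ("meeting at noon", 0),
    ("lunch at noon tomorrow", 0),
]

# Count word frequencies and message counts per class.
counts = {0: {}, 1: {}}
totals = {0: 0, 1: 0}
priors = {0: 0, 1: 0}
for text, label in train:
    priors[label] += 1
    for w in text.split():
        counts[label][w] = counts[label].get(w, 0) + 1
        totals[label] += 1

vocab = {w for c in counts.values() for w in c}

def classify(text):
    # Pick the class with the highest log-probability, add-one smoothed.
    def score(label):
        s = math.log(priors[label] / len(train))
        for w in text.split():
            p = (counts[label].get(w, 0) + 1) / (totals[label] + len(vocab))
            s += math.log(p)
        return s
    return max((0, 1), key=score)

print(classify("free money"), classify("lunch at noon"))
```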
DV: Terrific. And when you look to the future of Machine Learning, what do you see?
PD: One of the main things I’m working on is unifying the different paradigms. The goal of a lot of my research, and that of others, is a grand unified theory of Machine Learning, such that you can have one algorithm that can do everything. A master algorithm would be to induction what the Turing machine is to deduction.
We are making progress. My book, The Master Algorithm, is essentially about this. Currently we can unify two paradigms at a time, but then it will be three, and finally all five. What that would mean is that when you’re working out which Machine Learning algorithm to use, you can just try many variations of one algorithm instead of many variations of many algorithms. And once we have a unified algorithm, we’ll be able to do things we can’t do today, because they require the capabilities of more than one of the current paradigms.