Get ready for the big changes that Deep Learning will bring to many sectors. Machine Learning in general and Deep Learning in particular are quickly advancing concepts right now, according to Dave Sullivan, primary architect at Ersatz Labs, which provides a browser-based unified Machine Learning environment with support for Deep Learning, model and data visualization, and GPU computing, among other capabilities. And the next six years in the software industry, he predicts, will be about “taking these kinds of concepts and applying [them] to software that exists already, making it work better and more efficiently.”
Sullivan, who provided this perspective at Dataversity’s recent Cognitive Computing 2014 conference, explained that Deep Learning is “all neural networks,” which are “theoretically and fundamentally better at drawing separations between really complex data.” And he sees the data-rich worlds of healthcare and finance as among the early verticals that will experience real-world impact from these concepts.
“Medical imaging is a slam dunk for Deep Learning,” he explained by way of one example, as it profits from the technology’s advanced ability to pick something specific out of a bunch of pixels. That, he said, is a good example of a non-linear problem, the type that neural networks are good at solving. So, healthcare organizations can use Deep Learning to find abnormalities such as tumors in MRIs, mammograms, and so on.
From Ersatz’s web interface, for instance, users can upload their image data, then train and use the models it provides, such as convolutional neural network architectures that are well suited to visual tasks involving classification and feature extraction. “If you are working on anything where you’ve got to pull information out of an image you should really try convolutional nets,” he says. It’s also an appropriate approach for pulling information out of audio, for music analysis and recommendation applications, he notes.
Financial data stream problems, such as understanding how assets move in relation to each other when a particular event takes place, can also benefit from the technology. In these cases, and others (from marketing to energy), Ersatz believes there’s opportunity for organizations to use Deep Learning to build machine intelligence directly into their apps, for better discovery and predictive analytics.
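To make the idea of pulling something specific out of a bunch of pixels concrete, here is a minimal sketch of the sliding-window operation at the heart of a convolutional net. It is not Ersatz’s implementation: the function name, the tiny 4x4 image, and the hand-written edge kernel are all assumptions chosen for illustration, and a real network would learn its kernels from data.

```python
# Toy 2D convolution (technically cross-correlation, as in most conv nets).
# The image, kernel, and function name are illustrative assumptions.

def convolve2d(image, kernel):
    """Slide `kernel` over `image` (no padding, stride 1) and return the feature map."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    feature_map = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            total = 0
            for i in range(kh):
                for j in range(kw):
                    total += image[r + i][c + j] * kernel[i][j]
            row.append(total)
        feature_map.append(row)
    return feature_map

# 4x4 grayscale "image": dark on the left, bright on the right.
image = [
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
    [0, 0, 9, 9],
]

# Hand-written vertical-edge detector; a trained network would learn this.
kernel = [
    [-1, 1],
    [-1, 1],
]

feature_map = convolve2d(image, kernel)
for row in feature_map:
    print(row)  # each row is [0, 18, 0]: the response peaks at the edge
```

The feature map is zero wherever the image is flat and large only where the dark region meets the bright one, which is the sense in which a convolutional layer extracts a feature from raw pixels.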
What’s Ahead for Deep Learning
Going forward, Deep Learning also will have an impact on Natural Language Processing problems, helping with disambiguation, translation, word vectors and sentiment analysis, Sullivan believes. “There’s tons of text to analyze, and… text is really where the cutting edge of Deep Learning is right now,” he says. Expect one of the impacts to be that personal assistant applications will get a lot better.
That sounds great and very sane, but Sullivan also talks about other manifestations of Deep Learning that he admits sound crazy but, at least according to research, are showing themselves to be possible. Chief among them: Zero Shot Learning, where systems can figure out, just by being shown an unlabeled picture of an object, what that object is based on information gathered from ingesting Wikipedia. It builds on Google’s word2vec work, which learns, without supervision, every word in Wikipedia and how it relates to other words, and on the resulting dense word vectors, which capture order and meaning that hold up across different words. (So, given a picture of a cat, it can output the word vector for the word cat that was learned.) This concept, he says, can extend beyond words and ultimately to sentences, paragraphs, and documents, too. (See the video here for the details.)
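The last step of that cat example, going from a predicted vector back to a word, can be sketched as a nearest-neighbor lookup. Everything below is a toy assumption: the three-dimensional vectors are made up by hand (real word2vec vectors have hundreds of dimensions and are learned from text), and `image_vector` stands in for the output of a hypothetical image model that maps pictures into the same space.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def nearest_word(vector, word_vectors):
    """Return the word whose vector points most nearly in the same direction."""
    return max(word_vectors, key=lambda word: cosine(vector, word_vectors[word]))

# Hand-made stand-ins for learned word vectors (assumption, not real data).
word_vectors = {
    "cat": [1.0, 0.2, 0.0],
    "dog": [0.2, 1.0, 0.0],
    "truck": [0.0, 0.1, 1.0],
}

# Pretend output of an image model for an unlabeled photo (assumption).
image_vector = [0.9, 0.3, 0.1]

print(nearest_word(image_vector, word_vectors))  # cat
```

Because the label is recovered from the geometry of the word-vector space and not from a fixed list of trained classes, the same lookup can return a word the image model never saw a labeled picture of.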
“I really believe in my heart that this is where we are going to see the whole NLP area, the whole learning about things simply by some inference – we’re not really sure how it gets there but it seems to work in practice,” he says.
Going Deep into Deep Learning
From a usage standpoint, Sullivan points out, Deep Learning is part of the overall Machine Learning pipeline, even though the former’s architecture and inner workings, he believes, are more complicated to implement. So, it’s important to be sure that the scenario is suited for applying Deep Learning to solve the problem. Questions to ask to see when and where Deep Learning might make sense, he comments, include what you are already doing and where an increase in accuracy, maybe at the cost of training speed, would be really valuable.
Sullivan also points to some aspects that mean less than they used to when it comes to Deep Learning – pre-training, the use of RBM vs. autoencoder neural networks, and Big Data. The good news is that Big Data is dealt with “really really well,” according to Sullivan. In fact, he says, “your data matters a lot …you need a lot of data” for Deep Learning. And if you don’t have it, he suggests trying a random forest model instead to see what kind of results you get.
Infinite amounts of data, however, can be an issue, depending upon the hardware in use and other implementation details. Theoretically, infinite data can be fed to a neural network in mini-batches, and it will adjust its mental model as it ingests them, asking for the next set of examples to learn distinctions from as it goes along. But beware of catastrophic forgetting: “If there is not enough capacity in the neural network to remember the things it is trying to remember, it starts to forget what it learned some time ago,” he says – a real problem if what it forgets was very important to remember.
Increasing the capacity of the neural network can help with that, but at some point you will hit a Graphics Processing Unit (GPU) memory bound, which will require parallelization to overcome – possible, yes, but with diminishing returns, Sullivan says. “You can say you want a bigger model spread across multiple GPUs, and have a different piece of the model on each GPU, but it doesn’t scale linearly,” he says. Algorithmic advances are needed to fix that, he says.
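The mini-batch loop described above can be sketched in a few lines. This is a deliberately tiny stand-in, not a neural network: the model is a single line `y = w*x + b`, the endless generator plays the role of the unbounded data stream, and the names, batch size, and learning rate are all assumptions. It has too little going on to exhibit catastrophic forgetting; it only shows the ask-for-the-next-batch pattern.

```python
import itertools
import random

def endless_minibatches(batch_size, rng):
    """Yield mini-batches forever; stands in for an unbounded data stream."""
    while True:
        batch = []
        for _ in range(batch_size):
            x = rng.uniform(-1.0, 1.0)
            batch.append((x, 2.0 * x + 1.0))  # hidden rule the model must learn
        yield batch

def sgd_step(w, b, batch, learning_rate):
    """One gradient step on mean squared error for the model y = w*x + b."""
    grad_w = grad_b = 0.0
    for x, y in batch:
        error = (w * x + b) - y
        grad_w += 2.0 * error * x
        grad_b += 2.0 * error
    n = len(batch)
    return w - learning_rate * grad_w / n, b - learning_rate * grad_b / n

rng = random.Random(0)  # fixed seed so runs are repeatable
w, b = 0.0, 0.0
stream = endless_minibatches(batch_size=8, rng=rng)

# The stream never ends, so we decide how many batches to consume.
for batch in itertools.islice(stream, 2000):
    w, b = sgd_step(w, b, batch, learning_rate=0.1)

print(round(w, 3), round(b, 3))  # should land close to the hidden 2 and 1
```

Only one mini-batch is held in memory at a time, which is why the size of the stream itself is not the constraint; the capacity of the model and the memory of the hardware are.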
Sullivan is pretty confident, though, that GPUs – the same graphics cards that you play video games on – are the way to go over CPUs when it comes to Deep Learning hardware requirements. He explains that GPUs, which Ersatz leverages for training models, are 20 to 40 times faster than CPUs for neural network applications, because they are good at matrix multiplication. That’s what a neural network does all day long, “so whatever is fastest at doing matrix multiplication and can hold the most stuff in memory, the most added memory, is going to win, [and] right now the most happy medium are GPUs,” he says.
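To see why matrix multiplication dominates, consider a single fully connected layer: pushing a batch of examples through it is one matrix product followed by a simple element-wise function. The sketch below uses plain Python lists so it runs anywhere; the weights, inputs, and the choice of ReLU as the activation are illustrative assumptions, and real systems hand the same product to optimized GPU libraries.

```python
def matmul(a, b):
    """Multiply matrix `a` (rows x inner) by matrix `b` (inner x cols)."""
    rows, inner, cols = len(a), len(b), len(b[0])
    return [
        [sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for i in range(rows)
    ]

def relu(matrix):
    """Element-wise max(0, value), a common neural network activation."""
    return [[max(0.0, value) for value in row] for row in matrix]

# Batch of 2 examples with 2 features each (made-up numbers).
inputs = [
    [1.0, 2.0],
    [0.0, -1.0],
]

# Weights of a layer mapping 2 features to 3 hidden units (made-up numbers).
weights = [
    [0.5, -1.0, 0.0],
    [1.0, 0.5, -0.5],
]

activations = relu(matmul(inputs, weights))
print(activations)  # [[2.5, 0.0, 0.0], [0.0, 0.0, 0.5]]
```

The triple loop inside `matmul` is where nearly all the arithmetic happens, and each output cell can be computed independently of the others, which is the kind of work a GPU’s many parallel cores handle well.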
He also asked attendees at the event to think twice about the efforts some are making towards building FPGAs (field-programmable gate arrays) for Deep Learning. As he sees it, Deep Learning has stabilized and realized many good gains: It’s ready for industries to take advantage of, but change is still in the wind, including around some of the more basic ways to train algorithms. There will be a time lag of a couple of years between designing FPGAs and getting the hardware into production, time during which Deep Learning will also have evolved and GPU speed will have accelerated, he points out.
Also, Intel, AMD, and Nvidia will be all over the space, “basically making it so that custom hardware is not the way to go. General-purpose hardware seems to win the day,” he says.
Sullivan encourages anyone interested in the technology, which will have a major role to play across a variety of industries, to go home and play, getting more comfortable with a range of other concepts critical to Deep Learning as well, such as dropout, Nesterov momentum, and hyperparameters. Those all may sound a little off-putting, and he admits that it’s not easy to get started in this area. “But,” he adds, “it’s easy enough.”
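For readers who do want to go home and play, here is one possible starting point: a minimal sketch of the concepts named above, written from their standard textbook definitions and not taken from the talk. The function names, the one-dimensional test problem, and the particular values are assumptions; the three numbers at the bottom (learning rate, momentum, drop rate) are examples of hyperparameters, settings chosen by the practitioner instead of learned from data.

```python
import random

def nesterov_minimize(grad, x, learning_rate, momentum, steps):
    """Gradient descent with Nesterov momentum on a single number `x`."""
    velocity = 0.0
    for _ in range(steps):
        # Evaluate the gradient at the look-ahead point, not at x itself.
        lookahead = x + momentum * velocity
        velocity = momentum * velocity - learning_rate * grad(lookahead)
        x += velocity
    return x

def dropout(activations, drop_rate, rng):
    """Inverted dropout: zero each value with probability `drop_rate`,
    and scale the survivors so the expected total is unchanged."""
    keep = 1.0 - drop_rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

# Hyperparameters (illustrative values, not recommendations).
learning_rate, momentum, drop_rate = 0.1, 0.9, 0.5

# Minimize f(x) = (x - 3)^2, whose gradient is 2 * (x - 3).
best_x = nesterov_minimize(
    lambda x: 2.0 * (x - 3.0),
    x=0.0,
    learning_rate=learning_rate,
    momentum=momentum,
    steps=200,
)
print(round(best_x, 4))  # 3.0, the minimum of the toy function

# Each surviving activation is doubled because drop_rate is 0.5.
dropped = dropout([1.0, 2.0, 3.0, 4.0], drop_rate, random.Random(0))
print(dropped)
```

Changing any of the three hyperparameters and rerunning is a quick way to build intuition for how sensitive training can be to them.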