By Henry Bequet.
Every year, scientists and researchers gather at the Supercomputing Conference (SC) to exchange their views, solutions, and problems in computational science. At SC17, there were no fewer than 22 presentations and keynotes involving Machine Learning and Deep Learning (DL). There were actually many more presentations about DL, because it is often the motivation for Hybrid Architectures (more on this later). This is remarkable if you consider that the year before, there were a grand total of two DL presentations. In other words, it appears that we are quickly moving into the era of data-driven programming.
This type of evidence from the world of science can also be seen in the economy at large. The Internet of Things (IoT), for example, is an emerging industry that exists only through the data it can collect: without sensors and data, there would be no IoT. There are many more examples of data-intensive industries that regularly appear in the news, from self-driving cars to automatic translation to predictions of shoppers’ behavior.
From Tasks to GPUs
Over the last 40 years, we have worked through multiple programming paradigms for our Analytics; the pursuit of performance is part of our DNA.
In the ‘80s, we ran single-threaded Analytics. That gave us some performance numbers that we didn’t quite like, so 20 years later we graduated to parallelism with multithreaded and multiprocess executions. We saw orders of magnitude of performance improvements by going parallel. That speed increase came at a cost: complexity of development. By following the principles of task-based development and by using a many-task computing framework, we could tame that complexity and be productive by focusing attention on our problems rather than the mechanics of multithreading and multiprocessing.
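As a minimal sketch of that idea (not the specific many-task computing framework the author used), task-based development in Python might look like this: each unit of work becomes an independent task handed to an executor, and the executor hides the mechanics of threading and scheduling.

```python
from concurrent.futures import ThreadPoolExecutor

def score_chunk(chunk):
    # One self-contained task: score a chunk of observations
    # (a stand-in for a real analytic computation).
    return sum(x * x for x in chunk)

def run_tasks(chunks, max_workers=4):
    # The executor schedules the tasks; our code only describes them.
    # For CPU-bound work in CPython, ProcessPoolExecutor would be the
    # better choice; ThreadPoolExecutor is shown for brevity.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(score_chunk, chunks))

results = run_tasks([[1, 2, 3], [4, 5]])
```

The point is that the developer declares independent tasks and lets the framework own the synchronization, which is what keeps the complexity of parallelism manageable.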
The organization of our code into tasks has its limits in terms of parallelization. One of the major limitations is the number of CPU cores, which as of this writing is in the hundreds for a single machine, not in the thousands or millions. The general-purpose graphics processing unit (GPU), with its thousands of cores, addresses this limitation. Alas, the complex synchronization of multiple threads of execution using CUDA consumes a great deal of development resources. This programming complexity implies that CUDA is not the ideal tool for the Data Scientist or the statistician.
Training and Inference
While we were searching for a simpler model than CUDA for Data Scientists, we realized that we could use DL to train a Deep Neural Network (DNN). That gave us reasonable approximations of our Analytics. Scoring with a DNN for this approximation, also called inference, gave us another order of magnitude of performance improvement for our Analytics. Deep Learning for numerical applications was born.
We didn’t worry about the time it took for training, because we didn’t need to perform it very often.
Once we had DL4NA, we realized that many use cases were a good fit for this methodology. Consider the use of DL4NA for approving loans over the phone or on a website. Many regulations govern the approval of a loan. For example, loan approvals must be fair: you don’t want to explain to regulators that you approved the loan for Jane, but not for Sue, when they have almost identical financial risk profiles.

Another important consideration for banks is their overall risk management profile (across all loans held by the bank). By approving this loan, does the bank expose itself to a higher risk? For example, does the bank have too many loans in one geographic area? The risk profile must be calculated across an entire portfolio of loans, not just for one loan at a time. So, for a bank with thousands or even millions of loans, calculating the risk profile takes minutes, if not hours. That is clearly not desirable when you’re waiting in your browser for an approval.

DL4NA solves this problem: you train a DNN to approximate your loan decision, and inference then approves or denies an application in a couple of seconds or less. Clearly, DL4NA makes the decision on a loan application fast, but it will probably not satisfy a regulator, since DL doesn’t easily give out its secrets. To meet regulatory requirements, the bank can run through its entire exact risk profile once at the end of the day, rather than every time a new loan application is considered.
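That split between a fast online approximation and an exact nightly batch can be sketched as follows. This is only an illustration of the workflow, not the book's implementation: the linear scorer stands in for DNN inference, and every name and number is invented.

```python
def approximate_decision(features, weights, threshold=0.5):
    # Online path: cheap, constant-time scoring per application
    # (a stand-in for DNN inference).
    score = sum(f * w for f, w in zip(features, weights))
    return score >= threshold

def exact_portfolio_risk(portfolio):
    # Nightly batch path: the exact, expensive calculation across
    # every loan held by the bank (reduced to an aggregate here).
    return sum(loan["exposure"] * loan["default_prob"] for loan in portfolio)

# Online: answer one application immediately.
approved = approximate_decision([0.8, 0.3], weights=[0.5, 0.6])

# Nightly: recompute the exact risk profile over the whole book.
portfolio = [{"exposure": 100_000, "default_prob": 0.02},
             {"exposure": 250_000, "default_prob": 0.01}]
risk = exact_portfolio_risk(portfolio)
```

The applicant only ever waits on the first function; the second runs on the bank's schedule, where minutes or hours are acceptable.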
In the example above, we used GPUs to run the training and the inference. Other devices are available to perform the same tasks.
This marriage of CPUs and other devices is called a Hybrid Architecture. We believe that this is the way computers will be built in the future. This means that if you want to deploy your Analytics so that they run as fast as possible, you must embrace Hybrid Architectures.
The good news is that with DL4NA, you are well-armed to deploy your Analytics on Hybrid Architectures.
We should point out that even though this post has emphasized the great performance gains of Hybrid Architectures, they also give you a wider choice of platforms on which to deploy your Analytics. With DL4NA, you can deploy any Analytics program that you can approximate with a DNN to any of those devices. In particular, you could run an approximation of a SAS, MATLAB, Octave, or Python program on an IoT device, thereby pushing your Analytics to the edge of your network.