Understanding GPUs for Deep Learning

By Gilad David Maayan

Deep learning is the basis for many complex computing tasks, including natural language processing (NLP), computer vision, one-to-one personalized marketing, and big data analysis. Deep learning algorithms are based on neural networks, which commonly have millions of parameters that must be updated many times over in order to train the model.
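To make "millions of parameters" concrete, here is a minimal sketch that counts the weights and biases of a small fully connected network. The layer sizes are hypothetical, chosen only for illustration; real models are often far larger.

```python
def count_dense_params(layer_sizes):
    """Count weights and biases in a fully connected network.

    layer_sizes lists the width of each layer, input layer first.
    A layer with n_in inputs and n_out outputs has an n_in x n_out
    weight matrix plus n_out biases.
    """
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out + n_out
    return total


# Hypothetical network: 784 inputs (a 28x28 image), two hidden layers
# of 1,024 units each, and 10 outputs.
sizes = [784, 1024, 1024, 10]
params = count_dense_params(sizes)
print(f"{params:,} parameters")  # 1,863,690 parameters
# Stored as 32-bit floats, that is 4 bytes per parameter.
print(f"{params * 4:,} bytes")   # 7,454,760 bytes
```

Even this small network has close to two million parameters, and every one of them is adjusted at each training step.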

Training a neural network is very computationally intensive, and because these computations are easy to parallelize, they call for a different approach to hardware. Graphics processing units (GPUs), originally designed for the gaming industry, have a large number of processing cores and on-board memory with much higher bandwidth than the memory available to traditional CPUs. GPUs are increasingly used for deep learning applications and can dramatically accelerate neural network training.
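Matrix multiplication, the core operation in neural network training, shows why this work parallelizes so well. In the plain-Python sketch below (an illustration, not how GPU libraries actually implement it), each output element depends only on one row of A and one column of B, so the elements can be computed in any order, or all at once by separate GPU threads.

```python
import random


def element(a, b, i, j):
    """One output element: the dot product of row i of A and column j of B."""
    return sum(a[i][k] * b[k][j] for k in range(len(b)))


def matmul(a, b, shuffle=False):
    """Naive matrix multiply that fills in C one element at a time."""
    n, m = len(a), len(b[0])
    cells = [(i, j) for i in range(n) for j in range(m)]
    if shuffle:
        # Any order gives the same answer because no element of C
        # depends on another element of C.
        random.shuffle(cells)
    c = [[0] * m for _ in range(n)]
    for i, j in cells:
        c[i][j] = element(a, b, i, j)
    return c


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
print(matmul(a, b))                # [[19, 22], [43, 50]]
print(matmul(a, b, shuffle=True))  # [[19, 22], [43, 50]]
```

A CPU with a handful of cores works through these independent dot products a few at a time; a GPU can assign thousands of them to its cores simultaneously.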

Should You Use a CPU or GPU for Your Deep Learning Project?

Here are a few things you should consider when deciding whether to use a CPU or GPU to train a deep learning model.

  • Memory Bandwidth: Bandwidth is one of the main reasons GPUs are faster than CPUs for deep learning. GPU memory can feed data to the processing cores much faster than system memory can feed a CPU, which matters when large data sets stream through the model during training. On a CPU, large and complex computations consume a large number of clock cycles because its few cores process the work largely in sequence.
  • Data Set Size: Training a deep learning model commonly requires large data sets, and each batch of training data must be held in memory while it is processed. The larger the batch the model processes at each step, the bigger the advantage a GPU will have.
  • Optimization: It is easier to optimize tasks on a CPU because it has only a few cores (compared to a GPU, which has thousands). Each CPU core can execute different instructions, an approach called multiple instruction, multiple data (MIMD). GPU cores instead work in lockstep groups: on NVIDIA GPUs, threads are scheduled in groups of 32 called warps, and all threads in a warp execute the same instruction at the same time, a form of single instruction, multiple data (SIMD). Therefore, complex performance optimization techniques are more difficult to implement on a GPU than on a CPU.
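One consequence of lockstep execution is branch divergence. On NVIDIA GPUs, threads run in warps of 32 that share a single instruction stream, so when threads in the same warp take different branches, the warp runs each taken branch one after the other while the threads not on that branch sit idle. The toy model below sketches this cost; the branch costs are hypothetical, and it ignores memory latency, scheduling, and other real-hardware effects.

```python
WARP_SIZE = 32  # NVIDIA schedules threads in warps of 32


def warp_cycles(thread_branches, branch_cost):
    """Estimate cycles for one warp under a simplified lockstep model.

    thread_branches: the branch label each thread in the warp takes.
    branch_cost: hypothetical cycles needed to run each branch once.

    Every branch taken by at least one thread is executed in turn,
    with the threads that did not take it masked off.
    """
    taken = set(thread_branches)
    return sum(branch_cost[branch] for branch in taken)


def total_cycles(thread_branches, branch_cost):
    """Sum the estimated cycles over consecutive warps of threads."""
    total = 0
    for start in range(0, len(thread_branches), WARP_SIZE):
        warp = thread_branches[start:start + WARP_SIZE]
        total += warp_cycles(warp, branch_cost)
    return total


cost = {"A": 10, "B": 10}
uniform = ["A"] * 64          # two warps, no divergence
divergent = ["A", "B"] * 32   # two warps, each split between A and B
print(total_cycles(uniform, cost))    # 20
print(total_cycles(divergent, cost))  # 40
```

The same 64 threads take twice as long when each warp diverges, even though every thread does the same amount of work. A MIMD CPU core pays no such penalty, which is part of why GPU code needs more careful optimization.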

Gaming vs. Workstation GPUs

Consumer GPUs are available for just a few hundred dollars, but they are general-purpose GPUs built for gamers. They can handle deep learning applications reasonably well, but vendors like NVIDIA also offer more expensive “workstation GPUs” designed for demanding professional workloads such as deep learning.

Let’s review the key criteria for selecting a gaming GPU vs. a full-blown workstation GPU for your projects.

GPU Hardware

It’s important to understand that both gaming and workstation GPUs may have the same physical GPU under the hood. For example, the NVIDIA Quadro P5000 workstation GPU uses a GP104 GPU, the same chip found in the low-cost GTX 1080 gaming GPU. Within a product generation, there is no architectural difference between gaming and workstation GPUs.

In terms of memory, a key difference is that workstation GPUs support error-correcting code (ECC) memory, which detects and corrects memory errors such as flipped bits, making calculations more reliable. Gaming GPUs do not use ECC, which can make their memory faster but less reliable.
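To illustrate the principle behind ECC, here is a sketch of the classic Hamming(7,4) code, which stores 4 data bits with 3 parity bits and can locate and fix any single flipped bit. Real ECC memory uses wider codes than this, so treat it as a teaching example of the idea rather than what GPU hardware actually implements.

```python
def hamming74_encode(data):
    """Encode 4 data bits [d1, d2, d3, d4] as a 7-bit codeword.

    Bit positions 1..7 hold p1, p2, d1, p3, d2, d3, d4.
    """
    d1, d2, d3, d4 = data
    p1 = d1 ^ d2 ^ d4  # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def hamming74_decode(codeword):
    """Correct up to one flipped bit and return the 4 data bits."""
    c = list(codeword)
    # Re-check each parity group. Together the three results spell out
    # the 1-based position of a single flipped bit (0 means no error).
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    error_pos = s1 + 2 * s2 + 4 * s3
    if error_pos:
        c[error_pos - 1] ^= 1  # flip the bad bit back
    return [c[2], c[4], c[5], c[6]]


word = hamming74_encode([1, 0, 1, 1])
word[4] ^= 1                    # simulate a single-bit memory error
print(hamming74_decode(word))   # [1, 0, 1, 1]
```

Without the parity bits, the flipped bit would silently change the stored value, which is the risk that non-ECC gaming GPUs accept in exchange for speed.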


In a workstation GPU, firmware is designed for reliability and stability, while in a gaming GPU, it is built for high performance. Drivers are also different — on a workstation GPU, they are suitable for professional graphics applications like CAD/CAM, while on a gaming GPU, drivers are tuned for the latest games.


Workstation GPUs are typically used in professional settings such as deep learning research, film production, CAD/CAM or AutoCAD work, and animation studios. Gaming GPUs are most commonly used for gaming, but they also serve all of the above scenarios on a smaller scale or in lower-budget projects.

How to Choose the Best GPU for Deep Learning

Deep learning tasks, such as training models that recognize and classify objects in images or video frames, or that process large amounts of text, require a robust hardware setup.

Here are some scenarios that illustrate the hardware requirements for deep learning projects:

  • Lightweight Tasks: For deep learning models with small datasets or relatively shallow neural network architectures, you can use a low-cost GPU like NVIDIA’s GTX 1080.
  • Complex Tasks: When dealing with complex tasks like training large neural networks, the system should be equipped with advanced GPUs such as NVIDIA’s RTX 3090 or a card from the high-end Titan series. Alternatively, you can use cloud services such as Google Cloud TPUs or Amazon EC2 GPU instances like the P2, P3, and P4 series.
  • Very Demanding Tasks: For very large neural networks or large experiments involving thousands of training runs, one GPU will not be enough. To run multiple experiments simultaneously, you need local GPU parallelism: a system designed for multi-GPU computing, plus a deep learning framework configured to distribute the work across multiple GPUs.
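A common way frameworks distribute training across GPUs is data parallelism: each GPU processes a slice of the batch, and the per-GPU gradients are averaged (an "all-reduce") before the weights are updated. The plain-Python simulation below sketches that idea under simplifying assumptions: the "devices" are just list shards, the one-weight model y = w * x and its data are hypothetical, and shards must be equal in size for the average to match the full-batch gradient.

```python
def grad_mse(w, xs, ys):
    """Gradient of mean squared error for the model y = w * x."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def shard(seq, num_shards):
    """Split seq into num_shards contiguous, equal-sized pieces."""
    size = len(seq) // num_shards
    return [seq[i * size:(i + 1) * size] for i in range(num_shards)]


def data_parallel_grad(w, xs, ys, num_devices):
    """Simulate data parallelism: each 'device' computes a local gradient
    on its shard, then the local gradients are averaged."""
    assert len(xs) % num_devices == 0, "batch must split evenly"
    local = [grad_mse(w, sx, sy)
             for sx, sy in zip(shard(xs, num_devices), shard(ys, num_devices))]
    return sum(local) / num_devices


xs = [float(i) for i in range(1, 9)]
ys = [2.0 * x + 1.0 for x in xs]
full = grad_mse(0.5, xs, ys)
split = data_parallel_grad(0.5, xs, ys, num_devices=4)
print(abs(full - split) < 1e-9)  # True
```

Because the averaged shard gradients equal the full-batch gradient, adding GPUs lets each step process a larger batch without changing the math. Tools such as PyTorch's DistributedDataParallel perform this averaging across real devices.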

Cloud Computing with GPUs

Increasingly, organizations carrying out deep learning projects are choosing to use cloud-based GPU resources. These resources can be used in conjunction with machine learning services, which help manage large-scale deep learning pipelines. All three major cloud providers offer GPU resources in a variety of configuration options.

Microsoft Azure

Azure offers a variety of instances that provide access to GPUs. The examples below are optimized for advanced computing tasks such as visualization, simulation, and deep learning:

  • NC-series: optimized for compute- and network-intensive workloads, such as simulations and applications based on CUDA and OpenCL. These instances pair NVIDIA GPUs such as the Tesla V100 with Intel Haswell or Broadwell CPUs.
  • ND-series: optimized for deep learning training and inference scenarios. These instances pair NVIDIA GPUs such as the Tesla P40 with Intel Broadwell or Skylake CPUs.
  • NV-series: suitable for virtual desktop infrastructure, streaming, encoding, or visualization, with support for DirectX and OpenGL. GPUs are the NVIDIA Tesla M60 or the AMD Radeon Instinct MI25.

Amazon Web Services

On AWS, you can choose from four main GPU instance families, each with a different hardware configuration: P3, P2, G4, and G3. These families offer NVIDIA Tesla V100, K80, T4 Tensor Core, and M60 GPUs, respectively. The larger instance sizes scale up to 16 GPUs.

As an alternative to GPU instances, which can be quite expensive, Amazon offers Amazon Elastic Graphics. This service lets you attach low-cost graphics acceleration to regular (non-GPU) EC2 instances. Elastic Graphics supports OpenGL 4.3 and provides up to 8 GB of graphics memory.

Google Cloud

Google Cloud Platform (GCP) allows you to attach GPUs to existing instances, similar to Amazon Elastic Graphics. For example, if you are using Google Kubernetes Engine, you can create a node pool that can access GPUs, including NVIDIA Tesla K80, P100, T4, P4, and V100.
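Once a GKE node pool with GPUs exists, workloads reach those GPUs by requesting them in their Kubernetes manifests. The sketch below builds such a Pod manifest as a Python dict and prints it as JSON, which `kubectl apply -f` accepts. It assumes a node pool with NVIDIA T4 GPUs and installed NVIDIA drivers already exists; the pod name and container image are hypothetical placeholders.

```python
import json


def gpu_pod_manifest(name, image, accelerator="nvidia-tesla-t4", gpus=1):
    """Build a Kubernetes Pod manifest that requests GPUs on GKE."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            # GKE labels GPU nodes with their accelerator type, so this
            # selector keeps the pod on the GPU node pool.
            "nodeSelector": {"cloud.google.com/gke-accelerator": accelerator},
            "containers": [{
                "name": name,
                "image": image,
                # GPUs are requested through the nvidia.com/gpu resource,
                # which is set under limits.
                "resources": {"limits": {"nvidia.com/gpu": gpus}},
            }],
            "restartPolicy": "Never",
        },
    }


# Hypothetical training job and image name.
manifest = gpu_pod_manifest("train-job", "example.com/my-training-image:latest")
print(json.dumps(manifest, indent=2))
```

Keeping the accelerator type and GPU count as parameters makes it easy to target a different node pool, such as one with V100 GPUs, without editing the manifest by hand.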

GCP also offers Tensor Processing Units (TPUs). These are custom accelerator chips, not GPUs, that Google designed for fast matrix multiplication, which is the main computation performed in a neural network; a Cloud TPU device groups four TPU chips. TPUs can provide performance comparable to NVIDIA Tesla V100 instances, and their main advantage is very efficient parallelization of deep learning workloads.
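To see why matrix multiplication dominates, consider a fully connected layer: its forward pass is a matrix multiply of the input batch by the weight matrix, plus a bias. The sketch below implements that in plain Python and counts the multiply-add work, using the common approximation of 2 floating-point operations per multiply-add; the layer dimensions are hypothetical.

```python
def dense_forward(x, w, b):
    """Forward pass of a fully connected layer: y = x @ W + b.

    x: batch of inputs, shape (batch, n_in)
    w: weight matrix, shape (n_in, n_out)
    b: biases, length n_out
    """
    n_in, n_out = len(w), len(w[0])
    return [
        [sum(row[k] * w[k][j] for k in range(n_in)) + b[j] for j in range(n_out)]
        for row in x
    ]


def dense_matmul_flops(batch, n_in, n_out):
    """Operations in the matrix multiply alone: each of the batch * n_out
    outputs needs n_in multiplies and roughly n_in additions."""
    return 2 * batch * n_in * n_out


print(dense_forward([[1, 2]], [[1, 0], [0, 1]], [10, 20]))  # [[11, 22]]

# Hypothetical layer: batch of 256, 1,024 inputs, 1,024 outputs.
print(f"{dense_matmul_flops(256, 1024, 1024):,}")  # 536,870,912
```

Adding the bias costs only batch * n_out operations (262,144 here), so the matrix multiply accounts for nearly all the work. That is why hardware built to accelerate matrix multiplication, such as TPUs, speeds up neural network training.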

Conclusion

I’ve discussed how GPUs can contribute to deep learning projects and the main criteria for selecting the right GPU for your use case. I’ve also provided criteria for selecting a CPU vs. a GPU and explained the difference between low-cost gaming GPUs and higher-end workstation GPUs.

Finally, I’ve reviewed the GPU options offered by the big three cloud providers — Amazon, Azure, and Google Cloud. I hope this review will be helpful in starting your journey to GPU-optimized deep learning research.
