Cloud GPU Instances: What Are the Options?

By Gilad David Maayan

If you’re running demanding machine learning and deep learning models on your laptop or on GPU-equipped machines owned by your organization, there is a new and compelling alternative. All major cloud providers offer cloud GPUs: compute instances with powerful hardware acceleration that you can rent by the hour to run deep learning workloads in the cloud. Let’s review the concept of cloud GPUs and the offerings of the big three cloud providers: Amazon Web Services (AWS), Microsoft Azure, and Google Cloud.

What Is a Cloud GPU?

A cloud graphics processing unit (GPU) provides hardware acceleration for an application without requiring a GPU to be deployed on the user’s local device. Common use cases for cloud GPUs include:

  • Visualization workloads: Many server and desktop applications render graphically demanding content. Cloud GPUs can be used to accelerate video encoding, rendering, and streaming, as well as computer-aided design (CAD) applications.
  • Computational workloads: Large-scale mathematical modeling, deep learning, and analytics require the parallel processing abilities of general-purpose graphics processing unit (GPGPU) cores.

How Are GPUs Advancing Deep Learning?

Deep learning (DL), an advanced machine learning technique that underpins many modern artificial intelligence (AI) applications, relies on representation learning with artificial neural networks (ANNs). Training these networks involves processing large datasets, a highly compute-intensive process. GPUs, which are designed to run many calculations at once, can speed this up significantly, making them well suited to training AI models.

Because they contain many cores, GPUs excel at parallel computation. Their high memory bandwidth also accommodates the large amounts of data typical of deep learning workloads.
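To see why this workload suits a GPU, consider matrix multiplication, the core operation in neural network training. Each output element depends only on one row of the first matrix and one column of the second, so every element can be computed independently of the others. The minimal Python sketch below makes that independence explicit by computing each cell as a separate task on a thread pool. It is an illustration only (CPython threads will not speed up this pure-Python arithmetic); a GPU exploits the same property by assigning independent pieces of work to thousands of hardware threads. The function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def dot_cell(a, b, i, j):
    """Compute one output element: C[i][j] = sum over k of A[i][k] * B[k][j]."""
    return sum(a[i][k] * b[k][j] for k in range(len(b)))


def matmul_sequential(a, b):
    """Reference implementation: compute every cell one after another."""
    rows, cols = len(a), len(b[0])
    return [[dot_cell(a, b, i, j) for j in range(cols)] for i in range(rows)]


def matmul_parallel(a, b, workers=4):
    """Compute each cell as an independent task.

    No cell reads another cell's result, so the tasks can run in any order
    or all at once; a GPU exploits the same property with hardware threads.
    """
    rows, cols = len(a), len(b[0])
    cells = list(product(range(rows), range(cols)))  # row-major (i, j) pairs
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map returns results in the same order as `cells`
        values = list(pool.map(lambda ij: dot_cell(a, b, ij[0], ij[1]), cells))
    # Reassemble the flat, row-major list into a rows x cols matrix
    return [values[r * cols:(r + 1) * cols] for r in range(rows)]


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
print(matmul_parallel(A, B))  # [[19, 22], [43, 50]]
```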

For large-scale deep learning workloads, organizations use multi-GPU clusters, which can run with or without GPU parallelism. In deep learning frameworks that support it, GPU parallelism combines several GPUs in a single machine, or across several physical machines, so that the training of large models can be distributed. Running several GPUs separately, without parallelism, lets you run a different process on each GPU, so you can experiment with several algorithms simultaneously.
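The two modes can be sketched in plain Python under simplifying assumptions: a one-parameter linear model, a mean-squared-error loss, and "GPUs" that are simulated rather than real. With data parallelism, each device computes the gradient on its own equal-sized shard of the batch and the gradients are averaged (the all-reduce step that frameworks such as PyTorch’s DistributedDataParallel perform across real devices), which reproduces the full-batch gradient. Without parallelism, each independent experiment is simply pinned to one GPU, here by setting the standard CUDA_VISIBLE_DEVICES environment variable for its process. All function and experiment names are hypothetical.

```python
import os


def mse_gradient(w, xs, ys):
    """Gradient of the mean squared error for the 1-D model y_hat = w * x."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def data_parallel_gradient(w, xs, ys, num_gpus):
    """Simulated data parallelism.

    Each (simulated) GPU gets an equal-sized shard of the batch and computes
    a local gradient; averaging the local gradients plays the role of the
    all-reduce step and reproduces the full-batch gradient.
    """
    if len(xs) % num_gpus != 0:
        raise ValueError("this sketch assumes the batch splits into equal shards")
    shard = len(xs) // num_gpus
    local_grads = [
        mse_gradient(w, xs[g * shard:(g + 1) * shard], ys[g * shard:(g + 1) * shard])
        for g in range(num_gpus)
    ]
    return sum(local_grads) / num_gpus


def per_gpu_environments(experiments, num_gpus):
    """No parallelism: give each independent experiment its own GPU.

    Each returned environment would be passed to that experiment's process
    (e.g. via subprocess.Popen(..., env=env)); CUDA_VISIBLE_DEVICES limits the
    process to one GPU. Experiments are assigned round-robin if there are more
    experiments than GPUs.
    """
    envs = {}
    for index, name in enumerate(experiments):
        env = dict(os.environ)
        env["CUDA_VISIBLE_DEVICES"] = str(index % num_gpus)
        envs[name] = env
    return envs


xs, ys = [1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]
print(mse_gradient(1.0, xs, ys), data_parallel_gradient(1.0, xs, ys, num_gpus=2))  # -15.0 -15.0
```

Averaging per-shard gradients matches the full-batch gradient only because the shards are equal-sized; unequal shards would need a weighted average.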

Cloud GPU Instances: What Are the Options?

The big three cloud providers all offer cloud GPU services. Let’s review how you can leverage cloud-based hardware acceleration using AWS, Azure, and Google Cloud.


AWS

Amazon’s P3 instances provide GPUs for machine learning and deep learning applications. On-demand Amazon EC2 pricing for P3 instances ranges from about $3 to $31 per hour. P3 instances offer:

  • Up to eight NVIDIA Tesla V100 GPUs, each with 5,120 CUDA cores and 640 Tensor Cores
  • High-frequency Intel Xeon E5-2686 v4 (Broadwell) processors for p3.2xlarge, p3.8xlarge, and p3.16xlarge instances
  • High-frequency 2.5 GHz (base) Intel Xeon P-8175M processors for p3dn.24xlarge instances
  • NVLink for peer-to-peer GPU communication
  • Aggregate network bandwidth of up to 100 Gbps
  • Elastic Fabric Adapter (EFA) support on p3dn.24xlarge instances
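The sizes above can be compared with a little arithmetic. The sketch below assumes the published GPU counts for each P3 size (1, 4, 8, and 8 GPUs for p3.2xlarge, p3.8xlarge, p3.16xlarge, and p3dn.24xlarge) and the V100 core counts listed above; it picks the smallest P3 size with enough GPUs and reports the aggregate core counts. It ignores pricing, the helper name is hypothetical, and the figures should be checked against current AWS documentation.

```python
# GPU counts per P3 size, listed smallest first (assumed from AWS's P3 documentation)
P3_GPUS = {
    "p3.2xlarge": 1,
    "p3.8xlarge": 4,
    "p3.16xlarge": 8,
    "p3dn.24xlarge": 8,
}
# Per-GPU core counts for the NVIDIA Tesla V100
V100_CUDA_CORES = 5120
V100_TENSOR_CORES = 640


def smallest_p3_for(gpus_needed):
    """Return (size, total CUDA cores, total Tensor Cores) for the smallest
    P3 size with at least `gpus_needed` GPUs."""
    for size, gpus in P3_GPUS.items():  # dicts keep insertion order
        if gpus >= gpus_needed:
            return size, gpus * V100_CUDA_CORES, gpus * V100_TENSOR_CORES
    raise ValueError(f"no single P3 instance has {gpus_needed} GPUs")


print(smallest_p3_for(3))  # ('p3.8xlarge', 20480, 2560)
```

Workloads that need more than eight GPUs have to span several instances, which is where the high network bandwidth and EFA support on p3dn.24xlarge matter.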

Amazon recently released G4 instances, which provide NVIDIA T4 GPUs; Amazon claims they are the industry’s most cost-effective GPU instances for machine learning inference.

Another offering is Amazon Elastic Graphics, which lets you attach low-cost graphics acceleration to a variety of EC2 instance types, as long as the instance provides sufficient storage, memory, and compute capacity. This saves the cost of dedicated standalone GPU instances. Elastic Graphics accelerators offer up to 8 GB of graphics memory and support OpenGL 4.3.

Just as you attach Amazon EBS volumes to an existing EC2 instance, you can add graphics acceleration with Amazon Elastic Graphics, matching the amount of acceleration to each workload rather than being limited by a fixed hardware configuration.
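As a minimal sketch of that "size it per workload" idea, the helper below builds (but does not send) the keyword arguments for boto3’s EC2 `run_instances` call, sizing an EBS root volume and an Elastic Graphics accelerator independently of the instance type. It assumes the accelerator types eg1.medium through eg1.2xlarge (1, 2, 4, and 8 GB of graphics memory), an AMI whose root device is /dev/sda1, and placeholder AMI and instance-type values; the required networking configuration is omitted. AWS has since announced the end of life of Elastic Graphics, so treat this purely as an illustration. The helper names are hypothetical.

```python
# Elastic Graphics accelerator types and their graphics memory in GB,
# smallest first (assumed from AWS's Elastic Graphics documentation)
ELASTIC_GPU_TYPES = [
    ("eg1.medium", 1),
    ("eg1.large", 2),
    ("eg1.xlarge", 4),
    ("eg1.2xlarge", 8),
]


def pick_elastic_gpu(graphics_memory_gb):
    """Smallest accelerator type with at least the requested graphics memory."""
    for eg_type, memory_gb in ELASTIC_GPU_TYPES:
        if memory_gb >= graphics_memory_gb:
            return eg_type
    raise ValueError("Elastic Graphics accelerators provide at most 8 GB")


def run_instances_kwargs(ami_id, instance_type, graphics_memory_gb, root_volume_gb):
    """Keyword arguments for boto3's EC2 client run_instances().

    Storage (an EBS volume) and graphics acceleration (an Elastic Graphics
    accelerator) are both sized per launch instead of being fixed by the
    instance type. Networking settings required in practice are omitted.
    """
    return {
        "ImageId": ami_id,
        "InstanceType": instance_type,
        "MinCount": 1,
        "MaxCount": 1,
        "BlockDeviceMappings": [
            {"DeviceName": "/dev/sda1", "Ebs": {"VolumeSize": root_volume_gb}}
        ],
        "ElasticGpuSpecification": [{"Type": pick_elastic_gpu(graphics_memory_gb)}],
    }


# Placeholder AMI ID and instance type, for illustration only
kwargs = run_instances_kwargs("ami-0123456789abcdef0", "m5.large", 3, 100)
print(kwargs["ElasticGpuSpecification"])  # [{'Type': 'eg1.xlarge'}]
# With credentials configured: boto3.client("ec2").run_instances(**kwargs)
```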


Azure

Of the wide variety of virtual machine (VM) sizes on Microsoft Azure, the NC, ND, and NV series are optimized for workloads that require GPU-accelerated compute or visualization:

  • The NC, NCv2, and NCv3 series support compute- and network-intensive algorithms and applications, including deep learning, AI, OpenCL-based simulations and applications, and CUDA-based applications. The original NC-series uses Intel’s Xeon E5-2690 v3 2.60 GHz (Haswell) processor, while the NCv2 and NCv3 series use Intel’s Xeon E5-2690 v4 (Broadwell) processor. NCv3 is especially suited to high-performance workloads that use NVIDIA’s Tesla V100 GPU.
  • The ND-series uses Intel’s Xeon E5-2690 v4 (Broadwell) processor with NVIDIA’s Tesla P40 GPU, while NDv2 uses an Intel Xeon Platinum 8168 (Skylake) processor with eight NVLink-connected NVIDIA Tesla V100 GPUs. Both target deep learning training and inference workloads.
  • The NV-series and NVv3 both use NVIDIA’s Tesla M60 GPU to accelerate virtual desktop infrastructure (VDI) scenarios built on OpenGL and DirectX, as well as remote visualization, gaming, streaming, and encoding.
  • NVv4 VM sizes are designed and optimized for remote visualization and VDI. They use AMD’s Radeon Instinct GPU, support only Windows guest operating systems, and enable GPU partitioning, so workloads that need fewer graphics resources can use a fraction of a GPU.
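GPU partitioning on NVv4 is exposed as VM sizes that each receive a fraction of one GPU. The sketch below assumes the NVv4 sizes Azure documented at the time of writing (Standard_NV4as_v4, Standard_NV8as_v4, Standard_NV16as_v4, and Standard_NV32as_v4, receiving 1/8, 1/4, 1/2, and all of a Radeon Instinct MI25 GPU with 2, 4, 8, and 16 GiB of GPU memory) and picks the smallest size that meets a workload’s GPU memory requirement. The helper name is hypothetical; verify size names against current Azure documentation.

```python
from fractions import Fraction

# NVv4 sizes: (name, share of one AMD Radeon Instinct MI25 GPU, GPU memory in GiB),
# smallest first (assumed from Azure's NVv4 documentation)
NVV4_SIZES = [
    ("Standard_NV4as_v4", Fraction(1, 8), 2),
    ("Standard_NV8as_v4", Fraction(1, 4), 4),
    ("Standard_NV16as_v4", Fraction(1, 2), 8),
    ("Standard_NV32as_v4", Fraction(1, 1), 16),
]


def pick_nvv4_size(gpu_memory_gib):
    """Smallest NVv4 size (and its GPU share) with enough GPU memory."""
    for name, gpu_share, memory_gib in NVV4_SIZES:
        if memory_gib >= gpu_memory_gib:
            return name, gpu_share
    raise ValueError("a single NVv4 VM offers at most one full GPU (16 GiB)")


name, share = pick_nvv4_size(3)
print(name, share)  # Standard_NV8as_v4 1/4
```

Because each partition’s memory is simply its share of the GPU’s 16 GiB, a light VDI session can run on an eighth of a GPU instead of paying for a whole one.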

Google Cloud

With Google Cloud, you can add GPUs to your VMs to accelerate specific workloads, like data processing and machine learning, 3D visualization and rendering, or virtual applications that require an NVIDIA GRID enhanced virtual workstation.
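As a minimal sketch, the Python helper below builds (but does not run) a `gcloud compute instances create` command that attaches GPUs to a Compute Engine VM using the `--accelerator` flag. It assumes the N1 machine type n1-standard-8 and the zone shown, both placeholders, and sets `--maintenance-policy=TERMINATE` because VMs with GPUs cannot live-migrate during host maintenance. A real VM also needs a suitable boot image and NVIDIA drivers, which are omitted. The function and VM names are hypothetical.

```python
import shlex

# GPU types named in the text, using gcloud's accelerator type names
SUPPORTED_GPUS = {
    "nvidia-tesla-k80",
    "nvidia-tesla-p100",
    "nvidia-tesla-p4",
    "nvidia-tesla-v100",
    "nvidia-tesla-t4",
}


def gce_gpu_vm_command(name, zone, gpu_type, gpu_count, machine_type="n1-standard-8"):
    """Build (not run) a gcloud command that creates a VM with attached GPUs."""
    if gpu_type not in SUPPORTED_GPUS:
        raise ValueError(f"unexpected GPU type: {gpu_type}")
    return [
        "gcloud", "compute", "instances", "create", name,
        f"--zone={zone}",
        f"--machine-type={machine_type}",
        f"--accelerator=type={gpu_type},count={gpu_count}",
        # GPU VMs cannot live-migrate, so host maintenance must stop them
        "--maintenance-policy=TERMINATE",
    ]


cmd = gce_gpu_vm_command("render-vm", "us-central1-a", "nvidia-tesla-t4", 1)
print(shlex.join(cmd))
```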

You can add NVIDIA Tesla K80, P100, P4, V100, and T4 GPUs to Google Kubernetes Engine (GKE) node pools to support workloads such as image recognition, natural language processing, and other deep learning tasks, as well as image processing, video transcoding, and other compute-intensive tasks.
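On GKE, using GPUs typically involves two steps: creating a node pool with accelerators attached, then scheduling pods that request GPUs. The sketch below builds (but does not run) a `gcloud container node-pools create` command with the `--accelerator` flag, and a Pod manifest as a Python dict that requests GPUs through the `nvidia.com/gpu` resource limit and targets GPU nodes via GKE’s `cloud.google.com/gke-accelerator` node label. It assumes NVIDIA drivers are already installed on the nodes (a step omitted here); the pool, cluster, pod, and image names are hypothetical placeholders.

```python
import shlex


def gke_gpu_node_pool_command(pool, cluster, zone, gpu_type, gpu_count):
    """Build (not run) a gcloud command that adds a GPU node pool to a GKE cluster."""
    return [
        "gcloud", "container", "node-pools", "create", pool,
        f"--cluster={cluster}",
        f"--zone={zone}",
        f"--accelerator=type={gpu_type},count={gpu_count}",
    ]


def gpu_pod_manifest(name, image, gpu_type, gpus=1):
    """A Kubernetes Pod (as a dict) that requests GPUs on a GKE GPU node."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            # GKE labels GPU nodes with the accelerator type they carry
            "nodeSelector": {"cloud.google.com/gke-accelerator": gpu_type},
            "containers": [
                {
                    "name": name,
                    "image": image,
                    # GPUs are requested as limits on the nvidia.com/gpu resource
                    "resources": {"limits": {"nvidia.com/gpu": gpus}},
                }
            ],
            "restartPolicy": "Never",
        },
    }


pool_cmd = gke_gpu_node_pool_command("gpu-pool", "ml-cluster", "us-central1-a", "nvidia-tesla-t4", 1)
print(shlex.join(pool_cmd))
pod = gpu_pod_manifest("trainer", "example.com/trainer:latest", "nvidia-tesla-t4")
```

Serialized with `json.dumps`, the manifest could be applied with `kubectl apply -f`, which accepts JSON as well as YAML.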

For fast matrix multiplications and for training convolutional networks or large transformers, Google’s Tensor Processing Unit (TPU), an application-specific chip designed especially for machine learning workloads, is a very cost-efficient alternative to GPUs.

The Tensor Core-enabled NVIDIA V100 and TPUv2 deliver similar performance on the standard ResNet-50 model, but Google’s TPU is more cost-effective, thanks to its sophisticated parallelization infrastructure. That infrastructure also gives TPUs a speed advantage over GPUs when more than one Cloud TPU is used; each Cloud TPU is roughly equivalent to four GPUs.
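Cost-effectiveness claims like this one depend on both the hourly price and the throughput a device achieves on the model in question, so one way to compare offerings is to normalize price by throughput. The sketch below does that; the option names and numbers in the usage example are placeholders, not real prices or benchmark results, and should be replaced with your own measurements.

```python
def cost_per_million(price_per_hour, items_per_second):
    """Cost of processing one million items (e.g. training images) at a given
    hourly price and measured throughput."""
    if items_per_second <= 0:
        raise ValueError("throughput must be positive")
    items_per_hour = items_per_second * 3600
    return price_per_hour / items_per_hour * 1_000_000


def most_cost_effective(options):
    """Name of the option with the lowest cost per million items.

    `options` maps a name to (price per hour, items per second).
    """
    return min(options, key=lambda name: cost_per_million(*options[name]))


# Placeholder numbers, NOT real prices or benchmark results
options = {"option_a": (8.0, 2000), "option_b": (4.5, 1000)}
print(most_cost_effective(options))  # option_a
```

In this made-up example, the option with the higher hourly price is still cheaper per unit of work because its throughput is more than proportionally higher.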

Support for TPUs in the open-source PyTorch machine learning library (through the PyTorch/XLA project) is still experimental, so for now TPUs are best used alongside other compute resources.


In this article I reviewed the cloud GPU options offered by the world’s three biggest cloud providers:

  • AWS: offering four generations of GPU instances, with the commonly used P3 series offering up to eight NVIDIA Tesla V100 GPUs linked by fast NVLink interconnects.
  • Azure: offering the GPU-equipped NC, ND, and NV series, based on NVIDIA GPUs except for NVv4, which uses AMD Radeon Instinct GPUs.
  • Google Cloud: offering traditional GPUs based on NVIDIA hardware, as well as Google’s own Tensor Processing Unit (TPU), built especially for large matrix multiplications and highly cost-effective for specific types of deep learning workloads.

I hope this will be helpful as you scale up your organization’s artificial intelligence infrastructure.
