New Amazon EC2 Instances with Up to 8 NVIDIA Tesla V100 GPUs (P3)
Driven by customer demand and made possible by ongoing advances in the state of the art, we’ve come a long way since the original m1.small instance that we launched in 2006, with instances that emphasize compute power, burstable performance, memory size, local storage, and accelerated computing.
The New P3
Today we are making the next generation of GPU-powered EC2 instances available in four AWS regions. Powered by up to eight NVIDIA Tesla V100 GPUs, the P3 instances are designed to handle compute-intensive machine learning, deep learning, computational fluid dynamics, computational finance, seismic analysis, molecular modeling, and genomics workloads.
P3 instances use customized Intel Xeon E5-2686v4 processors running at up to 2.7 GHz. They are available in three sizes (all VPC-only and EBS-only):
| Model | NVIDIA Tesla V100 GPUs | GPU Memory | NVIDIA NVLink | vCPUs | Main Memory | Network Bandwidth | EBS Bandwidth |
|---|---|---|---|---|---|---|---|
| p3.2xlarge | 1 | 16 GiB | n/a | 8 | 61 GiB | Up to 10 Gbps | 1.5 Gbps |
| p3.8xlarge | 4 | 64 GiB | 200 GBps | 32 | 244 GiB | 10 Gbps | 7 Gbps |
| p3.16xlarge | 8 | 128 GiB | 300 GBps | 64 | 488 GiB | 25 Gbps | 14 Gbps |
Each of the NVIDIA GPUs is packed with 5,120 CUDA cores and another 640 Tensor cores and can deliver up to 125 TFLOPS of mixed-precision floating point, 15.7 TFLOPS of single-precision floating point, and 7.8 TFLOPS of double-precision floating point. On the two larger sizes, the GPUs are connected together via NVIDIA NVLink 2.0 running at a total data rate of up to 300 GBps. This allows the GPUs to exchange intermediate results and other data at high speed, without having to move it through the CPU or the PCI-Express fabric.
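To put the per-GPU numbers in context, here is a small Python sketch that multiplies them out for each instance size. The figures come from the table and paragraph above; the names `P3Size`, `aggregate_peak_tflops`, and `smallest_size_with` are made up for this illustration, and the totals are theoretical peaks (GPU count times per-GPU peak), not measured throughput.

```python
from dataclasses import dataclass

# Per-GPU peak figures quoted in the post, in TFLOPS.
PEAK_TFLOPS_PER_GPU = {"mixed": 125.0, "single": 15.7, "double": 7.8}


@dataclass(frozen=True)
class P3Size:
    """One row of the P3 size table (hypothetical helper type)."""
    name: str
    gpus: int
    vcpus: int
    main_memory_gib: int


# Values copied from the table, ordered from smallest to largest.
P3_SIZES = [
    P3Size("p3.2xlarge", 1, 8, 61),
    P3Size("p3.8xlarge", 4, 32, 244),
    P3Size("p3.16xlarge", 8, 64, 488),
]


def aggregate_peak_tflops(size, precision):
    """Theoretical peak for the whole instance: GPU count x per-GPU peak."""
    return size.gpus * PEAK_TFLOPS_PER_GPU[precision]


def smallest_size_with(min_gpus):
    """Return the smallest P3 size with at least min_gpus GPUs, or None."""
    for size in P3_SIZES:
        if size.gpus >= min_gpus:
            return size
    return None


if __name__ == "__main__":
    for size in P3_SIZES:
        totals = ", ".join(
            f"{precision}: {aggregate_peak_tflops(size, precision):.1f}"
            for precision in PEAK_TFLOPS_PER_GPU
        )
        print(f"{size.name} ({size.gpus} GPU): {totals} TFLOPS peak")
```

By this arithmetic, the eight GPUs in a p3.16xlarge add up to a theoretical 1,000 TFLOPS of mixed-precision performance. Real workloads will land below that, depending on how well they keep the Tensor cores busy.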
What’s a Tensor Core?
I had not heard the term Tensor core before starting to write this post. According to this very helpful post on the NVIDIA Blog, Tensor cores are designed to speed up the training and inference of large, deep neural networks. Each core is able to quickly and efficiently multiply a pair of 4×4 half-precision (also known as FP16) matrices together, add the resulting 4×4 matrix to another half or single-precision (FP32) matrix, and store the resulting 4×4 matrix in either half or single-precision form. NVIDIA’s blog post includes a diagram of this operation.
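The multiply-and-add described above can be written as D = A × B + C. The following pure-Python sketch emulates the numerics only, as an aid to understanding: it rounds the inputs to FP16 with the standard `struct` module, accumulates in ordinary Python floats, and rounds the result to FP32 or FP16. The function names are made up for this example, and the way real Tensor cores accumulate internally is an assumption not covered here.

```python
import struct


def to_fp16(x):
    """Round a Python float to the nearest IEEE 754 half-precision value.

    Raises OverflowError if x is too large for FP16 (above 65504).
    """
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x):
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def tensor_core_mma(a, b, c, out_fp16=False):
    """Emulate D = A x B + C on 4x4 matrices given as lists of lists.

    A and B are rounded to FP16 first, C is used as given (FP16 or FP32),
    and D is stored as FP32 by default or as FP16 when out_fp16 is True.
    """
    n = 4
    a16 = [[to_fp16(v) for v in row] for row in a]
    b16 = [[to_fp16(v) for v in row] for row in b]
    d = []
    for i in range(n):
        row = []
        for j in range(n):
            # Dot product of row i of A and column j of B, plus C[i][j].
            acc = sum(a16[i][k] * b16[k][j] for k in range(n)) + c[i][j]
            row.append(to_fp16(acc) if out_fp16 else to_fp32(acc))
        d.append(row)
    return d


if __name__ == "__main__":
    identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
    b = [[float(4 * i + j) for j in range(4)] for i in range(4)]
    c = [[0.5] * 4 for _ in range(4)]
    for row in tensor_core_mma(identity, b, c):
        print(row)
```

Small integers and values like 0.5 are exactly representable in FP16, so the demo inputs pass through unchanged. A value such as 0.1 is not, which is why mixed-precision training usually keeps the accumulated result in FP32.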
Via the fine folks at Amazon Web Services.