Nvidia officially unveils the Ampere-based GA100 GPU

by Slobodan Simic on 14 May 2020

54 billion transistors for 128 SMs or 8,192 CUDA cores and more

Nvidia GTC 2020 has kicked off and, as expected, we got to see the new Ampere architecture, with the GA100 GPU and the A100 Tensor Core GPU, in full detail.

Manufactured on TSMC's 7nm N7 process, the GA100 is a compute beast with 54 billion transistors on an 826mm² die. It packs 8 GPU Processing Clusters (GPCs) with 8 Texture Processing Clusters (TPCs) per GPC and two Streaming Multiprocessors (SMs) per TPC, which gives 16 SMs per GPC.

All of these numbers translate to a total of 128 SMs for a full GA100 GPU, and with 64 FP32 CUDA cores per SM, this adds up to an impressive total of 8,192 FP32 CUDA cores.
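The SM and core totals can be checked with a few lines of Python. This is a minimal sketch that only multiplies the per-cluster counts quoted above; the constant names are our own and do not come from any Nvidia tool.

```python
# Full GA100 hierarchy, using the counts quoted in the article.
GPCS = 8            # GPU Processing Clusters
TPCS_PER_GPC = 8    # Texture Processing Clusters per GPC
SMS_PER_TPC = 2     # Streaming Multiprocessors per TPC
FP32_PER_SM = 64    # FP32 CUDA cores per SM

sms_per_gpc = TPCS_PER_GPC * SMS_PER_TPC    # 16, matching the quoted SMs per GPC
total_sms = GPCS * sms_per_gpc              # 128 SMs
total_fp32_cores = total_sms * FP32_PER_SM  # 8,192 FP32 CUDA cores

print(f"SMs per GPC: {sms_per_gpc}")
print(f"Total SMs: {total_sms}")
print(f"FP32 CUDA cores: {total_fp32_cores:,}")
```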

The GA100 also has six 2.4Gbps HBM2 stacks with twelve 512-bit memory controllers (five HBM2 stacks and ten 512-bit controllers for the A100), leaving it with 48GB of VRAM on a 6144-bit memory interface. We did not get official compute performance numbers for the fully enabled GA100 GPU, but judging from the A100 numbers, it is an impressive beast.
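The memory figures follow the same kind of arithmetic: each controller adds 512 bits of bus width, and peak bandwidth is bus width times per-pin data rate. The sketch below assumes capacity is split evenly across stacks (8GB per stack, derived from the 48GB and six-stack figures) and treats 2.4Gbps as the per-pin rate, so the bandwidth it prints is a rough theoretical estimate, not an official Nvidia number.

```python
# Memory-interface arithmetic for the full GA100 and the cut-down A100,
# using the stack and controller counts quoted in the article.
CONTROLLER_WIDTH_BITS = 512
PIN_RATE_GBPS = 2.4       # per-pin HBM2 data rate as quoted (rounded)
GB_PER_STACK = 48 / 6     # assumption: capacity split evenly, 8GB per stack


def memory_config(stacks: int, controllers: int) -> dict:
    """Return bus width, capacity and estimated peak bandwidth."""
    bus_width_bits = controllers * CONTROLLER_WIDTH_BITS
    capacity_gb = stacks * GB_PER_STACK
    # Peak bandwidth in GB/s = bus width (bits) * per-pin rate (Gb/s) / 8 bits per byte
    peak_bandwidth_gbs = bus_width_bits * PIN_RATE_GBPS / 8
    return {
        "bus_width_bits": bus_width_bits,
        "capacity_gb": capacity_gb,
        "peak_bandwidth_gbs": peak_bandwidth_gbs,
    }


full_ga100 = memory_config(stacks=6, controllers=12)
a100 = memory_config(stacks=5, controllers=10)

for name, cfg in (("GA100 (full)", full_ga100), ("A100", a100)):
    print(f"{name}: {cfg['bus_width_bits']}-bit bus, "
          f"{cfg['capacity_gb']:.0f}GB, ~{cfg['peak_bandwidth_gbs']:.0f} GB/s peak")
```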

The GPU also comes with four 3rd generation Tensor Cores per SM, which means 512 Tensor Cores for a full GPU.

The third generation Tensor Core makes a difference

What makes the GA100 such a compute beast is its 3rd generation Tensor Cores, which bring significant improvements, including acceleration for all data types: FP16, BF16, TF32, FP64, INT8, INT4, and Binary.

The GA100 also features 192KB of combined shared memory and L1 data cache per SM. Other additions include a new asynchronous copy instruction, which loads data directly from global memory into shared memory and can be used with a shared-memory-based barrier unit; new instructions for L2 cache management and residency controls; new warp-level reduction instructions; and many other programmability improvements meant to reduce software complexity.

Bear in mind that the GA100 packs 512 3rd generation Tensor Cores per GPU (432 for the A100), and these add up to some impressive compute performance.
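Both Tensor Core totals come from the same rule, four Tensor Cores per SM multiplied by the number of enabled SMs, as this short sketch shows. The helper name `tensor_cores` is hypothetical.

```python
# Third-generation Tensor Cores per SM, as quoted in the article.
TENSOR_CORES_PER_SM = 4


def tensor_cores(enabled_sms: int) -> int:
    """Hypothetical helper: total Tensor Cores for a given number of enabled SMs."""
    return enabled_sms * TENSOR_CORES_PER_SM


full_ga100_tc = tensor_cores(128)  # full GA100: 128 SMs
a100_tc = tensor_cores(108)        # A100: 108 SMs enabled
print(f"Full GA100: {full_ga100_tc} Tensor Cores, A100: {a100_tc} Tensor Cores")
```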

The GA100-based A100 Tensor Core GPU has 7 GPCs, for a total of 108 SMs and 6,912 FP32 CUDA cores per GPU.
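The A100 numbers also show how far the chip is cut down. Assuming each GPC holds at most 16 SMs, as in the full GA100, seven GPCs would offer 112 SMs, so reaching 108 implies that four more SMs are disabled inside the enabled GPCs. A quick sketch of that arithmetic, with variable names of our own choosing:

```python
# A100 configuration versus the full GA100, using the article's numbers.
SMS_PER_GPC = 16   # per full GA100 GPC (8 TPCs x 2 SMs); assumed to be the ceiling
FP32_PER_SM = 64

full_sms = 8 * SMS_PER_GPC  # 128 SMs in a full GA100
a100_gpcs = 7
a100_sms = 108

max_sms_in_enabled_gpcs = a100_gpcs * SMS_PER_GPC        # 112
extra_disabled_sms = max_sms_in_enabled_gpcs - a100_sms  # 4
a100_fp32_cores = a100_sms * FP32_PER_SM                 # 6,912
enabled_fraction = a100_sms / full_sms                   # 0.84375

print(f"A100 FP32 CUDA cores: {a100_fp32_cores:,}")
print(f"SMs disabled inside the 7 enabled GPCs: {extra_disabled_sms}")
print(f"Share of full GA100 SMs enabled: {enabled_fraction:.1%}")
```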

We'll first see the GA100 GPU in the A100 Tensor Core GPU, which will be a part of Nvidia's newly announced DGX A100 AI system.

Last modified on 15 May 2020