
NVIDIA A100 40GB Tensor Core GPU PCIe

  • Stock: In Stock
  • Brand: NVIDIA
  • Unprecedented acceleration at every scale
  • The most powerful compute platform for every workload
  • NVIDIA Ampere architecture
  • 312 teraFLOPS (TFLOPS) of deep learning performance
  • Next-generation NVLink
  • Partitionable into up to 7 GPU instances (MIG)
  • 40GB of high-bandwidth memory

Accelerating the Most Important Work of Our Time

The NVIDIA A100 Tensor Core GPU delivers unprecedented acceleration at every scale for AI, data analytics, and high-performance computing (HPC) to tackle the world’s toughest computing challenges. As the engine of the NVIDIA data center platform, A100 can efficiently scale to thousands of GPUs or, with NVIDIA Multi-Instance GPU (MIG) technology, be partitioned into seven GPU instances to accelerate workloads of all sizes. And third-generation Tensor Cores accelerate every precision for diverse workloads, speeding time to insight and time to market.
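The seven-way MIG split described above can be illustrated with a small model. This is a simplified sketch, not NVIDIA's placement logic: the profile names are the standard A100 MIG profiles, but the real nvidia-smi placement rules are stricter than a plain slice-and-memory budget.

```python
# Simplified model of MIG partitioning on an A100 40GB.
PROFILES = {             # profile -> (compute slices, memory in GB)
    "1g.5gb":  (1, 5),
    "2g.10gb": (2, 10),
    "3g.20gb": (3, 20),
    "4g.20gb": (4, 20),
    "7g.40gb": (7, 40),
}

def fits_on_a100_40gb(requested):
    """Check whether the requested instances fit the 7-slice, 40GB budget."""
    slices = sum(PROFILES[p][0] for p in requested)
    memory = sum(PROFILES[p][1] for p in requested)
    return slices <= 7 and memory <= 40

print(fits_on_a100_40gb(["1g.5gb"] * 7))          # True: seven 5GB instances
print(fits_on_a100_40gb(["3g.20gb", "4g.20gb"]))  # True: mixed profiles
print(fits_on_a100_40gb(["7g.40gb", "1g.5gb"]))   # False: exceeds the GPU
```

The point of the model: MIG isolates instances with dedicated compute slices and memory, so seven small inference jobs can share one card without contending for resources.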

Enterprise-Ready Software for AI

The NVIDIA EGX™ platform includes optimized software that delivers accelerated computing across the infrastructure. With NVIDIA AI Enterprise, businesses can access an end-to-end, cloud-native suite of AI and data analytics software that's optimized, certified, and supported by NVIDIA to run on VMware vSphere with NVIDIA-Certified Systems. NVIDIA AI Enterprise includes key enabling technologies from NVIDIA for rapid deployment, management, and scaling of AI workloads in the modern hybrid cloud.

Deep Learning Training

AI models are exploding in complexity as they take on next-level challenges such as accurate conversational AI and deep recommender systems. Training them requires massive compute power and scalability.

NVIDIA A100's third-generation Tensor Cores with Tensor Float (TF32) precision provide up to 20X higher performance over the prior generation with zero code changes and an additional 2X boost with automatic mixed precision and FP16. When combined with third-generation NVIDIA® NVLink®, NVIDIA NVSwitch™, PCIe Gen4, NVIDIA Mellanox InfiniBand, and the NVIDIA Magnum IO™ software SDK, it's possible to scale to thousands of A100 GPUs. This means that large AI models like BERT can be trained in just 37 minutes on a cluster of 1,024 A100s, offering unprecedented performance and scalability.
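TF32's "zero code changes" benefit comes from keeping FP32's 8-bit exponent range while computing with a 10-bit mantissa (the same mantissa width as FP16). A rough pure-Python sketch of the rounding step, purely illustrative, since the hardware does this inside the Tensor Core:

```python
import struct

def to_tf32(x: float) -> float:
    """Round a value to TF32 precision: FP32's 8-bit exponent range,
    but only 10 explicit mantissa bits instead of FP32's 23."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]  # float32 bit pattern
    drop = 23 - 10                   # mantissa bits discarded by TF32
    bits += 1 << (drop - 1)          # round to nearest (ties up in magnitude)
    bits &= ~((1 << drop) - 1)       # clear the 13 dropped bits
    return struct.unpack("<I", struct.pack("<I", bits))[0] and \
           struct.unpack("<f", struct.pack("<I", bits))[0]

print(to_tf32(1.0))      # exactly representable, unchanged
print(to_tf32(1.2345))   # rounded to 10 mantissa bits
```

Because the exponent range is untouched, FP32 code keeps its dynamic range; only the last mantissa bits are sacrificed, which most deep learning workloads tolerate.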

NVIDIA’s training leadership was demonstrated in MLPerf 0.6, the first industry-wide benchmark for AI training.

Deep Learning Inference

A100 introduces groundbreaking new features to optimize inference workloads. It brings unprecedented versatility by accelerating a full range of precisions, from FP32 to FP16 to INT8 and all the way down to INT4. Multi-Instance GPU (MIG) technology allows multiple networks to operate simultaneously on a single A100 GPU for optimal utilization of compute resources. And structural sparsity support delivers up to 2X more performance on top of A100's other inference performance gains.
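Running inference at INT8 relies on quantization: mapping floating-point weights or activations onto 8-bit integers with a scale factor. A minimal sketch of symmetric INT8 quantization (illustrative only; production frameworks use calibrated, often per-channel, scales):

```python
def quantize_int8(values):
    """Symmetric INT8 quantization: map floats onto [-127, 127] integers
    using a single scale derived from the largest magnitude."""
    scale = max(abs(v) for v in values) / 127.0
    q = [max(-127, min(127, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    """Map the integers back to approximate floats."""
    return [v * scale for v in q]

acts = [0.1, -0.55, 0.25, 1.0]           # toy activation values
q, scale = quantize_int8(acts)
approx = dequantize(q, scale)
print(q)                                  # 8-bit integer codes
print([round(a - b, 6) for a, b in zip(acts, approx)])  # small residual error
```

The reconstruction error is bounded by half the scale, which is why narrower integer formats trade a little accuracy for much higher throughput and lower memory traffic.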

NVIDIA already delivers market-leading inference performance, as demonstrated in an across-the-board sweep of MLPerf Inference 0.5, the first industry-wide benchmark for inference. A100 brings 20X more performance to further extend that leadership.

High-Performance Computing

To unlock next-generation discoveries, scientists look to simulations to better understand complex molecules for drug discovery, physics for potential new sources of energy, and atmospheric data to better predict and prepare for extreme weather patterns.

A100 introduces double-precision Tensor Cores, providing the biggest milestone since the introduction of double-precision computing in GPUs for HPC. This enables researchers to reduce a 10-hour, double-precision simulation running on NVIDIA V100 Tensor Core GPUs to just four hours on A100. HPC applications can also leverage TF32 precision in A100's Tensor Cores to achieve up to 10X higher throughput for single-precision dense matrix multiply operations.

High-Performance Data Analytics

Customers need to be able to analyze, visualize, and turn massive datasets into insights. But scale-out solutions often become bogged down as these datasets are scattered across multiple servers. Accelerated servers with A100 deliver the needed compute power—along with 1.6 terabytes per second (TB/sec) of memory bandwidth and scalability with third-generation NVLink and NVSwitch—to tackle these massive workloads. Combined with NVIDIA Mellanox InfiniBand, the Magnum IO SDK, and RAPIDS suite of open source software libraries, including the RAPIDS Accelerator for Apache Spark for GPU-accelerated data analytics, the NVIDIA data center platform is uniquely able to accelerate these huge workloads at unprecedented levels of performance and efficiency.

Enterprise-Ready Utilization

A100 is part of the complete NVIDIA data center solution that incorporates building blocks across hardware, networking, software, libraries, and optimized AI models and applications from NGC™. Representing the most powerful end-to-end AI and HPC platform for data centers, it allows researchers to deliver real-world results and deploy solutions into production at scale.


Specifications

Models: A100 40GB PCIe | A100 80GB PCIe | A100 40GB SXM | A100 80GB SXM

FP64 Tensor Core: —
Tensor Float 32 (TF32): 156 TFLOPS | 312 TFLOPS*
BFLOAT16 Tensor Core: 312 TFLOPS | 624 TFLOPS*
FP16 Tensor Core: 312 TFLOPS | 624 TFLOPS*
INT8 Tensor Core: 624 TOPS | 1248 TOPS*
GPU Memory: 40GB (40GB models); 80GB HBM2e (80GB models)
GPU Memory Bandwidth: —
Max Thermal Design Power (TDP): —
Multi-Instance GPU: up to 7 MIGs @ 5GB (40GB models); up to 7 MIGs @ 10GB (80GB models)
Form Factor: —
Interconnect (PCIe models): NVIDIA® NVLink® Bridge for 2 GPUs: 600GB/s**; PCIe Gen4: 64GB/s
Interconnect (SXM models): NVLink: 600GB/s; PCIe Gen4: 64GB/s
Server Options (PCIe models): Partner and NVIDIA-Certified Systems with 1-8 GPUs
Server Options (SXM models): NVIDIA HGX A100 Partner and NVIDIA-Certified Systems with 4, 8, or 16 GPUs; NVIDIA DGX A100 with 8 GPUs

* With sparsity