Types of GPU virtualization in cloud computing and benefits vs bare metal
Published: Aug 31, 2024
Last updated: Apr 27, 2025

Introducing GPU virtualization
GPU virtualization is the abstraction of GPU access away from the physical machine the GPU is attached to. In CPU virtualization, virtual machines can run on shared physical servers without needing to know the details of the underlying hardware. GPU virtualization applies a similar idea to accelerators: it lets software access GPUs through a virtualized interface, even when the physical GPU is partitioned, passed through to a VM, or located elsewhere in the data center.
Existing types of GPU Virtualization
GPU virtualization generally appears in three forms:
Hardware partitioning, such as NVIDIA MIG
A single physical GPU is split into fixed, isolated slices.
Virtualized GPU access, such as vGPU or passthrough
A GPU is exposed to a VM or container through a virtualized device interface. This does not mean the GPU is being shared; rather, the entire GPU (or a MIG partition) is presented virtually to a VM or container.
Network-based GPU pooling, Thunder Compute’s approach
GPUs are made available across servers over the network, allowing compute and GPU resources to be allocated independently.
Single-node GPU sharing (NVIDIA MIG)

NVIDIA supports two virtualization solutions: MIG and vGPU.
MIG physically partitions a GPU into smaller, isolated slices, each with its own dedicated compute, memory, and bandwidth allocation. This is useful when multiple workloads need strong isolation and predictable performance, but the tradeoff is rigidity: each slice can only use the portion of the GPU assigned to it. If a workload is placed on a smaller MIG partition, it cannot burst into the unused capacity of the rest of the GPU, even if that capacity is idle. Because modern data center workloads generally require large memory allocations, partitioning memory is often undesirable.
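To make that rigidity concrete, here is a minimal Python sketch (the slice sizes and names are illustrative, not NVIDIA's API) modeling fixed MIG-style partitions: a job confined to a 10 GB slice fails to allocate more, even while the rest of the GPU sits idle.

```python
# Illustrative model of fixed MIG-style partitioning (not NVIDIA's API).
# An 80 GB GPU split into fixed slices; each job is confined to its slice.

from dataclasses import dataclass

@dataclass
class Slice:
    memory_gb: int
    used_gb: int = 0

    def allocate(self, gb: int) -> bool:
        """A job may only use its own slice, never idle neighbors."""
        if self.used_gb + gb > self.memory_gb:
            return False  # fails even if other slices are empty
        self.used_gb += gb
        return True

# 80 GB GPU partitioned into 7 fixed ~10 GB slices (MIG-like layout).
slices = [Slice(memory_gb=10) for _ in range(7)]

# A 20 GB job cannot fit in any single slice...
assert not any(s.allocate(20) for s in slices)
# ...even though 70 GB of the GPU is idle.
assert sum(s.memory_gb - s.used_gb for s in slices) == 70
```

The point of the sketch is the failure mode: the allocation check is per-slice, so idle capacity elsewhere on the same physical GPU is invisible to the job.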
Dedicated GPU passthrough (NVIDIA vGPU)

vGPU is primarily about exposing GPU access to virtualized environments, such as VMs or containers. Despite the name, vGPU does not inherently mean the GPU is being split, partitioned, or shared across multiple workloads. In most practical deployments, it behaves similarly to passthrough: a VM or container gets access to a GPU through a virtualized interface, while the infrastructure layer handles device presentation, isolation, and management.
This is different from MIG, which physically partitions a GPU into fixed slices. It is also different from network-based GPU pooling, where the physical GPU may live on a different server entirely. With vGPU, the workload and GPU sit on the same physical host. vGPU is best understood as a way to make GPUs usable inside virtualized infrastructure, not as a GPU-sharing architecture.
Thunder Compute's approach: Network-Based Virtualization

Thunder Compute uses network-based GPU virtualization. Instead of requiring a GPU to be physically installed in the same server as the workload, Thunder makes GPUs accessible across the data center over the network.
Traditionally, adding a GPU to a server means physically connecting that GPU to the server’s motherboard through PCIe. With Thunder Compute, that relationship is virtualized: a workload can run on one machine while accessing GPU capacity located on another. From the application’s perspective, the GPU appears locally available, even though the physical device is remote.
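The general mechanism can be imagined as a local proxy that forwards device calls over the network. This is a generic sketch of network-transparent device access, not Thunder Compute's actual implementation; all class and method names are hypothetical.

```python
# Generic sketch of network-transparent GPU access (NOT Thunder Compute's
# implementation). A local proxy object forwards "GPU" operations to a
# server that physically hosts the device.

class RemoteGPUServer:
    """Stands in for the machine that physically hosts the GPU."""
    def execute(self, op: str, *args):
        if op == "matmul_shape":
            # Toy operation: compute the output shape of a matrix multiply.
            (m, k), (k2, n) = args
            assert k == k2, "inner dimensions must match"
            return (m, n)
        raise NotImplementedError(op)

class LocalGPUProxy:
    """What the application sees: looks local, forwards to the remote host."""
    def __init__(self, server: RemoteGPUServer):
        self._server = server  # in reality, a network connection

    def matmul_shape(self, a_shape, b_shape):
        return self._server.execute("matmul_shape", a_shape, b_shape)

# The application calls the proxy as if the GPU were attached locally.
gpu = LocalGPUProxy(RemoteGPUServer())
assert gpu.matmul_shape((32, 64), (64, 8)) == (32, 8)
```

In a real system the proxy would serialize calls onto the wire rather than invoke a local object, but the application-facing shape is the same: the device appears local while living elsewhere.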
This creates a hyperconverged, data-center-wide pool of GPUs, similar to what Ceph does for storage. Rather than each server being limited to the GPUs physically attached to it, compute and GPU capacity can be allocated independently. That makes it easier to place workloads, reduce fragmentation, and keep expensive GPU resources utilized.
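A toy scheduling example (hypothetical server names and GPU counts, sketched in Python) shows why pooling reduces fragmentation: when GPUs are locked to individual servers, a job can be unplaceable even though enough total GPUs are free across the fleet.

```python
# Illustrative comparison: per-server GPU allocation vs a data-center-wide
# pool. All server names and GPU counts are hypothetical.

servers = {"a": 2, "b": 2, "c": 2}  # free GPUs physically attached per server

def place_local(job_gpus: int) -> bool:
    """Without pooling, a job must fit on a single server's attached GPUs."""
    for name, free in servers.items():
        if free >= job_gpus:
            servers[name] = free - job_gpus
            return True
    return False

pool = 6  # with network-based pooling, the same 6 GPUs form one pool

def place_pooled(job_gpus: int) -> bool:
    """With pooling, any free GPU in the data center can serve the job."""
    global pool
    if pool >= job_gpus:
        pool -= job_gpus
        return True
    return False

# A 4-GPU job: unplaceable locally (no single server has 4 free GPUs)...
assert not place_local(4)
# ...but trivially placed against the pooled capacity.
assert place_pooled(4)
```

The fleet has 6 free GPUs either way; only the pooled view can actually use them for the 4-GPU job, which is the fragmentation argument in miniature.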
Takeaway
MIG is an initial step towards GPU sharing by slicing a single GPU into fixed partitions. But for many AI workloads, especially memory-heavy workloads, fixed slices are too rigid. Network-based GPU virtualization is closer to the virtualization patterns that reshaped CPUs and storage: abstract the physical resource, pool it across infrastructure, and allocate it dynamically. Thunder Compute applies that same idea to GPUs.

Carl Peterson