NVIDIA Unveils Vera, an 88-Core CPU Designed to Unclog AI Bottlenecks

NVIDIA announced its Vera CPU at GTC Taipei on May 31, 2026, a new chip built specifically to solve a growing problem in large-scale AI: CPU bottlenecks. As AI agents take on more complex management tasks like tool calls and code execution, the host processor, not the GPU, is often what slows things down. Vera is NVIDIA’s answer, using a custom monolithic core design and a high-bandwidth memory system to speed up these crucial orchestration workloads.

A Monolithic Design for Predictable Speed

The Vera CPU is built on a single, monolithic piece of silicon that houses 88 custom-designed NVIDIA "Olympus" cores. These cores are fully compatible with the Armv9.2 instruction set and feature a new microarchitecture that NVIDIA claims delivers a 50% improvement in instructions per cycle (IPC) over its previous Grace CPU.

All 88 cores are connected with a second-generation fabric that provides 3.4 TB/s of bandwidth. By keeping all the cores on one die, NVIDIA is aiming for uniform and predictable memory access latency, sidestepping the potential variations that can come with chiplet-based designs.

Vera also introduces a concurrency model called Spatial Multithreading (SMT-X). Unlike traditional multithreading, SMT-X physically partitions a single core's resources between two hardware threads. This creates strong performance isolation, which is critical for cloud providers running different customer workloads on the same chip. With SMT-X enabled, a single 88-core Vera CPU presents as 176 hardware threads to the system.
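For readers who want to see how a thread count like this shows up in practice, the sketch below reads the standard Linux sysfs topology file to count how many hardware threads share a core. It is a minimal illustration, not NVIDIA tooling: the helper names parse_cpu_list and threads_per_core are hypothetical, and whether Vera exposes SMT-X threads through this interface is an assumption.

```python
from pathlib import Path
from typing import List, Optional


def parse_cpu_list(text: str) -> List[int]:
    """Parse a Linux CPU list such as '0,88' or '0-1' into CPU ids."""
    cpus: List[int] = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            start, end = part.split("-")
            cpus.extend(range(int(start), int(end) + 1))
        else:
            cpus.append(int(part))
    return cpus


def threads_per_core(cpu: int = 0) -> Optional[int]:
    """Count hardware threads sharing a core with `cpu`; None if sysfs is unavailable."""
    path = Path(f"/sys/devices/system/cpu/cpu{cpu}/topology/thread_siblings_list")
    try:
        return len(parse_cpu_list(path.read_text()))
    except OSError:
        return None


# Figures from the article: 88 cores, two hardware threads per core with SMT-X.
print("Threads per core on this machine:", threads_per_core())
print("Logical CPUs expected on one Vera socket with SMT-X:", 88 * 2)
```

On a machine without the sysfs file (for example, a non-Linux system), the first line simply prints None.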

Fast Memory and a Direct Line to GPUs

Vera's memory system uses LPDDR5X memory on field-replaceable SOCAMM modules, delivering 1.2 TB/s of bandwidth per socket. The memory subsystem is also power-efficient, consuming less than 30 watts, a significant drop from the 100-plus watts used by typical DDR5 server memory. Early benchmarks from Phoronix show the CPU can sustain over 90% of its peak memory bandwidth under load, offering over four times the memory bandwidth per core compared to competing x86 server CPUs.
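The per-core figure can be sanity-checked with back-of-the-envelope arithmetic. The sketch below uses only the numbers quoted above and assumes decimal units (1 TB = 1000 GB) and an even split of bandwidth across cores; the 90% figure is treated as a lower bound, since the article says "over 90%".

```python
# Figures quoted in the article.
PEAK_BW_TBPS = 1.2          # peak memory bandwidth per socket, TB/s
CORES = 88                  # Olympus cores per socket
SUSTAINED_FRACTION = 0.90   # "over 90%" of peak, used here as a lower bound


def per_core_bandwidth_gbps(peak_tbps: float, cores: int) -> float:
    """Even share of peak bandwidth per core in GB/s (1 TB = 1000 GB)."""
    return peak_tbps * 1000 / cores


def sustained_bandwidth_tbps(peak_tbps: float, fraction: float) -> float:
    """Bandwidth sustained under load as a fraction of peak."""
    return peak_tbps * fraction


per_core = per_core_bandwidth_gbps(PEAK_BW_TBPS, CORES)
sustained = sustained_bandwidth_tbps(PEAK_BW_TBPS, SUSTAINED_FRACTION)
print(f"Peak bandwidth per core: {per_core:.1f} GB/s")
print(f"Sustained bandwidth at 90% of peak: {sustained:.2f} TB/s")
```

This works out to roughly 13.6 GB/s per core at peak and at least 1.08 TB/s sustained per socket.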

To connect to its accelerators, Vera integrates NVLink Chip-to-Chip (C2C), providing 1.8 TB/s of coherent bandwidth directly to GPUs like the upcoming Rubin series. This link establishes a unified memory architecture, allowing GPUs to access the CPU's DRAM pool directly. The design is intended to minimize data transfer overhead when managing large models or coordinating complex inference pipelines.
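To give a sense of scale, the following sketch estimates an ideal lower bound on the time needed to move data across the link at the quoted 1.8 TB/s. It is an illustration under stated assumptions: the payload sizes are hypothetical, the full 1.8 TB/s is assumed to be usable in one direction (the article does not say), and latency, protocol overhead, and contention are ignored.

```python
C2C_BW_TBPS = 1.8  # NVLink-C2C coherent bandwidth quoted in the article, TB/s


def ideal_transfer_ms(payload_gb: float, link_tbps: float = C2C_BW_TBPS) -> float:
    """Lower-bound transfer time in milliseconds at full link bandwidth.

    Ignores latency, protocol overhead, and contention (1 TB = 1000 GB).
    """
    if link_tbps <= 0:
        raise ValueError("link bandwidth must be positive")
    seconds = payload_gb / (link_tbps * 1000)
    return seconds * 1000


# Hypothetical payload sizes, chosen only for illustration.
for payload_gb in (1, 16, 128):
    print(f"{payload_gb:>4} GB -> {ideal_transfer_ms(payload_gb):.2f} ms")
```

Real transfers will take longer than these figures, which only bound the best case.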

Early Benchmarks and System Configurations

Initial independent benchmarks have started to surface, highlighting Vera's strengths in its target workloads.

Vera is scheduled for release in the second half of 2026 in a few different configurations:

Standalone Systems: Single- and dual-socket servers with TDPs ranging from 250W to 450W.
Vera CPU Rack: A liquid-cooled rack that can integrate up to 256 Vera CPUs.
Vera Rubin NVL72: A rack-scale system that combines 36 Vera CPUs with 72 Rubin GPUs, creating a 1:2 CPU-to-GPU ratio for tightly integrated AI tasks.
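The rack-level totals implied by these configurations follow directly from the per-socket figures given earlier (88 cores, two threads per core with SMT-X, 1.2 TB/s of memory bandwidth). The sketch below works them out; the RackConfig class is a hypothetical helper, and the aggregate bandwidth is a simple sum across sockets, not a single pool available to any one CPU.

```python
from dataclasses import dataclass

# Per-socket figures quoted in the article.
CORES_PER_CPU = 88
THREADS_PER_CORE = 2          # with SMT-X enabled
MEM_BW_PER_CPU_TBPS = 1.2


@dataclass(frozen=True)
class RackConfig:
    """Hypothetical helper describing one rack-scale configuration."""
    name: str
    cpus: int
    gpus: int = 0

    @property
    def cores(self) -> int:
        return self.cpus * CORES_PER_CPU

    @property
    def threads(self) -> int:
        return self.cores * THREADS_PER_CORE

    @property
    def memory_bandwidth_tbps(self) -> float:
        # Sum of per-socket bandwidth across all CPUs in the rack.
        return self.cpus * MEM_BW_PER_CPU_TBPS


configs = [
    RackConfig("Vera CPU Rack (maximum)", cpus=256),
    RackConfig("Vera Rubin NVL72", cpus=36, gpus=72),
]

for c in configs:
    line = (f"{c.name}: {c.cores} cores, {c.threads} threads, "
            f"{c.memory_bandwidth_tbps:.1f} TB/s aggregate memory bandwidth")
    if c.gpus:
        line += f", {c.gpus // c.cpus} GPUs per CPU"
    print(line)
```

A fully populated Vera CPU Rack comes to 22,528 cores, while the NVL72 pairs 3,168 CPU cores with its 72 GPUs.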

Pre-production units are already being deployed to early adopters, including Anthropic, OpenAI, SpaceXAI, and Oracle Cloud Infrastructure.