AMD Launches Instinct MI325X AI Accelerator, Reveals MI355X AI Powerhouse And New DPUs
We’ve got coverage of Turin and the Ryzen AI PRO 300 Series you should check out; here we’re going to focus on AMD’s Instinct and Pensando disclosures.
AMD Instinct MI325X AI Accelerator Ships This Quarter To Battle NVIDIA's H200
If you recall, AMD revealed the Instinct MI325X, the follow-on to the highly successful MI300X, earlier this year. The MI325X expands on its predecessor with higher-performance HBM3E memory and a significantly larger memory footprint, but it's not quite the beast AMD had originally talked about. More on that in a moment.

The AMD Instinct MI325X is similar to the MI300X in terms of its GPU configuration, but its power management and frequency and voltage curves have been optimized to increase compute performance, and it's paired with faster, higher-capacity HBM3E memory. The MI300X features 192GB of HBM3 memory, whereas the MI325X has 256GB of HBM3E. That HBM3E memory is also clocked higher and offers over 6TB/s of peak bandwidth, a roughly 13% bump over the 5.3TB/s of the MI300X. Note, however, that 256GB is a 32GB reduction from the 288GB AMD had originally announced. At the Advancing AI event, AMD reps disclosed that its targets for the MI325X have changed somewhat, and the company decided to cut memory capacity a bit to best address the market opportunity.
Somewhat higher frequencies (we don't have exact numbers), along with all of that additional memory and memory bandwidth, effectively improve the MI325X's compute performance. By keeping more data closer to the GPU, and feeding that data into the chip more quickly, GPU resources are utilized more effectively and efficiently, which results in higher realized performance in the real world.
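To put rough numbers on that intuition, consider a back-of-the-envelope estimate of memory-bound LLM inference, where token generation throughput is ultimately capped by how fast the weights can be streamed from HBM. The sketch below uses the publicly quoted bandwidth figures and a hypothetical 70B-parameter model stored in FP8; it's a simplified illustration, not a benchmark.

```python
# Back-of-the-envelope estimate of memory-bound LLM decode throughput.
# Each generated token must stream roughly the full set of model weights
# from HBM, so the tokens/sec ceiling per GPU is approximately
# (memory bandwidth) / (model size in bytes). Illustrative only; real
# performance depends on batch size, KV-cache traffic, and the software stack.

def est_tokens_per_sec(bandwidth_tb_s: float, params_b: float, bytes_per_param: float) -> float:
    """Upper-bound decode rate for a memory-bandwidth-limited model."""
    model_bytes = params_b * 1e9 * bytes_per_param
    return (bandwidth_tb_s * 1e12) / model_bytes

# Hypothetical 70B-parameter model at FP8 (1 byte per parameter).
for name, bw in [("MI300X (5.3 TB/s)", 5.3), ("MI325X (6.0 TB/s)", 6.0)]:
    print(f"{name}: ~{est_tokens_per_sec(bw, 70, 1):.0f} tokens/s ceiling")

print(f"Bandwidth uplift: {6.0 / 5.3 - 1:.0%}")  # ~13%, matching AMD's figure
```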
Versus NVIDIA's H200 and H200 HGX, AMD is claiming some significant performance gains across a variety of models and workloads. AMD showed significant leads in inference with multiple models of various sizes, along with competitive training performance. The Instinct MI325X's higher memory capacity and AMD's continued optimization of its AI software stack also set the company up well for future models, which are likely to require more and more memory capacity.
AMD Announces The Instinct MI355X AI Accelerator
The successor to the Instinct MI325X was also revealed today – the Instinct MI355X. Details were scarce, but the few that were shared jibe with initial hints provided at Computex. The Instinct MI355X will be based on a new GPU architecture – CDNA 4 – and will arrive sometime in the second half of next year. In comparison to the current-gen, CDNA 3-based MI300X family, the MI355X will be manufactured on a more advanced 3nm process node, will feature 288GB of HBM3E memory, and will support new FP4 and FP6 data types.
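To illustrate why the combination of 288GB of HBM3E and low-precision FP4/FP6 data types matters, here's some rough weight-footprint arithmetic for a hypothetical 405B-parameter model. This is a simplified sketch that ignores KV cache, activations, and runtime overhead.

```python
# Rough memory-footprint math showing why FP4/FP6 support matters.
# Weight storage for an N-billion-parameter model at various precisions;
# illustrative only -- ignores KV cache, activations, and framework overhead.

GIB = 1024**3

def weight_footprint_gib(params_b: float, bits: int) -> float:
    return params_b * 1e9 * bits / 8 / GIB

params_b = 405  # e.g., a hypothetical 405B-parameter model
for fmt, bits in [("FP16", 16), ("FP8", 8), ("FP6", 6), ("FP4", 4)]:
    print(f"{fmt}: ~{weight_footprint_gib(params_b, bits):.0f} GiB of weights")

# At FP4, the 405B model's weights (~189 GiB) fit comfortably within a
# single 288GB (~268 GiB) MI355X; at FP16 (~754 GiB) they would not.
```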
AMD claims the Instinct MI355X will offer up to a 1.8x increase in AI inference performance versus CDNA 3 with the FP8 and FP16 data types, but its software and algorithms are evolving so quickly that performance targets are in a constant state of flux and are likely to change as the Instinct MI355X gets closer to launch.
AMD also reiterated its plans to release new Instinct accelerators based on next-gen architectures on an aggressive, yearly cadence. That means a CDNA 5-based series of accelerators will arrive sometime in 2026, though AMD didn't provide any details other than to say it will be branded MI400. With AMD's plans to unify its consumer and data center / AI GPU architectures, however, there's a strong possibility that future products may morph a bit over the next few years.
New AMD Pensando DPU Networking Technologies
The importance of fast and reliable connectivity between all of the systems in today's AI data centers cannot be overstated. The front-end network shuttles data to an AI cluster, and the back-end network handles data transfers between accelerators and clusters. If either the front end or the back end is bottlenecked, the CPUs and various accelerators in the AI system aren't optimally fed data, which results in lower utilization and potentially lost revenue or diminished quality of service.

That's where AMD's Pensando DPUs (Data Processing Units) come in. To accelerate and efficiently manage the front- and back-end networks, and offload a system's CPUs, AMD introduced the Pensando Salina DPU for the front end and the Pensando Pollara 400, the industry's first Ultra Ethernet Consortium (UEC) ready AI NIC, for the back end.
The AMD Pensando Salina DPU is the third generation of the company’s high-performance programmable DPU, which supports 400G throughput, effectively doubling the performance and bandwidth versus the second-gen “Elba” DPU.
The AMD Pensando Pollara 400 is powered by the AMD P4 programmable engine, and the company is claiming it is the industry's first UEC-ready AI NIC. The Pollara 400 supports next-gen RDMA software and offers a number of new features to optimize and enhance the reliability and scalability of high-speed networks. For example, it supports path-aware congestion control, to route network traffic more efficiently. It also supports fast lost packet recovery, which can detect a lost packet more quickly and resend just that single packet to optimize bandwidth utilization. And it supports fast network failure recovery as well. The AMD Pensando Salina DPU and Pensando Pollara 400 UEC AI NIC will both be available early next year.
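AMD hasn't published the details of the Pollara 400's recovery algorithm, but the bandwidth argument behind resending only the lost packet is easy to sketch. The toy example below (with hypothetical window and packet sizes) contrasts selective retransmission against a classic go-back-N style rewind.

```python
# Illustrative sketch of why selective retransmission saves bandwidth.
# AMD hasn't detailed the Pollara 400's recovery scheme; this simply
# contrasts the two classic approaches the feature description implies.

WINDOW = 64          # packets in flight (hypothetical)
PACKET_BYTES = 4096  # payload size per packet (hypothetical)
lost_index = 10      # a single packet dropped mid-window

# Go-back-N: the sender rewinds and resends everything from the loss onward.
go_back_n_bytes = (WINDOW - lost_index) * PACKET_BYTES

# Selective repeat: only the missing packet is retransmitted.
selective_bytes = 1 * PACKET_BYTES

print(f"Go-back-N resends:        {go_back_n_bytes:,} bytes")
print(f"Selective repeat resends: {selective_bytes:,} bytes")
print(f"Wasted retransmission avoided: {go_back_n_bytes - selective_bytes:,} bytes")
```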
AMD made some bold claims at the Advancing AI event, and while much of the information regarding its Instinct accelerators was a refinement of previous disclosures, the company's vision and its end-to-end portfolio, spanning CPUs, GPUs, and DPUs, add up to a compelling strategy for the exploding AI data center opportunity.