AMD Zen 5 Architecture Reveal: A Ryzen 9000 And Ryzen AI 300 Deep Dive
A Deeper Dive Into AMD's Zen 5 Architecture, The Ryzen 9000 Series And Ryzen AI 300
AMD Ryzen AI 300 (Mobile) And Ryzen 9000 (Desktop): AMD has revealed some new architectural information regarding its Zen 5-based Ryzen AI 300 and Ryzen 9000 series processors, RDNA 3.5 iGPU, and XDNA 2 AI engine, and we've got the details.
In early June, during Computex 2024, AMD revealed some initial information regarding its Zen 5 processor architecture and the Ryzen AI 300 and Ryzen 9000 mobile and desktop chips, based on Zen 5. We've previously published coverage of the Ryzen AI 300 series and Ryzen 9000 series, which offered a number of high-level details on the branding and product families, but have since spent some time with AMD and gotten additional detail on both the products and the Zen 5 architecture itself.
To quickly recap, Zen 5 is AMD's next-generation processor architecture, which will scale from low-power mobile platforms on up to big-iron, many-core servers. It's an evolution of the Zen 4 architecture, optimized for better performance and efficiency with modern PC workloads.
AMD Zen 5 Architecture Details
A summary of what Zen 5 offers is outlined in the slide above. Zen 5 offers higher IPC thanks to improved fetch and decode, wider dispatch and execution engines, and increased cache bandwidth. Integer, floating point, and vector math advances have been made as well.
As is usually the case with a new processor architecture, Zen 5 offers better branch prediction with lower latency. The architecture features dual decode pipelines and reportedly utilizes improved TAGE (TAgged GEometric) prediction algorithms that are more accurate than those in Zen 4. Instruction cache latency and bandwidth have been improved as well.
Zen 5 features an 8-wide dispatch / retire engine -- versus 6-wide in Zen 4 -- with 6 ALUs (3 of which handle multiplies) and a new unified ALU scheduler design. Previous-gen Zen architectures featured a unique scheduler per ALU. Zen 5 also has a 40% larger execution window that can support up to 448 outstanding operations. The architecture can essentially handle more instructions in flight and better recover from misses.
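To put those figures in context, here is a quick back-of-the-envelope sketch. It only uses the numbers quoted above (448 operations, 40% larger, 8-wide versus 6-wide); the function names are ours and purely illustrative, and the Zen 4 window size it prints is implied by the arithmetic rather than a figure AMD shared in this briefing.

```python
# Figures quoted in the article for Zen 5 versus Zen 4.
ZEN5_WINDOW_OPS = 448       # outstanding operations Zen 5 can track
WINDOW_GROWTH = 0.40        # "40% larger" execution window
ZEN5_DISPATCH_WIDTH = 8     # dispatch / retire width in Zen 5
ZEN4_DISPATCH_WIDTH = 6     # dispatch / retire width in Zen 4


def implied_previous_window(current_ops: int, growth: float) -> int:
    """Back out the prior-generation window size implied by a growth figure."""
    return round(current_ops / (1.0 + growth))


def relative_increase(new_value: float, old_value: float) -> float:
    """Fractional increase of new_value over old_value."""
    return (new_value - old_value) / old_value


if __name__ == "__main__":
    zen4_window = implied_previous_window(ZEN5_WINDOW_OPS, WINDOW_GROWTH)
    width_gain = relative_increase(ZEN5_DISPATCH_WIDTH, ZEN4_DISPATCH_WIDTH)
    print(f"Implied Zen 4 window: {zen4_window} operations")
    print(f"Dispatch width increase: {width_gain:.0%}")
```

In other words, the window grew slightly faster than the dispatch width did, which lines up with the claim that the core can keep more work in flight while waiting on misses.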
AMD also grew the Zen 5 L1 data cache by 50% (48KB vs. 32KB) and doubled the maximum bandwidth to the L1 cache and FPU. Zen 5 features a 48KB 12-way L1 data cache, but maintains a 4-cycle load latency despite the higher capacity. The bandwidth from L2 to L1 has also been doubled versus Zen 4.
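The cache geometry is worth a quick look. The sketch below works out how many sets and how many bytes per way the L1 data cache has in each generation. Note the assumptions: the 64-byte cache line and Zen 4's 8-way associativity are not from AMD's Zen 5 presentation, they are figures we are supplying for illustration, and the helper names are hypothetical.

```python
KIB = 1024
LINE_BYTES = 64  # assumption: 64-byte cache lines, not stated in the briefing


def bytes_per_way(capacity_bytes: int, ways: int) -> int:
    """Capacity covered by a single way of a set-associative cache."""
    return capacity_bytes // ways


def cache_sets(capacity_bytes: int, ways: int, line_bytes: int = LINE_BYTES) -> int:
    """Number of sets = capacity / (ways * line size)."""
    return capacity_bytes // (ways * line_bytes)


# Zen 5 figures are from the article; Zen 4's 8-way figure is our assumption.
caches = {
    "Zen 4 L1D": (32 * KIB, 8),
    "Zen 5 L1D": (48 * KIB, 12),
}

if __name__ == "__main__":
    for name, (capacity, ways) in caches.items():
        print(
            f"{name}: {cache_sets(capacity, ways)} sets, "
            f"{bytes_per_way(capacity, ways)} bytes per way"
        )
```

Under those assumptions, both caches have the same number of sets and the same 4KB per way, so the extra capacity would come from added ways rather than a deeper index. That is at least consistent with AMD holding load latency at four cycles, though AMD did not describe it in those terms.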
In Zen 4, AVX-512 is implemented by double-pumping a 256-bit data path: 512-bit AVX-512 operations are broken down into two 256-bit chunks and sent through the engine. This was originally done to avoid the large frequency fluctuations seen on Intel's architectures to date when executing AVX-512 workloads, but AMD seems to have solved that problem with Zen 5, because it features a full 512-bit data path. The FP/vector math unit in Zen 5 features six pipelines with two-cycle latency for FADD instructions, which is a reduction from three cycles in Zen 4. It can also handle a larger number of in-flight FP instructions versus the previous-gen.
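As a rough illustration of what double-pumping means, the toy model below adds two 512-bit vectors (16 lanes of 32 bits each) through a data path of a given width and counts how many passes it takes. This is a simplified sketch of the concept only, not a cycle-accurate model of either core, and the function name and lane size are our own choices.

```python
from typing import List, Tuple


def pumped_add(
    a: List[int], b: List[int], datapath_bits: int, lane_bits: int = 32
) -> Tuple[List[int], int]:
    """Add two vectors lane by lane, processing only as many lanes per
    pass as fit in the data path. Returns (result, number_of_passes)."""
    lanes_per_pass = datapath_bits // lane_bits
    result: List[int] = []
    passes = 0
    for start in range(0, len(a), lanes_per_pass):
        end = start + lanes_per_pass
        result.extend(x + y for x, y in zip(a[start:end], b[start:end]))
        passes += 1
    return result, passes


if __name__ == "__main__":
    # Two 512-bit vectors: 16 lanes x 32 bits.
    vec_a = list(range(16))
    vec_b = [10] * 16

    zen4_style, zen4_passes = pumped_add(vec_a, vec_b, datapath_bits=256)
    zen5_style, zen5_passes = pumped_add(vec_a, vec_b, datapath_bits=512)

    print(f"256-bit data path: {zen4_passes} passes")
    print(f"512-bit data path: {zen5_passes} pass")
    print("Same result:", zen4_style == zen5_style)
```

The result is identical either way; the difference is how many trips through the execution hardware each 512-bit operation needs.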
All told, Zen 5 cores will reportedly offer an average 16% IPC uplift versus Zen 4. When the FP and vector math advancements or the additional cache bandwidth come into play, the overall performance uplift may be significantly higher than 16%, but generally speaking, clock for clock, Zen 5 should be a significant upgrade over its predecessor. This also holds true for the Zen 5c cores in Strix Point versus Zen 4, though Zen 5c cores won't clock as high as "full" Zen 5 cores at a given voltage (we're told the delta is about 20%) and they have half the amount of L3 cache.
Speaking of those enhanced FP and vector math capabilities, Zen 5 cores should also be significantly more performant than Zen 4 with nascent AI-centric workloads. Single-core performance improvements for machine learning and AES-XTS workloads, the latter of which is used for storage encryption, are north of 30%.
Of course, AMD won't be stopping at Zen 5 and Zen 5c. Though no specific details were given, AMD will continue to offer generational improvements and stick to the heterogeneous core / single microarchitecture philosophy with Zen 6 and Zen 6c. According to AMD, Zen 6 is on track and due to arrive next year.
The Ryzen 9000 Family, Socket AM5 And Chipset Info
Although Zen 5 is leveraged for both Strix Point-based Ryzen AI 300 series mobile and Ryzen 9000 series desktop processors, it's important to understand that this first wave of desktop and mobile chips are different animals. Though AMD has replicated desktop configurations in the past with its previous-gen Dragon Range mobile processors, Strix Point (Ryzen AI 300) is a new, monolithic die, while Granite Ridge (Ryzen 9000) is comprised of multiple compute dies and an IO die, similar to the Ryzen 7000 series (and previous-gen desktop Ryzen processors).
This isn't new information, but AMD reiterated its commitment to Socket AM5. Socket AM4, which has been around for nearly a decade now, was ready for multiple generations of Ryzen processors and APUs, and new products continue to be released for the platform. AMD is not ready to commit to quite such a long lifespan for Socket AM5, but did say it would be supported beyond 2027. That's potentially two more processor generations.
We've already shown you this line-up before, as well. The initial wave of Ryzen 9000 series processors will span the 6-core / 12-thread Ryzen 5 9600X on up to the 16-core / 32-thread Ryzen 9 9950X. As you'd expect, total cache is reduced as the core counts go down, which is also true of TDPs. Note that with the exception of the top-end Ryzen 9 9950X, the TDPs of the rest of the line-up are lower than their Ryzen 7000 series counterparts. The initial Ryzen 7000X series chips had TDPs that span 105 - 170 watts. 65 watt non-X variants didn't arrive until many months later.
Granite Ridge processors feature up to two 8-core Zen 5 CCDs, manufactured on TSMC's 4nm process, linked to a 6nm IO die. The Ryzen 7000 series also featured a 6nm IO die, but its Zen 4 CCDs were manufactured at 5nm. TSMC's 4nm process (specifically N4P) offers significant power, performance, efficiency and area improvements vs. 5nm.
AMD Ryzen 9000 Series Expected Performance
In terms of performance, AMD is touting some big gains across the board, versus its previous-gen processors and Intel's competitive offerings. The Ryzen 7 9700X, for example, is reportedly about 12% faster on average while gaming versus the Ryzen 7 5800X3D, which features 3D V-Cache. The Ryzen 7 9700X, which has a TDP of only 65W, also consumes less power than the 105W Ryzen 7 5800X3D.
Versus competitive offerings from Intel's 14th Gen Core processors, AMD is also claiming performance leadership. In Productivity, Content Creation, and Gaming benchmarks, AMD claims the Ryzen 5 9600X is faster across the board -- at least in the tests presented here. Handbrake shows a massive 94% lead for the Ryzen 5 9600X, while Horizon Zero Dawn shows a 29% advantage. The 9600X's lead in the other tests falls in the 5% - 23% range.
The same is mostly true in a comparison of the Intel Core i7-14700K versus the Ryzen 7 9700X. Although the Ryzen 7 9700X's lead in Handbrake isn't quite as large, it's still an across the board win for AMD to the tune of 4% - 42%.
And, according to AMD, it's the same story for the Ryzen 9 9900X versus the Core i9-14900K. Of course, no one should draw any solid conclusions from vendor-provided benchmarks, but considering the Ryzen 7000 series' performance versus Intel's 14th Gen parts, Ryzen 9000 should look pretty good. We'll know for sure soon enough. We expect to have a full review posted, with fresh benchmarks for all platforms, before month's end.
Ryzen 9000 Series Overclocking Enhancements
In addition to the new, more efficient and higher performing architecture, AMD also added some additional overclocking-related tools to the Ryzen 9000 series, which relate to both memory and the processor cores. There's a new AGESA coming with support for up to DDR5-8000 memory speeds and the platform will support memory overclocking on-the-fly. Optimized performance profiles will also allow users to eke some additional performance from their rigs, by tightening timings on memory when said memory is not clocked to its maximum frequency. For example, if you've got a DDR5-7000 EXPO-Ready memory kit, but are clocking it at DDR5-6000 to maintain a 1:1 memory controller to memory clock ratio, the optimized profile will tighten the timings to offer lower latency. We should also note that Ryzen 9000 series processors will have JEDEC support for DDR5-5600 memory speeds, and DDR5-6000 will remain the "sweet spot" to maintain that 1:1 memory controller and memory clock ratio.
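To illustrate why tighter timings matter when a kit is run below its rated speed, the sketch below converts a DDR5 data rate and CAS latency into nanoseconds, and shows the clocks involved in a 1:1 ratio. The conversion itself is standard DDR math (two transfers per memory clock). The CL values are hypothetical examples we picked, not timings from AMD's optimized profiles, and the function names are ours.

```python
def memory_clock_mhz(data_rate_mts: float) -> float:
    """DDR transfers twice per clock, so the memory clock is half the data rate."""
    return data_rate_mts / 2


def controller_clock_mhz(data_rate_mts: float, divisor: int = 1) -> float:
    """Memory controller clock for a 1:divisor controller-to-memory ratio."""
    return memory_clock_mhz(data_rate_mts) / divisor


def cas_latency_ns(cas_cycles: int, data_rate_mts: float) -> float:
    """CAS latency in nanoseconds: cycles * (2000 / data rate in MT/s)."""
    return cas_cycles * 2000 / data_rate_mts


if __name__ == "__main__":
    data_rate = 6000  # DDR5-6000, the "sweet spot" cited by AMD

    print(f"Memory clock: {memory_clock_mhz(data_rate):.0f} MHz")
    print(f"Controller clock at 1:1: {controller_clock_mhz(data_rate):.0f} MHz")

    # Hypothetical timings: a looser CL36 versus a tightened CL30.
    for cl in (36, 30):
        print(f"DDR5-{data_rate} CL{cl}: {cas_latency_ns(cl, data_rate):.1f} ns")
```

With these example values, dropping from CL36 to CL30 at the same DDR5-6000 speed trims CAS latency from 12 ns to 10 ns, which is the kind of gain a tightened profile is aiming for.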
A new Curve Shaper feature is also coming to Ryzen Master for Ryzen 9000 series processors. Curve Shaper builds upon Curve Optimizer by giving users the ability to selectively add or subtract from 15 different points on a matrix of frequency and temperature values. By doing so, users can further reduce voltages from bands that are stable or increase voltages when needed to minimize instability at higher frequencies. The "re-shaped" Curve Shaper curve will apply to all cores, and can be shifted by Curve Optimizer.
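One way to picture Curve Shaper is as a small table of offsets layered under Curve Optimizer. The sketch below is purely conceptual: the split of the 15 points into five frequency bands and three temperature bands, the band labels, the offset units, and the simple additive combination with a per-core Curve Optimizer value are all our assumptions for illustration. None of this is Ryzen Master's actual interface or AMD's implementation.

```python
from typing import Dict, Tuple

# Hypothetical band labels; 5 x 3 = the 15 points mentioned by AMD.
FREQ_BANDS = ("min", "low", "med", "high", "max")
TEMP_BANDS = ("low", "med", "high")

ShaperTable = Dict[Tuple[str, str], int]


def make_shaper() -> ShaperTable:
    """Create a neutral shaper table with a zero offset at every point."""
    return {(f, t): 0 for f in FREQ_BANDS for t in TEMP_BANDS}


def effective_offset(
    shaper: ShaperTable, freq_band: str, temp_band: str, optimizer_offset: int
) -> int:
    """Assumed model: the shaper point applies to all cores, and the
    per-core Curve Optimizer value shifts the whole curve up or down."""
    return shaper[(freq_band, temp_band)] + optimizer_offset


if __name__ == "__main__":
    shaper = make_shaper()
    # Add voltage margin only at the highest-frequency, hottest point.
    shaper[("max", "high")] = 5
    # Trim voltage in a band that has proven stable.
    shaper[("med", "low")] = -10

    core_optimizer = -20  # hypothetical per-core Curve Optimizer setting
    for point in (("max", "high"), ("med", "low"), ("low", "med")):
        print(point, effective_offset(shaper, *point, core_optimizer))
```

The point of the model is simply that Curve Shaper lets users treat each frequency and temperature region differently, where Curve Optimizer alone moves the entire curve at once.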
If all of this sounds good and you're itching to upgrade, you won't have to wait much longer. Ryzen 9000 series desktop processors will be available for purchase on July 31.
Ryzen AI 300 Series Mobile Processors
Unlike Ryzen 9000 series desktop processors, which feature multiple chiplets on a single package, Ryzen AI 300 series mobile processors -- codenamed Strix Point -- feature a monolithic die. At this point in time, Strix Point is also the only place where Zen 5c cores are used, though next-gen, high-core-count EPYC processors with up to 192 cores (384 threads) will also leverage Zen 5c.
The Strix Point-based Ryzen AI 300 series not only features the new Zen 5 architecture, with up to 12 CPU cores / 24 threads, but next-gen RDNA 3.5 graphics with up to 16 Compute Units (CUs) and an XDNA 2 NPU with 32 compute tiles offering up to 50 TOPS of compute performance.
We'll offer more detail on RDNA 3.5 and XDNA 2 on the next page, but for now we will cover some expected performance versus competitive offerings from Intel and Qualcomm.
Across productivity, content creation and gaming workloads, AMD is claiming performance leadership for its Ryzen AI 9 HX 370 processor versus the Intel Core Ultra 9 185H and Qualcomm Snapdragon X Elite X1E-84-100. The biggest gains are shown in the gaming tests, where the Ryzen processor's Radeon 890M graphics can really stretch its legs. Multi-threaded workloads, like 3D rendering, also show strong performance for the Ryzen AI 9 HX 370 versus Intel and Qualcomm. Again though, please keep in mind these are vendor-provided benchmarks -- we'll be running our own tests soon enough.