Qualcomm Snapdragon X: Oryon CPU And Adreno GPU Architectures Explored
After mountains of performance data and product disclosures, we finally have Qualcomm Snapdragon X Oryon CPU and Adreno GPU architecture details to share.
Over the last 8 months or so, the PC industry has been abuzz with discussion and speculation about Qualcomm’s Snapdragon X series of processors for premium thin-and-light laptops. Not that Qualcomm has been particularly quiet since first unveiling the Snapdragon X Elite at its Snapdragon Summit last October. Since that debut, Qualcomm has let us witness, and later run, an array of benchmarks on Snapdragon X reference platforms; the company has unveiled the Snapdragon X Plus; there have been myriad AI-related demos; and most recently, Qualcomm, Microsoft, and a host of PC makers introduced more than 20 Copilot+ AI PCs, all exclusively powered by Snapdragon X. Qualcomm also splashed Taipei with a plethora of Snapdragon X messaging during Computex ’24, declaring that the PC has been ‘Reborn’.
As vocal as Qualcomm has been about the Snapdragon X series, to date it hadn’t disclosed many deep architectural details. We knew the fully custom Oryon CPU cores were built on technology Qualcomm gained through its acquisition of NUVIA, and that the chips sport a powerful NPU and an Adreno-based graphics engine, but that was about it. That all changes today. Qualcomm has disclosed an array of details regarding the Oryon CPU and Adreno GPU cores in the Snapdragon X series, and we’ve got the full scoop.
So, without further delay, let’s see what makes the Qualcomm Snapdragon X Series tick...
For the uninitiated, here's a high-level look at all of the functional blocks in a Snapdragon X-based platform. The SoC itself comprises Qualcomm Oryon CPU cores, an Adreno GPU, and a Hexagon NPU, linked to a memory controller, a Qualcomm Spectra ISP, a Secure Processing Unit, a Sensing Hub, and of course some IO. Snapdragon X chips will have 10 or 12 CPU cores, clocked at up to 3.8GHz for multi-threaded workloads, or up to 4.3GHz with one or two threads active. The CPU complex is also outfitted with 42MB of total cache. The Adreno GPU will offer up to 4.6 TFLOPS of compute performance, and the Hexagon NPU tops out at 45 TOPS. The SoC supports up to 64GB of LPDDR5X memory, operating in an 8-channel configuration at 8,448MT/s, for a healthy 135GB/s of peak memory bandwidth.
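Where does the 135GB/s figure come from? Peak DRAM bandwidth is simply bus width times transfer rate. The short sketch below reproduces the number, assuming the eight channels are 16 bits wide each (the configuration Qualcomm quotes for the memory subsystem) and using decimal gigabytes; the helper name `peak_bandwidth_gbs` is purely illustrative.

```python
# Theoretical peak DRAM bandwidth = total bus width (bytes) * transfer rate.
CHANNELS = 8                # LPDDR5X channels, per Qualcomm
CHANNEL_WIDTH_BITS = 16     # bits per channel (assumed 8 x 16-bit bus)
TRANSFER_RATE_MTS = 8_448   # mega-transfers per second

def peak_bandwidth_gbs(channels: int, width_bits: int, rate_mts: int) -> float:
    """Return theoretical peak bandwidth in decimal GB/s (10**9 bytes/s)."""
    bus_bytes = channels * width_bits // 8   # bytes moved per transfer (16 here)
    return bus_bytes * rate_mts * 1e6 / 1e9

bw = peak_bandwidth_gbs(CHANNELS, CHANNEL_WIDTH_BITS, TRANSFER_RATE_MTS)
print(f"{bw:.1f} GB/s")  # 135.2 GB/s
```

A 128-bit aggregate bus moving 16 bytes per transfer at 8,448MT/s works out to roughly 135.2GB/s, which Qualcomm rounds to 135GB/s. Real-world throughput will land below this theoretical ceiling.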
As you'd expect from Qualcomm, the Snapdragon X platform has plenty of wireless connectivity, including the company's Snapdragon X65 5G modem and its FastConnect 7800 Wi-Fi 7 / Bluetooth 5.4 combo radio module. Quick Charge support is on board as well, along with a Qualcomm Aqstic Hi-Fi DAC and a high-efficiency amplifier. In terms of its audio, photo/video, sensing hub, and wireless capabilities, the Snapdragon X platform resembles some of Qualcomm's top-end mobile SoCs found in today's flagship smartphones. The Snapdragon X's CPU and GPU cores, however, are different animals...
Snapdragon X Secret Sauce: Oryon CPU Cores
As mentioned, the first wave of Snapdragon X Plus and X Elite chips will have 10 or 12 CPU cores, respectively. We've known that for a while, but we now have much more insight into what differentiates these cores from Qualcomm's other processors. At a high level, the Oryon CPU cores employed in the Snapdragon X series are wider and deeper, and have much more bandwidth at their disposal, both internally across the various IP blocks and out to system memory. The cores in the SoC are arranged in three 4-core clusters, each with its own 12MB L2 cache and a dedicated bus interface unit.
Each of those cores features an Instruction Fetch Unit (IFU), a Vector Execution Unit (VXU), a Rename and Retire Unit (REU), an Integer Execution Unit (IXU), a Load and Store Unit (LSU) and a Memory Management Unit (MMU).
The instruction cache is a 6-way, 192KB pool, with support for up to 16 fetches per cycle. The L1 instruction TLB (translation lookaside buffer) is a 256-entry, 8-way structure with support for 4K and 64K translation granules, and there are multiple branch prediction tables for single-cycle, conditional, and indirect branch target prediction.
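A 6-way associativity is less common than a power-of-two way count, but the cache still divides cleanly into sets. The sketch below works out the set count and how an address would be split into tag, set index, and byte offset. It assumes 64-byte cache lines, which matches the 64B L1 fills Qualcomm describes for the L2 but is not stated for the instruction cache itself; `cache_geometry` and `split_address` are hypothetical helpers, not anything from Qualcomm.

```python
def cache_geometry(size_bytes: int, ways: int, line_bytes: int = 64):
    """Return (sets, offset_bits, index_bits) for a set-associative cache."""
    sets, rem = divmod(size_bytes, ways * line_bytes)
    if rem or sets & (sets - 1):
        raise ValueError("expected a power-of-two number of sets")
    offset_bits = line_bytes.bit_length() - 1  # log2(line size)
    index_bits = sets.bit_length() - 1         # log2(number of sets)
    return sets, offset_bits, index_bits

def split_address(addr: int, offset_bits: int, index_bits: int):
    """Split an address into (tag, set index, byte offset) fields."""
    offset = addr & ((1 << offset_bits) - 1)
    index = (addr >> offset_bits) & ((1 << index_bits) - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset

# 192KB, 6-way L1 instruction cache; the 64B line size is an assumption
sets, off_bits, idx_bits = cache_geometry(192 * 1024, 6)
print(sets, off_bits, idx_bits)  # 512 6 9
print(split_address(0x4000_1234, off_bits, idx_bits))
```

Under that assumption, 192KB / (6 ways x 64B) gives 512 sets, so 6 bits select the byte within a line and the next 9 bits select the set. The same helper applies to any of the other caches described in this article.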
The sizes of the integer and vector register pools, and the widths of the execution and load/store units, are detailed in the slide above. The integer units can handle up to 6 ALU operations, 2 branches, and 2 multiply / multiply-accumulate ops per cycle. The vector execution units are 128 bits wide and can handle up to 4 FP32 (ADD, MUL, MLA) or 4 INT32 (ALU, MLA) operations per cycle, with support for a wide variety of data types from INT8 through FP64.
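A 128-bit vector register holds a different number of elements depending on the data type, which is why narrower formats such as INT8 pack far more work into each operation. The sketch below simply divides the register width by the element width; the particular list of types is illustrative, chosen from the "INT8 through FP64" range Qualcomm cites, and is not an exhaustive statement of what Oryon supports.

```python
VECTOR_BITS = 128  # Oryon vector unit width, per Qualcomm

# Illustrative element types from the INT8..FP64 range (not an official list)
ELEMENT_BITS = {"INT8": 8, "INT16": 16, "INT32": 32, "FP32": 32, "FP64": 64}

def lanes(vector_bits: int, element_bits: int) -> int:
    """Number of elements of a given width that fit in one vector register."""
    return vector_bits // element_bits

for name, bits in ELEMENT_BITS.items():
    print(f"{name:>5}: {lanes(VECTOR_BITS, bits):2d} lanes per 128-bit vector")
```

So a single 128-bit register carries 16 INT8 values, 4 FP32 values, or 2 FP64 values.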
In terms of load-store capability, the design includes a 96KB, 6-way L1 data cache and a 224-entry, 7-way data TLB with support for 4K and 64K translation granules. The cores can handle any combination of 4 load-store operations per clock, with a 192-entry load queue and a 56-entry store queue. Qualcomm also put a lot of work into the prefetch unit, which can prefetch into the L1 data cache, L2, and data translation buffers. Qualcomm also quotes a branch mispredict latency of 13 clock cycles.
Snapdragon X Memory Management Unit
The memory management unit in the Oryon CPU core supports 4KB and 64KB granules, virtualization with 2-stage translation, and nested virtualization, meaning a guest VM can host its own guest hypervisor. The L1 instruction and L1 data TLBs provide virtual-to-physical translations for all traffic (with 1-cycle access), and the L2 TLB is an 8-way structure with more than 8K entries, designed to handle applications with large memory footprints.
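Why do entry counts and 64KB granules matter for large-footprint applications? A useful rule of thumb is "TLB reach," the amount of memory a TLB can map without a miss. The sketch below computes it for the TLB sizes Qualcomm quotes, assuming each entry maps exactly one page of the chosen granule (it ignores larger block mappings). The L2 TLB is treated as exactly 8K entries because Qualcomm only says "more than 8K," so its numbers are a lower bound. The helper name `reach_bytes` is illustrative.

```python
KB = 1024

TLBS = {
    "L1 ITLB": 256,
    "L1 DTLB": 224,
    "L2 TLB (lower bound)": 8 * KB,  # Qualcomm says ">8K" entries
}
GRANULES = {"4KB": 4 * KB, "64KB": 64 * KB}

def reach_bytes(entries: int, page_bytes: int) -> int:
    """Memory covered by a TLB if every entry maps one distinct page."""
    return entries * page_bytes

for tlb, entries in TLBS.items():
    for gname, gsize in GRANULES.items():
        mib = reach_bytes(entries, gsize) / (KB * KB)
        print(f"{tlb:22s} {gname:>5s} granule: {mib:8.3f} MiB")
```

With 4KB pages the L1 data TLB covers well under 1MiB, while the L2 TLB covers at least 32MiB; moving to 64KB granules multiplies every figure by 16, which is one reason large-page support helps memory-hungry workloads.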
The shared 12MB of L2 cache per core cluster is fully coherent, 12-way set associative, and operates at the full core frequency. The L2 is optimized for L1 data accesses and supports 64B reads, writes, evictions and fills to and from the L1 caches, with an average latency of 17 cycles for an L1 miss to L2 hit. Qualcomm also notes that the snoop operations are optimized for both core-to-core and cluster-to-cluster operations.
There is also a 6MB System Level Cache (SLC), with average latency in the 26-29ns range and 135GB/s of bi-directional bandwidth. And as mentioned previously, Snapdragon X processors support up to 64GB of LPDDR5X memory, operating in an eight-channel, 16-bit-per-channel configuration at 8,448MT/s, for 135GB/s of peak memory bandwidth. Latency to system memory should fall in the 102-104ns range.
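To put those nanosecond figures in the same units as the 17-cycle L2 hit latency, the sketch below converts them to core clock cycles, assuming the 3.8GHz multi-threaded clock. It then applies Little's law (data in flight = bandwidth x latency) to estimate how much outstanding memory traffic is needed to saturate the 135GB/s DRAM interface, assuming a 103ns midpoint latency and 64-byte lines. This is a back-of-the-envelope estimate under those assumptions, not a Qualcomm figure; the helper names are illustrative.

```python
FREQ_GHZ = 3.8     # all-core clock from Qualcomm's specs (assumed for this estimate)
LINE_BYTES = 64    # cache-line size (assumption, consistent with 64B L2 fills)

def ns_to_cycles(ns: float, ghz: float) -> float:
    """Convert a latency in nanoseconds to core clock cycles."""
    return ns * ghz

def bytes_in_flight(bandwidth_gbs: float, latency_ns: float) -> float:
    """Little's law: concurrency = throughput * latency (GB/s * ns = bytes)."""
    return bandwidth_gbs * latency_ns

for name, ns in [("SLC", 26.0), ("SLC", 29.0), ("DRAM", 102.0), ("DRAM", 104.0)]:
    print(f"{name:4s} {ns:5.1f} ns ~ {ns_to_cycles(ns, FREQ_GHZ):4.0f} cycles")

inflight = bytes_in_flight(135.0, 103.0)
print(f"~{inflight:,.0f} bytes (~{inflight / LINE_BYTES:.0f} lines) in flight")
```

At 3.8GHz, an SLC hit costs roughly 100-110 cycles and a trip to DRAM roughly 390 cycles, versus 17 cycles for an L2 hit. Keeping the memory bus busy requires on the order of 14KB (a couple of hundred cache lines) of requests outstanding across the chip, which is where deep load queues and aggressive prefetching earn their keep.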
Of course, there is a gauntlet of security-related features present on the Snapdragon X, including side-channel mitigations, control-flow integrity measures, and a dedicated random number generator per CPU cluster. Qualcomm also notes that the Snapdragon X is not susceptible to many of the attacks of recent years, including PACMAN, Augury, GoFetch, and others.
We've covered expected CPU performance a handful of times before, but to quickly reiterate, Qualcomm is claiming the Snapdragon X is both higher performing and more efficient than competing architectures from AMD and Intel, on a per-core basis.
But what about graphics performance? Adreno X1 details are up next...