Intel has enjoyed a relatively unchallenged occupancy of the enthusiast CPU market for several years now. If you mark the FX-8350 as AMD's last major play prior to subsequent refreshes (like the FX-8370), the last major AMD CPU launch dates back to 2012. Later launches in the FX-9000 series and FX-8000 series updates have been made, of course, but there has not been an architectural push since the Bulldozer/Piledriver/Steamroller series.
AMD Ryzen, then, has understandably generated an impregnable wall of excitement from the enthusiast community. This is AMD’s chance to recover a market it once dominated, back in the Athlon 64 days, and reestablish itself in a position that minimally targets parity in price to performance. That’s all AMD needs: Parity. Or close to it, anyway, while maintaining comparable pricing to Intel. With Intel’s stranglehold lasting as long as it has, builders are ready to support an alternative in the market. It’s nice to claim “best” on some charts, like AMD has done with Cinebench, but AMD doesn’t have to win: they have to tie. The momentum to shift is there.
Even RTG competitor nVidia will benefit from this upgrade cycle. That’s not something you hear a lot – nVidia wanting AMD to do well with a launch – but here, it makes sense. A dump of new systems into the ecosystem means everyone experiences revenue growth. People need to buy new GPUs, new cases, new coolers, and new RAM to accompany any moves to Ryzen. Offsetting Vega from Ryzen makes sense as a way to avoid smothering one announcement with the other, but it does mean that AMD is now rapidly moving toward Vega’s launch. Those R7 CPUs don’t necessarily fit best with an RX 480; it’s a fine card, just not something you stick with a $400-$500 CPU. Two major launches in short order, then, one of which potentially drives system refreshes.
AMD must feel the weight borne by Atlas at this moment.
In this ~11,000 word review of AMD’s Ryzen R7 1800X, we’ll look at FPS benchmarking, Premiere & Blender workloads, thermals and voltage, and logistical challenges. (Update: 1700 review here).
AMD Ryzen R7 1800X vs. 1700, 1700X Specs
| | Ryzen R7 1700 | Ryzen R7 1700X | Ryzen R7 1800X | Ryzen R5 1600X | Ryzen R5 1500X |
|---|---|---|---|---|---|
| Base / Boost | 3.0GHz / 3.7GHz | 3.4GHz / 3.8GHz | 3.6GHz / 4.0GHz | 3.6GHz / 4.0GHz | 3.5GHz / 3.7GHz |
| Stock Cooler | Wraith Stealth (65W) | Wraith Spire (95W) | Wraith MAX (125W) | — | — |
| Memory Support | 2ch/2rank: 1866-2133 | 2ch/2rank: 1866-2133 | 2ch/2rank: 1866-2133 | 2ch/2rank: 1866-2133 | 2ch/2rank: 1866-2133 |
| Release Date | 3/2/17 | 3/2/17 | 3/2/17 | 2Q17 | 2Q17 |
Ryzen debuts strictly with the R7 series of CPUs, leaving subsequent R5 and R3 CPU launches to later dates. The R5 1600X and R5 1500X will both ship sub-$300, from what AMD tells us, and should begin availability sometime in 2Q17. The R3 budget CPUs won’t arrive until 2H17.
For today, we’re strictly focusing on the R7 1800X ($500). We had just under a week to benchmark the R7 1800X between touching down post-event and taking off for the GTX 1080 Ti announcement, but we do have the R7 1700 and R7 1700X available for testing. Our R7 1700 review, if all goes as planned, will publish tomorrow. We hope to follow this up immediately with the R7 1700X, barring any unforeseen issues during testing – and there have been many thus far. See pages 2 & 3 for that discussion.
At time of publication, the R7 1800X will be available for $500, the R7 1700X for $400, and the R7 1700 for $330. We did not receive stock coolers with any of our test units, but have since ordered some and should receive them shortly.
Interestingly, AMD explicitly indicated that Zen will offer “near perfect scalability” across multiple sockets and multiple dies. This statement was more targeted at server, where you might have a dual-socket motherboard, but we’ll go on record now as predicting an eventual dual-die package under the Zen architecture. Any such chip would almost certainly not make it to the gaming market, and we don’t cover server/enterprise.
Let’s talk architecture.
AMD Zen Architecture (“Ryzen” CPUs)
The R7 desktop CPUs are all built on 8C/16T designs, using two of AMD’s new CCX modules as a replacement for the Bulldozer architecture. The Zen CCX moves to a single FPU per core, away from the 1FPU/2INT unit design of Bulldozer. Each CCX is a 4C/8T module, resulting in two modules for the R7 CPUs (2x CCX = 8C/16T). Devices that operate on 4C/8T will use a single CCX on the die. This also means that we almost certainly will not see dual-core Zen architecture desktop CPUs based on the current design, unless AMD goes the route of disabling units on the package. This same architecture is shared across desktop/enthusiast, server (see: Naples), and the forthcoming laptop variants of Ryzen CPUs. As an aside, there are presently no portable Zen processors – but AMD does have plans to move in this direction.
Each CCX is connected to its own L3 cache, with L2 privatized on a per-core level. Each pairing of two threads will split resources between the L2 cache, a SIMD (Single Instruction, Multiple Data) unit, and the FPU, but we understand that there will be no cross-contamination of these resources between cores. AMD is running 512K of L2 cache per core, at 1.5mm^2 per core. This 512K cache is 8-way associative, so probes don’t need to interfere with the low-latency caches during snoops. Avoiding snoops on faster caches prevents long latencies that can damage performance. L3 cache totals 8MB, occupies 16mm^2 of the CCX’s 44mm^2 die area, is 16-way associative, and shares a PLL with all the cores. The L3 cache can be shut down if only L1 and L2 are in use, serving as a power saving feature during periods of zero L3 cache hits. Keep in mind that this is something that happens without user intervention or knowledge – there’s no switch you throw.
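As a sanity check on these numbers, the set count of a set-associative cache falls out of size / (ways × line size). Here is a quick sketch, assuming 64-byte cache lines (a typical x86 line size – AMD does not itemize it in this material):

```python
def cache_geometry(size_bytes, ways, line_bytes=64):
    """Return (sets, index_bits, offset_bits) for a set-associative cache.
    line_bytes=64 is an assumption, not an AMD-stated figure."""
    sets = size_bytes // (ways * line_bytes)
    # bit_length() - 1 gives log2 for exact powers of two
    return sets, sets.bit_length() - 1, line_bytes.bit_length() - 1

l2 = cache_geometry(512 * 1024, 8)        # 512K, 8-way (per core)
l3 = cache_geometry(8 * 1024 * 1024, 16)  # 8MB, 16-way (per CCX)
print(l2)  # (1024, 10, 6)
print(l3)  # (8192, 13, 6)
```

So the 8-way L2 resolves to 1,024 sets and the 16-way L3 to 8,192 sets under that line-size assumption.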
Each core is able to access 32KB L1, 512K L2 (per core), and 8MB L3 (per CCX) with the same latency, thanks to the CCX’s intentionally compartmentalized layout. Alongside other optimizations made, like a new prefetcher for L1 and L2 data, AMD advertises an approximate 2x increase in L1 and L2 bandwidth, and about a 5x increase in L3 bandwidth over previous generations.
Zen’s L3 victim cache clock will match the fastest core on the CCX. The L3 will follow this core’s trajectory as it downclocks (e.g. when load dies down), permitting some power savings without sacrificing the ability of the cache to keep up with faster cores. Here’s a closer look at the L3 cache:
Above: L3 Cache
A 2MB gated clock region is positioned in the bottom-left of the L3 block diagram. Aggressive clock gating is performed as a power savings measure, but frequency matching and sensors scattered across the chip should ensure minimum performance loss for these reductions. All tradeoffs for power are calculated, allowing more efficient utilization of the total power budget for the chip.
The static RAM cells are less dense than those of Intel’s more mature fabrication, but AMD still boasts improvements. AMD is using six-transistor SRAM for tag and data, with eight-transistor SRAM for state.
AMD’s L1 cache has also now been switched over to a writeback L1 Cache and off of the previous writethrough design. We have an interview with AMD Chief Architect Mike Clarke going live this week (check here), which will grant a cursory look at the implications of this change. For now, here’s an excerpt:
“On the writethrough cache, your writes would both go into the L1, and then they would be propagated again to go into the L2. With the writeback cache, the writes go into the L1 cache, and they don’t go into the L2 and the states maintain in the L1. They may transfer to the L2 once they’re evicted from the cache, but they’re not kept updated in both places […] not moving the data ‘til you absolutely have to.
“The shadow tags were a nice optimization. We have a victim cache for our L3, and so when a core misses in its L2, it might miss in the L3, but it might be in another L2 cache local in the core. Typically, we would just probe all those to find it. That causes some performance problems with bandwidth in the L2 and burns a lot of power; instead, we built the Shadow Tags within the L3 Macro, and that lets us quickly know which one of the cores the data is in and go get it. We also did it in a unique two-stage mechanism. With a partial lookup, we can know whether we’re going to hit or not, and only fire the second stage if we hit on the first stage. That lets us save about 75% of the power of an equivalent implementation where you’d probe everywhere. It’s pretty amazing.”
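Clarke’s writethrough-versus-writeback distinction can be sketched with a toy model that counts how much write traffic reaches the next cache level. This is illustrative only – a direct-mapped toy, not how Zen’s L1 is actually organized:

```python
class ToyCache:
    """Toy direct-mapped cache contrasting writethrough vs. writeback.
    Illustrative sketch only; not Zen's actual L1 organization."""
    def __init__(self, num_lines, writeback):
        self.lines = {}              # index -> (tag, dirty)
        self.num_lines = num_lines
        self.writeback = writeback
        self.next_level_writes = 0   # write traffic propagated to "L2"

    def write(self, addr):
        index, tag = addr % self.num_lines, addr // self.num_lines
        old = self.lines.get(index)
        if old and old[0] != tag and old[1]:
            self.next_level_writes += 1      # dirty eviction: write out now
        if self.writeback:
            self.lines[index] = (tag, True)  # mark dirty, defer the L2 write
        else:
            self.lines[index] = (tag, False)
            self.next_level_writes += 1      # writethrough: L2 written every time

wt = ToyCache(64, writeback=False)
wb = ToyCache(64, writeback=True)
for _ in range(1000):                # hammer one hot line
    wt.write(0)
    wb.write(0)
print(wt.next_level_writes)  # 1000 -- every store reaches the next level
print(wb.next_level_writes)  # 0 -- the line just stays dirty in "L1"
```

The writeback model only pays the next-level write when a dirty line is evicted – “not moving the data ‘til you absolutely have to,” as Clarke puts it.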
There’s also a new uOp-cache, which Clarke details as:
“One of the hardest problems in trying to build a high-frequency x86 processor is that the instructions are actually of variable length. That means to try to get a lot of them to dispatch in a wide form, it’s a serial process. To do that, generally we’ve had to build deep pipelines -- very power-hungry to do that.
“We call it an op-cache because it stores them in a more dense format than the past. Having seen them once, we store them in this op-cache with those boundaries removed, so when you find the first [instruction], you find all its neighbors with it. We can actually put them in that cache eight at a time, so we can pull eight out per cycle. We can cut two stages off that pipeline of trying to figure out the instructions. It gives us that double-whammy of a power savings and a huge performance uplift.”
AMD Zen Core
On a per-core level, the biggest item of note is that AMD’s move to integrate SMT (simultaneous multithreading) results in two threads per core. This can be thought of as similar in concept to Intel’s Hyperthreading, though the two are executed in different ways. Developer implementation of each approach to SMT will require fine-tuning as Ryzen ships and matures.
The uOp queue is fed 4 instructions/cycle from decode, then sent down the pipe to AMD’s segmented integer and floating point units. Within Integer land, the rename space is fitted with 168 registers, and can handle 192 in-flight instructions. We also see four ALUs and two AGUs per INT block within the core, eventually feeding down to 2 L/S units capable of 2 loads and a store per cycle. Zen’s 2x AGUs (versus the 3x AGU layout of competing Intel arch) further aids in power savings, though with some specific workload limitations that would not be present on Skylake. That said, the decision should help AMD’s clocks, and the company clearly believes the trade-off to be worthwhile. David Kanter has an excellent piece further discussing Zen architecture and its decisions as contrasted to Skylake/Haswell, and can be read in the Microprocessor Report from August, 2016.
This design, coupled with AMD’s decision to defer to 2x 128-bit wide AVX units (versus Skylake’s 256-bit wide option), means that less die space is consumed for the core. If you were wondering about AMD’s 44mm^2 vs. 49mm^2 metrics, these decisions are part of that.
Loads are now faster to the FPU, taking 7 cycles instead of 9 on previous architectures. The floating point block of the core is outfitted with 2x MUL and 2x ADD units, with Zen now capable of performing fused multiply-adds. Fused multiply-adds reduce rounding to a single step, rather than separate roundings for the multiply and then the add (e.g. x*y+z is only rounded once with FMA). This isn’t new, but it’s new to Zen. Alongside other instruction types, advanced vector support includes AVX and AVX2 on AMD’s Zen architecture, new in the time since Bulldozer. Blender, the 3D animation and rendering software, supports AVX2 in its Cycles render engine, for a real-world use case. Prime95 also uses AVX.
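To see what single rounding buys, here is a sketch that emulates an FMA with exact rational arithmetic and compares it against the two-rounding naive sequence (we emulate rather than call a library FMA, since Python’s `math.fma` only arrived in 3.13):

```python
from fractions import Fraction

def fma_emulated(x, y, z):
    """Compute x*y + z with a single rounding, emulating a hardware FMA:
    exact rational arithmetic, then one round back to a double."""
    return float(Fraction(x) * Fraction(y) + Fraction(z))

x = 1.0 + 2**-30
naive = x * x - 1.0               # x*x rounds first, then the subtract
fused = fma_emulated(x, x, -1.0)  # one rounding keeps the low-order term

# Exact value of x*x - 1 is 2**-29 + 2**-60; the 2**-60 term is below
# half an ulp of x*x, so the naive path loses it in the first rounding.
print(naive)  # 2**-29
print(fused)  # 2**-29 + 2**-60
```

The fused result retains the 2^-60 term that the separately-rounded sequence discards, which is exactly the precision benefit the FMA units provide.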
AMD’s opting out of FIVR, the fully-integrated voltage regulator that Intel has used in recent architectures. Instead, AMD is opting for a more traditional linear voltage regulator and LDO (low-dropout regulator) approach, and is localizing voltage regulators across the core. Continuing its newfound obsession with sensors (for the record, we think this is a good thing), AMD has 9 vDroop detectors alongside 48 power supply monitors that help ensure stable and fine-tuned voltage regulation. When asked about the benefit of LDOs, we were told that an LDO allows faster cores to operate at lower voltage, thus saving power (and heat).
Adaptive voltage and frequency scaling (AVFS) also makes an appearance here, with Zen supporting 25MHz granularity of the clock as it attempts to reach boost states. This sees play in several ways and is comparable in concept to recent Polaris and Pascal posts we’ve made regarding boost features. Although power and heat may seem boring from an enthusiast user’s standpoint, the two are closely related to a chip’s overall performance. It is not possible to clock high if heat is a concern, and heat is usually a concern because power draw is a concern. The two are interwoven.
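That 25MHz granularity means boost targets effectively snap to 25MHz steps within the base/boost window. A minimal sketch – the function name and clamping behavior are our own illustration, using the 1800X’s base/boost clocks:

```python
STEP_MHZ = 25  # Zen's stated AVFS clock granularity

def nearest_pstate(target_mhz, base_mhz=3600, boost_mhz=4000):
    """Snap a requested frequency to the nearest 25MHz step, clamped to
    the base/boost window (defaults match the R7 1800X's 3.6/4.0GHz).
    Illustrative sketch, not AMD's actual boost algorithm."""
    clamped = max(base_mhz, min(boost_mhz, target_mhz))
    return round(clamped / STEP_MHZ) * STEP_MHZ

print(nearest_pstate(3912))  # 3900 -- snapped down to the nearest step
print(nearest_pstate(4100))  # 4000 -- clamped to boost
print(nearest_pstate(3000))  # 3600 -- clamped to base
```

The practical upshot: the chip can trade frequency in 25MHz slivers rather than large jumps, which is what keeps clock (and thus performance) transitions smooth.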
Boosting technologies allow for rapid regulation of frequency versus load, voltage, and temperature. If the CPU runs along the maximum Junction temperature (TjMax = 75C), the CPU – just like a GPU – will downclock itself rapidly and in (hopefully) small increments. This will prevent a thermal runaway scenario wherein the chip and cooler can no longer hold temperature, eventually resulting in a thermal trip. This might be a thermal shutdown or severe downclocking, depending on EFI configuration. By downclocking, temperatures can be checked on an as-needed basis, but performance loss can be profound under extreme conditions.
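The incremental throttling described above can be modeled as a simple control loop: shed a small clock step, let temperature respond, repeat until back under TjMax. The thermal response value here is invented purely for illustration:

```python
TJ_MAX_C = 75   # junction temperature limit cited above
STEP_MHZ = 25   # small downclock increments

def throttle(clock_mhz, temp_c, cool_per_step=0.5):
    """Toy throttle loop: shed 25MHz at a time until back under TjMax.
    cool_per_step (degrees shed per step) is a made-up thermal response."""
    steps = 0
    while temp_c > TJ_MAX_C and clock_mhz > 0:
        clock_mhz -= STEP_MHZ    # small increment, not one large drop
        temp_c -= cool_per_step  # toy model: temperature follows immediately
        steps += 1
    return clock_mhz, steps

clock, steps = throttle(4000, 78.0)
print(clock, steps)  # 3850 6 -- six 25MHz steps to get from 78C under 75C
```

In this toy run, a 3C excursion costs only 150MHz; the point of small increments is that the clock (and framerate) degrades gradually instead of cliff-diving.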
This comes with an obvious performance detriment: The less stable the clock, the more hits to framerate (particularly frametimes, as seen in our GPU thermal testing). The smaller increments implemented in AMD’s new architecture mean that clock stability should remain relatively flat in an over-time chart.
If this interests you, we have a thermal, voltage, and power testing section of this review with more information.
AMD AM4 Chipset Comparison
| | X370 | B350 | A320 | X300 | A300 |
|---|---|---|---|---|---|
| USB 3.1 Gen1 | 6 | 2 | 2 | — | — |
| USB 3.1 Gen2 | 2 | 2 | 1 | — | — |
| SATA RAID | 0, 1, 10 | 0, 1, 10 | 0, 1, 10 | 0, 1 | 0, 1 |

Each SATA Express port can be broken out as 2x SATA III or 2x PCIe 3.0; dual-GPU slots run x8/x8 or multiplexed; overclocking requires better cooling.
Here’s a block diagram that GN previously made to help explain Ryzen / Zen architecture:
To learn more about the AM4 chipset differences in depth, check our previous content piece: AM4 Chipset Comparison – X370, B350, A320, and A/B/X300.
Before moving into methodology and testing, we’ll address some of the internet’s pre-launch leak concerns regarding memory support and logistical challenges for Ryzen.