AMD’s High-Bandwidth Cache Controller protocol is one of the keystones of the Vega architecture, cited by RTG lead Raja Koduri as a personal favorite feature of Vega, and highlighted in previous marketing materials as offering a potential 50% uplift in average FPS in VRAM-constrained scenarios. With a few driver revisions now behind us, we’re revisiting our Vega 56 hybrid card to benchmark HBCC in A/B fashion, testing in memory-constrained scenarios to determine efficacy in real gaming workloads.
Variations of “HBM2 is expensive” have floated around the web since well before Vega’s launch – since Fiji, really, with the first wave of HBM – without many concrete numbers behind that expression. AMD isn’t just using HBM2 because it’s “shiny” and sounds good in marketing, but because the Vega architecture is bandwidth-starved to a point of HBM being necessary. That’s an expensive necessity, unfortunately, and chews away at margins, but AMD really had no choice in the matter. The company’s standalone MSRP structure for Vega 56 positions it competitively with the GTX 1070, carrying comparable performance, memory capacity, and target retail price, assuming things calm down for the entire GPU market at some point. Given HBM2’s higher cost and Vega 56’s bigger die, that leaves little room for AMD to profit when compared to GDDR5 solutions. That’s what we’re exploring today, alongside why AMD had to use HBM2.
There are reasons that AMD went with HBM2, of course – we’ll talk about those later in the content. A lot of folks have asked why AMD can’t “just” use GDDR5 with Vega instead of HBM2, thinking that you just swap modules, but there are complications that make this impossible without a redesign of the memory controller. Vega is also bandwidth-starved to a point of complication, which we’ll walk through momentarily.
Let’s start with prices, then talk architectural requirements.
This week's hardware news recap covers rumors of Corsair's partial acquisition, HBM2 production ramping, Threadripper preparation, and a few other miscellaneous topics. Core industry topics largely revolve around cooler prep for Threadripper this week, though HBM2 increasing production output (via Samsung) is also a critical item of note. Both nVidia and AMD now deploy HBM2 in their products, and other devices are beginning to eye use cases for HBM2 more heavily.
The video is embedded below. As usual, the show notes rest below that.
This episode of Ask GN (#28) addresses the concept of HBM in non-GPU applications, primarily concerning its imminent deployment on CPUs. We also explore GPU Boost 3.0 and its variance within testing when working on the new GTX 1080 cards. The question of Boost's functionality arose as a response to our EVGA GTX 1080 FTW Hybrid vs. MSI Sea Hawk 1080 coverage, asking why one 1080 was clock-dropping differently from another. We talk about that in this episode.
Discussion begins with proof that the Cullinan finally exists and has been sent to us – because it was impossible to find, after Computex – and carries into Knights Landing (Intel) coverage for MCDRAM, or “CPU HBM.” Testing methods are slotted in between, for an explanation on why some hardware choices are made when building a test environment.
In additional hardware news to what we published yesterday -- a look at Intel's Kaby Lake (7600K, 7700K, etc.), the X2 Empire unique enclosure, and Logitech's G Pro mouse -- we are today visiting topics of Samsung's GDDR6, SK Hynix's HBM3 R&D, PCIe Gen4 power budget, and Zen's CCX architecture.
The biggest news here is Samsung's GDDR6, due for 2018, but it's all important stuff. PCI-e Gen4 is expected to be fully ratified by EOY 2016, HBM3 is in R&D, and Zen is imminent and finalized architecturally. We'll talk about it more specifically in our reviews.
Update: Tom's misreported on PCI-e power draw. The Gen4 PCIe interface will still be 75W.
Anyway, here's the news recap:
Memory manufacturer Samsung is developing GDDR6 as a successor to Micron's brand new GDDR5X, presently only found in the GTX 1080 and Titan XP cards. GDDR6 may feel like a more meaningful successor to GDDR5, though, which has been in production use since 2008.
In its present, fully matured form, GDDR5 operates at a maximum of 8Gbps, as found on the RX 480 and GTX 10-series GPUs. Micron has demonstrated GDDR5X as capable of approaching 12-13Gbps given proper time to mature the architecture, but is presently shipping the memory at 10Gbps for nVidia's devices.
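For context on what those per-pin rates mean in practice, total memory bandwidth is the per-pin rate multiplied by the bus width. A quick sketch, using known bus widths for the cards named above (the article itself doesn't list them):

```python
# Total bandwidth = per-pin rate (Gbps) * bus width (bits) / 8 bits-per-byte.
def memory_bandwidth_gbs(per_pin_gbps: float, bus_width_bits: int) -> float:
    """Return total memory bandwidth in GB/s."""
    return per_pin_gbps * bus_width_bits / 8

# RX 480: 8Gbps GDDR5 on a 256-bit bus
rx480 = memory_bandwidth_gbs(8, 256)     # 256.0 GB/s

# GTX 1080: 10Gbps GDDR5X on a 256-bit bus
gtx1080 = memory_bandwidth_gbs(10, 256)  # 320.0 GB/s
```

The same math shows why higher per-pin rates matter: GDDR5X's 10Gbps buys the GTX 1080 a 25% bandwidth uplift over an equivalent 8Gbps GDDR5 configuration without widening the bus.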
Samsung indicates an operating range of approximately 14Gbps to 16Gbps for GDDR6 at 1.35V, leveraging LP4X technology to achieve voltages lower than even GDDR5X. Samsung indicates a power reduction upwards of 20% with post-LP4 memory technology.
Samsung is looking toward 2018 for production of GDDR6, giving GDDR5X some breathing room yet. As for HBM, SK Hynix is already looking toward HBM3, with HBM2 only presently available in the GP100 Accelerator cards. HBM3 will theoretically run a 4096-bit interface with upwards of 2TB/s throughput, at 512GB/s per stack. We'll talk about this tech more in the semi-distant future.
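The HBM3 figures above are internally consistent, and worth sanity-checking: 512GB/s per stack across four stacks yields the ~2TB/s aggregate, and dividing by the 4096-bit interface recovers the implied per-pin rate (the per-pin figure is our arithmetic, not SK Hynix's stated spec):

```python
# Aggregate HBM3 throughput from the per-stack figure.
PER_STACK_GBS = 512      # GB/s per stack, per SK Hynix's targets
STACKS = 4               # four stacks on a 4096-bit interface
INTERFACE_BITS = 4096

total_gbs = PER_STACK_GBS * STACKS              # 2048 GB/s, i.e. ~2TB/s
per_pin_gbps = total_gbs * 8 / INTERFACE_BITS   # 4.0 Gbps per pin (implied)
```

The implied ~4Gbps per pin is the interesting part: HBM hits terabyte-class bandwidth with slow, wide interfaces, where GDDR6 chases 14-16Gbps across a far narrower bus.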
Tom's Hardware this week reported on the new PCI Express 4.0 specification, primarily detailing a push toward a minimum spec of 300W power transfer through the slot, possibly upwards of 500W. Without even talking about the bandwidth promises – moving to nearly 2GB/s for a single lane – the increased power budget could let the industry begin shifting away from PCI-e power cables. The power would obviously still come from the power supply, but would be delivered through pins in the PCI-e slots rather than through an extra cable.
This same setup is what allows cards like a 750 Ti to function solely off the PCI-e slot, because the existing spec allows for 75W to push through the PCIe bus. PCI-e 4.0 should be ratified by PCI-SIG by the end of 2016, but we don't yet know the roll-out plans for consumer platforms.
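For the curious, the 75W slot budget isn't a single rail: per the PCI-e Card Electromechanical (CEM) spec for an x16 graphics slot, it breaks down across the 12V and 3.3V rails, with the combined total capped at 75W:

```python
# PCI-e CEM x16 graphics slot power limits (per-rail current caps).
slot_12v_w = 5.5 * 12.0   # 12V rail: up to 5.5A -> 66.0W
slot_3v3_w = 3.0 * 3.3    # 3.3V rail: up to 3A  -> 9.9W

# Rails sum to 75.9W, but the spec caps the combined draw at 75W.
slot_total_w = min(slot_12v_w + slot_3v3_w, 75.0)
```

That 66W on the 12V rail is why a ~60W card like the 750 Ti fits without a supplementary cable, and why anything meaningfully above that needs 6-pin or 8-pin connectors under the current spec.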
AMD also detailed more of its Zen CPU architecture, something we talked about last week when the company camped out near IDF for an unveil event. The Summit Ridge chips have primarily been on display thus far, showing an 8C/16T demo with AMD's implementation of SMT, but we haven't heard much about other processors.
AMD is ditching modules in favor of CPU Complexes, or a CCX, each of which will host four CPU cores. Each CCX runs 512KB of L2 Cache per core, as seen in this block diagram, with L3 sliced into four pieces for 8MB total low-order address interleave cache. AMD says that each core can communicate with all cache on the CCX, and promises the same latency for all accesses.
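The cache totals implied by that CCX layout are straightforward to tally (the two-CCX configuration for the 8C/16T Summit Ridge demo is our extrapolation from AMD's materials, not an explicit statement here):

```python
# Cache arithmetic for AMD's CCX layout as described in the block diagram.
CORES_PER_CCX = 4
L2_PER_CORE_KB = 512
L3_PER_CCX_MB = 8
L3_SLICES = 4

l2_total_mb = CORES_PER_CCX * L2_PER_CORE_KB / 1024  # 2MB of L2 per CCX
l3_slice_mb = L3_PER_CCX_MB / L3_SLICES              # 2MB per L3 slice

# An 8C/16T Summit Ridge part would pair two CCXes:
chip_l3_mb = 2 * L3_PER_CCX_MB                       # 16MB L3 total
```

That per-slice symmetry is what underpins AMD's uniform-latency claim: every core sees the same four 2MB slices over the CCX's interconnect.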
It looks like the lowest SKU chips will still be quad-cores at a minimum.
Host: Steve "Lelldorianx" Burke
Video: Andrew "ColossalCake" Coleman
One of the newest memory technologies on the market is HBM (High Bandwidth Memory), introduced on the R9 Fury X. HBM stacks 4 memory dies atop an interposer (packaged on the substrate) to get higher density modules, while also bringing down power consumption and reducing physical transaction distance. HBM is not located on the GPU die itself, but is on the GPU package – much closer than PCB-bound GDDR5/5X memory modules.
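First-gen HBM's "slow but wide" trade-off is easy to see in the numbers. A sketch using the Fury X's known shipping configuration (these figures are public specs, not stated in the paragraph above):

```python
# First-gen HBM as shipped on the R9 Fury X: four stacks, each on a
# 1024-bit interface at 1Gbps per pin.
STACKS = 4
BITS_PER_STACK = 1024
PER_PIN_GBPS = 1.0

per_stack_gbs = BITS_PER_STACK * PER_PIN_GBPS / 8  # 128 GB/s per stack
total_gbs = per_stack_gbs * STACKS                 # 512 GB/s aggregate
```

A mere 1Gbps per pin – an eighth of mature GDDR5 – still delivers 512GB/s in aggregate, because the on-package placement makes a 4096-bit combined interface practical where a PCB-routed bus of that width would not be.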
AMD's GPU architecture roadmap from its Capsaicin event revealed the new “Vega” and “Navi” architectures, which have effectively moved the company to a stellar naming system – a reasonable move away from heat-themed names, at least, Volcanic Islands, Hawaii, and Capsaicin included.