AMD’s High-Bandwidth Cache Controller (HBCC) is one of the keystones of the Vega architecture, marked by RTG lead Raja Koduri as a personal favorite feature of Vega, and highlighted in previous marketing materials as offering a potential 50% uplift in average FPS in VRAM-constrained scenarios. With a few driver revisions now behind us, we’re revisiting our Vega 56 hybrid card to benchmark HBCC in A/B fashion, testing in memory-constrained scenarios to determine efficacy in real gaming workloads.
What is HBCC?
HBCC is the controller for AMD’s high-bandwidth cache, which is functionally the company’s new name for VRAM. There is no hard threshold governing the “high-bandwidth cache” designation, and should AMD produce a hypothetical GDDR5 Vega GPU, its framebuffer would also be named “high-bandwidth cache.” In other words, the card does not need HBM to have its framebuffer designated as HBC.
AMD’s High-Bandwidth Cache Controller is disabled by default. When enabled, the controller effectively converts VRAM into a last-level cache equivalent, then reserves a user-designated amount of system memory for allocation to the GPU. If an application pages out of the on-card 8GB of HBM2, a trade-off between latency and capacity occurs, and the GPU taps into system memory to grab the pages it needs. If you’re storing 4K textures in HBM2 and exceed that 8GB capacity, and maybe need another 1GB for other assets, those items can be pushed to system memory and pulled via the PCIe bus. This is less effective than increasing on-card memory, but it is significantly cheaper, even in spite of DDR pricing. Latency is introduced by traveling across the PCIe interface, through the CPU, and down the memory bus, then back, but it’s still faster than having to dump memory and swap data locally.
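To make the "VRAM as a last-level cache" idea concrete, here is a minimal toy model in Python. It is only a sketch under stated assumptions: the class name `ToyHighBandwidthCache`, the page granularity, and the least-recently-used eviction policy are all hypothetical choices for illustration, and AMD's actual replacement policy and page sizes are not described here. The model simply counts how many accesses are served from on-card memory and how many have to be fetched across PCIe from system memory.

```python
from collections import OrderedDict


class ToyHighBandwidthCache:
    """Toy model: on-card memory holds a limited number of pages and is
    backed by system memory. LRU eviction is an assumption, not AMD's
    documented behavior."""

    def __init__(self, vram_pages):
        self.vram_pages = vram_pages
        self.resident = OrderedDict()  # page id -> True, oldest first
        self.hits = 0                  # accesses served from on-card memory
        self.pcie_fetches = 0          # accesses that had to cross PCIe

    def access(self, page):
        if page in self.resident:
            # Page is already on the card: mark it most recently used.
            self.resident.move_to_end(page)
            self.hits += 1
            return "vram"
        # Page is not resident: fetch it from system memory over PCIe.
        self.pcie_fetches += 1
        if len(self.resident) >= self.vram_pages:
            # On-card memory is full: evict the least recently used page.
            self.resident.popitem(last=False)
        self.resident[page] = True
        return "pcie"


# Hypothetical access pattern: a small hot set (pages 0 and 1) plus
# occasional pages that do not fit in the 4-page on-card memory.
cache = ToyHighBandwidthCache(vram_pages=4)
for page in [0, 1, 2, 3, 0, 1, 4, 0, 1, 5]:
    cache.access(page)

print("served from VRAM:", cache.hits)
print("fetched over PCIe:", cache.pcie_fetches)
```

The point of the sketch is the trade-off: frequently reused pages stay on the card, while the overflow pays the PCIe round trip only when it is actually touched.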
Hypothetically, this technology would permit large, working datasets to exist in system memory and give the GPU’s memory controller somewhat direct access to that dataset. Although it’s technically worse for latency, the additional capacity should outweigh that latency deficit in certain scenarios. It’s just going to take proper implementation to ever realize the gains, and that implementation must happen on the part of both software developers and AMD.
In driver version 17.10.1 (tested here), AMD’s HBCC segment remains disabled by default. The option can be toggled through Radeon Settings, whereupon the user can designate a total capacity for system memory allocation to Vega GPUs. We’re allocating a total of 18GB of memory (8GB VRAM + 10GB system RAM) to our Vega 56 Hybrid card (with V56 original BIOS). For Shadow of War, we drop that allocation to 12GB, just to ensure no issues with the 4K texture pack and its system memory capacity sensitivities. We are testing pre-FCU, meaning before the Windows 10 Fall Creators Update.
For memory, we’re running 3200MHz CL16 Corsair Vengeance LPX DRAM with a 32GB total capacity.
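As a quick sanity check on those allocations, the arithmetic can be sketched as below. This assumes the HBCC segment size is always on-card VRAM plus the reserved system RAM, as the 18GB figure (8GB + 10GB) implies; applying the same split to the 12GB Shadow of War setting is an inference, and the helper name `system_ram_reserved` is hypothetical.

```python
VRAM_GB = 8          # on-card HBM2 on the Vega 56
SYSTEM_RAM_GB = 32   # total DRAM on the test bench


def system_ram_reserved(hbcc_segment_gb, vram_gb=VRAM_GB):
    """System RAM set aside for the GPU, assuming segment = VRAM + reserved RAM."""
    if hbcc_segment_gb < vram_gb:
        raise ValueError("HBCC segment cannot be smaller than on-card VRAM")
    return hbcc_segment_gb - vram_gb


for label, segment_gb in (("general testing", 18), ("Shadow of War", 12)):
    reserved = system_ram_reserved(segment_gb)
    remaining = SYSTEM_RAM_GB - reserved
    print(f"{label}: {segment_gb}GB segment = {VRAM_GB}GB VRAM + "
          f"{reserved}GB system RAM, leaving {remaining}GB for the OS and game")
```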
|GN Test Bench 2017||Name||Courtesy Of||Cost|
|Video Card||This is what we're testing||-||-|
|CPU||Intel i7-7700K 4.5GHz locked||GamersNexus||$330|
|Memory||Corsair Vengeance LPX 3200MHz||Corsair||-|
|Motherboard||Gigabyte Aorus Gaming 7 Z270X||Gigabyte||$240|
|Power Supply||NZXT 1200W HALE90 V2||NZXT||$300|
|Case||Top Deck Tech Station||GamersNexus||$250|
|CPU Cooler||Asetek 570LC||Asetek||-|
BIOS settings include C-states completely disabled with the CPU locked to 4.5GHz at 1.32 vCore. Memory is at XMP1.
The tests here primarily investigate practical applications of HBCC in gaming workloads, including synthetic / gaming-adjacent workloads. The goal is to determine when (or if) video games can make use of AMD’s HBCC.
FireStrike Benchmark – HBCC with Vega 56
To really drill down into differences and build our accuracy and confidence, we scripted FireStrike to execute 20 test passes each with HBCC on and HBCC off, using another in-house script to export all the data. There is some variance, but this many passes will average it out. We measured a standard deviation of 27.48 points with HBCC Off and 35.8 points with HBCC On, and that’s against a score range nearing 20,000 points.
As for scoring, the HBCC Off test scored 19418.13 points, where HBCC On with 18GB allocation scored 19683.2 points. This marks HBCC enablement as providing a 1.37% performance improvement over HBCC Off. That manifests as a boost in FPS, which we’ve charted, from 93.8 to 95.2 for GT1 and from 76.8 to 77.8 for GT2. Again, this is over 20 test passes for each test, so we can be certain that, within the parameters of our test system, this is a repeatable difference.
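For readers who want to check the FireStrike numbers, the short Python sketch below reproduces the percentage uplift and compares the gap between the two averages to the run-to-run noise. It assumes the reported standard deviations are per-pass figures and that the 20 passes are independent; the Welch-style statistic is an illustrative addition here, not the method used to produce the charts, and the function names are hypothetical.

```python
import math


def uplift_percent(baseline, treatment):
    """Percentage improvement of treatment over baseline."""
    return (treatment / baseline - 1.0) * 100.0


def welch_statistic(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Difference of the two means, expressed in standard errors."""
    standard_error = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    return (mean_b - mean_a) / standard_error


# FireStrike figures reported above (20 passes per configuration).
off_mean, off_sd = 19418.13, 27.48
on_mean, on_sd = 19683.2, 35.8
passes = 20

print(f"uplift: {uplift_percent(off_mean, on_mean):.2f}%")
print(f"HBCC Off spread: {off_sd / off_mean * 100:.2f}% of score")
print(f"HBCC On spread:  {on_sd / on_mean * 100:.2f}% of score")
print(f"gap in standard errors: "
      f"{welch_statistic(off_mean, off_sd, passes, on_mean, on_sd, passes):.1f}")
```

Under those assumptions, the gap works out to roughly 26 standard errors, which is why a difference of barely more than 1% can still be called repeatable: the per-pass spread is only a fraction of a percent of the score.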
TimeSpy Benchmark – HBCC On vs. Off
TimeSpy underwent the same treatment, using the same script for its repeated execution. We found much closer scores for this one, with the average graphics score at 6062.3 for HBCC disabled and 6094.37 for HBCC enabled. That grants HBCC On a repeatable advantage of 0.53%, which translates into 41.7FPS versus 41.9FPS for GT1 and 33.2FPS versus 33.39FPS for GT2, both favoring HBCC enabled.
Superposition Benchmark – HBCC On vs. Off
Superposition is our next synthetic test, offering a look at performance scaling through the eyes of Unigine, rather than Futuremark. Superposition wasn’t repeated as many times, but we do have five test passes for each configuration, with minor standard deviations of 2.9 points for HBCC on and 1.4 points for HBCC off. That’s against >3000 points total, marking this application as impressively consistent in scoring.
HBCC Off results in a score of 3484 points in Superposition, as opposed to 3624 with HBCC enabled. The performance difference amounts to a 4% improvement, which is the biggest we’ve measured thus far. Of course, we’re so bound by other elements of the card, like the core and streaming processors, that a 4% improvement is only represented by a 1FPS increase.
As for frequency when using stock settings, enabling HBCC plotted a stable frequency of about 1310-1325MHz for the first half of the test, while disabling HBCC plotted a more variable (but higher) frequency of 1315-1352MHz. This difference occurred frequently, and seemed contingent on the HBCC toggle. Although we could control for frequency to some extent, we also wanted to study the toggle’s impact on stock performance – this is part of that. Despite the slight frequency deficit on the HBCC enabled test, the card manages to improve performance a few percent.
Superposition tests were conducted at 8K as well, but showed nearly zero difference.