AMD Vega Liquid Cooling Overclocking
The Vega FE Hybrid mod posted reasonable overclocking gains over its air-cooled counterpart, something we originally attributed to more aggressive clock scaling at lower temperatures, similar to what’s seen on Pascal. We later learned it had more to do with power leakage and power limitations on the GPU, as we’ll dig into momentarily.
We were able to max out our stock card overclock at around 1660MHz with an 1100MHz HBM2 OC; the HBM2 overclock is what we ultimately found to govern performance gains most heavily.
Here’s a look at the overclock stepping for the stock card:
|Vega FE Stock OC|
|Peak Clock (MHz)||AVG Clock (MHz)||Target CLK||Core Offset (MHz)||MEM CLK (MHz)||MEM Offset (MHz)||Power Target %||Fan||GPU TMP (C)||Pass/Fail|
|1680||-||-||80||945||-||150%||42%||79||F - Hang|
|1670||1600~1670||1670||70||945||-||150%||50%||79||F - Crash|
And here’s the overclock stepping with our Vega FE Hybrid mod:
|GamersNexus Vega FE Hybrid Mod|
|Peak Clock (MHz)||AVG Clock (MHz)||Target CLK||Core Offset (MHz)||MEM CLK (MHz)||MEM Offset (MHz)||Power Target %||Radiator Fan||GPU TMP (C)||Pass/Fail||Current (A)||FET Upper (C)||FET Lower||Ambient||Logged|
|1715||1715||1715||115||1050MHz||105||150%||100%||49||F - System Crash||31~33.3||63.6||54.1||24||10:27|
|1710||1710||1710||110||1050MHz||105||150%||100%||49||F - System Crash|
|Crash / Restart|
|1700||1700||1700||100||1125MHz||180||150%||100%||49||F - System Crash||31~33.3||65.3||55.4||24.1||10:46|
The Hybrid card, with the help of some fans pointed at the VRM components, was able to overclock to 1705MHz completely stable, with an 1105MHz HBM2 OC. We ended up running all our tests at 1700/1100 for now, but will be revisiting with slightly higher clocks later. The VRM fans proved unnecessary after more testing, but we’ll get to that momentarily.
Pushing to 1710MHz resulted in a near-instant crash, and measuring at the PCIe cables shows that this is when power throttling begins to occur with greater frequency (causing the instability and subsequent crash). We’re hoping to attempt some BIOS mods (no promises) to increase TDP. It just depends on what tools are made available to us, or what we can figure out through external tools.
AMD Vega FE Hybrid Power Consumption at the Rails
Speaking of power consumption, the OC stepping table above shows some of that: our overclocked Vega FE Hybrid mod drew around 33.3A at the rails, not counting PCIe slot power, for a GPU power consumption of roughly 400W. That’s GPU power, not total system power, which is new for our tests. This is the limit for Vega, and is the reason we can’t OC higher.
We can’t yet measure at the PCIe interface, but we’re working on it. For now, we know that 33.3A * 12V = ~400W through the cables, plus whatever’s going through PCIe (up to 75W, by spec).
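The arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function name `rail_power_watts` and the constants are our own labels, the 12V figure is the nominal rail voltage rather than a measured one, and the 75W slot figure is the spec ceiling, not a reading.

```python
NOMINAL_RAIL_VOLTS = 12.0    # assumption: nominal 12V rail, not a measured voltage
PCIE_SLOT_MAX_WATTS = 75.0   # spec ceiling for slot power; actual slot draw was not measured


def rail_power_watts(current_amps, volts=NOMINAL_RAIL_VOLTS):
    """Convert a current clamp reading in amps to watts (P = I * V)."""
    return current_amps * volts


cable_watts = rail_power_watts(33.3)               # peak reading from the OC stepping table
upper_bound = cable_watts + PCIE_SLOT_MAX_WATTS    # only if the slot were fully loaded

print(f"Through the PCIe cables: ~{cable_watts:.0f}W")
print(f"Ceiling including slot power: ~{upper_bound:.0f}W")
```

The second number is an upper bound only. Until the slot is instrumented, the true board power sits somewhere between the two figures.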
The VRM can definitely handle more current; it’s just a matter of getting Vega: FE to accept it.
Either way, the most notable element here is what’s not shown in temperature plots: noise output.
AMD Vega Air vs. Liquid Mod Noise Levels
The Hybrid mod outputs a constant noise level of 43dBA with the radiator fan only, or 50-53dBA with the maglev fan setup (+1 to +2 Corsair ML120 fans). Meanwhile, in order to sustain its lower overclock, the air-cooled card was whining at about 67dBA output. The perceived noise difference is significant, at more than a 2x perceived reduction with the Hybrid mod.
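For a rough sense of where the "more than 2x" figure comes from, the sketch below applies the common rule of thumb that every 10dB increase is heard as about a doubling of loudness. That rule is an approximation we are assuming here, not something measured in this test, and the function name `perceived_loudness_ratio` is ours.

```python
def perceived_loudness_ratio(louder_dba, quieter_dba):
    """Rule of thumb (assumed): each +10 dB is heard as roughly twice as loud."""
    return 2 ** ((louder_dba - quieter_dba) / 10)


air_dba = 67.0            # air-cooled card sustaining its overclock
hybrid_dba = 43.0         # Hybrid mod, radiator fan only
hybrid_ml120_dba = 53.0   # Hybrid mod, loud end of the ML120 range

ratio_quiet = perceived_loudness_ratio(air_dba, hybrid_dba)
ratio_ml120 = perceived_loudness_ratio(air_dba, hybrid_ml120_dba)

print(f"Air vs. Hybrid (radiator fan only): ~{ratio_quiet:.1f}x")
print(f"Air vs. Hybrid (with ML120s at 53dBA): ~{ratio_ml120:.1f}x")
```

Under that assumption, the air-cooled card comes out at roughly 5x the perceived loudness of the radiator-only Hybrid, and still above 2x with the extra fans at their loudest.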
The ML120 additions proved nice to have, but ultimately were unnecessary. We could keep the FETs down without them.
Power Leakage Testing: AMD Vega FE Air vs. Liquid
This next test is new. We’re looking at power leakage here, using a current clamp to measure power consumption at the 12V rails from the PSU (+/-2% error). We’re not measuring PCIe power yet, but maybe in the future.
This test requires the card to operate at a fixed, reliable speed without the fluctuations observed in the stock air-cooled card, which can never really attain 1600MHz fixed anyway. As such, we’re running the core clock at 1300MHz – since that can be sustained without change – and a memory clock of 1000MHz, with the power offset at +50%.
The result is above. With the air-cooled card, we started encountering power throttling that had us bouncing between 18A and 25A once the GPU hit 85C, and so we were forced to increase the fan RPM manually from its auto configuration. This fluctuation is indicative of small throttling occurring at a level that doesn’t always show up in clock measuring software. There is performance loss from the high thermals.
With the fan speed raised to a 51dBA noise level and the core sustaining about 80-83C, we see that current draw is around 26A from the PSU, whereas the Hybrid card, with temperatures around 43C, sits closer to the 23.4A mark. The Hybrid’s draw is also largely unchanging from the start of the test to the end, while the air-cooled card scales from about 22A to 26A over the period of the test. After the relatively short burn-in period, the air-cooled card is drawing an additional 2.6A over the Hybrid mod, which at 12V works out to about 31W of power leakage due to heat.
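As a worked version of that leakage figure, here is a minimal sketch using the two steady-state readings and the quoted +/-2% clamp error. The variable names are ours, 12V is the nominal rail voltage, and the bounds treat the two readings as erring independently in opposite directions. That probably overstates the spread, since the same clamp took both measurements.

```python
RAIL_VOLTS = 12.0     # assumption: nominal rail voltage
CLAMP_ERROR = 0.02    # +/-2% clamp error quoted for this test

air_amps = 26.0       # air-cooled card, 80-83C core, after burn-in
hybrid_amps = 23.4    # Hybrid mod, ~43C core

delta_amps = air_amps - hybrid_amps
leakage_watts = delta_amps * RAIL_VOLTS

# Worst-case bounds if the two readings err in opposite directions.
low_watts = (air_amps * (1 - CLAMP_ERROR) - hybrid_amps * (1 + CLAMP_ERROR)) * RAIL_VOLTS
high_watts = (air_amps * (1 + CLAMP_ERROR) - hybrid_amps * (1 - CLAMP_ERROR)) * RAIL_VOLTS

print(f"Extra draw on air: {delta_amps:.1f}A, or ~{leakage_watts:.0f}W")
print(f"Worst-case range with clamp error: ~{low_watts:.0f}W to ~{high_watts:.0f}W")
```

The nominal result matches the ~31W figure. The worst-case range (roughly 19W to 43W) is a reminder that a difference between two large readings is sensitive to measurement error.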
Keep in mind again that this is with intentionally lower clocks for a leakage test, not with the high clocks we OC to.
One more note: Here’s a look at the FET temperatures under this particular load. Our Hybrid mod is able to keep the MOSFETs around or below 50C for the entire test. The stock card is hitting the upper 80s and 90s in this particular load & fan configuration. That is within spec, but is certainly contributing to the power leakage and boosted temperature of the GPU core, given that the FETs and the core ultimately share the cooling solution. Note that these numbers are not comparable to the initial review tests due to a completely different test methodology and software benchmark.
Bumping Into the Power Limit
This is really cool data, and isn’t something we knew how to do back when the Fury X launched. If we had, we could have shown more specifically why liquid cooling, or at least more intense air cooling, makes sense on these power-hungry cards. In the previous test, we showed that we achieved an additional 40MHz OC on the core with the liquid mod. The reason for that, we think, is the additional power headroom; by reducing power leakage and improving efficiency with the Hybrid mod, we’re able to gain an additional 31W of power to send through the core. This number scales differently at higher loads, of course.
This also verifies what we already learned: Vega is limited on power, not on thermals. There’s some cross-over in power leakage, clearly, but after we solve for the thermal limitation, we bump against the power limit of about 400W at the PCIe cables (drawing ~33A with overclocks). Our next goal is to increase the power limit in BIOS. We might try this with a Raspberry Pi mod, as ATIFlash doesn’t yet work with Vega: FE.
AMD Vega FE Air vs. Hybrid Clock Stability Testing
Just to illustrate the clock-rate stability with these overclocks, here’s a look at frequency versus time as our test bench trudges through an automated series of 3DMark benchmarks, scaling from Ultra down to Extreme and Normal, followed up with TimeSpy. Both cards’ clocks are fairly stable in this particular run, but that’s because we’ve boosted the fan speed on the air-cooled card to prevent throttling.
If we plot the stock air-cooled card with no OC and auto fan speeds, this next line is what shows up:
More throttling is apparent in the stock configuration with auto fan speeds, as the clock fluctuates to try and keep up with thermal demand.
Vega Overclocking Power Consumption – SPECviewperf (Air vs. Hybrid FE)
Moving on to power versus thermals in a SPECviewperf workload, which cycles load as the test executes, we see the overclocked Hybrid configuration drawing about 600W from the wall during the heaviest load. That’s up from around 440W from the wall under load on the stock card. Again, these are wall measurements, not rail measurements, as we started taking them before we got our current clamp.
Despite the increased power draw of the Hybrid overclock, we’re actually way down in thermals from the air-cooled card. The liquid cooler also soaks thermal changes and high thermal load spikes with greater finesse, ultimately topping out at around 47C, where the stock card was hitting 81C.
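To put the two SPECviewperf comparisons side by side, here is a quick sketch of the deltas. The variable names are ours, and the wall figures are approximate total-system numbers that include PSU losses, so they are not directly comparable to the clamp readings at the rails.

```python
stock_wall_watts = 440.0       # stock air-cooled card, approximate wall draw under load
hybrid_oc_wall_watts = 600.0   # overclocked Hybrid mod, approximate wall draw at heaviest load
stock_peak_c = 81.0            # stock card peak GPU temperature
hybrid_peak_c = 47.0           # Hybrid mod peak GPU temperature

power_increase_watts = hybrid_oc_wall_watts - stock_wall_watts
power_increase_pct = power_increase_watts / stock_wall_watts * 100
temp_drop_c = stock_peak_c - hybrid_peak_c

print(f"Wall power: +{power_increase_watts:.0f}W (+{power_increase_pct:.0f}%)")
print(f"Peak GPU temperature: -{temp_drop_c:.0f}C")
```

That is about 160W more at the wall alongside a 34C lower peak GPU temperature.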
Vega Hybrid Mod vs. Vega FE Air Component Thermals
Finally, here’s a quick steady state thermal chart to show the temperature performance of various board components when the card is put through a power virus scenario. We see significant improvement not only on the GPU temperature, but also on the two hotspots on the VRM. Even without the additional fans providing direct airflow to the VRM components, our Hybrid mod is well within any component heat specifications, most of which are 125C. We’re only showing Vega on this chart. For comparative thermals and noise-normalized thermals, check our initial Vega: FE review.