AMD’s architecture hasn’t generally shown a large gain from increasing CU count between top-tier and second-to-top cards. The Fury and Fury X, for instance, could be made to match with an overclock on the lower-tiered card. Additional gains on the higher-tiered card often come from the increased power limit and clock, not from a straight shader increase. We’re putting that knowledge to the test on Vega architecture, equalizing the Vega 56 & Vega 64 clocks (and 945MHz HBM2 clocks) to determine how much of a difference emerges between the 4096 shaders on V64 and the 3584 shaders on V56. Purely counting shaders, that’s a 14% increase for V64, but like most performance metrics, that won’t result in a linear performance increase.
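As a quick check on the arithmetic above, here is a minimal Python sketch that computes the shader-count gap from the two figures given in the text. The `scaling_efficiency` helper and the sample performance gain passed to it are illustrative assumptions, not measurements from our testing.

```python
# Shader counts as stated in the text.
V56_SHADERS = 3584
V64_SHADERS = 4096


def percent_increase(base: float, new: float) -> float:
    """Return the percentage increase going from base to new."""
    return (new - base) / base * 100.0


def scaling_efficiency(perf_gain_pct: float, shader_gain_pct: float) -> float:
    """Hypothetical helper: share of the shader increase that shows up as performance."""
    return perf_gain_pct / shader_gain_pct


shader_gain = percent_increase(V56_SHADERS, V64_SHADERS)
print(f"V64 shader advantage over V56: {shader_gain:.1f}%")

# PLACEHOLDER value for illustration only, not a benchmark result.
example_perf_gain_pct = 5.0
efficiency = scaling_efficiency(example_perf_gain_pct, shader_gain)
print(f"Scaling efficiency at a {example_perf_gain_pct}% gain: {efficiency:.2f}")
```

A scaling efficiency of 1.0 would mean performance rises in step with shader count; anything lower means the extra shaders are not fully converting into frames at matched clocks.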
We were able to crush Vega 64’s performance with our heavily modded Vega 56 card, using powerplay tables and liquid to jump to 1742MHz clock speeds. That's with modding, though, and isn't out-of-box performance -- it also doesn't give us any indication as to shader differences. Going less crazy about overclocking and limiting clocks to matched speeds, we can reveal the shader count difference.
Thermals and noise to align with final launch.
There were a lot of challenges going into this build: A lack of magnetism, a lack of lighting on the show floor of a convention center, and some surprises in between. Cooler Master allowed us to build in the brand-new Cosmos C700P case – a modular chassis with an invertible or rotatable motherboard tray – live at PAX West. After being faced with some challenges along the way, we recruited Cooler Master’s Wei Yang to turn it into a collaborative team build. It was one of the most fun builds we’ve done in a while, and the pressure of time meant that we were both taking turns dropping screws and reworking our aspects of the build. This was a real PC build. There were unplanned changes, parts that GN hasn’t used before, and sacrifices made along the way.
All said and done, the enclosure is exceptionally easy to work within: Every single panel can be removed with relative ease, so we were able to strip down the case to barebones for the build. Our biggest timesink was asking to invert the motherboard tray to face the other side, since that’d add some flair to the build. This process isn’t intrinsically difficult, but it does require removal of a lot of screws – after all, the entire case can be flipped, and there are a lot of structural elements there. The motherboard tray detaches by removing 4-6 screws on the back-side, followed by six screws in the rear of the case, followed by a few more screws for the shrouds. We got some help for this process, as the case is one of the first working samples of the Cosmos C700P and there’s not yet a manual for which screws have to be removed.
(The video for this one is a read-through of this article -- same content, just read to you.)
Everyone talks a big game about how they don’t care about power consumption. We took that comment to the extreme, using a registry hack to give Vega 56 enough extra power to kill the card, if we wanted, and a Floe 360mm CLC to keep temperatures low enough that GPU diode reporting inaccuracies emerge. “I don’t care about power consumption, I just want performance” is now met with that – 100% more power and an overclock to 1742MHz core. We've got room to do 200% power, but things would start popping at that point. The Vega 56 Hybrid mod is our most modded version of the Hybrid series to date, and leverages powerplay table registry changes to provide that additional power headroom. This is an alternative to BIOS flashing, which is limited to signed BIOS versions (like V64 on V56, though we had issues flashing V64L onto V56). Last we attempted it, a modified BIOS did not work. Powerplay tables do, though, and mean that we can modify power target to surpass V56’s artificial power limitation.
The limitation on power provisioned to the V56 core is, we believe, fully to prevent V56 from too easily outmatching V64 in performance. The card’s BIOS won’t allow greater than 300-308W down the PCIe cables natively, even though official BIOS versions for V64 cards can support 350~360W. The VRM itself easily sustains 360W, and we’ve tested it as handling 406W without a FET popping. 400W is probably pushing what’s reasonable, but to limit V56 to ~300W, when an additional 60W is fully within the capabilities of the VRM & GPU, is a means to cap V56 performance to a point of not competing with V64.
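To put the headroom in numbers, here is a minimal Python sketch using only the wattage figures quoted above. The `headroom` helper is a hypothetical name for illustration, and we use the low end (300W) of the native limit as the baseline, which is our choice for the example.

```python
# Power figures (watts) as stated in the text.
V56_BIOS_LIMIT_W = 300   # low end of the 300-308W native limit
VRM_SUSTAINED_W = 360    # what the VRM easily sustains
VRM_TESTED_W = 406       # highest figure tested without a FET popping


def headroom(limit_w: float, target_w: float) -> tuple:
    """Return (extra watts, extra percent) of target_w over limit_w."""
    extra_w = target_w - limit_w
    return extra_w, extra_w / limit_w * 100.0


for label, target in (("sustained", VRM_SUSTAINED_W), ("tested", VRM_TESTED_W)):
    extra_w, extra_pct = headroom(V56_BIOS_LIMIT_W, target)
    print(f"{label}: +{extra_w}W ({extra_pct:.0f}% over the {V56_BIOS_LIMIT_W}W limit)")
```

The 60W gap between the native limit and what the VRM sustains works out to 20% additional power that the stock BIOS leaves unused.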
We fixed that.
AMD’s CU scaling has never had that much of an impact on performance – clock speed closes most gaps with AMD hardware. Even without the extra shaders of V64, we can outperform V64’s stock performance, and we’ll soon find out how we do versus V64’s overclocked performance. That’ll have to wait until after PAX, but it’s something we’re hoping to further study.
We’re revisiting an old topic. A few years ago, we posted an article entitled “How Many Watts Does a Gaming PC Really Need,” which focused on testing multiple configurations for power consumption. We started working on this revisit last week, using a soon-to-be-released Bronze 450W PSU as a baseline, seeing as we’ve recently advocated for more 400-450W PSUs in PC builds. We'll be able to share more about this PSU (and its creator and name) soon. This content piece shows how far we can get on lower wattage PSUs with modern hardware.
Although we’ll be showing an overclocked 7700K + GTX 1080 FTW as the high-end configuration, we’d recommend going higher than 450W for that particular setup. It is possible to run on 450W, but we begin pushing the continuous load on the PSU to a point of driving up noise levels (from the PSU fan) and abusing the power supply. There’s also insufficient headroom for 100% GPU / 100% CPU workloads, but that should be uncommon for most of our audience. Most of the forum builds we see host PSUs ranging from 700-800W+, which is often overkill for most modern gaming PCs. You’d want the higher capacity for something like Threadripper, for instance, or X299, but those are HEDT platforms. For gaming platforms, power requirements largely stop around 600W, sans serious overclocking, and most systems can get by lower than that.
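For a rough sense of how load and headroom change with PSU capacity, here is a minimal Python sketch. The capacities come from the figures mentioned above; the system draw is a placeholder rather than a measured result, and the sketch assumes the draw is the DC-side load the PSU delivers, not power at the wall.

```python
# PSU capacities (watts) mentioned in the text.
PSU_CAPACITIES_W = (450, 600, 800)


def psu_load_pct(system_draw_w: float, capacity_w: float) -> float:
    """Return system draw as a percentage of the PSU's rated capacity."""
    if capacity_w <= 0:
        raise ValueError("capacity_w must be positive")
    return system_draw_w / capacity_w * 100.0


# PLACEHOLDER draw for illustration only; not a measured figure.
example_draw_w = 350.0

for capacity in PSU_CAPACITIES_W:
    load = psu_load_pct(example_draw_w, capacity)
    spare = capacity - example_draw_w
    print(f"{capacity}W PSU: {load:.0f}% load, {spare:.0f}W headroom")
```

Swap in a measured draw for a given configuration to see how close it sits to a PSU's rating under continuous load.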
Since AMD’s high-core-count Ryzen lineup has entered the market, there seems to be an argument in every comment thread about multitasking and which CPUs handle it better. Our clean, controlled benchmarks don’t account for the demands of eighty browser tabs and Spotify running, and so we get constant requests to do in-depth testing on the subject. The general belief is that more threads are better able to handle more processes, a hypothesis that would increasingly favor AMD.
There are a couple of reasons we haven’t included tests like these all along: first, “multitasking” means something completely different to every individual, and second, adding uncontrolled variables (like bloatware and network-attached software) makes tests less scientific. Originally, we hoped this article would reveal any hidden advantages that might emerge between CPUs when adding “multitasking” to the mix, but it’s ended up as a thorough explanation of why we don’t do benchmarks like this. We’re primarily using the R3 1200 and G4560 to run these trials.
This is the kind of testing we do behind-the-scenes to build a new test plan, but often don’t publish. This time, however, we’re publishing the trials of finding a multitasking benchmark that works. The point of publishing the trials is to demonstrate why it’s hard to trust “multitasking” tests, and why it’s hard to conduct them in a manner that’s representative of actual differences.
In listening to our community, we’ve learned that a lot of people seem to think Discord is multitasking, or that a Skype window is multitasking. Here’s the thing: If you’re running Discord and a game and you’re seeing an impact to “smoothness,” there’s something seriously wrong with the environment. That’s not even remotely close to enough of a workload to trouble even a G4560. We’re not looking at such a lightweight workload here, and we’re also not looking at the “I keep 100 tabs of Chrome open” scenarios, as that’s wholly unreliable given Chrome’s unpredictable caching and behaviors. What we are looking at is 4K video playback while gaming and bloatware while gaming.
In this piece, the word “multitasking” will be used to describe “running background software while gaming.” The term "bloatware" is being used loosely to easily describe an unclean operating system with several user applications running in the background.
Variations of “HBM2 is expensive” have floated around the web since well before Vega’s launch – since Fiji, really, with the first wave of HBM – without many concrete numbers on that expression. AMD isn’t just using HBM2 because it’s “shiny” and sounds good in marketing, but because Vega architecture is bandwidth-starved to a point of HBM being necessary. That’s an expensive necessity, unfortunately, and chews away at margins, but AMD really had no choice in the matter. The company’s standalone MSRP structure for Vega 56 positions it competitively with the GTX 1070, carrying comparable performance, memory capacity, and target retail price, assuming things calm down for the entire GPU market at some point. Given HBM2’s higher cost and Vega 56’s bigger die, that leaves little room for AMD to profit when compared to GDDR5 solutions. That’s what we’re exploring today, alongside why AMD had to use HBM2.
There are reasons that AMD went with HBM2, of course – we’ll talk about those later in the content. A lot of folks have asked why AMD can’t “just” use GDDR5 with Vega instead of HBM2, thinking that you just swap modules, but there are complications that make this impossible without a redesign of the memory controller. Vega is also bandwidth-starved to a point of complication, which we’ll walk through momentarily.
Let’s start with prices, then talk architectural requirements.
Before Vega buried Threadripper, we noted interest in conducting a simple A/B comparison between Noctua’s new TR4-sized coldplate (the full-coverage plate) and their older LGA115X-sized coldplate. Clearly, the LGA115X cooler isn’t meant to be used with Threadripper – but it offered a unique opportunity, as the two units are largely the same aside from coldplate coverage. This grants an easy means to run an A/B comparison; although we can’t extend conclusions to all coldplates and coolers, we can at least see what Noctua’s efforts did for them on the Threadripper front.
Noctua’s NH-U14S cooler possesses the same heatpipe count and arrangement, the same (or remarkably similar) fin stack, and the same fan – though we controlled for that by using the same fan for each unit. The only difference is the coldplate, as far as we can tell, and so we’re able to more easily measure performance deltas resulting primarily from the coldplate coverage change. Noctua’s LGA115X version, clearly not for TR4, wouldn’t cover the entire die area of even one module under the IHS. The smaller plate maximally covers about 30% of the die area, just eyeballing it, and doesn’t make direct contact with the rest. This is less coverage than the Asetek CLCs, which at least make contact with the entire TR4 die area, if not the entire IHS. Noctua modified their unit to equip a full-coverage plate as a response, including the unique mounting hardware that TR4 needs.
The LGA115X NH-U14S doesn’t natively mount to Threadripper motherboards. We modded the NH-U14S TR4 cooler’s mounting hardware with a couple of holes, aligning those with the LGA115X holes, then routed screws and nuts through those. A rubber bumper was placed between the mounting hardware and the base of the cooler, used to help ensure even and adequate mounting pressure. We show a short clip of the modding process in our above video.
Vega’s partnership with the Samsung CF791, prior to the card even launching, was met with unrelenting criticism of the monitor’s placement in bundles. Consumer reports on the monitor mention flickering with Ultimate Engine as far back as January, now leveraged as a counter to the CF791’s inclusion in AMD’s bundle. All these consumer reports and complaints largely hinged on Polaris or Fiji products, not Vega (which didn’t exist yet), so we thought it’d be worth a revisit with the bundled card. Besides, if it’s the bundle of the CF791 with Vega that caused the resurgence in flickering concerns, it seems that we should test the CF791 with Vega. That’s the most relevant comparison.
And so we did: Using Vega 56, Vega: FE, and an RX 580 Gaming X (Polaris refresh), we tested Samsung’s CF791 34” UltraWide display, running through permutations of FreeSync. Some such permutations include “Standard Engine” (OSD), “Ultimate Engine” (OSD), and simple on/off toggles (drivers + OSD).
As exciting as it is to see “+242% power offset” in overclocking tools, it’s equally deflating to see that offset only partly work. It does, though, and so we’ve minimally managed to increase our overclocking headroom from the stock +50% offset. The liquid cooler helps, considering we attached a 360mm radiator, two Corsair 120mm maglev fans, a Noctua NF-F12 fan, and a fourth fan for VRM cooling. Individual heatsinks were also added to hotter VRM components, leaving two sets unsinked, but cooled heavily with direct airflow.
This mod is our coolest-running hybrid mod yet, with large thanks to the 360mm radiator. There’s reason for that, too – we’re now able to push peak power of about 370-380W through the card, up from our previous limitation of ~308W. We were gunning for 400W, but it’s just not happening right now. We’re still working on BIOS mods and powerplay table mods.
Following an initial look at thermal compound spread on AMD’s Threadripper 1950X, we immediately revisited an old, retired discussion: Thermal paste application methods and which one is “best” for a larger IHS. With most of the relatively small CPUs, like the desktop-grade Intel and AMD CPUs, it’s more or less been determined that there’s no real, appreciable difference in application methods. Sure – you might get one degree Centigrade here or there, but the vast majority of users will be just fine with the “blob” method. As long as there’s enough compound, it’ll spread fairly evenly across Intel i3/i5/i7 non-HEDT CPUs and across Ryzen or FX CPUs.
Threadripper feels different: It’s huge, with the top of the IHS measuring at 68x51mm, and significantly wider on one axis. Threadripper also has a unique arrangement of silicon, with four “dies” spread across the substrate. AMD has told us that only two of the dies are active and that it should be the same two on every Threadripper CPU, with the other two being branded “silicon substrate interposers.” Speaking with Der8auer, we believe there may be more to this story than what we’re told. Der8auer is investigating further and will be posting coverage on his own channel as he learns more.
Anyway, we’re interested in how different thermal compound spreading methods may benefit Threadripper specifically. Testing will focus on the “blob” method, X-pattern, parallel lines pattern, Asetek’s stock pattern, and AMD’s recommended five-point pattern. Threadripper’s die layout looks like this, for a visual aid:
Because of the spacing centrally, we are most concerned about covering the two clusters of dies, not the center of the IHS; that said, it’s still a good idea to cover the center as that is where the cooler’s copper density is located and most efficient.
Our video version of this content uses a sheet of Plexiglass to illustrate how compound spreads as it is applied. As we state later in the video, this is a nice, easy mode of visualization, but not really an accurate way to show how the compound spreads when under the real mounting force of a socketed cooler. For that, we later applied the same NZXT Kraken X62 cooler with each method, then took photos to show before/after cooler installation. Thermal testing was also performed. Seeing as AMD has permitted several other outlets to post their thermal results already, we figured we'd add ours to the growing pool of testing.