Final EVGA VRM Torture Test: VRM Thermals Not the Killer of Cards

By Steve Burke. Published November 23, 2016 at 4:40 pm

Two EVGA GTX 1080 FTW cards have now been run through a few dozen hours of testing, each passing through real-world, synthetic, and torture testing. We've been following this story since its onset, initially validating preliminary thermal results with thermal imaging, but later stating that we wanted to follow up with direct thermocouple probes to the MOSFETs and PCB. Our goal was to create the end-all, be-all set of test data for VRM thermals. We have tested every reasonable scenario for these cards, including SLI, and have even intentionally attempted to incinerate the cards by running ridiculous use scenarios.

Thermocouples were attached directly to the back-side of the PCB (hotspot previously discovered), the opposing MOSFET (#2, from bottom-up), and MOSFET #7. The seventh and second MOSFETs are those which seem to be most commonly singed or scorched in user photos of allegedly failed EVGA 10-series ACX 3.0 cards, including the GTX 1060 and GTX 1070. Our direct probe contact to these MOSFETs will provide more finality to testing results, with significantly greater accuracy and understanding than can be achieved with a thermal imager pointed at the rear-side of the PCB. Even just testing with a backplate isn't really ideal with thermal cameras, as the emissivity of the metal begins to make for questionable results -- not to mention the fact that the plate visually obstructs the actual components. And, although we did mirror EVGA's and Tom's DE's testing methodology when checking the impact of thermal pads on the cards, even this approach is not perfect (it does turn out that we were pretty damn accurate, though; more on that later). The pads act as an insulator, again hiding the components and assisting in the spread of heat across a larger surface area. That's what they're designed to do, of course, but for a true reading, we needed today's tests.

Video Version of this Content: EVGA GTX 10-Series Temperatures Not the Issue

Note: We will almost certainly not make a return on this testing investment. Content like this is made possible only with the support of our readers, such as through Patreon.

Some of the testing content in this article will be straight from the script of the video -- no reason to write it twice -- but we do also have a handful of extra charts that won't be found in the video content. One set of charts, for instance, is thermal testing of the cards with the thermal pads installed, but without the VBIOS update.

Recap of EVGA VRM "Issues"

Let's recap the basics again. We recently validated a Tom's Hardware test, which suggested that EVGA ACX devices were heating up on the back-side of the VRM to north of 100C. Note that VRMs can handle 100C no problem, but the temperatures that Tom's had shown -- hitting 114C in some reports -- were beginning to enter a concerning range. The fact that a few users began sharing photos of scorched PCBs furthered this concern of temperature-related damage to EVGA cards.

We validated their methods by deploying a thermal camera, just as they did, but noted that emissivity and the delta between the back-side PCB and front-side VRMs could be significant, and we decided that thermal imaging was not sufficient to fully evaluate the situation. EVGA issued optional thermal pad mods and a VBIOS update that made the fan speed curve more aggressive. We declared that both of these, by thermal imaging, were enough to fix the problem.

Until today, this problem was largely assumed to be because of EVGA's lack of thermal pads between the baseplate and heatsink on the VRM side, and between the PCB and the backplate on the back-side. But we've got new findings which definitively indicate that this is not the only cause of failure.

Now, in addition to the tests posted by Tom's DE and by our later follow-up, some users have complained that high VRM temperatures are causing black screen defects. This is not true. EVGA had black screen issues on the first ~4% of its shipping product, resolved a few months ago, but they were entirely unrelated to the VRM temperature. If a VRM gets too hot, it will not fail gracefully. There will be no "black screen" that can be resolved by a restart. The FETs / power stages will go up in a puff of smoke, and the card will never turn on again. These are two unrelated issues. The black screen defect -- for which we own one card exhibiting the issue -- was already resolved.

A few users have also indicated that VRAM contact is not sufficient between the heatsink and the VRAM thermal pads. We have not observed this on our cards. That is not to say that there is no such issue, but does mean that we can't validate it. VRAM modules can handle pretty high heat, anyway, and EVGA has begun shipping VRAM pads in addition to the VRM pads.

New Testing Procedures

evga-vrm-burn-in

evga-vrm-burn-in-2

So, then, we need to investigate the impact of thermals on card life. EVGA's issuance of thermal pads might suggest that there is something more to learn here, and so we'll be performing the following tests:

  • Tests with the card stock, with the old VBIOS and no thermal pads

  • Tests with just the VBIOS update

  • Tests with thermal pads only (no VBIOS update)

  • And tests with the thermal pads and the VBIOS update

This is all being done on our pair of EVGA GTX 1080 FTWs.

In addition to these test categories, we will run about a half-dozen tests on each configuration:

  • Kombustor's implementation of FurMark

  • Metro: Last Light and DiRT Rally

  • Overclocking and overvolting

  • Brief SLI testing

  • And high ambient torture tests

A few additional tests were performed, like FurMark (non-Kombustor) testing, 3DMark, and a few other games, but we began scrapping a few of the less useful tests as we narrowed the useful data-set to the above passes.
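As a rough planning sketch, the four card configurations and five workloads listed above can be crossed into a checklist of passes. This is only an illustration of how the matrix could be laid out, not a record of which passes were actually run; the function name and label strings are hypothetical.

```python
from itertools import product

# Card configurations, paraphrased from the list in the article.
CONFIGS = [
    "old VBIOS, no thermal pads",
    "new VBIOS, no thermal pads",
    "old VBIOS, thermal pads",
    "new VBIOS, thermal pads",
]

# Workloads, paraphrased from the list in the article.
WORKLOADS = [
    "Kombustor FurMark",
    "Metro: Last Light / DiRT Rally",
    "Overclocking and overvolting",
    "SLI",
    "High ambient torture",
]


def build_test_matrix(configs, workloads):
    """Return every (configuration, workload) pairing as a list of tuples."""
    return list(product(configs, workloads))


if __name__ == "__main__":
    matrix = build_test_matrix(CONFIGS, WORKLOADS)
    for index, (config, workload) in enumerate(matrix, start=1):
        print(f"{index:2d}. {workload} [{config}]")
```

Crossing every configuration with every workload is the pessimistic case; in practice some pairings can be dropped once they stop producing useful data, as described above.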

In our previous video on EVGA test planning, we explain that our new tests apply K-type thermocouples directly to the rear-side of the PCB and to hotspot MOSFETs numbers 2 and 7 when counting from the bottom of the PCB. The thermocouples used are flat and are self-adhesive (from Omega), as recommended by thermal engineers in the industry -- including Bobby Kinstle of Corsair, whom we previously interviewed.

K-type thermocouples have a known tolerance of approximately ±2.2C. We calibrated our thermocouples by placing them in an "ice bath," then in a boiling water bath. This gave us the information required to understand and adjust results appropriately.
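One common way to use ice-bath and boiling-water readings is a two-point linear correction. The sketch below assumes that approach; the article does not state exactly how the adjustment was applied, and the raw readings used here are hypothetical. The boiling reference defaults to 100C, which only holds near sea-level pressure, so it is left as a parameter.

```python
def make_two_point_correction(raw_ice_c, raw_boil_c, ref_ice_c=0.0, ref_boil_c=100.0):
    """Build a linear correction from two reference readings.

    raw_ice_c / raw_boil_c are what the probe reported in the ice bath and
    in boiling water; ref_ice_c / ref_boil_c are the true temperatures of
    those baths (boiling point drops with elevation, so adjust if needed).
    """
    if raw_boil_c == raw_ice_c:
        raise ValueError("reference readings must differ")
    gain = (ref_boil_c - ref_ice_c) / (raw_boil_c - raw_ice_c)
    offset = ref_ice_c - gain * raw_ice_c

    def correct(raw_c):
        # Map a raw probe reading onto the calibrated scale.
        return gain * raw_c + offset

    return correct


if __name__ == "__main__":
    # Hypothetical probe that reads 1.2C in ice and 101.5C in boiling water.
    correct = make_two_point_correction(raw_ice_c=1.2, raw_boil_c=101.5)
    print(f"Raw 94.0C -> corrected {correct(94.0):.1f}C")
```

A two-point fit removes offset and gain error but not nonlinearity, so it is best treated as a sanity check against the probe's stated tolerance rather than a replacement for it.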

As for other concerns, these were largely discussed in that EVGA test planning content. We'd mostly have to look out for (1) thermal conductivity and the impact of a thermocouple in its area of placement, and (2) electrical conductivity and avoiding inadvertent damage to components by accidentally causing an electrical short.

With Kinstle's help, we were able to locate flat thermocouples with an adhesive that will not prohibit transfer of heat between the MOSFET casing and its present thermal pads. As a reminder, EVGA included thermal pads between the FETs and the base plate on all cards from the get-go. The only places in which pads were not provided were between the base plate and the heatsink, and between the backplate and the PCB.

Our next point of concern was smaller, as it'd be easier to resolve and spot: EMI caused by inductors or the power plane PCB. We were able to avoid electromagnetic interference by routing the thermocouple wiring right, toward the less populated half of the board, and then down. The cables exit the board near the PCI-e slot and avoid crossing inductors. This resulted in no observable/measurable EMI with regard to temperature readings.

We decided to deploy AIDA64 and GPU-Z to measure direct temperatures of the GPU and the CPU (becomes relevant during torture testing, when we dump the CPU radiator's heat straight into the VRM fan). In addition to this, we logged fan speeds, VID, vCore, and other aspects of power management. Because VRM temperatures are not measurable through software, our direct thermocouples will handle that aspect of testing.

The test platform is detailed below:

GN Test Bench 2015 | Name | Courtesy Of | Cost
Video Card | EVGA GTX 1080 FTWs | EVGA | ~$740
CPU | Intel i7-5930K CPU 3.8GHz | iBUYPOWER | $580
Memory | Corsair Dominator 32GB 3200MHz | Corsair | $210
Motherboard | EVGA X99 Classified | GamersNexus | $365
Power Supply | NZXT 1200W HALE90 V2 | NZXT | $300
SSD | HyperX Savage SSD | Kingston Tech. | $130
Case | Top Deck Tech Station | GamersNexus | $250
CPU Cooler | NZXT Kraken X61 CLC | NZXT | $110

What The VRM Can Theoretically Handle

As for the VRM, EVGA's TjMax is 150C, and it probably -- one would hope -- initiates over-temperature protection (OTP) at 180C. The power stages operate best at 100C continuous, but have a tCase of 125C. Inductors really don't much matter for heat dissipation since they can take so damn much -- it's just copper wire coiled inside of a natural heatsink -- but they do heat up neighboring components. The FETs have a thermal pad contacting the baseplate; it's just that the original cards had no pad to allow transfer of heat from the baseplate to the heatsink.

And after our tutorial on applying those thermal pads, we saw some misguided comments about EVGA's suggested placement of the pad atop the chokes. That is the best way to get contact to the fins of the heatsink and transfer heat and, even though it's not as much surface area as a coldplate, it still performs damn well -- far better than the stock configuration. Some folks seemed to think that this pad placement stopped air from getting down there; well, air never got down there anyway, and the thermal pads flanking the chokes dictate that it couldn't get beyond the inductors to begin with. Air also has a terribly low thermal conductivity -- you're looking at something like 0.026W/mK at 25C, as opposed to thermal pads (minimally 10W/mK, though we don't have an exact number) and aluminum (~205W/mK at 25C).
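To put those conductivity figures in perspective, here is a minimal sketch of one-dimensional conduction resistance, R = t / (k * A), through a gap filled with still air, a thermal pad, or aluminum. The 1mm thickness and 1cm^2 area are arbitrary illustrative values, not measurements from the card. The pad figure is the article's stated minimum, and air is taken at the commonly cited ~0.026W/mK. Convection and contact resistance are ignored.

```python
# Thermal conductivities in W/(m*K). The pad value is the article's
# "minimally 10W/mK" estimate, not a datasheet number.
CONDUCTIVITY_W_PER_MK = {
    "air": 0.026,
    "thermal pad": 10.0,
    "aluminum": 205.0,
}


def conduction_resistance(thickness_m, conductivity_w_per_mk, area_m2):
    """Return 1-D conduction resistance in K/W: R = t / (k * A)."""
    if conductivity_w_per_mk <= 0 or area_m2 <= 0:
        raise ValueError("conductivity and area must be positive")
    return thickness_m / (conductivity_w_per_mk * area_m2)


if __name__ == "__main__":
    thickness_m = 0.001  # hypothetical 1 mm gap
    area_m2 = 1e-4       # hypothetical 1 cm^2 contact patch
    for material, k in CONDUCTIVITY_W_PER_MK.items():
        r = conduction_resistance(thickness_m, k, area_m2)
        print(f"{material:12s} {r:8.3f} K/W")
```

With identical geometry, the resistance scales inversely with conductivity, which is why filling an air gap with a pad matters far more than any airflow the pad might block.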

The argument about air not "being able to get to the VRM with the new pads" is uninformed. Ignore it. We will prove that with testing later.

Then, of course, there are rumor mills like WCCFTech, which take no shame in pumping-out headlines like "VRM Burn-Out Issue Caught On Camera" without ever even attempting to validate if the VRMs are the issue. Other sites have used the word "explode" generously in headlines.

We're here to bring some actual testing to the discussion, hopefully bringing it back down to reality.

EVGA VRM Noise Test with VBIOS Profile Update

First, let's recap on noise. Our original noise tests on the EVGA VRM fan were conducted using preliminary information on the new VBIOS and its more aggressive fan speed profile. Since that time, EVGA's publicly issued VBIOS update reduced the fan speed profile from what we were initially provided. The final, maximum fan speed for a single card seems to sit around 1900-2050RPM, rather than the initially planned 2200RPM. The impact on noise is substantial, since our first tests showed a ~10dBA increase over the ~1600RPM fan speed of the original VBIOS. Here's the update with the final, public VBIOS profile:

evga-1080-ftw-noise-2
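For a sense of scale on the ~10dBA figure from the preliminary profile, decibels are logarithmic, so a level difference converts to a ratio rather than a simple offset. The helper names below are hypothetical; the formulas are the standard decibel definitions.

```python
def db_to_power_ratio(delta_db):
    """Convert a level difference in dB to a sound power ratio."""
    return 10 ** (delta_db / 10)


def db_to_pressure_ratio(delta_db):
    """Convert a level difference in dB to a sound pressure ratio."""
    return 10 ** (delta_db / 20)


if __name__ == "__main__":
    delta_db = 10.0  # approximate increase seen with the preliminary profile
    print(f"Power ratio:    {db_to_power_ratio(delta_db):.2f}x")
    print(f"Pressure ratio: {db_to_pressure_ratio(delta_db):.2f}x")
```

A common rule of thumb is that a 10dB increase is perceived as roughly twice as loud, though perception varies with frequency content and listener.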

EVGA VRM Thermal Testing (FurMark Kombustor): Old VBIOS, No TPADS

Let's start with the complete stock card, as it originally shipped from EVGA.

ftw-vrm-fm-old-vbios-no-oc

This first test is the stock card without overclock, running Kombustor FurMark as a burn-in. Remember that FurMark is sort of a power virus, and loads the VRM more heavily than any game ever will. Also note that FurMark doesn't push the clock as high as a game would, but load is still heavy.

Here's the chart. The colors will be the same for every chart shown, so memorize them now. Yellow is MOSFET 7, counting bottom-up, which is a significant hotspot on the card. Orange is MOSFET 2, a common scorch point in photos we've seen online, toward the bottom of the card. PCB is cyan, and is measured on the hotspot on the rear-side of the video card with the backplate on. GPU temperature is white and measured by software. The ambient temperature is also critical to these tests, as we'll later double ambient. That's the darker blue line at the bottom.

We're seeing the PCB achieving temperatures just shy of 100C after a one-hour burn-in. The MOSFETs are both at around 90-94C, with MOSFET 7 running a bit warmer. Ambient was in the low 20s. Case ambient, as we show in our 570X review, can be upwards of 40C in some enclosures. That would account for some gains in temperature, but not a 1-to-1 gain. We'll test for this situation later in this article.

So far, though, these are all numbers that the card is built to handle -- and that's with FurMark.
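The headroom can be sketched with some quick arithmetic. The snippet below compares the approximate MOSFET readings from this test against the 125C tCase figure quoted earlier, then applies a deliberately pessimistic 1-to-1 shift to a 40C case ambient. The 22C test ambient is an assumed stand-in for "low 20s," and the real gain is less than 1-to-1, so the shifted numbers are an upper bound rather than a prediction.

```python
T_CASE_LIMIT_C = 125.0  # tCase figure quoted earlier in the article


def headroom(readings_c, limit_c=T_CASE_LIMIT_C):
    """Return degrees of margin between each reading and the limit."""
    return {name: limit_c - temp for name, temp in readings_c.items()}


def shift_for_ambient(readings_c, test_ambient_c, target_ambient_c):
    """Shift readings 1-to-1 with ambient (a pessimistic upper bound)."""
    delta = target_ambient_c - test_ambient_c
    return {name: temp + delta for name, temp in readings_c.items()}


if __name__ == "__main__":
    # Approximate stock FurMark readings from the chart described above.
    stock_furmark = {"MOSFET 7": 94.0, "MOSFET 2": 90.0}

    print("Open bench:", headroom(stock_furmark))

    # Assumed 22C test ambient versus a 40C case ambient.
    worst_case = shift_for_ambient(stock_furmark, 22.0, 40.0)
    print("Worst case at 40C ambient:", headroom(worst_case))
```

Even under that worst-case assumption, both probes stay below the tCase figure, which is consistent with the conclusion above.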

EVGA VRM Thermal Testing (Gaming): Old VBIOS, No TPADS

ftw-vrm-mll-old-vbios-no-oc

Here's Metro: Last Light running a burn-in. We're seeing temperatures closer to 85C for the PCB backside and MOSFET #7, with MOSFET #2 around 80C. That's about 10-20C cooler than with FurMark. Other games show similar performance results.

EVGA VRM Thermal Testing (Overclocking): Old VBIOS, No TPADS

ftw-vrm-fm-old-vbios-OC

This chart shows the overclocking impact on a 1080 FTW without the VBIOS update and without thermal pads, as benchmarked using FurMark. Temperatures get a little warmer here, now nearing 105C on the PCB and MOSFET 7. That's hot enough that high case ambient would decrease your efficiency as you near 110C, but you will still be within safe operating range as we show in the forthcoming high ambient tests. The overclock was +30% power (engaging the master switch), +100% OV allowance, and +125MHz core / +450MHz memclock.

Now, before that high ambient test, and before applying thermal pads and VBIOS updates, the next goal is to test SLI 1080 FTWs with a one-slot spacing between them.

Continue to the next page for SLI testing and VBIOS/TPAD testing.


Last modified on November 23, 2016 at 4:40 pm
Steve Burke

