The biggest rule in testing coolers is to never trust anything: Don’t trust the numbers, don’t trust the software, don’t trust firmware, and don’t trust the test bench. Every step of the way is a trap lying in wait to sabotage data accuracy. We’ve spent the last 3 years refining our liquid cooler bench and the last 6 months refining our new testing that will feature air coolers and liquid coolers alike. With millions of cells of data, we now know enough to have identified nearly every hidden pitfall in testing and finally feel confident in providing a full picture for accurate CPU cooler performance. The downside is that we’ll never trust anyone else’s numbers again, but the upside is that we can finally start really collecting data. This dissertation will be on the most common and the most obscure landmines for testing, laying a plan for our CPU cooler reviews and helping establish a baseline for quality and data accuracy. We promised a CPU air cooler round-up back at the end of 2016 or 2017, and we’re finally getting around to it and will be publishing a lot of cooler content over the next month or so. We’ll start with an A500 review after this methodology piece goes live, then we’ll break for our factory tour series, then we’ll be back to coolers.
This content is detailed and specific to CPU cooler testing methodology and processes. We will be using this as a reference piece for years, as it will establish testing practices to ensure accurate data. Most data out there regarding CPU coolers is flawed in some way or another, especially the stuff posted in random reddit comments, but the trick is minimizing flaws to the extent possible while remaining real-world, because total elimination of variables and pitfalls is impossible on PC hardware. Users will often randomly post a temperature number and say something like, “my Spire is at 70 degrees,” as if that actually means anything to anyone. Temperature isn’t a 3DMark score – it is completely dependent on each configuration, and so unless you’re looking at relative performance by swapping coolers in a controlled environment, you’re not really providing useful data to the discussion.
In this content, we’re going to show you 6 months of rigorous testing adventures that we’ve embarked on, including several months’ worth of discovering flaws in testing, common and uncommon errors, and bad data that could invalidate most reviews without the reviewer ever even knowing. We know because we’ve spent months catching them, hence our long wait time on publishing this content. Several of these issues will exist in other reviewer configurations without technician knowledge, but the trick is to have the right tools to flag errant testing. These concepts will range from extremely basic to advanced. We wanted to skip some basics, but realized that there’s so much bad information out there that we’d better just cover it all in one big dissertation.
Our viewers have long requested that we add standardized case fan placement testing in our PC case reviews. We’ve previously talked about why this is difficult – largely logistically, as it’s neither free in cost nor free in time – but we are finally in a good position to add the testing. The tests, we think, clearly must offer some value, because it is one of our most-requested test items over the past two years. We ultimately want to act on community interests and explore what the audience is curious about, and so we’ve added tests for standardized case fan benchmarking and for noise normalized thermal testing.
Normalizing for noise and running thermal tests has been our main, go-to benchmark for PC cooler testing for about 2-3 years now, and we’ve grown to really appreciate the approach to benchmarking. Coolers are simpler than cases, as there’s not really much in the way of “fan placement,” and normalizing for a 40dBA level has allowed us to determine which coolers have the most efficient means of cooling when under identical noise conditions. As we’ve shown in our cooler reviews, this bypasses the issue where a cooler with significantly higher RPM always chart-tops. It’s not exactly fair if a cooler at 60dBA “wins” the thermal charts versus a bunch of coolers at, say, 35-40dBA, and so normalizing the noise level allows us to see if any proper differences emerge when the user is subjected to the same “volume” from their PC cooling products. We have also long used these for GPU cooler reviews. It’s time to introduce it to case reviews, we think, and we’ll be doing that by sticking with the stock case fan configuration and reducing case fan RPMs equally to meet the target noise level (CPU and GPU cooler fans remain unchanged, as these most heavily dictate CPU and GPU coolers; they are fixed speeds constantly).
This content marks the beginning of our in-depth VR testing efforts, part of an ongoing test pattern that hopes to determine distinct advantages and disadvantages on today’s hardware. VR hasn’t been a high-performance content topic for us, but we believe it’s an important one for this release of Kaby Lake & Ryzen CPUs: Both brands have boasted high VR performance, “VR Ready” tags, and other marketing that hasn’t been validated – mostly because it’s hard to do so. We’re leveraging a hardware capture rig to intercept frames to the headsets, FCAT VR, and a suite of five games across the Oculus Rift & HTC Vive to benchmark the R7 1700 vs. i7-7700K. This testing includes benchmarks at stock and overclocked configurations, totaling four devices under test (DUT) across two headsets and five games. Although this is “just” 20 total tests (with multiple passes), the process takes significantly longer than testing our entire suite of GPUs. Executing 20 of these VR benchmarks, ignoring parity tests, takes several days. We could do the same count for a GPU suite and have it done in a day.
VR benchmarking is hard, as it turns out, and there are a number of imperfections in any existing test methodology for VR. We’ve got a good solution to testing that has proven reliable, but in no way do we claim that perfect. Fortunately, by combining hardware and software capture, we’re able to validate numbers for each test pass. Using multiple test passes over the past five months of working with FCAT VR, we’ve also been able to build-up a database that gives us a clear margin of error; to this end, we’ve added error bars to the bar graphs to help illustrate when results are within usual variance.
Benchmarking Mass Effect: Andromeda immediately revealed a few considerations for our finalized testing. Frametimes, for instance, were markedly lower on the first test pass. The game also prides itself in casting players into a variety of environs, including ship interiors, planet surfaces of varying geometric complexity (generally simpler), and space stations with high poly density. Given all these gameplay options, we prefaced our final benchmarking with an extensive study period to research the game’s performance in various areas, then determine which area best represented the whole experience.
Our Mass Effect: Andromeda benchmark starts with definitions of settings (like framebuffer format), then goes through research, then the final benchmarks at 4K, 1440p, and 1080p.
Not long ago, we opened discussion about AMD’s new OCAT tool, a software overhaul of PresentMon that we had beta tested for AMD pre-launch. In the interim, and for the past five or so months, we’ve also been silently testing a new version of FCAT that adds functionality for VR benchmarking. This benchmark suite tackles the significant challenges of intercepting VR performance data, further offering new means of analyzing warp misses and drop frames. Finally, after several months of testing, we can talk about the new FCAT VR hardware and software capture utilities.
This tool functions in two pieces: Software and hardware capture.
Thermal cameras have proliferated to a point that people are buying them for use as tech toys, made possible thanks to new prices nearer $200 than the multi-thousand thermal imaging cameras that have long been the norm. Using a thermal camera that connects to a mobile phone eliminates a lot of the cost for such a device, relying on the mobile device’s hardware for post-processing and image cleanup that make the cameras semi-useful. They’re not the most accurate and should never be trusted over a dedicated, proper thermal imaging device, but they’re accurate enough for spot-checking and rapid concepting of testing procedures.
Unfortunately, we’ve seen them used lately as hard data for thermal performance of PC hardware. For all kinds of reasons, this needs to be done with caution. We urged in our EVGA VRM coverage that thermal imaging was not perfect for the task, and later stuck thermal probes directly to the card for more accurate measurements. Even ignoring the factors of emission, transmission, and reflection (today’s topics), using thermal imaging to take temperature measurements of core component temperatures is methodologically flawed. Measuring the case temperature of a laptop or chassis tells us nothing more than that – the temperature of the surface materials, assuming an ideal black body with an emissivity close to 1.0. We’ll talk about that contingency momentarily.
But even so: Pointing a thermal imager at a perfectly black surface and measuring its temperature is telling us the temperature of the surface. Sure, that’s useful for a few things; in laptops, that could be determining if case temperature exceeds the skin temp specification of a particular manufacturer. This is good for validating whether a device might be safe to touch, or for proving that a device is too hot for actual on-lap use. We could also use this information as troubleshooting to help us determine where hotspots are under the hood, potentially useful in very specific cases.
That doesn’t, however, tell us the efficacy of the cooling solution within the computer. For that, we need software to measure the CPU core temperatures, the GPU diode, and potentially other components (PCH and HDD/SSD are less popular, but occasionally important). Further analysis would require direct thermocouple probes mounted to the SMDs of interest, like VRM components or VRAM. Neither of these two examples are equipped with internal sensors that software, and even the host GPU, is capable of reading.
Two EVGA GTX 1080 FTW cards have now been run through a few dozen hours of testing, each passing through real-world, synthetic, and torture testing. We've been following this story since its onset, initially validating preliminary thermal results with thermal imaging, but later stating that we wanted to follow-up with direct thermocouple probes to the MOSFETs and PCB. The goal with which we set forth was to create the end-all, be-all set of test data for VRM thermals. We have tested every reasonable scenario for these cards, including SLI, and have even intentionally attempted to incinerate the cards by running ridiculous use scenarios.
Thermocouples were attached directly to the back-side of the PCB (hotspot previously discovered), the opposing MOSFET (#2, from bottom-up), and MOSFET #7. The seventh and second MOSFETs are those which seem to be most commonly singed or scorched in user photos of allegedly failed EVGA 10-series ACX 3.0 cards, including the GTX 1060 and GTX 1070. Our direct probe contact to these MOSFETs will provide more finality to testing results, with significantly greater accuracy and understanding than can be achieved with a thermal imager pointed at the rear-side of the PCB. Even just testing with a backplate isn't really ideal with thermal cameras, as the emissivity of the metal begins to make for questionable results -- not to mention the fact that the plate visually obstructs the actual components. And, although we did mirror EVGA & Tom's DE's testing methodology when checking the impact of thermal pads on the cards, even this approach is not perfect (it does turn out that we were pretty damn accurate, though, but it's not perfect. More on that later.). The pads act as an insulator, again hiding the components and assisting in the spread of heat across a larger surface area. That's what they're designed to do, of course, but for a true reading, we needed today's tests.
Virtual reality has begun its charge to drive technological development for the immediate future. For better or worse, we've seen the backpacks, the new wireless tether agents, the "VR cases," the VR 5.25" panels -- it's all VR, all day. We still believe that, although the technology is ready, game development has a way to travel yet -- but now is the time to start thinking about how VR works.
NVIDIA's Tom Petersen, Director of Technical Marketing, recently joined GamersNexus to discuss the virtual reality pipeline and the VR equivalent to frametimes, stutters, and tearing. Petersen explained that a "warp miss" or "drop frame" (both unfinalized terminology) are responsible for an unpleasant experience in VR, but that the consequences are far worse for stutters given the biology involved in VR.
In the video below, we talk with Petersen about the VR pipeline and its equivalencies to a traditional game refresh pipeline. Excerpts and quotations are below.
We're working on finalizing our validation of the EVGA VRM concerns that arose recently, addressed by the company with the introduction of a new VBIOS and optional thermal pad solution. We tested each of these updates in our previous content piece, showing a marked improvement from the more aggressive fan speed curve.
Now, that stated, we still wanted to dig deeper. Our initial testing did apply one thermocouple to the VRM area of the video card, but we weren't satisfied with the application of that probe. It was enough to validate our imaging results, which were built around validating Tom's Hardware DE's results, but we needed to isolate a few variables to learn more about EVGA's VRM.
We toured Corsair's new offices about a year ago, where we briefly posted about some of the validation facilities and the then-new logo. Now, with the offices fully populated, we're revisiting to talk wind tunnels, thermal chambers, and test vehicles for CPU coolers and fans. Corsair Thermal Engineer Bobby Kinstle walks us through the test processes for determining on-box specs, explaining hundreds of thousands of dollars worth of validation equipment along the way.
This relates to some of our previous content, where we got time with a local thermal chamber to validate our own methodology. You might also be interested to learn about when and why we use delta values for cooler efficacy measurements, and why we sometimes go with straight diode temperatures (like thermal limits on GPUs).
Video here (posting remotely -- can't embed): https://www.youtube.com/watch?v=Mf1uI2-I05o
We moderate comments on a ~24~48 hour cycle. There will be some delay after submitting a comment.