Test Methodology: Standard & Gaming
Windows 10 b970 was used for testing. R7 CPUs have been retested, as have some i7 and i5 CPUs. Game settings were manually controlled for the DUT. All games were run at the presets defined in their respective charts; all other game settings are defined in the respective game benchmarks, which we publish separately from GPU and CPU reviews.
Average FPS, 1% low, and 0.1% low framerates are measured. We do not report maximum or minimum FPS, as we consider those numbers to be pure outliers. Instead, we take an average of the lowest 1% of results (1% low) to show real-world, noticeable dips, and an average of the lowest 0.1% of results (0.1% low) to show severe frametime spikes.
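For clarity, here's a minimal sketch of how these metrics can be derived from a frametime log; the exact GN tooling is in-house, so the function below is illustrative only:

```python
# Illustrative sketch: deriving AVG FPS, 1% low, and 0.1% low from a frametime log (ms).
# Not GN's in-house tooling; assumes frametimes came from a capture tool such as PresentMon.

def fps_metrics(frametimes_ms):
    """Return (average FPS, 1% low FPS, 0.1% low FPS) for one test pass."""
    fps = sorted(1000.0 / ft for ft in frametimes_ms)  # ascending: slowest frames first

    def avg(values):
        return sum(values) / len(values)

    one_pct_low = avg(fps[: max(1, len(fps) // 100)])      # average of the slowest 1%
    point1_pct_low = avg(fps[: max(1, len(fps) // 1000)])   # average of the slowest 0.1%
    return avg(fps), one_pct_low, point1_pct_low
```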
Hardware Used
Core Components (Unchanging)
- NZXT 1200W Hale90v2
- For DDR4 platforms: Corsair Vengeance LPX 32GB 3200MHz*
- * For initial Ryzen DDR4: Corsair Vengeance LPX 3000MHz clocked to 2933MHz (see Page 2)
- For Ryzen R5 CPUs, Retested R7 CPUs: GEIL EVO X 3200MHz memory (clocked to 3200MHz)
- Premiere & Blender tests do not exceed 8GB DRAM. Capacity is a non-issue for our testing, so long as it is >16GB
- For DDR3 platforms: HyperX Savage 32GB 2400MHz
- Intel 730 480GB SSD
- Open Air Test Bench
- Cooler #1 (Air): Be Quiet! Dark Rock 3
- Cooler #2 (Cheap liquid): Asetek 570LC w/ Gentle Typhoon fan
- Cooler #3 (High-end): Kraken X62
- EVGA GTX 1080 FTW1
Note: fan and pump settings are configured on a per-test basis.
X299 Platform:
- ASUS X299 Prime Deluxe with latest EFI (0402)
- Corsair Vengeance LPX 3200MHz
AM4 Platform:
- GEIL EVO X 3200MHz CL16 (R5s, R7 1700, R7 1800X)
- GSkill Trident Z 3200MHz CL14 (R7 1700X)
Used for R7 1800X, R7 1700X, R7 1700.
Z270 Platforms:
- Gigabyte Aorus Gaming 7 (primary)
- MSI Gaming Pro Carbon (secondary - for thermal validation)
- i7-7700K (x2) samples from motherboard vendors
Both used for the 7700K.
Z170 Platform:
- MSI Gaming M7
- i7-6700K retail
Z97 Platform:
- Gigabyte Z97X G1 WIFI-BK
- i7-4790K
Z77 Platform:
- MSI GD65 Z77
- i7-2600K
DX12 games are benchmarked using PresentMon onPresent, with further data analysis handled by GN-made tools.
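The GN-made analysis tools aren't public, but as a rough sketch of the first step, frametimes can be pulled out of a PresentMon CSV like so (assuming the standard MsBetweenPresents and Application columns):

```python
import csv

# Rough sketch: extract frametimes from a PresentMon CSV for downstream analysis.
# Assumes the standard "MsBetweenPresents" and "Application" columns; the GN-made
# analysis tools themselves are in-house and not reproduced here.

def load_frametimes(csv_path, application=None):
    frametimes_ms = []
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            if application and row.get("Application") != application:
                continue  # skip rows logged for other processes
            frametimes_ms.append(float(row["MsBetweenPresents"]))
    return frametimes_ms
```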
Note: We'd like to add the i5, i3, and FX CPUs, but this was enough for now. We'll add those as we expand into coverage of Zen or i5 Kaby Lake products.
Thermal Test Methodology
Thermal measurement on Ryzen is not necessarily trivial, as most software is incorrect or inaccurate in these early days. See this page from our 1800X review for further information, or AMD’s own statement.
Power testing is simply done at the wall. We do not presently tap into the rails, and openly identify this as our weakest point in current test methodology. This is something we will eventually work toward revamping. For now, we use wall meters to determine a power delta in A/B tests.
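As a simple illustration of the A/B delta approach (the numbers below are hypothetical, not measured results):

```python
# Hypothetical example of a wall-meter A/B delta: same workload, two configurations.
# Wall readings include PSU efficiency losses and the whole platform, so we compare
# deltas between configs rather than treating absolutes as component power draw.

config_a_watts = 188   # hypothetical wall reading, config A under load
config_b_watts = 142   # hypothetical wall reading, config B under the same load

delta_watts = config_a_watts - config_b_watts   # 46W difference between the two configs
```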
Test Methodology: VR Benchmarking
We previously ran an introductory piece on the behind-the-scenes process of figuring out VR testing, something we started in September. To go through some of the basics:
Two rigs are established: a game benchmark machine and a hardware capture machine, the latter of which must meet high specifications for storage and for the incoming data from the split headset signal. The configurations are as follows:
Intel VR Game Test Bench

| Component | Part | Provided by | Price |
|---|---|---|---|
| CPU | Intel i7-7700K | GamersNexus | $345 |
| Cooler | Asetek 570LC w/ Gentle Typhoon | Asetek, GamersNexus | - |
| Motherboard | Gigabyte Z270 Gaming 7 | Gigabyte | $230 |
| RAM | Corsair Vengeance LPX 3200MHz | Corsair | $135 |
| GPU | GTX 1080 Ti Hybrid | NVIDIA | $700 |
| Storage 1 | Plextor M7V | Plextor | $96 |
| Storage 2 | Crucial MX300 1TB | GamersNexus | $280 |
| PSU | NZXT Hale90 v2 1200W | NZXT | $270 |
| Case | Open-air test bench | GamersNexus | - |
And for AMD:
AMD VR Game Test Bench

| Component | Part | Provided by | Price |
|---|---|---|---|
| CPU | AMD R7 1700 | AMD | $330 |
| Cooler | Asetek 570LC w/ Gentle Typhoon | Asetek, GamersNexus | - |
| Motherboard | Gigabyte Gaming 5 X370 | Gigabyte | $213 |
| RAM | Corsair Vengeance LPX 3000MHz | AMD | $120 |
| GPU | GTX 1080 Ti Hybrid | NVIDIA | $700 |
| Storage 1 | Plextor M7V | Plextor | $96 |
| Storage 2 | Crucial MX300 1TB | GamersNexus | $280 |
| PSU | NZXT Hale90 v2 1200W | NZXT | $270 |
| Case | Open-air test bench | GamersNexus | - |
Our hardware capture system is as follows:
Hardware Capture VR Test Bench

| Component | Part | Provided by | Price |
|---|---|---|---|
| CPU | Intel i7-4790K | GamersNexus | $330 |
| Cooler | Stock | GamersNexus | - |
| Motherboard | Gigabyte Z97X Gaming 7 G1 BK | GamersNexus | $300 |
| RAM | HyperX Genesis 2400MHz | HyperX | - |
| GPU | ASUS GTX 960 Strix 4GB | ASUS | - |
| Storage 1 | Intel 750 SSD 1.2TB | BS Mods | $880 |
| Capture Card | Vision SC-HD4 | NVIDIA | $2,000 |
| PSU | Antec Edge 550W | Antec | - |
| Case | Open-air test bench | GamersNexus | $250 |
The hardware capture system is the most important. We need to sustain heavy IO throughput, and so use a 1.2TB Intel 750 SSD as provided by our friends at BS Mods. The 1.2TB capacity isn't just for show, either: our VR capture files can run upwards of 30-50GB per capture. GamersNexus uses an in-house compression script (programmed by Patrick Lathan & Steve Burke) to compress the files into a playable format for YouTube, while also allowing us to retain them without high archival storage requirements. The files compress down to roughly 200-500MB without perceptibly losing quality for YouTube playback.
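The compression script itself is in-house and not public; purely as an illustration of the approach, an ffmpeg-based re-encode along these lines would take a 30-50GB capture down to a YouTube-friendly size (the codec, CRF, and preset below are assumptions, not GN's actual settings):

```python
import subprocess

# Illustrative only: re-encode a raw VR capture to a YouTube-playable H.264 file.
# The actual GN compression script (by Patrick Lathan & Steve Burke) is in-house;
# the codec, CRF, and preset below are assumptions, not its real settings.

def compress_capture(src, dst):
    subprocess.run(
        [
            "ffmpeg", "-i", src,
            "-c:v", "libx264",   # H.264 video
            "-crf", "20",        # constant-quality target; lower = larger/cleaner
            "-preset", "slow",   # better compression at the cost of encode time
            "-c:a", "aac",       # compress audio as well
            dst,
        ],
        check=True,
    )

compress_capture("capture_raw.avi", "capture_youtube.mp4")
```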
Prior to compression, though, we analyze the files with an extractor tool, which looks at color overlays frame-by-frame to determine (1) whether any frames were dropped by the capture machine (they never are, because our storage device is fast and the $2,000 capture card supports the throughput), and (2) whether any frames were dropped by the game machine. The latter happens when the DUT cannot sustain fluid playback, e.g. when a low-end GPU or CPU gets hammered by the VR application in a way that causes dropped frames, warp misses, or other unpleasant frame output.
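The extractor tool is likewise in-house, but the general idea can be sketched: each captured frame carries a color overlay, so a repeated overlay color across consecutive frames indicates a repeated (dropped) frame. A rough OpenCV sketch under those assumptions, with the overlay position and threshold invented for illustration:

```python
import cv2

# Sketch of overlay checking: if two consecutive captured frames carry the same
# overlay color, the frame was repeated (dropped) somewhere in the chain.
# Overlay position, size, and threshold are assumptions; the GN extractor is in-house.

def count_repeated_overlays(video_path, box=(0, 0, 20, 20)):
    x, y, w, h = box
    cap = cv2.VideoCapture(video_path)
    repeats, prev_color = 0, None
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        # Average color of the overlay region for this frame.
        color = frame[y:y + h, x:x + w].mean(axis=(0, 1))
        if prev_color is not None and abs(color - prev_color).max() < 2.0:
            repeats += 1  # overlay did not change -> repeated frame
        prev_color = color
    cap.release()
    return repeats
```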
The VR gaming machine spits out DisplayPort to our monitor, then HDMI to a splitter box. The splitter box feeds into the capture machine via a splitter cable, then into the capture card. The other output in the splitter box goes either to the headset or to the HTC Vive Link Box, which then goes to the headset & the game machine (USB, audio, display).
In total, roughly 10 cables connect everything in true octopus fashion. The cables must be connected in the correct order to get everything working; no output will reach the HMD if they are connected out of sequence.
Data Interpretation: We’re Still Learning
The gaming machine, meanwhile, runs FCAT VR software capture to intercept frame delivery at the software level, which generates files that look something like this:
Each file contains tens of thousands of cells of data. We feed this data into our own spreadsheets and into FCAT VR, then generate both chart types from that data. The hard part, it turns out, is still data interpretation. We can identify what a “good” and “really bad” experience is in VR, but identifying anything in between is still a challenge. You could drop 100 frames per minute on DUT A and 0 on DUT B, and the experience will be perceptibly and appreciably the same to the end user. After all, 100 dropped frames in a 5,400-interval period is still only about 1.85% of all intervals missed, which isn’t all that bad and is likely not noticeable unless they’re all clumped together and dotted with warp misses.
Defining the Terminology
We still haven’t defined those terms, so let’s do that.
Drop Frame: When the VR runtime takes the prior frame and modifies it to incorporate the latest head position. The VR HMD is reprojecting or adjusting the prior frame, but failing to update animation in time for the next runtime hit. With regard to animation, this is a dropped frame; with regard to user experience, we are updating in a way that avoids inducing user sickness or discomfort (provided there aren’t too many in rapid succession). We can get synthesized frames out of this.
Warp Miss: The VR HMD has missed the refresh interval (90Hz, so every ~11ms +/- 2ms), and doesn’t have time to show a new frame. There is also not enough time to synthesize a new frame. We’ve instead replayed an old frame in its entirety, effectively equivalent to a “stutter” in regular nomenclature. Nothing moves. Animation does not update and head tracking does not update. This is a warp miss, which means that the runtime couldn’t get anything done in time, and so the video driver recasts an old frame with no updates.
Delivered Frame: A frame delivered to the headset successfully (see also: Delivered Synthesized Frame).
Unconstrained FPS: A convenient metric to help extrapolate theoretical performance of the DUT when ignoring the fixed refresh rate (90Hz effective v-sync) of the HMD. This helps bring VR benchmarks back into a realm of data presentation that people are familiar with from “standard” benchmarks, and aids in the transition process. It’s not a perfect metric, and we’re still up in the air about how useful it is; for now, we’re showing it. Unconstrained FPS is calculated as 1000ms / average frametime. This shows what our theoretical frame output would be, given no fixed refresh interval, and helps demonstrate high-end device advantages over DUTs which may otherwise appear equivalent in delivered frame output.
Average Frametime: The average time in milliseconds to generate a frame and send it to the HMD. We want this to be low; ideally, this is below 11ms.
Interval Plot: A type of chart we’re using to better visualize frame performance over the course of the headset’s refresh intervals. In a 60-second test, there are 5400 refresh intervals.
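To make the arithmetic behind these terms concrete, here's a small illustrative sketch of the interval math and the unconstrained FPS calculation (the field names and structure are ours, not FCAT VR's):

```python
# Illustrative sketch of the interval math used above; FCAT VR and GN-made tools
# handle the real analysis, so names and structure here are assumptions.

REFRESH_HZ = 90
INTERVAL_MS = 1000.0 / REFRESH_HZ        # ~11.1ms per refresh interval
# A 60-second test contains 90 * 60 = 5,400 refresh intervals.

def unconstrained_fps(frametimes_ms):
    """Theoretical FPS with no fixed refresh interval: 1000ms / average frametime."""
    return 1000.0 / (sum(frametimes_ms) / len(frametimes_ms))

def pct_intervals_missed(dropped_frames, test_seconds=60):
    """Share of refresh intervals affected, e.g. 100 drops / 5,400 intervals = ~1.85%."""
    return 100.0 * dropped_frames / (REFRESH_HZ * test_seconds)
```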
Warp misses are miserable experiences, particularly with multiple in a big clump. Warp misses intermixed with drop frames illustrate that the hardware cannot keep up with the game, and so the user experiences VR gameplay that could feel unpleasant physiologically in addition to mechanically.
Learn more about these definitions here, in our previous content. We also spoke with Tom Petersen about these terms.
Test Methodology: Streaming Benchmarks
Stream benchmarking is something that we’re still working on developing. This is our first implementation, and as such, it’s not fully conclusive or indicative of all game streaming performance – but it’s a start.
We test two types of streaming: (1) Twitch only, and (2) Twitch & YouTube simultaneously. We’ll add more to this soon.
For Twitch, the following settings are used:
- 1080p60
- 6Mbps
- H264 “Faster”
For YouTube, the below:
- 1080p60
- 10Mbps
- H264 “Faster”
All of this is fed through OBS and uploaded on a 40Mbps up connection, which is sufficient for the task. We log game performance during streaming, log stream performance, and then combine and align all of the data in spreadsheets later. It’s a manual process right now, but that’ll improve with time.
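As a rough sketch of what that alignment step might look like once automated (the current process is manual spreadsheet work, and the file layouts and column names below are hypothetical):

```python
import pandas as pd

# Sketch: aligning a game performance log with a stream/encoder log on timestamps.
# Our current process is manual spreadsheet work; file layouts and column names
# here ("time_s", etc.) are hypothetical.

game = pd.read_csv("game_frametimes.csv")    # e.g. time_s, frametime_ms
stream = pd.read_csv("stream_encode.csv")    # e.g. time_s, encode_ms, dropped

merged = pd.merge_asof(
    game.sort_values("time_s"),
    stream.sort_values("time_s"),
    on="time_s",
    direction="nearest",
)
merged.to_csv("aligned.csv", index=False)
```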
For now, we’re just using DiRT: Rally on a looping benchmark for 240s. As we continue to improve our understanding of game broadcasting, we’ll add more titles. For now, we wanted something that won’t receive updates and that would be reproducible. The game’s settings are 1080p, Windowed, Ultra, 4xMSAA, and with Advanced Blending enabled.
Three test passes are performed for each benchmark, producing enough data to determine the mean and to decide whether additional passes are needed to investigate outliers. Thus far, we haven’t had much of a problem with that scenario. It’s a possibility, though, particularly when relying on the network for benchmarking, hence the multi-pass approach. Our logs presently contain frametimes, framerate, and two metrics that we’re calling “delayframes” and “dropframes” (the latter terminology is borrowed from our VR benchmarking).
“Delayframes” are when the CPU couldn’t complete the encode in time to hit the 16.67ms refresh window on our 60FPS stream, but was close enough that the frame followed up soon after the window passed. As we understand it now, this seems to happen most frequently when the CPU has the headroom to perform the encode, but is bumping into a GPU limitation, e.g. a handshake mismatch between the GPU and CPU at that particular instant.
“Dropframes” are when the CPU cannot keep up with the encoding and gaming workloads simultaneously. Because the software prioritizes local playback over streamed playback, the CPU will drop frames intended for delivery to the stream in order to keep pace on the host system. An extreme instance might be dropping every other frame, which would look comparable to micro-stutter in incompatible multi-GPU gaming scenarios.
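A hedged sketch of how those two buckets can be separated in post-processing, assuming a per-frame record of when (or whether) each encode completed relative to the 16.67ms window; the log fields and thresholds are ours, for illustration only:

```python
# Sketch: classifying encoder output against the 16.67ms window of a 60FPS stream.
# A frame that finishes shortly after its window is counted as a "delayframe";
# a frame the encoder never delivers to the stream is a "dropframe".
# Log fields and thresholds are illustrative assumptions, not our production tooling.

WINDOW_MS = 1000.0 / 60.0  # ~16.67ms per streamed frame

def classify_frames(encode_times_ms):
    """encode_times_ms: per-frame encode completion time in ms, or None if never delivered."""
    delayframes = dropframes = 0
    for t in encode_times_ms:
        if t is None:
            dropframes += 1    # encoder gave up; frame never reached the stream
        elif t > WINDOW_MS:
            delayframes += 1   # late, but followed up soon after the window passed
    return delayframes, dropframes
```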
We have a lot more to learn and do here. It’s not complete, but it’s a good start.