We’re testing gaming while streaming on the R5 1500X & i5-8400 today, both CPUs that cost the same (MSRP is about $190) and appeal to similar markets. Difficulties stemming from stream benchmarking make it functionally impossible to standardize. CPU changes drastically impact performance during our streaming + gaming benchmarks, which means that each CPU test falls closer to a head-to-head than an overall benchmark. Moving between R5s and R7s, for instance, completely changes the settings required to produce a playable game + stream experience – and that’s good. That’s what we want. The fact that settings have to be tuned nearly on a per-tier basis means that we’re min-maxing what the CPUs can give us, and that’s what a user would do. Creating what is effectively a synthetic test is useful for outright component comparison, but loses resolution as a viable test candidate.
The trouble comes with lowering the bar: As lower-end CPUs are accommodated and tested for, higher-end components perform at lower-than-maximum throughput, but are capped in benchmark measurements. It is impossible, for example, to encode greater than 100% of frames to stream. That will always be a limitation. At this point, you either declare the CPU as functional for that type of encoding, or you constrict performance with heavier duty encoding workloads.
H264 ranges from Ultrafast to Slowest settings, with facets in between identified as Superfast, Veryfast, Faster, Fast, and Medium. As encoding speed approaches the Slow settings, quality enters into “placebo” territory. Quality at some point becomes indistinguishable from faster encoding settings, despite significantly more strain placed on the processor. The goal of the streamer is to achieve a constant framerate output – whether that’s 30FPS or 60FPS – while also maintaining a playable player-side framerate. We test both halves of the equation in our streaming benchmarks, looking at encode output and player output with equal discernment.
We’ve already sent off the information contained in this video to Buildzoid, who has produced a PCB & VRM analysis of the ROG Strix Vega 64 by ASUS. That content will go live within the next few days, and will talk about whether the Strix card manages to outmatch AMD’s already-excellent reference PCB design for Vega. Stay tuned for that.
In the meantime, the below is a discussion of the cooling solution and disassembly process for the ASUS ROG Strix Vega 64 card. For cooling, ASUS is using a similar triple-fan solution that we highly praised in its 1080 Ti Strix model (remarkable for its noise-normalized cooling performance), along with similar heatsink layout.
Learn more here:
We’re winding-down coverage of Vega, at this point, but we’ve got a couple more curiosities to explore. This content piece looks at a mix of clock scalability for Vega across a few key clocks (for core and HBM2), and hopes to constrain for a CU difference, to some extent. We obviously can’t fully control down to the shader level (as CUs carry more than just shaders), but we can get close to it. Note that the video content does generally refer to the V56 & V64 difference as one of shaders, but more than shaders are contained in the CUs.
In our initial AMD shader comparison between Vega 56 and Vega 64, we saw nearly identical performance between the cards when clock-matched to roughly 1580~1590MHz core and 945MHz HBM2. We’re now exploring performance across a range of frequency settings, from ~1400MHz core to ~1660MHz core, and from 800MHz HBM2 to ~1050MHz HBM2.
This content piece was originally written and filmed about ten days ago, ready to go live, but we then decided to put a hold on the content and update it. Against initial plans, we ended up flashing V64 VBIOS onto the V56 to give us more voltage headroom for HBM2 clocks, allowing us to get up to 1020MHz easily on V56. There might be room in there for a bit more of an OC, but 1020MHz proved stable on both our V64 and V56 cards, making it easy to test the two comparatively.
No surprise in that headline, really.
Some of this information is rehashed, but has been bulked-up by alleged AMD slides leaked to Informatica Cero. The slides, which are functionally in “rumor” status, indicate AMD’s code-named Matisse processors as launching in 2019. The Matisse CPUs will carry AMD’s Zen 2 architecture, but aim to continue supporting AM4 platforms. Before that launch, AMD’s Zen Plus iteration is targeted for 2018, allegedly, and will exist primarily as an optimization on the existing Ryzen CPUs. This can be thought of as an analog to the retired Intel tick-tock cadence, with Zen Plus likely targeting frequency tuning.
AMD’s architecture hasn’t generally shown a large gain from increasing CU count between top-tier and second-to-top cards. The Fury and Fury X, for instance, could be made to match with an overclock on the lower-tiered card. Additional gains on the higher-tiered card often amount from the increased power limit and clock, not from a straight shader increase. We’re putting that knowledge to the test on Vega architecture, equalizing the Vega 56 & Vega 64 clocks (and 945MHz HBM2 clocks) to determine how much of a difference emerges from the 4096 shaders on V64 to 3584 shaders on V56. Purely counting shaders, that’s a 14% increase to V64, but like most performance metrics, that won’t result in a linear performance increase.
We were able to crush Vega 64’s performance with our heavily modded Vega 56 card, using powerplay tables and liquid to jump to 1742MHz clock speeds. That's with modding, though, and isn't out-of-box performance -- it also doesn't give us any indication as to shader differences. Going less crazy about overclocking and limiting clocks to matched speeds, we can reveal the shader count difference.
It’s illegal to outright fix prices of products. Manufacturers have varying levels of sway when establishing cost to distributor partners and suggested retail prices, acted on much lower in the chain, and have to produce supply based on expectations of demand. We’ve previously talked about how MDF or other exchanges can be used to inspire retailers to work within some guidelines, but there are limits to the financial and legal extension of those means.
This context in mind, it makes sense that the undertone of discussion pertaining to video card prices – not just AMD’s, but nVidia’s – plants much of the blame squarely on retailers. There’s only so much that AMD and nVidia can do to drive prices at least somewhat close to MSRP. One of those actions is to put out more supply to sate demand but, as we saw during the last mining boom & bust (with emergent ASIC miners), there’s reason for manufacturers to remain hesitant of a major supply commitment. If AMD or nVidia were to place a large order with their fabs, there’d better be some level of confidence that the product will sell. Factory-to-shelf turn-around is a period of months, weeks of which can be shipping (unless opting for prohibitively expensive air freight). A period of months is a wide window. We’ve seen mining markets “crash” and recover in a period of days, or hours, with oft unpredictable frequency and intensity. That’d explain why AMD might be hesitant to issue large orders of older product, like the RX 500 series, to try and meet demand.
While traveling, the major story that unfolded – and then folded – pertained to the alleged unlocking of Vega 56 shaders, permitting the cards to turn into a “Vega 58” or “Vega 57,” depending. This ultimately was due to a GPU-Z reporting bug, and users claiming increases in performance hadn’t normalized for the clock change or higher power budget. Still, the BIOS flash will modify the DPM tables to adjust for higher clocks and permit greater HBM2 voltage to the memory. Of these changes, the latter is the only real, relevant change – clocks can be manually increased on V56, and the core voltage remains the same after a flash. Powerplay tables can be used to bypass BIOS power limits on V56, though a flash to V64 BIOS permits higher power budget.
Even with all this, it’s still impossible (presently) to flash a modified, custom BIOS onto Vega. We tried this upon review of Vega 56, finding that the card was locked-down to prevent modding. This uses an on-die security coprocessor, relegating our efforts to powerplay tables. Those powerplay tables did ultimately prove successful, as we recently published.
Everyone talks game about how they don’t care about power consumption. We took that comment to the extreme, using a registry hack to give Vega 56 enough extra power to kill the card, if we wanted, and a Floe 360mm CLC to keep temperatures low enough that GPU diode reporting inaccuracies emerge. “I don’t care about power consumption, I just want performance” is now met with that – 100% more power and an overclock to 1742MHz core. We've got room to do 200% power, but things would start popping at that point. The Vega 56 Hybrid mod is our most modded version of the Hybrid series to date, and leverages powerplay table registry changes to provide that additional power headroom. This is an alternative to BIOS flashing, which is limited to signed drivers (like V64 on V56, though we had issues flashing V64L onto V56). Last we attempted it, a modified BIOS did not work. Powerplay tables do, though, and mean that we can modify power target to surpass V56’s artificial power limitation.
The limitation on power provisioned to the V56 core is, we believe, fully to prevent V56 from too easily outmatching V64 in performance. The card’s BIOS won’t allow greater than 300-308W down the PCIe cables natively, even though official BIOS versions for V64 cards can support 350~360W. The VRM itself easily sustains 360W, and we’ve tested it as handling 406W without a FET popping. 400W is probably pushing what’s reasonable, but to limit V56 to ~300W, when an additional 60W is fully within the capabilities of the VRM & GPU, is a means to cap V56 performance to a point of not competing with V64.
We fixed that.
AMD’s CU scaling has never been that impacting to performance – clock speed closes most gaps with AMD hardware. Even without the extra shaders of V64, we can outperform V64’s stock performance, and we’ll soon find out how we do versus V64’s overclocked performance. That’ll have to wait until after PAX, but it’s something we’re hoping to further study.
UPDATE: We have run new CPU benchmarks for the launch of this game. Please view the Destiny 2 launch CPU benchmarks here.
Our Destiny 2 GPU benchmark was conducted alongside our CPU benchmark, using many of the same learnings from our research for the GPU bench. For GPU testing, we found Destiny 2 to be remarkably consistent between multiplayer and campaign performance, scaling all the way down to a 1050 Ti. This remained true across the campaign, which performed largely identically across all levels, aside from a single level with high geometric complexity and heavy combat. We’ll recap some of that below.
For CPU benchmarking, GN’s Patrick Lathan used this research (starting one hour after the GPU bench began) to begin CPU tests. We ultimately found more test variance between CPUs – particularly at the low-end – when switching between campaign and multiplayer, and so much of this content piece will be dedicated to the research portion behind our Destiny 2 CPU testing. We cannot yet publish this as a definitive “X vs. Y CPU” benchmark, as we don’t have full confidence in the comparative data given Destiny 2’s sometimes nebulous behaviors.
For one instance, Destiny 2 doesn’t utilize SMT with Ryzen, producing utilization charts like this:
Since AMD’s high-core-count Ryzen lineup has entered the market, there seems to be an argument in every comment thread about multitasking and which CPUs handle it better. Our clean, controlled benchmarks don’t account for the demands of eighty browser tabs and Spotify running, and so we get constant requests to do in-depth testing on the subject. The general belief is that more threads are better able to handle more processes, a hypothesis that would increasingly favor AMD.
There are a couple reasons we haven’t included tests like these all along: first, “multitasking” means something completely different to every individual, and second, adding uncontrolled variables (like bloatware and network-attached software) makes tests less scientific. Originally, we hoped this article would reveal any hidden advantages that might emerge between CPUs when adding “multitasking” to the mix, but it’s ended up as a thorough explanation of why we don’t do benchmarks like this. We’re using the R3 1200 and G4560 to primarily run these trials.
This is the kind of testing we do behind-the-scenes to build a new test plan, but often don’t publish. This time, however, we’re publishing the trials of finding a multitasking benchmark that works. The point of publishing the trials is to demonstrate why it’s hard to trust “multitasking” tests, and why it’s hard to conduct them in a manner that’s representative of actual differences.
In listening to our community, we’ve learned that a lot of people seem to think Discord is multitasking, or that a Skype window is multitasking. Here’s the thing: If you’re running Discord and a game and you’re seeing an impact to “smoothness,” there’s something seriously wrong with the environment. That’s not even remotely close to enough of a workload to trouble even a G4560. We’re not looking at such a lightweight workload here, and we’re also not looking at the “I keep 100 tabs of Chrome open” scenarios, as that’s wholly unreliable given Chrome’s unpredictable caching and behaviors. What we are looking at is 4K video playback while gaming and bloatware while gaming.
In this piece, the word “multitasking” will be used to describe “running background software while gaming.” The term "bloatware" is being used loosely to easily describe an unclean operating system with several user applications running in the background.
We moderate comments on a ~24~48 hour cycle. There will be some delay after submitting a comment.