Update: Square Enix is aware of this issue, has acknowledged its existence, and is working on an update for launch.
Although we don't believe this to be intentional, the Final Fantasy XV benchmark is among the most misleading we’ve encountered in recent history. This is likely a result of restrictive development timelines, a resistance to delaying the product launch and, ultimately, developers seeing this as "just" a benchmark. That said, the benchmark is what people use to get an early idea of how their graphics cards will perform in the game, and from what we've seen, it doesn't accurately reflect reality. Not only does the benchmark lack technology shown in tech demonstrations (such as strand deformation, which we hope will be added later), but it also takes performance hits for graphics settings that never materialize as visual fidelity improvements. Much of this stems from GameWorks settings, so we've been in contact with nVidia over these findings for the past few days.
As we discovered after hours of testing the utility, the FFXV benchmark is disingenuous in its execution: it renders load-intensive objects outside the camera frustum, resulting in a lower reported performance metric. We accessed the hexadecimal graphics settings to tune GameWorks options manually, a process made easier by exposing .INI files via a DLL, then later entered noclip mode to dig into some performance anomalies. On our own, we’d discovered that toggling HairWorks on or off had a performance impact in areas where no hair existed. The only reason this would happen, aside from anomalous bugs or improper use of HairWorks (also likely, and not mutually exclusive), would be if the single hair-endowed creature in the benchmark were drawn at all times.
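What the benchmark appears to skip is ordinary view-frustum culling: anything whose bounding volume lies entirely outside the camera's view volume should never be submitted for drawing. The Python sketch below is a generic illustration of that idea, not Square Enix's code; the camera convention (origin, looking down +Z), the `Plane` class, the bounding-sphere test, and the object names are all our own assumptions.

```python
import math
from dataclasses import dataclass


@dataclass
class Plane:
    """Plane n·p + d = 0 with a unit normal n pointing into the frustum."""
    nx: float
    ny: float
    nz: float
    d: float

    def signed_distance(self, x: float, y: float, z: float) -> float:
        return self.nx * x + self.ny * y + self.nz * z + self.d


def make_frustum(fov_y_deg: float, aspect: float, near: float, far: float) -> list[Plane]:
    """Six planes of a symmetric perspective frustum (assumed: camera at origin, looking down +Z)."""
    half_v = math.radians(fov_y_deg) / 2
    half_h = math.atan(math.tan(half_v) * aspect)
    cv, sv = math.cos(half_v), math.sin(half_v)
    ch, sh = math.cos(half_h), math.sin(half_h)
    return [
        Plane(0.0, 0.0, 1.0, -near),  # near plane: z >= near
        Plane(0.0, 0.0, -1.0, far),   # far plane: z <= far
        Plane(ch, 0.0, sh, 0.0),      # left side
        Plane(-ch, 0.0, sh, 0.0),     # right side
        Plane(0.0, cv, sv, 0.0),      # bottom side
        Plane(0.0, -cv, sv, 0.0),     # top side
    ]


def sphere_visible(planes: list[Plane], center: tuple[float, float, float], radius: float) -> bool:
    """A bounding sphere is culled if it lies entirely behind any single plane."""
    x, y, z = center
    return all(p.signed_distance(x, y, z) >= -radius for p in planes)


def objects_to_draw(planes, objects):
    """Keep only (name, center, radius) objects whose bounding spheres touch the frustum."""
    return [name for name, center, radius in objects if sphere_visible(planes, center, radius)]


frustum = make_frustum(fov_y_deg=60, aspect=16 / 9, near=0.1, far=1000)
scene = [
    ("buffalo_near", (0.0, 0.0, 50.0), 3.0),
    ("buffalo_behind_camera", (0.0, 0.0, -200.0), 3.0),
    ("buffalo_miles_away", (0.0, 0.0, 5000.0), 3.0),
]
print(objects_to_draw(frustum, scene))
```

Only `buffalo_near` survives the test here; a renderer doing this would spend no HairWorks cost on the other two, which is exactly the behavior our testing suggests the benchmark lacks.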
The benchmark is rendering creatures that use HairWorks even when they’re miles away from the character and the camera. Again, this was made evident while running benchmarks in a zone with no HairWorks whatsoever (zero, none), at which point we realized, by accessing the game’s settings files, that disabling HairWorks would still improve performance even when no HairWorks objects were on screen. Validation is easy, too: by toggling each setting in the custom graphics settings file, we can individually confirm (1) when Flow is disabled (the fire effect changes), (2) when Turf is disabled (grass strands become textures or, potentially, particle meshes), (3) when Terrain is enabled (the ground at the demo start is visibly tessellated; terrain is pushed down and deformed, while protrusions are pulled up), and (4) when HairWorks is disabled (buffalo hair becomes a planar alpha texture). By testing the default "High," "Standard," and "Low" presets, we're also able to confirm that the game's default GameWorks configuration is set to the following (High settings):
- VXAO: Off
- Shadow libs: Off
- Flow: On
- HairWorks: On
- TerrainTessellation: On
- Turf: On
Benchmarking custom settings that match the above results in performance identical to the benchmark launcher window, validating that these are the stock settings. We must use the custom settings approach because the launcher offers no per-setting customization, and switching between Medium and High changes multiple settings simultaneously. To isolate whether a performance change comes from GameWorks rather than view distance and other settings, we must test each GameWorks setting individually from a baseline configuration of "High."
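A minimal sketch of that one-setting-at-a-time plan, assuming each GameWorks option is a simple on/off toggle. The key names in `HIGH_BASELINE` are hypothetical stand-ins for whatever the settings file actually uses; the values mirror the High defaults listed above.

```python
# Hypothetical key names; values follow the High defaults we confirmed.
HIGH_BASELINE = {
    "VXAO": False,
    "ShadowLibs": False,
    "Flow": True,
    "HairWorks": True,
    "TerrainTessellation": True,
    "Turf": True,
}


def one_at_a_time(baseline: dict[str, bool]) -> list[tuple[str, dict[str, bool]]]:
    """Return the baseline run plus one run per setting, each flipping exactly one toggle."""
    runs = [("baseline (High)", dict(baseline))]
    for name, enabled in baseline.items():
        config = dict(baseline)
        config[name] = not enabled
        runs.append((f"{name} {'On' if config[name] else 'Off'}", config))
    return runs


def percent_change(baseline_fps: float, test_fps: float) -> float:
    """Performance delta of a test run relative to the baseline, in percent."""
    return (test_fps - baseline_fps) / baseline_fps * 100.0


for label, config in one_at_a_time(HIGH_BASELINE):
    print(label, config)
```

Because every test configuration differs from the baseline by a single toggle, any FPS delta from `percent_change` can be attributed to that one setting rather than to LOD or draw distance, which the presets change alongside GameWorks.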
Final Fantasy XV is shaping up to be intensely demanding of GPU hardware, with greater deltas developing between nVidia & AMD devices at High settings than Medium settings. The implication is that, although other graphics settings (LOD, draw distance) change between High and Medium, the most significant change is that of GameWorks options. HairWorks, Shadow libraries, and heavy ground tessellation are all toggled on with High and off with Medium. The ground tessellation is one of the most impactful to performance, particularly on AMD hardware; that said, although nVidia fares better, the 10-series GPUs still struggle with frametime consistency when running all the GameWorks options. This is something we’re investigating further, as we’ve (since writing this benchmark) discovered how to toggle graphics settings individually, something natively disabled in the FFXV benchmark. Stay tuned for that content.
In the meantime, we still have some unique GPU benchmarks and technical graphics analysis for you. One of our value-adds is 1440p benchmarks, which are, for some inexplicable reason, disabled in the native FFXV benchmark client. We automated and scripted our benchmarks, enabling us to run tests at alternative resolutions. Another value-add is that we control our benchmarks; although it is admirable and interesting that Square Enix is collecting and aggregating user benchmark data, that data is also poisoned. The card hierarchy makes little sense at times, and that’s because users run benchmarks with all manner of variables, none of which are accounted for (or even publicly logged) in the FFXV benchmark utility.
Separately, we also confirmed with Square Enix that the graphics settings are the same for all default resolutions, something that we had previously questioned.
We recently bought the MSI GTX 1070 Ti Duke for a separate PC build, and decided we’d go ahead and review the card while at it. The MSI GTX 1070 Ti Duke uses a three-fan cooler, which MSI now seems to be officially calling the “tri-frozr” cooler, and was among the more affordable GTX 1070 Ti cards on the market. That reign has ended as GPU prices have skyrocketed again, but perhaps the card will return to $480. Until then, we’ll write this review assuming that price. Beyond $480, it’s obviously not worth it, just to spell that out right now.
The MSI GTX 1070 Ti Duke has one of the thinner heatsinks of the 10-series cards, and a lot of that comes down to card form factor: The Duke fits in a 2-slot form factor, but runs a three-fan cooler. This mixture necessitates a thin, wide heatsink, which means relatively limited surface area for dissipation, but potentially quieter fans from the three-fan solution.
NOTE: We wrote this review before CES, and card prices have since skyrocketed. Do not buy any 1070 Ti for more than $500. This card was reviewed assuming a $470-$480 price-point; at anything more than that, it's not worth it.
We’re revisiting a topic from July 2017, initially published in the middle of one of last year’s cryptocurrency booms. That topic was our discussion with GPU add-in board partners and PSU makers, where we collected anonymized, aggregate thoughts on cryptomining and its impact on the consumer GPU market. Given the tremendous growth of the cryptocurrency community since then, and the recent explosion of GPU prices to 3-5x their MSRP (depending on whether it’s a primary or secondary seller), we decided it was time to revisit the topic once more.
This information is anonymized and aggregated for a few reasons: One, no one would be able to share their thoughts otherwise, as this isn’t a topic that can be officially approached; two, it allows folks to speak more freely, since any official response would, you can be assured, tread the line of neutrality to the point of being bereft of insight. We spoke to most of the major GPU board partners and some PSU maker representatives, including the original group of folks we spoke with in mid-2017, who are now back to re-evaluate their positions from six months ago.
The goal for today is to trick an nVidia GPU into drawing more power than its Boost 3.0 power budget will allow. The theoretical result is that more power will provide greater clock stability; we won’t necessarily get better overclocks or bigger offsets, but we should stabilize and flatline the frequency, which improves performance overall. Typically, the Boost clock bounces around based on load, power budget, and voltage. We have already eliminated the thermal checkpoint with our Hybrid mod, and must now help eliminate the power budget checkpoint.
This content piece applies fairly broadly across nVidia devices. Although we are using an nVidia Titan V graphics card, priced at $3000, the same practice of shunt resistor shorting can be applied to a 1080 Ti, 1070, 1070 Ti, or other nVidia GPUs.
“Shunts” are in-line resistors with a known input voltage, which ultimately comes from the PCIe connectors or PCIe slot. In this case, we care about the in-line shunt resistors for the PCIe cables. The GPU knows the input voltage at the shunt (12V, as it’s in-line with the power connectors), and it also knows the shunt's resistance (5mΩ). By measuring the small voltage drop across the shunt, the GPU can use Ohm's law (I = V/R) to figure out how much current is being pulled, and then adjust to stay within its power limits. The shunt itself is not a limiter or a “dam,” but a measuring stick used to determine how much current is being used to drive the card.
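The arithmetic behind a shunt mod can be sketched in a few lines. This is a simplified model, not nVidia's sensing firmware: it assumes the GPU always converts the measured drop using the stock 5mΩ value, that power is approximated as rail voltage times current, and that the mod halves the effective resistance (as an equal resistor placed in parallel would; shorting with conductive material lowers it by a less predictable amount). The 10A figure is purely illustrative.

```python
R_STOCK = 0.005   # 5 mΩ shunt, per the text
V_RAIL = 12.0     # 12 V PCIe supply feeding the shunt


def parallel(r1: float, r2: float) -> float:
    """Effective resistance of two resistors in parallel."""
    return r1 * r2 / (r1 + r2)


def reported_power(actual_current_a: float, r_effective: float,
                   r_assumed: float = R_STOCK, v_rail: float = V_RAIL) -> float:
    """Power the card *thinks* it draws through one shunt.

    The sense circuit sees the real voltage drop (I * R_effective), but the
    GPU converts it back to current using the stock resistance it assumes.
    """
    v_drop = actual_current_a * r_effective   # Ohm's law: V = I * R
    inferred_current = v_drop / r_assumed     # I = V / R, with the assumed R
    return v_rail * inferred_current          # P ≈ V_rail * I (the drop itself is tiny)


stock = reported_power(10.0, R_STOCK)                       # unmodified shunt
modded = reported_power(10.0, parallel(R_STOCK, R_STOCK))   # effective resistance halved
print(f"stock: {stock:.1f} W, modded: {modded:.1f} W")
```

With the same real current, the modded card under-reports its draw by half, so it believes it has headroom left in its power budget and stops pulling clocks down as early.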
This content piece is video-centric, but we have a full-length feature article coming tomorrow, focused on shunt shorting, something we have spent the past few days playing around with. For today, however, we point you toward our render rig's GPU diagnostics, where we pull a Maxwell Titan from the machine, try to determine why it's overheating, and show some CLC / AIO permeation testing in the process. Rather than weigh the loops, which makes no sense given the different manufacturing tolerances for the radiators and pumps, we emptied two loops (one new and one old) to see if the older unit's liquid had permeated the tubes. If it had, we'd measure less liquid in the older loop, showing that a year of heavy wear had caused the permeation. You can find out what happened in the video below.
The short of it is that, between the two loops, we saw no meaningful permeation -- we also noted that the pump impellers were still spinning, and that the thermal paste seemed fine. Our next steps will be to remount the CLC and test again.
Fortunately, this GTX 1060 isn't prepped for mass market or DIY consumer adoption -- we've got enough confusing naming as is. The GTX 1060 presently exists in 3GB and 6GB AICs, with the former also containing one fewer SM (or a 10% core reduction). There is also the lesser-known 1060 6GB card with boosted 9Gbps memory speeds, part of a refreshed effort by nVidia and its partners earlier this year. According to Chinese language website Expreview, a new GTX 1060 5GB card is allegedly planned for release in Asian markets, primarily targeted for use in internet cafes and PC bangs. We have not independently verified the story at this time.
From what the story indicates, it seems this particular GTX 1060 model will carry the original 1280 CUDA cores (as opposed to the 1152 FP32 lanes on the 1060 3GB), with the primary differences being a 1GB reduction in capacity and a 160-bit memory interface (down from 192-bit).
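For a rough sense of what the narrower bus means, peak memory bandwidth is the bus width in bytes multiplied by the per-pin data rate. The sketch below applies that to the cards mentioned above. The 5GB card's memory speed has not been reported, so the 8Gbps figure used for it is purely an assumption.

```python
def memory_bandwidth_gbs(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Peak memory bandwidth in GB/s: bytes per transfer across the bus times per-pin data rate."""
    return bus_width_bits / 8 * data_rate_gbps


def core_reduction(full_cores: int, cut_cores: int) -> float:
    """Fraction of CUDA cores removed relative to the full part."""
    return 1 - cut_cores / full_cores


print(memory_bandwidth_gbs(192, 8.0))   # GTX 1060 6GB, 8Gbps GDDR5
print(memory_bandwidth_gbs(192, 9.0))   # GTX 1060 6GB, 9Gbps refresh
print(memory_bandwidth_gbs(160, 8.0))   # rumored 5GB card, *assuming* 8Gbps memory
print(f"{core_reduction(1280, 1152):.0%}")  # 1060 3GB vs. the 1280-core 1060 6GB
```

Under that assumption, the 5GB card would keep the full core count but lose about one-sixth of the 6GB card's memory bandwidth, while the 3GB card keeps the bandwidth and loses 10% of its cores.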
This episode of Ask GN, shipping on Christmas Day, answers a few pertinent questions from the last few weeks: We'll talk about whether we made ROI on the Titan V and whether it makes more sense to buy Ryzen now or wait for Ryzen+/Ryzen2, then dive into the "minor" topics for the segment. Smaller topics include discussion on choosing games for benchmarking -- primarily, why we don't like ROTTR -- and our thoughts on warranty/support reviews, with some reinforced information on vertical GPU mounting. The conclusion focuses on an ancient video card and some GN modmat information.
The embedded video below contains the episode. Timestamps are below that.
The monstrosity shown in our latest GN livestream is what ranked us among the top 10 on several 3DMark world benchmarks. It was a mixture of things, primarily the benefit of having high-end hardware (read: buying your way to the top), but also compensating for limited CPU OC skills with a liquid cooling mod. Our Titan V held high clocks longer than it had any business doing, and that was because of our Titan V Hybrid Mod.
It all comes down to Boost 3.0, as usual, and even AMD’s Vega now behaves similarly. The cards look at their thermal situation, with nVidia targeting 83-84°C as a limiter, and then adjust clocks according to thermal headroom. This is also why there’s no hard guarantee on clock speed: the card functionally “overclocks” (or “downclocks,” depending on perspective) itself to match its thermal budget. If we haven’t exceeded the thermal budget (achievable primarily with AIB partner coolers or with a liquid mod), then we have new budgets to abide by, primarily power and voltage.
We can begin solving for the former with shunt mods, something we’ve done and for which we’ll soon publish data, but we can’t do much more than that. These cards are fairly locked down, including BIOS, and we’re going to be limited to whatever hard mods we can pull off.
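To make the "budgets" idea concrete, here is a deliberately simplified control loop. It is a toy model, not nVidia's actual Boost 3.0 algorithm: the 84°C target comes from the text above, but the power and voltage limits, the 13MHz step size, and the `next_clock` function itself are hypothetical.

```python
# Illustrative numbers only: the 84 °C target comes from the text; the power
# and voltage limits and the 13 MHz step are hypothetical.
LIMITS = {"temp_c": 84.0, "power_w": 250.0, "voltage_v": 1.05}


def next_clock(clock_mhz: float, max_boost_mhz: float, readings: dict[str, float],
               limits: dict[str, float] = LIMITS,
               step_mhz: float = 13.0) -> tuple[float, list[str]]:
    """One tick of a toy boost controller (NOT nVidia's real algorithm).

    If any reading exceeds its budget, drop one clock step; otherwise climb
    one step, capped at max_boost_mhz. Also returns which budgets were exceeded.
    """
    exceeded = [name for name, limit in limits.items() if readings[name] > limit]
    if exceeded:
        return max(clock_mhz - step_mhz, 0.0), exceeded
    return min(clock_mhz + step_mhz, max_boost_mhz), exceeded


# Stock cooler: temperature is the binding budget.
print(next_clock(1800, 1900, {"temp_c": 86.0, "power_w": 230.0, "voltage_v": 1.0}))
# Liquid-cooled: thermals are fine, so power becomes the binding budget.
print(next_clock(1800, 1900, {"temp_c": 55.0, "power_w": 262.0, "voltage_v": 1.0}))
```

In this model, removing the thermal limit with a liquid mod simply hands control to the next budget in line, which is why a shunt mod that hides part of the power draw is the logical next step.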
We’ve previously found unexciting differences of <1% between x16 and x8 PCIe 3.0 arrangements, primarily relying on GTX 1080 Ti GPUs for the testing. There were two things we wanted to overhaul in that test: (1) increase the count of GPUs to at least two, thereby placing greater strain on the PCIe bus (x16/x16 vs. x8/x8), and (2) use more powerful GPUs.
Fortunately, YouTube channel BitsBeTrippin agreed to loan GamersNexus its Titan V, bringing us up to a total count of two cards. We’ll be able to leverage these for determining bandwidth limitations in supported applications; unfortunately, as expected, most applications (particularly games) do not support 2x Titan Vs. Because the Titan V is a scientific/compute card, SLI goes away and is replaced with NVLink. We must therefore rely on explicit multi-GPU via DirectX 12. This means that Ashes of the Singularity supports our test, and it also left us with a list of games that might support testing: Hitman, Deus Ex: Mankind Divided, Total War: Warhammer, Civilization VI, and Rise of the Tomb Raider. None of these games detected both Titan V cards, though, so we only really have Ashes to go off of. That means this test isn’t representative of the whole, but it will give us a good baseline for analysis. Something like GPUPI may further provide a dual-GPU test application.
We also can’t test NVLink, as we don’t have one of the $600 bridges – but our work can be done without a bridge, thanks to explicit multi-GPU in DirectX 12.
It’s time to revisit PCIe bandwidth testing. We’re looking at the point at which a GPU can exceed the bandwidth limitations of the PCIe Gen3 slot, particularly in x8 mode. This comparison includes dual Titan V testing in x8 and x16 configurations, pushing against the roughly 1GB/s-per-lane limit of PCIe Gen3.
Testing PCIe x8 vs. x16 lane arrangements can be done a few ways, including: (1) tape off the physical pins on the PCIe foot of the GPU, thereby forcing x8 mode; (2) switch the motherboard's PCIe generation to Gen2 for half the bandwidth, but potentially introduce variables; (3) use a motherboard with slots that are physically wired for x8 or x16.
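The "1GB/s per lane" figure is a rounded number. A quick calculation from the per-lane transfer rate and line encoding shows where it comes from, and why dropping to Gen2 at x16 lands at roughly the same bandwidth as Gen3 at x8. These are theoretical one-direction numbers; real-world throughput is lower once packet and protocol overhead are included, which this sketch ignores.

```python
# Per-lane signaling rate (GT/s) and line-code efficiency for each PCIe generation.
PCIE_GENS = {
    1: (2.5, 8 / 10),     # 8b/10b encoding
    2: (5.0, 8 / 10),     # 8b/10b encoding
    3: (8.0, 128 / 130),  # 128b/130b encoding
}


def pcie_bandwidth_gbs(gen: int, lanes: int) -> float:
    """Theoretical one-direction link bandwidth in GB/s, ignoring protocol overhead."""
    rate_gts, efficiency = PCIE_GENS[gen]
    return rate_gts * efficiency / 8 * lanes


for gen, lanes in [(3, 16), (3, 8), (2, 16)]:
    print(f"Gen{gen} x{lanes}: {pcie_bandwidth_gbs(gen, lanes):.2f} GB/s")
```

Gen3 works out to about 0.985GB/s per lane, so x16 is roughly 15.75GB/s and x8 roughly 7.88GB/s, while Gen2 x16 gives 8GB/s. That near-match is what makes the Gen2 method a usable stand-in for x8, with the caveat that changing the generation may introduce other variables.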