The goal for today is to trick an nVidia GPU into drawing more power than its Boost 3.0 power budget allows. In theory, more power provides greater clock stability: we won’t necessarily get better overclocks or bigger offsets, but we should stabilize and flatline the frequency, which improves performance overall. Typically, the Boost clock bounces around based on load, power budget, or voltage. We have already eliminated the thermal checkpoint with our Hybrid mod, and must now help eliminate the power budget checkpoint.
This content piece is relatively agnostic toward nVidia devices. Although we are using an nVidia Titan V graphics card, priced at $3000, the same practice of shunt resistor shorting can be applied to a 1080 Ti, 1070, 1070 Ti, or other nVidia GPUs.
“Shunts” are in-line resistors with a known input voltage, which ultimately comes from the PCIe connectors or PCIe slot. In this case, we care about the in-line shunt resistors for the PCIe cables. The GPU knows the input voltage at the shunt (12V, as it’s in-line with the power connectors), and it also knows the shunt’s resistance (5mΩ). By measuring the small voltage drop across the shunt, the GPU can calculate how much current is being pulled (current = voltage drop ÷ resistance), multiply that by the 12V input to get power, and then adjust to stay within its power limit. The shunt itself is not a limiter or a “dam,” but a measuring stick used to determine how much current is being used to drive the card.
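The measurement described above is plain Ohm’s law, and it also shows why a shunt mod works. The sketch below is a minimal illustration, not nVidia’s firmware logic: it assumes a 5mΩ stock shunt, an exact 12V input, and a monitor that always divides the sensed drop by the stock 5mΩ. The function names are hypothetical, and the second 5mΩ resistor in parallel is a hypothetical example of lowering the shunt’s effective resistance; shorting with other materials lowers it by a less predictable amount.

```python
def parallel(r1_ohm: float, r2_ohm: float) -> float:
    """Effective resistance of two resistors wired in parallel."""
    return (r1_ohm * r2_ohm) / (r1_ohm + r2_ohm)


def reported_power_w(actual_current_a: float,
                     real_shunt_ohm: float,
                     assumed_shunt_ohm: float = 0.005,
                     rail_v: float = 12.0) -> float:
    """Power a shunt monitor would calculate for one 12V input (hypothetical model).

    The real current produces a drop of I * R_real across the shunt, but the
    monitor divides that drop by the resistance it assumes (stock 5 mOhm).
    """
    v_drop = actual_current_a * real_shunt_ohm      # Ohm's law: V = I * R
    sensed_current_a = v_drop / assumed_shunt_ohm   # current the GPU believes
    return sensed_current_a * rail_v                # P = V * I


# Stock: 12.5 A through an unmodified 5 mOhm shunt on a 12 V input.
stock_w = reported_power_w(12.5, 0.005)
# Hypothetical mod: a second 5 mOhm resistor in parallel, 2.5 mOhm effective.
modded_w = reported_power_w(12.5, parallel(0.005, 0.005))
print(f"stock reports {stock_w:.1f} W, modded reports {modded_w:.1f} W")
# prints: stock reports 150.0 W, modded reports 75.0 W
```

The card draws 150W in both cases, but the modded card reports half of it, so it keeps boosting until its reported power hits the limit (or until another limit intervenes).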
The monstrosity shown in our latest GN livestream is what ranked us among the top 10 in several 3DMark world benchmarks. That ranking came from a mixture of things: primarily the benefit of having high-end hardware (read: buying our way to the top), but also compensating for limited CPU OC skills with a liquid cooling mod. Our Titan V held high clocks longer than it had any business doing, and that was because of our Titan V Hybrid Mod.
It all comes down to Boost 3.0, as usual, and even AMD’s Vega now behaves similarly. The cards look at their thermal situation, with nVidia targeting 83-84C as a limiter, and then adjust clocks according to thermal headroom. This is also why there’s no hard guarantee on clock speed: the card functionally “overclocks” (or “downclocks,” depending on perspective) itself to match its thermal budget. If we stay under the thermal budget – achievable primarily with AIB partner coolers or with a liquid mod – then we run into new budgets to abide by, primarily power and voltage.
We can begin solving for the former with shunt mods, something we’ve done and for which we’ll soon publish data, but we can’t do much more than that. These cards are fairly locked down, including BIOS, and we’re going to be limited to whatever hard mods we can pull off.
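To make the “budgets” idea concrete, here is a toy model, assuming each budget can be expressed as a clock ceiling at a given moment. It is not Boost 3.0’s actual algorithm; the `Limits` class, the `sustained_clock` function, and all MHz values are hypothetical. It only illustrates why removing the thermal limit exposes the power limit next.

```python
from dataclasses import dataclass


@dataclass
class Limits:
    """Hypothetical clock ceilings (MHz) implied by each budget at one instant."""
    thermal_mhz: float
    power_mhz: float
    voltage_mhz: float


def sustained_clock(limits: Limits) -> tuple[float, str]:
    """Return the clock the card could hold and which budget binds it.

    Toy model only: the card runs at the lowest ceiling, so raising one
    budget helps only until a different budget becomes the binding one.
    """
    ceilings = {
        "thermal": limits.thermal_mhz,
        "power": limits.power_mhz,
        "voltage": limits.voltage_mhz,
    }
    binding = min(ceilings, key=ceilings.get)
    return ceilings[binding], binding


# Illustrative numbers, not measurements.
stock_cooler = Limits(thermal_mhz=1500, power_mhz=1650, voltage_mhz=1850)
liquid_mod = Limits(thermal_mhz=1950, power_mhz=1650, voltage_mhz=1850)
print(sustained_clock(stock_cooler))  # (1500, 'thermal')
print(sustained_clock(liquid_mod))    # (1650, 'power')
```

In this model, a shunt mod corresponds to raising `power_mhz`, after which voltage would become the binding ceiling.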
We’ve previously found unexciting differences of <1% between x16 and x8 PCIe 3.0 arrangements, primarily relying on GTX 1080 Ti GPUs for the testing. There were two things we wanted to overhaul on that test: (1) increase the count of GPUs to at least two, thereby placing greater strain on the PCIe bus (x16/x16 vs. x8/x8), and (2) use more powerful GPUs.
Fortunately, YouTube channel BitsBeTrippin agreed to loan GamersNexus its Titan V, bringing us up to a total count of two cards. We’ll be able to leverage these for determining bandwidth limitations in supported applications; unfortunately, as expected, most applications (particularly games) do not support 2x Titan Vs. Because the Titan V is a scientific/compute card, SLI goes away and is replaced with NVLink. We must therefore rely on explicit multi-GPU via DirectX 12. Ashes of the Singularity supports that, and we also drew up a list of games that might support testing: Hitman, Deus Ex: Mankind Divided, Total War: Warhammer, Civilization VI, and Rise of the Tomb Raider. None of these games detected both Titan V cards, though, so we only really have Ashes to go off of. That means this test isn’t representative of the whole, but it will give us a good baseline for analysis. Something like GPUPI may further provide a dual-GPU test application.
We also can’t test NVLink, as we don’t have one of the $600 bridges – but our work can be done without a bridge, thanks to explicit multi-GPU in DirectX 12.
It’s time to revisit PCIe bandwidth testing. We’re looking at the point at which a GPU can exceed the bandwidth limitations of the PCIe Gen3 slot, particularly when in x8 mode. This comparison includes dual Titan V testing in x8 and x16 configurations, pushing against the roughly 1GB/s-per-lane limit of the PCIe slots.
Testing PCIe x8 vs. x16 lane arrangements can be done a few ways, including: (1) tape off the physical pins on the PCIe foot of the GPU, thereby forcing x8 mode; (2) switch the motherboard’s PCIe generation to Gen2 for half the bandwidth, though this potentially introduces other variables; (3) use a motherboard with slots that are physically wired for x8 or x16.
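The per-lane figure comes from each generation’s signaling rate and line encoding. A quick sketch of that arithmetic, using the published PCIe 2.0 (5 GT/s, 8b/10b) and PCIe 3.0 (8 GT/s, 128b/130b) rates; the function name is hypothetical, and it ignores packet and protocol overhead, so real-world throughput will be somewhat lower:

```python
# Per-lane signaling rate (GT/s) and line-code efficiency per PCIe generation.
PCIE_GENS = {
    2: (5.0, 8 / 10),      # PCIe 2.0: 5 GT/s, 8b/10b encoding
    3: (8.0, 128 / 130),   # PCIe 3.0: 8 GT/s, 128b/130b encoding
}


def link_bandwidth_gbps(gen: int, lanes: int) -> float:
    """Approximate one-direction payload bandwidth in GB/s.

    Ignores packet/protocol overhead, so measured throughput is lower.
    """
    gt_per_s, efficiency = PCIE_GENS[gen]
    return gt_per_s * efficiency * lanes / 8  # 8 bits per byte


for gen, lanes in [(3, 16), (3, 8), (2, 16)]:
    print(f"Gen{gen} x{lanes}: {link_bandwidth_gbps(gen, lanes):.2f} GB/s")
```

Gen3 x8 works out to roughly 7.88GB/s per direction versus 8.00GB/s for Gen2 x16, which is why dropping to Gen2 approximates x8 bandwidth, with the caveat noted in option (2).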
Hardcore overclocker "Buildzoid" just finished his VRM and PCB analysis of the Titan V, released on the GN channel moments ago. The Titan V uses a 16-phase VRM from nVidia with an interesting design, including some "mystery" in-line phases that we think are used to drop 12V. This VRM is one of the best that nVidia has built on a 'reference' card, which makes sense, seeing as there won't be other Titan V cards from board partners. We do think the cooling solution needs work, and we've done a hybrid mod to fix that, but the VRM and PCB put us in a good place for heavier modding, including shunt modding.
Shunt modding is probably the most interesting, as it should trick the card's power regulation into allowing more power, which gives us a bit more headroom for overclocking. Buildzoid talks about this mod during the video, for anyone willing to attempt it. We may attempt the mod on our own card.
We took our nVidia Titan V Volta card apart when we first received it, following our gaming benchmarks, and are now embarking on a mission to take some Top 10 scores in HWBot Firestrike rankings. Admittedly, we can only get close to the top 10 because of early access – we bought the card early, and so it’s a bit of an unfair advantage – but we’re confident that the top 10 slots will soon belong entirely to the XOC community.
For now, though, we can have a moment of glory. If only a moment.
Getting there will require better cooling, as we just aren’t as good at CPU overclocking as some of the others in the top 10. To make up for our skill and LN2 deficit, we can throw more cooling at the Titan V and put up a harder fight. Liquid cooling the V is the first step, and will help us stabilize higher clocks at lower temperatures. Volta, like Pascal, increases its clock (and the stability of that clock) as the GPU core temperature decreases. Driving temperatures below 60C will help stability tremendously, and driving them below 40C – if possible – will be even better. We’ll see how far we get. Our Top 10 efforts will be livestreamed at around 5 or 6PM EST today, December 16, 2017.
This test is another in a series of studies to learn more about nVidia’s new Volta architecture. Although Volta in its present form is not the next-generation gaming architecture, we would anticipate that key performance metrics can be extracted from Volta and used to extrapolate future behavior of nVidia’s inevitable gaming arch, even if named differently. One example would be our gaming benchmarks, where we observed significant performance uplift in games leveraging asynchronous compute pipelines and low-level APIs. Our prediction is that nVidia is moving toward a future of heavily supported asynchronous compute job queuing, an area where the company is presently disadvantaged versus its competition; that’s not to say that nVidia doesn’t do asynchronous job queuing on Pascal (it does), but that AMD has, until now, put greater emphasis on that particular aspect of development.
This, we think, may also precipitate more developer movement toward these more advanced programming techniques. With the only two GPU vendors in the market supporting lower level APIs and asynchronous compute with greater emphasis, it would be reasonable to assume that development would follow, as would marketing development dollars.
In this testing, we’re running benchmarks on the nVidia Titan V to determine whether GPU core or memory (HBM2) overclocks have greater impact on performance. For this test, we’re only using a few key games, as selected from our gaming benchmarks:
- Sniper Elite 4: DirectX 12, asynchronous compute-enabled, and showed significant performance uplift in Volta over Pascal. Sniper responds to GPU clock changes in drastic ways, we find. This represents our async titles.
- Ashes of the Singularity: DirectX 12, but less responsive than Sniper. We were seeing ~10% uplift over the Titan Xp, whereas Sniper showed ~30-40% uplift. This gives us a middle-ground.
- Destiny 2: DirectX 11, not very responsive to the Titan V in general. We saw ~4% uplift over the Titan Xp at some settings, though other settings combinations did produce greater change. This gives us a look at games that don’t necessarily care for Volta’s new async capabilities.
We are also using Firestrike Ultra and Superposition, the latter of which is also fairly responsive to the Titan’s dynamic ray-casting performance.
We are running the fan at 100% and the power offset at 120% (max) for all tests. Clocks are changed according to the numbers shown in the charts.
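One way to compare core and HBM2 overclocks on equal footing is to normalize the framerate gain by the clock gain. This is a sketch of that calculation, not necessarily how our charts were produced; the function name and the example numbers are hypothetical:

```python
def scaling_efficiency(base_clock_mhz: float, oc_clock_mhz: float,
                       base_fps: float, oc_fps: float) -> float:
    """Share of a relative clock increase that appears as a relative FPS increase.

    ~1.0: FPS rose in step with the clock (the overclocked domain is the bottleneck).
    ~0.0: FPS barely moved (something else limits this workload).
    """
    clock_gain = (oc_clock_mhz - base_clock_mhz) / base_clock_mhz
    fps_gain = (oc_fps - base_fps) / base_fps
    return fps_gain / clock_gain


# Hypothetical numbers for one game: +10% core clock vs. +10% memory clock.
core = scaling_efficiency(1500, 1650, base_fps=100, oc_fps=108)
memory = scaling_efficiency(1000, 1100, base_fps=100, oc_fps=102)
print(f"core: {core:.2f}, memory: {memory:.2f}")  # core: 0.80, memory: 0.20
```

A game scoring near 1.0 for core and near 0 for memory is core-bound, and vice versa; comparing the two ratios per title shows which overclock matters more.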
As we work toward our inevitable hybrid mod on the nVidia Titan V, we must first run the usual spread of in-depth thermal, power, and clock behavior testing. The card uses a slightly modified Titan Xp cooler, with the primary modification being the vapor chamber’s switch to copper heatfins. That change isn’t necessarily all that meaningful. Still, the card needs whatever it can get, and short of a complete cooler rework, this is about the most that can fit on the current design.
In this Titan V benchmark, we’ll be looking at the card’s power consumption during various heavy workloads, thermal behavior of the MOSFETs and GPU core, and how frequency scales with thermals and power. The frequency scaling is the most important: We’ve previously found that high-end nVidia cards leave noteworthy performance (>100MHz boost) on the table with their stock coolers, and suspect the same to remain true on this high-wattage GPU.
The nVidia Titan V is not a gaming card, but it gives us some insight into how the Volta architecture could react to different games and engines. The point here isn’t to look at raw performance in a hundred different titles, but to think about what the performance teaches us for future cards. Obviously, you shouldn’t be spending $3000 to use a scientific card for gaming, but that doesn’t mean we can’t learn from it. Our tear-down is already online; now we’re focusing on Titan V overclocking and FPS benchmarks, and then we’ll move on to production, power, and thermal content.
This nVidia Titan V gaming benchmark tests the Volta architecture versus Pascal architecture across DirectX 11, DirectX 12, Vulkan, and synthetic applications. We purchased the Titan V for editorial purposes, and will be dedicating the next few days to dissecting every aspect of the card, much like we did for Vega: Frontier Edition in the summer.
Vega 64 may consume more power than a GTX 1080, but until now, we haven’t known whether that difference matters for room temperature. That’s what we wanted to find out, and we eventually expanded the concept to measure how much a 900W+ mining machine, a 600W machine, and so on increase room temperature. We were able to effectively replace any need for a heater for the past week, right when it started to get colder.
In this test, we’re looking at the room ambient impact of various PC builds. This helps to conceptualize the real-world impact of all those power and thermal tests you see us (and others) publish, as it puts real numbers to the user experience outside of the case. Although this concept has about a million variables and “what ifs,” we controlled them to the best of our abilities, are laying out all the major variables, and can present an academic experiment that demonstrates room temperature increase from computer equipment. All watts are basically created equal, for the purposes of this test: A 940W mining rig will output just as much heat into the room as a 940W gaming rig, or a 940W rendering machine, and so forth; as long as the power load is equal between all of these (read: constant), watts are watts, and you can extrapolate room temperature for each type of machine.
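Because every watt a system draws ends up as heat in the room, putting the test machines in heater terms is simple arithmetic. A minimal sketch, assuming a constant load measured at the wall and the standard conversion of 1W ≈ 3.412 BTU/h; the function name is hypothetical, and it estimates heat output only, not the resulting temperature rise, which depends on room size, insulation, and airflow:

```python
W_TO_BTU_PER_HR = 3.412  # 1 W of continuous load is about 3.412 BTU/h of heat


def heat_output(watts: float, hours: float) -> tuple[float, float]:
    """Return (heat rate in BTU/h, energy in kWh) for a constant wall draw.

    Assumes essentially all electrical power ends up as heat in the room,
    so a mining, gaming, or rendering load of the same wattage is equivalent.
    """
    btu_per_hr = watts * W_TO_BTU_PER_HR
    kwh = watts * hours / 1000
    return btu_per_hr, kwh


for watts in (940, 600):
    btu_per_hr, kwh = heat_output(watts, hours=24)
    print(f"{watts} W: {btu_per_hr:.0f} BTU/h, {kwh:.2f} kWh per day")
```

By the same conversion, a common 1500W plug-in space heater puts out about 5118 BTU/h, so the 940W rig delivers roughly 63% of that heat.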
The test was originally conceived after our Vega 56 Hybrid mod, which used power mods and other modifications to push the card toward 400W of power consumption. We wanted to test a stock Vega 56 versus a GTX 1070 for room ambient impact, but shifted that up a tier (to Vega 64 and a GTX 1080) for parts that are more likely to show a difference. After that, we shifted up to a 940W mining machine, then picked a middle-ground ~600W machine (which could also represent SLI gaming or HEDT render systems).
AMD’s partner cards have been on hold for review for a while now. We first covered the Vega 64 Strix when we received it, which was around October 8th. The PowerColor card came in before Thanksgiving in the US, and immediately exhibited similar clock reporting and frequency bugginess with older driver revisions. AMD released driver version 17.11.4, though, which solved some of those problems – theoretically, anyway. There are still known issues with clock behavior in 17.11.4, but we wanted to test whether or not the drivers would play nice with the partner cards. For right now, our policy is this: (1) We will review the cards immediately upon consumer availability or pre-order, as that is when people will need to know if they’re any good; (2) we will review the cards when either the manufacturer declares them ready, or at a time when the cards appear to be functioning properly.
This benchmark looks at the second option: We’re testing whether the ASUS Strix Vega 64 and PowerColor Red Devil 64 are ready for benchmarking, and how they compare against the reference RX Vega 64. Theoretically, the cards should have slightly higher clocks, and therefore should perform better. PowerColor has set clock targets at 1632MHz across the board, but “slightly higher clocks” doesn’t just mean clock target – it also means power budget, which board partners have control over. Either one of these, particularly in combination with superior cooling, should result in higher sustained boost clocks, which would result in higher framerates or scores.