We’re revisiting a topic from July 2017, initially published in the middle of one of last year’s cryptocurrency booms. That topic was our discussion with GPU add-in board partners and PSU makers, where we collected anonymized, aggregate thoughts on cryptomining and its impact on the consumer GPU market. Given the tremendous growth of the cryptocurrency community in the time since, and the recent explosion of GPU prices up to 3-5x their MSRP (depending on whether the seller is primary or secondary), we decided it was time to revisit the topic once more.
This information is anonymized and aggregated for a few reasons: One, no one would be able to share their thoughts otherwise, as this isn’t a topic that can be officially approached; two, it allows folks to speak more freely, as if there were an official response, you can be assured it’d tread the line of neutrality to a point of being bereft of insight. We spoke to most of the major GPU board partners and some PSU maker representatives, including the original group of folks we spoke with in mid-2017, now back to re-evaluate their positions from six months ago.
The goal for today is to trick an nVidia GPU into drawing more power than its Boost 3.0 power budget will allow it. The theoretical result is that more power will provide greater clock stability; we won’t necessarily get better overclocks or bigger offsets, but should stabilize and flatline the frequency, which improves performance overall. Typically, Boost clock bounces around based on load and power budget or voltage. We have already eliminated the thermal checkpoint with our Hybrid mod, and must now help eliminate the power budget checkpoint.
This content piece is relatively agnostic toward nVidia devices. Although we are using an nVidia Titan V graphics card, priced at $3000, the same practice of shunt resistor shorting can be applied to a 1080 Ti, 1070, 1070 Ti, or other nVidia GPUs.
“Shunts” are in-line resistors that have a known input voltage, which ultimately comes from the PCIe connectors or PCIe slot. In this case, we care about the in-line shunt resistors for the PCIe cables. The GPU knows the input voltage at the shunt (12V, as it’s in-line with the power connectors), and the GPU also knows the resistance of the shunt (5 milliohms). By measuring the small voltage drop across the shunt, the GPU can figure out how much current is being pulled, and then adjust to match power limitations accordingly. The shunt itself is not a limiter or a “dam,” but a measuring stick used to determine how much current is being used to drive the card.
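As a rough illustration of that measurement, here is a minimal Python sketch using Ohm's law. The 12V rail and 5 milliohm shunt come from the text above; the 20A load and the 5 milliohm parallel path standing in for the "short" are hypothetical values chosen only to make the arithmetic easy to follow, and the function names are ours, not anything from nVidia's firmware.

```python
def shunt_current_amps(v_drop_volts, r_shunt_ohms):
    """Current through the shunt, from Ohm's law: I = V_drop / R."""
    return v_drop_volts / r_shunt_ohms


def reported_power_watts(v_rail_volts, v_drop_volts, r_assumed_ohms):
    """Power the card believes it is drawing: P = V_rail * I."""
    return v_rail_volts * shunt_current_amps(v_drop_volts, r_assumed_ohms)


def parallel_resistance(r1_ohms, r2_ohms):
    """Effective resistance of two resistors in parallel."""
    return (r1_ohms * r2_ohms) / (r1_ohms + r2_ohms)


V_RAIL = 12.0        # volts, from the PCIe power connector (per the article)
R_SHUNT = 0.005      # ohms, the 5 milliohm shunt (per the article)
TRUE_CURRENT = 20.0  # amps, hypothetical load

# Stock card: the measured drop matches the resistance the card assumes.
stock_drop = TRUE_CURRENT * R_SHUNT
stock_power = reported_power_watts(V_RAIL, stock_drop, R_SHUNT)

# Modded card: a hypothetical 5 milliohm path in parallel halves the real
# resistance, but the card still assumes 5 milliohms when it does the math.
modded_r = parallel_resistance(R_SHUNT, 0.005)
modded_drop = TRUE_CURRENT * modded_r
modded_power = reported_power_watts(V_RAIL, modded_drop, R_SHUNT)

print(f"stock reading:  {stock_power:.1f} W")
print(f"modded reading: {modded_power:.1f} W (true draw is unchanged)")
```

With these assumed numbers, the modded card reads half of its real draw, which is why lowering the shunt resistance leaves more room under the power budget.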
This content piece is video-centric, but we have a full-length feature article coming tomorrow -- and it's focused on shunt shorting, something we have spent the past few days playing around with. For today's, however, we point you toward our render rig's GPU diagnostics, where we pull a Maxwell Titan from the machine, try to determine why it's overheating, and show some CLC / AIO permeation testing in the process. Rather than weigh the loops, which makes no sense (given the different manufacturing tolerances for the radiators and pumps), we emptied two loops -- one new and one old -- to see if the older unit's liquid had permeated the tubes. If it had, then we'd measure less liquid in the older loop, showing that a year of heavy wear had caused the permeation. You can find out what happened in the video below.
The short of it is that, between the two loops, we saw no meaningful permeation -- we also noted that the pump impellers were still spinning, and that the thermal paste seemed fine. Our next steps will be to remount the CLC and test again.
Fortunately, this GTX 1060 isn't prepped for mass market or DIY consumer adoption -- we've got enough confusing naming as is. The GTX 1060 presently exists in 3GB and 6GB AICs, with the former also containing one fewer SM (or a 10% core reduction). There is also the lesser-known 1060 6GB card with boosted 9Gbps memory speeds, part of a refreshed effort by nVidia and its partners earlier this year. According to Chinese language website Expreview, a new GTX 1060 5GB card is allegedly planned for release in Asian markets, primarily targeted for use in internet cafes and PC bangs. We have not independently verified the story at this time.
From what the story indicates, it seems as if this particular GTX 1060 model will carry the original 1280 CUDA cores (as opposed to the 1152 FP32 lanes on the 1060 3GB), with the primary differences being a 1GB reduction in memory capacity and a narrower 160-bit memory interface.
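For a sense of scale, the numbers above can be checked with some quick arithmetic. This is a sketch under stated assumptions: the 192-bit bus and 8Gbps memory speed of the standard 1060 6GB are known specs, but the story does not give a memory speed for the 5GB card, so the 8Gbps figure used for it here is our assumption.

```python
def core_ratio(cores, full_cores):
    """Fraction of the full core count that a cut-down part retains."""
    return cores / full_cores


def memory_bandwidth_gbs(bus_width_bits, data_rate_gbps):
    """Peak memory bandwidth in GB/s: bus width (bits) * per-pin rate / 8."""
    return bus_width_bits * data_rate_gbps / 8


# 1060 3GB vs. 1060 6GB core counts, from the article.
ratio_3gb = core_ratio(1152, 1280)

# Standard 1060 6GB: 192-bit bus at 8Gbps.
bw_6gb = memory_bandwidth_gbs(192, 8.0)

# Rumored 1060 5GB: 160-bit bus; 8Gbps is an ASSUMPTION, not from the story.
bw_5gb = memory_bandwidth_gbs(160, 8.0)

print(f"1060 3GB keeps {ratio_3gb:.0%} of the cores")
print(f"6GB card: {bw_6gb:.0f} GB/s, 5GB card (assumed 8Gbps): {bw_5gb:.0f} GB/s")
```

If the memory speed does stay the same, the narrower bus would cost roughly a sixth of the peak memory bandwidth.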
This episode of Ask GN, shipping on Christmas day, answers a few pertinent questions from the last few weeks: We'll talk about whether we made ROI on the Titan V, whether it makes more sense to buy Ryzen now or wait for Ryzen+/Ryzen2, and then dive into the "minor" topics for the segment. Smaller topics include discussion on choosing games for benchmarking -- primarily, why we don't like ROTTR -- and our thoughts on warranty/support reviews, with some reinforced information on vertical GPU mounting. The conclusion focuses on an ancient video card and some GN modmat information.
The embedded video below contains the episode. Timestamps are below that.
The monstrosity shown in our latest GN livestream was what ranked us among the top 10 on several 3DMark world benchmarks: It was a mixture of things, primarily the benefit of having high-end hardware (read: buying your way to the top), but also compensating for limited CPU OC skills with a liquid cooling mod. Our Titan V held high clocks longer than it had any business doing, and that was because of our Titan V Hybrid Mod.
It all comes down to Boost 3.0, as usual, and even AMD’s Vega now behaves similarly. The cards look at their thermal situation, with nVidia targeting 83-84C as a limiter, and then adjust clocks according to thermal headroom. This is also why there’s no hard guarantee on clock speed, because the card functionally “overclocks” (or “downclocks,” depending on perspective) itself to match its thermal budget. If we haven’t exceeded the thermal budget – achievable primarily with AIB partner coolers or with a liquid mod – then we have new budgets to abide by, primarily power and voltage.
We can begin solving for the former with shunt mods, something we’ve done and for which we’ll soon publish data, but we can’t do much more than that. These cards are fairly locked down, including BIOS, and we’re going to be limited to whatever hard mods we can pull off.
We’ve previously found unexciting differences of <1% gains between x16 vs. x8 PCIe 3.0 arrangements, primarily relying on GTX 1080 Ti GPUs for the testing. There were two things we wanted to overhaul on that test: (1) Increase the count of GPUs to at least two, thereby placing greater strain on the PCIe bus (x16/x16 vs. x8/x8), and (2) use more powerful GPUs.
Fortunately, YouTube channel BitsBeTrippin agreed to loan GamersNexus its Titan V, bringing us up to a total count of two cards. We’ll be able to leverage these for determining bandwidth limitations in supported applications; unfortunately, as expected, most applications (particularly games) do not support 2x Titan Vs. The nature of being a scientific/compute card is that SLI must go away, and instead be replaced with NVLink. We must therefore rely on explicit multi-GPU via DirectX 12. This means that Ashes of the Singularity will support our test, and it also leaves us with a list of games that might support testing: Hitman, Deus Ex: Mankind Divided, Total War: Warhammer, Civilization VI, and Rise of the Tomb Raider. None of these games saw both Titan V cards, though, so we only really have Ashes to go on. It goes without saying that this test isn’t representative of the whole, but it will give us a good baseline for analysis. Something like GPUPI may further provide a dual-GPU test application.
We also can’t test NVLink, as we don’t have one of the $600 bridges – but our work can be done without a bridge, thanks to explicit multi-GPU in DirectX 12.
It’s time to revisit PCIe bandwidth testing. We’re looking at the point at which a GPU can exceed the bandwidth limitations of the PCIe Gen3 slot, particularly when in x8 mode. This comparison includes dual Titan V testing in x8 and x16 configurations, pushing against the roughly 1GB/s-per-lane ceiling of PCIe Gen3 slots.
Testing PCIe x8 vs. x16 lane arrangements can be done a few ways, including: (1) Tape off the physical pins on the PCIe foot of the GPU, thereby forcing x8 mode; (2) switch motherboard PCIe generation to Gen2 for half the bandwidth, but potentially introduce variables; (3) use a motherboard with slots which are physically wired for x8 or x16.
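To put numbers on those options, the theoretical per-direction bandwidth of a link can be worked out from the transfer rate and line encoding of each PCIe generation (5 GT/s with 8b/10b for Gen2, 8 GT/s with 128b/130b for Gen3). The sketch below is only that arithmetic; the function names are ours, and real-world throughput is lower once protocol overhead is counted.

```python
# Per-lane signaling rate (GT/s) and line encoding (payload bits, total bits).
PCIE_GENERATIONS = {
    2: (5.0, 8, 10),     # Gen2: 5 GT/s, 8b/10b encoding
    3: (8.0, 128, 130),  # Gen3: 8 GT/s, 128b/130b encoding
}


def lane_bandwidth_gbs(gen):
    """Theoretical GB/s per lane, per direction, after encoding overhead."""
    rate_gts, payload_bits, total_bits = PCIE_GENERATIONS[gen]
    return rate_gts * payload_bits / total_bits / 8  # bits -> bytes


def link_bandwidth_gbs(gen, lanes):
    """Theoretical GB/s for a whole link of the given width."""
    return lane_bandwidth_gbs(gen) * lanes


for gen, lanes in [(3, 16), (3, 8), (2, 16)]:
    print(f"Gen{gen} x{lanes}: {link_bandwidth_gbs(gen, lanes):.2f} GB/s")
```

The Gen2 x16 and Gen3 x8 figures land close to each other, which is why dropping to Gen2 is a workable stand-in for x8, though, as noted, it may introduce other variables.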
Hardcore overclocker "Buildzoid" just finished his VRM and PCB analysis of the Titan V, released on the GN channel moments ago. The Titan V uses a 16-phase VRM from nVidia with an interesting design, including some "mystery" in-line phases that we think are used to drop 12v. This VRM is one of the best that nVidia has built on a 'reference' card, and that makes sense, seeing as there won't be other Titan V cards from board partners. We do think the cooling solution needs work, and we've done a hybrid mod to fix that, but the VRM and PCB put us in a good place for heavier modding, including shunt modding.
Shunt modding is probably the most interesting, as that's what will give a bit more voltage headroom for overclocking, and should help trick the card's regulation into giving us more power to play with. Buildzoid talks about this mod during the video, for any willing to attempt it. We may attempt the mod on our own card.
We took our nVidia Titan V Volta card apart when we first received it, following our gaming benchmarks, and are now embarking on a mission to take some Top 10 scores in HWBot Firestrike rankings. Admittedly, we can only get close to the top 10 because of early access – we bought the card early, and so it’s a bit of an unfair advantage – but we’re confident that the top 10 slots will soon belong entirely to the XOC community.
For now, though, we can have a moment of glory. If only a moment.
Getting there will require better cooling, as we just aren’t as good at CPU overclocking as some of the others in the top 10. To make up for our skill and LN2 deficit, we can throw more cooling at the Titan V and put up a harder fight. Liquid cooling the V is the first step, and will help us stabilize higher clocks at lower temperatures. Volta, like Pascal, increases its clock (and the stability of that clock) as the GPU core temperature decreases. Driving temperatures down under 60C will help tremendously in stability, and driving them under 40C – if possible – will be even better. We’ll see how far we get. Our Top 10 efforts will be livestreamed at around 5 or 6PM EST today, December 16, 2017.
This test is another in a series of studies to learn more about nVidia’s new Volta architecture. Although Volta in its present form is not the next-generation gaming architecture, we would anticipate that key performance metrics can be stripped from Volta and used to extrapolate future behavior of nVidia’s inevitable gaming arch, even if named differently. One example would be our gaming benchmarks, where we observed significant performance uplift in games leveraging asynchronous compute pipelines and low-level APIs. Our prediction is that nVidia is moving toward a future of heavily supported asynchronous compute job queuing, where the company is presently disadvantaged versus its competition; that’s not to say that nVidia doesn’t do asynchronous job queuing on Pascal (it does), but that AMD has, until now, put greater emphasis on that particular aspect of development.
This, we think, may also precipitate more developer movement toward these more advanced programming techniques. With the only two GPU vendors in the market supporting lower level APIs and asynchronous compute with greater emphasis, it would be reasonable to assume that development would follow, as would marketing development dollars.
In this testing, we’re running benchmarks on the nVidia Titan V to determine whether GPU core or memory (HBM2) overclocks have greater impact on performance. For this test, we’re only using a few key games, as selected from our gaming benchmarks:
- Sniper Elite 4: DirectX 12, asynchronous compute-enabled, and showed significant performance uplift in Volta over Pascal. Sniper responds to GPU clock changes in drastic ways, we find. This represents our async titles.
- Ashes of the Singularity: DirectX 12, but less responsive than Sniper. We were seeing ~10% uplift over the Titan Xp, whereas Sniper showed ~30-40% uplift. This gives us a middle-ground.
- Destiny 2: DirectX 11, not very responsive to the Titan V in general. We saw ~4% uplift over the Titan Xp at some settings, though other settings combinations did produce greater change. This gives us a look at games that don’t necessarily care for Volta’s new async capabilities.
We are also using Firestrike Ultra and Superposition, the latter of which is also fairly responsive to the Titan’s dynamic ray-casting performance.
We are running the fan at 100% for all tests, with the power offset at 120% (max) for all tests. Clocks are changed according to their numbers in the charts.
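The uplift percentages quoted for each game are relative changes in average framerate. As a minimal sketch of that calculation, the function below is ours, and the FPS values are hypothetical placeholders, not results from our charts.

```python
def percent_uplift(baseline_fps, new_fps):
    """Relative performance change versus the baseline, in percent."""
    return (new_fps / baseline_fps - 1.0) * 100.0


# Hypothetical average framerates, for illustration only.
baseline_avg_fps = 100.0  # e.g. the older card
new_avg_fps = 135.0       # e.g. the newer card

uplift = percent_uplift(baseline_avg_fps, new_avg_fps)
print(f"uplift: {uplift:.1f}%")
```

The same calculation applies when comparing a stock run against a core or memory overclock, with the stock result as the baseline.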