This content piece was highly requested by the audience, although there is presently limited point to its findings. Following the confluence of the Meltdown and Spectre exploits last week, Microsoft pushed a Windows security update that sought to fill some of the security gaps, and which has been speculated to cause a performance dip of between 5% and 30%. As of today, Intel has not yet released its microcode update, which means that it is largely folly to undertake the benchmarks we’re undertaking in this content piece – that said, there is merit to it, but the task must be looked at from the right perspective.
From the perspective of advancing knowledge and building a baseline for the next round of tests – those which will, unlike today’s, factor in microcode patches – we must eventually run the tests being run today. This will give us a baseline for performance, and will grant us two critical opportunities: (1) We may benchmark baseline, pre-Windows-patch performance, and (2) we can benchmark post-patch performance, pre-microcode. Together, these will allow us to see the isolated impact of Intel’s firmware update versus Microsoft’s software update. This is important, and alone makes the endeavor worthwhile – particularly because our CPU suite is automated, anyway, so there is no big time loss, despite CES looming.
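As a rough sketch of how the three test stages separate the two patches, the arithmetic is a pair of percent-change calculations. The function names and the framerate figures below are hypothetical placeholders for illustration only; they are not measurements from this testing.

```python
def percent_change(before, after):
    """Percent change from `before` to `after` (negative means a loss)."""
    return (after - before) / before * 100.0


def isolate_patch_impact(baseline_fps, windows_patch_fps, microcode_fps):
    """Split the total change into the Windows-patch step and the microcode step.

    baseline_fps:      result before any patch
    windows_patch_fps: result after the Windows update, before microcode
    microcode_fps:     result after both the Windows update and microcode
    """
    return {
        "windows_patch_pct": percent_change(baseline_fps, windows_patch_fps),
        "microcode_pct": percent_change(windows_patch_fps, microcode_fps),
        "combined_pct": percent_change(baseline_fps, microcode_fps),
    }


if __name__ == "__main__":
    # Placeholder numbers only, chosen to make the arithmetic easy to follow.
    impact = isolate_patch_impact(100.0, 95.0, 90.0)
    for name, value in impact.items():
        print(f"{name}: {value:.2f}%")
```

Note that the microcode step is measured against the post-Windows-patch result, not the original baseline, which is why the middle data point is needed to attribute the change to one update or the other.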
Speaking of, we only had time to run one CPU through the suite, and only with a few games, as, again, CES is looming. This is enough for now, though, and should sate some demand and interest.
As we pack before CES, this is just a quick video update in a non-standard format. We decided to put together a loose video that details the practical learnings of delidding -- things we've picked up over the past few months of taking the IHS off processors. During this time, we've learned a few tricks pertaining to resealing, preventing electrical shorts and damage, and applying liquid metal. These are all things that we could have used when learning about delidding, and so we decided to compile it into one content piece. The format is less formal and in our "tear-down" setup, just with a different tone to the content.
Just a quick update for everyone: We've got a major feature -- an end-of-year special that includes a short film (something we've never done before) -- going up tomorrow at around 9AM EST. That'll be sort of an end-of-year recap of a few key components, primarily those that disappointed us.
In the meantime, while we were playing one-day roles of directors and cinematographers, we've set to work on delidding another 7980XE. This will be our third delid of the 18C CPU, with another ~4-5 delids of lower-end CPUs from the past few months. Our previous delid was for Kyle of "Bitwit," which later led to our Intel X299 VRM thermal investigation of the ASUS Rampage VI Extreme motherboard's VRM temperatures. It was an excellent opportunity for us to explore potential sideshoot content pieces in more depth, and gave us multiple samples to build a larger sample size.
We're now up to 3x 18C CPUs delidded, and are collecting data on the latest for Ed from Tech Source. The delid just completed, and we're now in the resealing stage.
The goal for today is to trick an nVidia GPU into drawing more power than its Boost 3.0 power budget will allow it. The theoretical result is that more power will provide greater clock stability; we won’t necessarily get better overclocks or bigger offsets, but should stabilize and flatline the frequency, which improves performance overall. Typically, Boost clock bounces around based on load and power budget or voltage. We have already eliminated the thermal checkpoint with our Hybrid mod, and must now help eliminate the power budget checkpoint.
This content piece is relatively agnostic toward nVidia devices. Although we are using an nVidia Titan V graphics card, priced at $3000, the same practice of shunt resistor shorting can be applied to a 1080 Ti, 1070, 1070 Ti, or other nVidia GPUs.
“Shunts” are in-line resistors with a known resistance, fed by a known input voltage that ultimately comes from the PCIe connectors or PCIe slot. In this case, we care about the in-line shunt resistors for the PCIe cables. The GPU knows the voltage going into the shunt (12V, as it’s in-line with the power connectors), and the GPU also knows the resistance of the shunt (5mohm). By measuring the small voltage drop across the shunt, the GPU can figure out how much current is being pulled (current equals voltage drop divided by resistance), and then adjust to match power limitations accordingly. The shunt itself is not a limiter or a “dam,” but a measuring stick used to determine how much current is being used to drive the card.
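To make the measurement concrete, here is a minimal sketch of the arithmetic, using the 12V rail and 5mohm shunt from above. The 10A load and the choice to model the mod as a second 5mohm resistor in parallel are assumptions for illustration; a real short (liquid metal or solder across the shunt) ends up at some other, unknown resistance, and the function names are hypothetical.

```python
RAIL_VOLTS = 12.0      # input voltage from the PCIe power connector
SHUNT_OHMS = 0.005     # 5 mohm shunt, the value the card assumes


def parallel(r1, r2):
    """Combined resistance of two resistors in parallel."""
    return (r1 * r2) / (r1 + r2)


def reported_power(v_drop, assumed_shunt_ohms=SHUNT_OHMS, rail_volts=RAIL_VOLTS):
    """Power the card believes it is drawing, from the measured voltage drop.

    The card only sees the drop and assumes the stock shunt resistance:
    I = V_drop / R, then P = V_rail * I.
    """
    current = v_drop / assumed_shunt_ohms
    return rail_volts * current


actual_current = 10.0  # amps; hypothetical load on this connector

# Stock card: the drop across 5 mohm at 10 A is 0.05 V, reported as about 120 W.
stock_drop = actual_current * SHUNT_OHMS
stock_watts = reported_power(stock_drop)

# Modded card (assumed: second 5 mohm resistor in parallel, so 2.5 mohm total).
# The same 10 A now produces half the drop, so the card reports about 60 W.
modded_drop = actual_current * parallel(SHUNT_OHMS, SHUNT_OHMS)
modded_watts = reported_power(modded_drop)

if __name__ == "__main__":
    print(f"stock:  {stock_watts:.1f} W reported")
    print(f"modded: {modded_watts:.1f} W reported")
```

Under these assumptions, the card under-reports its draw after the mod, which is why it believes it still has power budget to spend.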
The monstrosity shown in our latest GN livestream was what ranked us among the top-10 on several 3D Mark world benchmarks: It was a mixture of things, primarily including benefit of having high-end hardware (read: buying your way to the top), but also compensating for limited CPU OC skills with a liquid cooling mod. Our Titan V held high clocks longer than it had any business doing, and that was because of our Titan V Hybrid Mod.
It all comes down to Boost 3.0, as usual, and even AMD’s Vega now behaves similarly. The cards look at their thermal situation, with nVidia targeting 83-84C as a limiter, and then adjust clocks according to thermal headroom. This is also why there’s no hard guarantee on clock speed, because the card functionally “overclocks” (or “downclocks,” depending on perspective) itself to match its thermal budget. If we haven’t exceeded the thermal budget – achievable primarily with AIB partner coolers or with a liquid mod – then we have new budgets to abide by, primarily power and voltage.
We can begin solving for the former with shunt mods, something we’ve done and for which we’ll soon publish data, but we can’t do much more than that. These cards are fairly locked down, including BIOS, and we’re going to be limited to whatever hard mods we can pull off.
We’ve previously found unexciting differences of <1% gains between x16 vs. x8 PCIe 3.0 arrangements, primarily relying on GTX 1080 Ti GPUs for the testing. There were two things we wanted to overhaul on that test: (1) Increase the count of GPUs to at least two, thereby placing greater strain on the PCIe bus (x16/x16 vs. x8/x8), and (2) use more powerful GPUs.
Fortunately, YouTube channel BitsBeTrippin agreed to loan GamersNexus its Titan V, bringing us up to a total count of two cards. We’ll be able to leverage these for determining bandwidth limitations in supported applications; unfortunately, as expected, most applications (particularly games) do not support 2x Titan Vs. The nature of being a scientific/compute card is that SLI must go away, and instead be replaced with NVLink. We must therefore rely on explicit multi-GPU via DirectX 12. This means that Ashes of the Singularity will support our test, and also left us with a list of games that might support testing: Hitman, Deus Ex: Mankind Divided, Total War: Warhammer, Civilization VI, and Rise of the Tomb Raider. None of these games saw both Titan V cards, though, and so we only really have Ashes to go off of. It goes without saying, but that means this test isn’t representative of the whole, naturally, but will give us a good baseline for analysis. Something like GPUPI may further provide a dual-GPU test application.
We also can’t test NVLink, as we don’t have one of the $600 bridges – but our work can be done without a bridge, thanks to explicit multi-GPU in DirectX 12.
It’s time to revisit PCIe bandwidth testing. We’re looking at the point at which a GPU can exceed the bandwidth limitations of the PCIe Gen3 slot, particularly when in x8 mode. This comparison includes dual Titan V testing in x8 and x16 configurations, pushing against the roughly 1GB/s/lane limit of the PCIe Gen3 slots.
Testing PCIe x8 vs. x16 lane arrangements can be done a few ways, including: (1) Tape off the physical pins on the PCIe foot of the GPU, thereby forcing x8 mode; (2) switch motherboard PCIe generation to Gen2 for half the bandwidth, but potentially introduce variables; (3) use a motherboard with slots which are physically wired for x8 or x16.
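For reference, the theoretical per-direction link bandwidth of each arrangement can be sketched from the published PCIe transfer rates and line encodings (Gen2: 5GT/s with 8b/10b; Gen3: 8GT/s with 128b/130b). This is a minimal sketch with hypothetical function names; it ignores packet and protocol overhead, so real-world throughput will be lower.

```python
# (transfer rate in GT/s, encoding efficiency) per PCIe generation
PCIE_GENERATIONS = {
    2: (5.0, 8 / 10),      # 8b/10b encoding
    3: (8.0, 128 / 130),   # 128b/130b encoding
}


def lane_bandwidth_gb_s(generation):
    """Theoretical one-direction bandwidth of a single lane, in GB/s."""
    transfer_rate, efficiency = PCIE_GENERATIONS[generation]
    return transfer_rate * efficiency / 8  # 8 bits per byte


def link_bandwidth_gb_s(generation, lanes):
    """Theoretical one-direction bandwidth of a link with `lanes` lanes."""
    return lane_bandwidth_gb_s(generation) * lanes


if __name__ == "__main__":
    for generation, lanes in [(3, 16), (3, 8), (2, 16)]:
        bandwidth = link_bandwidth_gb_s(generation, lanes)
        print(f"Gen{generation} x{lanes}: {bandwidth:.2f} GB/s")
```

By this math, a Gen2 x16 link lands within a couple of percent of Gen3 x8, which is why dropping the generation in BIOS is a workable stand-in for halving the lane count.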
Hardcore overclocker "Buildzoid" just finished his VRM and PCB analysis of the Titan V, released on the GN channel moments ago. The Titan V uses a 16-phase VRM from nVidia with an interesting design, including some "mystery" in-line phases that we think are used to drop 12v. This VRM is one of the best that nVidia has built on a 'reference' card, and that makes sense, seeing as there won't be other Titan V cards from board partners. We do think the cooling solution needs work, and we've done a hybrid mod to fix that, but the VRM and PCB put us in a good place for heavier modding, including shunt modding.
Shunt modding is probably the most interesting, as that's what will give a bit more voltage headroom for overclocking, and should help trick the card's regulation into giving us more power to play with. Buildzoid talks about this mod during the video, for any willing to attempt it. We may attempt the mod on our own card.
We took our nVidia Titan V Volta card apart when we first received it, following our gaming benchmarks, and are now embarking on a mission to take some Top 10 scores in HWBot Firestrike rankings. Admittedly, we can only get close to top 10 from access – we bought the card early, and so it’s a bit of an unfair advantage – but we’re confident that the top 10 slots will soon belong entirely to the XOC community.
For now, though, we can have a moment of glory. If only a moment.
Getting there will require better cooling, as we just aren’t as good at CPU overclocking as some of the others in the top 10. To make up for our skill and LN2 deficit, we can throw more cooling at the Titan V and put up a harder fight. Liquid cooling the V is the first step, and will help us stabilize higher clocks at lower temperatures. Volta, like Pascal, increases its clock (and the stability of that clock) as the GPU core temperature decreases. Driving temperatures down under 60C will help tremendously in stability, and driving them under 40C – if possible – will be even better. We’ll see how far we get. Our Top 10 efforts will be livestreamed at around 5 or 6PM EST today, December 16, 2017.
This test is another in a series of studies to learn more about nVidia’s new Volta architecture. Although Volta in its present form is not the next-generation gaming architecture, we would anticipate that key performance metrics can be stripped from Volta and used to extrapolate future behavior of nVidia’s inevitable gaming arch, even if named differently. One example would be our gaming benchmarks, where we observed significant performance uplift in games leveraging asynchronous compute pipelines and low-level APIs. Our prediction is that nVidia is moving toward a future of heavily supported asynchronous compute job queuing, where the company is presently disadvantaged versus its competition; that’s not to say that nVidia doesn’t do asynchronous job queuing on Pascal (it does), but that AMD has, until now, put greater emphasis on that particular aspect of development.
This, we think, may also precipitate more developer movement toward these more advanced programming techniques. With the only two GPU vendors in the market supporting lower level APIs and asynchronous compute with greater emphasis, it would be reasonable to assume that development would follow, as would marketing development dollars.
In this testing, we’re running benchmarks on the nVidia Titan V to determine whether GPU core or memory (HBM2) overclocks have greater impact on performance. For this test, we’re only using a few key games, as selected from our gaming benchmarks:
- Sniper Elite 4: DirectX 12, asynchronous compute-enabled, and showed significant performance uplift in Volta over Pascal. Sniper responds to GPU clock changes in drastic ways, we find. This represents our async titles.
- Ashes of the Singularity: DirectX 12, but less responsive than Sniper. We were seeing ~10% uplift over the Titan Xp, whereas Sniper showed ~30-40% uplift. This gives us a middle-ground.
- Destiny 2: DirectX 11, not very responsive to the Titan V in general. We saw ~4% uplift over the Titan Xp at some settings, though other settings combinations did produce greater change. This gives us a look at games that don’t necessarily care for Volta’s new async capabilities.
We are also using Firestrike Ultra and Superposition, the latter of which is also fairly responsive to the Titan’s dynamic ray-casting performance.
We are running the fan at 100% for all tests, with the power offset at 120% (max) for all tests. Clocks are changed according to their numbers in the charts.
As we work toward our inevitable hybrid mod on the nVidia Titan V, we must visit the usual spread of in-depth thermal, power, and clock behavior testing. The card uses a slightly modified Titan Xp cooler, with primary modifications found in the vapor chamber’s switch to copper heatfins. That’s the primary change, and not one that’s necessarily all that meaningful. Still, the card needs whatever it can get, and short of a complete cooler rework, this is about the most that can fit on the current design.
In this Titan V benchmark, we’ll be looking at the card’s power consumption during various heavy workloads, thermal behavior of the MOSFETs and GPU core, and how frequency scales with thermals and power. The frequency scaling is the most important: We’ve previously found that high-end nVidia cards leave noteworthy performance (>100MHz boost) on the table with their stock coolers, and suspect the same to remain true on this high-wattage GPU.