NVidia’s Turing architecture has entered the public realm, alongside an 83-page whitepaper, and is now ready for technical detailing. We have spoken with several nVidia engineers over the past few weeks, attended the technical editor’s day presentations, and have read through the whitepaper – there’s a lot to get through, so we will be breaking this content into pieces with easily navigable headers.
Turing is a modified Volta at its core, which is a heavily modified Pascal. Core architecture isn’t wholly unrecognizable between Turing and Pascal – you’d be able to figure out that they’re from the same company – but there are substantive changes within the Turing core.
Intel's 10nm CPUs may have had their last delay -- and it's through the 'holidays' of 2019. Intel's latest earnings call indicates a finalized release target of EOY/holiday of 2019, continuing the saga of 10nm delays since 2015-2016. Note, however, that although TSMC and GF 7nm comparisons are prevalent, it's not as simple as comparing the numbers "7" and "10" -- density matters, as does architecture, and this is something we discussed with David Kanter in an upcoming video interview from GamersNexus.
Other hardware news revolves around a mixture of rumors and actual news, the latter represented by AMD's best quarterly earnings report in 7 years, and the former represented by Intel 9000-series specs and Samsung GPU development.
As always, the show notes are below the video.
At last week’s GTC, the nVidia-hosted GPU Technology Conference, nVidia CEO Jensen Huang showcased the new Titan W graphics card to press under embargo. The Titan W is nVidia’s first dual-GPU card in many years, and comes after the compute-focused Titan V GPU from 2017.
The nVidia Titan W graphics card hosts two V100 GPUs and 32GB of HBM2 memory, claiming a TDP of 500W and a price of $8,000.
“I’m really just proving to shareholders that I’m healthy,” Huang laughed after his fifth consecutive hour of talking about machine learning. “I could do this all day – and I will,” the CEO said, with a nod to PR, who immediately locked the doors to the room.
This test is another in a series of studies to learn more about nVidia’s new Volta architecture. Although Volta in its present form is not the next-generation gaming architecture, we would anticipate that key performance metrics can be stripped from Volta and used to extrapolate future behavior of nVidia’s inevitable gaming arch, even if named differently. One example would be our gaming benchmarks, where we observed significant performance uplift in games leveraging asynchronous compute pipelines and low-level APIs. Our prediction is that nVidia is moving toward a future of heavily supported asynchronous compute job queuing, where the company is presently disadvantaged versus its competition; that’s not to say that nVidia doesn’t do asynchronous job queuing on Pascal (it does), but that AMD has, until now, put greater emphasis on that particular aspect of development.
This, we think, may also precipitate more developer movement toward these more advanced programming techniques. With the only two GPU vendors in the market supporting lower level APIs and asynchronous compute with greater emphasis, it would be reasonable to assume that development would follow, as would marketing development dollars.
In this testing, we’re running benchmarks on the nVidia Titan V to determine whether GPU core or memory (HBM2) overclocks have greater impact on performance. For this test, we’re only using a few key games, as selected from our gaming benchmarks:
- Sniper Elite 4: DirectX 12, asynchronous compute-enabled, and showed significant performance uplift in Volta over Pascal. Sniper responds to GPU clock changes in drastic ways, we find. This represents our async titles.
- Ashes of the Singularity: DirectX 12, but less responsive than Sniper. We were seeing ~10% uplift over the Titan Xp, whereas Sniper showed ~30-40% uplift. This gives us a middle-ground.
- Destiny 2: DirectX 11, not very responsive to the Titan V in general. We saw ~4% uplift over the Titan Xp at some settings, though other settings combinations did produce greater change. This gives us a look at games that don’t necessarily care for Volta’s new async capabilities.
We are also using FireStrike Ultra and Superposition, the latter of which is also fairly responsive to the Titan’s dynamic ray-casting performance.
We are running the fan at 100% for all tests, with the power offset at 120% (max) for all tests. Clocks are changed according to their numbers in the charts.
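For readers who want to run the same comparison on their own data, here is a minimal sketch of the arithmetic behind the uplift figures and the core-versus-memory question. The function names and every number below are hypothetical placeholders for illustration, not results from our testing.

```python
def percent_uplift(baseline, new):
    """Percent change of `new` relative to `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (new - baseline) / baseline * 100.0


def scaling_efficiency(base_fps, oc_fps, base_clock, oc_clock):
    """Percent of FPS gained per percent of clock added.

    A value of 1.0 would mean performance scales linearly with the clock.
    """
    clock_gain = percent_uplift(base_clock, oc_clock)
    if clock_gain == 0:
        raise ValueError("clock did not change")
    return percent_uplift(base_fps, oc_fps) / clock_gain


# Hypothetical numbers, chosen only to show the calculation.
stock_fps = 100.0
core_oc_fps = 108.0  # average FPS after a +10% core overclock
mem_oc_fps = 103.0   # average FPS after a +10% HBM2 overclock

core_eff = scaling_efficiency(stock_fps, core_oc_fps, 1500.0, 1650.0)
mem_eff = scaling_efficiency(stock_fps, mem_oc_fps, 850.0, 935.0)

print(f"core scaling efficiency:   {core_eff:.2f}")
print(f"memory scaling efficiency: {mem_eff:.2f}")
```

Normalizing by the size of the clock change matters because core and memory offsets are rarely the same percentage; comparing raw FPS deltas alone would favor whichever clock was pushed harder.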
The nVidia Titan V is not a gaming card, but gives us some insights as to how the Volta architecture could react to different games and engines. The point here isn’t to look at raw performance in a hundred different titles, but to think about what the performance teaches us for future cards. This will teach us about the Volta architecture; obviously, you shouldn’t be spending $3000 to use a scientific card on gaming, but that doesn’t mean we can’t learn from it. Our tear-down is already online, but now we’re focusing on Titan V overclocking and FPS benchmarks, and then we’ll move on to production, power, and thermal content.
This nVidia Titan V gaming benchmark tests the Volta architecture versus Pascal architecture across DirectX 11, DirectX 12, Vulkan, and synthetic applications. We purchased the Titan V for editorial purposes, and will be dedicating the next few days to dissecting every aspect of the card, much like we did for Vega: Frontier Edition in the summer.
We took time aside at AMD’s Threadripper & Vega event to speak with leading architects and engineers at the company, including Corporate Fellow Mike Mantor. The conversation eventually became one that we figured we’d film, as we delved deeper into discussion on small primitive discarding and methods to cull unnecessary triangles from the pipeline. Some of the discussion is generic – rules and concepts applied to rendering overall – while some gets more specific to Vega’s architecture.
The interview was sparked from talk about Vega’s primitive shader (or “prim shader”), draw-stream binning rasterization (DSBR), and small primitive discarding. We’ve transcribed large portions of the first half below, leaving the rest in video format. GN’s Andrew Coleman used Unreal Engine and Blender to demonstrate key concepts as Mantor explained them, so we’d encourage watching the video to better conceptualize the more abstract elements of the conversation.
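To make the idea of discarding triangles a bit more concrete, here is a generic, simplified sketch of three common culling tests: zero-area, back-facing, and too small to cover any sample. This is not a description of Vega's hardware or of anything Mantor specified; the function names, the counter-clockwise front-face convention, and the one-sample-per-pixel-center assumption are ours.

```python
import math


def signed_area_2x(v0, v1, v2):
    """Twice the signed area of a 2D triangle (positive if counter-clockwise)."""
    return (v1[0] - v0[0]) * (v2[1] - v0[1]) - (v2[0] - v0[0]) * (v1[1] - v0[1])


def span_covers_sample(lo, hi):
    """True if the interval [lo, hi] contains a sample center at k + 0.5."""
    return math.floor(hi - 0.5) >= math.ceil(lo - 0.5)


def should_discard(v0, v1, v2):
    """Return True if the screen-space triangle cannot produce any pixels.

    Assumes counter-clockwise triangles are front-facing and that each
    pixel has a single sample at its center.
    """
    area = signed_area_2x(v0, v1, v2)
    if area == 0:
        return True  # degenerate: zero area
    if area < 0:
        return True  # back-facing under the assumed winding
    xs = (v0[0], v1[0], v2[0])
    ys = (v0[1], v1[1], v2[1])
    # Conservative small-primitive test: if the bounding box misses every
    # sample center on either axis, the triangle cannot cover a sample.
    if not span_covers_sample(min(xs), max(xs)):
        return True
    if not span_covers_sample(min(ys), max(ys)):
        return True
    return False


print(should_discard((0, 0), (4, 0), (0, 4)))              # large, front-facing
print(should_discard((0.6, 0.6), (0.9, 0.6), (0.6, 0.9)))  # falls between samples
```

The bounding-box check is deliberately conservative: a triangle that passes it may still miss every sample, but one that fails it is guaranteed to contribute nothing and can be dropped before rasterization.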
Every now and then, a content piece falls to the wayside and is archived indefinitely -- or just lost under a mountain of other content. That’s what happened with our AMD Ryzen pre-launch interview with Sam Naffziger, AMD Corporate Fellow, and Michael Clark, Chief Architect of Zen. We interviewed the two leading Zen architects at the Ryzen press event in February, were placed under embargo on releasing the interview, and then simply had too many other content pieces to make a push for this one.
The interview discusses topics of uOp cache on Ryzen CPUs, power optimizations, shadow tags, and victim cache. Parts of the interview have been transcribed below, though you’ll have to check the video for discussion on L1 writeback vs. writethrough cache designs and AMD’s shadow tags.
“Disillusioned and confused” could describe much of the response to initial AMD Vega: Frontier Edition testing and reviews. The card’s market positioning is somewhat confusing, possessing neither the professional-level driver certification nor the gaming-level price positioning. This makes Vega: FE ($1000) a very specifically placed card and, like the Titan Xp, doesn’t exactly look like the best price:performance argument for a large portion of the market. But that’s OK – it doesn’t have to be, and it’s not trying to be. The thing is, though, that AMD’s Vega architecture has been so long hyped, so long overdue, that users in our segment are looking for any sign of competition with nVidia’s high-end. It just so happens that, largely thanks to AMD’s decision to go with “Vega” as the name of its first Vega arch card, the same users saw Vega: FE as an inbound do-all flagship.
But it wasn’t really meant to compete under those expectations, it turns out.
Today, we’re focusing our review efforts most heavily on power, thermals, and noise, with the heaviest focus on power and thermals. Some of this includes power draw vs. time charts, like when Blender is engaged in long render cycles, and other tests include noise-normalized temperature testing. We’ve also got gaming benchmarks, synthetics (FireStrike, TimeSpy), and production benchmarks (Maya, 3DS Max, Blender, Creo, Catia), but those all receive less focus than our primary thermal/power analysis. This focus is because the thermal and power behavior can be extrapolated most linearly to Vega’s future supplements, and we figure it’s a way to offer a unique set of data for a review.
NVidia’s Volta GV100 GPU and Tesla V100 Accelerator were revealed yesterday, delivering on a 2015 promise of Volta arrival by 2018. The initial DGX servers will ship by 3Q17, containing multiple V100 Accelerator cards at a cost of $150,000, with individual units priced at $18,000. These devices are obviously for enterprise, machine learning, and compute applications, but will inevitably work their way into gaming through subsequent V102 (or equivalent) chips. This is similar to the GP100 launch, where we get the Accelerator server-class card prior to consumer availability, which ultimately helps consumers by recouping some of the initial R&D cost through major B2B sales.
Our third and final interview featuring Scott Wasson, current AMD RTG team member and former EIC of Tech Report, has just gone live with information on GPU architecture. This video focuses more on a handful of reader and viewer questions, pooled largely from our Patreon backer Discord, with the big item being “GPU IPC.” Patreon backer “Streetguru” submitted the question, asking why a ~1300-1400MHz RX 480 could perform comparably to an ~1800MHz GTX 1060 card. It’s a good question – it’s easy to say “architecture,” but to learn more about the why aspect, we turned to Wasson.
The main event starts at 1:04, with some follow-up questions scattered throughout Wasson’s explanation. We talk about pipeline stage length and its impact on performance, wider versus narrower machines with frequencies that match, and voltage “spent” on each stage.
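As a rough back-of-the-envelope illustration of the wider-versus-narrower point, the sketch below computes theoretical peak FP32 throughput from shader count and clock. It uses the publicly listed shader counts (2304 stream processors for the RX 480, 1280 CUDA cores for the GTX 1060 6GB) and the approximate clocks from the question; the function name is ours, and peak throughput is only a paper figure, not a measure of game performance.

```python
def peak_fp32_gflops(shader_count, clock_mhz):
    """Theoretical peak FP32 throughput in GFLOPS.

    Assumes each shader retires one fused multiply-add (2 FLOPs) per clock,
    which is the usual convention for these paper specs.
    """
    return 2 * shader_count * clock_mhz / 1000.0


# Approximate clocks taken from the viewer question.
rx480 = peak_fp32_gflops(2304, 1300)    # wider machine, lower clock
gtx1060 = peak_fp32_gflops(1280, 1800)  # narrower machine, higher clock

print(f"RX 480   @ 1300MHz: {rx480:.1f} GFLOPS")
print(f"GTX 1060 @ 1800MHz: {gtx1060:.1f} GFLOPS")
```

On paper the wider, slower part comes out ahead, yet the two cards land close together in games. That gap between theoretical width and delivered frames is where pipeline design, utilization, and the other factors Wasson discusses come in.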
We’ll leave this content piece primarily to video, as Wasson does a good job of conveying the information quickly.