Shader Intrinsic Functions to Bypass Abstraction Layers (w/ Raja Koduri)

Posted on October 17, 2016

Abstraction layers that sit between the game code and hardware create transactional overhead that worsens software performance on CPUs and GPUs. This has been a major discussion point as DirectX 12 and Vulkan have rolled out to the market, particularly with DOOM's successful implementation. Long-standing API incumbent Dx 11 sits unmoving between the game engine and the hardware, preventing developers from leveraging specific system resources to efficiently execute game functions or rendering.

With lower-level access, by contrast, it is possible to optimize tessellation performance, for example, by making explicit changes to how its execution is handled on Pascal, Polaris, Maxwell, or Hawaii architectures. A developer could accelerate performance by directly commanding the GPU to execute code on a reserved set of compute units, or could leverage asynchronous shaders to process render tasks without getting “stuck” behind other instructions in the pipeline. This can't be done with higher-level APIs like Dx 11, but DirectX 12 and Vulkan both allow this lower-level hardware access; you may have seen this referred to as “direct to metal” or “programming to the metal.” These phrases refer to that explicit hardware access, and have historically been used to describe what Xbox and PlayStation consoles enable for developers. Only recently has this level of support come to PC.

In our recent return trip to California (see also: Corsair validation lab tour), we visited AMD's offices to discuss shader intrinsic functions and performance acceleration on GPUs by leveraging low-level APIs.

What Are Shader Intrinsic Functions?

Under traditional application programming interfaces, executing game code requires passing that code to the drivers and compilers, demanding two additional abstraction layers between the code and the hardware. Modern APIs have minimized this, but still force abstraction layers into the pipeline.

AMD's newest effort looks to offer game developers low-level access to PC hardware similar to what consoles provide, allowing developers to fine-tune process execution on the GPU. With GCN, this is relatively easy in theory when porting titles between platforms, especially Xbox, given the already somewhat unified software infrastructure between Windows and Xbox.

Raja Koduri, SVP and Chief Architect of Radeon Technologies Group, explained AMD's take on shader intrinsic functions in our interview:

“The reason we are very much excited about it and we started the whole Mantle initiative – it was around reducing the overhead of the APIs, so we don't need as many CPU cycles to push a frame to the GPU and also take advantage of all the cores available in the system and all the resources available in the system. That was the intent of those APIs. There was a basis for that: game consoles were all based around low-overhead, low-level APIs. If you look at a game console and compare it to a PC, as a gamer, you pay $299, $399 and you get the whole box. If you compare the experience per dollar of the game console to the experience per dollar of the PC, the game console is far ahead. I'm not saying that PC games don't look better, but they cost a lot more – it's just experience per dollar is so much better.

“That bothered us all the time, that it's the same hardware, you have actually more hardware sitting on a PC – why is it that we can't give better experience per dollar on the PC? When you analyze it from a technical standpoint, all these software infrastructure issues come about. One is the API itself is high overhead. The developer has more control of the CPU cores and GPU on the console and not on the PC. The first step to that was, 'hey, can we do a low overhead API on the PC?' Which was a check with Dx12 and Vulkan. The next step is, there are a whole bunch of hardware features and instructions that are not exactly exposed through standard APIs like Dx12 and Vulkan. We see that these features are leveraged by the developer on the console, to get 10%, 30%, 40%, sometimes 2x performance, and when they port the game over to the PC, they say, 'ah, I was doing this on the console and I know it's the same GCN architecture, but I can't get access to this stuff.'

“That's the motivation for the shader intrinsic functions that we added to Vulkan, where [Bethesda] could map their algorithms that they were doing on console over to PC.”

Rather than taking the usual DirectX API commands, translating them for the driver, and then translating that output into machine code, shader intrinsics allow the game code to more or less shoot straight over to the hardware. There are still compiler checks to ensure that the code is legal and within expected operating parameters (e.g. not malicious), but this removal of abstraction layers does two important things: (1) it removes extra steps from the process, which cost cycles, and (2) it allows developers to write code specifically for execution on known GPU hardware, so that the code executes in the most optimized fashion possible. This approach is comparable to inline assembly programming, where known instructions can be passed along to the hardware without needing to spend as much time living in Compiler Land.
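To make "writing code for known GPU hardware" a little more concrete, here is a minimal CPU-side sketch in Python of the kind of cross-lane operation that wavefront-level intrinsics are generally understood to expose. The interview does not name specific intrinsics, so the choice of ballot-style and mbcnt-style operations is an assumption made for illustration. The function names `ballot`, `mbcnt`, and `compact` are hypothetical stand-ins rather than any vendor's API, and the loop over lanes only simulates work that hardware would do in parallel across a 64-lane GCN wavefront.

```python
WAVE_SIZE = 64  # GCN executes shader threads in 64-lane wavefronts


def ballot(predicates):
    """Pack one boolean per lane into a bit mask (bit i corresponds to lane i)."""
    mask = 0
    for lane, flag in enumerate(predicates):
        if flag:
            mask |= 1 << lane
    return mask


def mbcnt(mask, lane):
    """Count the set bits of mask that belong to lanes strictly below `lane`."""
    return bin(mask & ((1 << lane) - 1)).count("1")


def compact(values, keep):
    """Simulate one wavefront compacting the values that pass `keep`.

    Each loop iteration stands in for one lane. A lane derives its output
    slot from the shared ballot mask alone, with no scratch memory or
    atomic counter involved.
    """
    if len(values) > WAVE_SIZE:
        raise ValueError("a single simulated wavefront holds at most 64 lanes")
    predicates = [keep(v) for v in values]
    mask = ballot(predicates)
    out = [None] * bin(mask).count("1")
    for lane, value in enumerate(values):
        if predicates[lane]:
            out[mbcnt(mask, lane)] = value
    return out


if __name__ == "__main__":
    lanes = list(range(WAVE_SIZE))
    print(compact(lanes, lambda v: v % 3 == 0))
```

The point of the sketch is the shape of the algorithm, not its speed: when the hardware can answer "which lanes passed, and how many before me?" in a couple of instructions, a shader can skip the more general (and slower) route through shared memory that a portable API would otherwise push it toward.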

Expanding on this, Koduri told us:

“[Developers] can get at least as far as shader programs go – near equivalence to what they can do on a game console. It's cool that Vulkan allows us to extend the API to add these kind of features within our own schedule, within our own control. We don't have to ask permission of anybody other than getting the game developers to adopt it. So we're excited, and what you'll see us doing, from a game developer perspective, is you'll see us opening up the whole GPU Open initiative – and we can go more into that – but the idea was to make all the tools available on the PC. If something was available in the hardware, the developer needs to find a way to get to it. That was our goal. It's not always easy to provide the console-level access, but we're trying. We're opening up every door we can for them to have access to that. If you drive the experience per dollar for the PC – it doesn't need to match the console, but I think for now it's probably a factor of two or three, easily a factor of three off. If it gets closer, PC gaming will thrive even more. It's a rich ecosystem.

“With low overhead APIs, the benefit comes in application scenarios where you're CPU bound. With shader intrinsics, it's actually a GPU feature. It helps in a GPU-bound scenario. If my game is CPU-bound, then I need to find a better way to do the drawing. I still want to accomplish drawing the same picture, but can I do it in less number of cycles? Less number of instructions? What is the hardware capable of? Can I combine these four operations together into one interesting instruction that the hardware may have? That's where shader intrinsics kick in [when I'm GPU-bound], which is a very common scenario if you do anything 1080p or above – you're GPU-bound. You're CPU-bound at lower resolutions, but at 1080p and above, you tend to become more GPU bound. That's where the shader intrinsics kick in.”
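Koduri's CPU-bound versus GPU-bound distinction can be illustrated with a deliberately simple model. The sketch below assumes that CPU and GPU work overlap perfectly, so the slower of the two sets the frame time. That assumption, the `Frame` type, the helper names, and every number are made up for illustration and do not come from the interview or from any measurement.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    cpu_ms: float  # CPU time to prepare and submit the frame
    gpu_ms: float  # GPU time to render the frame


def frame_time_ms(frame):
    # Assumption: CPU and GPU work overlap fully, so the slower side sets the pace.
    return max(frame.cpu_ms, frame.gpu_ms)


def bottleneck(frame):
    return "CPU" if frame.cpu_ms > frame.gpu_ms else "GPU"


def scale(frame, cpu=1.0, gpu=1.0):
    """Return a copy of the frame with CPU and/or GPU cost scaled."""
    return Frame(frame.cpu_ms * cpu, frame.gpu_ms * gpu)


# Illustrative numbers only: a lower resolution shrinks GPU work, not CPU work.
low_res = Frame(cpu_ms=10.0, gpu_ms=5.0)
high_res = Frame(cpu_ms=10.0, gpu_ms=20.0)

if __name__ == "__main__":
    for name, frame in (("low_res", low_res), ("high_res", high_res)):
        cheaper_api = scale(frame, cpu=0.5)      # stand-in for a lower-overhead API
        faster_shaders = scale(frame, gpu=0.75)  # stand-in for a shader-side saving
        print(name, bottleneck(frame), frame_time_ms(frame),
              frame_time_ms(cheaper_api), frame_time_ms(faster_shaders))
```

Under this model, halving CPU cost only shortens the frame in the CPU-bound case, while trimming GPU cost only helps in the GPU-bound case. Real pipelines overlap less cleanly, so treat this as a way to read the quote rather than a performance prediction.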

We'll be exploring shader intrinsics and other low-level hardware functions in greater depth as we continue to publish architecture deep dives. For now, we'd suggest looking into our AMD RX 480 review (for Polaris architecture and asynchronous compute) and the GTX 1080 review (for Pascal architecture and compute preemption). The second part of our interview with Raja Koduri will go live on Saturday (October 22) and will focus on GPU Open and achieving a “1,000,000x” increase in processing power within the next few decades.

Editorial: Steve “Lelldorianx” Burke
Video: Keegan “HornetSting” Gallick