Maxwell Memory Architecture: Third-Generation Delta Color Compression
This bit sounds pretty fancy.
Delta color compression is a specific process used when compressing data transferred to and from the framebuffer (GPU memory). Bandwidth is not “free”; as with all components, it’s critical to performance that data is compressed to make the best use of buses – ideally in a lossless fashion, so that quality is not degraded.
Delta color compression improves memory efficiency through new compression technology. Because the data is compressed more heavily, less bandwidth is required and more data can be crammed down the pipe, ultimately freeing resources for a better-looking experience. Let’s take an example from GRID: Autosport, which we’ve benchmarked in the past:
This frame (n) has already been rendered by the GPU. The GPU already knows what’s present in the frame (having just drawn it) and can use this information when analyzing the next frame (n+1) in the gameplay sequence. Instead of drawing the absolute color values all over again in the next frame, we can use color compression to look at the delta (change) between values in each frame. The GPU looks at the next frame in the sequence and, instead of seeing our above image, sees something more like this:
The pink highlight marks regions where the color change between frames can be expressed as a delta from the previous frame. Areas that can’t – an object moved too much, visually changed in appearance, or was granular enough to require additional work by the GPU – show in a non-highlighted appearance.
To recap: Analyzing the color change between successive frames minimizes demand on the GPU by avoiding exact (absolute) color value draws, instead opting for the value change from base. NVidia’s whitepaper indicates that this process reduces bandwidth saturation by 17-18% on average, meaning we’ve got more memory bandwidth freed up for other tasks. This also contributes to the effective memory speed: although specified at a 7Gbps GDDR5 memory speed, the GTX 980, 970, and other similarly equipped Maxwell devices will perform equivalently to 9Gbps effective throughput, from what we’re told by nVidia.
In shorter form, you’d need to run DRAM at 9Gbps on Kepler in order to achieve the same effective bandwidth as 7Gbps on Maxwell.
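The core idea can be sketched in a few lines of code. This is a deliberately simplified toy – the real hardware scheme operates on blocks of pixels with multiple compression modes, none of which are public – but it shows why storing deltas instead of absolute values helps: when little changes between frames, the deltas are mostly small numbers (or zeros), which take far fewer bits to move across the bus.

```python
# Toy sketch of delta color compression. Illustrative only; the real
# Maxwell scheme works on pixel blocks and is not publicly documented.
def delta_encode(prev_frame, next_frame):
    """Store each value of next_frame as a signed change from prev_frame."""
    return [n - p for p, n in zip(prev_frame, next_frame)]

def delta_decode(prev_frame, deltas):
    """Recover the absolute color values losslessly."""
    return [p + d for p, d in zip(prev_frame, deltas)]

# Two successive frames of 8-bit channel values; most pixels barely change.
frame_n  = [120, 121, 119, 200, 201, 55]
frame_n1 = [121, 121, 118, 210, 203, 55]

deltas = delta_encode(frame_n, frame_n1)
print(deltas)  # [1, 0, -1, 10, 2, 0] - small values, cheap to transfer
assert delta_decode(frame_n, deltas) == frame_n1  # lossless round trip
```

Because the round trip is exact, image quality is untouched – the savings come purely from the deltas being more compressible than the absolute values.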
Maxwell Technology: Multi-Frame Sampled Anti-Aliasing (What Is MFAA?)
MFAA is a new approach to the sampling patterns applied by the GPU when using anti-aliasing. The short of it is that higher anti-aliasing sample values are achievable with a significantly diminished impact to performance when compared to previous AA methodologies. For purposes of this review, nVidia was not able to fully launch MFAA in the GTX 980 driver package and supported games, so we’ll focus on overviewing the technology and showing a few prepared examples.
Performance tests will be conducted as soon as we receive the green-light from nVidia regarding stability of MFAA. We expect to have something more detailed online upon returning from Game24 next week.
Multi-Frame Sampled Anti-Aliasing varies anti-aliasing sample patterns to reduce load by analyzing the image across multiple frames, rather than collecting all samples on a single frame (which is what MSAA does). This is more applicable to the likes of Battlefield 4, Crysis 3, and similarly high-fidelity titles.
Let’s break this down using the industry-wide MSAA that’s found in most games. With 4xMSAA, you’re taking what can be thought of as four color samples per pixel; 8xMSAA would be eight samples per pixel; 16xMSAA would be sixteen samples per pixel, and so on. It becomes apparent why anti-aliasing is so abusive to the GPU when we consider that each frame has millions of pixels (1920 x 1080 = 2,073,600 pixels). Higher sample values improve accuracy of color when drawing the pixel because we can point at more locations on the object; if there’s a piece of geometry that has some color change going on, sampling the geometry numerous times in the applicable location will ensure a smoother transition between colors when the final frame is drawn. 0xAA will give us a much harder edge to things, whereas 4x-16x AA helps create a more visually-appealing and smoother image.
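The arithmetic above is worth making concrete. A quick back-of-the-envelope calculation shows how fast the per-frame sample count balloons at 1080p as the MSAA level rises:

```python
# Back-of-the-envelope sample counts per frame at 1920x1080,
# illustrating why higher MSAA levels are so abusive to the GPU.
WIDTH, HEIGHT = 1920, 1080
pixels = WIDTH * HEIGHT            # 2,073,600 pixels per frame

for samples in (1, 2, 4, 8, 16):   # 0xAA, 2x, 4x, 8x, 16x MSAA
    total = pixels * samples
    print(f"{samples}x: {total:,} color samples per frame")
# 4xMSAA alone means over 8.2 million samples every single frame,
# and that work repeats 60+ times per second.
```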
Let’s look at an image:
The above is what we’ll use to explain MFAA. On the right side, we have the relatively dominant MSAA that you’ve all likely seen within game settings at some point. Each square within the larger block is a pixel. The dissecting red line can be thought of as a piece of geometry that is obscuring our view of an object (or the borderline of a piece of geometry that the GPU is drawing).
With 4xMSAA, the GPU is taking four samples per pixel and then determining a color value for that pixel. For the top two pixels, all four samples are collected within a black area; that pixel is drawn as black by the GPU. The bottom left pixel has one sample outside of the geometry being analyzed (think of this as “white” for simplicity) with three samples on the “black” part of the pixel. That pixel – because 3/4 samples are on black and 1/4 is on white – is drawn as an accurate dark gray. The grayscale output in this simplified example could be even more granular with a higher sample value. The bottom right pixel is the opposite story: 3/4 samples land on white (beneath our slicing geometry line) and 1/4 lands in black. This pixel is drawn as mostly white.
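The resolve step described above – turning four coverage samples into one final pixel color – can be sketched as a simple average. This toy uses 1.0 for “black” and 0.0 for “white” to mirror the example; the real hardware resolve averages full RGB sample colors, but the principle is the same:

```python
# Simplified 4xMSAA resolve: a pixel's final color is the average of
# its sample values (1.0 = black, 0.0 = white in this toy example).
def resolve(samples):
    return sum(samples) / len(samples)

top_pixel          = [1.0, 1.0, 1.0, 1.0]  # all four samples on black
bottom_left_pixel  = [1.0, 1.0, 1.0, 0.0]  # 3/4 black, 1/4 white
bottom_right_pixel = [0.0, 0.0, 0.0, 1.0]  # 3/4 white, 1/4 black

print(resolve(top_pixel))          # 1.0  -> drawn fully black
print(resolve(bottom_left_pixel))  # 0.75 -> dark gray
print(resolve(bottom_right_pixel)) # 0.25 -> mostly white
```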
The entire process of collecting four samples per pixel is GPU-intensive, which is why AA is one of the first things to drop in order to gain performance. By sampling millions of pixels per frame, we’re bogging down the GPU with requests that could hamper framerate. The best way to mitigate this impact is to decrease the sampling value – to 2x, for example.
All of this is done on a frame-by-frame basis. Here’s another image:
Focus on the left side this time. Notice that now we have two frames being sampled – frame n and frame n-1. MFAA samples frame n two times per pixel and does the same for frame n-1, but in different locations; you’ll notice that between the two frames, we’re sampling a total of four times and doing so in the same sample locations as 4xMSAA. Maxwell then blends frame n and n-1 together, producing the 4xMFAA set of four pixels. The 4xMFAA and 4xMSAA blocks are essentially the same image, so image quality doesn’t necessarily inherently improve just by using MFAA. It will improve indirectly, though: 4xMFAA would allegedly have performance effectively equivalent to 2xMSAA, it’s just spaced out over two frames to reduce load on the GPU. This means that users would be able to run higher MFAA values than MSAA values, indirectly improving graphics quality.
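The two-frames-of-two-samples idea can be sketched numerically. This is our guess at the arithmetic for illustration – the actual sample placement and blend are nVidia’s and not public – but it shows why blending two alternating two-sample patterns lands on the same answer as one four-sample pattern when the geometry holds still:

```python
# Toy MFAA sketch: two samples per pixel per frame, with the pattern
# alternating between frames; blending frame n with frame n-1
# approximates the 4xMSAA result at half the per-frame sampling cost.
# (Sample offsets and the blend are our assumptions, not nVidia's.)
PATTERN_A = [(0.25, 0.25), (0.75, 0.75)]   # sample offsets, frame n-1
PATTERN_B = [(0.75, 0.25), (0.25, 0.75)]   # sample offsets, frame n

def shade(offsets, coverage):
    """Average the coverage values hit by this frame's sample offsets."""
    return sum(coverage(x, y) for x, y in offsets) / len(offsets)

# Geometry covering the left half of the pixel (1.0 = covered/black).
coverage = lambda x, y: 1.0 if x < 0.5 else 0.0

frame_prev = shade(PATTERN_A, coverage)        # two samples on frame n-1
frame_curr = shade(PATTERN_B, coverage)        # two samples on frame n
blended    = (frame_prev + frame_curr) / 2     # temporal blend
msaa_4x    = shade(PATTERN_A + PATTERN_B, coverage)
print(blended, msaa_4x)  # identical results for static geometry
```

The catch implied by the temporal split is motion: when geometry moves between frame n-1 and frame n, the blend is working with stale samples, which is part of why nVidia is still validating stability before enabling it broadly.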
Maxwell Technology: Dynamic Super Resolution (DSR) Scaling
NVidia opened its DSR presentation with a Dark Souls II screenshot, stating simply: “Making it go faster isn’t really going to do you any good,” indicating the already-high FPS on the moderate-looking RPG. The game is already ‘capped’ for performance. More frames won’t add much at this point, so nVidia is instead seeking to improve visual quality. NVidia went on to highlight that most of its gamers (~95%!) are still using 1080p or 1440p displays.
Maxwell is capable of what nVidia calls “Dynamic Super Resolution,” which is a specialized and better-performing alternative to super-sampling. DSR renders the game at 4K resolution and then scales the output down to the native display resolution. The result is greater visual clarity and more defined edges / shadows. Here’s a pair of screenshots I took using Trine – the left is DSR (4K scaled), the right is a native 1080 output:
This all sounds an awful lot like super-sampling, though.
DSR renders out the frame at 4K resolution and filters it down onto the 1080p native display. Dynamic Super Resolution increases the sample rate – effectively a 4x sample rate increase – and then writes the 4K frame to the framebuffer. After the frame is written to memory, the GM204 uses a 13-tap Gaussian filter that’s built specifically for this task. The result is a higher image quality that eats performance fairly similarly to what a true 4K display would; there’s not much performance lost during the filtration process – the most notable performance hit is from the increased resolution.
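To make the downscale step concrete, here’s a toy 1-D grayscale version. NVidia’s actual 13-tap Gaussian filter is proprietary; as a stand-in, this sketch uses a simple 3-tap Gaussian-weighted average (weights 1-2-1) while halving resolution, which is enough to show how edge information from the higher-resolution render survives into the smoothed, lower-resolution output:

```python
# Rough sketch of the DSR downscale idea: render high-res, then filter
# down. Stand-in 3-tap Gaussian (1, 2, 1)/4 per output sample - NOT
# nVidia's actual 13-tap filter, which is not publicly documented.
def gaussian_downscale_1d(hi_res):
    """Halve resolution with a 3-tap weighted average centered on
    every second high-res sample (edges clamped)."""
    lo_res = []
    for i in range(0, len(hi_res), 2):
        left   = hi_res[max(i - 1, 0)]
        center = hi_res[i]
        right  = hi_res[min(i + 1, len(hi_res) - 1)]
        lo_res.append((left + 2 * center + right) / 4)
    return lo_res

# A hard edge rendered at "4K" width...
hi = [0, 0, 0, 0, 255, 255, 255, 255]
print(gaussian_downscale_1d(hi))  # [0.0, 0.0, 191.25, 255.0]
# ...comes out softened at "1080p" width instead of aliased.
```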
Super-sampling also lacks an automated process. Someone has to hack together a fake resolution to force the GPU into rendering it, and once the display interface has been hacked to produce the higher resolution, a filter scales the image 1:1 to a 1080p display. This isn’t a lossless process; much of the gain is potentially lost during the downscale and hacked output. NVidia tells us that its process equates to less than a 1% performance hit beyond the obvious hit from 4K rendering.
In our testing of DSR, it produces very real, noticeable visual enhancements to gaming output. We tested the performance hit on the 980 below.
Maxwell Technology: VR Direct
I won’t dive too deeply into this one; we’ll save that for future articles that discuss VR more specifically. The short of VR Direct is that nVidia recognizes virtual reality as a significant force in the market right now. As such, the company has worked to tackle pervasive latency issues on the GPU-side when dealing with motion sensing and real-time movement through a VR space.
The trouble with latency in virtual reality – especially VR with motion sensing – is that it can produce an internally jarring side effect in the brain, eyes, and inner ear when the screen (output) does not match the user movement (input) precisely. This is where virtual reality sickness arises.
Maxwell architecture is now deploying “asynchronous warp” to aid in latency reduction. You can think of the rendered environment as having a sort of slight “buffer” around it; when the user moves his or her head, Maxwell adjusts the frame in-flight without re-rendering the entire frame. This makes it a sort of pseudo-predictive technology that reduces latency by about 50%, from what nVidia told us (we cannot test VR without the right hardware).
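The “buffer around the frame” concept can be illustrated with a trivial 1-D sketch. This is purely our illustration of the reprojection idea – render slightly wider than the display needs, then shift the crop window to match the newest head position instead of re-rendering – and not nVidia’s implementation:

```python
# Rough illustration of the "asynchronous warp" idea (our sketch of
# the concept, not nVidia's implementation): shift an over-rendered
# image to match late head movement rather than re-rendering it.
def warp(frame, border, head_shift_px):
    """Crop the display window out of an over-rendered row of pixels,
    offset by the latest head movement (in pixels)."""
    offset = border + head_shift_px
    return frame[offset : offset + (len(frame) - 2 * border)]

# Row rendered 2 pixels wider on each side than the display needs.
rendered = list(range(10))       # 10 px rendered, 6 px displayed
print(warp(rendered, 2, 0))      # [2, 3, 4, 5, 6, 7] - head held still
print(warp(rendered, 2, 1))      # [3, 4, 5, 6, 7, 8] - head moved right
```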
Continue to the final page for our video card test methodology and 980 / GM204 GPU benchmark.