Following unrelenting rumors about its pricing and existence, nVidia's GTX 980 Ti is now an officially announced product and will be available shortly. The GTX 980 Ti was assigned an aggressively competitive $650 price point, positioning the device firmly to take over the 780 Ti's former slot in nVidia's stack.
The 980 Ti reuses the GTX 980's “The World's Most Advanced GPU” marketing language, a deliberate framing of its single-GPU performance against similarly priced dual-GPU solutions. The card takes over the market position that the original GTX 780 Ti (Kepler) held in the lineup, resulting in the following bottom-up stack:
- GTX 980 4GB (now $500, reduced from $550 MSRP).
- GTX 980 Ti ($650 launch MSRP).
- GTX Titan X (still $1000, more or less).
Until Pascal arrives, nVidia is sticking with its maturing Maxwell architecture. The GTX 980 Ti uses the same memory subsystem and compression technology as previous Maxwell devices.
This GTX 980 Ti review benchmarks the video card against the GTX 980, Titan X, 780 Ti, 290X, and other devices, analyzing FPS output across our suite of test bench titles. The Witcher 3, GTA V, and Metro: Last Light are among the games tested.
NVIDIA GeForce GTX 980 Ti Benchmark & Dx12 Updates Video
(Note: At time of publication, the video was still pending upload and may be unavailable for a brief period).
NVIDIA GeForce GTX 980 Ti Specs vs. GTX 980, Titan X, & GTX 780 Ti
| | GTX 980 Ti | GTX Titan X | GTX 980 | GTX 780 Ti |
|---|---|---|---|---|
| GPU | GM200 | GM200 | GM204 | GK110 |
| Fab Process | 28nm | 28nm | 28nm | 28nm |
| Texture Filter Rate (Bilinear) | 176GT/s | 192GT/s | 144.1GT/s | 210GT/s |
| TjMax | 92C | 91C | 95C | 95C |
| Transistor Count | 8B | 8B | 5.2B | 7.1B |
| ROPs | 96 | 96 | 64 | 48 |
| TMUs | 176 | 192 | 128 | 240 |
| CUDA Cores | 2816 | 3072 | 2048 | 2880 |
| Base Clock (GPU) | 1000MHz | 1000MHz | 1126MHz | 875MHz |
| Boost Clock (GPU) | 1075MHz | 1075MHz | 1216MHz | 928MHz |
| GDDR5 Memory / Memory Interface | 6GB / 384-bit | 12GB / 384-bit | 4GB / 256-bit | 3GB / 384-bit |
| Memory Bandwidth | 336.5GB/s | 336.5GB/s | 224GB/s | 336GB/s |
| Memory Speed | 7Gbps | 7Gbps | 7Gbps (9Gbps effective - read below) | 7Gbps |
| Power | 1x8-pin, 1x6-pin | 1x8-pin, 1x6-pin | 2x6-pin | 1x6-pin, 1x8-pin |
| TDP | 250W | 250W | 165W | 250W |
| Output | 3xDisplayPort, 1xHDMI 2.0, 1xDual-Link DVI | 3xDisplayPort, 1xHDMI 2.0, 1xDual-Link DVI | 1xDual-Link DVI, 1xHDMI 2.0, 3xDisplayPort 1.2 | 1xDVI-D, 1xDVI-I, 1xDisplayPort, 1xHDMI |
| MSRP | $650 | $1000 | $550, now $500 | $600 |
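The bilinear texture filter rate row can be reproduced from two other rows of the table. The sketch below assumes one bilinear-filtered texel per TMU per clock at base clock; that assumption is ours, but it matches every card's listed figure.

```python
# Bilinear texture filter rate = TMUs x base clock.
# Assumption (not stated by nVidia here): one bilinear texel per TMU per clock.
SPECS = {
    "GTX 980 Ti": {"tmus": 176, "base_mhz": 1000},
    "GTX Titan X": {"tmus": 192, "base_mhz": 1000},
    "GTX 980": {"tmus": 128, "base_mhz": 1126},
    "GTX 780 Ti": {"tmus": 240, "base_mhz": 875},
}


def filter_rate_gtexels(tmus: int, clock_mhz: int) -> float:
    """Return the bilinear filter rate in GT/s (gigatexels per second)."""
    return tmus * clock_mhz / 1000.0


for card, spec in SPECS.items():
    rate = filter_rate_gtexels(spec["tmus"], spec["base_mhz"])
    print(f"{card}: {rate:.1f}GT/s")  # 176.0, 192.0, 144.1, 210.0
```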
Familiar Maxwell Architecture & Memory Subsystem
Because the 980 Ti uses the same GM200 GPU as nVidia's existing Titan X, nothing has changed in terms of processor architecture or its subsystems. The only real differences are a slightly reduced core count (2816 CUDA cores versus 3072 on the Titan X, with TMUs reduced from 192 to 176) and half the on-board memory (6GB versus 12GB).
For the purposes of this review, everything we wrote about Maxwell's architecture and memory subsystem in our GTX 980 and GTX 960 reviews still holds true. We'll briefly recap a few of the core items as a refresher before moving on.
This is the GM200 die. The 980 Ti's GM200 has 22 streaming multiprocessors (SMMs) enabled, each with 128 CUDA cores, for a total of 2816 cores; Titan X enables all 24 SMMs for a total of 3072. This GM200 configuration offers a 384-bit memory interface operating at a 7GHz (effective) memory clock, which works out to 336.5GB/s of memory bandwidth for the 980 Ti's 6GB of GDDR5 VRAM. 96 ROPs and 176 TMUs are present, along with 3MB of L2 cache.
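The core counts and bandwidth above follow from simple multiplication. This sketch uses the 128-cores-per-SMM figure from the text and treats the memory data rate as exactly 7Gbps per pin; the quoted 336.5GB/s presumably reflects a data rate marginally above 7Gbps, which we have not confirmed.

```python
CORES_PER_SMM = 128  # Maxwell SMM width, per the text above


def cuda_cores(smm_count: int) -> int:
    """Total CUDA cores for a given number of enabled SMMs."""
    return smm_count * CORES_PER_SMM


def bandwidth_gbs(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Peak bandwidth in GB/s: bytes moved per transfer across the bus x per-pin data rate."""
    return bus_width_bits / 8 * data_rate_gbps


print(cuda_cores(22))            # 2816 (GTX 980 Ti: 22 SMMs enabled)
print(cuda_cores(24))            # 3072 (Titan X: all 24 SMMs enabled)
print(bandwidth_gbs(384, 7.0))   # 336.0 GB/s (980 Ti / Titan X)
print(bandwidth_gbs(256, 7.0))   # 224.0 GB/s (GTX 980)
```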
GM200 dedicates 96KB of shared memory to each SMM, compared with the 64KB per SMX on Kepler (780 Ti), which was split between shared memory and L1 cache. GM200 also hosts an additional 48KB pool of memory per SMM.
For reference, the GTX 980 hosts 2048 CUDA cores, a 256-bit interface, 64 ROPs, 128 TMUs, and 224GB/s of memory bandwidth.
Maxwell's memory subsystem uses efficiency-focused compression techniques, one of which is called “Delta Color Compression.” These techniques reduce the amount of data moved across the memory bus and, together with Maxwell's reworked SM design, contribute to roughly 40% greater performance efficiency per CUDA core. Within the gaming world, this means that a lower CUDA core count (seen in the 780 Ti / 980 Ti disparity) and a narrower memory interface (such as the GTX 980's 256-bit bus) can still deliver equal or superior performance. Outside of gaming, professional applications that don't benefit from Maxwell's compression techniques may excel on an older device with a greater core count. That is outside the scope of this article, though, and won't be explored here.
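The table's “7Gbps (9Gbps effective)” figure for the GTX 980 implies how much traffic compression must remove. The sketch below back-solves that fraction and applies it to raw bandwidth; it assumes the savings are uniform across a workload, whereas real compression ratios vary from scene to scene.

```python
RAW_GBPS = 7.0        # GTX 980 physical memory data rate, from the spec table
EFFECTIVE_GBPS = 9.0  # "effective" figure for the GTX 980, from the same table


def implied_savings(raw_gbps: float, effective_gbps: float) -> float:
    """Fraction of memory traffic compression must eliminate for raw to behave like effective."""
    return 1.0 - raw_gbps / effective_gbps


def effective_throughput(raw: float, savings: float) -> float:
    """Scale a raw rate (Gbps or GB/s) by an assumed, workload-wide compression savings."""
    return raw / (1.0 - savings)


savings = implied_savings(RAW_GBPS, EFFECTIVE_GBPS)
print(f"implied traffic reduction: {savings:.1%}")  # 22.2%
print(f"GTX 980 effective bandwidth: {effective_throughput(224.0, savings):.0f}GB/s")  # 288GB/s
```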
GM200 lacks full-rate double-precision (FP64) throughput, so the Titan X and 980 Ti can't substitute for their older Titan siblings in double-precision simulation workloads.
Read more about how delta color compression, MFAA, and other Maxwell technologies work in our GTX 980 review.
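nVidia doesn't publish its actual delta color compression format, so the following is only a toy illustration of the general idea: store one anchor pixel per tile plus small differences, and fall back to raw storage when the differences are too large. The single 8-bit channel, single anchor, and hypothetical 4-bit “delta width” field are all our assumptions.

```python
from typing import List, Tuple


def bits_for_signed(value: int) -> int:
    """Smallest two's-complement width (at least 1 bit) that can hold `value`."""
    bits = 1
    while not (-(1 << (bits - 1)) <= value < (1 << (bits - 1))):
        bits += 1
    return bits


def compress_tile(tile: List[int]) -> Tuple[int, int, List[int]]:
    """Toy delta encoding: keep the first pixel as an anchor, store the rest as differences."""
    anchor = tile[0]
    deltas = [p - anchor for p in tile[1:]]
    delta_bits = max((bits_for_signed(d) for d in deltas), default=1)
    return anchor, delta_bits, deltas


def decompress_tile(anchor: int, deltas: List[int]) -> List[int]:
    """Rebuild the original pixels; the scheme is lossless."""
    return [anchor] + [anchor + d for d in deltas]


def encoded_bits(tile: List[int], channel_bits: int = 8, width_field_bits: int = 4) -> int:
    """Storage cost in bits, falling back to raw storage when deltas don't pay off."""
    _, delta_bits, deltas = compress_tile(tile)
    compressed = channel_bits + width_field_bits + delta_bits * len(deltas)
    raw = channel_bits * len(tile)
    return min(compressed, raw)


smooth = [200, 201, 202, 201, 200, 199, 200, 202]  # e.g. a patch of sky
noisy = [0, 255, 0, 255, 0, 255, 0, 255]           # high-contrast detail
print(encoded_bits(smooth), "of", 8 * len(smooth), "bits")  # 33 of 64 bits
print(encoded_bits(noisy), "of", 8 * len(noisy), "bits")    # 64 of 64 bits
```

Smooth gradients, which dominate many frames, compress well; high-contrast tiles gain nothing, which is why real-world savings depend on scene content.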
DirectX 12_1 Feature Level Support
DirectX 12 deserves its own section in our review. Microsoft's new API is just around the corner, slated to replace Dx11 as Windows 10 rolls out, and promises support for technologies that aim to improve graphics fidelity. We previously conducted an API overhead test that pitted Dx12 against Dx11 and Mantle (which has since served as the foundation for Vulkan), but there's more to the API than overhead reduction and load balancing.
NVidia and Microsoft are making their rounds this week with discussions on Dx12 feature level 12_1 items, like conservative rasterization and volume tiled resources. It should be heavily emphasized that, although the 980 Ti is being marketed alongside these features, it is not the only card able to take advantage of them. Any DirectX feature level 12_1 video card will support conservative raster and rasterizer ordered views, with tiled resources required from feature level 12_0 upward. Just wanted to clear that up.
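Feature levels are cumulative: a 12_1 device must also meet every 12_0 requirement. The sketch below encodes only the requirements named above, using labels of our own invention (they are not D3D12 API identifiers). To our understanding, volume tiled resources correspond to the optional Tiled Resources Tier 3 capability rather than to a feature-level requirement, so the sketch treats them as something queried per device.

```python
# Hypothetical labels for illustration; these are not D3D12 API identifiers.
LEVEL_ORDER = ["11_0", "11_1", "12_0", "12_1"]

# Feature level from which each capability becomes mandatory, as described above.
REQUIRED_FROM = {
    "tiled_resources": "12_0",
    "conservative_raster": "12_1",
    "rasterizer_ordered_views": "12_1",
}


def guaranteed(feature_level: str, feature: str) -> bool:
    """True if every device reporting `feature_level` must support `feature`."""
    required = REQUIRED_FROM.get(feature)
    if required is None:
        # Not tied to a feature level; it may still exist as an optional per-device capability.
        return False
    return LEVEL_ORDER.index(feature_level) >= LEVEL_ORDER.index(required)


print(guaranteed("12_1", "tiled_resources"))         # True: 12_1 includes everything in 12_0
print(guaranteed("12_0", "conservative_raster"))     # False
print(guaranteed("12_1", "volume_tiled_resources"))  # False: optional, queried separately
```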
To briefly step away from the 980 Ti itself, let's take a top-level look at conservative rasterization and volume tiled resources.
Dx12 Conservative Raster & Volume Tiled Resources
Rasterization takes the geometry of an in-game object and translates it into pixels. This is done by taking a sample from the center of each pixel (similar in part to how filtering techniques like AA and AF sample pixels for color) and then determining whether that sample lies inside or outside of nearby geometry. If the center of the pixel falls inside the geometry, the pixel is marked 'green' in the above image.
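Center-sampled rasterization can be sketched with edge functions: a pixel is covered only if its center is on the inner side of all three triangle edges. This is a minimal software sketch assuming a counter-clockwise triangle in pixel coordinates; it counts exact edge ties as inside, whereas real hardware applies a top-left fill rule to ties.

```python
from typing import List, Set, Tuple

Point = Tuple[float, float]


def edge(a: Point, b: Point, p: Point) -> float:
    """Edge function: >= 0 when p lies on the inner side of edge a->b of a CCW triangle."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def rasterize_center(tri: List[Point], width: int, height: int) -> Set[Tuple[int, int]]:
    """Return pixels whose *center* lies inside the counter-clockwise triangle."""
    a, b, c = tri
    covered = set()
    for y in range(height):
        for x in range(width):
            p = (x + 0.5, y + 0.5)  # single sample at the pixel center
            if edge(a, b, p) >= 0 and edge(b, c, p) >= 0 and edge(c, a, p) >= 0:
                covered.add((x, y))
    return covered


tri = [(1.0, 1.0), (6.0, 1.0), (1.0, 6.0)]
print(len(rasterize_center(tri, 8, 8)))  # 15
```

Pixel (0, 0) touches the triangle's corner at (1, 1) but is rejected, because its center at (0.5, 0.5) falls outside.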
The problem with this approach is that pixels partially covered by geometry, but whose centers fall outside its edge, are left out of the 'green' selection above. This introduces an undesirable shimmering effect in the rendered image, causing fine details to lose fluidity and appear more jagged. In the below image, a shadow illustrates the difference between a normal raster approach and conservative rasterization.
You'll notice that the shadow's edges are jagged and 'pixelated' in one image. To avoid this, conservative raster (Dx feature level 12_1) examines the pixels along geometry borders and counts any pixel the geometry touches at all as “in” the geometry, rather than testing only the pixel center. These extra pixels expand the border marginally and smooth the image drawn to the screen, and they still pass through the normal filtering processes.
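Conservative rasterization can be sketched by replacing the single center sample with an overlap test between the triangle and the whole pixel square. This minimal sketch uses an exact separating-axis test (the three triangle edges plus the pixel's x and y axes) on a counter-clockwise triangle; actual hardware tiers are allowed some overestimation, which this sketch does not model.

```python
from typing import List, Set, Tuple

Point = Tuple[float, float]


def edge(a: Point, b: Point, p: Point) -> float:
    """Edge function: >= 0 when p lies on the inner side of edge a->b of a CCW triangle."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def overlaps_pixel(tri: List[Point], x: int, y: int) -> bool:
    """True if the triangle touches any part of the closed pixel square [x, x+1] x [y, y+1]."""
    a, b, c = tri
    corners = [(x, y), (x + 1, y), (x, y + 1), (x + 1, y + 1)]
    # Separated if every pixel corner lies outside one triangle edge.
    for e0, e1 in ((a, b), (b, c), (c, a)):
        if max(edge(e0, e1, q) for q in corners) < 0:
            return False
    # Separated if the triangle's bounding box misses the pixel on the x or y axis.
    xs = [v[0] for v in tri]
    ys = [v[1] for v in tri]
    if max(xs) < x or min(xs) > x + 1 or max(ys) < y or min(ys) > y + 1:
        return False
    return True


def rasterize_conservative(tri: List[Point], width: int, height: int) -> Set[Tuple[int, int]]:
    """Return every pixel the triangle touches, however slightly."""
    return {(x, y) for y in range(height) for x in range(width) if overlaps_pixel(tri, x, y)}


tri = [(1.0, 1.0), (6.0, 1.0), (1.0, 6.0)]
print(len(rasterize_conservative(tri, 8, 8)))  # 34
```

On the same triangle, a center-sampled rasterizer covers 15 pixels; the conservative version keeps all 34 pixels in contact with the triangle, including pixel (0, 0), which only touches a corner.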
The primary use of conservative raster will likely be shadows, where precision dictates how smooth a shadow appears in-game. NVidia points toward ray-traced shadows as one possibility when using conservative raster, an approach that could replace shadow mapping in the cases where shadow mapping performs poorly. Ray tracing (as with light and reflectivity) can be used to locate an object's edge and create a shadow accordingly. Using Dx 12_1 and 12_0 features, game developers can use conservative raster to reduce the cost of ray tracing by building a data structure that tells the GPU where to look for object edges.
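The data-structure idea can be sketched as a light-space grid listing, per cell, every occluding triangle that could touch it; a shadow ray then tests only the triangles in its own cell. This is a hypothetical CPU sketch, not nVidia's implementation: it assumes a directional light shining down the z axis, and for brevity it bins triangles by bounding box, which is coarser than true conservative raster but still never omits a triangle from a cell it covers.

```python
from typing import Dict, List, Optional, Tuple

Vec3 = Tuple[float, float, float]  # (x, y) in light space, z = height toward the light
Tri = Tuple[Vec3, Vec3, Vec3]
Cell = Tuple[int, int]


def edge(a: Vec3, b: Vec3, px: float, py: float) -> float:
    """2D edge function (signed, doubled area of a, b, p) using only x and y."""
    return (b[0] - a[0]) * (py - a[1]) - (b[1] - a[1]) * (px - a[0])


def build_grid(tris: List[Tri], cell: float) -> Dict[Cell, List[int]]:
    """Bin each occluder into every light-space cell its bounding box touches."""
    grid: Dict[Cell, List[int]] = {}
    for i, (a, b, c) in enumerate(tris):
        xs, ys = (a[0], b[0], c[0]), (a[1], b[1], c[1])
        for cx in range(int(min(xs) // cell), int(max(xs) // cell) + 1):
            for cy in range(int(min(ys) // cell), int(max(ys) // cell) + 1):
                grid.setdefault((cx, cy), []).append(i)
    return grid


def occluder_height(tri: Tri, px: float, py: float) -> Optional[float]:
    """Interpolated z of the triangle above (px, py), or None if (px, py) is outside it."""
    a, b, c = tri
    area = edge(a, b, c[0], c[1])
    if area == 0.0:
        return None  # degenerate triangle seen edge-on from the light
    wa = edge(b, c, px, py) / area
    wb = edge(c, a, px, py) / area
    wc = edge(a, b, px, py) / area
    if min(wa, wb, wc) < 0.0:
        return None
    return wa * a[2] + wb * b[2] + wc * c[2]


def in_shadow(p: Vec3, tris: List[Tri], grid: Dict[Cell, List[int]], cell: float) -> bool:
    """Shadow ray toward a directional light at +z, testing only this cell's occluders."""
    key = (int(p[0] // cell), int(p[1] // cell))
    for i in grid.get(key, []):
        z = occluder_height(tris[i], p[0], p[1])
        if z is not None and z > p[2]:  # occluder lies between the point and the light
            return True
    return False


tris = [((0.0, 0.0, 5.0), (4.0, 0.0, 5.0), (0.0, 4.0, 5.0))]
grid = build_grid(tris, cell=1.0)
print(in_shadow((1.0, 1.0, 0.0), tris, grid, 1.0))  # True: under the triangle
print(in_shadow((3.5, 3.5, 0.0), tris, grid, 1.0))  # False: past the hypotenuse
```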
As for volume tiled resources, it's another upgrade to the long-standing tiled resources graphics technique, something Richard Garriott helped pioneer (see our interview with him).
Introducing the “volume” prefix to “tiled resources” changes things. Volume tiled resources extend tiling to a third dimension, so that 3D (volume) textures can be tiled as well, and help reduce GPU load when deploying assets for rendering. This is done by ensuring a scene is drawn only with the parts of the texture files that are required, restricting memory access (and processing power) to assets that will actually be displayed to the user. Suppose a particular texture is substantially larger than the object it's applied to; in this case, volume tiled resources ensure that only the usable area of the texture is drawn and stored, reducing load by eliminating the unused texture components from the pipeline.
NVidia likes to use sparse fluid simulation to illustrate this point. Smoke, as below, consists mostly of empty space (air) that doesn't need to be fed resources in games. The volumetric approach to tiled resources ensures that “focus” is afforded only to actual smoke particles, ignoring all empty air in the vicinity of the smoke cloud. More efficient processing is the end result.
Volume tiled resources split data into tiles of texture elements, known to most as texels, and skip areas that are otherwise unoccupied or unnecessary to the scene.
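The smoke example can be sketched in software: split a 3D volume into bricks, allocate memory only for bricks that contain smoke, and let reads from unallocated bricks return “empty air.” The volume size, brick size, and spherical smoke puff below are hypothetical choices for illustration, not the tile dimensions Dx12 actually uses.

```python
import math
from typing import Dict, List, Tuple

TileKey = Tuple[int, int, int]
Vec3 = Tuple[float, float, float]


def smoke_density(p: Vec3, center: Vec3, radius: float) -> float:
    """Hypothetical smoke puff: density 1.0 inside a sphere, 0.0 (empty air) outside."""
    return 1.0 if math.dist(p, center) <= radius else 0.0


def build_sparse_volume(size: int, tile: int, center: Vec3, radius: float) -> Dict[TileKey, List[float]]:
    """Split a size^3 volume into tile^3 bricks and keep only bricks holding some smoke."""
    resident: Dict[TileKey, List[float]] = {}
    n = size // tile
    for tz in range(n):
        for ty in range(n):
            for tx in range(n):
                brick = [
                    smoke_density((tx * tile + x, ty * tile + y, tz * tile + z), center, radius)
                    for z in range(tile) for y in range(tile) for x in range(tile)
                ]
                if any(v > 0.0 for v in brick):
                    resident[(tx, ty, tz)] = brick  # only occupied bricks get memory
    return resident


def sample(resident: Dict[TileKey, List[float]], tile: int, x: int, y: int, z: int) -> float:
    """Read one voxel; non-resident (empty) bricks read as 0.0 without storing anything."""
    brick = resident.get((x // tile, y // tile, z // tile))
    if brick is None:
        return 0.0
    lx, ly, lz = x % tile, y % tile, z % tile
    return brick[(lz * tile + ly) * tile + lx]


vol = build_sparse_volume(32, 8, (16.0, 16.0, 16.0), 6.0)
print(len(vol), "of", (32 // 8) ** 3, "bricks resident")  # 8 of 64 bricks resident
```

Here only 8 of 64 bricks hold any smoke, so seven-eighths of the volume never consumes memory or bandwidth.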
Target Market: 1440, 4K, & VR
Back to the GTX 980 Ti.
The 980 Ti's target market is clearly the higher-resolution gaming crowd. Virtual reality support was discussed during our time with nVidia, as seems to be the trend of late, but strong 1440p and 4K performance was touted as the card's main selling point.
The above image was provided by nVidia, but corresponds with some of our testing below. It shows the scaling effect of higher resolutions on previously released cards, like the 780 Ti, which shows nearly a 3x disparity against the 980 Ti in high-resolution performance. Closer to 1080p the gap shrinks, but higher effective bandwidth and greater per-core efficiency (discussed above) give Maxwell stronger performance from 1440p to 4K.
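Part of why the gap widens at higher resolutions is simple pixel count. This sketch compares pixels per frame, assuming 4K means 3840x2160 UHD; treating per-frame work as proportional to pixel count is a simplification, since not every rendering cost scales per pixel.

```python
RESOLUTIONS = {"1080p": (1920, 1080), "1440p": (2560, 1440), "4K": (3840, 2160)}


def pixels(name: str) -> int:
    """Pixels rendered per frame at the named resolution."""
    width, height = RESOLUTIONS[name]
    return width * height


base = pixels("1080p")
for name in RESOLUTIONS:
    print(f"{name}: {pixels(name):,} pixels, {pixels(name) / base:.2f}x 1080p")
# 1080p: 2,073,600 pixels, 1.00x 1080p
# 1440p: 3,686,400 pixels, 1.78x 1080p
# 4K: 8,294,400 pixels, 4.00x 1080p
```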
Continue to Page 2 for the benchmark results.