Multi-GPU configurations have grown in reliability over the past few generations. Today's benchmark tests the new GeForce GTX 980 Ti in two-way SLI, pitting the card against the GTX 980 in SLI, the Titan X, and other options on the bench. At the time of writing, an R9 295X2 is not present for testing, though we hope to benchmark one once it's provided.
SLI and CrossFire have both seen a redoubled effort to improve compatibility and performance in modern games. There are still times when multi-GPU configurations don't behave properly – something we discovered when testing the Titan X against 2x GTX 980s in SLI – but support has improved tremendously with each driver update.
Previous GTX 980 Ti Review Content
- GTX 980 Ti benchmark & review.
- GTX 980 Ti video review.
- GTX 980 Ti overclocking performance – a 19% gain.
- GTX 980 Ti OC video.
NVidia GeForce GTX 980 Ti Specs
| | GTX 980 Ti | GTX Titan X | GTX 980 | GTX 780 Ti |
|---|---|---|---|---|
| GPU | GM200 | GM200 | GM204 | GK110 |
| Fab Process | 28nm | 28nm | 28nm | 28nm |
| Texture Filter Rate (Bilinear) | 176GT/s | 192GT/s | 144.1GT/s | 210GT/s |
| TjMax | 92C | 91C | 95C | 95C |
| Transistor Count | 8B | 8B | 5.2B | 7.1B |
| ROPs | 96 | 96 | 64 | 48 |
| TMUs | 176 | 192 | 128 | 240 |
| CUDA Cores | 2816 | 3072 | 2048 | 2880 |
| Base Clock (GPU) | 1000MHz | 1000MHz | 1126MHz | 875MHz |
| Boost Clock (GPU) | 1075MHz | 1075MHz | 1216MHz | 928MHz |
| GDDR5 Memory / Memory Interface | 6GB / 384-bit | 12GB / 384-bit | 4GB / 256-bit | 3GB / 384-bit |
| Memory Bandwidth (GPU) | 336.5GB/s | 336.5GB/s | 224GB/s | 336GB/s |
| Memory Speed | 7Gbps | 7Gbps | 7Gbps (9Gbps effective - read below) | 7Gbps |
| Power | 1x8-pin + 1x6-pin | 1x8-pin + 1x6-pin | 2x6-pin | 1x6-pin + 1x8-pin |
| TDP | 250W | 250W | 165W | 250W |
| Output | 3xDisplayPort, 1xHDMI 2.0, DVI | 3xDisplayPort, 1xHDMI 2.0, 1xDual-Link DVI | 3xDisplayPort 1.2, 1xHDMI 2.0, 1xDL-DVI | 1xDVI-D, 1xDVI-I, 1xDisplayPort, 1xHDMI |
| MSRP | $650 | $1000 | $550 (now $500) | $600 |
A Word About How SLI Works
None of what's in this section is news. SLI has been around for a long time now – Wikipedia says 2004 – and the functionality has remained the same. Just as a quick refresher, we'll go over how SLI works at a top level to prep for the next section.
Scalable Link Interface (SLI) allows the bridging of two or more same-model nVidia GPUs in a system. Using an SLI bridge, multiple video cards can be connected to share the workload of pixel processing and graphics computation; because each card mirrors the same data in its VRAM, though, the array effectively works from a single card's memory pool. AMD's version of this is called "CrossFire."
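To make the workload-sharing idea concrete, below is a minimal sketch of alternate frame rendering (AFR), the most common SLI mode, in which consecutive frames round-robin across the cards. The `Gpu` struct and `renderFrame` function are hypothetical stand-ins for illustration only – in reality, the driver handles frame distribution transparently.

```cpp
#include <cstdio>
#include <vector>

// Hypothetical stand-in for a physical GPU; real AFR is handled
// transparently by the driver, not by application code.
struct Gpu {
    int id;
    void renderFrame(int frame) const {
        std::printf("GPU %d renders frame %d\n", id, frame);
    }
};

int main() {
    // Two same-model cards joined by an SLI bridge.
    std::vector<Gpu> gpus = {{0}, {1}};

    // Alternate frame rendering: consecutive frames round-robin
    // across the GPUs, roughly doubling throughput in the ideal case.
    for (int frame = 0; frame < 6; ++frame) {
        gpus[frame % gpus.size()].renderFrame(frame);
    }
    return 0;
}
```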
Using SLI inherently consumes more PCI-e lanes from the PCH or CPU. Each video card is allotted a number of PCI-e lanes through the motherboard's PCI Express slots and the platform's lane availability, generally in the form of x16/x16 or x8/x8, depending on the CPU and board. Looking at the pins present within the PCI-e socket (or at the solder points on the back of the motherboard) reveals the maximum lane count supplied to that slot at a hardware level: if traces run to only enough pins to fill half of the slot's pin-out, it's an x8 slot, and the same logic applies to any other count. That stated, just because a slot offers a full set of pins does not guarantee x16 operation for a video card installed in it.
Even if a magical Z97 motherboard existed that offered four real PCI-e x16 slots – and there isn't one – Haswell is limited to just 16 PCI-e lanes on the CPU and 8 PCI-e lanes on the Z97 & H97 chipsets. Add 'em up, and that's just 24 total lanes – enough for an x8/x8 or x16/x8 setup, but not x16/x16. A quick sanity check of that arithmetic is sketched below.
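This throwaway sketch just sums requested slot widths against the platform's lane budget, using the numbers quoted above; the `fits` helper is purely illustrative.

```cpp
#include <cstdio>
#include <numeric>
#include <vector>

// Haswell exposes 16 PCI-e lanes on the CPU; the Z97/H97 PCH adds 8 more.
constexpr int kCpuLanes = 16;
constexpr int kPchLanes = 8;
constexpr int kTotalLanes = kCpuLanes + kPchLanes; // 24 lanes

// Returns true if the requested slot widths fit the platform budget.
bool fits(const std::vector<int>& slotWidths) {
    int used = std::accumulate(slotWidths.begin(), slotWidths.end(), 0);
    return used <= kTotalLanes;
}

int main() {
    std::printf("x8/x8:   %s\n", fits({8, 8})   ? "fits" : "exceeds budget");
    std::printf("x16/x8:  %s\n", fits({16, 8})  ? "fits" : "exceeds budget");
    std::printf("x16/x16: %s\n", fits({16, 16}) ? "fits" : "exceeds budget");
    return 0;
}
```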
How, then, do some boards offer "x16/x16" SLI compatibility on LGA115X platforms? Some motherboards, like our test board from Gigabyte, will multiplex lanes to produce optimized (switching) lane availability through third-party PLX/PEX chips. Signal multiplexing merges multiple input signals into a single output, which is de-multiplexed on the receiving side to retrieve the original data. It's not a perfect solution, but it can effectively simulate a higher lane count by diverting lane availability to strained devices as demand fluctuates. If all present devices are pushing maximum throughput, multiplexing doesn't resolve the underlying limit on lane availability; normally, though, gaming use cases place greater load on one device and don't distribute work evenly between all present GPUs. In these instances, multiplexing can accelerate throughput by modulating the effective lane count delivered to each PCI-e slot.
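As a toy model of that switching behavior – not vendor code, and all names here are made up – the sketch below reapportions a fixed lane budget toward whichever device currently demands more throughput, and shows why the trick buys nothing when every device is saturated.

```cpp
#include <cstdio>
#include <vector>

// Toy model of a PLX/PEX-style switch: a fixed pool of upstream lanes
// is reapportioned toward whichever device currently demands more.
struct Device { const char* name; int demand; };

void apportion(const std::vector<Device>& devs, int laneBudget) {
    int totalDemand = 0;
    for (const auto& d : devs) totalDemand += d.demand;
    for (const auto& d : devs) {
        int lanes = totalDemand ? laneBudget * d.demand / totalDemand : 0;
        std::printf("  %s: demand %d -> ~x%d\n", d.name, d.demand, lanes);
    }
}

int main() {
    std::vector<Device> gpus = {{"GPU0", 90}, {"GPU1", 30}};
    std::printf("Uneven load (typical gaming case):\n");
    apportion(gpus, 16); // GPU0 gets the lion's share of the 16 lanes

    gpus[1].demand = 90;
    std::printf("Both saturated (switching can't help):\n");
    apportion(gpus, 16); // the budget still totals 16; no free lunch
    return 0;
}
```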
DirectX 12 Changes the Memory Game
The new DirectX 12 API has already fallen upon our test bench, primarily in our API overhead test between Dx11, Dx12, and Mantle (soon Vulkan). DirectX 12 aims to resolve a number of CPU-bound bottlenecks – namely, draw calls heavily loading the CPU – but it's also making changes to the ways GPUs and IGPs interact.
Dx12's Multiadapter feature will, when deployed by developers, enable communication between the IGP hosted on the CPU die and the dGPU (hosted on the video card). This puts Intel IGPs to use in dGPU systems and will hopefully utilize APU graphics more scalably than AMD's own Dual Graphics solution.
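To illustrate the concept only – this is not the actual D3D12 Multiadapter API, and the adapter names and `submit` helper are hypothetical – a heterogeneous split might hand the heavy pass to the discrete card and cheap post-processing to the otherwise idle IGP:

```cpp
#include <cstdio>
#include <string>

// Hypothetical adapters; real Multiadapter work goes through D3D12's
// explicit multi-adapter interfaces, which are far more involved.
struct Adapter { std::string name; };

void submit(const Adapter& a, const char* pass) {
    std::printf("%s executes %s\n", a.name.c_str(), pass);
}

int main() {
    Adapter dGpu{"GTX 980 Ti"};
    Adapter iGpu{"Intel HD IGP"};

    // Heterogeneous split: geometry and shading on the discrete card,
    // lightweight post-processing offloaded to the IGP.
    submit(dGpu, "geometry + shading pass");
    submit(iGpu, "post-processing pass");
    return 0;
}
```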
Dx12 also looks like it will allow VRAM pooling on multi-GPU configurations, bypassing the long-standing limitation that the array can effectively use only a single card's VRAM. Moving forward, this means multiple VGAs can be installed in an array to increase available video memory – a desirable feature for users relegated to 2GB or 3GB devices. We're not yet entirely sure how this will work in games – obviously, they've got to support Dx12 first – but it's a promising direction.
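The memory math is worth spelling out. Assuming pooling works as described above (full aggregation across cards), here is a quick worked comparison against SLI's mirrored model:

```cpp
#include <cstdio>

// Effective VRAM in classic SLI: data is mirrored, so capacity is
// that of a single card regardless of how many are installed.
int mirroredVramGB(int perCardGB, int /*cards*/) { return perCardGB; }

// Effective VRAM under Dx12-style pooling (as described above):
// capacities aggregate across the array.
int pooledVramGB(int perCardGB, int cards) { return perCardGB * cards; }

int main() {
    // Two GTX 980 Tis at 6GB each.
    std::printf("SLI (mirrored): %dGB\n", mirroredVramGB(6, 2)); // 6GB
    std::printf("Dx12 (pooled):  %dGB\n", pooledVramGB(6, 2));   // 12GB
    return 0;
}
```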
Continue to page 2 for benchmark results & charts.