One video card to the next. We just reviewed MSI's R9 390X Gaming 8GB card at the mid-to-high range, the A10-7870K APU at the low-end, and now we're moving on to nVidia's newest product: The GeForce GTX 950.
NVidia's new GTX 950 is priced at $160, but scales up to $180 for some pre-overclocked models. The ASUS Strix GTX 950 that we received for testing is a $170 unit. These prices, then, land the GTX 950 in an awkward bracket; the GTX 750 Ti holds the budget class firmly below it and the R9 380 & GTX 960 hold the mid-range market above it.
The new GeForce GTX 950 graphics card uses the Maxwell architecture – the same GM206 found in the GTX 960 – and carries 2GB of GDDR5 memory on a 128-bit interface. More on that momentarily. The big marketing point for nVidia has been reduced input latency for MOBA games, something that's being pushed through GeForce Experience (GFE) in the immediate future.
This review benchmarks nVidia's new GeForce GTX 950 graphics card in the Witcher 3, GTA V, and other games, ranking it against the R9 380, GTX 960, 750 Ti, and others.
NVidia GTX 950 Specs & ASUS Strix 950 Specs
Spec | GTX 950 | GTX 950 Strix | GTX 960 | GTX 750 Ti |
Fab Process | 28nm | 28nm | 28nm | 28nm |
GPU | GM206 | GM206 | GM206 | GM107 |
Transistor Count | 2.94B | 2.94B | 2.94B | 1.87B |
Graphics Processing Clusters | 2 | 2 | 2 | 1 |
Streaming Multiprocessors | 6 | 6 | 8 | 5 |
CUDA Cores | 768 | 768 | 1024 | 640 |
TMUs | 48 | 48 | 64 | 40 |
ROPs | 32 | 32 | 32 | 16 |
Base Clock (GPU) | 1024MHz | 1165MHz | 1126MHz | 1020MHz |
Boost Clock (GPU) | 1188MHz | 1355MHz | 1178MHz | 1085MHz |
Memory Speed | 6.6Gbps | 6.6Gbps | 7.0Gbps | 5.4Gbps |
L2 Cache Size | 1024K | 1024K | 1024K | 2048K |
VRAM | 2GB GDDR5 | 2GB GDDR5 | 2GB, 4GB | 2GB |
Memory Interface | 128-bit | 128-bit | 128-bit | 128-bit |
Memory Bandwidth | 105.6GB/s | 105.6GB/s | 112.16GB/s | 86.4GB/s |
Texture Filter Rate | 49.2GT/s | 49.2GT/s | 72.1GT/s | 43.4GT/s |
Power Connectors | 1 x 6-pin | 1 x 6-pin | 1 x 6-pin | None |
TDP | 90W | 90W | 120W | 60W |
Display Connectors | 3 x DP 1.2, 1 x HDMI 2.0, 1 x DL DVI | 3 x DP 1.2, 1 x HDMI 2.0, 1 x DL DVI | 3 x DP 1.2, 1 x HDMI 2.0, 1 x DL DVI | 1 x DP 1.2, 1 x HDMI, 1 x DVI-I |
Form Factor | Dual Slot | Dual Slot | Dual Slot | Dual Slot |
Price | $160.00 | $170.00 | $200 (2GB), $240 (4GB) | $120.00 |
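The memory bandwidth row follows directly from the memory speed and memory interface rows. As a quick sanity check (our own arithmetic, not an nVidia tool; the function name is purely illustrative), peak bandwidth is the per-pin memory speed multiplied by the interface width, divided by eight to convert bits to bytes:

```python
def memory_bandwidth_gbs(speed_gbps, bus_width_bits):
    """Peak memory bandwidth in GB/s from per-pin speed (Gbps) and bus width (bits)."""
    return speed_gbps * bus_width_bits / 8  # 8 bits per byte


# Memory speed and interface width as listed in the spec table.
cards = {
    "GTX 950": (6.6, 128),
    "GTX 960": (7.0, 128),
    "GTX 750 Ti": (5.4, 128),
}

for name, (speed, width) in cards.items():
    print(f"{name}: {memory_bandwidth_gbs(speed, width):.1f} GB/s")
```

This reproduces the 105.6GB/s and 86.4GB/s figures for the GTX 950 and 750 Ti. The GTX 960 works out to an even 112GB/s against the 112.16GB/s listed, most likely because its memory runs slightly above the rounded 7.0Gbps figure.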
The reference design for nVidia's GTX 950 calls for a 1024MHz base clock and 1188MHz boost clock, but pre-overclocked options will be available in abundance at launch. We'll discuss one of those below.
At $160, the GTX 950 is priced most directly against AMD's $150 R7 370 (which we don't have – but is effectively identical to the R9 270X in performance, a card we do have). The next closest devices of note are the GTX 960 2GB and R9 380 2GB for an extra $40 or the GTX 750 Ti for a $30-$40 price reduction. The 950 rests firmly between the two 'sweet spot' price-points, making for awkward market positioning.
ASUS Strix Differences
There's only so much that a board partner can do to differentiate from other options. They're all sourcing the GPU from the same manufacturer, after all.
For ASUS' Strix video cards, the difference comes from a focus on cooling and noise. The Strix deploys two quiet fans atop a large, aluminum fin heatsink. Two large, 10mm copper heatpipes run through the sink and to the silicon. High-quality chokes and capacitors reinforce the VRM and power design, reducing the chance of coil whine or choke 'hum' when under load. ASUS rates its capacitors for a 50k hour lifespan.
More noteworthy, though, is the substantial pre-overclock over the reference GTX 950. The reference clock of 1024 / 1188MHz (boost) is quickly outpaced by ASUS' 1165 / 1355MHz (boost) pre-OC. This aggressive pre-overclock increases thermals, but the Strix & DirectCU cooler should take care of that. We'll look into this in the thermal benchmarking.
GM206 Maxwell Architecture
GM206 is old, as is Maxwell, at this point. We discussed Maxwell in greater depth in our GTX 980 review. We'll recap a few key features and republish some of our previous content.
Maxwell focuses on power efficiency and per-core efficiency, areas where nVidia drew criticism with its Fermi GPUs. Maxwell's primary method for increasing efficiency is to perform more on-card compression and to minimize operations, as with Delta Color Compression (below). Maxwell cores are approximately 40% more efficient than Kepler cores when it comes strictly to gaming tasks, compensating for the smaller memory interface by compressing data more intelligently and heavily.
This gaming efficiency gain comes at the cost of production performance, though, as raw core counts and memory interface widths have shrunk against Kepler heavyweights. For gamers, the optimization is nothing but good.
Pasted from our GTX 980 review:
Delta color compression is a specific process used when compressing data transferred to and from the framebuffer (GPU memory). Bandwidth is not “free;” as with all components, it’s critical to performance to ensure data is compressed to make the best use of buses, hopefully in a lossless fashion so that quality is not degraded.
Delta color compression improves memory efficiency through new compression technology. By compressing the data more heavily, less bandwidth is required and more data can be crammed down the pipe, ultimately resulting in a better-looking experience. Let’s take an example from GRID: Autosport, which we’ve benchmarked in the past:
This frame (n) has already been rendered by the GPU. The GPU already knows what’s present in the frame (having just drawn it) and can use this information when analyzing the next frame (n+1) in the gameplay sequence. Instead of drawing the absolute color values all over again in the next frame, we can use color compression to look at the delta (change) between values in each frame. The GPU looks at the next frame in the sequence and, instead of seeing our above image, sees something more like this:
The pink highlight shows the change in color between frames – either an object was moving too much, visually changed in appearance, or was granular enough to require additional work by the GPU, with each of these showing in non-highlighted appearance.
To recap: Analyzing the color change between successive frames minimizes demand on the GPU by avoiding exact (absolute) color value draws, instead opting for value change from base. NVidia’s whitepaper indicates that this process reduces bandwidth saturation by 17-18% on average, meaning we’ve got more memory bandwidth freed-up for other tasks. This also contributes to the effective memory speed: Although specified at 7Gbps GDDR5 memory speed, the GTX 980, 970, and other similarly-equipped Maxwell devices will perform equivalently to 9Gbps effective throughput, from what we’re told by nVidia.
In shorter form, you’d need to run DRAM at 9Gbps on Kepler in order to achieve the same effective bandwidth as 7Gbps on Maxwell.
(End re-published content).
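To make the "store the change, not the absolute value" idea concrete, here is a toy sketch of lossless delta encoding on a short run of 8-bit color values. This is generic delta encoding, not nVidia's hardware scheme: the run length, the choice of which values get compared, the raw-storage fallback, and every function name are our assumptions for illustration only.

```python
def delta_encode(values):
    """Split a run of values into a base value plus successive differences."""
    base = values[0]
    deltas = [b - a for a, b in zip(values, values[1:])]
    return base, deltas


def delta_decode(base, deltas):
    """Rebuild the original values exactly (lossless)."""
    out = [base]
    for d in deltas:
        out.append(out[-1] + d)
    return out


def encoded_bits(values, raw_bits=8):
    """Bits needed for the run: delta form if it is smaller, otherwise raw."""
    _, deltas = delta_encode(values)
    largest = max((abs(d) for d in deltas), default=0)
    bits_per_delta = largest.bit_length() + 1  # +1 for the sign bit
    delta_size = raw_bits + len(deltas) * bits_per_delta
    raw_size = raw_bits * len(values)
    return min(delta_size, raw_size)


smooth = [100, 101, 103, 104, 104, 105, 107, 108]  # gentle gradient
noisy = [0, 255, 3, 250, 7, 245, 12, 240]          # high-contrast detail

for name, run in (("smooth", smooth), ("noisy", noisy)):
    base, deltas = delta_encode(run)
    assert delta_decode(base, deltas) == run  # nothing is lost
    print(f"{name}: {encoded_bits(run)} bits vs {8 * len(run)} bits raw")
```

In this sketch, the smooth run shrinks to well under half its raw size, while the noisy run gains nothing and falls back to raw storage. That split is why bandwidth savings from this kind of compression are quoted as an average rather than a guarantee.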
The original GM206 (GTX 960) was divided into two GPCs with eight streaming multiprocessors (SMs). Each SM is home to 128 CUDA cores. This netted 1024 CUDA cores on the GTX 960. For the GTX 950, GM206 has been simplified for cost and power reasons – the GTX 950 GM206 hosts just six SMs (still two GPCs), bringing core count down to 768.
For comparison purposes, the first generation Maxwell card – the GTX 750 Ti – shipped with one GPC and five SMs, resulting in just 640 CUDA cores. To this day, the GTX 750 Ti is still a strong performer for gamers interested in titles that may not be as graphics intensive as, say, the Witcher 3 or GTA V.
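The core counts above all come from the same multiplication. A minimal sketch, using the 128-cores-per-SM figure and the SM counts given in this section (the names in the code are ours):

```python
CORES_PER_SM = 128  # CUDA cores per Maxwell streaming multiprocessor, per the text


def cuda_cores(sm_count):
    """Total CUDA cores for a Maxwell GPU with the given number of SMs."""
    return sm_count * CORES_PER_SM


# SM counts as listed in the spec table.
sm_counts = {"GTX 960": 8, "GTX 950": 6, "GTX 750 Ti": 5}
core_counts = {name: cuda_cores(sms) for name, sms in sm_counts.items()}

for name, cores in core_counts.items():
    share = cores / core_counts["GTX 960"]
    print(f"{name}: {cores} CUDA cores ({share:.1%} of the GTX 960)")
```

Put another way, disabling two of the GTX 960's eight SMs leaves the GTX 950 with three quarters of the larger card's shader hardware.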
MOBA Input Latency on the GTX 950 – A New Marketing Direction
This is nVidia's co-marketing attempt with the GTX 950. Although diminishing MOBA input latency is going to be something you read about a lot today, it is critical to note that the advancements made with input latency are not exclusive to the GTX 950. Everything you're about to read is being pushed through GFE, meaning all modern Maxwell cards will benefit from the improvements. You will not have to “downgrade” a higher-end Maxwell card to the GTX 950 in order to reduce input latency.
Here's what we're talking about:
The GTX 650, a relative old-timer, exhibited an input latency of approximately 80ms in DOTA2. When we say “input latency,” we're talking about the delay between a physical button press and the execution of that action within the game engine. With a GTX 650, then, a mouse click in DOTA2 takes 80ms to have its in-game impact. Note that we specify in-game impact as opposed to observed impact, which would add human processing time on top.
Specifically looking at the GTX 950, we're told that input latency has been reduced to 45ms by way of driver and pipeline optimizations.
The simplified graphics pipeline is limited in its simultaneous actions. Buffers get filled and must be cleared before progress can be made. The game engine passes its active frame to the API – DirectX 11 or 12, in this case – which performs further scene processing before sending the frame along to the GPU. The GPU is generally the longest pole in the pipe, but if you look closely at the pipeline, you'll see that there are two frames simultaneously in transit to the GPU – DX1 and DX2. These frames are prebuffered (or double-buffered) for output to improve smoothness and performance in some games, but that isn't necessary in all games.
By halving the prebuffer to just one frame, input latency can be reduced substantially without a noticeable detriment to gaming performance for some games. Those games, nVidia tells us, just happen to be the most competitive on the market: DOTA2, League of Legends, and Heroes of the Storm. In more engrossing games that offer more geometric complexity and demand more of the GPU, cutting the prebuffered frames in half could reduce smoothness and diminish depth of experience. For a MOBA, though, the latency reduction outweighs any marginal differences in the (already simplistic) visual department.
We previously discussed prebuffered frames in Watch Dogs, suggesting that readers with severe performance and mouse input issues drop the setting to '1.'
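To see why a shallower prebuffer cuts latency, here is a toy model of the engine-to-GPU pipeline described above. It is a sketch under our own assumptions, not nVidia's driver logic or measured data: the function name, the 5ms CPU time, and the 15ms GPU time are hypothetical, and the model ignores input polling and display scan-out.

```python
def input_latency_ms(prebuffer_depth, cpu_ms, gpu_ms, frames=100):
    """Latency from input sampling to frame completion for the last simulated frame.

    prebuffer_depth must be at least 1.
    """
    gpu_start = []  # time at which each frame begins rendering on the GPU
    cpu_free = 0    # time at which the CPU finished the previous frame
    gpu_free = 0    # time at which the GPU finished the previous frame
    latency = 0
    for i in range(frames):
        # The CPU may only run `prebuffer_depth` frames ahead: it waits until
        # frame (i - depth) has left the queue and started rendering.
        gate = gpu_start[i - prebuffer_depth] if i >= prebuffer_depth else 0
        sample = max(cpu_free, gate)     # input is read here
        cpu_free = sample + cpu_ms       # engine and API work finished
        start = max(cpu_free, gpu_free)  # wait if the GPU is still busy
        gpu_start.append(start)
        gpu_free = start + gpu_ms        # frame finished rendering
        latency = gpu_free - sample
    return latency


for depth in (2, 1):
    print(f"prebuffer {depth}: {input_latency_ms(depth, cpu_ms=5, gpu_ms=15)} ms")
```

In this GPU-bound toy case, every prebuffered frame adds one full GPU frame time between the click and the finished frame, so dropping from two frames to one removes 15ms. Swap the two timings so that the CPU is the bottleneck and the queue never fills, at which point the prebuffer depth stops mattering.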
Continue to Page 2 for the gaming benchmarks.