Following the launch of 2GB cards, major board partners – MSI and EVGA included – have begun shipment of 4GB models of the GTX 960. Most 4GB cards will see restocked availability in early April at around $240 MSRP, approximately $30 more expensive than their 2GB counterparts. We've already got a round-up pending publication with more in-depth reviews of each major GTX 960, but today, we're addressing a much more basic concern: Is 4GB of VRAM worth it for a GTX 960?
This article benchmarks an EVGA GTX 960 SuperSC 4GB card against our existing ASUS Strix GTX 960 2GB unit, testing each in 1080p, 1440p, and 4K gaming scenarios.
EVGA GTX 960 4GB SuperSC Graphics Card Specs
| | GTX 960 (Reference) | ASUS GTX 960 Strix | EVGA GTX 960 SuperSC 4GB |
|---|---|---|---|
| Base Clock (GPU) | 1126MHz | 1291MHz | 1279MHz |
| Boost Clock | 1178MHz | 1317MHz | 1342MHz |
| Memory Configuration | 2GB / 128-bit | 2GB / 128-bit | 4GB / 128-bit |
| Memory Speed | 7010MHz | 7200MHz | 7010MHz |
| Power | 1x 6-pin | 1x 6-pin | 6+2-pin |
| TDP | 120W | 120W | >120W (unconfirmed) |
| MSRP | $200 | $210 | $240 |
First: Note that we will be performing a full review of EVGA's new GTX 960 4GB SuperSC card in short order, including overclocking and thermal performance tests. This is a more limited-scope article with one goal in mind: VRAM utilization analysis.
It's always important to dig through the marketing materials when performing any kind of product analysis. Marketed claims fuel purchases, after all, and need to be carefully validated through testing. EVGA's product page indicates that its SuperSC GTX 960 4GB card ($250) comes “outfitted with 4GB of high-speed GDDR5 memory, giving you higher texture qualities and better 4K performance.” The page goes on to restate this point a few times, primarily positioning high-resolution performance as a tie-in to VRAM capacity.
Maxwell's Memory Subsystem Explained
We previously looked into Maxwell's memory subsystem during our GTX 980 review, which drilled into the current-gen platform's architecture. This topic was revisited within our Titan X preview content, though the memory architecture has not changed between the GM200 Titan X chip and GM204 980 chip.
Let's look at it a third time.
Before getting to memory, it's important to have a brief understanding of the GPU driving the GTX 960. The GM206 chip at the heart of all GTX 960 video cards is a slimmed-down version of the GM204, hosting the same underlying architecture and feature-set. The Maxwell SMM is divided into four blocks, each hosting 32 CUDA cores, making for 128 cores per streaming multiprocessor. The GM206 GPU is home to eight streaming multiprocessors, netting a total of 1024 CUDA cores (32 cores * 4 blocks * 8 SMs = 1024).
The reference GTX 960 shipped with 2GB of GDDR5 memory at launch, transacting on a 128-bit bus and effective 7010MHz operating clock (1753MHz native). Memory bandwidth calculations are explained in our GPU dictionary, but by dividing the memory bus width by eight (conversion to bytes) and then multiplying by the memory clock, then by 2 for DDR, then by 2 again for GDDR5, we get a memory bandwidth of 112.19GB/s. This makes the GM206 one of the most memory-limited nVidia GPUs on the market, though Maxwell's memory subsystem allegedly makes up for the on-paper limitations with compression optimization.
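To make that arithmetic concrete, here's a quick Python sketch that reproduces both the CUDA core count and the bandwidth figure using the specs quoted above – nothing here is measured, it's just the back-of-envelope math:

```python
# GM206 back-of-envelope math, using the specs quoted in this article.

# CUDA core count: 32 cores per block, 4 blocks per SMM, 8 SMMs.
cores_per_block = 32
blocks_per_smm = 4
smm_count = 8
cuda_cores = cores_per_block * blocks_per_smm * smm_count
print(f"CUDA cores: {cuda_cores}")  # 1024

# Memory bandwidth: bus width (bits) / 8 -> bytes per transfer,
# multiplied by the native memory clock, then by 2 for DDR,
# then by 2 again for GDDR5.
bus_width_bits = 128
native_clock_mhz = 1753
bandwidth_mb_s = (bus_width_bits / 8) * native_clock_mhz * 2 * 2
print(f"Memory bandwidth: {bandwidth_mb_s / 1000:.2f}GB/s")  # 112.19GB/s
```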
Even with only two 64-bit memory controllers (128-bit memory interface), the GM206's interface is capable of outperforming the 192-bit interface of the GK106 GPU; this is largely thanks to optimization in the memory pipeline by nVidia. Among other features, optimizations include memory compression techniques like third-generation delta color compression, which temporally analyzes the color delta between multiple frames, then applies updates using the delta value rather than absolute values. This, we're told, contributes to an overall 25% reduction in transacted bytes per frame vs. the Kepler architecture. Delta color compression is responsible for approximately 17-18% of this total reduction on its own, hence nVidia's derivation of an “effective 9Gbps” memory speed (equivalency to Kepler), although the hard spec for the GM206 is actually 7Gbps.
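nVidia hasn't published the precise implementation details of its compression pipeline, but the underlying principle of delta encoding is simple enough to illustrate. The toy Python sketch below – our own illustration, not nVidia's algorithm – shows why storing deltas instead of absolute values shrinks smooth color data:

```python
# Toy illustration of delta encoding -- NOT nVidia's actual algorithm.
# Small deltas between neighboring color values fit in fewer bits than
# the absolute values themselves, which is the principle behind
# delta color compression.

def delta_encode(values):
    """Store the first value absolutely, the rest as deltas."""
    deltas = [values[0]]
    for prev, curr in zip(values, values[1:]):
        deltas.append(curr - prev)
    return deltas

def delta_decode(deltas):
    """Rebuild absolute values by accumulating deltas."""
    values = [deltas[0]]
    for d in deltas[1:]:
        values.append(values[-1] + d)
    return values

# A smooth color gradient: large absolute values, tiny deltas.
gradient = [200, 201, 201, 202, 203, 203, 204]
encoded = delta_encode(gradient)
print(encoded)                            # [200, 1, 0, 1, 1, 0, 1]
assert delta_decode(encoded) == gradient  # lossless round-trip
```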
What Does Extra VRAM Actually Do?
Greater video memory capacities enable gaming scenarios that are asset-intensive and would otherwise saturate the framebuffer. Video memory is a volatile form of storage (like system memory) that fetches and retains frequently-accessed items for expedited access as the game calls upon them. An example of something that might get stored in video memory would include texture files for the active game cell. Video memory is also fed assets like shadow maps, normal maps, and specular maps that are needed for every frame rendered; these items were discussed in our recent Crytek interview about physically-based rendering, if you're curious.
A more limited memory capacity tasked with storing massive, high-resolution textures will cache-out more frequently, forced to rapidly dump and fetch items on demand. This action occurs as an exchange between the much slower system memory and the GPU's memory, increasing overhead and latency as data is forced through the PCI-e bus. Exceeding GPU memory results in staggering framerate drops that become jarring and noticeable to the user, something we'll discuss in the benchmark results below.
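To illustrate the penalty with rough, assumed numbers – the GM206's ~112GB/s local bandwidth against the ~15.75GB/s theoretical peak of a PCI-e 3.0 x16 link (real-world transfers are slower still) – consider how long a hypothetical 256MB asset batch takes to move:

```python
# Rough illustration of why spilling out of VRAM hurts. Peak numbers
# are theoretical; real-world transfers are slower still.
asset_mb = 256                  # hypothetical texture batch size
vram_bandwidth_gb_s = 112.19    # GM206 local GDDR5 bandwidth
pcie3_x16_gb_s = 15.75          # theoretical PCI-e 3.0 x16 peak

local_ms = asset_mb / 1000 / vram_bandwidth_gb_s * 1000
over_pcie_ms = asset_mb / 1000 / pcie3_x16_gb_s * 1000

print(f"Local VRAM fetch:  {local_ms:.2f}ms")     # ~2.28ms
print(f"Over PCI-e fetch: {over_pcie_ms:.2f}ms")  # ~16.25ms
# At 60FPS, a frame has a ~16.7ms budget; a PCI-e round-trip of this
# size alone can consume nearly the entire frame.
```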
Memory Bandwidth Limitations & Solutions
Even with Maxwell's efficiency gains over Kepler, a 128-bit memory bus is still relatively small. This raises the concern of whether the GM206 is capable of transacting memory rapidly enough to adequately utilize a 4GB buffer. Overclocking – something we'll test in the forthcoming review – will increase memory bandwidth by way of increased clockrate, but the gains are as yet unexplored. It also remains to be seen whether the GTX 960's overall performance output can sustain framerates high enough to encounter a memory bottleneck, rather than bottlenecks elsewhere in the pipe.
The Tests: Resolution Scaling & FPS Consistency
We conducted a large suite of real-world tests, logging VRAM consumption in most of them for comparative analysis. The games and software tested include:
- Assassin's Creed Unity (Ultra, 1080p only).
- Far Cry 4 (Ultra, 1080p, 1440p, 4K).
- Cities: Skylines (High, 1080p, 1440p, 4K).
- GRID: Autosport (Ultra, 1080p, 1440p, 4K).
- Metro: Last Light (Very High quality + Very High tessellation at 1080p; Very High / High at 1440p; High / High at 4K).
- Battlefield: Hardline (Ultra, 1080p, 1440p, 4K).
- 3DMark Firestrike benchmark.
We already know ACU and Far Cry 4 consume massive amounts of video memory, often in excess of the 2GB available on our tested ASUS Strix card. GRID: Autosport and Metro: Last Light provide highly optimized benchmarking titles to ensure stability on the bench. Battlefield: Hardline is new enough that it also consumes memory heavily, though we had some difficulty logging FPS in the game (explained below, along with our workaround). 3DMark offers a synthetic benchmark that is predictable in its results, something of great importance in benchmarking.
We added Cities: Skylines to the bench to diversify the titles tested. Our objective with these tests is to observe a sweeping range of titles that users would actually play, so moving away from the heavy focus on single-entity games (FPS & RPG titles) and into a zoomed-out management game assists in that. Cities: Skylines is among the most desirable titles to play at 4K resolution right now; most of these other games exhibit various usability issues at 4K – like UI elements scaling improperly or textures stretching to the point of looking worse. At 4K, Cities looks better at greatly zoomed-out vantage points and allows the player to view more of the usable gaming space at any given time, ensuring a real-world, desirable 4K gaming scenario.
Games with greater asset sizes will spike during peak load times, resulting in the most noticeable dips in performance on the 2GB card as memory caches out. Our hypothesis going into testing was that although the two video cards may not show massive performance differences in average FPS, they would potentially show disparity in the 1% low and 0.1% low (effective minimum) framerates. These are the numbers that most directly reflect jarring user experiences during “lag spikes,” and are important to pay attention to when assessing overall fluidity of gameplay.
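For reference, here's one common way to derive these metrics from a frametime log – a sketch of the general approach, as exact methodology varies between logging tools:

```python
# Sketch of deriving average, 1% low, and 0.1% low FPS from a
# frametime log (milliseconds per frame). This averages the slowest
# slice of frames, then converts that average back to FPS.

def low_fps(frametimes_ms, percent):
    """Average FPS of the slowest `percent`% of frames."""
    worst = sorted(frametimes_ms, reverse=True)
    count = max(1, int(len(worst) * percent / 100))
    slice_avg_ms = sum(worst[:count]) / count
    return 1000 / slice_avg_ms

def average_fps(frametimes_ms):
    return 1000 / (sum(frametimes_ms) / len(frametimes_ms))

# Hypothetical log: mostly ~16.7ms frames plus a few long spikes of
# the kind a VRAM cache-out would produce.
log = [16.7] * 995 + [80, 95, 110, 120, 140]
print(f"AVG FPS:  {average_fps(log):.1f}")   # ~58.3
print(f"1% low:   {low_fps(log, 1):.1f}")    # ~15.9
print(f"0.1% low: {low_fps(log, 0.1):.1f}")  # ~7.1
```

Note how a handful of spikes barely moves the average but collapses the 1% and 0.1% lows – exactly the disparity we expect the 2GB card to show if it caches out.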
Continue to page 2 for test results.