Differences Between DDR4 & GDDR5 Memory

Posted on March 6, 2017

The finer distinctions between DDR and GDDR can easily be masked by the impressive on-paper specs of the newer GDDR5 standards, often inviting an obvious question with a not-so-obvious answer: Why can’t GDDR5 serve as system memory?

The simple answer: it's analogous to why a GPU cannot suffice as a CPU. To be more incisive, CPUs are built from a small number of complex cores running complex instruction sets, alongside on-die cache and, often, integrated graphics. This makes the CPU well suited to the multitude of latency-sensitive tasks thrust upon it; that aptness, however, comes at a cost, and the cost is paid in silicon. Conversely, GPUs can apportion more chip space to execution units by using simpler cores built on reduced instruction sets. As such, GPUs can feature hundreds, if not thousands, of cores designed to process huge amounts of data in parallel. Whereas CPUs are optimized to process tasks in a serial/sequential manner with as little latency as possible, GPUs have a parallel architecture and are optimized for raw throughput.
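To make that contrast concrete, here's a minimal sketch in CUDA (a hypothetical vector-add example, not from this article) showing the same work done serially on a CPU core and in parallel on the GPU:

```cuda
// Hypothetical illustration: the CPU walks the array one element at a time,
// while the GPU launches one lightweight thread per element.
#include <cstdio>

#define N 1048576  // 1M elements

// CPU version: one core, sequential, latency-sensitive per iteration.
void add_cpu(const float *a, const float *b, float *c, int n) {
    for (int i = 0; i < n; ++i)
        c[i] = a[i] + b[i];
}

// GPU version: thousands of threads run this kernel concurrently,
// each handling a single index. Throughput over latency.
__global__ void add_gpu(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        c[i] = a[i] + b[i];
}

int main() {
    float *a, *b, *c;
    cudaMallocManaged(&a, N * sizeof(float));  // unified memory, visible to CPU and GPU
    cudaMallocManaged(&b, N * sizeof(float));
    cudaMallocManaged(&c, N * sizeof(float));
    for (int i = 0; i < N; ++i) { a[i] = 1.0f; b[i] = 2.0f; }

    add_cpu(a, b, c, N);                            // one sequential pass
    add_gpu<<<(N + 255) / 256, 256>>>(a, b, c, N);  // ~4096 blocks of 256 threads
    cudaDeviceSynchronize();

    printf("c[0] = %.1f\n", c[0]);  // 3.0
    cudaFree(a); cudaFree(b); cudaFree(c);
    return 0;
}
```

The CPU loop touches one element per iteration and lives or dies by memory latency; the GPU version spreads the same million elements across thousands of threads, so aggregate throughput is what matters.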

While the above doesn’t exactly explicate any differences between DDR and GDDR, the analogy is fitting. CPUs and GPUs both have access to temporary pools of memory, and just like both processors are highly specialized in how they handle data and workloads, so too is their associated memory.

DDR4 SDRAM

A higher-speed, lower-voltage successor to DDR3, DDR4 has been accepted as the current mainstream standard, with processors and platforms such as Skylake, Kaby Lake, Haswell-E, Z170, Z270, X99, and the upcoming Skylake-X and Ryzen all adopting it. Much like a CPU, DDR4 is built to handle a bombardment of small tasks with low latency and a degree of granularity. DDR4 is fundamentally suited to transferring small amounts of data quickly (comparatively speaking), at the expense of aggregate bandwidth. The DDR4 bus is 64 bits wide per channel, but channels are combinational; i.e., dual-channel operation presents an effective 128-bit bus. Additionally, DDR4 has a prefetch buffer size of 8n (eight data words per memory access), meaning eight consecutive data words (words can be between 8 and 64 bits) can be read and presciently placed in the I/O buffer. Finally, the I/O interface is limited to either a read (output from memory) or a write (input to memory) per clock cycle, but not both. Below, we'll contrast these specs with GDDR5.
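As a rough illustration of what those figures mean for throughput, this sketch computes peak theoretical bandwidth for a dual-channel configuration. The DDR4-2400 speed grade is an assumed example for the math, not a figure from the article:

```cuda
// Back-of-the-envelope peak bandwidth (illustrative DDR4-2400 figures):
// transfer rate (MT/s) x bus width (bytes) x channels.
#include <cstdio>

int main() {
    const double transfers_per_sec = 2400e6;  // DDR4-2400: 2400 MT/s (assumed example)
    const double bus_bytes = 64.0 / 8.0;      // 64-bit channel = 8 bytes per transfer
    const int channels = 2;                   // dual-channel configuration

    double gb_per_sec = transfers_per_sec * bus_bytes * channels / 1e9;
    printf("Peak DDR4-2400 dual-channel bandwidth: %.1f GB/s\n", gb_per_sec);
    // -> 38.4 GB/s theoretical ceiling
    return 0;
}
```

The same formula applies to any DDR4 speed grade; at DDR4-2400 in dual channel, the ceiling works out to 38.4 GB/s.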

GDDR5 SGRAM

GDDR5 is currently the most common graphics memory across the last couple of GPU generations. The newest revision, GDDR5X, is so far implemented on only two cards: the GeForce GTX 1080 and Titan X (soon, the 1080 Ti). Also worth mentioning is HBM (High-Bandwidth Memory), used in AMD's high-end Fiji GPUs. HBM2 was ratified by JEDEC in January of 2016, is used in the nVidia Tesla P100, and will presumably appear in AMD's upcoming high-end Vega-based GPUs.

GDDR5 is purpose-built for bandwidth; i.e., moving massive chunks of data in and out of the framebuffer with the highest possible throughput. This is made possible by a much wider bus: anywhere from 256 to 512 bits across 4-8 channels. That width, however, comes at the cost of increased latency, with much looser internal timings than DDR4. Latency isn't as pressing an issue for GPUs, as their parallel nature allows them to work across many calculations simultaneously and hide memory stalls. Although GDDR5 has the same 8n prefetch buffer size as DDR4, the newest GDDR5X standard surpasses it with a depth of 16n (16 data words per memory access). Moreover, GDDR can handle input and output on the same clock cycle, unlike DDR. Voltage-wise, GDDR5 actually runs at or above DDR4 levels (roughly 1.35-1.5V versus DDR4's 1.2V), which makes thermal design all the more important: in small packages that are packed together densely, as on a graphics card PCB, keeping heat down is critical. System memory has the entire surface area of the stick across which to spread heat, and it sits isolated from high-heat components (like the GPU).
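Running the same arithmetic against graphics memory makes the bandwidth gap plain. This sketch uses the GTX 1080's published GDDR5X figures (10 Gb/s per pin over a 256-bit bus); treat the numbers as illustrative:

```cuda
// Same bandwidth arithmetic applied to graphics memory
// (GTX 1080 GDDR5X figures as published at the time).
#include <cstdio>

int main() {
    const double gbps_per_pin = 10.0;   // GDDR5X on the GTX 1080: 10 Gb/s per pin
    const int bus_width_bits = 256;     // 256-bit aggregate bus

    double gb_per_sec = gbps_per_pin * bus_width_bits / 8.0;
    printf("Peak GDDR5X bandwidth: %.0f GB/s\n", gb_per_sec);
    // -> 320 GB/s, roughly 8x the dual-channel DDR4-2400 figure above
    return 0;
}
```

That 320 GB/s ceiling is roughly eight times the dual-channel DDR4-2400 figure computed earlier, which is exactly the trade the wider, looser-timed GDDR bus is making.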

Evolution

DDR SDRAM doesn’t see the exponential growth that its graphics-based counterparts experience. Work began on DDR4 around 2005, but it didn’t come to market until 2014. DDR3 was launched in 2007 and is still widely used today.

The long gestation period can be attributed to a few factors. First, RAM vendors tend to compete on price rather than performance; RAM is commoditized, and the industry isn't dominated by just two competitors constantly trying to leapfrog one another. Second, new memory standards are developed and ratified by the JEDEC standards body, which comprises essentially every memory maker in the world deliberating over new standards. Lastly, the memory industry, at least with respect to PCs, isn't exactly clamoring for higher bandwidth; these days, RAM is seldom the bottleneck in performance desktop PCs.

There are many more catalysts for generational growth where CPUs and GPUs are concerned, as development is largely spurred by one or two big manufacturers competing for market share. Additionally, the advent of GPGPU (General-Purpose Computing on Graphics Processing Units) is making GPU-accelerated computing mainstream. Powerful GPUs are no longer desirable exclusively to gamers, with demand growing across several different computing domains. Thus, the race for advanced hardware and technology remains rampant, as more computing horsepower is needed for AI, deep learning, advanced image processing, financial modeling, data centers, etc.

Conclusion

While both DDR4 and GDDR5 share core technologies, one is not inherently better than the other; each is effectively equipped to serve a different purpose. There are a few differentiators that come into play, as we've described here, but in trivial terms it boils down to latency versus bandwidth. CPUs are cache-reliant and cache-efficient, and their cores run at much higher clock rates than those of GPUs. As such, CPUs don't have to access system memory as frequently, but when they do, low latency is imperative. GPUs carry less cache, but pair their many cores with smaller pools of much higher-bandwidth memory. As such, high-compute functions where throughput is key are offloaded to the video card and its VRAM.

Editorial: Eric Hamilton