What is the “Texture Fill-Rate” on a GPU and Does it Matter?

By Steve Burke. Published December 27, 2014 at 12:00 am

As a part of our new website design (pending completion before CES), we've set out to define several aspects of GPU technology with greater specificity than we have previously. One of these aspects is texture fill-rate (or filter rate) and the role of TMUs, or Texture Mapping Units.

When listing GPU specifications, we often enumerate the clockrate and TMU count, among other specs. These two items are directly related to one another, each used to extrapolate the “texture filter rate” of the GPU. The terms “Texture Fill-Rate” and “Texture Filter Rate” can be used interchangeably. For demonstration purposes, here is a specifications table for the GTX 980 (just because it's recent):

NVIDIA GeForce GTX 980 & 970 Video Card Specs

| Spec | GTX 980 | GTX 970 | GTX 780 Ti |
| --- | --- | --- | --- |
| GPU | GM204 | GM204 | GK-110 |
| Fab Process | 28nm | 28nm | 28nm |
| Texture Filter Rate (Bilinear) | 144.1GT/s | 109.2GT/s | 210GT/s |
| TjMax | 95C | 95C | 95C |
| Transistor Count | 5.2B | 5.2B | 7.1B |
| ROPs | 64 | 64 | 48 |
| TMUs | 128 | 104 | 240 |
| CUDA Cores | 2048 | 1664 | 2880 |
| BCLK | 1126MHz | 1050MHz | 875MHz |
| Boost CLK | 1216MHz | 1178MHz | 928MHz |
| Single Precision | 5TFLOPs | 4TFLOPs | 5TFLOPs |
| Mem Config | 4GB / 256-bit | 4GB / 256-bit | 3GB / 384-bit |
| Mem Bandwidth | 224GB/s | 224GB/s | 336GB/s |
| Mem Speed | 7Gbps (9Gbps effective - read below) | 7Gbps (9Gbps effective) | 7Gbps |
| Power | 2x6-pin | 2x6-pin | 1x6-pin, 1x8-pin |
| TDP | 165W | 145W | 250W |
| Output | DL-DVI, HDMI 2.0, 3xDisplayPort 1.2 | DL-DVI, HDMI 2.0, 3xDisplayPort 1.2 | 1xDVI-D, 1xDVI-I, 1xDisplayPort, 1xHDMI |
| MSRP | $550 | $330 | $600 |

The GTX 980's GM204 Maxwell GPU advertises a texture filter rate of 144.1GT/s (gigatexels per second), a TMU count of 128, and a core clock of 1126MHz. The formula relating these figures is simple, and we'll get to it momentarily. In short, this GM204 configuration filters 128 texels per clock (Int & FP16), or 1 texel per TMU per clock.

The texture filter rate of the GPU is representative of how many pixels (specifically 'texels') the GPU can render per second. This value is always represented as a measurement over time (1s). A 144.1GT/s texture fill rate comes out to 144.1 billion texels (textured picture elements) per second.

At its inception, the fill-rate was a simpler spec to define: It represented the count of "complete" pixels (that is, pixels that have been completely filtered) that can be stored in the framebuffer (GPU memory). To this end, the texture fill-rate was strictly representative of the number of on-screen pixels that were filtered and written to the buffer.

With modern hardware, off-screen textures can be filtered and rendered before the user even sees them to ensure a smoother transition and framerate when the camera pans in the next frame. In talking with some video game artists in the industry (cheers, Mike Pickton), we were also able to ascertain a few instances where off-screen texel pre-filtration and rendering could benefit developers. One of these instances is reflections, where an off-screen object (a tower) may be reflected by an on-screen reflective surface (water, metal). The reflective surfaces of cars and mirrors are other easy examples.

What is a Texel?

[Image: Texels comprising a texture.]

The word “pixel” is shorthand for “picture element,” or a single dot of the many millions comprising two-dimensional screen space. Texels are “texture elements,” commonly called “textured pixels,” and are representative of a 'dot' in three-dimensional object space. Texels comprise a texture. Texels are mapped to objects and models during a process called texture mapping, which applies color, bitmaps, and textures / rasters to the three-dimensional polygons, then assigns texels to the corresponding pixels.

Texture filtering kicks in when determining the correct texture color for each individual texel, ultimately ensuring a sharper and more accurate texture application. Texels and pixels do not always line up perfectly with one another, due to the various viewing angles and distances dictated by the camera (player). This imperfect alignment creates the need for texture filtering, which uses one of many computational approaches (isotropic and anisotropic filtering methods) to calculate the correct color for each pixel. A more accurate color selection (better filtering) minimizes the chance of jaggies and shimmering, which can be seen in most games when looking at thin strands of grass, fence posts, and other small objects.

[Image: The left image shows shimmering grass due to the level of precision required for such a small object.]

Alternatively, rendering the screen at a higher resolution than native and then filtering it down to native resolution (i.e. render at 4K, filter down to 1080p) will reduce jaggies at a significant cost to performance. This approach has recently gained popularity in the form of AMD's VSR and NVIDIA's DSR.

The Math: Formula for Calculating the Texture Fill Rate

This is pretty easy. To extrapolate the texture filter rate, we can use the following formula:

Texture Filter Rate = Core Clock * TMUs.

In the case of the GTX 980's GM204 chip, that would be 128 TMUs * 1126 = 144,128. Note that the 1126 clock speed is measured in MHz, or millions of oscillations per second, so that figure is in megatexels per second: 128 * 1126MHz = 144,128MT/s, or 144.1GT/s. Written out in full, 128 * 1126 * (1,000,000/s) = 144,128,000,000 texels per second.
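As a rough sketch of the arithmetic above, the formula can be checked against the TMU counts and base clocks listed in the spec table. The function name and the `texels_per_tmu_per_clock` parameter are our own illustration, not part of any vendor tool, and the default of 1 texel per TMU per clock is an assumption that only holds for hardware that behaves that way.

```python
def texture_fill_rate_gt_s(tmus, core_clock_mhz, texels_per_tmu_per_clock=1.0):
    """Theoretical peak texture fill rate in gigatexels per second (GT/s)."""
    # MHz means millions of cycles per second, so scale the clock by 1e6.
    texels_per_second = tmus * texels_per_tmu_per_clock * core_clock_mhz * 1_000_000
    # Giga = 1e9, so divide to express the result in GT/s.
    return texels_per_second / 1_000_000_000


# (TMU count, base clock in MHz) taken from the spec table in this article.
cards = {
    "GTX 980": (128, 1126),
    "GTX 970": (104, 1050),
    "GTX 780 Ti": (240, 875),
}

for name, (tmus, clock_mhz) in cards.items():
    print(f"{name}: {texture_fill_rate_gt_s(tmus, clock_mhz):.1f} GT/s")
```

All three results line up with the advertised bilinear filter rates in the table (144.1, 109.2, and 210GT/s), which suggests the advertised figures are based on the base clock rather than the boost clock.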

Although this formula is straightforward for some modern hardware, it does not apply uniformly: AMD's hardware filters at different rates depending on the texture format. A GM204 GPU (bilinearly) filters 128 texels per clock cycle (integer or FP16), effectively 1 texel per TMU per clock cycle (resulting in the numbers above). A Hawaii GPU filters 176 texels per clock cycle (INT) but just 88 texels per clock cycle (FP16), so the theoretical max texture fill rate on this hardware varies with the task at hand and the type of filtration.
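To make the format dependence concrete, here is a minimal sketch that uses the texels-per-clock figures quoted above. The lookup table and function name are hypothetical, and the 1000MHz Hawaii clock is an assumption picked for round numbers rather than a figure from this article.

```python
# Texels filtered per clock, per texture format, as quoted in this article.
TEXELS_PER_CLOCK = {
    "GM204": {"INT": 128, "FP16": 128},
    "Hawaii": {"INT": 176, "FP16": 88},
}


def peak_fill_rate_gt_s(gpu, texture_format, core_clock_mhz):
    """Theoretical peak fill rate in GT/s for a given GPU and texture format."""
    texels_per_clock = TEXELS_PER_CLOCK[gpu][texture_format]
    # texels/clock * (MHz * 1e6 clocks/s) / 1e9 = gigatexels per second
    return texels_per_clock * core_clock_mhz * 1_000_000 / 1_000_000_000


ASSUMED_HAWAII_CLOCK_MHZ = 1000  # assumption for illustration only

for fmt in ("INT", "FP16"):
    gm204 = peak_fill_rate_gt_s("GM204", fmt, 1126)
    hawaii = peak_fill_rate_gt_s("Hawaii", fmt, ASSUMED_HAWAII_CLOCK_MHZ)
    print(f"{fmt}: GM204 {gm204:.1f} GT/s, Hawaii {hawaii:.1f} GT/s")
```

Under that assumed clock, Hawaii's theoretical peak halves when moving from INT to FP16 textures, while GM204's stays flat.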

Note: Architectures differ vastly between competing manufacturers, so texture fill-rates cannot be compared directly without factoring in other technologies.

As an additional example, the old VooDoo2 and RIVA TNT graphics processing architectures (this would be c. 1998) did not process a clean 1 texel per 1 unit either. This was during a time when texture fill-rate was the “in” spec to capitalize on when developing marketing materials, similar to how modern monitors advertise contrast ratio numbers to the point of irrelevance. The VooDoo2 hosted two texture mapping units (TMUs) and had a clockrate of 90MHz, so using our formula above, that'd be a theoretical max of 180 megatexels/s. In actuality, however, the two VooDoo2 TMUs had to work together simultaneously on the same pixel (they could not alternate between pixels).

This was great if the game utilized a technique called “dual texturing” (applying multiple textures to a single polygon, with each texture occupying the same pixel space), because then one TMU could work on “Texture A” while the second TMU worked on “Texture B” for the same pixel space. Because this was an era when some games would apply only a single texture to a single pixel, the second TMU would sometimes sit idle, as the workload was only enough for one TMU (hiring two people to perform a one-person job).

The takeaway is that although the VooDoo2 had a theoretical maximum texel filter throughput of 180MT/s, in most instances it would be somewhere between 90 and 180 MT/s depending on type of filtration and texture application.
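The 90 to 180MT/s range can be sketched with a deliberately simplified model: assume the two TMUs together handle one pixel per clock, and that the second TMU only contributes on dual-textured pixels. Both the model and the function name are our own illustration, not a description of the actual hardware pipeline.

```python
def voodoo2_effective_mt_s(dual_textured_fraction, clock_mhz=90.0):
    """Estimated texel throughput in megatexels/s under the simplified model.

    dual_textured_fraction: share of pixels (0.0 to 1.0) that use two textures.
    """
    if not 0.0 <= dual_textured_fraction <= 1.0:
        raise ValueError("dual_textured_fraction must be between 0 and 1")
    # One pixel per clock; each pixel samples 1 texture, or 2 if dual-textured.
    average_textures_per_pixel = 1.0 + dual_textured_fraction
    return clock_mhz * average_textures_per_pixel


for fraction in (0.0, 0.5, 1.0):
    rate = voodoo2_effective_mt_s(fraction)
    print(f"{fraction:.0%} dual-textured: {rate:.0f} MT/s")
```

Only a fully dual-textured scene reaches the marketed 180MT/s in this model; a single-textured scene leaves the second TMU idle and lands at 90MT/s.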

In other words, all the marketing reflected only the higher numbers, but those higher numbers were not representative of 100% of use cases. Texture fill-rate is not always as high in practical use as advertised. And that's not the only reason, as we'll find out below.

Why So Many Texels?

[Image source: Microsoft.]

144.1GT/s, or 144,100,000,000 texels per second, seems like an awfully large number. Despite potentially seeming unattainably high, it should be noted that this is a theoretical maximum texel throughput under ideal conditions. A GPU does a lot more than texture filtration. Other bottlenecks will present themselves prior to exceeding the texture fill rate of high-end hardware, in most instances.

Most users will never come close to exceeding that 144GT/s pipe. Let's assume a 4K screen resolution. That'd be 3840x2160 pixels, or nearly 8.3 million pixels. Consider next that we're attempting to draw these pixels at a minimum frequency of 60 FPS (60 times per second), and now we're at 498 million pixels drawn per second. Texture filtering becomes more complex and demanding when using different filtration technologies in games, like bilinear, trilinear, and anisotropic filtering.

We won't get into what each filter technique does in this article for scope reasons, but you're effectively multiplying the pixel count by an additional factor of X (4X anisotropic filtering samples 4 times per pixel to determine the correct texel color and provide a sharper image for the current viewing angle).

Throw 4X filtering on there for good measure, and we're still at just 1.99B filtered samples per second. Modern game graphics effects will begin to saturate this pipe more heavily in other ways, like off-screen texture filtering / rendering for use in an on-screen fashion (as described above with water and cars).
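The back-of-the-envelope estimate above can be written out as a short sketch. It assumes, as the article's estimate does, that every on-screen pixel is textured exactly once per frame and that 4X filtering simply multiplies the sample count by four; real workloads are messier, and the function name is our own.

```python
def texel_demand_per_second(width, height, fps, samples_per_pixel=1):
    """Rough texel samples needed per second for a full-screen workload."""
    return width * height * fps * samples_per_pixel


PEAK_TEXELS_PER_SECOND = 144.1e9  # GTX 980's advertised 144.1GT/s

pixels_per_frame = texel_demand_per_second(3840, 2160, fps=1)
demand = texel_demand_per_second(3840, 2160, fps=60, samples_per_pixel=4)

print(f"Pixels per 4K frame: {pixels_per_frame:,}")
print(f"Samples per second at 60 FPS with 4X filtering: {demand:,}")
print(f"Share of theoretical peak: {demand / PEAK_TEXELS_PER_SECOND:.1%}")
```

Under these assumptions the workload uses on the order of one to two percent of the card's theoretical texel throughput.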

We can exceed a billion (even 3 billion) texels per second at such high resolutions with high-sample filter techniques and a higher FPS, but that still isn't going to hit 144.1GT/s. The GTX 980 will also bottleneck on other components before ever approaching such a number. Even though some software will demand higher texture fill-rate before taxing other components (like shaders or ROPs), most GPUs will crumple under the load of rendering high-fidelity graphics at 4K before reaching texture fill-rate limitations.

So... what?

Texture fill-rate doesn't matter as much as it did in the days of the VooDoo2 and TNT architecture; at least, not for gaming. VooDoo2 and TNT were also introduced at a time when screen resolution had vast changes and improvements pending, making pixel count more relevant. Although an AMD Eyefinity setup could tax the TMUs in a more threatening fashion, it's generally the case that the majority of PC gamers will run into other bottlenecks first – including ROPs, memory, or even the CPU at some point. Game optimization is also an imperfect beast that introduces its own limits that may precede hardware limitations.

Let us know what random aspect of computer hardware you'd like to learn about next by tweeting at us or commenting below.

- Steve "Lelldorianx" Burke.

Last modified on January 02, 2015 at 12:00 am
