How SSDs Work: Architecture, TLC vs. MLC NAND, & Endurance

Published June 07, 2014 at 10:56 am

SSD Capacity, Channel Population, and Impact on Speed

Let's talk about the anatomy of a NAND Flash module and its impact on storage capacity. Fab processes and facilities differ here, but terminology and fundamentals remain standardized across consumer-class SSDs. I'll be using Micron's newest 16nm MLC Flash NAND process as an example for this section -- the very same NAND used in Crucial's recently-released, affordable SSDs.

Breaking down that name is a good starting block: "16nm MLC Flash NAND."

16nm - The "fab process," or physical size of the features on the silicon. This comes up regularly when discussing new GPUs and CPUs, like the jump from a 32nm to a 28nm process. The number indicates the smallest feature the factory can etch into the semiconductor (think of this as a "cut" or "slice" of silicon), and it normally correlates directly with the gate length of the MOSFETs.

MLC - The NAND Flash type, which we'll define next in a large, separate section. These are often listed as "MLC" or "TLC" in consumer SSD applications, but also exist in "SLC" form for expensive Enterprise uses. The future of NAND includes 3D NAND (or VNAND), covered more in depth in the NAND Flash Types section below. "MLC" stands for multi-level cell, TLC for triple-level cell, and SLC for single-level cell. Again, the precise meaning of these words will be explored shortly.

With that defined, we can move on to channels, dies, and capacity calculations. Here's another very simple graphic using Micron's 128Gb Flash:

[Image: simple-ssd-2 -- controller, channel, and die layout]

In this one, we've split the Flash dies into four pieces. Most modern controllers (like Gen3 SandForce and the newest Samsung controllers) operate at peak performance when speaking to a maximum of four dies per channel. We've got a fully-populated channel and controller configuration here, so the hypothetical Flash controller will be operating at maximum spec.

Capacity is measured in gigabits (Gb) for Flash and DRAM modules. There are eight bits in a byte, so 128Gb is equal to 16GB (128 / 8 = 16). With eight channels and four dies per channel, we've got 16GB * 4 dies * 8 channels = 512GB. You'll notice that the image is labeled as "480GB SSD," though, so we've got 32GB missing. That's because all SSDs reserve some of the Flash for "overprovisioning," which I've previously discussed in-depth here and here, in the SandForce Gen3 announcement. The industry average for consumers is about 7% overprovisioned space, but this varies greatly depending on a particular drive's target user.
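The arithmetic above can be sketched in a few lines. This is a minimal illustration using the article's example numbers (128Gb dies, 8 channels, 4 dies per channel, 32GB reserved); the variable names are my own.

```python
# Capacity math for the hypothetical 480GB drive described above.
DIE_GBIT = 128            # per-die capacity in gigabits
BITS_PER_BYTE = 8
CHANNELS = 8
DIES_PER_CHANNEL = 4

die_gb = DIE_GBIT / BITS_PER_BYTE               # 128Gb / 8 = 16GB per die
raw_gb = die_gb * DIES_PER_CHANNEL * CHANNELS   # 16GB * 4 * 8 = 512GB raw

reserved_gb = 32                                # overprovisioned space
usable_gb = raw_gb - reserved_gb                # 480GB user-visible
op_fraction = reserved_gb / raw_gb              # 6.25% of raw capacity

print(f"raw: {raw_gb:.0f}GB, usable: {usable_gb:.0f}GB, OP: {op_fraction:.2%}")
```

Note that 32GB of 512GB works out to 6.25% of raw capacity, in line with the ~7% consumer average mentioned above.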

We're not going to dive too deep into overprovisioning in this post -- mostly because I've already done it -- but will save that for the next article in the series. I don't want to make things too overwhelming in part one.

The very short of it is this: Overprovisioned (OP) space is used for background "swapping" as part of the wear-leveling and garbage collection processes, and it ensures extended life of the SSD. The overprovisioned space on modern SF Gen3 controllers (and presumably the impending Samsung controllers) can also be "called into action" from the reserves when part of the NAND goes bad. A single bad block can render an entire SSD useless, so this sort of "back-up NAND" reserved in OP steps in for error-heavy blocks and allows the drive to continue operating normally. Using SMART attributes, we are able to determine when this call to the reserves has happened. It is always advisable to replace the device immediately after the reserve begins seeing use.

With our hypothetical controller fully saturated on its eight channels with 32 dies, speed, efficiency, and capacity will be at their maximum spec. Because NAND is now being fabricated at the 128Gb capacity, it is likely that the cost of 480GB SSDs will drop as they become more desirable and standardized in the industry; we saw this same shift from 120GB to 240GB just recently. The reason 64GB SSDs have largely vanished is tied to the NAND capacity: it is no longer cost-effective for manufacturers to make 64GB SSDs, especially given the decrease in performance with fewer dies on the PCB. A 64GB SSD built on 128Gb Flash would use just four dies -- one-eighth of the 32-die full saturation of a modern controller. The controller would be hamstrung.
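To make the channel-starvation point concrete, here is a rough sketch comparing die counts against the hypothetical 8-channel, 4-die-per-channel controller above. The 240GB-to-16-die pairing assumes ~256GB raw capacity before overprovisioning.

```python
# Controller saturation for drives built on 128Gb (16GB) dies,
# assuming the article's hypothetical 8-channel controller.
CHANNELS = 8
MAX_DIES_PER_CHANNEL = 4
max_dies = CHANNELS * MAX_DIES_PER_CHANNEL   # 32 dies at full saturation

for capacity_gb, dies in [(64, 4), (240, 16), (480, 32)]:
    utilization = dies / max_dies
    print(f"{capacity_gb}GB drive: {dies} dies, {utilization:.1%} of controller saturation")
```

A 64GB drive leaves seven-eighths of the controller's interleaving capability idle, which is where the performance penalty comes from.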

But there's a lot more to it than this.

What's Inside the NAND Flash Module - Planes, Blocks, & SSD Architecture

Here's another graphic I made to aid this discussion:

[Image: ssd-architecture-1 -- die, plane, block, and page hierarchy]

The green SSD makes a return, but this time, we're looking inside its components. We're still using Micron's 16nm, 128Gb Flash in this example. You'll see that the black blocks still represent Flash modules, but now we're expanding one into a "plane," which is then expanded into a "block," then expanded into "pages." This is how an SSD organizes its data storage.

Micron's NAND we're referencing uses a dual-plane architecture, so it's got two planes per die. Each plane hosts 1024 blocks, and each block hosts 512 pages that are 16KB large. If you do the math here, that's 512 pages * 16KB capacity = 8MB per block; 8MB * 1024 blocks = 8GB of data per plane; 8GB * 2 planes = a 16GB die, or 128Gb. This is then multiplied by the number of dies present on the device to achieve the capacity. Using our previous 32-die sample SSD, that'd be 16GB * 32 dies = 512GB, minus overprovisioning to equal 480GB.
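Walking the dual-plane arithmetic above through code, step by step, using the geometry stated for Micron's 16nm 128Gb part (16KB pages, 512 pages per block, 1024 blocks per plane, two planes per die):

```python
# Die capacity from the page/block/plane hierarchy described above.
PAGE_KB = 16
PAGES_PER_BLOCK = 512
BLOCKS_PER_PLANE = 1024
PLANES_PER_DIE = 2

block_mb = PAGE_KB * PAGES_PER_BLOCK / 1024    # 16KB * 512 = 8MB per block
plane_gb = block_mb * BLOCKS_PER_PLANE / 1024  # 8MB * 1024 = 8GB per plane
die_gb = plane_gb * PLANES_PER_DIE             # 8GB * 2 = 16GB (128Gb) per die
ssd_gb = die_gb * 32                           # 32-die sample SSD: 512GB raw

print(block_mb, plane_gb, die_gb, ssd_gb)      # 8.0 8.0 16.0 512.0
```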

This organizational structure is a requirement given the way SSDs operate. NAND storage is handled electrically, so the modules have a limited lifespan measured in program/erase (P/E) cycles. The NAND cells begin to lose their charge as the device ages, eventually returning voltages that can be "misinterpreted" as the incorrect voltage level. This is the catalyst for a "bit error." Error Checking & Correction (ECC) can resolve a limited number of bit errors, but once bit errors become commonplace, the NAND inevitably enters a locked, read-only state. This state is preserved for a set amount of time to allow for data retrieval (the duration depends on the manufacturer), after which point the SSD will shut off for good.

That's where it gets problematic without a controller-based solution. SSDs can only write in pages, which are about 16KB in modern architectures. This means all of your data is broken into 16KB chunks (or smaller) and written to "pages" within a block on the NAND. The controller can later move this data around as it sees fit as part of its wear-leveling and garbage collection processes. But even though an SSD can read from and write to small, 16KB pages, it can only erase entire blocks at a time.
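The page-write/block-erase asymmetry can be sketched as a toy model. This is not a real flash translation layer -- the class and method names are illustrative only -- but it captures the rules described above: a page can only be programmed while erased, updates go to a fresh page elsewhere while the old copy is marked stale, and erasing reclaims the whole block at the cost of one P/E cycle.

```python
class Block:
    """Toy model of one NAND block: individually programmable pages,
    but erasure is all-or-nothing."""

    def __init__(self, pages=512):
        self.pages = [None] * pages   # None = erased (programmable)
        self.stale = set()            # pages holding obsolete data
        self.erase_count = 0          # P/E cycles consumed

    def program(self, page_idx, data):
        # NAND cannot overwrite in place; the page must be erased first.
        if self.pages[page_idx] is not None:
            raise ValueError("page must be erased before reprogramming")
        self.pages[page_idx] = data

    def invalidate(self, page_idx):
        # On update, the controller writes the new copy to a fresh page
        # (possibly in another block) and marks the old copy stale.
        self.stale.add(page_idx)

    def erase(self):
        # Erase wipes the entire block -- all 512 pages -- in one shot.
        self.pages = [None] * len(self.pages)
        self.stale.clear()
        self.erase_count += 1

blk = Block()
blk.program(0, b"hello")
blk.invalidate(0)       # updated copy lands elsewhere; old page is stale
blk.erase()             # reclaiming one stale page costs a whole-block erase
print(blk.erase_count)  # 1
```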

Once any single block has been erased a total number of times that exceeds the lifespan of the NAND, that block is effectively electrically dead; a single dead block can render an entire SSD useless. Luckily for us, there are means to prevent uneven usage of an SSD and stave off electrical death. Controllers use "wear-leveling" as a means to ensure all blocks are programmed and erased at an equivalent pace, which looks something like this:

[Image: ssd-garbage-2 -- wear-leveling and garbage collection diagram]

This graphic shows a page being moved around within a block (left). In magnetic storage, a physical read/write head must seek across a spinning platter, so where data physically sits on the disk directly affects access times and performance. With an SSD, it doesn't matter where the data is stored because it's all accessed electrically; this is the same reason fragmentation carries no performance penalty on an SSD, and why an SSD should never be defragmented -- defragmenting only burns P/E cycles for no gain.

Remember that SSDs can only erase entire blocks at a time, so to make most efficient use of the P/E cycles, it doesn't make sense to erase a block that's only got a few pages on it. This is where wear-leveling comes into play by rotating the pages across blocks to ensure more even distribution as the device is programmed and erased. Some SSDs will advertise as having "3K P/E" cycles (take this HyperX 3K example), which means that the NAND is rated for 3,000 programs and erases before it exits spec and enters "use at your own risk" territory. To put things into perspective, most users will exhaust the usable life of the system before an SSD of similar endurance is depleted.

Continue to the third and final page to learn about NAND Flash types, how MLC & TLC work, and what comes after TLC.


Last modified on August 04, 2016 at 10:56 am
Steve Burke

Steve started GamersNexus back when it was just a cool name, and now it's grown into an expansive website with an overwhelming amount of features. He recalls his first difficult decision with GN's direction: "I didn't know whether or not I wanted 'Gamers' to have a possessive apostrophe -- I mean, grammatically it should, but I didn't like it in the name. It was ugly. I also had people who were typing apostrophes into the address bar - sigh. It made sense to just leave it as 'Gamers.'"

First world problems, Steve. First world problems.

