Evolving NAND Flash Types: What is TLC? MLC? How Does TLC NAND Work?
So we've dissected the NAND Flash modules.
NAND Flash types often appear in marketing text as a sort of battle of TLC vs. MLC. Both have advantages and disadvantages, as with any technology, but a whole helluvalot more goes into SSD selection -- like the right controller, which can seriously impact the life of a drive. We'll get to those choices soon.
SLC was the first type of NAND readily available in SSDs (primarily enterprise); it's known for high endurance (10k, 20k, or more P/E cycles), high performance, and a price that's unattainable for most consumers. SLC heavily dominates the enterprise market, where high reliability and uptime are critical to success (web servers, database servers). MLC and TLC have made their way into the consumer market as more affordable options.
SLC stands for single-level cell, MLC for multi-level (two) cell, and TLC for triple-level cell. The acronyms indicate how many bits of data are stored per cell and, by extension, how many voltage levels each cell must distinguish. The NAND type impacts a device's cost-per-GB, total capacity, speed, and endurance.
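The bits-to-levels relationship is simple exponentiation. A quick sketch (the dictionary here is just for illustration):

```python
# Bits stored per cell determine how many distinct voltage
# levels the cell must hold: levels = 2 ** bits.
NAND_TYPES = {"SLC": 1, "MLC": 2, "TLC": 3}

for name, bits in NAND_TYPES.items():
    levels = 2 ** bits
    print(f"{name}: {bits} bit(s)/cell -> {levels} voltage levels")
# SLC: 1 bit(s)/cell -> 2 voltage levels
# MLC: 2 bit(s)/cell -> 4 voltage levels
# TLC: 3 bit(s)/cell -> 8 voltage levels
```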
That's the rapid primer. Another one of our graphics should help out:
Meet a sample single-level cell. This cell can store a single bit of data, but that's OK, because there are typically billions of cells per SSD. In the above SLC sample, we've got two possible voltage levels for the cell: 1 and 0 -- binary. When a charge is sent to the cell, it returns either a 1 or a 0 for the voltage check. Because there are only two possible voltage outcomes, the device runs with greater stability and performance than the more populated multi-level and triple-level cells, each of which must distinguish exponentially more voltage levels and requires greater electrical precision when reading.
More voltage levels make for a slower device given the added complexity, and they also increase the possibility of error when checking the voltage of a cell. When the voltage level is misinterpreted, a bit error is thrown and stability is potentially compromised. Controllers mitigate this significantly, as I've explained above, making TLC and MLC viable consumer options. The stability and performance downsides are largely outweighed by affordability and capacity in consumer use cases; because consumers don't hammer drives the way servers do, the downsides won't necessarily impact the majority of users.
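To make the misread idea concrete, here's a toy simulation -- not real NAND physics, just an illustration under made-up assumptions: stored voltages drift by Gaussian noise, and a read snaps to the nearest nominal level. Packing more levels into the same voltage window means the same drift causes more misreads:

```python
import random

def misread_rate(levels, noise=0.04, trials=20000, seed=0):
    """Fraction of reads that land on the wrong voltage level.

    Nominal levels are spaced evenly across a 0-1 V window;
    'noise' is the std-dev of random drift on the stored voltage.
    All parameters are illustrative, not real NAND specs.
    """
    rng = random.Random(seed)
    step = 1.0 / (levels - 1)
    errors = 0
    for _ in range(trials):
        target = rng.randrange(levels)
        stored = target * step + rng.gauss(0, noise)
        read = min(levels - 1, max(0, round(stored / step)))
        errors += (read != target)
    return errors / trials

for name, levels in [("SLC", 2), ("MLC", 4), ("TLC", 8)]:
    print(f"{name}: {misread_rate(levels):.3%} misreads")
```

With the same noise, SLC's lone threshold sits far from either level and essentially never misreads, while TLC's tightly-spaced levels produce a visible error rate -- the gap real controllers close with error correction.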
Here's what MLC and TLC look like:
MLC contains two bits of data per cell and hosts four voltage levels; TLC contains three bits of data per cell and hosts eight voltage levels -- it's exponential (2^1, 2^2, 2^3). At three bits of data per cell, each TLC cell stores significantly more than its SLC and MLC counterparts (3x SLC, 50% more than MLC). This means the cost-per-GB can be reduced because less physical hardware is required to store more data, resulting in the affordable SSDs we've got on the market today. Higher die yield also impacts price.
Here's another look at voltage levels:
As Kent Smith describes with hand motions in the video, there are different "readpoints" between the voltage levels on the cell. SLC is definitive: one readpoint between level 0 (L0) and level 1 (L1). MLC features a readpoint between each of its four voltage levels, so three total readpoints. TLC (not shown, as this is an old graphic) would feature seven readpoints. The more readpoints we cram into a cell, the more granularity is required at the electrical level (voltages) to accurately read the bits; more granularity in the same voltage window means more room for error. Because we're fitting more bits onto a cell, the individual cells take more of a beating (accessed more frequently) than their single-bit SLC counterparts, which impacts endurance. The stored electrical charge weakens as the device ages, the NAND gets "worn out" from being hammered for more bits per cell, and the SSD "dies" much sooner.
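The readpoint counts and the shrinking margin between levels fall straight out of the level math. A small sketch (the 1 V window is an arbitrary illustration, not a real NAND figure):

```python
def readpoints(bits_per_cell):
    # One read threshold between each adjacent pair of levels.
    return 2 ** bits_per_cell - 1

def level_spacing(bits_per_cell, window_volts=1.0):
    # Levels spaced evenly across a fixed voltage window:
    # more levels -> less room between them.
    return window_volts / (2 ** bits_per_cell - 1)

for name, bits in [("SLC", 1), ("MLC", 2), ("TLC", 3)]:
    print(f"{name}: {readpoints(bits)} readpoint(s), "
          f"{level_spacing(bits) * 1000:.0f} mV between levels")
# SLC: 1 readpoint(s), 1000 mV between levels
# MLC: 3 readpoint(s), 333 mV between levels
# TLC: 7 readpoint(s), 143 mV between levels
```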
Controller manufacturers have implemented ways to mitigate this, like the increased overprovisioned space ("reserve NAND") I described above. Another means of reducing impact to the NAND is decreasing the Write Amplification Factor (WAF), which we've explored in depth previously. By heavily compressing data before it's committed to flash, a controller can push WAF below 1x, so less data is physically written than the host sends and endurance takes less of a hit. SandForce calls its WAF-reduction technology "DuraWrite," which reduces the data written to the flash. SandForce's Gen3 controllers will introduce "SHIELD" technology, an error correction method that pulls from overprovisioned space to perform more error correction as the device ages.
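The endurance math behind WAF is straightforward. Here's a rough lifetime estimate -- every number below is hypothetical for illustration, not a vendor spec:

```python
def drive_lifetime_years(capacity_gb, pe_cycles, waf, host_writes_gb_per_day):
    """Rough years until the NAND's P/E cycles are exhausted.

    Total flash writes the NAND can absorb = capacity * P/E cycles
    (assumes ideal wear leveling). Physical writes per day =
    host writes * WAF; WAF < 1 means the controller compresses
    data so less hits the flash than the host sent.
    """
    total_writes_gb = capacity_gb * pe_cycles
    physical_writes_per_day = host_writes_gb_per_day * waf
    return total_writes_gb / physical_writes_per_day / 365

# Hypothetical 240 GB TLC drive, 1,000 P/E cycles, 20 GB/day of host writes:
print(drive_lifetime_years(240, 1000, 1.1, 20))  # uncompressible workload
print(drive_lifetime_years(240, 1000, 0.6, 20))  # compressed, WAF below 1x
```

Even with pessimistic inputs the estimate lands in decades, which is why a sub-1x WAF matters less for panic and more for margin as P/E cycle counts shrink on denser NAND.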
The reason I point all of this out is to ease consumer concern about endurance. Although there are very clear disadvantages to Flash types that store more bits per cell, the advantages and affordability outweigh them when coupled with a reliable controller. Most users in our audience will not deplete an MLC SSD before upgrading the system entirely, and the same goes for TLC. The performance hit is less noteworthy than the endurance hit, but that's strictly because SSDs brush up against the SATA interface limit to begin with; it'd be more noticeable on an SSD using a faster interface.
What's Next for NAND: VNAND & 3D NAND
Just before CES 2014, Flash and DRAM manufacturer Samsung announced its plans for "3D NAND" as a next step after TLC. This isn't something we'll see any time soon in the consumer market, but it is worth talking about in preparation for the day.
The trouble with TLC is that it's packing so many bits onto a cell that endurance and stability are threatened. Accuracy with voltage levels is more of a concern, thermals are more of a concern, and speed is more of a concern. To resolve this, Samsung has proposed that NAND should be stacked vertically ("3D NAND" or "VNAND"), distantly similar in concept to Intel's 3D transistors. This is analogous to real estate in heavily-populated areas: an apartment high-rise fits more residents into the same footprint than single-family homes do. Samsung even uses this analogy in its own video, found here:
It's still far too early to discuss the architecture and storage/endurance implications of this technology, so we'll leave it there.
SSD Death from Disuse
Watchers of our initial video may have noticed that death from disuse was a very brief discussion point. As an SSD ages and depletes P/E cycles, its charge retention and ability to differentiate between voltage levels also decay. Setting a moderately- to heavily-used SSD on a shelf for a year without use may result in data loss at next access, especially if the device has entered its read-only state after P/E cycle depletion. Kent Smith likened this to a car battery, explaining that leaving the device unused for long periods of time is similar to leaving a car unused -- when you go to turn it on, the charge (and data) is drained. Unlike a car battery, though, consumers don't have any way to "recharge" the cells to regain access.
That's a lot of information for our first serious SSD architecture article. Please comment below and let me know if you have any further questions about any of this! I'll do my best to answer questions below, but do keep in mind that we've got plenty more follow-up pieces planned. Your questions could very well shape that content.
Our sincere thanks to LSI, Samsung, and Kingston for their fact-checking of this article and engineering knowledge.
- Steve "Lelldorianx" Burke.