Modern Challenges for SSD Controllers
According to LSI's Kent Smith, modern challenges for Flash controllers (or Flash Storage Processors) include the following:
All of these items are pretty directly tied to cost on both the consumer and enterprise level. Given that this is a gaming and enthusiast-driven site, we're going to focus on the consumer-side benefits.
Back when TLC was first announced on OCZ devices, the benefit for SSDs was obvious: initial numbers suggested TLC would reduce costs by 20-30%, which would cascade into more affordable consumer-class SSDs. At a time when a 120GB SSD was in the $200 range, this was a big deal. Now, of course, a similar SSD can be had for around $100.
Challenges with TLC NAND Endurance / Lifespan
The rising concern in the industry was one of endurance. Most high-end consumer SSDs (840 Pro, OCZ Vector, Kingston HyperX) utilize MLC (multi-level cell) NAND Flash, whereas lower-end SSDs (the normal 840 or V300) operate on TLC, or triple-level cell NAND. MLC stores two bits per cell; TLC stores three. As voltage is regulated within the NAND during P/E (program/erase) cycles, the Flash degrades and the usable life of the drive is negatively impacted. In short, you'll have fewer program/erase cycles on SSDs with TLC NAND, and that means a shorter life. But TLC is cheaper, so we can't ignore that -- how, then, can we improve the lifespan of TLC to a point where endurance is no longer a concern for consumers?
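To put the bits-per-cell difference in concrete terms, here's a minimal sketch: a cell that stores n bits has to distinguish 2^n charge states, so each added bit leaves less margin between states. The function name is our own, purely for illustration.

```python
def voltage_states(bits_per_cell: int) -> int:
    """Number of distinct charge states a cell must hold to store the given bits."""
    return 2 ** bits_per_cell


# MLC = 2 bits per cell, TLC = 3 bits per cell (as described above).
for name, bits in (("MLC", 2), ("TLC", 3)):
    print(f"{name}: {bits} bits per cell -> {voltage_states(bits)} states")
```

Twice as many states squeezed into the same cell is a big part of why TLC is harder on endurance.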
The answer is in the controller -- whether it's SandForce, Samsung, OCZ, Intel, or someone else -- the controller is directly responsible for enhancing longevity. SandForce's SF3700 series flash storage processors hope to take this to the next level.
SATA is Saturated: SandForce is Pushing for PCI-e SSDs with Modularity
The SATA bus is another large obstacle to performance in modern storage devices. It's not news to anyone that storage is the single largest bottleneck in any machine.
SSDs alleviate a lot of that bottleneck (vs. their magnetic HDD brethren), but are still far slower than RAM, the CPU, and the PCI-e interface. The truth is, SSD speed is itself being throttled by the interface, and with no new SATA interface on the (immediate) horizon, it's most logical to look to using PCI-e. PCI-e SSDs already exist (see: RevoDrive), but the prices have made them inaccessible to the vast majority of the consumer market.
A lot of the price difference is on the manufacturing side. SSD manufacturers (
Now that we've outlined a few of the top-level obstacles, let's dive into more specifics on the SF3700 series controllers.
LSI SandForce SF3700 Block Diagram
What you're looking at above is a block diagram of the SF3700-series controllers and their planned modularity. Because the front-end is completely separate from the core and back-end technology of the FSP (Flash Storage Processor, or controller), the manufacturer can flip a bit to decide whether PCIe or SATA will be used.
We also see LSI's new SHIELD and existing RAISE technology on the back-end, which I'll discuss more below.
LSI SandForce SF3700 Controller Line-Up
You might have spotted ASUS' new ROG PCI-e SSD under the SF3739 category. We first got a hands-on with the new ASUS PCI-e SSD at CES, but were unfortunately given absolutely no other information -- we couldn't even photograph the back of the board. We weren't given any official comment on partnership when I brought up the image with LSI's presenters. That said, LSI and ASUS are partners and have done business in the past, so that coupled with the image means you can almost assuredly expect the SF3739 in the new ASUS PCI-e SSD.
As far as gamers are concerned, we're most interested in the SF3729 and SF3739. The enthusiast-class controller (SF3739) supports the PCI-e x4 interface, making for a higher theoretical maximum bandwidth. Mainstream client SSDs will still be SATA for the most part, but will now have the added possibility of PCI-e x2 connections.
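For a rough sense of what the extra lanes buy you, here's a back-of-the-envelope sketch of theoretical interface bandwidth. It assumes SATA 6Gb/s and PCI-e 2.0 (5 GT/s per lane), both with 8b/10b encoding; the PCI-e generation is our assumption and wasn't stated in the presentation, and the function name is hypothetical. Real-world throughput will be lower once protocol overhead is counted.

```python
def usable_mb_per_s(gigatransfers_per_s: float, lanes: int,
                    encoding_efficiency: float = 0.8) -> float:
    """Theoretical payload bandwidth in MB/s, before protocol overhead.

    encoding_efficiency=0.8 models 8b/10b encoding (8 data bits per 10 line bits).
    """
    bits_per_s = gigatransfers_per_s * 1e9 * lanes * encoding_efficiency
    return bits_per_s / 8 / 1e6


# Assumed link speeds: SATA 6Gb/s, and PCI-e 2.0 at 5 GT/s per lane.
sata3 = usable_mb_per_s(6.0, lanes=1)
pcie2_x2 = usable_mb_per_s(5.0, lanes=2)
pcie2_x4 = usable_mb_per_s(5.0, lanes=4)

print(f"SATA 6Gb/s:  {sata3:.0f} MB/s")
print(f"PCI-e 2.0 x2: {pcie2_x2:.0f} MB/s")
print(f"PCI-e 2.0 x4: {pcie2_x4:.0f} MB/s")
```

Under those assumptions, an x4 link has a bit more than three times the theoretical ceiling of SATA.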
LSI SandForce SF3700 Benchmarks vs. Samsung 840 Pro & XP941 SSDs
In preliminary internal testing and in SandForce's worst case scenario (100% data entropy, meaning fully random data that can't be compressed), things are looking good:
At the time of presentation, LSI hadn't even yet been able to run a complete test suite, but given the data we have, it's very promising for the new controllers. On the PCI-e side (the red chart), the SF3700 SSDs handily outperform Samsung's highest-performing PCI-e SSD, the XP941, by a factor of nearly 1.5x (the exception being write IOPS).
The sequential SATA performance is bus-limited, and aside from random write IOPS at 100% entropy, performance is on-par with or better than the competing 840 Pro device. When I asked LSI's Kent Smith what the random write performance with a more realistic 50% entropy would be, he estimated somewhere in the 70-80k IOPS range (or about equal).
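If "entropy" is a fuzzy term for you, this small sketch shows why 100% entropy is the worst case for a controller that relies on data reduction. It uses Python's general-purpose zlib compressor as a stand-in; that is our assumption for illustration only, and says nothing about the algorithm SandForce actually uses.

```python
import random
import zlib


def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size (lower means more compressible)."""
    return len(zlib.compress(data)) / len(data)


# Seeded pseudo-random bytes stand in for high-entropy data.
rng = random.Random(0)
high_entropy = bytes(rng.getrandbits(8) for _ in range(64 * 1024))

# A short repeating pattern stands in for low-entropy data.
low_entropy = b"SF3700 " * (64 * 1024 // 7)

print(f"high entropy ratio: {compression_ratio(high_entropy):.3f}")
print(f"low entropy ratio:  {compression_ratio(low_entropy):.3f}")
```

The random buffer barely shrinks (or even grows slightly), while the repetitive buffer collapses to a tiny fraction of its size. Typical user data sits somewhere in between.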
SandForce SF3700 Latency Benchmarks (Enterprise-Focused)
This part is a bit more enterprise-driven than what consumers will worry about, but as queue depth increases, the latency between transactions increases and the datarate flatlines. The far right of each chart below is about QD16 (or 16 queued transactions), with QD32 off of the chart.
The objective of the controller is to ramp into the data-rate before hitting higher latencies and larger queues, thus allowing a greater sustained datarate for prolonged transactions (like what a server would be responsible for).
With a higher datarate at the listed latencies, the overall responsiveness of the device will be improved.
To put things into perspective, most gamers will experience a QD2 to QD4 while gaming and performing other tasks. Servers max-out their queues regularly, depending on the type of data they handle.
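The relationship between queue depth, latency, and datarate described above can be sketched with Little's Law (throughput = outstanding requests / average latency). The latency figures below are hypothetical, chosen only to show the shape of the curve; they are not LSI's numbers.

```python
def iops(queue_depth: int, avg_latency_s: float) -> float:
    """Little's Law: throughput = outstanding requests / average latency."""
    return queue_depth / avg_latency_s


# Hypothetical (queue depth, average latency in microseconds) pairs.
samples = [(2, 100), (4, 100), (16, 400), (32, 800)]
for qd, latency_us in samples:
    print(f"QD{qd:<2} @ {latency_us} us -> {iops(qd, latency_us / 1e6):,.0f} IOPS")
```

In this toy data, once latency starts growing in step with queue depth, IOPS stops climbing. That's the flatline, and it's why a controller that reaches its peak datarate at lower latencies feels more responsive.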
Using DuraWrite & SHIELD to Improve Endurance
We've previously written articles that address the "What is Write Amplification Factor?" question, so this next piece is written with the assumption that you have a basic understanding of WAF. The linked article also covers overprovisioning, or the act of reserving drive space for improving endurance. Overprovisioned space is effectively a "reserve" of back-up storage components in the event of the death of a cell on your SSD. When a cell dies and there are no remaining cells to be reassigned to the drive, most SSDs will enter a write-locked (read-only) state temporarily before dying. Overprovisioning is a good thing.
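As a quick refresher on how WAF ties into endurance, here's a simplified sketch. WAF is the data written to Flash divided by the data the host asked to write, and a lower WAF stretches the same P/E budget further. The capacity, P/E cycle count, and WAF values are hypothetical, and the model ignores overprovisioning and uneven wear.

```python
def write_amplification(host_gb: float, flash_gb: float) -> float:
    """WAF = data actually written to Flash / data the host asked to write."""
    return flash_gb / host_gb


def lifetime_host_writes_tb(capacity_gb: float, pe_cycles: int, waf: float) -> float:
    """Rough host-write budget in TB: total Flash write budget divided by WAF."""
    return capacity_gb * pe_cycles / waf / 1000


# Hypothetical drive: 120 GB of NAND rated for 1,000 P/E cycles.
for waf in (1.0, 0.5):
    budget = lifetime_host_writes_tb(120, 1000, waf)
    print(f"WAF {waf}: about {budget:.0f} TB of host writes")
```

A WAF below 1.0 is only possible when the controller writes less data to Flash than the host sent, which is where data reduction comes in.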
LSI's DuraWrite technology has been on controllers for a while now, but the SF3700 controller sees improvements to DuraWrite's data reduction capabilities. On a top-level, DuraWrite works to reduce the total amount of data written to Flash, which in turn reduces the amount of "wear" to the cells that comprise the storage. By reducing the amount of wear to the cells, the device can survive more P/E cycles and yields a longer lifespan. LSI provided us with a graphic for the data writing pipeline as it pertains to DuraWrite:
So DuraWrite has improvements that enhance endurance, but what most impressed us with the SF3700 announcement was the SHIELD Error Correction technology (we weren't told what it stands for). SHIELD helps the NAND endure more faults as it ages. With basic overprovisioning, an SSD controller will reassign cells as existing cells are depleted and no longer able to retain data. SHIELD is effectively an 'adaptive' wear-leveling technique that pulls from overprovisioning to do more error correction.
In turn, this means your faulty cells (dying cells) will last longer, or have improved lifespan and endurance. Even though we're pulling from overprovisioning to perform more error correcting, the end result is still beneficial to the user: Faulty cells live longer, but can still be replaced by overprovisioned (reserved) cells when they fail.
But what happens if SHIELD fails and the drive becomes unstable, thus risking the stored data?
RAISE Returns: Redundant Protection from Catastrophic Failures
If SHIELD fails to protect the drive from a catastrophic fault, RAISE (Redundant Array of Independent Silicon Elements) comes in to protect the data. RAISE existed in the previous generation of SandForce controllers, but has been improved for SF3700 devices.
Using a ninth channel on the device, RAISE will now protect against full die failures by allocating an additional (reserved) die from overprovisioning to retain the data. This reduces overprovisioned space, but means your data survives the hit. By checking SMART attributes on the drive (many free tools exist, like HD Tune), the user will be alerted that RAISE 2 had to protect against a catastrophic failure and can now make the appropriate migration to a new drive. The idea isn't to keep using the drive; in practice, it's just redundancy to give you time to move the data elsewhere.
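LSI didn't detail how RAISE stores its redundancy, so treat the following as a sketch of the general idea only. It assumes a RAID-5-style XOR parity across dies, which is our assumption and not a confirmed description of RAISE. The function names are hypothetical.

```python
from functools import reduce


def xor_parity(dies: list[bytes]) -> bytes:
    """XOR the same byte position across every die to produce a parity block."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*dies))


def rebuild_missing(surviving: list[bytes], parity: bytes) -> bytes:
    """Recover one lost die by XOR-ing the survivors with the parity block."""
    return xor_parity(surviving + [parity])


# Three tiny stand-in "dies" of equal size.
dies = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
parity = xor_parity(dies)

# Simulate losing the middle die, then rebuild it from the rest.
recovered = rebuild_missing([dies[0], dies[2]], parity)
print(recovered == dies[1])
```

With this kind of scheme, any single die can be lost and rebuilt from the others, at the cost of one die's worth of space for parity.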
Better Performance Scaling on Larger Capacity SSDs
This one's just a quick side-note, as we didn't get too in-depth during the conference.
In nearly all current SSDs, performance has a tendency to degrade with capacities (Flash die counts) that don't fully match-up with the channels. Take an example (these are not specific numbers, it is purely an example): If you have a 240GB drive with 8 dies and 8 channels, performance is optimal; step down to a 120GB drive, and you now have fewer dies with the same channels, so performance becomes sub-optimal and dips down.
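A toy model of that example, under the simplifying assumption that each die can keep at most one channel busy. Real controllers interleave work in more complicated ways, so the numbers here only illustrate the trend; the function name and the 8-channel default are ours.

```python
def relative_throughput(dies: int, channels: int = 8) -> float:
    """Fraction of channels kept busy if each die can feed at most one channel."""
    return min(dies, channels) / channels


# Same channel count, shrinking die count (i.e. shrinking capacity).
for dies in (8, 4, 2):
    print(f"{dies} dies on 8 channels -> {relative_throughput(dies):.0%} of peak")
```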
In theory, the SF3700 series controllers should better allow lower die counts (lower capacities) without negatively impacting performance.
The future for LSI's SandForce controllers certainly seems bright. At the very least, we've got greater potential for PCI-e SSDs in the consumer market segment; at best, we have a higher-performance controller with greater endurance and fault tolerance capabilities. We'll get hands-on with the new controllers as manufacturers begin producing drives -- probably some time in 1H'14.
- Steve "Lelldorianx" Burke.