Servers use flash memory based solid state drives (SSDs) as a high-performance alternative to hard disk drives to store persistent data. "Unfortunately, recent increases in flash density have also brought about decreases in chip-level reliability." This can lead to data loss.
This is the first large-scale study of actual flash-based SSD reliability and it analyzes data from flash-based solid state drives at Facebook data centers for about four years and millions of operational hours in order to understand the failure properties and trends. The major observations:
- SSD failure rates do not increase monotonically with flash chip wear, but go through several distinct periods corresponding to how failures emerge and are subsequently detected,
- the effects of read disturbance errors are not prevalent in the field,
- sparse logical data layout across an SSD's physical address space can greatly affect SSD failure rate,
- higher temperatures lead to higher failure rates, but techniques that throttle SSD operation appear to greatly reduce the negative reliability impact of higher temperatures, and
- data written by the operating system to flash-based SSDs does not always accurately indicate the amount of wear induced on flash cells