Thursday, July 23, 2015

First Large Scale, In Field SSD Reliability Study Done At Facebook

First Large Scale, In Field SSD Reliability Study Done At Facebook. Adam Armstrong. Storage Review. June 22, 2015.
    Carnegie Mellon University has released a study titled “A Large-Scale Study of Flash Memory Failures in the Field.” The study was conducted using Facebook’s datacenters over the course of four years and millions of operational hours. The study looks at how errors manifest and aim to help others develop novel flash reliability solutions.
Conclusions drawn from the study include:
  • SSDs go through several distinct failure periods – early detection, early failure, usable life, and wearout – during their lifecycle, corresponding to the amount of data written to flash chips.
  • The effect of read disturbance errors is not a predominant source of errors in the SSDs examined.
  • Sparse data layout across an SSD’s physical address space (e.g., non-contiguously allocated data) leads to high SSD failure rates; dense data layout (e.g., contiguous data) can also negatively impact reliability under certain conditions, likely due to adversarial access patterns.
  • Higher temperatures lead to increased failure rates, but do so most noticeably for SSDs that do not employ throttling techniques.
  • The amount of data reported to be written by the system software can overstate the amount of data actually written to flash chips, due to system-level buffering and wear reduction techniques.
The study doesn’t state one type of drive is better than another.

Related posts:

No comments: