A well-known electric car manufacturer recently had to recall more than 100,000 vehicles because of defective eMMC storage. To avoid similar failures when developing customer products, a few important factors must be taken into account when using data storage devices.
Caution with eMMCs
Embedded systems require non-volatile memory, i.e. memory that retains its contents permanently even when the system is switched off or unpowered, to store the operating system, the application and its data.
Modern embedded systems use eMMCs (embedded MultiMediaCards) as storage media because of their good price/performance ratio. eMMCs are NAND-flash-based memories, and NAND flash wears out with use. Special care is therefore required to ensure that the customer product actually reaches its planned service life.
NAND flash memories basically have two properties to which special attention must be paid:
- Wear-Out (Write/Erase Endurance) - A memory cell can become defective if it has been written to too often.
- Data Retention - The more often a cell is written to, the shorter it can retain the data.
These two properties depend on how often a cell has been written to. Since data is written via the software, primarily via the application software, the write behavior of the customer application software ultimately determines how long the memory lasts.
How can wear-out be estimated over the product lifetime?
The quality report, a document that must be requested from the eMMC manufacturer, states the manufacturer-specific maximum number of write/erase (W/E) cycles per cell. A wear-leveling algorithm in the eMMC (a technique for extending the life of the memory) ensures that all cells are written approximately the same number of times, regardless of the write behavior of the software. As long as this maximum number of W/E cycles, distributed approximately evenly across the cells, is not exceeded, the eMMC guarantees that defective blocks can be replaced by spare blocks, i.e. additional blocks reserved for this purpose.
A typical maximum for MLC (multi-level cell) eMMCs is 3000 write/erase cycles. From this, the maximum TBW (Terabytes Written) can be calculated, for example for an 8 GB eMMC.
The formula is: max. TBW = W/E cycles * capacity. In our example: 3000 cycles * 8 GB = 24,000 GB, or ~23 TB (with 1 TB = 1024 GB). The software must therefore not write more than this maximum TBW over the planned product lifetime if wear-out is to remain below 100%.
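The calculation above can be sketched in a few lines of Python. The function name `max_written_gb` is chosen for this illustration only, and the conversion with 1 TB = 1024 GB is an assumption made here because it reproduces the ~23 TB figure.

```python
def max_written_gb(we_cycles: int, capacity_gb: float) -> float:
    """Upper bound on the data that may be written to the flash, in GB.

    Implements max. TBW = W/E cycles * capacity (no write amplification yet).
    """
    return we_cycles * capacity_gb


# Example values from the text: MLC eMMC with 3000 W/E cycles and 8 GB.
total_gb = max_written_gb(3000, 8)

# Assumption: 1 TB = 1024 GB gives ~23 TB; with 1 TB = 1000 GB it is 24 TB.
print(f"{total_gb:.0f} GB = {total_gb / 1024:.1f} TB (binary) "
      f"or {total_gb / 1000:.1f} TB (decimal)")
# prints: 24000 GB = 23.4 TB (binary) or 24.0 TB (decimal)
```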
The cells in an eMMC are organized in pages, and pages are organized in blocks. Only whole pages can be written and only whole blocks can be erased. It is therefore essential to also consider HOW the data is written, which is captured by the correction factor WAF (Write Amplification Factor). Large sequential writes result in a lower WAF, while random writes of small amounts of data result in a higher WAF.
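Why small writes are more expensive can be illustrated with a deliberately simplified model in which every host write is rounded up to whole pages. The 16 KiB page size and the function name are assumptions for this sketch. A real eMMC controller also caches and merges writes and performs garbage collection, so measured WAF values will differ.

```python
def page_rounding_waf(write_size_bytes: int, page_size_bytes: int = 16 * 1024) -> float:
    """Bytes programmed divided by bytes requested for a single write.

    Toy model: each write is rounded up to whole pages. Controller caching
    and garbage collection are ignored. The page size is an assumed value.
    """
    pages = -(-write_size_bytes // page_size_bytes)  # integer ceiling division
    return pages * page_size_bytes / write_size_bytes


# Small writes waste most of each page; large writes fill pages completely.
for size in (2048, 4096, 16 * 1024, 1024 * 1024):
    print(f"{size:>8} B per write -> WAF ~ {page_rounding_waf(size):.1f}")
```

In this model a 2 KiB write yields a factor of 8 and a 4 KiB write a factor of 4, while writes that are multiples of the page size yield 1.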
Typical WAF values are between one and eight. Assuming a WAF of four, the result is as follows.
Formula: max. TBW = W/E cycles * capacity / WAF = 3000 * 8 GB / 4 = ~6 TB. Because the WAF can only be estimated roughly, this method is best suited to estimating wear-out in the design phase, while the application software is still being developed.
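For a design-phase estimate, the same formula can be turned around to express the expected wear-out as a percentage for a given write volume. The following is a minimal sketch. The workload of 1 GB per day over 10 years is a hypothetical example, not a figure from a real product, and the function name is illustrative.

```python
def wearout_percent(host_gb_per_day: float, lifetime_years: float,
                    we_cycles: int, capacity_gb: float, waf: float) -> float:
    """Estimated wear-out in percent at the end of the product lifetime.

    The write budget is W/E cycles * capacity / WAF, as in the formula above.
    """
    host_total_gb = host_gb_per_day * 365 * lifetime_years
    budget_gb = we_cycles * capacity_gb / waf
    return 100 * host_total_gb / budget_gb


# Hypothetical workload on the 8 GB example eMMC with an assumed WAF of 4.
estimate = wearout_percent(host_gb_per_day=1.0, lifetime_years=10,
                           we_cycles=3000, capacity_gb=8, waf=4)
print(f"Estimated wear-out: {estimate:.1f} %")  # prints: Estimated wear-out: 60.8 %
```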
eMMCs (embedded MultiMediaCards) are NAND-flash-based memories.
NAND flash is a type of flash memory built using NAND technology, in which the individual memory cells are connected in series.
Wear-out is the degree of wear on the memory. Memory cells can become defective if they have been written to too often.
Data retention is the length of time for which stored data remains intact. The more often a cell is written to, the shorter it can retain data.
Wear leveling is a technique to extend the lifetime of flash memories.
Spare blocks are additional blocks provided in eMMCs. They can replace defective blocks if necessary.
MLC stands for multi-level cell, a memory cell that stores two bits. Its counterpart is the SLC (single-level cell), which stores only one bit.
TBW stands for Terabytes Written and indicates how much data can be written to a memory in total during its lifetime. It thus takes the limited rewritability into account and is typically given in terabytes.
WAF stands for Write Amplification Factor. It is the ratio of the data actually written to the flash cells to the data the host intended to write.
Analysis of the writing behavior
A far more accurate way to estimate wear-out is to use CMD56, whose implementation is specific to the eMMC manufacturer, to read out the maximum erase count of a block. The application software is run with representative write behavior over a sufficiently long period (e.g. 14 days), and the maximum erase count is then extrapolated to the planned product lifetime. Because this approach also captures the actual WAF, the estimate is much more accurate. Ginzinger electronic systems uses this method as standard for wear-out estimation.
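The extrapolation step can be sketched as follows. Reading the counter via CMD56 is vendor-specific and not shown here, so the two erase counts are simply passed in. The numbers in the example are hypothetical, and the sketch assumes a fresh device and a constant write rate over the whole lifetime.

```python
def extrapolate_wearout(erase_start: int, erase_end: int, measured_days: float,
                        lifetime_years: float, max_we_cycles: int):
    """Extrapolate the maximum erase count to the planned product lifetime.

    Returns (projected erase count, wear-out in percent).
    Assumes the measured write behavior is representative and constant.
    """
    erases_per_day = (erase_end - erase_start) / measured_days
    projected = erases_per_day * 365 * lifetime_years
    return projected, 100 * projected / max_we_cycles


# Hypothetical measurement: max. erase count rises from 12 to 26 in 14 days.
projected, percent = extrapolate_wearout(12, 26, measured_days=14,
                                         lifetime_years=10, max_we_cycles=3000)
print(f"Projected erase count: {projected:.0f} ({percent:.1f} % wear-out)")
# prints: Projected erase count: 3650 (121.7 % wear-out)
```

A result above 100%, as in this example, indicates that the write behavior has to be changed or a different memory configuration is needed.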
Data Retention in Embedded Systems
In practice, the second property of NAND flash memories, the required data retention, often turns out to be harder to meet than the wear-out limit. Data retention is the time for which a cell can hold its data after being written. This applies both when the customer's product is switched off, i.e. de-energized, and when it is switched on.
Ginzinger electronic systems therefore uses automatic background operations, which continuously check whether a cell is about to lose data. This is made possible by error correction codes (ECC). If necessary, the cell is rewritten, so that data retention is only relevant while the customer product is switched off.
The data retention for the planned product lifetime can be estimated from the characteristic curves contained in the quality reports and the maximum erase count determined beforehand. Typical values for data retention:
10 years at 0% wear-out
1 year at 100% wear-out
These values depend strongly on the ambient temperature of the eMMC, which must therefore be determined and included in the calculation.
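Looking up the data retention for a given wear-out can be sketched as an interpolation over points taken from the characteristic curve. The example below uses only the two typical values listed above and interpolates linearly between them. This is an assumption made purely for illustration: real curves come from the manufacturer's quality report, need not be linear, and depend on temperature.

```python
from bisect import bisect_right


def retention_years(wearout_pct: float, curve: list) -> float:
    """Piecewise-linear interpolation over (wear-out %, retention years) points.

    Values outside the curve are clamped to the first or last point.
    """
    points = sorted(curve)
    xs = [x for x, _ in points]
    if wearout_pct <= xs[0]:
        return points[0][1]
    if wearout_pct >= xs[-1]:
        return points[-1][1]
    i = bisect_right(xs, wearout_pct)
    (x0, y0), (x1, y1) = points[i - 1], points[i]
    return y0 + (y1 - y0) * (wearout_pct - x0) / (x1 - x0)


# Assumed curve built from the two typical values in the text.
typical_curve = [(0, 10.0), (100, 1.0)]
print(f"{retention_years(60, typical_curve):.1f} years at 60 % wear-out")
# prints: 4.6 years at 60 % wear-out
```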
What to do?
If the wear-out limit or the required data retention cannot be met over the planned product lifetime
One obvious option is to write less data or to write the data in an optimized way, i.e. to reduce the WAF (write amplification factor). Another option is to configure the eMMC as pSLC (pseudo single-level cell) instead of MLC (multi-level cell). When configured as MLC, one cell stores two bits, whereas with pSLC it stores only one bit. Although this reduces the capacity of the eMMC by 50% (4 GB instead of 8 GB), the maximum number of write and erase cycles of a cell increases significantly, typically by a factor of six to ten, depending on the eMMC manufacturer. This significantly reduces wear-out and significantly increases data retention.
In summary, this means for developers that memory wear and limited data retention are issues that must be addressed rigorously. Otherwise, as in the automotive case described above, even systems that have worked without problems for years can suddenly fail and pose a major risk.
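The trade-off between capacity and endurance can be made concrete with the figures from the text: 8 GB, 3000 cycles in MLC mode, and an endurance factor of six to ten in pSLC mode. The function name is illustrative, and write amplification is left out to keep the comparison simple.

```python
def write_budget_gb(capacity_gb: float, mlc_cycles: int, endurance_factor: float):
    """Return the total write budget in GB for MLC and for pSLC mode.

    pSLC halves the capacity but multiplies the W/E cycles by endurance_factor.
    """
    mlc_budget = mlc_cycles * capacity_gb
    pslc_budget = mlc_cycles * endurance_factor * (capacity_gb / 2)
    return mlc_budget, pslc_budget


# Endurance factors of six and ten, the range given in the text.
for factor in (6, 10):
    mlc, pslc = write_budget_gb(8, 3000, factor)
    print(f"factor {factor}: MLC {mlc:.0f} GB, pSLC {pslc:.0f} GB "
          f"({pslc / mlc:.0f}x)")
# prints: factor 6: MLC 24000 GB, pSLC 72000 GB (3x)
# prints: factor 10: MLC 24000 GB, pSLC 120000 GB (5x)
```

Despite the halved capacity, the total amount of data that can be written is three to five times higher in this example.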