Comment 30 for bug 1365874

Theodore Ts'o (tytso) wrote :

The blog post here: https://raid6.com.au/posts/fs_ext4_external_journal_caveats/

is not generally true. What journal_async_commit does is convert the sequence:

1. write journal blocks
2. cache flush
3. write journal commit block
4. cache flush

... to:

1. write journal blocks
2. write journal commit block
3. cache flush
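
To put rough numbers on the two sequences above, here is a toy sketch that just counts cache flushes. It is not the jbd2 code; the function names `commit_ops` and `count_flushes` are made up for illustration, and the only thing taken from the lists above is the order of operations.

```python
def commit_ops(async_commit):
    """Return the ordered device operations for one journal commit."""
    if async_commit:
        # With journal_async_commit: one flush, after the commit block.
        return [
            "write journal blocks",
            "write journal commit block",
            "cache flush",
        ]
    # Without it: a flush before and after the commit block.
    return [
        "write journal blocks",
        "cache flush",
        "write journal commit block",
        "cache flush",
    ]


def count_flushes(num_commits, async_commit):
    """Total cache flushes issued for num_commits journal commits."""
    return num_commits * commit_ops(async_commit).count("cache flush")


print(count_flushes(1000, async_commit=False))  # 2000
print(count_flushes(1000, async_commit=True))   # 1000
```

So the option halves the number of cache flushes per commit; how much that matters depends entirely on how expensive a flush is on the device in question.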

This tends to make a lot of difference on HDDs from a performance perspective, because cache flush commands are so expensive. On an SSD with a competently implemented flash translation layer (FTL), it shouldn't make much of a difference from a performance perspective, and it should make hardly any difference from a write endurance perspective.

The way flash works is that flash chips are organized into erase blocks, which might be, say, 64k. This is the minimal size that must be erased as a unit. Once an erase block is cleared (which is the slow operation), it can be written one flash page (typically 4k in size) at a time. Once a flash page is written, it can't be erased except by erasing the entire erase block. If most of the erase blocks are filled, either with real data or with garbage (former data contents which have been superseded), then it might be necessary to copy the still-used data contents to other erase blocks, so that an erase block can be emptied and then erased. If it is necessary to do those extra copies before the erase block can be erased, this is the cause of the "write amplification" effect.
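
As a back-of-the-envelope illustration of that effect, here is a simplified model using the example sizes above (64k erase blocks, 4k pages). It assumes the FTL reclaims one erase block at a time and has to copy every still-valid page out of it first; real FTLs are more sophisticated, and the function name `write_amplification` is just for this sketch.

```python
ERASE_BLOCK_KB = 64
FLASH_PAGE_KB = 4
PAGES_PER_ERASE_BLOCK = ERASE_BLOCK_KB // FLASH_PAGE_KB  # 16 pages


def write_amplification(valid_pages):
    """Flash pages written per page of new host data, when the erase
    block being reclaimed still holds valid_pages pages of live data.

    Reclaiming the block costs valid_pages copy writes and nets
    (PAGES_PER_ERASE_BLOCK - valid_pages) pages of room for new data.
    """
    if not 0 <= valid_pages < PAGES_PER_ERASE_BLOCK:
        raise ValueError("valid_pages must be in 0..15")
    room_for_new_data = PAGES_PER_ERASE_BLOCK - valid_pages
    return (room_for_new_data + valid_pages) / room_for_new_data


for valid in (0, 8, 12, 15):
    print(valid, write_amplification(valid))
```

In this model a block that is all garbage costs nothing extra (factor 1.0), a half-live block doubles the writes (2.0), and a block with 15 of 16 pages still live costs 16 flash page writes for every page of new data.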

However, doing an extra CACHE FLUSH operation (which is what journal_async_commit eliminates) should not make any difference on any competently implemented FTL on a normal SSD.

The place where it makes a difference is on what gets referred to as "cost optimized" flash in polite company, or "crap flash" by more honest engineers. You will most often find this in eMMC flash or SD cards found in the cash register aisle of Micro Center (assuming, of course, that you actually get honestly labelled flash, as opposed to an SD card claiming to have 1G of flash but which is only backed by 16MB of flash, such that the 16MB + 4k write will end up overwriting previously written data). In this "cost optimized" flash, the FTL may end up mapping each 64k erase block to a 64k LBA address space. In that case, a 4k write followed by a cache flush will end up being the equivalent of a 64k flash erase/write. In the even more awful "crappy flash", each 64k erase block is direct mapped to a 64k LBA address space. In that case, if you are constantly overwriting any portion of the flash (either the FAT table for FAT file systems, or the journal in ext4), then those erase blocks will get worn out first, and once they are worn out, the flash device becomes broken.
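
To see why the direct-mapped case wears out so unevenly, here is a toy model. It assumes every 4k write followed by a flush costs one erase of the 64k erase block that the LBA maps to, as described above. The device size, journal size, and function names are invented for the example, and the "evenly spread" case is an idealized stand-in for wear leveling rather than a description of how any real FTL behaves.

```python
ERASE_BLOCK = 64 * 1024
PAGE = 4 * 1024


def direct_mapped_erases(write_offsets, device_size):
    """Erase count per block when each LBA range is pinned to one
    physical erase block and every write costs one erase of it."""
    counts = [0] * (device_size // ERASE_BLOCK)
    for offset in write_offsets:
        counts[offset // ERASE_BLOCK] += 1
    return counts


def evenly_spread_erases(write_offsets, device_size):
    """Idealized comparison: the same number of erases, rotated
    round-robin across every erase block on the device."""
    blocks = device_size // ERASE_BLOCK
    counts = [0] * blocks
    for i, _ in enumerate(write_offsets):
        counts[i % blocks] += 1
    return counts


# Hypothetical tiny device: 1 MiB (16 erase blocks), with a 128 KiB
# journal at the start that is overwritten 4k at a time, in a loop.
device_size = 1024 * 1024
journal_size = 128 * 1024
writes = [(i * PAGE) % journal_size for i in range(1600)]

direct = direct_mapped_erases(writes, device_size)
spread = evenly_spread_erases(writes, device_size)
print(max(direct), direct.count(0))  # 800 14
print(max(spread), spread.count(0))  # 100 0
```

With direct mapping, the two erase blocks under the journal absorb all 1600 erases while the other 14 are never touched, so those two blocks hit their erase limit eight times sooner than they would if the wear were spread across the whole device.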

But I emphasize that this is really only a problem for crap flash. For a normal SSD with a competent FTL, the use of journal_async_commit (or not using journal_async_commit) should not make any real difference to how long your flash device lasts.