I was wrong about that. fsck -c -c -k ... had found 3 bad blocks, so I thought "ah! it was a device error after all".
Having moved the bad blocks out of the way, I expected everything to return to normal, and would have switched to moaning
about the complete lack of visible diagnostics (including in dmesg) when an I/O error occurs on a write
to the bad blocks. (It's possible that the write lands in the device cache, no error is detected until the cache is flushed to the drive,
and at that point it isn't reported back to the host, so there's little the software can do.)
In fact, the system has carried on in the same old way. Just now:
[73851.280405] EXT4-fs error (device sda1): ext4_mb_generate_buddy:741: group 67, 6802 clusters in bitmap, 6777 in gd
[73851.280416] Aborting journal on device sda1-8.
[73851.280527] EXT4-fs (sda1): Remounting filesystem read-only
[73851.280541] EXT4-fs error (device sda1) in ext4_reserve_inode_write:4550: Journal has aborted
[73851.280639] EXT4-fs error (device sda1) in ext4_reserve_inode_write:4550: Journal has aborted
[73851.280836] EXT4-fs error (device sda1) in ext4_ext_remove_space:2790: Journal has aborted
[73851.280922] EXT4-fs error (device sda1) in ext4_reserve_inode_write:4550: Journal has aborted
[73851.281005] EXT4-fs error (device sda1) in ext4_ext_truncate:4308: Journal has aborted
[73851.281093] EXT4-fs error (device sda1) in ext4_reserve_inode_write:4550: Journal has aborted
[73851.281165] EXT4-fs error (device sda1) in ext4_orphan_del:2491: Journal has aborted
[73851.281313] EXT4-fs error (device sda1) in ext4_reserve_inode_write:4550: Journal has aborted
[73851.331505] EXT4-fs error (device sda1): ext4_mb_generate_buddy:741: group 128, 8453 clusters in bitmap, 8437 in gd
[73851.331513] EXT4-fs (sda1): pa f61a05e8: logic 2637, phys. 4209875, len 3
[73851.331516] EXT4-fs error (device sda1): ext4_mb_release_inode_pa:3607: group 128, free 3, pa_free 2
And after an fsck, chunks of the file system have been reverted, presumably because fsck discarded the tail of the journal (not that it says so anywhere).
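
For anyone wanting to pull the numbers out of logs like this, here is a rough sketch in Python. It scans dmesg text for the ext4_mb_generate_buddy lines and reports, per block group, the count taken from the bitmap, the count in the group descriptor ("gd"), and the difference. As far as I understand it, both figures are free-cluster counts. The regular expression is only matched against the message format shown in the excerpt; other kernel versions may word it differently. The function name buddy_mismatches and the SAMPLE string are made up for the illustration.

```python
import re

# Pattern for the ext4_mb_generate_buddy mismatch message, based only on
# the wording seen in the dmesg excerpt (assumption: the format is stable).
BUDDY_RE = re.compile(
    r"EXT4-fs error \(device (?P<dev>\w+)\): ext4_mb_generate_buddy:\d+: "
    r"group (?P<group>\d+), (?P<bitmap>\d+) clusters in bitmap, (?P<gd>\d+) in gd"
)


def buddy_mismatches(log_text):
    """Return (device, group, bitmap_count, gd_count, difference) tuples,
    one per mismatch line found in log_text."""
    results = []
    for line in log_text.splitlines():
        m = BUDDY_RE.search(line)
        if m:
            bitmap = int(m.group("bitmap"))
            gd = int(m.group("gd"))
            results.append(
                (m.group("dev"), int(m.group("group")), bitmap, gd, bitmap - gd)
            )
    return results


# Two lines copied from the excerpt, used here as sample input.
SAMPLE = """\
[73851.280405] EXT4-fs error (device sda1): ext4_mb_generate_buddy:741: group 67, 6802 clusters in bitmap, 6777 in gd
[73851.331505] EXT4-fs error (device sda1): ext4_mb_generate_buddy:741: group 128, 8453 clusters in bitmap, 8437 in gd
"""

if __name__ == "__main__":
    for dev, group, bitmap, gd, diff in buddy_mismatches(SAMPLE):
        print(f"{dev} group {group}: bitmap={bitmap} gd={gd} diff={diff}")
```

On the two lines above this gives a difference of 25 clusters for group 67 and 16 for group 128. Feeding it the full output of dmesg would show whether the same groups keep turning up or the damage is spreading.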
This has become unusable.