Hi Matthew and all,
Thank you for taking action immediately. I really appreciate your effort.
After investigating the issue further, I have to add that the mount option "discard" seems to trigger the issue as well.
@Trent
The general problem here is that RAID10 can balance a single read stream across all disks (probably its major advantage over RAID1: it effectively gives you RAID0 read speed, whereas RAID1 needs parallel reads to achieve this).
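In case it helps others hit by this: the RAID level and layout of an array can be inspected with mdadm. A minimal sketch, where /dev/md127 is the device name used below; the commands are only printed (via run) because they need root and a real array:

```shell
#!/bin/sh
# Sketch only: print the inspection commands instead of executing them,
# since they need root and an actual md array. Drop the echo in run()
# to execute them for real.
run() { echo "$@"; }

# Shows Raid Level, Layout (near/far/offset copies) and member state.
run mdadm --detail /dev/md127

# One-line overview of all md arrays and any ongoing resync.
run cat /proc/mdstat
```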
That said, it is no big surprise that several machines at our site went into read-only mode after *some time* (probably after reading some filesystem-relevant data from the "bad disk"). Unfortunately, the "clean first disk" scenario only holds if you act immediately; otherwise you may already have some data corruption.
I verified this on one system where the root partition was affected, using the debsums tool (just run "debsums -xa") after fixing the filesystem errors.
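For reference, that verification can be scripted the same way. A sketch assuming the Debian debsums package is installed; the commands are only printed here, because the check only makes sense on the affected system:

```shell
#!/bin/sh
# Sketch only: print the commands instead of executing them; drop the
# echo in run() to execute them for real on the affected system.
run() { echo "$@"; }

# Compare installed files against the md5sums shipped with their
# packages; files reported as FAILED are candidates for corruption.
run debsums -xa

# One way to repair FAILED files is to reinstall the owning package
# ("somepackage" is a placeholder, not from the original report):
run apt-get install --reinstall somepackage
```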
My procedure to recover was:

1. Assemble the RAID:
mdadm --assemble /dev/md127 /dev/nvme0n1p2
mdadm --run /dev/md127

2. Run a filesystem check on all partitions (note the -f parameter; some filesystems "think" they are clean):
fsck.ext4 -f /dev/VolGroup/...

3. Re-add the second component:
mdadm --zero-superblock /dev/nvme1n1p2
mdadm --add /dev/md127 /dev/nvme1n1p2
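Put together, the steps above can be sketched as one script. The device names (/dev/md127, nvme0n1p2, nvme1n1p2) are the ones from this report, the LVM path is left elided as in the original, and the commands are only printed because they are destructive and need root:

```shell
#!/bin/sh
set -eu
# Sketch only: print each command instead of executing it; drop the
# echo in run() to perform the actual recovery.
run() { echo "$@"; }

# 1. Assemble the array in degraded mode with only the good member,
#    and start it despite the missing device.
run mdadm --assemble /dev/md127 /dev/nvme0n1p2
run mdadm --run /dev/md127

# 2. Force a filesystem check (-f: check even if marked clean);
#    repeat for every filesystem on the array.
run fsck.ext4 -f /dev/VolGroup/...

# 3. Wipe the stale superblock on the second member and re-add it,
#    which triggers a full resync from the good disk.
run mdadm --zero-superblock /dev/nvme1n1p2
run mdadm --add /dev/md127 /dev/nvme1n1p2
```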
Best regards