md raid0/linear doesn't show error state if an array member is removed and allows successful writes
| Affects | Status | Importance | Assigned to | Milestone |
| --- | --- | --- | --- | --- |
| linux (Ubuntu) | Fix Released | High | Guilherme G. Piccoli | |
| Bionic | Fix Released | High | Guilherme G. Piccoli | |
| Disco | Fix Released | High | Guilherme G. Piccoli | |
| Eoan | Fix Released | High | Guilherme G. Piccoli | |
Bug Description
[Impact]
* Currently, mounted raid0/md-linear arrays give no indication or warning when one or more members are removed or hit a non-recoverable error condition.
* Given that, such arrays stay mounted, and writes to them go through the page cache and appear to complete successfully, even though the writeback threads cannot actually write the data to the member devices. For users this can mean silent data corruption, since even the "sync" command returns success although the data never reaches the disks. Kernel messages do show I/O errors, though.
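* As an illustration of the symptom (a sketch only; the device /dev/md0 and mount point /mnt/md0 are assumptions for illustration):

  # with one member of the mounted raid0 array already gone:
  $ dd if=/dev/zero of=/mnt/md0/testfile bs=1M count=10   # completes via the page cache
  $ sync; echo $?                                         # prints 0, despite writeback failing
  $ dmesg | tail                                          # I/O errors show up only here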
* The patch proposed in this SRU addresses the issue on two levels: first, it fast-fails write I/Os to raid0/md-linear arrays that have one or more failed members; second, it introduces the "broken" state, which is analogous to "clean" but indicates that the array is not in a good/correct state. A dmesg message helps to clarify when such an array has a member removed or failed.
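* With the patch applied, the new state can be observed through md's sysfs interface (a sketch; /dev/md0 is an assumption, and this assumes the state is exposed via the usual array_state attribute):

  $ cat /sys/block/md0/md/array_state
  broken            # shown instead of "clean" once a member is gone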
* The commit proposed here is available in Linus' tree as 62f7b1989c02 ("md raid0/linear: Mark array as 'broken' and fail BIOs if a member is gone") [http://
* One important note is that this patch requires a counterpart in the mdadm tool to be fully functional; that counterpart was SRUed in LP: #1847924.
The kernel patch works fine without this counterpart, but for a broken array the
"mdadm --detail" command won't show "broken" and will instead show "clean, FAILED".
* We hereby ask the kernel team for an exception: to have this backported to kernel 4.15 *only in Bionic* and not in Xenial. The reason is that the mdadm code changed too much and we didn't want to introduce a potential regression in the Xenial version of that tool, so we only backported the mdadm counterpart of this patch to Bionic, Disco and Eoan - hence we'd like the backported kernel versions to match.
[Test case]
* To test this patch, create a raid0 or linear md array on Linux using mdadm, for example: "mdadm --create md0 --level=0 --raid-devices=2 /dev/nvme0n1 /dev/nvme1n1" (a consolidated reproduction script follows these steps);
* Format the array using a filesystem of your choice (for example ext4) and mount the array;
* Remove one member of the array, for example using the sysfs interface (for nvme: echo 1 > /sys/block/
* Without this patch, writes to the array partition appear to succeed and "mdadm --detail" shows a "clean" state; with the patch, write I/Os fail fast and the array is reported as "broken".
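* The steps above can be consolidated into a script along these lines (a sketch only: the device names, mount point and exact sysfs removal path are assumptions that vary per machine; adapt before use, as it is destructive to the member devices):

  # create and mount a raid0 array
  $ mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/nvme0n1 /dev/nvme1n1
  $ mkfs.ext4 /dev/md0
  $ mkdir -p /mnt/md0 && mount /dev/md0 /mnt/md0

  # remove one member; for NVMe, one way is detaching the underlying PCI device
  # (this path is an assumption and differs between systems)
  $ echo 1 > /sys/block/nvme0n1/device/device/remove

  # exercise the array and check the reported state
  $ dd if=/dev/zero of=/mnt/md0/testfile bs=1M count=10; sync
  $ dmesg | tail
  $ mdadm --detail /dev/md0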
[Regression potential]
* There's not much regression potential here; the patch only fails write I/Os directed at broken arrays and reports a message/status accordingly, exposing the array's broken state. We believe the most likely "issue" to be reported against this patch is a userspace tool that relies on write I/Os succeeding or on the "clean" state of an array - after this patch, such a tool can behave differently when an array is broken.
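* As a hypothetical example of such a tool: a backup step along the lines below would have printed "backup OK" on a broken array before this patch, whereas after it the write and/or sync is expected to fail, so the success path is no longer taken:

  # hypothetical backup step relying on write/sync success
  $ cp /var/backups/dump.img /mnt/md0/ && sync && echo "backup OK"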
Changed in linux (Ubuntu Bionic):
status: Confirmed → Fix Committed
Changed in linux (Ubuntu Eoan):
status: Confirmed → Fix Committed
Changed in linux (Ubuntu Disco):
status: Confirmed → Fix Committed
There was a first attempt to mimic the behavior of NVMe/SCSI devices that are removed while holding a mounted filesystem: <email address hidden>/T/#u
It was quite complex, relying on forcing an unmount of the filesystem, stopping the writeback threads and removing the md block device. It didn't get many reviews, and most of them were not favorable, hence we proposed the simpler approach SRUed here.
The aforementioned RFC cover letter details the issue to a large extent, so it may be an interesting read for parties interested in this issue.
Thanks,
Guilherme