Comment 28 for bug 557429

ceg (ceg) wrote :

I see that we were stumbling over confusing wording in mdadm.

When a disk disappears, really fails, is marked with mdadm --fail, or the array is run degraded, mdadm -E shows the *missing* disks marked as "removed". (This is probably what you were referring to all along.) That happens even though nobody actually issued "mdadm --remove" on them. (This is what I was referring to.)

After a manual --fail (with the disk now already marked "removed"), however, you still need to explicitly --remove to unbind the disk from the md device, and you must --fail before --remove is possible (otherwise you get "md device busy").

It would all be clearer if:
* mdadm -E reported "missing" instead of "removed" (which sounds as if the disk really had been removed with "mdadm --remove")
* "mdadm --remove" did not require a prior manual --fail, and only this command really marked disks as "removed" in the superblocks.

> I suppose that you could avoid marking the missing disk as removed when
> degrading the array, then --incremental could try to add it again later
> automatically.
> If the disk has not been tampered with then it would be
> resynced, hopefully quickly with the help of the write intent bitmap.

I think --incremental has supported automatic re-adding for years already. Since auto re-adding is a reality and an important feature, relabeling the "removed" mark as "missing" should remove the confusion.
(Auto re-adding is broken in Ubuntu, though (outside of the initramfs, for disks set up during the initramfs), because the map file is not kept; see Bug #550131.)

> In this case where the other disk has also been modified, the conflict
> can be easily detected because the first disk says the second disk is
> failed, and the second disk says the first disk is failed. If the
> second disk was not also degraded then it would still show both disks
> are active and in sync.

That is a good point!
If conflicting changes can be detected this way, why does mdadm not use this conflicting information (parts of an array each claiming the other has failed) to just report "conflicting changes" and refuse to --add without --force? (You see, I am back to asking for a report and a required --force. That would make it clear to users and admins that the failure is not just some bug or hiccup in the hot-plug mechanism, but that --add is a manual operation that implies real data loss in this case, unlike other cases where it only syncs an older copy rather than a diverged one.)
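
To make the requested check concrete, here is a minimal Python sketch of the idea. It is not how mdadm is implemented and it does not read real superblocks: the state labels, the `views` layout, and the function names `find_conflicts` and `may_add` are all made up for illustration. It only shows the logic: two members conflict when each records the other as failed, and an add of such a member is refused unless forced.

```python
from itertools import combinations

# Hypothetical state labels; real md superblocks encode member state differently.
ACTIVE = "active"
FAILED = "failed"


def find_conflicts(views):
    """Return pairs of members that each record the other as failed.

    `views` maps a member name to that member's own record of every
    member's state, e.g. {"sda1": {"sda1": "active", "sdb1": "failed"}}.
    """
    conflicts = []
    for a, b in combinations(sorted(views), 2):
        a_says_b_failed = views[a].get(b) == FAILED
        b_says_a_failed = views[b].get(a) == FAILED
        if a_says_b_failed and b_says_a_failed:
            conflicts.append((a, b))
    return conflicts


def may_add(views, candidate, force=False):
    """Allow adding `candidate` unless it is part of a conflict, or force is set."""
    involved = any(candidate in pair for pair in find_conflicts(views))
    return force or not involved


# Both halves were run degraded separately: each blames the other.
diverged = {
    "sda1": {"sda1": ACTIVE, "sdb1": FAILED},
    "sdb1": {"sda1": FAILED, "sdb1": ACTIVE},
}

# Only sda1 kept running; sdb1 still holds the older view with both active.
stale = {
    "sda1": {"sda1": ACTIVE, "sdb1": FAILED},
    "sdb1": {"sda1": ACTIVE, "sdb1": ACTIVE},
}

if __name__ == "__main__":
    for name, views in (("diverged", diverged), ("stale", stale)):
        if may_add(views, "sdb1"):
            print(name, "-> sdb1 can be added and resynced")
        else:
            print(name, "-> conflicting changes, refusing without force")
```

In the `stale` case the second disk is merely out of date, so syncing it loses nothing that was not already superseded. In the `diverged` case both disks carry changes the other lacks, which is the situation where a plain add would silently discard data.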