Comment 57 for bug 557429

Phillip Susi (psusi) wrote : Re: [Bug 557429] Re: array with conflicting changes is assembled with data corruption/silent loss

On 4/22/2010 5:08 AM, ceg wrote:
> Phillip, before suggesting something I try to think through the issue,
> and the same I try with feedback.
>
> But after several attempts to explain that changing metadata and
> removing the "failed" status (of already running parts) in the
> superblocks of the conflicting parts that are plugged-in (but not to be
> added to the running array) breaks hot-plugging, I sadly still can't
> recognize any consideration of the bad effects your approach would have
> for many users.

That's because it DOESN'T break hot-plugging. I have explained why.

> And if I think about it, your metadata updates may not have the overall
> effect you may expect. When the modified part is plugged in
> during future boots, it can get run degraded again, the metadata
> is then back to what it was before, and it can again be used normally.
> So the metadata updates just break hot-plugging and you could not
> explain a case where continuous unintentional flip-flopping would occur
> and updating metadata would help.

No, the second disk will not be run degraded again; that is the whole
point of correcting the wrong metadata. If the second disk is the only
one there on the next boot, it will show that disk2 is failed, so it
can't be used, and mdadm can't find disk1, so the array cannot be started.

> Correct, that is unrelated to the metadata problem, I commented on it
> because setting this up has its pitfalls (like UUID dupes and this bug
> requiring --zero-superblock to prevent it from biting) and it would much
> facilitate comparing, copying etc. in a hot-plug environment.

As I said before, it does not require --zero-superblock. Once disk2 is
failed and removed from the array, you can create a new array using that
disk. mdadm will warn you that the disk appears to already be part of
an array, but you can tell it to continue and it will put disk2 in a new
array, with a new uuid, and you can mount it and inspect it. Once you
are done with it you can move it back to the original array and a full
resync will be done.
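
The steps above can be sketched as a dry run. This is only an
illustration, not a tested recipe: the device names (/dev/md0,
/dev/sdb1, /dev/md1) and the /mnt mount point are assumptions, the
function name is made up, and nothing is executed; the script only
builds and prints the command lines. Whether the data on the new array
is readable depends on the new superblock matching the old layout,
which this sketch does not check.

```python
import shlex


def inspect_removed_member_plan(array="/dev/md0", member="/dev/sdb1",
                                scratch="/dev/md1", mountpoint="/mnt"):
    """Return the command lines for the workflow; nothing is run.

    All device names and the mount point are hypothetical defaults.
    """
    return [
        # Mark the member failed, then remove it from the original array.
        ["mdadm", "--manage", array, "--fail", member],
        ["mdadm", "--manage", array, "--remove", member],
        # Create a new, degraded RAID1 from that one disk ("missing" stands
        # in for the absent second member). mdadm is expected to warn that
        # the disk appears to be part of an array and ask for confirmation.
        ["mdadm", "--create", scratch, "--level=1", "--raid-devices=2",
         member, "missing"],
        # Mount read-only, inspect, then unmount and stop the new array.
        ["mount", "-o", "ro", scratch, mountpoint],
        ["umount", mountpoint],
        ["mdadm", "--stop", scratch],
        # Move the disk back to the original array; a full resync follows.
        ["mdadm", "--manage", array, "--add", member],
    ]


if __name__ == "__main__":
    for argv in inspect_removed_member_plan():
        print(shlex.join(argv))
```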

> It's even simpler once you can see that fixing metadata creates more
> issues than are actually there and updating metadata would really be
> able to solve.

I have shown why this is wrong.

> If it happens that both segments get available with
> conflicting changes, one needs to be chosen (first one is already
> there). But if you update the metadata on this occasion (disabling one segment),
> from this moment on the raid system will not keep the
> system running as designed, and like it did before both segments came up
> together once. (You would change/break behavior.)

Yes, and this change is entirely intentional because if you don't do
this, then you can unintentionally continue to further diverge the two
disks without noticing, causing further damage. Imagine a server that
boots and decides it can't find disk2, so it goes degraded. It has a
cron job that fetches email from a POP server and deletes the messages
once they have been downloaded. The server reboots and this time can
only find disk2. The cron job again fetches and deletes some mail. Now some
of your mail is on disk1, and some is on disk2, and you are running
without redundancy. You reboot and both disks are found. You can't use
both because they have become divergent, so you have to choose one. If
you don't update the metadata on the second disk, then whenever a reboot
happens to detect them in the reverse order, you flip-flop between the
two disks and your mail gets further and further split.
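
To make the mail example concrete, here is a toy simulation. It is a
deliberately simplified model, not how mdadm actually assembles arrays:
the Disk class, the marked_failed flag, and the rule "the first
detected usable disk wins" are all assumptions made for illustration.
It runs the same sequence of boots with and without the metadata
update.

```python
from dataclasses import dataclass, field


@dataclass
class Disk:
    name: str
    mail: set = field(default_factory=set)  # stands in for the disk's data
    marked_failed: bool = False             # "failed" recorded in its metadata


def boot(detected, mail_id, update_metadata):
    """One boot: assemble from the first usable disk, then fetch one mail.

    Returns the names of the disks that received the write.
    """
    usable = [d for d in detected if not d.marked_failed]
    if not usable:
        return []  # only failed members present: the array cannot start
    primary = usable[0]
    members = [primary]
    for other in usable[1:]:
        if other.mail == primary.mail:
            members.append(other)      # still in sync, stays mirrored
        elif update_metadata:
            other.marked_failed = True  # divergent: record that on the disk
    for d in members:
        d.mail.add(mail_id)            # the cron job fetches and deletes mail
    return [d.name for d in members]


def scenario(update_metadata):
    d1, d2 = Disk("disk1"), Disk("disk2")
    # Boot 0 finds only disk1, boot 1 only disk2; after that both are
    # found, but the detection order alternates from boot to boot.
    boots = [[d1], [d2], [d1, d2], [d2, d1], [d1, d2], [d2, d1]]
    for n, detected in enumerate(boots):
        boot(detected, f"mail{n}", update_metadata)
    return d1.mail, d2.mail


if __name__ == "__main__":
    for flag in (False, True):
        m1, m2 = scenario(flag)
        print(f"update_metadata={flag}: disk1={sorted(m1)} disk2={sorted(m2)}")
```

In this model, without the update each disk ends up with every other
message. With the update, disk2 is marked failed the first time both
disks are seen together, and all later mail lands on disk1 regardless
of detection order.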

Now let's say that you have a spare disk. When disk2 can't be found,
the spare kicks in and the array is rebuilt using the spare. Now if you
reboot and notice that disk2 is wrong, but don't update its metadata,
then you can run for a while and continue downloading mail to a properly
functioning redundant array. Then you reboot and this time disk2
happens to be detected first. You activate the degraded, more
out-of-date array, so you are now running without redundancy and will
appear to be missing even more mail.

The flip flopping MUST be avoided if possible.