
array with conflicting changes is assembled with data corruption/silent loss

Reported by Jamie Strandboge on 2010-04-07
This bug affects 9 people
Affects                   Importance  Assigned to
Release Notes for Ubuntu  Undecided   Unassigned
mdadm                     Undecided   Unassigned
mdadm (Ubuntu)            High        Unassigned

Bug Description

Re-attaching parts of an array that have been running degraded separately and contain
conflicting changes, with event counts that are the same or within the range of a
write-intent bitmap, results in the assembly of a corrupt array.

----
Using the latest beta-2 server ISO and following http://testcases.qa.ubuntu.com/Install/ServerRAID1

Booting an out-of-sync RAID1 array fails with ext3: it comes up as in sync, but is corrupted.

     (According to comment #18: ext3 vs ext4 seems to be mere happenstance.)

Steps to reproduce:

1. In a KVM virtual machine with 2 virtio qcow2 disks (1768M each), 768M RAM and 2 VCPUs, create the md devices in the installer:
/dev/md0: 1.5G, ext3, /
/dev/md1: ~350M, swap

Choose to boot in degraded mode. All other installer options are left at their defaults.

2. reboot into Lucid install and check /proc/mdstat: ok, both disks show up and are in sync

3. shutdown VM. remove 2nd disk, power on the VM and check /proc/mdstat: ok, boots degraded and mdstat shows the disk

4. shutdown VM. reconnect 2nd disk and remove 1st disk, power on the VM and check /proc/mdstat: ok, boots degraded and mdstat shows the disk

5. shutdown VM. reconnect 1st disk (so now both disks are connected, but out of sync), power on the VM

Expected results:
At this point it should boot degraded with /proc/mdstat showing it is syncing (recovering). This is how it works with ext4. Note that in the past one would have to 'sudo mdadm -a /dev/md0 /dev/MISSING-DEVICE' before syncing would occur. This no longer seems to be required.

Actual results:
Array comes up with both disks in the array and in sync.

Sometimes there are error messages reporting disk errors, and the boot continues to login, but root is mounted read-only and /proc/mdstat shows we are in sync.

Sometimes fsck notices this and complains a *lot*:
/dev/md0 contains a filesystem with errors
Duplicate or bad block in use
Multiply-claimed block(s) in inode...
...
/dev/md0: File /var/log/boot.log (inode #68710, mod time Wed Apr 7 11:35:59 2010) has multiply-claimed block(s), shared with 1 file(s):
 /dev/md0: /var/log/udev (inode #69925, mod time Wed Apr 7 11:35:59 2010)
/dev/md0:
/dev/md0: UNEXPECTED INCONSISTENCY; RUN fsck MANUALLY.

The boot loops infinitely on this: mountall reports that fsck terminated with status 4, then reports that '/' is a filesystem with errors, then tries again (and again, and again).

See:
http://iso.qa.ubuntu.com/qatracker/result/3918/286

I filed this against 'linux'; please adjust as necessary.

-----

From linux-raid list:
mdadm --incremental should only include both disks in the array if
1/ their event counts are the same, or +/- 1, or
2/ there is a write-intent bitmap and the older event count is within
   the range recorded in the write-intent bitmap.
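The rule from the list can be sketched as a small predicate. This is only an illustration of the stated rule; the function and parameter names are made up and this is not mdadm's actual code:

```python
def may_join(running_events, candidate_events, bitmap_range=None):
    """Return True if a newly seen member may join a running array.

    running_events:   event count of the already-active members
    candidate_events: event count from the candidate's superblock
    bitmap_range:     (oldest, newest) event counts covered by the
                      write-intent bitmap, or None if there is no bitmap
    """
    # Case 1: event counts are the same, or differ by at most one.
    if abs(running_events - candidate_events) <= 1:
        return True
    # Case 2: a write-intent bitmap exists and the older event count
    # still falls within the range it records.
    if bitmap_range is not None:
        oldest, newest = bitmap_range
        return oldest <= min(running_events, candidate_events) <= newest
    return False
```

Under this rule, a disk that merely missed a few bitmap-tracked writes would be accepted and resynced from the bitmap, while a disk that diverged outside the bitmap's range would be rejected.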

Fixing:

* When assembling, mdadm could check for conflicting "failed" states in the
  superblocks of members to detect conflicting changes. On conflicts, i.e. if an
  additional member claims an already running member has failed:
   + that member should not be added to the array
   + report (console and --monitor event) that an alternative
     version with conflicting changes has been detected: "mdadm: not
     re-adding /dev/<member> to /dev/<array> because it constitutes an
     alternative version containing conflicting changes"
   + require and support --force with --add for manual re-syncing of
     alternative versions (because unlike with re-syncing outdated
     devices/versions, in this case changes will get lost).
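The mutual-accusation check proposed above might look roughly like this. The superblock dicts and field names here are invented for illustration; mdadm's real metadata structures differ:

```python
def has_conflicting_changes(running_sb, candidate_sb):
    """Detect an 'alternative version': both halves of a mirror ran
    degraded separately, so each superblock claims the other member
    has failed."""
    running_claims_candidate_failed = candidate_sb["dev"] in running_sb["failed"]
    candidate_claims_running_failed = running_sb["dev"] in candidate_sb["failed"]
    return running_claims_candidate_failed and candidate_claims_running_failed

# Example: disks that each ran degraded accuse each other of failing,
# so auto-adding must be refused and --force with --add required.
disk1 = {"dev": "vda1", "failed": {"vdb1"}}
disk2 = {"dev": "vdb1", "failed": {"vda1"}}
```

A merely outdated disk (one whose superblock still shows both members active) would not trigger this check and could be resynced safely.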

Enhancement 1)
  To facilitate easy inspection of alternative versions (i.e. for safe and
  easy diffing, merging, etc.) --incremental could assemble array
  components that contain alternative versions into temporary
  auxiliary devices.
  (would require temporarily mangling the fs UUID to ensure there are no
  duplicates in the system)

Enhancement 2)
  Those who want to be able to disable hot-plugging of
  segments with conflicting changes/alternative versions (after an
  incident in which multiple versions were connected at the same time occurred)
  will need some additional enhancements:
   + A way to mark some raid members (segments) as containing
     known alternative versions, and to mark them as such when an
     incident occurs in which they come up after another
     segment of the array is already running degraded.
     (possibly a superblock marking itself as failed)
   + An option like
     "AUTO -SINGLE_SEGMENTS_WITH_KNOWN_ALTERNATIVE_VERSIONS"
     to disable hotplug support for alternative versions once they came
     up after some other version and got marked as containing an alternative version.
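In mdadm.conf terms, the proposed option from the enhancement above might be written as follows. Note this option does not exist; the line is a sketch of the proposal, reusing mdadm.conf's existing AUTO line syntax:

```
# Hypothetical mdadm.conf line -- the option below is proposed, not
# implemented. It would disable hotplug assembly of segments previously
# marked as containing a known alternative version.
AUTO +all -SINGLE_SEGMENTS_WITH_KNOWN_ALTERNATIVE_VERSIONS
```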

Changed in linux (Ubuntu Lucid):
importance: Undecided → High
description: updated
summary: - booting out of sync RAID1 array fails with ext3 (comes up as syncd)
+ booting out of sync RAID1 array fails with ext3 (comes up as already in
+ sync)
tags: added: iso-testing

Booted into a live cd, installed mdadm and then grabbed the superblocks with:
$ sudo mdadm -E /dev/vda1
$ sudo mdadm -E /dev/vdb1

description: updated
Jamie Strandboge (jdstrand) wrote :

The last was the contents of the superblocks after I connected both disks. Here is the content of the superblock for disk1 after booting degraded with disk2 removed and after shutting down (obtained via live cd).

Jamie Strandboge (jdstrand) wrote :

And here is the content of the superblock for disk2 after booting degraded with disk2 reconnected and disk1 removed and after shutting down (obtained via live cd (note it shows up as vda, not vdb since disk1 is removed)).

Jamie Strandboge (jdstrand) wrote :

Superblocks with both disks attached, but before activate.

Jamie Strandboge (jdstrand) wrote :

Superblocks and /proc/mdstat after both disks are attached and 'sudo mdadm --auto-detect'.

Jamie Strandboge (jdstrand) wrote :

From irc:
13:18 < psusi> jdstrand: when you boot with one disk, you get the warning about
               being degraded and are given 15 seconds to abort activating
               degraded or not, right?
13:19 < jdstrand> psusi: I don't see a warning cause of plymouth, but there is
                  a pause yes
13:23 < psusi> jdstrand: can you boot with nosplash and noquiet boot options to
               disable that? after plugging both disks back in, the udev
               script tries to do an incremental build when it detects each
               disk. That should fail for both disks, then eventually after a
               timeout, the fallback script should try to do the degraded
               activate... at that point only one disk should be activated and
               the other ignored
13:31 < jdstrand> psusi: I didn't get to grub in time, but after a long pause
                  it flashed a screen at me very clearly stating I am booting
                  in degraded mode (each time with disk1 and disk2 removed)
13:34 < psusi> jdstrand: did you still get that timeout and message about
               degraded when you reconnect the second disk? or does it just
               plod along happily like nothing is wrong at all?
13:34 < psusi> until the fsck fails of course
13:34 < jdstrand> I don't think I got the timeout, let me check
13:36 < jdstrand> psusi: no pause. straight to file system errors

description: updated
description: updated
Phillip Susi (psusi) wrote :

I have reproduced this on Karmic by manually assembling, stopping, and reassembling the array based on two lvm volumes. When mdadm --incremental is run on the first degraded leg of the mirror, it activates it, since it now has one out of one disks, with the second disk flagged as faulty, removed. You would think that the second disk would show the first as faulty, removed as well, but it only shows it as removed. When mdadm --incremental is run on the second disk, it happily starts using it without a resync. I believe this should fail and refuse to use the second disk until you manually re-add it to the array, causing a full resync. I have mailed the linux-raid mailing list about this.

Changed in linux (Ubuntu Lucid):
status: New → Confirmed
ceg (ceg) wrote :

As it's not caused by kernel raid autodetection I guess this probably belongs to package mdadm.

Have you tested if the 9.10-10.04 update works for raid systems this time?

You can see quite a few raid bugs filed, and also https://wiki.ubuntu.com/ReliableRaid

affects: linux (Ubuntu Lucid) → mdadm (Ubuntu Lucid)
ceg (ceg) wrote :

Note that initramfs actually also wrongly executes "mdadm --assemble --scan --run" if it finds any arrays degraded.

Bug #497186 initramfs' init-premount degrades *all* arrays (not just those required to boot)

ceg (ceg) wrote :

> I believe this should fail and refuse to use the second disk until you manually re-add it to the array, causing a full resync.

Yes, if there is a way for mdadm to determine whether members are out of sync, it should fail on conflicting updates that occurred on separated parts of the array (as is the case here).

It should not fail if a usable remaining part of an array has been updated and the removed disk is plugged in again unchanged (hotplug re-adding of a raid member that is used as a backup).

It should never sync depending on device order (since that is rather random in hotplug systems anyway).

Phillip Susi (psusi) wrote :

Activating the degraded array is done only if the root fs is not found, and only if the mdadm package was configured via debconf to do so. There is nothing wrong with this per se; the problem is that the second disk is automatically added back into the array by mdadm --incremental. Once the disk has been marked as removed from the array, it should require manual intervention to put it back.

ceg (ceg) wrote :

Looking under "bugs" where this bug has been filed (/ubuntu/lucid/) does not turn up a serious bug besides this one, but mdadm (not only in 10.04) actually has some: https://bugs.launchpad.net/ubuntu/+source/mdadm

ceg (ceg) wrote :

> Activating the degraded array is done only if the root fs is not found,

Right, it's only in a failure hook, and things like cryptsetup won't be run after that...
The initramfs boot mechanism is just not designed with the right event-driven approach yet. Bug #488317

> and only if the mdadm package was configured via debconf to do so.

Not quite right. That debconf question was a rather bogus, unhelpful and unnecessary implementation. Bug #539597

> There is nothing wrong with this per se,

It is wrong to use "mdadm --assemble --scan --run", because it will start *all* arrays that have not come up yet in the initramfs stage. (They get desynced and need to be resynced.) Bug #497186

ceg (ceg) wrote :

> the problem is that the second disk is automatically added back into the array by mdadm --incremental. Once the disk has been marked as removed from the array, it should require manual intervention to put it back.

In the case at hand mdadm should not only refuse addition due to it being "removed". Even if you add the disks manually, mdadm should not just sync the disk that was slower to appear onto the first one, because the parts are inconsistent!

I think a nice solution to detect this (counter+random) may have been posted to the linux-raid list.

The data corruption comes from the inconsistent parts (conflicting changes) that should require conscious user intervention or maybe configuration to decide about the sync direction.

Not auto-re-adding manually removed raid members is a usability decision that could probably be made configurable, but I see it as unrelated to the data corruption.

Changed in mdadm (Ubuntu Lucid):
assignee: nobody → Canonical Kernel Team (canonical-kernel-team)
milestone: none → ubuntu-10.04
status: Confirmed → Triaged
Changed in mdadm (Ubuntu Lucid):
assignee: Canonical Kernel Team (canonical-kernel-team) → nobody

Just wanted to post the linux-raid mailing list thread for reference here:

http://marc.info/?l=linux-raid&m=127067374402401&w=2

On 4/9/2010 9:58 AM, ceg wrote:
> In the case at hand mdadm should not only refuse addition due to it
> being "removed". Even if you add the disks manually, mdadm should not
> just sync the disk that was slower to appear onto the first one,
> because the parts are inconsistent!

This statement does not make sense. Of course they are inconsistent;
that is why you have to sync them, which will make them consistent.

> I think a nice solution to detect this (counter+random) may have been
> posted to the linux-raid list.

I believe that is overkill and adds a new feature. The bug as I see
it is that --incremental activates the disk instead of refusing to
because it is marked as removed. Fixing that would solve this problem.

> The data corruption comes from the inconsistent parts (conflicting
> changes) that should require conscious user intervention or maybe
> configuration to decide about the sync direction.

Which they could do after --incremental refuses to use the removed disk.
 The admin could look at the removed disk and salvage any data from it
he wishes to, then manually add it back to the array, causing a full resync.

> Not auto re-adding manually removed raid_members, is a usability
> decision, that could probably made configurable but I see unrelated to
> the data corruption.

It isn't an option that could be configured; it is the definition of
the word "removed". If I remove the disk from the array, then it is no
longer part of the array and should not automatically be sucked back
into it.

Do I understand it properly that this bug does _not_ affect RAID1 with ext4 filesystem in any case? I know that it's pretty obvious from original description, but I want to be sure before upgrade. Thanks in advance.

Phillip Susi (psusi) wrote :

That seems to be mere happenstance. Using ext3 vs ext4 likely just slightly alters the exact IO pattern, causing a different number of md events. As long as the md event counters are not the same, adding the second modified disk back in causes a resync, destroying the changes specific to the second disk and going with the changes on the first detected disk.

Pip (dirk2) wrote :

Does anybody know the reason why Ubuntu still uses mdadm 2.6.7, which is about 2 years old now?
Maybe this problem is solved in a newer mdadm release...

Pip

ceg (ceg) wrote :

>> mdadm should not just sync the disk that was slower to appear onto the first one, because the parts
>> are inconsistent!
>
>This statement does not make sense.

Oh, right, yes. To put it better: they should not be synced if they "contain conflicting changes".

If I read the case as originally reported, the drives actually weren't manually --removed, just disconnected during power-down. If mdadm starts to distinguish between missing/removed, the disk missing at boot time will probably still just be marked missing, same as if it had actually failed, I guess.

But more generally: by forcing users to manually re-add removed disks, while mdadm is not refusing to sync conflicting parts of an array, we only make re-adding the data-loss inducing action. Manually re-adding and syncing *might* (I am not so sure) assemble/resync a consistent array, but will discard one part of the conflicting changes in the array (data-loss).

(Though, from what was written, it does sound valid to me now not to auto-re-add "removed" disks, if "missing" disks are auto-re-added.)

To prevent data-corruption, I think mdadm is required to detect conflicting changes, no matter whether disks are re-added automatically (as in having an auto-synced backup in the docking station/external disk) or manually.

summary: - booting out of sync RAID1 array fails with ext3 (comes up as already in
- sync)
+ booting out of sync RAID1 array comes up as already in sync (data-
+ corruption)
ceg (ceg) on 2010-04-14
description: updated

On 4/14/2010 9:19 AM, ceg wrote:
> But more generally: by forcing users to manually re-add removed
> disks, while mdadm is not refusing to sync conflicting parts of an
> array, we only make re-adding the data-loss inducing action. Manually
> re-adding and syncing *might* (I am not so sure) assemble/resync a
> consistent array, but will discard one part of the conflicting
> changes in the array (data-loss).

Correct, if both disks have been changed then you can not combine them
without discarding one change or the other. The admin would have to
decide if there were important changes on the other disk and recover
them before adding it back to the array.

> To prevent data-corruption, I think mdadm is required to detect
> conflicting changes, no matter whether disks are re-added
> automatically (as in having an auto-synced backup in the docking
> station/external disk) or manually.

Why do you think that, and what exactly would that entail?

As long as the admin manually inserts the disk back into the array, he
KNOWS that any changes specific to that disk will be destroyed, so I
don't see a problem.

> Expected results: At this point it should boot degraded with
> /proc/mdstat showing it is syncing (recovering). This is how it works
> with ext4. Note that in the past one would have to 'sudo mdadm -a
> /dev/md0 /dev/MISSING-DEVICE' before syncing would occur. This no
> longer seems to be required.

This is not correct. You seem to be agreeing with me that automatically
adding the disk back and resyncing causes data loss, thus this should be
avoided. Instead you should have to manually add the disk back. You
say this is how it used to work? When? It doesn't seem to work that
way on Karmic. If it used to work that way, then the fact that it no
longer does is the regression that needs to be fixed.

ceg (ceg) on 2010-04-14
summary: - booting out of sync RAID1 array comes up as already in sync (data-
- corruption)
+ array with conflicting changes is assembled with data corruption/silent
+ loss
ceg (ceg) wrote :

Though I can sure understand it would be easier if we could just dismiss this to be taken care of by users, data-loss/corruption will always come back to haunt ubuntu/mdadm.

With Ubuntu systems in particular, we cannot assume there will always be an admin available. And if there is an admin, and he always has to re-add removed members manually, how does he notice if a user made conflicting changes?

I am not sure if we are considering the valid use case of auto re-adding members enough here, yet. (Even if auto-adding just "missing" and not "removed" members.) I.e. the case of docking-stations / external backup drives.

> You seem to be agreeing with me that automatically
>adding the disk back and resyncing causes data loss, thus this should be
>avoided.

We need to avoid and warn about data-loss, no matter whether manual or automatic.
Re-adding needs to be a safe operation. If concurrent changes were made, syncing has to be refused unless --force is used.

> you should have to manually add the disk back. You
>say this is how it used to work? When? It doesn't seem to work that
>way on Karmic. If it used to work that way, then the fact that it no
>longer does is the regression that needs to be fixed.

Creating a fully hot-pluggable system is a major feature of ubuntu.

On 4/14/2010 11:58 AM, ceg wrote:
> Though I can sure understand it would be easier if we could just
> dismiss this to be taken care of by users, data-loss/corruption will
> always come back to haunt ubuntu/mdadm.

Not necessarily. Data loss because of automatic hardware detection and
activation is a problem certainly, but data loss because the user ran rm
-rf / is not.

> With Ubuntu systems in particular, we cannot assume there will always
> be an admin available. And if there is an admin, and he always has
> to re-add removed members manually, how does he notice if a user
> made conflicting changes?

He will notice when he sees that the array is degraded and refusing to
use one of the disks.

> I am not sure if we are considering the valid use case of auto
> re-adding members enough here, yet. (Even if auto-adding just
> "missing" and not "removed" members.) I.e. the case of
> docking-stations / external backup drives.

I'm not quite sure what you mean here. A device that is removed should
never be automatically added when detected.

> We need to avoid and warn about data-loss, no matter whether manual
> or automatic. Re-adding needs to be a safe operation. If concurrent
> changes were made, syncing has to be refused unless --force is
> used.

I'm not sure why --force should be required. When you add a disk to the
array, you always destroy whatever data is on that disk. It goes
without saying.

>> you should have to manually add the disk back. You say this is how
>> it used to work? When? It doesn't seem to work that way on Karmic.
>> If it used to work that way, then the fact that it no longer does
>> is the regression that needs to be fixed.
>
> Creating a fully hot-pluggable system is a major feature of ubuntu.

Ok... how does that alter the fact that we should not be automatically
adding devices to arrays that have been explicitly removed?

ceg (ceg) wrote :

> Ok... how does that alter the fact that we should not be automatically
> adding devices to arrays that have been explicitly removed?

Not at all, we agree that explicitly --remove(ing) a device is a good way to tell mdadm --incremental (its hotplug control mechanism) not to re-add automatically.

Personally I could even agree that it might be OK for "mdadm --add" not to require --force, but you don't seem to agree that "mdadm --incremental" really needs to be able to auto-re-add (not manually removed, but missing) devices in a safe manner.

>> be an admin available. And if there is an admin, and he always has
>> to re-add removed members manually, how does he notice if a user
>> made conflicting changes?
>
> He will notice when he sees that the array is degraded and refusing to
> use one of the disks.

If I read your proposal correctly, running an array degraded would always also "remove" the missing disk.

This would imply:
* breaking the auto-re-add-later feature of mdadm --incremental (which also sports auto-read-only-until-write), even though it is perfectly safe in the majority of cases (no conflicts).
* forcing users/admins to *always* re-add manually after an array has been running degraded (this is not supporting hot-plugging, rather the contrary)
* making the perfectly safe re-addition of an outdated member device (i.e. an older backup) look indistinguishable from re-adding a member with conflicting changes (with data-loss!). The admin (*always* forced to --add manually) cannot notice when the operation will cause data loss.

>> I am not sure if we are considering the valid use case of auto
>> re-adding members enough here, yet. (Even if auto-adding just
>> "missing" and not "removed" members.) I.e. the case of
>> docking-stations / external backup drives.
>
> I'm not quite sure what you mean here. A device that is removed should
> never be automatically added when detected.

Please check https://wiki.ubuntu.com/HotplugRaid for example, and consider the need for a hot-plugging scheme that supports safe auto-re-adding.
If you manually --remove a member, it should not get auto-re-added. If a member is only missing for a while, yes, the array should keep running as well as be run degraded upon boot (as long as no conflicting changes were made).

ceg (ceg) wrote :

Always auto-removing as a means to drop auto-re-add features simply isn't an answer for conflict detection.

ceg (ceg) wrote :

Currently mdadm does not seem to distinguish the manual --removed status from the status a missing drive gets when the array is run degraded.

Especially with write-intent bitmaps regularly being used for faster syncing in hotplug setups, and mdadm only comparing whether the "event count is in range of the bitmap":

* Fixing this "data-loss on conflicting changes" bug will require better detection of conflicts.
* Support for tracking explicitly --removed disks in the superblocks, to prevent their auto-re-addition, is a valid but separate issue as far as I am concerned.

Phillip Susi (psusi) wrote :

On 4/14/2010 3:18 PM, ceg wrote:
> If I read your proposal correctly, running an array degraded would
> always also "remove" the missing disk.

That is exactly what happens. When you give the go ahead to degrade the
array, you fail and remove the missing disk.

> This would imply to * break all the auto-re-add later feature of
> mdadm --incremental (it also sports auto-read-only-until-write), even
> though it is perfectly safe in the majority of cases (no conflicts).
> * force users/admins to *always* re-add manually after an array is
> running degraded (this is not supporting hot-plugging, rather the
> contrary) * make the perfectly safe re-addition of an outdated member
> device (i.e. an older backup) look indistinguishable from re-adding a
> member with conflicting changes (with data-loss!). The admin
> (*always* forced to --add manually) cannot notice when the
> operation will cause data loss.

I suppose that you could avoid marking the missing disk as removed when
degrading the array, then --incremental could try to add it again later
automatically. If the disk has not been tampered with then it would be
resynced, hopefully quickly with the help of the write intent bitmap.
In this case where the other disk has also been modified, the conflict
can be easily detected because the first disk says the second disk is
failed, and the second disk says the first disk is failed. If the
second disk was not also degraded then it would still show both disks
are active and in sync.

ceg (ceg) wrote :

I see that we were stumbling over confusing wording in mdadm.

Upon disappearance, a real failure, mdadm --fail, or running an array degraded: mdadm -E shows *missing* disks marked as "removed". (What you probably referred to all the time.) Even though nobody actually issued "mdadm --remove" on them. (What I referred to.)

After a manual --fail (the disk is already marked "removed" now), however, you still need to explicitly --remove to unbind the disk from the md device, and one must --fail before --remove is possible ("md device busy").

All would be clearer if
* mdadm -E would report "missing" instead of "removed" (which sounds like it really got "mdadm --remove"d)
* "mdadm --remove"ing would not require a prior manual --fail, and only this would really mark disks as "removed" in the superblocks.

> I suppose that you could avoid marking the missing disk as removed when
> degrading the array, then --incremental could try to add it again later
> automatically.
> If the disk has not been tampered with then it would be
> resynced, hopefully quickly with the help of the write intent bitmap.

I think --incremental has supported auto re-adding for years already. And since auto re-adding is a reality and an important feature, relabeling the "removed" mark as "missing" should remove the confusion.
(Auto re-adding is broken in Ubuntu, though (outside of initramfs, for disks set up during initramfs), because the map file is not kept. Bug #550131)

> In this case where the other disk has also been modified, the conflict
> can be easily detected because the first disk says the second disk is
> failed, and the second disk says the first disk is failed. If the
> second disk was not also degraded then it would still show both disks
> are active and in sync.

That is a good point!
If conflicting changes can be detected by this, why does mdadm not use this conflicting information (when parts of an array claim each other to be failed) to just report "conflicting changes" and refuse to --add without --force? (You see I am back to asking to report and require --force, to make it clear to users/admins that it is not just some bug/hiccup in the hot-plug mechanism that made it fail; --add is a manual operation that implies real data-loss in this case, unlike in other cases where it will only sync an older copy instead of a diverged one.)

Thierry Carrez (ttx) on 2010-04-16
Changed in mdadm (Ubuntu Lucid):
assignee: nobody → Dustin Kirkland (kirkland)
milestone: ubuntu-10.04 → none
Phillip Susi (psusi) wrote :

On 04/15/2010 03:55 AM, ceg wrote:
> Upon disappearance, a real failure, mdadm --fail, or running an array
> degraded: mdadm -E shows *missing* disks marked as "removed". (What
> you probably referred to all the time.) Even though nobody actually
> issued "mdadm --remove" on them. (What I referred to.)

Exactly, when mounting an array in degraded mode with missing disks,
mdadm marks the missing disks as removed. It probably should only mark
them as faulty or something less severe than removed.

> All would be clearer if * mdadm -E would report "missing" instead of
> "removed" (which sounds like it really got "mdadm --remove"d)

There already exists a faulty state. It might be appropriate to use that.

> That is a good point! If conflicting changes can be detected by
> this, why does mdadm not use this conflicting information (when parts
> of an array are claiming each other to be failed) to just report
> "conflicting changes" and refuse to --add without --force? (You see I
> am back asking to report and require --force to make it clear to
> users/admins that it is not just some bug/hiccup in the hot-plug
> mechanism that made it fail, but --add is a manual operation that
> implies real data-loss in this case, not as in others when it will
> only sync an older copy instead of a diverged one.)

That seems to be the heart of the bug. If BOTH disks show the second
disk as removed, then mdadm will not use the second disk, but when the
metadata on the second disk says disk 2 is fine, and it's disk 1 that
has been removed, it happily adds the disk. It should not trust the
wrong metadata on the second disk, and should refuse to use it unless
it can safely coerce it into agreement with the active metadata in the
array taken from the first disk.

If the second disk says both disks are fine, then the array state of
disk 2 can be changed to active/needs sync, and the metadata on both
disks can be updated to match and the resync started.

If the second disk says that the first disk has been
removed/failed/missing, then you can not reconcile them since failing
the first disk would fail the array, and activating the second disk
could destroy data. In this case the second disk should be marked as
removed and its metadata updated. This will make sure that if you
reboot and the second disk is detected first, that it will not be
activated. In other words, as soon as you have a boot that does see
both disks after they have been independently degraded and modified, ONE
of them will be chosen as the victor, and used from then on, and the
other will be removed until the admin has a chance to investigate and
decide to manually add it back, thus destroying any changes on that disk
that were made during the boot with only that disk available.
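The decision procedure described above could be sketched like this. The superblock dict, field names, and state strings are illustrative only, not mdadm internals:

```python
def reconcile_second_disk(second_sb, first_dev):
    """Decide what to do with a later-appearing disk, following the
    rules above (illustrative sketch, not mdadm code).

    second_sb: superblock of the disk that appeared second
    first_dev: name of the already-active first disk
    """
    failed = second_sb["failed"]
    if not failed:
        # The second disk still shows both members active and in sync:
        # safe to mark it 'needs sync' and start a resync.
        return "resync"
    if first_dev in failed:
        # Mutual accusation: the second disk was modified while running
        # degraded. Mark it removed so later boots pick the same victor;
        # the admin must investigate and --add it back manually.
        return "mark-removed"
    # Any other failure claim: treat conservatively and keep it out.
    return "mark-removed"
```

Updating the losing disk's metadata is what prevents the flip-flopping between boots that the following comments discuss.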

ceg (ceg) wrote :

>> All would be clearer if * mdadm -E would report "missing" instead of
>> removed (which sounds like it really got "mdadm --removed")
>
> There already exists a faulty state. It might be appropriate to use that

This and the detection process sounds reasonable to me.

I am not sure how much sense auto-removing a conflicting part from an array makes from the user's side. As the order in which devices appear can be random, I would rather like mdadm to refrain from doing any metadata updates based on it, if that is not necessary.

With mdadm patched to detect and report conflicting changes, and not to sync them without --force, it should protect from unaware data-loss and corruption and always provide a coherent state, consisting of the first (or maybe intentionally even the only) part of an array that appears.

A coherent mdadm way of informing the user about the appearance of conflicting changes (after a run-degraded event) would probably be to emit a "conflicting changes" mdadm --monitor event.

If mdadm --incremental were to auto-remove a part with conflicting changes, it might not remove the part the user would actually --remove. But the situation would look very similar to an admin having actually --removed something, possibly wrongly suggesting to the admin that it's a simple and safe matter of re-adding, while he actually needs to manually reverse the auto-remove operation to prevent critical data loss.

ceg (ceg) wrote :

A name for this might be "safe segmentation". It prevents the data loss that could occur by syncing unreliable disks.

Phillip Susi (psusi) wrote :

Updating the metadata is needed to prevent further flip-flopping. Once
the situation is detected, it needs to be noted so that further reboots
will not decide to use the other disk. Pick one, and stick with it
until the admin sorts things out.

ceg (ceg) wrote :

> Once
> the situation is detected, it needs to be noted

Right, this is important especially in cases where segmentation has happened unintentionally. That is why I wanted mdadm to fail on conflicting changes without --force, not auto-sync, and to emit an event (email, beep, notification, whatever is configured).

> so that further reboots
> will not decide to use the other disk. Pick one, and stick with it
> until the admin sorts things out.

Bear in mind that mdadm --incremental is handling more than reboots, and segmenting the array can be intentional by the admin/user.

> Updating the metadata is needed to prevent further flip-flopping.

(Between reboots I haven't seen device ordering change too randomly anyway. Mostly the enumeration seems to stay the same if nothing is rewired. I'd consider the hot-plugging order much more arbitrary, and even less worth committing to the metadata.)

But would you see it as necessary for ensuring a consistent, uncorrupted array and closing this bug?
I think it is enough if mdadm only assembles the first part that is attached. It may be good, however, if mdadm assembled any conflicting parts as extra devices (normally md128 and up) so the parts are accessible for inspection, can be compared, manually merged, etc.

Updating the metadata would prevent working with and switching between concurrent versions in a hot-plugging manner.
Think of the use-case of segmenting a (non-root fs) data-array into two halves in order to do some major refactoring. (This is like keeping a snapshot by using only part of the mirror.)

Dustin Kirkland  (kirkland) wrote :

Solving this bug will require a non-trivial overhaul of mdadm's failure hooks in the initramfs, and potentially new code in mdadm itself.

In my opinion, this is not something that can be solved in Lucid before release. Also in my opinion, this is not a release-critical issue; rather, it should be addressed in the 10.04 release notes.

As such, I'm marking this bug won't-fix for Lucid, but leaving it triaged for the next development cycle (Maverick), and unassigning myself.

I can see where one of the bug's subscribers has written a spec on what they believe to be a better design for mdadm/initramfs failure handling:
 * https://wiki.ubuntu.com/ReliableRaid

Someone from the Ubuntu Foundations Team or the Ubuntu Community can propose this spec at UDS-Maverick in May, and perhaps implement the re-design in the next release. But the time has passed for this level of feature development in Lucid.

Cheers,
:-Dustin

Changed in mdadm (Ubuntu Lucid):
assignee: Dustin Kirkland (kirkland) → nobody
status: Triaged → Won't Fix
ceg (ceg) wrote :

Additional thoughts why updating metadata looks more limiting than beneficial to me:

Unintentional (intermittent) disk failures won't cause conflicting changes to appear, only auto re-sync events.

Segmenting an array into parts with conflicting changes requires repeated boots with separate parts attached, or manually --run'ing separate parts of an array degraded (later hot-plugged disks).

Use-case even with reboots: prior to doing a dist-upgrade, one boots with only part of the root-fs array attached, and can then switch back and forth between the versions by rebooting, until deciding which way to sync.

Dustin Kirkland  (kirkland) wrote :

As a followup, if this were fixed cleanly, and in a backportable manner, this could be a reasonable candidate for an SRU.

Jamie Strandboge (jdstrand) wrote :

Considering we are a little over a week away from release, Dustin's comment sounds reasonable. This bug existed in 9.10, and we should get it fixed, but rushing a fix before release could easily affect more RAID users than this bug would. Hopefully the solution will be contained enough to make it SRU-worthy so we can get it into Lucid after release.

Phillip Susi (psusi) wrote :

On 4/19/2010 1:10 PM, ceg wrote:
> (Between reboots I haven't seen too random/changing device
> reordering anyway. Mostly the enumeration seems to stay the same if
> nothing is rewired. I'd consider the hot-plugging order much more
> arbitrary, and even less worth of committing to the meta-data.)

You just made my point. The hot plugging case is the best example here.
 If I plug in one disk and make some changes, then unplug it, plug in
the other disk, and make some changes to it, in the future I don't want
which set of changes appears to depend on which disk I plug in first.
As soon as both disks are plugged in and the conflicting changes are
detected, you must record that in the metadata.

> But would you see it as necessary to ensure a consistent uncorrupted
> array and closing this bug? I think it is enough if mdadm will only

Very much so.

> assemble the first part attached regularly. It may be good however if
> mdadm would assemble any conflicting parts as extra devices (normally
> md128 and up) so the parts are accessible for inspection, can be
> compared, manually merged etc.

No need to do that automatically, this is where manual intervention
comes in. Once mdadm has rejected one of the disks and the admin
notices, he can easily ask mdadm to move it to another array by itself
to be mounted, inspected, merged, etc.

> Updating the metadata would prevent working with and switching
> between concurrent versions in a hot-plugging manner. Think of the
> use-case of segmenting a (non-root fs) data-array into two halves in
> order to do some major refactoring. (This is like keeping a snapshot
> by using only part of the mirror.)

If that is the intent, then the user needs to manually remove one disk
from the array and set it aside or add it to a separate array if they
wish. If we /accidentally/ fork the array, we need to set the
conflicting array aside and notify the user that they need to sort the
situation out manually. We avoid making the situation any worse than it
already is by updating the metadata.

Steve Langasek (vorlon) wrote :

As I do think we will want to fix this in SRU once a fix is available, un-wontfixing the lucid task. We definitely *don't* want to try to change this now before release, but we should fix it in Lucid.

Jamie, was this regression first introduced in 9.10, or did it exist in previous releases as well?

Dustin, in the future if you believe an issue should be documented in the release notes, please open a task on the 'ubuntu-release-notes' project. Thanks!

Changed in mdadm (Ubuntu Lucid):
status: Won't Fix → Triaged
Steve Langasek (vorlon) wrote :

Here's candidate text for the release notes, taken from the beta2 tech overview:

Activating a RAID 1 array in degraded mode may lead to RAID disks being reported as in sync when they are not, resulting in data loss. Since RAID 1 arrays will automatically be brought up in degraded mode when a member disk is unavailable, users with production software RAID 1 disks are advised not to upgrade to Ubuntu 9.10 or 10.04 LTS until this bug is resolved. (Bug:557429)

Phillip Susi (psusi) wrote :

I think that warning is a bit misleading/extreme. The damage only
occurs if you bring up one disk degraded, *and* then the other disk
degraded. In practice, this should never happen since usually someone
would notice the degraded event and take action to restore the missing
disk. The release notes should simply explain when the problem occurs,
and warn people to be aware of it and watch out for it. Maybe something
like this:

Activating a RAID1 array with only one disk, then activating the array
with only the other disk, then finally returning to normal operation
with both disks can cause the disks to be combined out of sync, leading
to severe data loss. You should take care to make sure that this
situation does not happen.

I agree with Philip's assessment.

While this is very easy to reproduce in a VM (by just removing/adding
backing disk files), in practice and on real hardware, I think this is
definitely less likely.

When a real hardware disk fails, it should be removed from the system,
and not come back until it's replaced with new hardware, in which case
this bug will not be triggered. As Philip explained, this would only
happen if an admin is adding and removing and booting with just one
disk, and then the other, and then both. Don't do that.

Jamie Strandboge (jdstrand) wrote :

I also agree with Philip's assessment. When it hits, it is devastating, but it takes a very specific series of events to hit, and asking people to not upgrade as a result is too extreme.

Philip, you mentioned to me that 9.10 was also affected-- what about earlier releases?

Dustin Kirkland  (kirkland) wrote :

Jamie-

As for earlier releases, I haven't tested this, but having written the
original logic in the mdadm's failure hooks in the initramfs, I can
tell you that the code handling is present in:
 * 8.04 (via a point release/SRU)
 * 8.10
 * 9.04
 * 9.10
 * 10.04

:-Dustin


I'm also fine with this being postponed until after release; segmenting a
raid into concurrent hot-pluggable parts is a use-case without correct
support right now.

> > hot-plugging order much more
> > arbitrary, and even less worth of committing to the meta-data.)
>
> If I plug in one disk and make some changes, then unplug it,
> plug in the other disk, and make some changes to it,

What would be your use-case?

> in the future I
> don't want which set of changes appears to depend on which disk I
> plug in first.

In most cases the next thing one would probably want
after conflicting changes are present in a system is to sync, in an
easy way. (Not to keep rebooting or reattaching much. Reattaching is
just a simple way to determine the order.)

Your case does not sound like a hot-plug use-case. Probably handle
it with --remove?

> As soon as both disks are plugged in and the
> conflicting changes are detected, you must record that in the
> metadata.

No, you must prevent data-corruption or loss. But don't do things like
--remove(ing) parts or fixing ordering in a hotplug environment
(and mdadm --incremental is just for that), because it would break
further management of the raid devices in a hot-plugging manner.

> > It may be good however
> > if mdadm would assemble any conflicting parts as extra devices
> > (normally md128 and up) so the parts are accessible for inspection,
> > can be compared, manually merged etc.
>
> No need to do that automatically, this is where manual intervention
> comes in.

Note that mdadm --incremental already does that for "unknown" arrays
(not defined or allowed by AUTO in mdadm.conf), it's not a new feature.

But your comments are a little irritating. We are actually talking
hot-plugging here, right? Plus ubuntu's no config, no intervention
necessary approach. Everything should just work.

> Once mdadm has rejected one of the disks and the admin
> notices, he can easily ask mdadm to move it to another array by itself
> to be mounted, inspected, merged, etc.

Are you actually aware of what that means? I am not saying it is not
possible to create a new array from parts of an existing array without
losing the data, but it sure isn't a trivial mdadm command. And then
you are really breaking up the array and won't be able to just sync the
other parts and still have the same (UUID) array.

>
> > Updating the metadata would prevent working with and switching
> > between concurrent versions in a hot-plugging manner. Think of the
> > use-case of segmenting a (non-root fs) data-array into two halves in
> > order to do some major refactoring. (This is like keeping a snapshot
> > by using only part of the mirror.)
>
> If that is the intent, then the user needs to manually remove one disk
> from the array and set it aside or add it to a separate array if they
> wish. If we /accidentally/ fork the array, we need to set the
> conflicting array aside and notify the user that they need to sort the
> situation out manually.

Yes, yes and yes again, this needs to be done in *any* case of
conflicting changes. If mdadm --incremental (the mdadm hotplug manager)
sets up the conflicting parts on separate md devices they will both
even appear on the desktop.

> I also agree with Philip's assessment. When it hits, it is
> devastating, but it takes a very specific series of events to hit,
> and asking people to not upgrade as a result is too extreme.

I agree:

Re-attaching parts of an array that have been running degraded
separately and contain conflicting changes of the same amount
results in the assembly of a corrupt array.

> Philip, you mentioned to me that 9.10 was also affected-- what about
> earlier releases?

It was probably present in the current form since the udev rules use
"mdadm --incremental". (9.04 if I remember correctly)

ceg (ceg) wrote :

Dustin, I don't think this has anything to do with the failure hooks in this case. :) (Here it's mdadm that does not pick up on the conflict.)

ceg (ceg) on 2010-04-20
description: updated

On 4/20/2010 3:21 PM, ceg wrote:
>> If I plug in one disk and make some changes, then unplug it,
>> plug in the other disk, and make some changes to it,
>
> What would be your use-case?

I don't understand this question. The use case is described in the text
you replied to.

> In most cases the next thing one would probably want
> after conflicting changes are present in a system is to sync, in an
> easy way. (Not to keep rebooting or reattaching much. Reattaching is
> just a simple way to determine the order.)
>
> As your case does not sound like a hot-plug use-case. Probably handle
> that with --remove?

Handle what?

> No, you must prevent data-corruption or loss. But don't do things like

Of course. The question is HOW?

> --remove(ing) parts or fixing ordering in a hotplug environment
> (and mdadm --incremental is just for that), because it would break
> further management of the raid devices in a hot-plugging manner.

This is the HOW part. The removing does not break anything. It
prevents you from continuing to flip flop which disk you are using after
they have been forked, and thus making things worse.

> But your comments are a little irritating. We are actually talking
> hot-plugging here, right? Plus ubuntu's no config, no intervention
> necessary approach. Everything should just work.

Until everything goes all pear-shaped, at which point "doing the right
thing" is not clear, so manual intervention is required. Once the array
has been forked, the best thing you can do is not make things any worse.
Fixing it has to be done by hand.

> Are you actually aware of what that means? I am not saying it is not
> possible to create a new array from parts of an existing array without
> losing the data, but it sure isn't a trivial mdadm command. And then
> you are really breaking up the array and won't be able to just sync the
> other parts and still have the same (UUID) array.

The array is already broken up. Resyncing will destroy data. If you
want to rescue that data you must move the other disk to its own array
so you can mount it. After you have rescued any data, then you can drop
it back into the original array and it will sync.

> Yes, yes and yes again, this needs to be done in *any* case of
> conflicting changes. If mdadm --incremental (the mdadm hotplug manager)
> sets up the conflicting parts on separate md devices they will both
> even appear on the desktop.

Sure, automatically splitting the array would be a nice feature, but the
minimum action required to fix the bug is to simply reject the second
disk, updating its metadata in the process.

> No, it really makes things worse! It prevents the user/admin from
> managing arrays (parts in this case) by simply plugging disks.

No it does not. What it does is prevent the damage from growing worse
without being noticed.

> And what would be the gain of auto-removing/writing metadata? If the
> disks are connected during boot the disks will almost always stay in
> the same order anyway, eliminating the gain to save that order
> to metadata. If you want a specific order from the start, you need
> to manually issue mdadm commands anyway. But now also if you need
> another order than what was written to metadata. And all the mdadm
> commands need to be issued in between an active hot-plugging
> system (interference/no map file updating), instead of just
> re-plugging your disks in order.

As I already said, the gain is to prevent continued flip-flopping back
and forth between the two divergent filesystems based only on which
disk is detected first. Almost always != always. You seem to be
suggesting that the user physically disconnect one disk if they wish to
access data on the other disk, rather than run mdadm.

ceg (ceg) wrote :

Phillip, first please explain where/why you think "flip-flopping would
occur continuously", in such a way that it is not enough to never
assemble a corrupt array and to notify someone to take care of
reconciling the conflicting changes if desired.
Because this seems to be the reason you want to break mdadm
--incremental support for hot-plugging segmented array parts
(by committing an arbitrary but not highly fluctuating/flip-flopping
state to metadata, i.e. promoting auto-remove instead of always leaving
that a manual action).

ceg (ceg) wrote :

> >> If I plug in one disk and make some changes, then unplug it,
> >> plug in the other disk, and make some changes to it,
> >
> > What would be your use-case?
>
> I don't understand this question. The use case is described in the
> text you replied to.

Please explain why someone would do that; a raid does not get segmented
and charged with conflicting changes by itself. Even if an intermittent
and alternating failure causes it, it should get reported independent
of whether metadata is altered.

> > In most cases the next thing one would probably want
> > after conflicting changes are present in a system is to sync, in an
> > easy way. (Not to keep rebooting or reattaching much. Reattaching is
> > just a simple way to determine the order.)
> >
> > As your case does not sound like a hot-plug use-case. Probably
> > handle that with --remove?
>
> Handle what?

Manually removing should ensure that always only one and the same part
gets assembled. It sounds like you want to hot-plug the parts in an
arbitrary order and not have the array assembly be determined by this.

> The [auto-]removing does not break anything.

Please stop ignoring that auto-remove breaks hot-plugging. By this,
mdadm --incremental would limit its own usefulness.

ceg (ceg) wrote :

> [auto-removing]
> prevents you from continuing to flip flop which disk you are using
> after they have been forked, and thus making things worse.

And this is a program thinking it knows better than the user; mdadm
--incremental should not do that. If you continue to do that after you
have been informed, you probably do it intentionally, and mdadm should
not interfere.

I have provided the dist-upgrade and refactoring use-cases of a non-root-filesystem array as use-cases that switch between versions. And Neil also told you about backup schemes.

> The array is already broken up.

For conflicting changes to occur, 1) arrays need to be running degraded,
which should only happen automatically with arrays required to boot when
parts are missing during boot. Then 2) the missing part has to reappear
and 3) be run degraded also while 4) the previously remaining part is
removed, and then 5) both parts have to be present again.

Aside from this, hot-plugging/connecting parts of an array to any
machine should never run it degraded.

> Resyncing will destroy data. If you
> want to rescue that data you must move the other disk to its own array
> so you can mount it.

With hot-pluggable devices you don't have to. You should just need to
plug them into your system and they should get mounted. So once an array
is run degraded, if you plug in just the part you want, it should get
mounted, done.

> After you have rescued any data, then you can
> drop it back into the original array and it will sync.

With hot-pluggable devices you should just need to plug both parts, the
one you want to keep first. Then "mdadm --stop <md-auxiliary>" and
"mdadm --add <md-to-keep> <members-with-conflicting-changes> --force",
done.

ceg (ceg) wrote :

> the minimum action required to fix the bug is to simply reject the
> second disk, updating its metadata in the process.
>
> > No, [updating metadata] really makes things worse! It prevents the
> > user/admin from managing arrays (parts in this case) by simply
> > plugging disks.
>
> No it does not.

Explain how it does not prevent switching between conflicting changes by
hot-plugging.

> What it does is prevent the damage from growing worse
> without being noticed.

The real damage is prevented by mdadm, if mdadm --incremental returns
"mdadm: not re-adding /dev/... because it contains conflicting changes"
instead of setting up a corrupt array. Notice is also given by the
mdadm --monitor daemon reporting a "conflicting changes" event to
users/admins.

Being able to hot-plug/switch between conflicting changes is a
feature not a bug.

>
> > And what would be the gain of auto-removing writing metadate? If the
> > disks are connected during boot the disks will almost always stay in
> > the same order anyway, eliminating the gain to save that order
> > to metadata. If you want a specific order from the start, you need
> > to manually issue mdadm commands anyway. But now also if you need
> > another order than what was written to metadata. And all that mdadm
> > commands need to be issued in between an active hot-plugging
> > system (interference/no map file updating), instead of just
> > re-plugging your disks in order.
>
> As I already said, the gain is to prevent continued flip-flopping back
> and forth

What continued flip-flopping back and forth? Read again!

> between the two divergent filesystems based only on which
> disk is detected first. Almost always != always.

Almost always != always, because there are use-cases
where the user explicitly wants to hot-plug "flip-flop" several times
between the parts.

> You seem to be suggesting that the
> user physically disconnect one disk if they wish to access data on
> the other disk, rather than run mdadm.

Plugging disks is a nice and easy alternative these days.

Explain why it is a bad idea to plug and unplug (e)SATA disks plugged
into your laptop, or in your docking station, running an udev/mdadm
--incremental system.

Phillip Susi (psusi) wrote :

I'm going to boil this down very simply to try and bring an end to this. If you wish to automatically split the second disk into a new array with a new uuid, you must update the metadata on that disk to indicate you have done so, and if you end up connecting that second disk in the future without the first, it must still show up with the new uuid. Which uuid it appears as must not depend on which one was plugged in first.

That however, would be a new feature of the plug and play system outside the scope of mdadm. If you want to automatically split the disk off into a new array after the desync has been detected, that would be nice, but fixing the bug in mdadm is as simple as having it detect the conflicting metadata on the second disk caused by the divergence, and fixing said metadata to agree with the metadata in the array, which says that disk is failed.

At that point whether some other component automatically invokes mdadm to move the second disk to a brand new array, or the admin has to by hand, I don't really care.

ceg (ceg) wrote :

Phillip, before suggesting something I try to think the issue through,
and I try the same with feedback.

But after several attempts to explain that changing metadata and
removing the "failed" status (of already running parts) in the
superblocks of the conflicting parts that are plugged in (but not to be
added to the running array) breaks hot-plugging, I sadly still can't
recognize any consideration of the bad effects your approach would have
for many users.

And if I think about it, your metadata updates may not have the overall
effect you expect. When the modified part is plugged in during future
boots, it can get run degraded again, the metadata is then back to what
it was before, and it can again be used normally. So the metadata update
just breaks hotplugging, and you could not explain a case where
continuous unintentional flip-flopping would occur and updating metadata
would help.

> If you want to automatically split the
> disk off into a new array after the desync has been detected,

Correct, that is unrelated to the metadata problem. I commented on it
because setting this up has its pitfalls (like UUID duplicates, and this
bug requiring --zero-superblock to prevent it from biting), and it would
much facilitate comparing, copying, etc. in a hot-plug environment.

> but fixing the bug in mdadm is as simple as having it
> detect the conflicting metadata on the second disk caused by the
> divergence, and fixing said metadata

It's even simpler once you can see that fixing metadata creates more
issues than are actually there, and more than updating metadata would
really be able to solve.

ceg (ceg) wrote :

> whether some other component automatically invokes mdadm
> to move the second disk to a brand new array, or the admin has to by
> hand, I don't really care.

You are probably not aware enough that all udev/hotplug magic for raid
is within mdadm --incremental. I.e. in the future it will even set
up blank disks inserted into DOMAINs defined in mdadm.conf as spares etc.

As a last overall note: maybe remember again that raid systems are
designed to keep your machine running as long as possible, up until
no redundancy is left.

When the redundancy is increased again, it can happily resync if
possible. When the system runs without redundancy on different array
segments one at a time, they cannot be synced until redundancy has been
restored.

In this case conflicting changes may occur; it's the nature of an "only
one at a time" failure that the changes will not always be available,
but raid can keep the system running until the cause is identified and
fixed, while no data is really lost.

If it happens that both segments become available with conflicting
changes, one needs to be chosen (the first one is already there). But if
you update the metadata on this occasion (disabling one segment), from
that moment on the raid system will not keep the system running as
designed, the way it did before both segments came up together once.
(You would change/break behavior.)


On 4/22/2010 5:08 AM, ceg wrote:
> Phillip, before suggesting something I try to think through the issue,
> and the same I try with feedback.
>
> But after several attempts to explain that changing metadata and
> removing the "failed" status (of allready running parts) in the
> superblocks of the conflicting parts that are plugged-in (but not to be
> added to the running array) breaks hot-plugging, I sadly still can't
> recognize any consideration of the bad effects your approach would have
> for many users.

That's because it DOESN'T break hot-plugging. I have explained why.

> And if I think about it, your metadata updates may not have the overall
> effect you may expect. When the modified part is plugged in
> during future boots, it can get run degraded again, the metadata
> is then back to what it was before, and it can again be used normally.
> So the metadata updates just breaks hotplugging and you could not
> explain a case where continous unintentional flip-flopping would occur
> and updating metadata would help.

No, the second disk will not be run degraded again; that is the whole
point of correcting the wrong metadata. If the second disk is the only
one there on the next boot, it will show that disk 2 is failed so it
can't be used, and mdadm can't find disk 1, so the array can not be started.

> Correct, that is unrelated to the metadata problem, I commented on it
> because setting this up has its pitfalls (like UUID dupes and this bug
> requiring --zero-superblock to prevent it from biting) and it would much
> facilitate comparing, copying etc. in a hot-plug environment.

As I said before, it does not require --zero-superblock. Once disk2 is
failed and removed from the array, you can create a new array using that
disk. mdadm will warn you that the disk appears to already be part of
an array, but you can tell it to continue and it will put disk2 in a new
array, with a new uuid, and you can mount it and inspect it. Once you
are done with it you can move it back to the original array and a full
resync will be done.
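A hedged shell sketch of that manual rescue. All device and mount-point names are made-up examples, and the helper prints the commands (DRY_RUN=1, the default) rather than running them, since the real thing needs root and real hardware:

```shell
# Sketch of the rescue procedure described above; review before running
# for real (DRY_RUN=0).
run() { if [ "${DRY_RUN:-1}" = "1" ]; then echo "$*"; else "$@"; fi; }

rescue_rejected_member() {
    keep_md=$1    # surviving array, e.g. /dev/md0
    reject=$2     # rejected member, e.g. /dev/sdb1
    aux_md=$3     # temporary array, e.g. /dev/md9
    # Re-create the rejected member as a one-disk mirror so it can be
    # mounted; mdadm will warn that the disk looks like part of an
    # existing array and ask for confirmation.
    run mdadm --create "$aux_md" --level=1 --raid-devices=2 missing "$reject"
    run mount -o ro "$aux_md" /mnt/rescue
    # ... inspect / copy out the diverged data here ...
    run umount /mnt/rescue
    run mdadm --stop "$aux_md"
    # Return the disk to the original array; a full resync follows.
    run mdadm "$keep_md" --add "$reject"
}

# Example (prints the planned commands):
#   DRY_RUN=1 rescue_rejected_member /dev/md0 /dev/sdb1 /dev/md9
```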

> It's even simpler once you can see that fixing metadata creates more
> issues than are actually there and updating metadata would really be
> able solve.

I have shown why this is wrong.

> If it happens that both segments get available with
> conflicting changes, one needs to be chosen (first one is already
> there). But if you update the metadata on this occasion (disabling one segment),
> from this moment on the raid system will not keep the
> system running as designed, and like it did before both segments came up
> together once. (You would change/break behavior.)

Yes, and this change is entirely intentional because if you don't do
this, then you can unintentionally continue to further diverge the two
disks without noticing, causing further damage. Imagine a server that
boots and decides it can't find disk2, so it goes degraded. It has a
cron job that fetches email from a pop server and deletes them once they
have been downloaded. The server reboots and this time can only find
disk1. Now the cron job again, fetches and deletes some mail. Now some
of your mail is on disk1, and some is on disk2, and you a...


ceg (ceg) wrote :

I'd suggest to consider the following option about whether to assemble
segments known to contain conflicting changes or not:

AUTO -SINGLE_SEGMENTS_WITH_KNOWN_ALTERNATIVE_VERSIONS

> That's because it DOESN'T break hot-plugging. I have explained why.

You have the right to think that; obviously we disagree on that point.
You may often think you explained something while I really miss the
explanation, especially for what I explicitly asked for.

It seems you consider segments being available one at a time, then at
some point eventually together, and then one at a time again later, to
indicate a not-mainly-theoretical failure type, and that it should be
handled by always auto-removing all but the first segment on that
occasion. (So they won't get auto-assembled anymore.)

Let's conclude this is OK for part of the users. (Mostly those that
want to be sure and manage their arrays by issuing commands by hand.)

But it does pose a problem if you want to support managing array
segments by just plugging disks and occasionally issuing simple sync
directives (eventually just by right-clicking on the segments showing
up on the desktop).

> > from this moment on the raid system will not keep the
> > system running as designed, and like it did before both segments
> > came up together once. (You would change/break behavior.)
>
> Yes, and this change is entirely intentional because if you don't do
> this, then you can unintentionally continue to further diverge the two
> disks without noticing,

You'd have to miss or ignore quite a few notifications not to notice.
Notice is given as soon as the first degradation occurs, so the admin
should know something is going on and usually take action well before
the incident happens. At the latest upon the second notification, when
the same array is run degraded again, he can know it has split into
segments with conflicting changes (even if the message may (currently)
not be explicit about it).

Note however that even if I think a failure showing this type of
behavior seems more fictional than users intentionally segmenting the
array before upgrades and such, I can very well relate to those not
wanting to configure mdadm.conf's AUTO option at all (i.e. on servers),
just to be sure nothing happens behind their backs, and instead setting
"AUTO -SINGLE_SEGMENTS_WITH_KNOWN_ALTERNATIVE_VERSIONS" in order to
disable hot-plugging for segments with alternative versions.

That's what settings are for and what can make all happy.

Phillip Susi (psusi) wrote :

On 4/23/2010 6:52 AM, ceg wrote:
> I'd suggest considering the following option for whether to assemble
> segments known to contain conflicting changes or not:
>
> AUTO -SINGLE_SEGMENTS_WITH_KNOWN_ALTERNATIVE_VERSIONS

As I have said, if you want another component to automatically notice
when one of the disks has been ejected from the array due to conflicting
changes and migrate it to a new array, that is quite fine. It would
then show up as a new mount on your desktop.

> It seems you consider that segments being available one at a time, then
> on some occasion together, and then one at a time again later would
> indicate a not-mainly-theoretical failure type, and that it should be
> handled by always auto-removing all but the first segment on that
> occasion. (So they won't get auto-assembled anymore.)
>
> Let's conclude this is OK, for part of the users. (Mostly those that
> want to be sure and manage their arrays issuing commands by hand.)
>
> But it does pose a problem if you want to support managing array
> segments by just plugging disks and occasionally issuing simple sync
> directives (possibly just by right-clicking on the segments showing up
> on the desktop).

Even if you intentionally caused the divergence you don't want both
disks to show up as the same volume when plugged in. One of them should
be renamed so it is clear that they are not the same anymore, and if you
do connect them both, then both should show up -- as separate volumes.

ceg (ceg) wrote :

> Even if you intentionally caused the divergence you don't want both
> disks to show up as the same volume when plugged in.

Right, they'd need to show up under an additionally enumerated (or mangled) "version name" if another segment (version) of the same array is already running. For hot-plug management of segments to work, however, all segments would need to show up under their real array ID if connected first or one at a time. Otherwise the system won't recognize the segment of the array as such and boot or open it correctly, and you won't be able to switch between versions by switching the disks that are connected.

> if you want another component to automatically notice
> when one of the disks has been ejected from the array due to conflicting
> changes and migrate it to a new array, that is quite fine. It would
> then show up as a new mount on your desktop.

However, that is a different thing. That's creating new and different arrays; it is not managing segments of one array.

Phillip Susi (psusi) wrote :

I suppose that the rename could be only temporary while both disks are
connected, if so configured.

After some further testing, it seems that the bug in mdadm is a bit more
general. In --incremental mode it goes ahead and adds removed disks to
the array, so even if you explicitly --fail and --remove one of the
disks from the array, a reboot or other event that causes mdadm
--incremental to be run will put the disk back in the array. The only
acceptable state a disk should be activated in by --incremental other
than in sync is failed. Once it has been removed it should be left alone.

The degraded case seems to just be a more specific way of encountering
this bug since it marks the disk as removed.
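The --incremental behavior described above can be sketched as a command sequence (device names /dev/md0 and /dev/sdb1 are placeholders; this is an illustration of the reported behavior, not something to run on a production array):

```shell
# Hypothetical reproduction sketch; device names are placeholders.
# Explicitly fail and remove a member from the array:
mdadm /dev/md0 --fail /dev/sdb1
mdadm /dev/md0 --remove /dev/sdb1

# On the next boot (or udev add/change event), the stock rule effectively runs:
mdadm --incremental /dev/sdb1

# Reported (buggy) behavior: the removed disk is re-added to /dev/md0.
# The comment above argues --incremental should leave a removed disk alone.
```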

ceg (ceg) wrote :

> In --incremental mode it goes ahead and adds removed disks to
> the array

Yes, it would be nice if the states were sorted out a little better. Running an array degraded during boot would, for example, only have to mark missing disks as failed, just as if they had failed while the array was running complete.

Andrea Grandi (andreagrandi) wrote :

Hi all,

comparing these two changelogs:

Ubuntu 10.04 beta 2: http://www.ubuntu.com/testing/lucid/beta2
Ubuntu 10.04 RC: http://www.ubuntu.com/getubuntu/releasenotes/1004overview

You have removed this bug from known issues:

Activating a RAID 1 array in degraded mode is reported to lead to RAID disks being reported as in sync when they are not, resulting in data loss. Since RAID 1 arrays will automatically be brought up in degraded mode when a member disk is unavailable, users with production software RAID 1 disks are advised not to upgrade to the 10.04 LTS Beta until this bug is resolved. (557429)

Looking at this bug page the bug is NOT FIXED yet! This could cause a data loss for users installing and using Ubuntu 10.04 RC.

Steve Langasek (vorlon) wrote :

Dustin,

On Tue, Apr 20, 2010 at 03:33:15PM -0000, Dustin Kirkland wrote:
> I agree with Philip's assessment.

> While this is very easy to reproduce in a VM (by just removing/adding
> backing disk files), in practice and on real hardware, I think this is
> definitely less likely.

> When a real hardware disk fails, it should be removed from the system,
> and not come back until it's replaced with new hardware, in which case
> this bug will not be triggered. As Philip explained, this would only
> happen if an admin is adding and removing and booting with just one
> disk, and then the other, and then both. Don't do that.

Have I misunderstood the nature of this bug, or couldn't it be triggered by
a flaky SATA cable causing intermittent connections to the drives? If one
port flakes on one boot, the other port flakes on the next, and both ports
are available on the third, wouldn't that trigger this same bogus
reassembly?

In fact, if the admin is trying to debug the problem, maybe the system comes
up two out of five times without seeing any drives at all, or they've
physically swapped which disk is on which port *because the cable is
unreliable*, and by the fifth time they've thought to replace the cable and
things are reliable again - and *then* the perfectly-good disks get
corrupted because of this bug.

So while it doesn't appear to be a recent regression, and not a
high-frequency occurrence, it does look like a data loss bug that can occur
through no fault of the admin, and I certainly think our users need to be
warned of this in the release notes.

--
Steve Langasek Give me a lever long enough and a Free OS
Debian Developer to set it on, and I can move the world.
Ubuntu Developer http://www.debian.org/
<email address hidden> <email address hidden>

ceg (ceg) on 2010-04-25
description: updated
description: updated
Phillip Susi (psusi) wrote :

On Sat, 2010-04-24 at 22:20 +0000, Steve Langasek wrote:
> Have I misunderstood the nature of this bug, or couldn't it be triggered by
> a flaky SATA cable causing intermittent connections to the drives? If one
> port flakes on one boot, the other port flakes on the next, and both ports
> are available on the third, wouldn't that trigger this same bogus
> reassembly?

Correct.

> So while it doesn't appear to be a recent regression, and not a
> high-frequency occurrence, it does look like a data loss bug that can occur
> through no fault of the admin, and I certainly think our users need to be
> warned of this in the release notes.

Indeed. I thought we had come to the conclusion that the language of
the release note was to be changed, not be completely removed.

ceg (ceg) on 2010-04-26
description: updated
ceg (ceg) wrote :

> If one
> port flakes on one boot, the other port flakes on the next, and both ports
> are available on the third, wouldn't that trigger this same bogus
> reassembly?

That depends.

On the linux-raid list it was said it would only happen if the event counts on both segments are equal +/-1. That would mostly only be the case if nothing but immediately shutting down is done upon booting of both segments (like in the testcase that triggered this). Different uptimes should be enough to cause a difference in the event count and prevent this bug from happening.

So there should already be some measure in place that prevents this, for common cases.
What makes this hard to detect and debug is Bug #535417 (mdadm monitoring has been broken in ubuntu).

The suggestion that mdadm should test for conflicts in the superblocks (marking each other as failed) should, however, be able to detect independently degraded segments of an array with certainty.
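The event-count heuristic mentioned above can be sketched as a small shell function (a deliberate simplification of md's actual logic; the function name and the threshold of 1 are illustrative only):

```shell
#!/bin/sh
# Illustrative sketch of md's event-count heuristic: two members are
# treated as containing the same data when their superblock event
# counts differ by at most 1.
decide_assemble() {
    old=$1
    new=$2
    diff=$((new - old))
    if [ "$diff" -lt 0 ]; then
        diff=$((-diff))
    fi
    if [ "$diff" -le 1 ]; then
        echo "assemble"   # counts match: members assumed identical (the risky case)
    else
        echo "reject"     # counts diverged: member left out of the array
    fi
}
```

This shows why equal uptimes are dangerous: two independently written members whose counts happen to match within 1 pass the check and get merged silently.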

Steve Langasek (vorlon) wrote :

Documented at <https://wiki.ubuntu.com/LucidLynx/ReleaseNotes#Use%20of%20degraded%20RAID%201%20array%20may%20cause%20data%20loss%20in%20exceptional%20cases>:

If each member of a RAID 1 array is separately brought up in degraded mode across subsequent runs of the array with no reassembly in between, there is a risk that the disks will be reported as in sync when they are not, resulting in data loss due to inconsistencies between the data that has been written to each member. This is an unlikely occurrence during normal operations, but admins of systems using RAID 1 arrays should take care during maintenance to avoid this situation. (557429)

Changed in ubuntu-release-notes:
status: New → Fix Released
Clint Byrum (clint-fewbar) wrote :

So, it's been a while since this issue resurfaced, but I feel it needs to be put to rest.

Are we really sure we should fix this?

http://marc.info/?l=linux-raid&m=127068416016382&w=2

"I don't think there is anything practical that could be changed in md or
mdadm to make it possible to catch this behaviour and refuse the assemble the
array... Maybe mdadm could check that the bitmap on the 'old' device is a
subset of the bitmap on the 'new' device - that might be enough.
But if the devices just happen to have the same event count then as far as md
is concerned, they do contain the same data." -- Neil Brown

I happen to agree with Neil that this isn't something mdadm or the md driver can reasonably be expected to handle. If nothing else, it is a feature request, and not a High issue. I've changed the ISO testing guide to advise booting with both disks between each boot with a disconnected disk. Other than that limited ISO testing scenario, when is this actually affecting users?

I do also like Billy Crook's random number addition idea here:

http://marc.info/?l=linux-raid&m=127073871318005&w=2

But that sounds a lot like a feature request.

So, what I'm suggesting is that this bug should actually be set as Wishlist, not High importance, because while it could lead to data corruption, so could putting my disks in the microwave. Booting a RAID1 with one disk, then immediately with the other, is just not something a normal user would do; it only came up because of the instructions in the test case itself.
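The bitmap-subset check Neil suggests in the quoted mail could be sketched as follows (treating each write-intent bitmap as an integer bitmask of dirty regions; the function name and representation are illustrative assumptions, not mdadm's actual format):

```shell
#!/bin/sh
# Sketch of the suggested bitmap-subset check: re-adding the 'old'
# member is only safe if every region it dirtied is also dirty on the
# 'new' member; otherwise both sides carry writes the other never saw.
bitmap_is_subset() {
    old=$1   # dirty-region bitmask of the older member
    new=$2   # dirty-region bitmask of the newer member
    [ $(( old & ~new )) -eq 0 ]
}

# Usage: regions 1 and 3 dirty on old, regions 1-4 dirty on new -> subset, safe.
if bitmap_is_subset 5 15; then
    echo "safe to re-add"
else
    echo "conflicting changes"
fi
```

The subset test is what distinguishes "this member simply missed some writes" (recoverable by resync) from "both members were written independently" (diverged, assembly should be refused).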

ceg (ceg) wrote :

> "I don't think there is anything practical that could be changed in md or
> mdadm to make it possible to catch this behaviour and refuse the assemble the
> array..."

The original topic of the linux-raid discussion http://comments.gmane.org/gmane.linux.raid/27822 suggested the idea of detecting diverged or segmented array parts by checking for superblocks claiming each other as "failed". (The naming convention of that state is actually a different topic: http://comments.gmane.org/gmane.linux.raid/27820.) But Neil did not respond yet.

It is only a feature request if one does not consider it a bug that a RAID system cannot tell for sure whether parts have been segmented/diverged (relying only on the probability of an event-count difference, and on much worse odds if a bitmap is used).

iMac (imac-netstatz) wrote :

I use RAID1 everywhere, and I have seen both loose SATA cables and BIOSes not set with enough delay for drives to spin up lead to degraded RAID1 scenarios, so I am worried about the overall impact of this bug. My current use case is not one of these, but it might be one used by anyone leveraging the flexibility of eSATA and RAID1 for replication across systems.

My current use case is that I have two laptops (one work, one personal) and I use RAID 1 to a disk attached by eSATA ports on each to keep a series of LVM volumes (home, virtual machines, etc.) synced between the devices. Typically my work laptop was the master, and whenever I plugged a newer external image into my personal laptop pre-boot, it would auto-rebuild on boot. My RAID1 was created with three devices (n=3), but I am not sure that actually affected the way it chooses to handle degraded disks, except that I suppose it is *always* degraded with only 2 of 3 disks ever active on one system.

I had to modify the original Intrepid udev when I first set this up, I believe to avoid some delay or prompt when starting degraded, and my changes are as follows:

#Original Intrepid (I believe) left commented in my custom 85-mdadm
#SUBSYSTEM=="block", ACTION=="add|change", ENV{ID_FS_TYPE}=="linux_raid*", \
# RUN+="watershed /sbin/mdadm --incremental --run /dev/%k"

# My current udev from current custom /etc/udev/rules.d/85-mdadm
SUBSYSTEM=="block", ACTION=="add|change", ENV{ID_FS_TYPE}=="linux_raid*", \
 RUN+="watershed /sbin/mdadm --assemble --scan"

Looking at the current /lib/udev rules, there appears to be little change that would have any effect
SUBSYSTEM=="block", ACTION=="add|change", ENV{ID_FS_TYPE}=="linux_raid*", \
 RUN+="/sbin/mdadm --incremental $env{DEVNAME}"

However, now on my home laptop, whenever I bring a new image from work, it starts up with both images active and I have a corrupted disk. Every time. Only since 10.10. So, I am now always logging in before attaching my eSATA disk, failing the local RAID1 disk and removing it, stopping the array, starting it degraded with the external one, and re-adding the internal one. It's not really something I can continue to do efficiently. I was considering upgrading my RAID superblock from v0.9 to v1.2, but from this bug report I am not sure that will help me. There is some sort of regression here from my perspective.
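The manual workaround described in the paragraph above could be written out roughly as follows (device names /dev/md0, /dev/sda1 for the internal member, and /dev/sdc1 for the eSATA member are placeholders):

```shell
# Rough sketch of the manual workaround described above;
# all device names are placeholders.

# Before attaching the eSATA disk: fail and remove the local member.
mdadm /dev/md0 --fail /dev/sda1
mdadm /dev/md0 --remove /dev/sda1

# Stop the array, then start it degraded from the external (newer) member.
mdadm --stop /dev/md0
mdadm --assemble --run /dev/md0 /dev/sdc1

# Re-add the internal member so it resyncs from the external one.
mdadm /dev/md0 --add /dev/sda1
```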

If I can change my udev rules to work again, great; skimming through the thread, it doesn't appear that I have an actual workaround. It just stopped working.

If I ever start my laptop up with an old eSATA image on my current RAID1 laptop image, I am screwed, and my home directory, which has survived many Debian and now Ubuntu distros and various hardware upgrades, might actually come to an end.

Clint Byrum (clint-fewbar) wrote :

Hi iMac. Thanks for sharing your use case.

I think this is a race condition that has only come to light recently because startup and volume management have kept the number of things happening consistent and small enough that the event count gets incremented equally on both systems, and so you get this corrupted, diverged-volumes scenario. It's just as likely that you'd accidentally torch the changes that you want by writing from the older disk to the newer one as it is that you'd merge the two and hit this silent data loss.

I don't think offline replication between two separate machines is really what RAID1 is for, even if it did work at one time. The focus is on replicating data onto two disks, on a single system.

Still, I think saving users from accidental data corruption is a useful feature, and it should be specified and added *as a new feature*. Since the current documentation and implementation do not define any behavior for this diverged RAID1 scenario, it needs to be specified clearly and precisely what the expected behavior would be, and then implemented as such.

ceg (ceg) wrote :

Sharing a hotplug RAID array to sync two machines is a very nice use case, iMac; thanks for sharing your experience. So far I have only intentionally segmented an array prior to performing updates.

A place where experience with this topic and your workarounds would fit in nicely is https://wiki.ubuntu.com/HotplugRaid

ceg (ceg) wrote :

> should be specified and added *as a new feature*. Since the current documentation and implementation do not define any behavior for this diverged RAID1 scenario

You could build upon this thread:
http://comments.gmane.org/gmane.linux.raid/27822
(but leave out the parts that were caused by naming confusion, which is better explained at http://comments.gmane.org/gmane.linux.raid/27820)

no longer affects: mdadm (Ubuntu Lucid)