Comment 46 for bug 557429

ceg (ceg) wrote : Re: [Bug 557429] --incremental should not auto-remove arbitrary segments with conflicting changes

I'm also fine with postponing this until after the release; segmenting a
raid into concurrent hot-pluggable parts is a use case that lacks correct
support right now.

> > hot-plugging order much more
> > arbitrary, and even less worth of committing to the meta-data.)
>
> If I plug in one disk and make some changes, then unplug it,
> plug in the other disk, and make some changes to it,

What would your use case be?

> in the future I
> don't want which set of changes appears to depend on which disk I
> plug in first.

In most cases, the next thing one would want once conflicting changes
are present in a system is an easy way to sync (not having to keep
rebooting or reattaching disks; reattaching is just a simple way to
determine the order).

Your case does not sound like a hot-plug use case. Perhaps handle it
with --remove?

> As soon as both disks are plugged in and the
> conflicting changes are detected, you must record that in the
> metadata.

No, you must prevent data corruption or loss. But don't do things like
--remove(ing) parts or fixing the ordering in a hot-plug environment
(which is exactly what mdadm --incremental is for), because that would
break further management of the raid devices in a hot-plugging manner.

> > It may be good however
> > if mdadm would assemble any conflicting parts as extra devices
> > (normally md128 and up) so the parts are accessible for inspection,
> > can be compared, manually merged etc.
>
> No need to do that automatically, this is where manual intervention
> comes in.

Note that mdadm --incremental already does this for "unknown" arrays
(those not defined in mdadm.conf or allowed by its AUTO line); it's not
a new feature.

But your comments are a little irritating. We are actually talking about
hot-plugging here, right? Plus Ubuntu's approach of no config and no
intervention necessary: everything should just work.

> Once mdadm has rejected one of the disks and the admin
> notices, he can easily ask mdadm to move it to another array by itself
> to be mounted, inspected, merged, etc.

Are you actually aware of what that means? I am not saying it is
impossible to create a new array from parts of an existing array without
losing the data, but it sure isn't a trivial mdadm command. And then
you are really breaking up the array and won't be able to just sync the
other parts and still have the same (UUID) array.

>
> > Updating the metadata would prevent working with and switching
> > between concurrent versions in a hot-plugging manner. Think of the
> > use-case of segmenting a (non-root fs) data-array into two halves in
> > order to do some major refactoring. (This is like keeping a snapshot
> > by using only part of the mirror.)
>
> If that is the intent, then the user needs to manually remove one disk
> from the array and set it aside or add it to a separate array if they
> wish. If we /accidentally/ fork the array, we need to set the
> conflicting array aside and notify the user that they need to sort the
> situation out manually.

Yes, yes and yes again, this needs to be done in *any* case of
conflicting changes. If mdadm --incremental (the mdadm hotplug manager)
sets up the conflicting parts on separate md devices, they will even both
appear on the desktop.

> We avoid making the situation any worse than
> it already is by updating the metadata.

No, it really makes things worse! It prevents the user/admin from
managing arrays (parts in this case) by simply plugging disks.

And what would be gained by auto-removing and writing metadata? If the
disks are connected during boot, they will almost always come up in
the same order anyway, so there is nothing to gain from saving that order
to the metadata. If you want a specific order from the start, you need
to issue mdadm commands manually anyway. But now you would also need them
whenever you want a different order than the one written to the metadata.
And all those mdadm commands have to be issued on a running hot-plugging
system (interference, no map file updating), instead of just re-plugging
your disks in the desired order.
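To make the ordering argument concrete, here is a rough Python model (all names are hypothetical; this is not mdadm code) of a hotplug handler that decides purely by plug order: the first segment plugged becomes the main array, a later segment with conflicting changes is set aside on an auxiliary device, and no metadata is ever written. Plugging the disks in the other order simply flips the result.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class Segment:
    """A hot-plugged array component (hypothetical model, not mdadm's structures).

    frozen=True: the assembler below never modifies a segment's metadata.
    """
    device: str
    array_uuid: str
    slot: int
    failed_slots: FrozenSet[int]  # slots missing when this segment ran degraded


def conflicting(a: Segment, b: Segment) -> bool:
    # Each segment ran degraded without the other, so both may hold changes.
    return b.slot in a.failed_slots and a.slot in b.failed_slots


@dataclass
class PlugOrderAssembler:
    """Hypothetical hotplug handler that decides purely by plug order."""
    arrays: Dict[str, List[Segment]] = field(default_factory=dict)
    next_aux: int = 128  # auxiliary devices "md128 and up"

    def plug(self, seg: Segment) -> str:
        members = self.arrays.setdefault(seg.array_uuid, [])
        if any(conflicting(seg, m) for m in members):
            # Set the conflicting segment aside instead of --remove-ing it.
            name = f"/dev/md{self.next_aux}"
            self.next_aux += 1
            return name
        members.append(seg)
        return "main array"


a = Segment("/dev/sdb1", "u1", 0, frozenset({1}))
b = Segment("/dev/sdc1", "u1", 1, frozenset({0}))

first = PlugOrderAssembler()
print(first.plug(a), first.plug(b))  # main array /dev/md128

swapped = PlugOrderAssembler()
print(swapped.plug(b), swapped.plug(a))  # main array /dev/md128 (now b is main)
```

Because the segments are immutable, re-plugging in a different order is all it takes to pick the other version as the main array.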

It's especially bad if the order written to the metadata does not
match the sync direction you want, and you are required to
--zero-superblock and set up a new array, making sure not to lose
the data from the arbitrarily --removed part, etc. After the raid
superblock is removed, blkid will report the partition as containing a
filesystem with the same UUID as the one on the md device (with 0.90 and
1.0 metadata the superblock sits at the end, so the filesystem starts at
the beginning of the partition). This can cause a desync that mdadm can
no longer prevent, when the fs on the partition gets mounted instead of
the one on the right md device!
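For reference, a small Python sketch of where the md superblock sits for each metadata version, using the offsets as I understand them from the md on-disk format (worth double-checking against md(4)). With 0.90 and 1.0 the superblock lives near the end and the data starts at sector 0, which is why blkid sees the filesystem directly once the superblock is zeroed; with 1.1 and 1.2 the data starts at a non-zero offset.

```python
# Superblock location per md metadata version, in 512-byte sectors.
# Offsets follow the md on-disk format as commonly documented; verify against md(4).


def sb_offset_sectors(dev_sectors: int, version: str) -> int:
    if version == "0.90":
        reserved = 128  # 64 KiB reserved at the end
        # Last 64 KiB-aligned 64 KiB block of the device.
        return (dev_sectors & ~(reserved - 1)) - reserved
    if version == "1.0":
        # At least 8 KiB from the end, aligned down to 4 KiB.
        return (dev_sectors - 16) & ~(8 - 1)
    if version == "1.1":
        return 0  # at the very start
    if version == "1.2":
        return 8  # 4 KiB from the start
    raise ValueError(f"unknown metadata version: {version}")


def data_at_partition_start(version: str) -> bool:
    """True if the array data (and so the filesystem) begins at sector 0."""
    return version in ("0.90", "1.0")


for v in ("0.90", "1.0", "1.1", "1.2"):
    print(v, sb_offset_sectors(1_000_000, v), data_at_partition_start(v))
```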

Working with degraded arrays is not uncommon. The standard, documented
procedure for converting a non-raid system into a raid system is to
copy and modify the system onto degraded arrays first, and to sync
afterwards if everything went well.
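Roughly, that procedure looks like this as a Python sketch that only builds the mdadm command lines. Device names are placeholders, nothing is executed, and the copy/modify/reboot steps are left out.

```python
import shlex
from typing import List


def degraded_conversion_plan(md: str, new_disk: str, old_disk: str) -> List[List[str]]:
    """Command lines for converting a single-disk system to RAID1 via a degraded array.

    Only builds argv lists; nothing is executed. Device names are placeholders.
    """
    return [
        # 1. Create the mirror on the new disk only; "missing" leaves the second
        #    slot empty, so the array starts out degraded.
        ["mdadm", "--create", md, "--level=1", "--raid-devices=2", new_disk, "missing"],
        # 2. (not shown) mkfs on md, copy & modify the system, boot from the array.
        # 3. If everything went well, add the old disk; md syncs onto it.
        ["mdadm", md, "--add", old_disk],
    ]


for argv in degraded_conversion_plan("/dev/md0", "/dev/sdb1", "/dev/sda1"):
    print(shlex.join(argv))
```

The sync direction is fixed by which disk is added last, which is the same property the hot-plug argument relies on.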

Another nice, analogous way to dist-upgrade systems while still being
able to revert quickly is to detach a mirror disk from the system arrays
(as a backup/snapshot) before doing the dist-upgrade. If you want to
revert, you just boot with only the previously detached disk attached,
and later plug in and sync the other drive (whose arrays were already
--run degraded).

Summary of what is needed to support safe hot-pluggable segmentation of arrays:
(arrays are only --run degraded manually, or during boot if they are
required but incomplete)

* --incremental should stop auto re-adding "removed" members (so
  that --remove provides a manual means to turn hot-plugging off)
* When arrays are --run degraded, missing members should be marked
  "failed" but not "removed".
* Always check for conflicting "failed" states in
  superblocks, to detect conflicting changes.
   + always report (console and --monitor event) if conflicting changes
     are detected
   + require --force with --add for a manual re-sync of conflicting
     changes (unlike with resyncing an outdated device, in this case
     changes will get lost)
* To facilitate inspection --incremental should assemble array
  components with conflicting changes into auxiliary devices with
  mangled UUIDs (safe and easy diffing, merging, etc. even on desktop
  level)