Comment 3 for bug 1634691

Evgeniy L (rustyrobot) wrote:

Ilya,

The upgrade procedure must never use the disk preservation feature. It is broken by design: there are a few cases where it may work and dozens of cases where it breaks the system (possibly with data corruption). I'm quite confused as to why we started doing upgrades this way at all. Why do we even need full re-provisioning? It is much simpler and far less error-prone to download the new image and write it to the OS volume. That's it, no re-partitioning is required.
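To make the suggested alternative concrete, here is a minimal Python sketch of "download the new image and write it to the OS volume". It assumes the image has already been downloaded as a raw, uncompressed file and that the OS volume is exposed as a writable path (for example a block device or LVM logical volume). `write_image` and any paths are hypothetical and are not part of Fuel or Astute. Only the target volume is overwritten, so the partition table and the other volumes are never touched.

```python
import hashlib
import os

CHUNK_SIZE = 4 * 1024 * 1024  # copy in 4 MiB chunks


def write_image(image_path: str, target_path: str) -> str:
    """Hypothetical helper: copy a raw OS image onto target_path.

    Only the target (e.g. the OS logical volume) is written; the
    partition table and all other volumes are left untouched.
    Returns the SHA-256 of the bytes written.
    """
    digest = hashlib.sha256()
    with open(image_path, "rb") as src, open(target_path, "r+b") as dst:
        image_size = os.fstat(src.fileno()).st_size
        # Seeking to the end also reports the size of a block device.
        target_size = dst.seek(0, os.SEEK_END)
        if image_size > target_size:
            raise ValueError(
                f"image is {image_size} bytes, target only {target_size}")
        dst.seek(0)
        while True:
            chunk = src.read(CHUNK_SIZE)
            if not chunk:
                break
            dst.write(chunk)
            digest.update(chunk)
        dst.flush()
        os.fsync(dst.fileno())  # force the data out to the device
    return digest.hexdigest()
```

A call would look like `write_image("/tmp/new-os.img", "/dev/os/root")`, with both paths being placeholders. Returning a checksum lets the caller compare it against a published image hash before rebooting into the new system.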

I'm not sure the ticket description is correct:

1. Astute always erases the MBR of the disks when you ask it to re-provision a node.

2. The patch you are referring to is irrelevant (see the previous item): erasing the MBR again, even multiple times, does not make things worse.

3. Looking at the environment, it became pretty clear that the superblock had been erased and that the partitioning schema didn't match the one used previously. This was obvious from running xfs_repair, which failed to restore the superblock from the secondary superblocks because the offsets differed. What happened is that you asked to re-partition the system with a new schema (new disk sizes, because a new iSCSI volume was found). Even if some volumes have the keep_data flag set, re-partitioning the other volumes (without keep_data) can quite easily break something (e.g. a new partition may overlap with some raw partition).
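The failure mode in item 3 can be expressed as a simple pre-flight check. This is a hypothetical Python sketch, not how Astute or the Fuel volume manager actually validates schemas: partitions are modelled as half-open sector ranges `[start, end)` with a `keep_data` flag, and `Partition`, `overlaps` and `find_conflicts` are illustrative names. A preserved partition is only safe if the new schema places it at exactly the same offsets and no other new partition overlaps it.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Partition:
    # Hypothetical model: sectors in the half-open range [start, end).
    name: str
    start: int
    end: int
    keep_data: bool = False


def overlaps(a: Partition, b: Partition) -> bool:
    # Two half-open ranges overlap iff each starts before the other ends.
    return a.start < b.end and b.start < a.end


def find_conflicts(old: List[Partition], new: List[Partition]) -> List[str]:
    """Report ways the new schema would damage keep_data partitions."""
    problems = []
    for kept in (p for p in old if p.keep_data):
        # The preserved partition must reappear at identical offsets.
        same = [p for p in new if p.start == kept.start and p.end == kept.end]
        if not same:
            problems.append(
                f"{kept.name}: offsets changed, preserved data would be misread")
        # No other new partition may cover any of its sectors.
        for p in new:
            if p in same:
                continue
            if overlaps(p, kept):
                problems.append(f"{p.name} overlaps preserved {kept.name}")
    return problems
```

A changed start offset is flagged even when nothing else overlaps, because the filesystem's primary and secondary superblocks are located relative to the start of the partition; this matches the xfs_repair failure described above.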

My suggestion would be to implement a proper, explicit rollout of the new operating system, without the dance of setting dozens of flags in dozens of places and praying that it eventually works and that there are no differences in the kernel and filesystem tooling, which can quite easily break something.