Error in /etc/lvm/lvm.conf resulted in reinitialization of already-prepared LVM PVs

Bug #2025297 reported by Paul Goins
This bug affects 1 person
Affects           Status   Importance  Assigned to  Milestone
Charm Cinder LVM  Triaged  High        Unassigned
Charm Helpers     New      Undecided   Unassigned

Bug Description

During a recent maintenance window, a workaround for https://bugs.launchpad.net/cinder/+bug/2023073 was put into place by editing /etc/lvm/lvm.conf to set "wipe_signatures_when_zeroing_new_lvs = 0".

Unfortunately, at one point the file was rewritten incorrectly, so that the line instead read "wipe_signatures_when_zeroing_new_lvs = ", i.e. with no value for the key.
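
For clarity, the intended line versus what actually ended up in /etc/lvm/lvm.conf (reconstructed from the description above):

    # Intended workaround setting:
    wipe_signatures_when_zeroing_new_lvs = 0

    # What the file actually ended up containing (no value after the '='),
    # which is what appears to have made LVM commands such as pvdisplay
    # fail (see the analysis below):
    wipe_signatures_when_zeroing_new_lvs =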

This malformed setting caused certain charmhelpers checks to return incorrect results about whether disks were already initialized for LVM, which resulted in the charm re-initializing them.

I'm attaching an excerpt from the Juju unit logs for cinder-lvm which shows the problem; here is a summary of what happened:

* reactive/layer_openstack.py:101:run_storage_backend is entered, and its code ends up calling CinderLVMCharm.cinder_configuration(), which in turn calls configure_block_devices().
* configure_block_devices() calls configure_lvm_storage().
* In configure_lvm_storage(), is_lvm_physical_volume(device) returns False, has_partition_table(device) also returns False, and thus prepare_volume(device) gets called.
* prepare_volume(device) calls clean_storage(device), which in turn calls zap_disk(device).

See the attachment for a detailed traceback.
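
To illustrate the failure mode, here is a hypothetical reproduction sketch (not taken from the logs; /dev/sdb is just a placeholder for an already-prepared PV):

    # Hypothetical illustration only. With a parseable lvm.conf this prints
    # True for a prepared PV; with the broken
    # "wipe_signatures_when_zeroing_new_lvs = " line in place, pvdisplay
    # exits non-zero and the same call prints False, which
    # configure_lvm_storage() then treats as an unused device.
    from charmhelpers.contrib.storage.linux.lvm import is_lvm_physical_volume

    print(is_lvm_physical_volume('/dev/sdb'))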

Regarding the has_partition_table() check: I confirmed that the "overwrite" config option is set to False, and the configured block-devices all lacked MBR/GPT partition tables, so that check would not have blocked this.
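
For reference, a partition-table check of this kind can be approximated with blkid's low-level probing; the sketch below is only an illustration and not necessarily how the charm's has_partition_table() is implemented:

    from subprocess import run

    def _has_partition_table(block_device):
        """Return True if blkid reports a partition table (e.g. 'gpt' or 'dos')."""
        # blkid exits non-zero when it finds nothing, so don't check the
        # return code; an empty PTTYPE value means no partition table.
        result = run(['blkid', '-p', '-s', 'PTTYPE', '-o', 'value', block_device],
                     capture_output=True, text=True)
        return bool(result.stdout.strip())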

This leaves the is_lvm_physical_volume() check as the only protection applicable in this case against accidental re-initialization. In this particular charm version, that check is implemented in charmhelpers as follows:

    # charmhelpers.contrib.storage.linux.lvm
    from subprocess import check_output, CalledProcessError

    def is_lvm_physical_volume(block_device):
        try:
            check_output(['pvdisplay', block_device])
            return True
        except CalledProcessError:
            return False

Basically, anything that causes the pvdisplay command to fail - such as, perhaps, a misconfigured /etc/lvm/lvm.conf - can make this check falsely report that a device is *not* an LVM physical volume, resulting in it getting re-initialized.
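
One possible direction for hardening (a sketch only, assuming the LVM tools refuse to run when lvm.conf cannot be parsed; this is not what charmhelpers does today) would be to verify that the LVM configuration itself is usable before treating a pvdisplay failure as "not a PV":

    from subprocess import check_call, check_output, CalledProcessError

    def is_lvm_physical_volume(block_device):
        """Return True if block_device is an LVM PV, False if it is not.

        Raises CalledProcessError if the LVM tooling itself is unusable,
        e.g. because /etc/lvm/lvm.conf cannot be parsed, rather than
        silently reporting "not a PV".
        """
        # 'lvmconfig' only parses and prints the configuration; if it
        # fails, a pvdisplay failure below is meaningless, so let it raise.
        check_call(['lvmconfig'])
        try:
            check_output(['pvdisplay', block_device])
            return True
        except CalledProcessError:
            return False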

In summary: I believe this is a critical bug in charmhelpers which also critically impacts the cinder-lvm charm, with the risk of blowing away data on configured LVM devices when /etc/lvm/lvm.conf is misconfigured.

Revision history for this message
Paul Goins (vultaire) wrote :

Adding ~field-high. This feels pretty critical, and while I know it was caused by an unmanaged action, the fact that an unmanaged edit like that - or really, any "bugged" edit to /etc/lvm/lvm.conf - could cause an in-use disk to be wiped as a side effect of the code path described above seems to warrant the attention.

description: updated
Revision history for this message
Facundo Ciccioli (fandanbango) wrote :

I'd add that some time ago we hit a closely related issue. In essence, I believe this bug should be about the fact that the charm is willing to zap disks all too carelessly; such an action should have more safeguards.

What happened to us, tl;dr: we added a new device to the block-device config property. The new device was still in use as a bcache, and since the charm doesn't check for bcache before preparing new devices, it zapped it:

        if not is_lvm_physical_volume(device):
            # Unused device
            if overwrite is True or not has_partition_table(device):
                prepare_volume(device)
                new_devices.append(device)
        elif list_lvm_volume_group(device) != volume_group:
            # Existing LVM but not part of required VG or new device
            if overwrite is True:
                prepare_volume(device)
                new_devices.append(device)

The bcache device is not an LVM PV and has no partition table. So BAM, it got prepared (and ultimately zapped).
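
A hypothetical safeguard along these lines (a sketch only, not something the charm does today) would be to refuse to prepare any device that still carries a known signature; wipefs can list signatures, including bcache superblocks, without modifying the device:

    from subprocess import check_output

    def _has_existing_signature(block_device):
        """Return True if wipefs lists any signature (bcache, filesystem,
        RAID, ...) on the device."""
        # Without -a/-o, wipefs only prints the signatures it finds and
        # changes nothing; empty output means no known signature.
        out = check_output(['wipefs', block_device]).decode()
        return bool(out.strip())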

Revision history for this message
Paul Goins (vultaire) wrote :

The traceback attachment seems to have gotten messed up; I will re-upload it.

Revision history for this message
Paul Goins (vultaire) wrote :
Revision history for this message
Corey Bryant (corey.bryant) wrote (last edit):

It looks like we have 2 scenarios mentioned in this bug report:

1) Paul's case, where it looks like we could add some exception logic around the pvdisplay call. This needs some testing, as I'm not sure how pvdisplay fails when lvm.conf is misconfigured.

2) Facundo's case, where the request is to check for bcache as well. I see the argument here; I'm just not sure where the checks would end, as this could lead to requiring a long list of filesystems to check for as well.

I think this bug should focus on the original issue which Paul reported. If we need another bug for the 2nd case, let's open that separately.

Changed in charm-cinder-lvm:
status: New → Triaged
importance: Undecided → High
Revision history for this message
Billy Olsen (billy-olsen) wrote :

While this does have some rather unfortunate side effects and the code is lacking proper error handling, this is not something I consider eligible for Field High. It was ultimately caused by something outside the bounds of what the code was tested and designed to handle. It is important and should be fixed, as data-loss scenarios are bad, but it does not qualify for field-high and I am removing that subscription.

This should still be treated as an important issue to be fixed.
