Error in /etc/lvm/lvm.conf resulted in reinitialization of already-prepared LVM PVs

Bug #2025297 reported by Paul Goins
This bug affects 1 person
Affects           Status   Importance  Assigned to  Milestone
Charm Cinder LVM  Triaged  High        Unassigned
Charm Helpers     New      Undecided   Unassigned

Bug Description

During a recent maintenance window, a workaround for https://bugs.launchpad.net/cinder/+bug/2023073 was put into place by editing /etc/lvm/lvm.conf to set "wipe_signatures_when_zeroing_new_lvs = 0".

Unfortunately, at one point the file was rewritten incorrectly, so that the line instead read "wipe_signatures_when_zeroing_new_lvs = ", i.e. with no value for the key.
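
For clarity, the intended line versus what actually ended up in /etc/lvm/lvm.conf (reconstructed from the description above):

    # Intended workaround setting:
    wipe_signatures_when_zeroing_new_lvs = 0

    # What the file actually ended up containing (no value after the '='),
    # which is what appears to have made LVM commands such as pvdisplay
    # fail (see the analysis below):
    wipe_signatures_when_zeroing_new_lvs =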

This malformed setting caused certain charmhelpers checks to return incorrect results about whether disks were already initialized for LVM, which resulted in the charm re-initializing them.

I'm attaching an excerpt from the Juju unit logs for cinder-lvm which shows the problem; here is a summary of what happened:

* reactive/layer_openstack.py:101:run_storage_backend is entered, and its code ends up calling CinderLVMCharm.cinder_configuration(), which in turn calls configure_block_devices().
* configure_block_devices() calls configure_lvm_storage().
* In configure_lvm_storage(), is_lvm_physical_volume(device) returns False, has_partition_table(device) also returns False, and thus prepare_volume(device) gets called.
* prepare_volume(device) calls clean_storage(device), which in turn calls zap_disk(device).

See the attachment for a detailed traceback.
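
To illustrate the failure mode, here is a hypothetical reproduction sketch (not taken from the logs; /dev/sdb is just a placeholder for an already-prepared PV):

    # Hypothetical illustration only. With a parseable lvm.conf this prints
    # True for a prepared PV; with the broken
    # "wipe_signatures_when_zeroing_new_lvs = " line in place, pvdisplay
    # exits non-zero and the same call prints False, which
    # configure_lvm_storage() then treats as an unused device.
    from charmhelpers.contrib.storage.linux.lvm import is_lvm_physical_volume

    print(is_lvm_physical_volume('/dev/sdb'))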

Regarding the has_partition_table() check: I confirmed that the "overwrite" config option is set to False, and the configured block-devices all lacked MBR/GPT partition tables, so that check would not have blocked this.
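
For reference, a partition-table check of this kind can be approximated with blkid's low-level probing; the sketch below is only an illustration and not necessarily how the charm's has_partition_table() is implemented:

    from subprocess import run

    def _has_partition_table(block_device):
        """Return True if blkid reports a partition table (e.g. 'gpt' or 'dos')."""
        # blkid exits non-zero when it finds nothing, so don't check the
        # return code; an empty PTTYPE value means no partition table.
        result = run(['blkid', '-p', '-s', 'PTTYPE', '-o', 'value', block_device],
                     capture_output=True, text=True)
        return bool(result.stdout.strip())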

This leaves the is_lvm_physical_volume() check as the only protection applicable in this case against accidental re-initialization. In this particular charm version, that check is implemented in charmhelpers as follows:

    # charmhelpers.contrib.storage.linux.lvm
    from subprocess import check_output, CalledProcessError

    def is_lvm_physical_volume(block_device):
        try:
            check_output(['pvdisplay', block_device])
            return True
        except CalledProcessError:
            return False

Basically, anything that causes the pvdisplay command to fail - such as, perhaps, a misconfigured /etc/lvm/lvm.conf - can make this check falsely report that a device is *not* an LVM physical volume, resulting in it getting re-initialized.
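
One possible direction for hardening (a sketch only, assuming the LVM tools refuse to run when lvm.conf cannot be parsed; this is not what charmhelpers does today) would be to verify that the LVM configuration itself is usable before treating a pvdisplay failure as "not a PV":

    from subprocess import check_call, check_output, CalledProcessError

    def is_lvm_physical_volume(block_device):
        """Return True if block_device is an LVM PV, False if it is not.

        Raises CalledProcessError if the LVM tooling itself is unusable,
        e.g. because /etc/lvm/lvm.conf cannot be parsed, rather than
        silently reporting "not a PV".
        """
        # 'lvmconfig' only parses and prints the configuration; if it
        # fails, a pvdisplay failure below is meaningless, so let it raise.
        check_call(['lvmconfig'])
        try:
            check_output(['pvdisplay', block_device])
            return True
        except CalledProcessError:
            return False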

In summary: I believe this is a critical bug in charmhelpers which also critically impacts the cinder-lvm charm, with the risk of blowing away data on configured LVM devices when /etc/lvm/lvm.conf is misconfigured.

Revision history for this message
Paul Goins (vultaire) wrote :

Adding ~field-high. This feels pretty critical, and while I know it was caused by an unmanaged action, the fact that an unmanaged edit like that - or really, any "bugged" edit to /etc/lvm/lvm.conf - could cause an in-use disk to be wiped as a side effect of the code path described above seems to warrant the attention.

description: updated
Revision history for this message
Facundo Ciccioli (fandanbango) wrote :

I'd add that some time ago we hit a closely related issue. In essence, I believe this bug should be about the fact that the charm is willing to zap disks all too carelessly; such an action should have more safeguards.

What happened to us, tl;dr: we added a new device to the block-device config property. The new device was still in use as a bcache, and since the charm doesn't check for bcache before preparing new devices, it zapped it:

        if not is_lvm_physical_volume(device):
            # Unused device
            if overwrite is True or not has_partition_table(device):
                prepare_volume(device)
                new_devices.append(device)
        elif list_lvm_volume_group(device) != volume_group:
            # Existing LVM but not part of required VG or new device
            if overwrite is True:
                prepare_volume(device)
                new_devices.append(device)

The bcache device is not an LVM PV and has no partition table. So BAM, it got prepared (and ultimately zapped).
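
A hypothetical safeguard along these lines (a sketch only, not something the charm does today) would be to refuse to prepare any device that still carries a known signature; wipefs can list signatures, including bcache superblocks, without modifying the device:

    from subprocess import check_output

    def _has_existing_signature(block_device):
        """Return True if wipefs lists any signature (bcache, filesystem,
        RAID, ...) on the device."""
        # Without -a/-o, wipefs only prints the signatures it finds and
        # changes nothing; empty output means no known signature.
        out = check_output(['wipefs', block_device]).decode()
        return bool(out.strip())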

Revision history for this message
Paul Goins (vultaire) wrote :

The traceback attachment seems to have gotten messed up; I will re-upload it.

Revision history for this message
Paul Goins (vultaire) wrote :
Revision history for this message
Corey Bryant (corey.bryant) wrote (last edit):

It looks like we have 2 scenarios mentioned in this bug report:

1) Paul's case, where it looks like we could add some exception logic around the pvdisplay call. This needs some testing, as I'm not sure how pvdisplay fails when lvm.conf is misconfigured.

2) Facundo's case, where the request is to check for bcache as well. I see the argument here; I'm just not sure where the checks would end, as this could lead to requiring a long list of filesystems to check for as well.

I think this bug should focus on the original issue which Paul reported. If we need another bug for the 2nd case, let's open that separately.

Changed in charm-cinder-lvm:
status: New → Triaged
importance: Undecided → High
Revision history for this message
Billy Olsen (billy-olsen) wrote :

While this does have some rather unfortunate side effects and the code is lacking proper error handling, this is not something I consider eligible for Field High. It was ultimately caused by something outside the bounds of what the code was tested and designed to handle. It is important and should be fixed, as data-loss scenarios are bad, but it does not qualify for field-high and I am removing that subscription.

This should still be treated as an important issue to be fixed.
