lvremove occasionally fails on nodes with multiple volumes and curtin does not catch the failure

Bug #1871874 reported by Nick Niehoff
This bug affects 1 person
Affects          Status    Importance  Assigned to  Milestone
curtin (Ubuntu)  Expired   Undecided   Unassigned
linux (Ubuntu)   Expired   Undecided   Unassigned

Bug Description

For example:

Wiping lvm logical volume: /dev/ceph-db-wal-dev-sdc/ceph-db-dev-sdi
wiping 1M on /dev/ceph-db-wal-dev-sdc/ceph-db-dev-sdi at offsets [0, -1048576]
using "lvremove" on ceph-db-wal-dev-sdc/ceph-db-dev-sdi
Running command ['lvremove', '--force', '--force', 'ceph-db-wal-dev-sdc/ceph-db-dev-sdi'] with allowed return codes [0] (capture=False)
device-mapper: remove ioctl on (253:14) failed: Device or resource busy
Logical volume "ceph-db-dev-sdi" successfully removed

On a node with 10 disks configured as follows:

/dev/sda2 /
/dev/sda1 /boot
/dev/sda3 /var/log
/dev/sda5 /var/crash
/dev/sda6 /var/lib/openstack-helm
/dev/sda7 /var
/dev/sdj1 /srv

sdb and sdc are used for BlueStore WAL and DB
sdd, sde, sdf: ceph OSDs, using sdb
sdg, sdh, sdi: ceph OSDs, using sdc

Across multiple servers this happens occasionally with various disks. It looks like this may be a race condition, possibly in LVM, since curtin wipes multiple volumes in quick succession before lvremove fails.

Tags: sts
Revision history for this message
Eric Desrochers (slashd) wrote :

Curtin currently uses two force options:
$ lvremove --force --force vg_lv_name

as indicated here:
https://github.com/CanonicalLtd/curtin/blob/14c0560ed4482cb3b514fbec8d89118bd775652f/curtin/block/clear_holders.py#L136-L138
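
As the rest of this bug shows, lvremove still exits 0 in this failure mode (the
device-mapper "Device or resource busy" line is only a warning on stderr), so
the allowed-return-codes check of [0] passes. A quick way to confirm that on an
affected node (the VG/LV name is taken from the bug description):

$ lvremove --force --force ceph-db-wal-dev-sdc/ceph-db-dev-sdi
$ echo "lvremove exit status: $?"    # 0 even when the dm warning is printed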

# LVREMOVE(8) #

Confirmation will be requested before deactivating any active LV prior to removal. LVs cannot be deactivated or removed while they are open (e.g. if they contain a mounted filesystem). Removing an origin LV will also remove all dependent snapshots.

When a single force option is used, LVs are removed without confirmation, and the command will try to deactivate unused LVs.

To remove damaged LVs, two force options may be required (-ff).

-f|--force ...
Override various checks, confirmations and protections. Use with extreme caution.

tags: added: sts
Revision history for this message
Eric Desrochers (slashd) wrote :

# LVM(8)

DIAGNOSTICS
       All tools return a status code of zero on success or non-zero on failure. The non-zero codes distinguish only between the broad categories of unrecognised commands, problems processing the command line arguments and any other failures. As LVM remains under active development, the code used in a specific case occasionally changes between releases. Message text may also change.

# lvm src code #

tools/errors.h
#define ECMD_PROCESSED 1
#define ENO_SUCH_CMD 2
#define EINVALID_CMD_LINE 3
#define EINIT_FAILED 4
#define ECMD_FAILED 5

So it seems there are six possible return codes from LVM, as shown above (the five error codes plus zero for success).
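
(Illustrative only: if a caller wanted to distinguish these codes rather than
treat every non-zero status the same, a shell-level check could look like the
following; the VG/LV name is a placeholder.)

$ lvremove --force --force vgname/lvname
$ rc=$?
$ case "$rc" in
    0) echo "logical volume removed" ;;
    5) echo "ECMD_FAILED: the command itself failed" ;;
    *) echo "other LVM error (status $rc)" ;;
  esac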

summary: - lvmremove occasionally fails on nodes with multiple volumes and curtin does not catch the failure
         + lvremove occasionally fails on nodes with multiple volumes and curtin does not catch the failure
Revision history for this message
Eric Desrochers (slashd) wrote :

The above was for focal ^

In Xenial:

#define ECMD_PROCESSED 1
#define ENO_SUCH_CMD 2
#define EINVALID_CMD_LINE 3
#define ECMD_FAILED 5

In Bionic
#define ECMD_PROCESSED 1
#define ENO_SUCH_CMD 2
#define EINVALID_CMD_LINE 3
#define EINIT_FAILED 4
#define ECMD_FAILED 5

Revision history for this message
Nick Niehoff (nniehoff) wrote :

I was able to reproduce this with a VM deployed by MAAS. I created a VM and added 26 disks to it using virsh (NOTE: I use ZFS volumes for my disks):

for i in {a..z}; do sudo zfs create -s -V 30G rpool/libvirt/maas-node-20$i; done
for i in {a..z}; do virsh attach-disk maas-node-20 /dev/zvol/rpool/libvirt/maas-node-20$i sd$i --current --cache none --io native; done

Then in maas:

commission the machine to recognize all of the disks

machine_id=123abc
for i in {b..z}; do device_id=$(maas admin machine read $machine_id | jq ".blockdevice_set[] | select(.name == \"sd$i\") | .id"); vgid=$(maas admin volume-groups create $machine_id name=vg$i block_devices=$device_id | jq '.id'); maas admin volume-group create-logical-volume $machine_id $vgid name=sd${i}lv size=32208060416; done

You may need to change the size in the previous command. I then deployed the system twice with Bionic, with Xenial as the commissioning OS. The second time I saw the "failed: Device or resource busy" errors. I am using MAAS 2.7.

This reproduces easily with Xenial as the commissioning OS.
This does not reproduce using Xenial with the hwe kernel as the commissioning OS.
I cannot reproduce this using Bionic as the commissioning OS.

Revision history for this message
Ryan Harper (raharper) wrote :

During a clear-holders operation we do not need to catch any failure; we're attempting to destroy the devices in question. The destruction of a device is explicitly requested in the config via a wipe: value[1] present on one or more devices that are members of the LV.

1. https://curtin.readthedocs.io/en/latest/topics/storage.html#disk-command

Can you provide some more context as to what you think is wrong and what curtin should do instead?

Changed in curtin (Ubuntu):
status: New → Incomplete
Revision history for this message
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1871874

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Revision history for this message
Nick Niehoff (nniehoff) wrote :

Ryan,
   We believe this is a bug because we expect curtin to wipe the disks. In this case it is failing to wipe the disks, and occasionally that causes issues with our automation deploying Ceph on those disks. This may be more of an issue with LVM, a race condition when wiping all of the disks sequentially, simply because of the large number of disks/VGs/LVs.

   To clarify my previous testing: I was mistaken. I thought MAAS used the commissioning OS as the ephemeral OS to deploy from, but this is not the case; MAAS uses the specified deployment OS as the ephemeral image to deploy from. Based on this, all of my previous testing was done with Bionic using the 4.15 kernel. This proves it is a race condition somewhere, as sometimes the error does not reproduce, and it was just a coincidence that I was changing the commissioning OS.

   I have tested this morning and have been able to reproduce the issue with Bionic 4.15 and Xenial 4.4; however, I have yet to reproduce it using either the Bionic or Xenial HWE kernels.

   I will upload the curtin logs and config from my reproducer now.

Revision history for this message
Nick Niehoff (nniehoff) wrote :
Revision history for this message
Nick Niehoff (nniehoff) wrote :
Revision history for this message
Ryan Harper (raharper) wrote :

>
> Ryan,
> We believe this is a bug as we expect curtin to wipe the disks. In this
> case it's failing to wipe the disks and occasionally that causes issues
> with our automation deploying ceph on those disks.

I'm still confused about what you believe the actual error is.
Note that an lvremove failure is not fatal from curtin's perspective, because
we will be destroying data on the underlying physical disk or partition anyway.

Looking at your debug info:

1) your curtin-install.log does not show any failures of the lvremove command

2) if the curtin-install-cfg.yaml is correct, then you've marked

  wipe: superblock

on all of the devices on top of which you build logical volumes. With this
setting curtin wipes the logical volume *and* the underlying device.

Even if the writes to the LV fail, or if lvremove fails, as long as the wipe
of the underlying disk/partition succeeds, the LVM metadata and partition
table on the disk will be cleared, rendering the content unusable.

Look at sda1, which holds the lvroot LV:

shutdown running on holder type: 'lvm' syspath: '/sys/class/block/dm-24'
Running command ['dmsetup', 'splitname', 'vgroot-lvroot', '-c', '--noheadings', '--separator', '=', '-o', 'vg_name,lv_name'] with allowed return codes [0] (capture=True)

# here we start wiping the logical device by writing 1M of zeros at the
# start of the device and at the end of the device
Wiping lvm logical volume: /dev/vgroot/lvroot
wiping 1M on /dev/vgroot/lvroot at offsets [0, -1048576]

# now we remove the lv device and then the vg if it's empty
using "lvremove" on vgroot/lvroot
Running command ['lvremove', '--force', '--force', 'vgroot/lvroot'] with allowed return codes [0] (capture=False)
  Logical volume "lvroot" successfully removed
Running command ['lvdisplay', '-C', '--separator', '=', '--noheadings', '-o', 'vg_name,lv_name'] with allowed return codes [0] (capture=True)
Running command ['pvdisplay', '-C', '--separator', '=', '--noheadings', '-o', 'vg_name,pv_name'] with allowed return codes [0] (capture=True)
Running command ['vgremove', '--force', '--force', 'vgroot'] with allowed return codes [0, 5] (capture=False)
  Volume group "vgroot" successfully removed

# now the vg was created from /dev/sda1, here curtin wipes the device with
# 1M of zeros at the start and end of this partition
Wiping lvm physical volume: /dev/sda1
wiping 1M on /dev/sda1 at offsets [0, -1048576]
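
(For illustration only: the "wiping 1M ... at offsets [0, -1048576]" steps above
amount to zeroing the first and last mebibyte of the device, roughly like the
following; curtin performs this internally, and the device name is taken from
the log.)

# zero the first 1 MiB of the device
$ dd if=/dev/zero of=/dev/vgroot/lvroot bs=1M count=1
# zero the last 1 MiB (offset -1048576 from the end)
$ size=$(blockdev --getsize64 /dev/vgroot/lvroot)
$ dd if=/dev/zero of=/dev/vgroot/lvroot bs=1M count=1 seek=$(( size / 1048576 - 1 ))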

In the scenario where you see the lvremove command fail, what is the outcome
on the system? Does curtin fail the install? Does the install succeed but
something after booting into the new system fail? If the latter, what
commands fail, and can you show the output?

Revision history for this message
Nick Niehoff (nniehoff) wrote :

Ryan,
   From the logs, the concern is the "Device or resource busy" message:

Running command ['lvremove', '--force', '--force', 'vgk/sdklv'] with allowed return codes [0] (capture=False)
  device-mapper: remove ioctl on (253:5) failed: Device or resource busy
  Logical volume "sdklv" successfully removed
Running command ['lvdisplay', '-C', '--separator', '=', '--noheadings', '-o', 'vg_name,lv_name'] with allowed return codes [0] (capture=True)

  Curtin does not fail and the node successfully deploys. This is in an integration lab, so these hosts (including MAAS) are stopped, MAAS is reinstalled, and the systems are redeployed without any release or any option to wipe during a MAAS release. MAAS then deploys Bionic on these hosts thinking they are completely new systems, but in reality they still have the old volumes configured. MAAS configures the root disk but does nothing to the other disks, which are provisioned through other automation later. The customer has correlated these errors with problems configuring Ceph after deployment. I have requested further information about the exact state of the system when it ends up in this case.

Revision history for this message
Ryan Harper (raharper) wrote :

> This is in an integration lab so these hosts (including maas) are stopped,
> MAAS is reinstalled, and the systems are redeployed without any release
> or option to wipe during a MAAS release.
> Then MAAS deploys Bionic on these hosts thinking they are completely new
> systems but in reality they still have the old volumes configured. MAAS
> configures the root disk but nothing to the other disks which are
> provisioned through other automation later.

Even with a system as you describe, curtin will erase all metadata as
configured. I do not believe that, after deployment, any LVM devices will be
present on the booted system or found with the LVM scan tools.

I very much would like to see the curtin install log from the scenario
you describe and any "old volumes" that appear configured after the install.

If some post-deployment script starts creating VGs and LVs, it's possible
they could find some metadata that curtin did not detect (at offsets further
into the disk). MAAS and curtin aren't responsible for wiping the entire
contents of the disk *unless* told to do so. Curtin accepts config like:

wipe: zero

This will zero out the entire device (disk, partition, etc.). However, such
wipes can take a very long time, and I do not think this is a useful setting
here. Instead, the post-deployment scripts should use best practices, just as
curtin does when dealing with reused storage. Note that:

1) LVM tools *warn* when they find existing metadata.
2) All LVM tools include a --zero flag which will remove existing metadata
before creating new devices; this is best practice when re-using existing storage.

Curtin also pre-wipes disks and partitions at their location on the disk
before creating things on top, specifically to prevent buried metadata from
causing issues when creating new composed devices.
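
(For illustration only: one common way for post-deployment tooling to clear
stale signatures before re-creating PVs/VGs is along these lines; wipefs is
just one option, and the device name is a placeholder.)

$ wipefs --all /dev/sdd
$ pvcreate -ff -y /dev/sdd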

So please do find out more details about the post-install deployment.

Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for linux (Ubuntu) because there has been no activity for 60 days.]

Changed in linux (Ubuntu):
status: Incomplete → Expired
Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for curtin (Ubuntu) because there has been no activity for 60 days.]

Changed in curtin (Ubuntu):
status: Incomplete → Expired