[partition reuse] VGs are not deleted

Bug #1837214 reported by Paride Legovini on 2019-07-19
This bug affects 2 people

Affects      Importance   Assigned to
curtin       Undecided    Unassigned
subiquity    Critical     Michael Hudson-Doyle

Bug Description

In manual partitioning mode one may want to keep some partitions but throw away a VG, replacing it with a plain partition or with another VG built from scratch.

Subiquity allows for deleting VGs, but this is not actually done. Even if the "deleted" VG disappears, another VG with the same name can't be created (the error is: "vg0 is not a valid name for a volume group"). If the VG has been "deleted" from subiquity and a device which was in the VG is used in another way, the installation fails at format time with the error "device is in use".

In both cases a look in /dev reveals the VG still exists.

If a VG is deleted and not reused, the installation does not fail, but the VG comes up configured in the installed system.

Paride Legovini (paride) wrote :

VGs are not deleted even when the device is "reformatted" ("Reformat" from the device context menu).

Michael Hudson-Doyle (mwhudson) wrote :

On Sat, 20 Jul 2019, 02:02 Paride Legovini, <email address hidden>
wrote:

> Public bug reported:
>
> In manual partitioning mode one may want to keep some partitions but
> throw away a VG, replacing it with a plain partition or with another VG
> built from scratch.
>
> Subiquity allows for deleting VGs, but this is not actually done. Even
> if the "deleted" VG disappears, another VG with the same name can't be
> created (the error is: "vg0 is not a valid name for a volume group").
>

Ah, I know what causes this: you can't create a VG with a name that
already exists in /dev. And subiquity hasn't been taught that the install
might actually remove a device node, though it should be.
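Michael's explanation can be made concrete with a small name check. This is a hypothetical sketch, not subiquity's code: `dev_node_names` and `vg_name_available` are invented names, and the only assumption about LVM is that an active VG normally appears as a `/dev/<vgname>` directory. The interesting part is the `vgs_to_remove` exception, which is what "teaching" the check that the install will remove a node would look like.

```python
import os


def dev_node_names(dev_root="/dev"):
    """Return the names of the entries directly under dev_root."""
    return set(os.listdir(dev_root))


def vg_name_available(name, dev_nodes, vgs_to_remove=()):
    """Hypothetical name check for a new volume group.

    An active VG normally appears as /dev/<vgname>, so a name that collides
    with an existing node is rejected, unless that node belongs to a VG the
    install is about to remove.
    """
    if name in vgs_to_remove:
        return True
    return name not in dev_nodes


# Usage with a fixed snapshot instead of the live /dev:
nodes = {"vda", "vda1", "vda2", "vda3", "vg0"}
print(vg_name_available("vg0", nodes))           # False: /dev/vg0 exists
print(vg_name_available("vg0", nodes, {"vg0"}))  # True: vg0 is being deleted
```

Passing a snapshot of node names (rather than calling `os.path.exists` inside the check) keeps the decision easy to test; a real caller would build the set with `dev_node_names()`.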

> If the VG has been "deleted" from subiquity and a device which was in the
> VG is used in another way, the installation fails at format time with
> the error "device is in use".
>

This is another bug and at least possibly one in curtin although it's also
possible that we're generating a bogus config. Did you save the logs?

> In both cases a look in /dev reveals the VG still exists.
>
> If a VG is deleted and not reused the installation does not fail, but
> the VG comes up configured in the installed system.
>

Hmm, I guess we need to do something that will cause curtin to run
clear_holders on all the members of an LVM VG / RAID array we delete.

Cheers,
mwh
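curtin does have clear_holders logic for tearing down holders of a device; how subiquity should ask for it is the open question here. The sketch below covers only the first half, under stated assumptions: given a probed, curtin-style storage config and the ids of the containers the user deleted, collect the member devices that would need clearing. `members_to_clear` is a hypothetical helper; the `devices` and `spare_devices` keys follow curtin's lvm_volgroup and raid action format.

```python
def members_to_clear(storage_config, deleted_ids):
    """Return ids of the devices that made up deleted VGs or RAID arrays.

    storage_config: list of curtin-style action dicts describing the
    existing (probed) layout.
    deleted_ids: ids of the lvm_volgroup / raid actions the user deleted.
    """
    by_id = {action["id"]: action for action in storage_config}
    members = []
    for container_id in deleted_ids:
        container = by_id[container_id]
        if container["type"] in ("lvm_volgroup", "raid"):
            members.extend(container.get("devices", []))
            # RAID actions may also list spares, which carry md metadata too.
            members.extend(container.get("spare_devices", []))
    # Drop duplicates while keeping a stable order.
    return list(dict.fromkeys(members))


probed = [
    {"id": "disk-vda", "type": "disk", "path": "/dev/vda"},
    {"id": "partition-vda3", "type": "partition", "device": "disk-vda",
     "number": 3},
    {"id": "lvm-volgroup-vg0", "type": "lvm_volgroup", "name": "vg0",
     "devices": ["partition-vda3"]},
]
print(members_to_clear(probed, ["lvm-volgroup-vg0"]))  # ['partition-vda3']
```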


Paride Legovini (paride) wrote :

Hey Michael,

Michael Hudson-Doyle wrote on 22/07/2019:
>> If the VG has been "deleted" from subiquity and a device which was in the
>> VG is used in another way, the installation fails at format time with
>> the error "device is in use".
>>
>
> This is another bug and at least possibly one in curtin although it's also
> possible that we're generating a bogus config. Did you save the logs?

I didn't, but I think I can easily reproduce this.

> In both cases a look in /dev reveals the VG still exists.

Yes, it does; this is why I added this detail to this bug. I think the
root cause is that the VG is not deleted, even if it appears to be gone
from the subiquity interface.

Paride

Changed in subiquity:
status: New → Triaged
importance: Undecided → Critical
assignee: nobody → Michael Hudson-Doyle (mwhudson)
tags: added: reuse
Michael Hudson-Doyle (mwhudson) wrote :

My test setup for this has a VG consisting of a single PV consisting of a partition. If I remove the VG, reformat the disk and then create some partitions on it, everything works, the VG is removed and the install completes. But if you remove the VG and then try to reuse the partition somehow, things break as you describe.

I think that Ryan's patch from https://code.launchpad.net/~mwhudson/curtin/+git/curtin/+merge/370028/comments/967403 is relevant here; I'll try that next.

Michael Hudson-Doyle (mwhudson) wrote :

Oh no, that doesn't help, because get_device_paths_from_storage_config doesn't return things that have preserve: true set on them. I think this comes down to the confusion we've talked about before: preserve: true should mean that the thing itself is preserved, but curtin interprets it (sometimes, at least) as meaning that the data on the thing should be preserved as well.
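To make the preserve ambiguity concrete, here is a deliberately simplified model, not curtin's get_device_paths_from_storage_config itself: a collector that skips preserve: true entries never returns the disk, so whatever old LVM metadata it carries is never cleared. The function name `paths_to_clear` and the disk-only filter are assumptions for illustration.

```python
def paths_to_clear(storage_config):
    """Simplified, hypothetical model of gathering disk paths to clear.

    Disks marked preserve: true are skipped. The disk is "preserved" in the
    sense that it is kept, but as a side effect nothing wipes the old LVM
    metadata on it, so the stale PV survives into the install.
    """
    return [action["path"] for action in storage_config
            if action.get("type") == "disk"
            and "path" in action
            and not action.get("preserve", False)]


config = [
    {"id": "disk-sdb", "type": "disk", "path": "/dev/sdb", "preserve": True},
    {"id": "lvm_volgroup-0", "type": "lvm_volgroup", "name": "vg1",
     "devices": ["disk-sdb"], "preserve": False},
]
print(paths_to_clear(config))  # []: /dev/sdb keeps the old VG's PV label
```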

I think this will have to be release noted for 18.04.3 as well.

Paride Legovini (paride) wrote :

Hi Michael, a "Reformat" does seem to help when switching from LVM to partitions; however, this is not an option when you want to delete a VG and replace it with another VG.

Steps to reproduce:

 1. At install time, set up a VG named vg0 consisting of a
    single entire device and mount it somewhere.
 2. Reboot and start the installer again.
 3. In manual partitioning mode, delete the VG.
 4. Try to recreate a VG on the same device. The name "vg0"
    won't work, because "behind" subiquity the original vg0
    is still alive and well. This is already a bit confusing.
 5. Give up on vg0, name the new VG vg1 and go ahead with
    installing.
 6. The installer fails trying to set up vg1, as its PV is
    still part of vg0.

Note that there is no "Reformat" option in this case. After deleting the VG, the device appears as "clean" in subiquity. There is only the "Format" option, which puts a filesystem on it; that is not wanted in this case.

Ryan Harper (raharper) wrote :

Can we capture the debug data, probert probe data and the yaml curtin sends to subiquity?

Curtin certainly knows how to clear devices, including vgs; the question is whether subiquity told it to do so.

Paride Legovini (paride) wrote :

Here are the installer logs in the "replace VG with another VG" scenario, basically the steps in comment #6.

Ryan Harper (raharper) wrote :

vg1 is composed of disk-sdb, which is set to be preserved instead of being
wiped.

  - serial: QEMU_HARDDISK_QM00002
    path: /dev/sdb
    preserve: true
    name: ''
    grub_device: false
    type: disk
    id: disk-sdb

  - name: vg1
    devices:
    - disk-sdb
    preserve: false
    type: lvm_volgroup
    id: lvm_volgroup-0

I think if subiquity marks the VG as preserve: false, then it needs to
ensure the member devices are cleared (wipe: superblock), which implies
they cannot also have preserve: true.
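The rule Ryan proposes can be expressed as a config check. This is a hypothetical validator (`check_new_vg_members` is not part of subiquity or curtin): for every VG being created from scratch, it flags members that are still preserved or lack `wipe: superblock`. The sample config is the one quoted above, reduced to the relevant keys.

```python
def check_new_vg_members(storage_config):
    """Return ids of members of new (preserve: false) VGs that break the
    proposed rule: each member should be wiped and not preserved."""
    by_id = {action["id"]: action for action in storage_config}
    problems = []
    for action in storage_config:
        if action.get("type") != "lvm_volgroup" or action.get("preserve", False):
            continue  # only VGs that are being created from scratch
        for member_id in action.get("devices", []):
            member = by_id[member_id]
            if member.get("preserve", False) or member.get("wipe") != "superblock":
                problems.append(member_id)
    return problems


config = [
    {"id": "disk-sdb", "type": "disk", "path": "/dev/sdb", "preserve": True},
    {"id": "lvm_volgroup-0", "type": "lvm_volgroup", "name": "vg1",
     "devices": ["disk-sdb"], "preserve": False},
]
print(check_new_vg_members(config))  # ['disk-sdb']
```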


On Wed, 24 Jul 2019 at 05:20, Ryan Harper <email address hidden>
wrote:

> I think if subiquity marks the vg as preserve: false, then it needs to
> ensure the composition devices are cleared (wipe: superblock) which implies
> they cannot also have preserved: true.
>

Right, this line of thinking is what led directly to me filing
https://bugs.launchpad.net/curtin/+bug/1837487 :)

tags: added: id-5d645a2e2be6213e05b2df9a

This bug is fixed with commit bf03e4f7 to curtin on branch master.
To view that commit see the following URL:
https://git.launchpad.net/curtin/commit/?id=bf03e4f7

Paride Legovini (paride) wrote :

I think this can be marked Fix Released, but I'll give it a test run first.

Paride Legovini (paride) wrote :

Well, I was wrong: the problem still exists exactly as described in the bug description. To summarize:

 - Once a vg is deleted, a vg with the same name can't be created.
 - If a partition that used to be in a deleted vg is reused, say /dev/vda3, the installation fails with error: "/dev/vda3 is apparently in use by the system; will not make a filesystem here!" while trying to format the partition.
 - This is because the VG doesn't really get deleted: /dev/mapper/vg0-lv--0 is still there and lvmdiskscan lists /dev/vda3 as part of an LVM volume.
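The last point can also be checked from a script. This sketch swaps lvmdiskscan for `pvs`, whose `--noheadings -o pv_name,vg_name` output is easier to parse; `parse_pvs` and `current_pvs` are hypothetical helper names, and the sample line is illustrative rather than copied from the attached logs.

```python
import subprocess


def parse_pvs(output):
    """Map PV path -> VG name from `pvs --noheadings -o pv_name,vg_name`.

    A PV that is not in any VG has an empty VG column.
    """
    mapping = {}
    for line in output.splitlines():
        fields = line.split()
        if fields:
            mapping[fields[0]] = fields[1] if len(fields) > 1 else ""
    return mapping


def current_pvs():
    """Run pvs (needs root and the LVM tools) and parse its output."""
    result = subprocess.run(
        ["pvs", "--noheadings", "-o", "pv_name,vg_name"],
        stdout=subprocess.PIPE, universal_newlines=True, check=True)
    return parse_pvs(result.stdout)


sample = "  /dev/vda3  vg0\n"
print(parse_pvs(sample))  # {'/dev/vda3': 'vg0'}
```

If `/dev/vda3` still maps to `vg0` after the "deleted" VG should be gone, the old VG was never torn down.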

Ryan Harper (raharper) wrote :

Do you have the installer tarball? I wonder if the wipe: zero value is getting set on the preserved partition.

Michael Hudson-Doyle (mwhudson) wrote :

I made some changes for other reasons that might have fixed this. Can you try again?

Paride Legovini (paride) wrote :

Thanks Michael. I tried and it's not fixed yet. The edge subiquity snap is from commit 1dc93b33, currently the latest. It behaves as before.

Attached is the tarball of the installer's /var/log.

Paride Legovini (paride) wrote :

Well, while from the subiquity user interface the behavior is the same as before, the logs show a different error now:

An error occured handling 'partition-vda2': RuntimeError - Cannot find previous partition on disk /dev/vda
finish: cmd-install/stage-partitioning/builtin/cmd-block-meta: FAIL: configuring partition: partition-vda2
TIMED BLOCK_META: 2.069
finish: cmd-install/stage-partitioning/builtin/cmd-block-meta: FAIL: curtin command block-meta
Traceback (most recent call last):
  File "/snap/subiquity/1691/lib/python3.6/site-packages/curtin/commands/main.py", line 202, in main
    ret = args.func(args)
  File "/snap/subiquity/1691/lib/python3.6/site-packages/curtin/log.py", line 97, in wrapper
    return log_time("TIMED %s: " % msg, func, *args, **kwargs)
  File "/snap/subiquity/1691/lib/python3.6/site-packages/curtin/log.py", line 79, in log_time
    return func(*args, **kwargs)
  File "/snap/subiquity/1691/lib/python3.6/site-packages/curtin/commands/block_meta.py", line 102, in block_meta
    return meta_custom(args)
  File "/snap/subiquity/1691/lib/python3.6/site-packages/curtin/commands/block_meta.py", line 1806, in meta_custom
    handler(command, storage_config_dict)
  File "/snap/subiquity/1691/lib/python3.6/site-packages/curtin/commands/block_meta.py", line 730, in partition_handler
    'Cannot find previous partition on disk %s' % disk)
RuntimeError: Cannot find previous partition on disk /dev/vda
Cannot find previous partition on disk /dev/vda
curtin: Installation failed with exception: Unexpected error while running command.
Command: ['curtin', 'block-meta', 'simple']
Exit code: 3

Previously, the error was mkfs.ext4 refusing to make a filesystem on a partition that was still in use because it was still part of the to-be-deleted VG.

It is still not possible to delete a VG and create a new one named as the deleted one (e.g. vg0). This is exactly as before.

Ryan Harper (raharper) wrote :

The partition number change in subiquity is exposing some edge cases in curtin's expected numerical ordering.
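A simplified model of that ordering assumption shows how a config whose first listed partition is number 2 trips the error seen in the log above. This is not curtin's actual partition_handler, whose lookup is more involved; `previous_partition` is hypothetical, and it reports the disk id where curtin's message names the device path (`/dev/vda`).

```python
def previous_partition(storage_config, disk_id, number):
    """Hypothetical, simplified model of the ordering assumption.

    To place partition `number`, look up partition `number - 1` on the same
    disk in the config; if the config skipped it, give up.
    """
    for action in storage_config:
        if (action.get("type") == "partition"
                and action.get("device") == disk_id
                and action.get("number") == number - 1):
            return action
    raise RuntimeError("Cannot find previous partition on disk %s" % disk_id)


# Config in which the first listed partition is number 2 (partition 1 omitted).
config = [
    {"id": "disk-vda", "type": "disk", "path": "/dev/vda", "ptable": "gpt"},
    {"id": "partition-vda2", "type": "partition", "device": "disk-vda",
     "number": 2, "size": 1073741824, "preserve": True},
]
try:
    previous_partition(config, "disk-vda", 2)
except RuntimeError as err:
    print(err)  # Cannot find previous partition on disk disk-vda
```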

The first partition on the disk is being reported as:

{device: disk-vda, size: 1073741824, flag: linux, number: 2, preserve: true, type: partition,
    id: partition-vda2}

And I'm quite sure that block-discover is returning partition 1.

           "partitiontable": {
                "label": "gpt",
                "id": "D8BE8C06-3610-4A08-99E9-BDC2CFDDF47C",
                "device": "/dev/vda",
                "unit": "sectors",
                "firstlba": 34,
                "lastlba": 20971486,
                "partitions": [
                    {
                        "node": "/dev/vda1",
                        "start": 2048,
                        "size": 2048,
                        "type": "21686148-6449-6E6F-744E-656564454649",
                        "uuid": "2932D6B2-AFF8-4B6D-87FB-16071CB4942A"
                    },
                    {
                        "node": "/dev/vda2",
                        "start": 4096,
                        "size": 2097152,
                        "type": "0FC63DAF-8483-4772-8E79-3D69D8477DE4",
                        "uuid": "66022EEF-8D02-41CD-BC89-0340D5E125A5"
                    },
                    {
                        "node": "/dev/vda3",
                        "start": 2101248,
                        "size": 18868224,
                        "type": "0FC63DAF-8483-4772-8E79-3D69D8477DE4",
                        "uuid": "8BEA9420-94D9-41ED-9D17-4A9BE67D3A91"
                    }
                ]
            }

Here's what curtin gives to subiquity:

Merged storage config:
storage:
    config:
    - id: disk-vda
      path: /dev/vda
      ptable: gpt
      type: disk
    - device: disk-vda
      flag: bios_grub
      id: partition-vda1
      number: 1
      offset: 1048576
      size: 1048576
      type: partition
    - device: disk-vda
      flag: linux
      id: partition-vda2
      number: 2
      offset: 2097152
      size: 1073741824
      type: partition
    - device: disk-vda
      flag: linux
      id: partition-vda3
      number: 3
      offset: 1075838976
      size: 9660530688
      type: partition
    - fstype: ext4
      id: format-partition-vda2
      type: format
      uuid: b9d92c76-3bc0-4dc3-9f0c-683c82bc2f78
      volume: partition-vda2
    - devices:
      - partition-vda3
      id: lvm-volgroup-vg0
      name: vg0
      type: lvm_volgroup
    - id: lvm-partition-lv-0
      name: lv-0
      size: 9659482112B
      type: lvm_partition
      volgroup: lvm-volgroup-vg0
    - fstype: ext4
      id: format-lvm-partition-lv-0
      type: format
      uuid: 592b09e4-b108-4457-9e32-97d044a5f84b
      volume: lvm-partition-lv-0
    version: 1

It appears that the bios_grub partition (partition-vda1) is skipped when subiquity emits the config.
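A skipped partition like this can be spotted by comparing the probed partition table with the emitted config. The sketch below is hypothetical (`missing_partition_numbers` is an invented helper); it takes the partition number from the trailing digits of each probed `node`, and the `emitted` list is an illustrative config in which only partition 1 is missing, not a copy of the attached logs.

```python
import re


def missing_partition_numbers(partitiontable, storage_config, disk_id):
    """Partition numbers present in the probed table but absent from the
    storage config emitted for disk_id."""
    probed = set()
    for part in partitiontable["partitions"]:
        match = re.search(r"(\d+)$", part["node"])  # "/dev/vda1" -> 1
        if match:
            probed.add(int(match.group(1)))
    configured = {action["number"] for action in storage_config
                  if action.get("type") == "partition"
                  and action.get("device") == disk_id}
    return sorted(probed - configured)


table = {"partitions": [{"node": "/dev/vda1"}, {"node": "/dev/vda2"},
                        {"node": "/dev/vda3"}]}
emitted = [{"id": "partition-vda2", "type": "partition",
            "device": "disk-vda", "number": 2},
           {"id": "partition-vda3", "type": "partition",
            "device": "disk-vda", "number": 3}]
print(missing_partition_numbers(table, emitted, "disk-vda"))  # [1]
```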

Ryan Harper (raharper) wrote :

Marking curtin task invalid; I believe *different* errors are showing up in this scenario but they are unrelated to the lvm change, and so far entirely related to the storage configuration sent to curtin from subiquity.

Changed in curtin:
status: New → Invalid
Paride Legovini (paride) wrote :

As Ryan noted, the error is the same as in:

https://bugs.launchpad.net/subiquity/+bug/1872999

but in this case the bios_grub partition was not deleted, so the scenario is not exactly the same. To be 100% sure that the bios_grub partition was still there (or at least that it was not manually deleted while setting up the partitions), I re-ran everything. The logs are attached. To recap, the log tarball is the result of:

1. Installing with partition 1 as bios_grub and partition 2 as the only member of a VG; the root partition is an LV of this VG.

2. Reinstalling on the same disk but deleting the VG and replacing it with a simple ext4 partition mounted as /. The bios_grub partition has been left untouched.

The second install attempt fails and it's the one for which I collected the attached logs. I refreshed subiquity from the edge channel before performing the second install.

Paride Legovini (paride) wrote :

This also happens in UEFI mode, so the problem is not strictly with the bios_grub partition.

Paride Legovini (paride) wrote :

With the latest edge subiquity snap (3bd1382b3b), the "Cannot find previous partition" curtin error is gone, but we're back to the point where:

- a VG can't be deleted and recreated with the same name
- a partition previously in a VG can't be repurposed as
  a "bare" partition, as the VG it was part of is not
  destroyed. This causes the following mke2fs error:

/dev/vdX is apparently in use by the system; will not make a filesystem here!

The steps to reproduce in the bug description still apply.

Logs attached.

Ryan Harper (raharper) wrote :

 - {device: disk-vda, size: 8586788864, flag: linux, number: 2, preserve: true, type: partition,
    id: partition-vda2}

This needs a wipe: superblock, but that may not be enough. I'm going to recreate this scenario; it's slightly different from the previous case that was fixed.
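Ryan's first suggestion can be sketched as a config rewrite: mark the preserved partition with `wipe: superblock` so its old PV label is removed before reuse. `wipe_former_pvs` is hypothetical, and, as Ryan says, the wipe alone may not be sufficient. The input is the partition entry quoted above.

```python
import copy


def wipe_former_pvs(storage_config, former_pv_ids):
    """Return a copy of the config in which preserved partitions that used
    to be PVs of a deleted VG are marked wipe: superblock."""
    new_config = copy.deepcopy(storage_config)  # leave the input untouched
    for action in new_config:
        if (action.get("type") == "partition"
                and action["id"] in former_pv_ids
                and action.get("preserve", False)):
            action["wipe"] = "superblock"
    return new_config


config = [{"device": "disk-vda", "size": 8586788864, "flag": "linux",
           "number": 2, "preserve": True, "type": "partition",
           "id": "partition-vda2"}]
print(wipe_former_pvs(config, {"partition-vda2"})[0]["wipe"])  # superblock
```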

Changed in curtin:
status: Invalid → In Progress
Paride Legovini (paride) wrote :

I can confirm that it still fails, as before, with the subiquity snap from edge/mwhudson-hack (bec88aaa).

Ryan: this latest experimental snap is built from this branch:

https://code.launchpad.net/~mwhudson/subiquity/+git/subiquity-1/+ref/hack

which IIUC includes the changes in this PR:

https://github.com/CanonicalLtd/subiquity/pull/714

which has some more superblock wipes.

Paride Legovini (paride) wrote :

I filed a separate bug for the issue with reusing the name of a deleted VG for a new VG:

https://bugs.launchpad.net/subiquity/+bug/1873328

as the problem is completely unrelated.

This bug is fixed with commit 2a035357 to curtin on branch master.
To view that commit see the following URL:
https://git.launchpad.net/curtin/commit/?id=2a035357

Changed in curtin:
status: In Progress → Fix Committed
Paride Legovini (paride) wrote :

Happy to confirm that this is fixed in subiquity 20.04.2 (currently in edge).

On Tue, 21 Apr 2020 at 22:46, Paride Legovini <email address hidden>
wrote:

> Happy to confirm that this is fixed in subiquity 20.04.2 (currently in
> edge).
>

Amazing. Thanks for testing this over and over again!

Paride Legovini (paride) on 2020-05-05
Changed in subiquity:
status: Triaged → Fix Committed
Changed in subiquity:
status: Fix Committed → Fix Released

This bug is believed to be fixed in curtin in version 20.1. If this is still a problem for you, please make a comment and set the state back to New.

Thank you.

Changed in curtin:
status: Fix Committed → Fix Released