LVM boot problem - volumes not activated after upgrade to Xenial

Bug #1573982 reported by Yavor Nikolov on 2016-04-23
This bug affects 18 people
Affects          Importance   Assigned to
MAAS             Undecided    Unassigned
curtin           Undecided    Unassigned
lvm2 (Ubuntu)    Undecided    Unassigned

Bug Description

Soon after upgrading to Xenial (from 15.10) the boot process broke. I'm using LVM for root, swap and other partitions.

===
The current behaviour is:

When I boot, shortly after the GRUB screen I get log messages like:

---
Scanning for Btrfs filesystems
resume: Could not stat the resume device file: '/dev/mapper/VolGroup....'
Please type in the full path...
---

Then I press ENTER; for a few minutes some errors about floppy device access appear (for some reason it tries to scan fd0 even though the floppy drive is empty). And then:

---
Gave up waiting for root device. Common problems: ...
...
ALERT! UUID=xxx-xxx.... does not exist.
Dropping to a shell.
---

From the BusyBox shell I managed to recover the boot by issuing "lvm vgchange -ay" and then "exit"; after that the boot continues fine (all LVM file systems are successfully mounted).

===
One workaround so far is creating an /etc/initramfs-tools/scripts/local-top/lvm2-manual script that runs "lvm vgchange -ay". But I'm looking for a cleaner solution.
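
A minimal sketch of what such a script could contain (an assumption based on the usual initramfs-tools local-top conventions, not an official fix; the file must be made executable and the initramfs rebuilt afterwards):

---
#!/bin/sh
# /etc/initramfs-tools/scripts/local-top/lvm2-manual (workaround sketch)
PREREQ=""
prereqs() { echo "$PREREQ"; }
case "$1" in
    prereqs) prereqs; exit 0 ;;
esac
# Force-activate every volume group visible from the initramfs
lvm vgchange -ay
exit 0
---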

Boot used to work fine with 15.10. The first boot after upgrading to Xenial actually worked OK too; I'm not sure what might have changed in the meantime (I had been fixing some package installations because the mysql-server upgrade had failed).

===
# lsb_release -rd
Description: Ubuntu 16.04 LTS
Release: 16.04

description: updated
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in lvm2 (Ubuntu):
status: New → Confirmed
Uxorious (uxorious) wrote :

I'm not seeing ANY LVM volumes active on system boot.
(I'm not putting any of the necessary boot paths on LVM).

After booting the system, the volume is visible but not active.
If I put one of the drives in fstab, booting Ubuntu breaks.

Is there a workaround to make the system do "vgchange -a y" during boot?

Yavor Nikolov (yavor-nikolov) wrote :

My workaround is as I explained in the issue description: I added a script in /etc/initramfs-tools/scripts/local-top/ folder which performs `vgchange -ay`.

MatthewHawn (steamraven) wrote :

I just ran into this upgrading from 14.04. My system is a btrfs RAID across two LVM volume groups. Both volume groups need to be activated at boot, before "btrfs device scan" runs. The system used to do this.

Putting a vgchange in a script in local-top fixes this.

Thanks!

MatthewHawn (steamraven) wrote :

The apparent cause is lvm2 (2.02.133-1ubuntu8). From the changelog (https://launchpad.net/ubuntu/xenial/+source/lvm2/+changelog):

lvm2 (2.02.133-1ubuntu8) xenial; urgency=medium

  * Drop debian/85-lvm2.rules. This is redundant now, VGs are already
    auto-assembled via lvmetad and 69-lvm-metad.rules. This gets rid of using
    watershed, which causes deadlocks due to blocking udev rule processing.
    (LP: #1560710)
  * debian/rules: Put back initramfs-tools script to ensure that the root and
    resume devices are activated (lvmetad is not yet running in the initrd).
  * debian/rules: Put back activation systemd generator, to assemble LVs in
    case the admin disabled lvmetad.
  * Make debian/initramfs-tools/lvm2/scripts/init-premount/lvm2 executable and
    remove spurious chmod +x Ubuntu delta in debian/rules.

 -- Martin Pitt <email address hidden> Wed, 30 Mar 2016 10:56:49 +0200

The initramfs-tools script does not activate all of the logical volumes and its detection is lacking in certain edge cases like mine.

Databay (rs-databay) wrote :

I can confirm this bug to be present also in lvm2 (2.02.133-1ubuntu10).

I got the affected system (upgraded via do-release-upgrade on 09.08.2016) back up with the above-mentioned workaround:

Creating an /etc/initramfs-tools/scripts/local-top/lvm2 script that runs "lvm vgchange -ay", and making it executable.
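
For anyone applying the same workaround, the steps boil down to roughly the following (a sketch, assuming the script already contains the "lvm vgchange -ay" call described above):

chmod +x /etc/initramfs-tools/scripts/local-top/lvm2
update-initramfs -u -k all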

Shouldn't this bug get some priority, since it can make a remote system inaccessible?

eulPing (francois-jeanmougin) wrote :

I can confirm the same issue here after an upgrade from 14.04 to 16.04.
Note that on my system, / is not on LVM.

LVM is not activated at boot time nor at init time, and the system gave up mounting /usr. For me this is even worse: even once / is mounted and we are supposedly in a sort of "userland", LVM is still not up.

I had to (a consolidated sketch follows below):
mount --bind proc, run, sys and dev into /root/
then lvm vgchange -ay
then mount -a
[this is required to run update-initramfs, as that script is in /usr and requires /var]
then mount -o remount,rw /
then create an lvm2 script in local-top as described earlier [THANK YOU!]
then update the initramfs with update-initramfs -k all -u
then sync and umount
exit the chroot
reboot
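
Roughly consolidated, the recovery looks like the sketch below (an approximation only, assuming the initramfs/rescue shell where the real root filesystem ends up mounted at /root; ordering and paths may differ in other environments):

lvm vgchange -ay
mount -o remount,rw /root
for d in proc sys dev run; do mount --bind /$d /root/$d; done
chroot /root /bin/sh
mount -a                    # brings up /usr and /var, needed by update-initramfs
# create the local-top lvm2 workaround script and make it executable, then:
update-initramfs -u -k all
sync
exit                        # leave the chroot
reboot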

This is not an obvious process to follow, especially ending up with an undocumented script in local-top :).

Good luck all!

Lisio (lisio) wrote :

I faced the same behaviour yesterday; the only workaround for me was adding the line "vgchange -ay" to /usr/share/initramfs-tools/scripts/local-top/lvm2.

I hadn't changed any configuration for a couple of months before this issue, only ran apt-get upgrade on a regular basis.

However, now I get the following warnings during boot:

Sep 20 12:39:17 server systemd[1]: Started File System Check on /dev/data/data.
Sep 20 12:39:17 server systemd[1]: Mounting /data...
Sep 20 12:39:17 server systemd[1]: dev-disk-by\x2dlabel-web.device: Dev dev-disk-by\x2dlabel-web.device appeared twice with different sysfs paths /sys/devices/virtual/block/dm-1 and /sys/devices/virtual/block/dm-0
Sep 20 12:39:17 server systemd-fsck[840]: web: clean, 906007/134217728 files, 455468592/536870912 blocks
Sep 20 12:39:17 server systemd[1]: Started File System Check on /dev/data/web.
Sep 20 12:39:17 server systemd[1]: Mounting /web...
Sep 20 12:39:17 server systemd[1]: Mounted /data.
Sep 20 12:39:17 server kernel: EXT4-fs (dm-1): mounted filesystem with ordered data mode. Opts: (null)
Sep 20 12:39:17 server systemd[1]: dev-disk-by\x2dlabel-web.device: Dev dev-disk-by\x2dlabel-web.device appeared twice with different sysfs paths /sys/devices/virtual/block/dm-1 and /sys/devices/virtual/block/dm-0
Sep 20 12:39:17 server systemd[1]: Mounted /web.
Sep 20 12:39:17 server kernel: EXT4-fs (dm-0): mounted filesystem with ordered data mode. Opts: (null)

Benpro (benpro82) wrote :

I wonder if this is due to the use of systemd. As seen on https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=774082 for Debian.

Jarod (jarod42) wrote :

Last night I ran into the same problem. I upgraded from 12.04 LTS to 16.04.1 LTS Server and got stuck at boot.
The last message complained about a UUID not being present. It turned out to be the /usr FS. Doing an "lvm lvscan" from the initrd prompt showed all but one LV inactive; the only active one was bootvg/root.
I then booted via the rescue system and added "lvm vgchange -ay" to /usr/share/initramfs-tools/scripts/local-top/lvm2 right before "exit 0". After running "update-initramfs -k all -c" and rebooting, the server came up again.

The bootvg is on a RAID1 disk controlled via mdadm.

mdadm --detail /dev/md1
/dev/md1:
        Version : 1.2
  Creation Time : Sat Dec 20 16:49:58 2014
     Raid Level : raid1
     Array Size : 971924032 (926.90 GiB 995.25 GB)
  Used Dev Size : 971924032 (926.90 GiB 995.25 GB)
   Raid Devices : 2
  Total Devices : 2
    Persistence : Superblock is persistent

    Update Time : Mon Jan 23 09:50:47 2017
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

           Name : XXX:1
           UUID : xxxxxxxx:xxxxxxxx:xxxxxxxx:xxxx814e
         Events : 21001

    Number Major Minor RaidDevice State
       0 8 19 0 active sync /dev/sdb3
       2 8 3 1 active sync /dev/sda3

pvs
  PV VG Fmt Attr PSize PFree
  /dev/md1 bootvg lvm2 a-- 926.90g 148.90g

The /boot FS is on sda1/sdb1 also via RAID1
sda2 and sdb2 are swap

fdisk -l /dev/sda
Disk /dev/sda: 931.5 GiB, 1000204886016 bytes, 1953525168 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 4096 bytes
I/O size (minimum/optimal): 4096 bytes / 4096 bytes
Disklabel type: dos
Disk identifier: 0x00000000

Device Boot Start End Sectors Size Id Type
/dev/sda1 2048 1026047 1024000 500M fd Linux raid autodetect
/dev/sda2 1026048 9414655 8388608 4G 82 Linux swap / Solaris
/dev/sda3 9414656 1953525167 1944110512 927G fd Linux raid autodetect

lsb_release -rd
Description: Ubuntu 16.04.1 LTS
Release: 16.04

dpkg -l lvm2
ii lvm2 2.02.133-1ubuntu amd64 Linux Logical Volume Manager

Akshay Moghe (akshay-moghe) wrote :

Facing a similar problem on a debootstrap rootfs.

Even after ensuring that the lvm2 package is installed (and hence the initramfs scripts are present), I still get dropped to a shell in the initramfs. Running `lvchange -ay` makes the volume show up, and the boot then succeeds. I presume "fixing" the script (as described in comment-8) would fix the problem, but I'd like to see a fix where I'm not forced to re-roll my own initrd.

Any pointers as to why this might be happening?

Tore Anderson (toreanderson) wrote :

I ran across the same bug. It was caused by the root filesystem being specified on the kernel command line with the root=UUID=<foo> syntax. This is not handled by the case "$dev" in stanza in activate() in /usr/share/initramfs-tools/scripts/local-top/lvm2. See attached screenshot. If I change the kernel command line to say root=/dev/vg0/root instead it works.
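
For illustration only (this is a hypothetical sketch of the general idea, not the patch attached later in this report): when the root device is given as a UUID, no VG/LV name can be derived from it, so an extra branch in that case stanza could simply fall back to activating everything:

# hypothetical extra branch inside the case "$dev" stanza of activate()
UUID=*|/dev/disk/by-uuid/*)
    # a UUID gives us no VG/LV name to activate selectively,
    # so activate every volume group the initramfs can see
    lvm vgchange -ay
    ;;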

Chris Sanders (chris.sanders) wrote :

I've run across this today and it affects MAAS.
MAAS version: 2.2.2 (6099-g8751f91-0ubuntu1~16.04.1)

Configuring an LVM-based drive with a RAID on top of it for the root partition will trigger this. Deploying the default kernel/OS fails due to inactive volume groups.

The fix as expected:
lvm vgchange -ay
mdadm --assemble --scan
exit

Then apply the above-mentioned script to make it stick.

affects: maas (Ubuntu) → maas
Andres Rodriguez (andreserl) wrote :

@Chris,

Can you attach the output of:

maas <user> machine get-curtin-config <systemid>

And attach the following curtin log (you can grab it from the UI under the Installation tab).

Also, this seems to be an issue with Ubuntu more widely.

Curtin is the one that writes this configuration, so I'm marking this as Incomplete for MAAS and opening it against curtin.

Changed in maas:
status: New → Incomplete
Ryan Harper (raharper) on 2017-11-06
Changed in curtin:
status: New → Incomplete
Chris Sanders (chris.sanders) wrote :

The machine I was using has been redeployed without LVM. If I get a chance to redeploy I'll grab the requested logs. It's fairly trivial to trigger if you have a machine available to deploy with lvm boot as described above.

TJ (tj) wrote :

Attached is a patch (generated on 16.04) that activates volume groups when root=UUID=... is on the kernel command-line.

The attachment "activate VGs when root=UUID=" seems to be a patch. If it isn't, please remove the "patch" flag from the attachment, remove the "patch" tag, and if you are a member of the ~ubuntu-reviewers, unsubscribe the team.

[This is an automated message performed by a Launchpad user owned by ~brian-murray, for any issues please contact him.]

tags: added: patch
deehefem (deehefem) wrote :

The patch works fine for me... Kind of odd that it's been two years and it hasn't been rolled into the upgrade. I have 70 machines to patch after upgrading :/

Cameron Paine (cbp) wrote :

This bug report enabled me to recover quickly from a planned upgrade (14.04 -> 16.04) that went south. FWIW I'm able to confirm that it's a live issue.

All of our critical workstations are deployed with LVs on top of md devices. Some, including the one I was upgrading, use md mirrors.

FWIW:

$ cat /proc/cmdline
root=/dev/mapper/sysvg-root ro quiet splash
$ uname -a
Linux lab-netvista 3.13.0-87-generic #133-Ubuntu SMP Tue May 24 18:33:01 UTC 2016 i686 i686 i686 GNU/Linux
$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 16.04.4 LTS
Release: 16.04
Codename: xenial

If there's anything else I can provide to assist in resolution please let me know.

Cameron

Adam Seering (aseering) wrote :

This bug report just enabled me to recover from an upgrade to Ubuntu 18.04.1. So I can confirm that this is still an issue.

Root partition on an LVM volume; LVM physical volume on a software (mdadm) RAID.

The workaround in this comment solved the problem for me:
https://bugs.launchpad.net/ubuntu/+source/lvm2/+bug/1573982/comments/10

Let me know if I can provide any additional useful information.

Thomas Stadler (tomina) wrote :

Could someone please describe how to apply the patch from TJ?

Steve Dodd (anarchetic) wrote :

I'm confused to see no movement on this bug.

The logical thing seemed to be to add another case to /usr/share/initramfs-tools/scripts/local-top/lvm2 that calls lvchange_activate with no parameters, but it seems that doesn't work. Does activation/auto_activation_volume_list perhaps need to be set in lvm.conf?

I decided that giving an explicit root=/dev/vg/lv on the command line was probably more transparent than burying a setting in lvm.conf anyway.
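
For reference, auto_activation_volume_list lives in the activation section of lvm.conf; an illustrative excerpt (the VG/LV names below are only examples) might look like:

# /etc/lvm/lvm.conf (excerpt, illustrative only)
activation {
    # When set, only the listed VGs/LVs are auto-activated (vgchange -aay);
    # when left unset, all volumes are eligible for auto-activation.
    auto_activation_volume_list = [ "vg0", "bootvg/root" ]
}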
