LVM VG is not activated during system boot

Bug #1396213 reported by MegaBrutal on 2014-11-25
This bug affects 27 people
Affects                   Status        Importance  Assigned to  Milestone
One Hundred Papercuts     Fix Released  High        Unassigned
initramfs-tools (Ubuntu)  Fix Released  High        Unassigned
linux (Ubuntu)            Fix Released  High        Unassigned
lvm2 (Ubuntu)             Fix Released  High        Unassigned

Bug Description

Hi all,

I'm opening this report based on the linked conversation I had on the linux-lvm mailing list and the Ask Ubuntu question I posted about this case.
https://www.redhat.com/archives/linux-lvm/2014-November/msg00023.html
https://www.redhat.com/archives/linux-lvm/2014-November/msg00024.html
http://askubuntu.com/questions/542656/lvm-vg-is-not-activated-during-system-boot

I have 2 VGs on my system, and for some reason only one of them gets activated during the initrd boot sequence. The activated VG is not the one that holds my root LV, so the boot sequence halts at an initrd prompt.

When I get to the initrd BusyBox prompt, I can see my LVs are inactive with "lvm lvscan" – then, "lvm vgchange -ay" brings them online. The boot sequence continues as I exit the BusyBox prompt. The expected behaviour would be that both VGs should activate automatically.
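For reference, the manual recovery described above comes down to three commands at the (initramfs) prompt. The helper below is only a sketch, not something shipped in the initrd: the name list_inactive_lvs is made up for illustration, and it assumes awk and tr are available in the environment where it runs. It filters "lvm lvscan" output down to the LVs that are still inactive.

```shell
# Hypothetical helper (not part of lvm2 or initramfs-tools):
# reads `lvm lvscan` output on stdin and prints the device path of
# every LV whose first column is "inactive".
list_inactive_lvs() {
    awk '$1 == "inactive" { print $2 }' | tr -d "'"
}

# Manual recovery at the (initramfs) prompt, as described in the report:
#   lvm lvscan | list_inactive_lvs   # see which LVs are still offline
#   lvm vgchange -ay                 # activate every VG
#   exit                             # let the boot sequence continue
```

If the helper prints nothing, every LV that lvscan reported was already active.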

On the LVM mailing list, I was advised that it may be a problem with the Ubuntu initrd scripts, hence I'm reporting the problem here. (I'm not sure if I'm reporting it to the correct place by assigning it to "linux", but I didn't find a package directly related to the initrd only, so I assumed initrd scripts are maintained by the kernel team. If you think it's an error, and know the correct package, please reassign!) Our suspicion is that the initrd issues "vgchange -ay" prematurely, before all the PVs come online. This makes sense.

I already tried to set the "rootdelay" kernel parameter to make initrd wait longer for the boot LV to come up, but it didn't work. Note however, it worked with previous kernel versions.
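Before blaming the initrd, it may be worth confirming that the rootdelay value really reached the kernel. A minimal sketch follows; the function name rootdelay_from_cmdline is hypothetical, and it simply scans a command line string such as the contents of /proc/cmdline.

```shell
# Hypothetical helper: print the value of rootdelay= from a kernel
# command line passed as the first argument (prints nothing if the
# parameter is absent).
rootdelay_from_cmdline() {
    # $1 is deliberately unquoted so the shell splits it on whitespace.
    for arg in $1; do
        case "$arg" in
            rootdelay=*) printf '%s\n' "${arg#rootdelay=}" ;;
        esac
    done
}

# Usage on a running system or at the (initramfs) prompt:
#   rootdelay_from_cmdline "$(cat /proc/cmdline)"
```

An empty result would mean the parameter was never passed, which is a different problem from the initrd ignoring it.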

The problem came up when I upgraded to Utopic, and got kernel vmlinuz-3.16.0-23-generic. When I boot with the old kernel from Trusty (vmlinuz-3.13.0-37-generic), it works fine.
---
AlsaVersion: Advanced Linux Sound Architecture Driver Version k3.16.0-23-generic.
AplayDevices: Error: [Errno 2] No such file or directory
ApportVersion: 2.14.7-0ubuntu8
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/by-path', '/dev/snd/pcmC1D2p', '/dev/snd/pcmC1D1c', '/dev/snd/pcmC1D0c', '/dev/snd/pcmC1D0p', '/dev/snd/controlC1', '/dev/snd/hwC0D0', '/dev/snd/pcmC0D3p', '/dev/snd/controlC0', '/dev/snd/timer'] failed with exit code 1:
CRDA: Error: [Errno 2] No such file or directory
Card0.Amixer.info: Error: [Errno 2] No such file or directory
Card0.Amixer.values: Error: [Errno 2] No such file or directory
Card1.Amixer.info: Error: [Errno 2] No such file or directory
Card1.Amixer.values: Error: [Errno 2] No such file or directory
DistroRelease: Ubuntu 14.10
HibernationDevice: RESUME=/dev/mapper/vmhost--vg-vmhost--swap0
InstallationDate: Installed on 2013-12-06 (354 days ago)
InstallationMedia: Ubuntu-Server 13.10 "Saucy Salamander" - Release amd64 (20131016)
Lsusb:
 Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
 Bus 002 Device 001: ID 1d6b:0001 Linux Foundation 1.1 root hub
MachineType: WinFast 6150M2MA
Package: linux (not installed)
ProcEnviron:
 LANGUAGE=en_US:en
 TERM=xterm-256color
 PATH=(custom, no user)
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcFB: 0 radeondrmfb
ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-3.16.0-23-generic root=/dev/mapper/hostname--vg-hostname--rootfs ro rootflags=subvol=@ rootdelay=300
ProcVersionSignature: Ubuntu 3.16.0-23.31-generic 3.16.4
RelatedPackageVersions:
 linux-restricted-modules-3.16.0-23-generic N/A
 linux-backports-modules-3.16.0-23-generic N/A
 linux-firmware 1.138
RfKill: Error: [Errno 2] No such file or directory
Tags: utopic utopic
Uname: Linux 3.16.0-23-generic x86_64
UnreportableReason: The report belongs to a package that is not installed.
UpgradeStatus: Upgraded to utopic on 2014-10-28 (28 days ago)
UserGroups:

_MarkForUpload: True
dmi.bios.date: 01/19/2008
dmi.bios.vendor: Phoenix Technologies, LTD
dmi.bios.version: 686W1D28
dmi.board.name: 6150M2MA
dmi.board.vendor: WinFast
dmi.board.version: FAB2.0
dmi.chassis.type: 3
dmi.chassis.vendor: WinFast
dmi.modalias: dmi:bvnPhoenixTechnologies,LTD:bvr686W1D28:bd01/19/2008:svnWinFast:pn6150M2MA:pvrFAB2.0:rvnWinFast:rn6150M2MA:rvrFAB2.0:cvnWinFast:ct3:cvr:
dmi.product.name: 6150M2MA
dmi.product.version: FAB2.0
dmi.sys.vendor: WinFast
---
ApportVersion: 2.14.7-0ubuntu8
Architecture: amd64
DistroRelease: Ubuntu 14.10
InstallationDate: Installed on 2013-12-06 (355 days ago)
InstallationMedia: Ubuntu-Server 13.10 "Saucy Salamander" - Release amd64 (20131016)
Package: linux (not installed)
ProcEnviron:
 LANGUAGE=en_US:en
 TERM=linux
 PATH=(custom, no user)
 LANG=en_US.UTF-8
 SHELL=/bin/bash
Tags: utopic
Uname: Linux 3.18.0-031800rc6-generic x86_64
UnreportableReason: The running kernel is not an Ubuntu kernel
UpgradeStatus: Upgraded to utopic on 2014-10-28 (29 days ago)
UserGroups:

_MarkForUpload: True

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 1396213

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
tags: added: utopic
MegaBrutal (qbu6to) on 2014-11-25
Changed in linux (Ubuntu):
status: Incomplete → Confirmed
MegaBrutal (qbu6to) wrote :

I can't run apport-collect because Lynx doesn't send HTTP referrer to Launchpad, and I don't wish to go through the ordeal of finding out how to change this behaviour.

Furthermore, I'm not sure what logs would be collected. I can provide logs you need manually, if you ask for them.

MegaBrutal, could you please provide the requested information following https://help.ubuntu.com/community/ReportingBugs#Filing_bugs_when_offline_or_using_a_headless_setup ?

tags: added: regression-release
Changed in linux (Ubuntu):
importance: Undecided → Medium
status: Confirmed → Incomplete
MegaBrutal (qbu6to) wrote :

I've generated the report and I'm trying to use what the linked tutorial suggests:

$ ubuntu-bug -c 1396213.apport -u 1396213
Usage: ubuntu-bug [options] [symptom|pid|package|program path|.apport/.crash file]

ubuntu-bug: error: -u/--update-bug option cannot be used together with options for a new report

That tutorial must be outdated, and I couldn't figure out how the syntax changed to make it send the report and file it under this existing bug. I get the same with "apport-cli". What is the correct command to use if I have the report file and want it attached here?

apport information

tags: added: apport-collected
description: updated

MegaBrutal, could you please test the latest upstream kernel available from the very top line at the top of the page (the release names are irrelevant for testing, and please do not test the daily folder) following https://wiki.ubuntu.com/KernelMainlineBuilds ? It will allow additional upstream developers to examine the issue.

If you were unable to test for the issue on that kernel (e.g. you couldn't boot into the OS), please make a comment in your report about this, and continue with the next most recent kernel version until you can test for the issue. Once you've tested the upstream kernel, please comment on which kernel version specifically you tested. If this bug is fixed in the mainline kernel, please add the following tags:
kernel-fixed-upstream
kernel-fixed-upstream-VERSION-NUMBER

where VERSION-NUMBER is the version number of the kernel you tested exactly shown as:
kernel-fixed-upstream-3.18-rc6

This can be done by clicking on the yellow circle with a black pencil icon next to the word Tags located at the bottom of the bug description.

If the mainline kernel does not fix this bug, please add the following tags:
kernel-bug-exists-upstream
kernel-bug-exists-upstream-VERSION-NUMBER

Once testing of the upstream kernel is complete, please mark this bug's Status as Confirmed. Please let us know your results. Thank you for your understanding.

MegaBrutal (qbu6to) on 2014-11-26
description: updated
MegaBrutal (qbu6to) on 2014-11-26
tags: added: kernel-bug-exists-upstream kernel-bug-exists-upstream-3.18-rc6
MegaBrutal (qbu6to) wrote :

I experience the same symptoms with the mainline kernel.

Note, if the problem is really with initrd scripts, then probably the kernel version doesn't matter much. On the other hand, as I see, lvm2 initrd scripts haven't changed since Trusty, which would suggest it's a kernel bug, after all. Hard to pinpoint.

Either way, I suspect one of my hard disks (/dev/sda, Maxtor) initializes late, and that might cause the problem. Still interesting that older kernels bother to wait for it and properly activate the vmhost-vg.

Changed in linux (Ubuntu):
status: Incomplete → Confirmed

MegaBrutal, just to advise, apport-collect'ing on the mainline kernel doesn't provide any helpful information. As well, you don't have to apport-collect further on this report unless asked.

Despite this, the next step is to fully commit bisect from kernel 3.13.0-37 to 3.16.0-23 in order to identify the last good kernel commit, followed immediately by the first bad one. This will allow for a more expedited analysis of the root cause of your issue. Could you please do this following https://wiki.ubuntu.com/Kernel/KernelBisection ? Please note, finding adjacent kernel versions is not fully commit bisecting.

Thank you for your understanding.

Helpful bug reporting tips:
https://wiki.ubuntu.com/ReportingBugs

tags: added: needs-bisect
Changed in linux (Ubuntu):
status: Confirmed → Incomplete

apport information

description: updated

MegaBrutal, just to reinforce https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1396213/comments/22 you do not have to do any more apport-collect'ing at this time.

description: updated
MegaBrutal (qbu6to) wrote :

Yeah, I was in a hurry and didn't have the time to give you the reason why I did this.

I sent the latest apport from a kernel with which I don't experience the issue. The difference is that I see these messages:

[ 5.527952] bio: create slab <bio-1> at 1
[ 151.415509] bio: create slab <bio-2> at 2
[ 161.717873] bio: create slab <bio-3> at 3
[ 162.277013] bio: create slab <bio-4> at 4

When the last of these appears, boot immediately continues correctly. Not sure if it's of any relevance.

Anyway, today I've tried the following mainline kernels:
3.13.0-031300
3.14.0-031400
3.15.0-031500
3.16.0-031600

What surprised me is that even with 3.13.0-031300 I experienced the problem. I'm kind of confused now. I'll continue investigation tomorrow. I will do kernel bisection as you requested, but first I'd like to make sure the problem is not with initrd-s. I don't know what to do if it turns out my initrd-s are the culprits. The only working kernel I have (vmlinuz-3.13.0-37-generic) has its initrd generated from the old Trusty times. It seems every initrd I generate with any new installed kernel with Utopic, has the problem.

I won't send more apports, sorry for the inconvenience.

MegaBrutal (qbu6to) wrote :

I've had a backup of my old Trusty system, and I've built an initrd for 3.16.0-23-generic in that environment. With that initrd, the kernel booted properly. Note, still it takes several minutes for the VG to activate, but it eventually comes up if the kernel waits enough with "rootdelay=300".

Now, kernel bisection would get me nowhere, as it seems the problem is not with the kernel itself. Any initrd I build under Utopic has the symptom, even with older kernels from the 3.13 series.

Could you advise how I could investigate further? I have no idea how I could debug/bisect initrd-s. As far as I can tell, the lvm2 package hasn't changed between Trusty and Utopic, so it's not lvm2's initrd scripts themselves.

MegaBrutal, you may want to ping an lvm2 maintainer about this. As per https://launchpad.net/ubuntu/+source/lvm2/+changelog the most recent person to commit to this is Dave Chiluk <email address hidden> .

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in lvm2 (Ubuntu):
status: New → Confirmed

@ Christopher M. Penalver:

Because this bug renders the system temporarily unusable, it looks like it should have an importance of "critical".

Changed in lvm2 (Ubuntu):
importance: Undecided → Medium
Changed in hundredpapercuts:
status: New → Confirmed
importance: Undecided → Medium

Alberto Salvia Novella, thanks for the QA, and helpful suggestion. From my understanding, the Ubuntu Kernel Team prefers to mark reports like these High (not sure why I marked it Medium initially, probably was on auto-pilot) as defined in https://wiki.ubuntu.com/Bugs/Bug%20importances :
"Makes a default Ubuntu installation generally unusable for some users ... if the system fails to boot ... on a certain make and model of computer"

If this was reproducible across multiple hardware types, and affected a large group of users, I would agree it better fits the criteria of Critical.

Despite this, let this be marked High across all the marked packages, and the UKT/root package maintainers may mark it further as they see fit.

Changed in linux (Ubuntu):
importance: Medium → High
Changed in lvm2 (Ubuntu):
importance: Medium → High
MegaBrutal (qbu6to) wrote :

Now I have some more info about this. What actually makes the VG activation take so long is that I have a snapshot. Activating the snapshot takes very long, and bringing up the entire VG takes about 5 minutes. This wouldn't be such a big problem, as I could just patiently wait for the activation (with rootdelay). But I think the problem is that something kills vgchange before it can finish bringing up all VGs. I had the fortune to boot a development version of Vivid, and I saw some 'watershed' messages stating that 'vgchange' was killed because it was taking "too long". If 'vgchange' were allowed to finish properly, I would have the 2nd VG, which contains my root FS.

MegaBrutal (qbu6to) on 2015-03-19
Changed in initramfs-tools (Ubuntu):
status: New → Confirmed
Changed in initramfs-tools (Ubuntu):
importance: Undecided → High
Changed in hundredpapercuts:
importance: Medium → High
MegaBrutal (qbu6to) wrote :

With the knowledge that the hang is caused by snapshots, some googling has brought up some duplicates:
https://bugs.launchpad.net/lvm2/+bug/360237
https://bugs.launchpad.net/ubuntu/+source/lvm2/+bug/995645

It seems the issue has been around since 2009 or earlier.

In Trusty (or probably even earlier distros) it was possible to work around it by setting a sufficiently long rootdelay, and eventually all the LVs in all VGs came online. But since Utopic, the initrd gives up altogether. No matter how long my rootdelay is, my 2nd VG never gets activated after the snapshot has come online in the 1st VG.

MegaBrutal (qbu6to) wrote :

Got an interesting reply from the Red Hat LVM mailing list:
https://www.redhat.com/archives/linux-lvm/2015-March/msg00022.html

Haven't tested the suggestion yet.

MegaBrutal (qbu6to) wrote :

Turns out, the suggested --setactivationskip option is not present in Utopic, as Utopic comes with an LVM version which dates back to 2012 (2.02.98(2) (2012-10-15)), while the feature was implemented in 2013.

Vivid will come with a newer LVM: 2.02.111(2) (2014-09-01).

Activation skip could be a nice workaround if it were backported.
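For anyone wanting to try this on a newer LVM, here is a rough sketch of how the workaround might look. The helper name supports_activation_skip and the LV name vmhost-vg/my-snapshot are placeholders, not taken from my system; the helper only checks whether the installed lvchange advertises the option in its help text.

```shell
# Hypothetical helper: succeeds if the help text on stdin mentions
# the --setactivationskip option.
supports_activation_skip() {
    grep -q -e '--setactivationskip'
}

# Usage (run as root; the LV name is a placeholder):
#   if lvm lvchange --help 2>&1 | supports_activation_skip; then
#       # Flag the snapshot so a plain "vgchange -ay" skips it.
#       lvm lvchange --setactivationskip y vmhost-vg/my-snapshot
#   fi
#
# A flagged LV can still be activated explicitly later with
# -K / --ignoreactivationskip:
#   lvm lvchange -ay -K vmhost-vg/my-snapshot
```

The idea is that the slow snapshot no longer holds up activation during boot, while it can still be brought online by hand once the system is up.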

Astara (astara) wrote :

I don't get this bug.

I have at least 1 snapshot going on my "/home" partition all the time.

The VG that /home is in contains most of my partitions (26), with
2 more partitions on a separate (VG+PD's) VG.

Now, I've noticed when I am booting, it *does* take a bit of time to
bring up and mount all of the LVs, but you can see the root mount is NOT
in a VG/LV -- It's on a "regular device" (numbers on left are w/kernel time
printing turned on -- so they are in seconds after boot):

[ 4.207621] XFS (sdc1): Mounting V4 Filesystem
[ 4.278746] XFS (sdc1): Starting recovery (logdev: internal)
[ 4.370757] XFS (sdc1): Ending recovery (logdev: internal)
[ 4.379839] VFS: Mounted root (xfs filesystem) on device 8:33.
..
[ 4.449462] devtmpfs: mounted
... last msg before my "long pause" where pretty much everything
get activated:
[ 4.591580] input: Dell Dell USB Keyboard as /devices/pci0000:00/0000:00:1a.7/usb1/1-3/1-3.2/1-3.2:1.0/0003:413C:2003.0002/input/input4
[ 4.604588] hid-generic 0003:413C:2003.0002: input,hidraw1: USB HID v1.10 Keyboard [Dell Dell USB Keyboard] on usb-0000:00:1a.7-3.2/input0
[ 19.331731] showconsole (170) used greatest stack depth: 13080 bytes left
[ 19.412411] XFS (sdc6): Mounting V4 Filesystem
[ 19.505374] XFS (sdc6): Ending clean mount
.... more mostly unrelated messages... then you start seeing "dm's" mixed in
with the mounting messages -- just before kernel logging stops:

[ 22.205351] XFS (sdc2): Mounting V4 Filesystem
[ 22.205557] XFS (sdc3): Mounting V4 Filesystem
[ 22.216414] XFS (dm-5): Mounting V4 Filesystem
[ 22.217893] XFS (dm-6): Mounting V4 Filesystem
[ 22.237345] XFS (dm-1): Mounting V4 Filesystem
[ 22.245201] XFS (dm-8): Mounting V4 Filesystem
[ 22.267971] XFS (dm-13): Mounting V4 Filesystem
[ 22.293152] XFS (dm-15): Mounting V4 Filesystem
[ 22.299737] XFS (sdc8): Mounting V4 Filesystem
[ 22.340692] XFS (sdc2): Ending clean mount
[ 22.373169] XFS (sdc3): Ending clean mount
[ 22.401381] XFS (dm-5): Ending clean mount
[ 22.463974] XFS (dm-13): Ending clean mount
[ 22.474813] XFS (dm-1): Ending clean mount
[ 22.494807] XFS (dm-8): Ending clean mount
[ 22.505380] XFS (sdc8): Ending clean mount
[ 22.544059] XFS (dm-15): Ending clean mount
[ 22.557865] XFS (dm-6): Ending clean mount
[ 22.836244] Adding 8393924k swap on /dev/sdc5. Priority:-1 extents:1 across:8393924k FS
Kernel logging (ksyslog) stopped.
Kernel log daemon terminating.
-----
A couple of things different about my setup from the 'norm' --
1) since my distro(openSuSE) jumped to systemd, (and I haven't), I had to write some
rc scripts to help bring up the system.
2) one reason for this was my "/usr" partition is separate from root and
my distro decided to move many libs/bins ->usr and leave symlinks on the
root device to the programs in /usr. One of those was 'mount' (and its associated libs).

That meant that once the rootfs was booted I had no way to mount /usr, where most
of the binaries are (I asked why they didn't do it the "safe way" and move most
of the binaries to /bin & /lib64 and put symlinks in /usr but they evaded answering that question for ~2 years). So one script I run aft...

MegaBrutal (qbu6to) wrote :

@Astara:
> Now, I've noticed when I am booting, it *does* take a bit of time to
> bring up and mount all of the LVs, but you can see the root mount is NOT
> in a VG/LV -- It's on a "regular device" (numbers on left are w/kernel time
> printing turned on -- so they are in seconds after boot):
>
> [ 4.207621] XFS (sdc1): Mounting V4 Filesystem
> [ 4.278746] XFS (sdc1): Starting recovery (logdev: internal)
> [ 4.370757] XFS (sdc1): Ending recovery (logdev: internal)
> [ 4.379839] VFS: Mounted root (xfs filesystem) on device 8:33.

If I understand you correctly, you have your root FS on a regular partition, thus you are not affected by this bug, as partitions don't need to be "activated". You need to have your root FS on an LV to be affected.

> if test -d /etc/lvm -a -x /sbin/vgscan -a -x /sbin/vgchange ; then
> # Waiting for udev to settle
> if [ "$LVM_DEVICE_TIMEOUT" -gt 0 ] ; then
> echo "Waiting for udev to settle..."
> /sbin/udevadm settle --timeout=$LVM_DEVICE_TIMEOUT
> fi
> echo "Scanning for LVM volume groups..."
> /sbin/vgscan --mknodes
> echo "Activating LVM volume groups..."
> /sbin/vgchange -a y $LVM_VGS_ACTIVATED_ON_BOOT
> mount -c -a -F

Well, this is some interesting stuff. At first I thought you quoted it from an initrd script, but then I read you don't use an initrd. So where does this script come from? Did you write it, or is it shipped with the distro? Maybe the timeouts can be tweaked to allow a longer activation time. It would also be sane to activate only the root VG at boot time and then activate the data VG later (probably asynchronously), after the root LV has been successfully mounted.
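To make the "root VG first" idea concrete, here is a rough sketch. It is not taken from any existing initrd script; the function name vg_from_dm_name is hypothetical. It relies on the device-mapper naming convention where hyphens inside VG and LV names are doubled and a single hyphen separates the two, as in the root= value from the kernel command line in the description.

```shell
# Hypothetical helper: derive the VG name from a device-mapper name
# such as "hostname--vg-hostname--rootfs".
vg_from_dm_name() {
    # 1. Hide doubled hyphens behind "/" (never valid in VG/LV names).
    # 2. Cut everything from the first remaining (separator) hyphen.
    # 3. Turn the placeholders back into single hyphens.
    printf '%s\n' "$1" | sed -e 's|--|/|g' -e 's|-.*||' -e 's|/|-|g'
}

# Sketch of a two-stage activation:
#   root_vg=$(vg_from_dm_name "hostname--vg-hostname--rootfs")
#   lvm vgchange -ay "$root_vg"   # only the VG needed to mount /
#   ...mount the root FS and continue booting...
#   vgchange -ay                  # remaining VGs, later on
```

With this ordering, a slow snapshot in the data VG would delay only the second step, not the mounting of the root FS.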

I have this problem as well on my 14.04.2 Dom0, cloud1. The VG that holds all VMs does not come up active on boot, so no VMs set to autostart are started. I have another 14.04.2 machine on matching hardware, configured very similarly to cloud1 but without LVM snapshots. It reboots and starts up the VMs properly. This seems to basically mean that servers with LVM snapshots cannot be counted on to boot properly and activate the VG. Please reply and let me know if this understanding is incorrect.

Changed in initramfs-tools (Ubuntu):
status: Confirmed → Fix Released
Changed in linux (Ubuntu):
status: Confirmed → Fix Released
Changed in lvm2 (Ubuntu):
status: Confirmed → Fix Released
Changed in hundredpapercuts:
status: Confirmed → Fix Released
igor (igccpao) wrote :

I'm not sure, but it looks like the problem I have here on 14.04 is the same one:

Gave up waiting for root device. Common problems:
 - Boot args (cat /proc/cmdline)
   - Check rootdelay= (did the system wait long enough?)
   - Check root= (did the system wait for the right device?)
 - Missing modules (ls /dev)
ALERT! /dev/mapper/ubuntu-vg-root does not exist.
Dropping to a shell.
BusyBox v1.21.1 (Ubuntu 1:1.21.0-1ubuntu1) built-in shell (ash)
Enter 'help' for a list of built-in commands.
(initramfs)

On a live CD:

ubuntu@ubuntu:~$ sudo lvm lvscan
  ACTIVE '/dev/ubuntu-vg2/root' [929.36 GiB] inherit

ubuntu@ubuntu:~$ sudo mkdir /newroot
ubuntu@ubuntu:~$ sudo mount /dev/ubuntu-vg2/root /newroot
mount: special device /dev/ubuntu-vg2/root does not exist
