[feisty] mounting LVM root broken

Bug #83832 reported by Aurelien Naldi
Affects: lvm2 (Ubuntu)
Status: Expired
Importance: Undecided
Assigned to: Unassigned

Bug Description

Binary package hint: lvm2

Running an up-to-date feisty, I cannot boot from my RAID/LVM root partition. I previously had bug #75681, which seems fixed now (/dev/md0 is created automatically again), but the LVM volumes are no longer activated by the initramfs.
In the initramfs shell I can no longer find "/scripts/local-top/lvm".

adding "break=mount" to the kernel options to get a shell, then running:

/scripts/local-top/mdadm
lvm vgscan
lvm vgchange -a y

does activate the RAID and the LVM volumes; I get the right files in /dev/mapper, but I cannot boot further. When leaving the busybox shell I get some messages, including:
"/init:1:cannot open /dev/mapper/vg_... : No such device or address"
and I am then dropped back to a busybox shell.
Mounting the volume by hand fails with the same message.

On another install (also feisty, but a bit less up to date), I can mount the same volume without a problem. As many updates came in today for LVM/udev/initramfs, I ran an update via chroot, but the problem is still there.

Any hint on how to boot it by hand in the meantime?
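For reference, a minimal sketch of the full by-hand sequence from the busybox shell. The volume group/volume names (vg_root/lv_root) and the filesystem type are placeholders, and the explicit -t on the mount is the detail that later comments identify as the missing piece:

  /scripts/local-top/mdadm      # assemble the RAID array
  lvm vgscan                    # find the volume groups on it
  lvm vgchange -a y             # activate the logical volumes
  mount -t reiserfs /dev/mapper/vg_root-lv_root /root
  exit                          # leave the shell; init continues with /root mounted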

Revision history for this message
Ian Jackson (ijackson) wrote : Re: [Bug 83832] [feisty] mounting LVM root broken

Aurelien Naldi writes ("[Bug 83832] [feisty] mounting LVM root broken"):
> running an up to date feisty, I can not boot on my RAID/LVM root
> partition. I previously had bug #75681 which seems fixed now
> (/dev/md0 is created automatically again) but the LVM volumes are
> not activated by the initramfs anymore. In the initramfs shell I
> can not find "/scripts/local-top/lvm" anymore.

We have indeed changed this but it is supposed to work better now and
not break :-).

I have some questions:

 * Which versions of
     lvm2
     lvm-common
     udev
     libdevmapper1.02
     devmapper
   do you have installed ?

 * While it's broken, boot with break=premount, and do this:
 udevd --verbose >/tmp/udev-output 2>&1 &
 udevtrigger
   At this point I think you will find that your mds and lvms
   are not activated; check with
 cat /proc/partitions
   If as I assume they aren't:
 pkill udevd
 lvm vgscan
 lvm vgchange -a y
 mount /dev/my-volume-group/my-volume-name /root

   If as you say this doesn't work, check the major and minor numbers
   and symlinks shown by
        ls -al /dev/mapper
        ls -al /dev/.static/dev/mapper
        ls -al /dev/my-volume-group
        ls -al /dev/.static/dev/my-volume-group
        dmsetup ls
   and the output from
        dmsetup table
   and perhaps
 mount /dev/.static/dev/mapper/vg-lv /root

   When you've got it mounted
        cp /tmp/udev-output* /root/root/.
        exit
   And then when the system boots attach /root/udev-output* and your
   initramfs to this bug report.

 * Does running
     sudo update-initramfs -u
   fix it ? (If you didn't attach it to this bug report as I ask
   above please keep a copy of the old initramfs so we can peer
   at its entrails.)

Thanks,
Ian.

Revision history for this message
Aurelien Naldi (aurelien.naldi) wrote :

ii lvm-common 1.5.20ubuntu11 The Logical Volume Manager for Linux (common
ii lvm2 2.02.06-2ubuntu8 The Linux Logical Volume Manager
ii udev 103-0ubuntu11 rule-based device node and kernel event mana
ii libdevmapper1.02 1.02.08-1ubuntu4 The Linux Kernel Device Mapper userspace lib
ii mdadm 2.5.6-7ubuntu3 tool to administer Linux MD arrays (software

No newer versions are available yet.

I do not have devmapper installed, and none is available (my other, working, feisty install does not have devmapper either).

It looks like I have another problem: after running udevtrigger, /dev/md0 was _not_ correctly assembled (it was last time). It had been assembled with only two disks (out of four) detected. After stopping md0 and running /scripts/local-top/mdadm, it was back and working.

Running lvm by hand seemed to work, but I still could not mount my root partition; this time mount complained:
"failed: Invalid argument"

As I could not find /dev/.static, I cannot compare the majors/minors of /dev/mapper/* :/
I had my 3 LVM volumes as (254,[0-2]). After booting into my working install I see (253,[0-2]) here, so this might be the problem, right?

I tried to mount my /boot partition to copy the output files, but this failed as well, so I can only attach the faulty initramfs.

Rebuilding the initramfs did not help (it was rebuilt due to an update of usplash, done in a chroot).

I see two problems now:
* /dev/md0 is assembled too early (out of 3 boots, one worked, one had a single device in the RAID volume, and one had two devices); see my comment in bug #75681.
* I am unable to mount ANYTHING in the initramfs, maybe due to a wrong major number? I guess I can mknod my devices in the initramfs, but I am a little worried that the boot will not go well if udev mangles the major numbers :/ (see the sketch below)
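A minimal sketch of creating the nodes by hand, assuming the (254,[0-2]) numbers reported above; the volume names (vg_root/lv_root) and the filesystem type are placeholders:

  dmsetup ls                                   # lists each mapped volume with its major:minor pair
  mknod /dev/mapper/vg_root-lv_root b 254 0    # recreate a node for a volume reported at (254,0)
  mount -t reiserfs /dev/mapper/vg_root-lv_root /root   # the filesystem type is still needed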

Revision history for this message
Aurelien Naldi (aurelien.naldi) wrote :

Creating the block devices by hand did not help, and the major/minor numbers for /dev/sda* are the same in the initramfs as on my working system.

This looks pretty annoying :/
This initramfs was created in a chroot; could the problem lie there?

My working system is a 64-bit install, while the broken one is 32-bit.
I mounted my root partition and its /boot, then chrooted into it and mounted /proc.
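A minimal sketch of that chroot procedure from a rescue environment; the device names and mount points are placeholders:

  mount /dev/mapper/vg_root-lv_root /mnt     # the LVM root volume
  mount /dev/sda1 /mnt/boot                  # the separate /boot partition
  chroot /mnt /bin/sh
  mount -t proc proc /proc                   # inside the chroot
  update-initramfs -u                        # regenerate the initramfs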

Revision history for this message
Aurelien Naldi (aurelien.naldi) wrote :

OK, now I have a problem :/
I wanted to recreate the initramfs outside of a 64-bit install, so I fired up an edgy desktop CD: it does not support RAID/LVM anymore.
The dapper CD worked fine and I recreated the initramfs. Upon reboot, it is still broken.

The actual problem is that I tried to add other devices to the broken RAID array. They were added as spare drives and their superblock data got updated. Now I cannot assemble the array anymore.

/dev/sdb3 is correctly detected but the three other devices are seen as spare drives.

It should be possible to restore the superblocks, as the one on /dev/sdb3 is still valid and contains the order of all devices in the array, but I am not sure how to do this. Any pointers?
Except for the RAID superblocks, all data in the array should still be in sync, so it should be fixable, hopefully :)
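A rough sketch of the kind of commands involved; the RAID level, device count and device list here are placeholders and would have to match exactly what the surviving superblock reports:

  mdadm --examine /dev/sdb3     # dump the surviving superblock: level, layout, device order
  # recreate the array with exactly the original parameters and device order;
  # the data survives only if these match the --examine output
  mdadm --create /dev/md0 --level=5 --raid-devices=4 \
      /dev/sda3 /dev/sdb3 /dev/sdc3 /dev/sdd3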

Revision history for this message
Aurelien Naldi (aurelien.naldi) wrote :

Just to let you know, recreating the RAID array in the same order with the same options scared me but did work. Everything seems to be "fine"; I am done playing with it for now and will wait for your analysis of the initramfs before doing more tests. Sorry for tonight's flood :)

Revision history for this message
Aurelien Naldi (aurelien.naldi) wrote :

Today I managed to boot this system again, so here are some updates:
* the mount error (Invalid argument) was caused by a missing "-t reiserfs" (or "-t ext2" for my /boot partition)
* with "break=premount", the RAID array was correctly assembled in two out of 4 tries, in which case the LVM volumes were correctly detected as well
* none of my "normal" boots (i.e. without break=[pre]mount) could assemble the array

I attach the output of udevd, corresponding to a boot where the RAID array was assembled in degraded mode (a resync is ongoing right now).

Revision history for this message
Ian Jackson (ijackson) wrote : Re: [Bug 83832] Re: [feisty] mounting LVM root broken

I think I have fixed the bug that causes the assembly of the RAID
arrays in degraded mode, in mdadm_2.5.6-7ubuntu4 which I have just
uploaded.

Please let me know if it works ...

Thanks,
Ian.

Revision history for this message
Jonathan Hudson (jh+lpd) wrote :

I think I've been hit by something like this too. Having also thought that the raid1 stuff was fixed, I dist-upgraded last night, shut down, and today find the box broken beyond my ability even to get an initramfs shell.

With the following (which booted on Friday without any break=):

root=/dev/mapper/vg00/root break=premount

....
VFS: Cannot open root device on mapper/vg00-root or unknown block 0,0)
Please append a correct "root=" boot option
Panic
....

Any hints on how to get the box booting again are highly welcome.

Revision history for this message
Soren Hansen (soren) wrote :

On Sat, Feb 17, 2007 at 01:32:42PM -0000, jh wrote:
> root=/dev/mapper/vg00/root break=premount

You should most likely change that to /dev/mapper/vg00-root . I'm not
saying that will fix all of it, but it will definitely not work without
that change.

Cheers.

Revision history for this message
Jonathan Hudson (jh+lpd) wrote :

Apologies, my typo (as the quote from the boot log indicates).

The boot line is root=/dev/mapper/vg00-root, and the box is still broken into panic-stricken non-bootability; I'd still be grateful for further assistance in getting it to boot again.

-jonathan

Revision history for this message
John Affleck (jraffleck) wrote :

Is there any progress on this? I have the exact same situation, where I need to supply break=mount and lvm vgscan/lvm vgchange -a y to boot. I saw this on an edgy->feisty upgraded machine, then did a clean install of feisty elsewhere (both have /boot as a 'normal' partition and LVM for everything else).

Doing a clean install of feisty on the original machine (the upgraded one that fails to boot) didn't help.
     lvm2 -> 2.02.06-2ubuntu9
     lvm-common -> 1.5.20ubuntu12
     udev -> 108-0ubuntu4
     libdevmapper1.02 -> not installed
     devmapper -> not installed

udev output attached.

Revision history for this message
John Affleck (jraffleck) wrote :

Oops.
     libdevmapper1.02 -> 2:1.02.08-1ubuntu10

Revision history for this message
Travis Tabbal (travis-tabbal) wrote :

I am also having this problem. Thanks for the bug report, BTW; I was finally able to boot my server with the info here. I have to boot with break=mount, then do lvm vgscan/lvm vgchange -ay, then CTRL-D, and bootup goes fine. If I don't break on the LILO boot, it freezes after assembling the RAID arrays. That always seems to be the case; I haven't had any problems with degraded arrays and the like. Just LVM.

Is there somewhere in the scripts I can put the lvm commands to get this to at least boot automatically? I am not that familiar with the init scripts for the initrd.
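A rough sketch of one way to do that with a custom initramfs-tools hook; the script name is hypothetical, and the PREREQ value assumes the stock mdadm script that feisty's initramfs already runs:

  #!/bin/sh
  # saved as /etc/initramfs-tools/scripts/local-top/lvm-activate (hypothetical name)
  PREREQ="mdadm"                 # run after the RAID arrays are assembled
  prereqs() { echo "$PREREQ"; }
  case "$1" in
      prereqs) prereqs; exit 0 ;;
  esac
  # activate every LVM volume group found on the assembled arrays
  lvm vgscan
  lvm vgchange -a y

Mark it executable and rebuild the initramfs afterwards:

  chmod +x /etc/initramfs-tools/scripts/local-top/lvm-activate
  update-initramfs -u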

I am also an Edgy->Feisty upgrade, AMD64 running in 32-bit mode.

Running update-initramfs does not help.

Running LILO, I get this error (I noticed it because update-initramfs runs lilo):
----------
Warning: '/proc/partitions' does not match '/dev' directory structure.
    Name change: '/dev/dm-0' -> '/dev/.static/dev/vol1/swap'
    The kernel was compiled without DEVFS, but the '/dev' directory structure
        implements the DEVFS filesystem.
----------

I don't know if it is related. LILO continues and seems to work fine otherwise.

I did an apt-get update; apt-get upgrade and everything installed fine.

Revision history for this message
Joe Fry (joe-thefrys) wrote :

Same story here.

This machine was dist-upgraded from dapper -> edgy -> feisty... everything worked fine in edgy AFAIK... I never reboot the server unless I do a kernel update, which I avoid because I need to recompile IVTV modules and all of that crap for my mythbackend.

I was able to get the machine to boot by adding the break=mount option to the kernel line during the grub boot, pressing Ctrl-C after the RAID arrays are built, then doing the LVM commands mentioned above and Ctrl-D to continue. I also had to remove the "savedefault" line, or I would get an "ERROR 15: File not found" during boot (which I didn't get before).

I wasn't able to boot any of the previous kernels without breaks either... which seems odd as they booted fine before, and AFAIK nothing changed in their initramfs files.

I also noticed that some of my mounts were not made... specifically /home (XFS on VG-HomeLV, though this could be because fsck errored on this drive?) and I can't mount /boot... something to do with the UUID ("special device /dev/disk/by-uuid/f9f974b1-360a-4d97-9308-2d50f3665c21 does not exist") which seems pretty odd to me. Though I am sure I could correct that, it's strange that the UUID would not exist or have changed.

Please advise on what I should do now... I am not afraid to run Gutsy if it will solve this problem.

Thanks,

Joe

Revision history for this message
Joe Fry (joe-thefrys) wrote :

Sorry... I fixed my problem... though it was likely the same problem (symptoms and workaround were the same), I had forgotten that I had moved / to a separate disk when I ran into problems with LVM in the past... unfortunately I hadn't changed the kopt=root line in my menu.lst to reflect the new location... so everything was changed to point to /dev/mapper/vg-rootlv. Not to mention that I had left the old rootlv in place, so it still tried to boot with it.

As soon as I made the changes to my menu.lst, unsurprisingly, it booted fine... of course my / wasn't on LVM anymore. I knew I had experienced problems with / on LVM before and worked around it... I can't believe I had forgotten something so obvious!
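For reference, the grub-legacy bits involved look roughly like this; the device is a placeholder, and the leading '#' on the kopt line is intentional (update-grub parses it when regenerating the kernel stanzas):

  # kopt=root=/dev/sda5 ro

  update-grub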

That's the one thing I hate about linux servers... because things break so rarely, and because any smart admin knows to not futz with a working server, it's easy to forget what you have done to get the box to where it is!

Revision history for this message
Aleksandr Koltsoff (czr) wrote :

Had the same problem. I updated from edgy to feisty (I was running a custom-built 2.6.18 kernel, but with an initramfs generated by the stock edgy process). After the update, the system got stuck on kernel messages.

Using lilo here (x86-64; for some reason I had problems with grub originally and lilo just works), with LVM over an MD setup.

The fix was to forcefully append the root command-line parameter for the kernel.
For some reason lilo didn't honor root=/dev/mapper/vg0-root (notice only one zero here) at all.

Added this into the image=/vmlinuz config:
  append="root=/dev/mapper/vg0-root"

(then ran lilo, obviously, to update it), and now it works properly.
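For context, the relevant lilo.conf stanza would look roughly like this (kernel and initrd paths are placeholders), followed by rerunning lilo:

  image=/vmlinuz
      label=Linux
      initrd=/initrd.img
      append="root=/dev/mapper/vg0-root"
      read-only

  lilo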

I guess this is not directly related to all of the above problems people were experiencing, but you might want to try this as well, IFF using lilo and no other obvious culprits seem to be the problem.

Revision history for this message
John Affleck (jraffleck) wrote :

So an upgrade to Gutsy did not fix it, and I still had to do break=mount with lvm vgscan && lvm vgchange -a y to be able to boot. But then I noticed that it was an lvm1 volume and converted it to lvm2 with vgconvert... that appeared to fix it for both Gutsy and Feisty (although I can't guarantee that was what fixed it, it seems the most likely explanation).

Revision history for this message
Kevin Brock (kevin-brock) wrote :

I had this problem today when I upgraded an old Red Hat 9 system. It was already using LVM for all the file systems except /boot. I wanted to keep as much of my data file systems as possible but have a clean Ubuntu server install, so I just reformatted the main install locations (/, /var, /tmp, /usr) but left everything in LVM, as this was recognized by the Ubuntu installer and changing it would have required repartitioning.

When I rebooted, the startup hung and eventually I was dropped into the Debian busybox (the first time I'd seen that, so I wasn't sure what was going on). Anyway, I did some net searching and found this bug report and some others elsewhere. Not wanting to make any major changes to the system if possible, I booted into recovery off the CD and mounted the root (and noted that recovery was able to load the LVM volumes fine, just as the installer had).

As suggested by the previous poster, I investigated vgconvert, examined the volume metadata version (with either vgdisplay or pvscan), and saw it was the old version 1. To convert my three volume groups to version 2, I did:

vgconvert -M2 vg01 vg02 vg03

This was the *ONLY* thing I did, so I know this is what fixed the problem. I no longer need to do the manual interventions given above. I'm guessing that the boot-time volume manager in Feisty is only recognizing LVM version 2 VGs for some reason.
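A quick way to check which metadata format a volume group uses before converting (vg01 here is just the group name used above):

  vgdisplay vg01 | grep -i format     # "Format lvm1" indicates the old metadata that vgconvert -M2 upgrades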

Revision history for this message
Gaetan Nadon (memsize) wrote :

Most likely the same as bug #147216.
Thanks for reporting this bug and any supporting documentation. Since this bug has enough information provided for a developer to begin work, I'm going to mark it as confirmed and let them handle it from here. Thanks for taking the time to make Ubuntu better!
BugSquad

Changed in lvm2:
status: New → Confirmed
Revision history for this message
Phillip Susi (psusi) wrote :

I'm not seeing enough information here to proceed. Is this still happening to anyone with a currently supported release?

Changed in lvm2 (Ubuntu):
status: Confirmed → Incomplete
Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for lvm2 (Ubuntu) because there has been no activity for 60 days.]

Changed in lvm2 (Ubuntu):
status: Incomplete → Expired