grub2 fails to boot or install when an LVM snapshot exists
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
grub2 (Debian) | Fix Released | Unknown | | |
grub2 (Ubuntu) | Fix Released | High | Unassigned | |
Lucid | Fix Released | High | Colin Watson | ubuntu-10.04.4 |
Bug Description
SRU Justification:
Impact: When /boot and / are in an LVM VG and a snapshot is made of an LVM LV in that VG, the system will not boot and grub cannot be modified (updated, reinstalled) until all snapshots are removed.
Testcase:
Binary package hint: grub2
Steps to reproduce:
- (Lucid beta2 installed from CD)
- Take a snapshot of any volume
- on reboot:
error: fd0 read error.
error: no such disk.
grub rescue>
- Use the rescue cd to get a root shell
- Remove the snapshot and reboot
Now, the system boots. Create a new snapshot to repeat.
The system has 2 SATA disks in mdadm RAID1 configuration with 1 lvm volume on top and no 'normal' partitions.
Also see Comment #6 https:/
Fix: See debdiff patch
https:/
summary: |
- Disk not found when booting mdadm RAID1 with snapshotted lvm volum
+ Disk not found when booting mdadm RAID1 with snapshotted lvm volume
Alvin (alvind) wrote : | #2 |
Setting to confirmed because it was easily reproduced on another server.
Changed in grub2 (Ubuntu): | |
status: | New → Confirmed |
Nigel Babu (nigelbabu) wrote : | #3 |
Setting back to New. If another independent source can confirm the bug, it would be great.
Changed in grub2 (Ubuntu): | |
importance: | Undecided → High |
status: | Confirmed → New |
dblade (listmail) wrote : | #4 |
I believe I have the same issue, although my machine had hard locked before the reboot, so I interpreted my boot failure as "needing to reinstall the bootloader" and as having nothing to do with the snapshot I had made earlier. My specific filesystem is full root + boot LVM ext3. I mention that specifically because I have not tested whether this issue also occurs with a separate /boot.
I basically couldn't get grub to (re)install and experienced the symptoms described here -> https:/
I finally found out that while a snapshot of root was active there was an extra "/dev/mapper/
The moment I killed the snapshot, deleted device.map and ran nothing more than `grub-install /dev/sda`, it completed normally, generated grub.cfg entries loading all the proper modules (raid mdraid lvm ext2) as well as populating a proper /boot/grub/
dblade (listmail) wrote : | #5 |
I did not clarify previously, but the LVM physical volume is indeed a mdraid root mirror.
# pvs
PV VG Fmt Attr PSize PFree
/dev/md0 mypv lvm2 a- 69.24g 31.24g
Bernhard Schmidt (berni) wrote : | #6 |
The installation of grub2 fails when _any_ snapshot is present. It does not have to be a snapshot of the root (or boot) filesystem, or even of a mounted volume. In my system it was a snapshot of a WinXP volume used by KVM. Deleting it worked just fine.
root@lxbsc02:/# grub-install /dev/sda
/usr/sbin/
Auto-detection of a filesystem module failed.
Please specify the module with the option `--modules' explicitly.
root@lxbsc02:~# lvs
LV VG Attr LSize Origin Snap% Move Log Copy% Convert
home wdc750g -wi-ao 40,00g
kvm-winxp wdc750g owi-a- 20,00g
kvm-winxp-snap wdc750g swi-a- 8,00g kvm-winxp 10,81
root wdc750g -wi-ao 20,00g
swap wdc750g -wi-ao 4,00g
torrent wdc750g -wi-ao 200,00g
root@lxbsc02:~# lvremove wdc750g/
Do you really want to remove active logical volume kvm-winxp-snap? [y/n]: y
Logical volume "kvm-winxp-snap" successfully removed
root@lxbsc02:~# grub-install /dev/sda
Installation finished. No error reported.
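For anyone who wants to check for leftover snapshots before a reboot or a grub-install, here is a minimal sketch that picks the snapshot volumes out of `lvs` output like the listing above. It assumes the default column order shown there (LV, VG, Attr) and that a snapshot's Attr string starts with `s`, as it does for `kvm-winxp-snap`. `list_snapshots` is a made-up helper name, not part of LVM or grub.

```shell
# Hypothetical helper: print "VG/LV" for every snapshot in `lvs` output.
# Reads the table from stdin, so it can be tried on saved output first.
list_snapshots() {
    # Field 3 is the Attr column. Assumption: a leading "s" marks a
    # snapshot (the origin volume shows a leading "o" instead).
    # The header row is skipped naturally because "Attr" does not match.
    awk '$3 ~ /^s/ { print $2 "/" $1 }'
}
```

Run as root, `lvs | list_snapshots` should print one `VG/LV` per line. An empty result suggests there is no snapshot for the unpatched grub2 to trip over.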
This is a clusterf*ck of a bug.
Changed in grub2 (Ubuntu): | |
status: | New → Confirmed |
summary: |
- Disk not found when booting mdadm RAID1 with snapshotted lvm volume
+ grub-install fails when LVM snapshot exists
Why change the description? It also fails when grub is successfully installed.
dblade (listmail) wrote : | #8 |
Perhaps the description should become "grub2 fails to boot or install when an LVM snapshot exists"
Bernhard Schmidt (berni) wrote : | #9 |
Yes, you are absolutely right. I wanted to stress that it does not seem to be related to mdadm RAID1 as the original description suggested. Changed.
summary: |
- grub-install fails when LVM snapshot exists
+ grub2 fails to boot or install when an LVM snapshot exists
Alvin (alvind) wrote : | #10 |
Except for systems using mdadm RAID1, all those I tried with LVM snapshots can boot. You're saying that taking a snapshot on /any/ Lucid system makes it unbootable?
Bernhard Schmidt (berni) wrote : | #11 |
Yes, I have had this bug on a system without any RAID (neither hardware nor mdraid) running Lucid amd64. And I can reproduce it on that particular box.
root@lxbsc02:~# lvs
LV VG Attr LSize Origin Snap% Move Log Copy% Convert
azureus wdc750g -wi-ao 300,00g
home wdc750g -wi-ao 40,00g
kvm-winxp wdc750g -wi-a- 20,00g
root wdc750g -wi-ao 20,00g
swap wdc750g -wi-ao 4,00g
torrent wdc750g -wi-ao 200,00g
root@lxbsc02:~# lvcreate -s -L 2G -n kvm-winxp-fresh wdc750g/kvm-winxp
Logical volume "kvm-winxp-fresh" created
root@lxbsc02:~# grub-install /dev/sda
/usr/sbin/
Auto-detection of a filesystem module failed.
Please specify the module with the option `--modules' explicitly.
root@lxbsc02:~# lvremove wdc750g/
Do you really want to remove active logical volume kvm-winxp-fresh? [y/n]: y
Logical volume "kvm-winxp-fresh" successfully removed
root@lxbsc02:~# grub-install /dev/sda
Installation finished. No error reported.
Oddly enough I cannot reproduce it on my system at home, which is also running Lucid.
root@pest:~# lvs
LV VG Attr LSize Origin Snap% Move Log Copy% Convert
btrfs wdc -wi-a- 32,00g
home wdc -wi-ao 16,00g
karmic wdc -wi-ao 16,00g
karmicold wdc -wi-a- 16,00g
swap wdc -wi-ao 2,00g
videos wdc -wi-a- 100,00g
vm-winxp wdc -wi-a- 21,00g
root@pest:~# lvcreate -s -L 2G -n kvm-winxp-fresh wdc750g/vm-winxp
Volume group "wdc750g" not found
root@pest:~# grub-install /dev/sda
error: cannot open `/dev/sdb' while attempting to get disk size.
error: cannot open `/dev/sdb' while attempting to get disk size.
error: cannot open `/dev/sdb' while attempting to get disk size.
error: cannot open `/dev/sdb' while attempting to get disk size.
error: cannot open `/dev/sdb' while attempting to get disk size.
error: cannot open `/dev/sdb' while attempting to get disk size.
Installation finished. No error reported.
Yes, the VG/LV names are a bit different. I tried making the snapshot LV name very long, did not change anything.
Bernhard Schmidt (berni) wrote : | #12 |
Err, that last part obviously showed that I did not create a snapshot, but it seems to work even when I do it right:
root@pest:~# lvcreate -s -L 2G -n kvm-winxp-fresh wdc/vm-winxp
Logical volume "kvm-winxp-fresh" created
root@pest:~# grub-install /dev/sda
error: cannot open `/dev/sdb' while attempting to get disk size.
error: cannot open `/dev/sdb' while attempting to get disk size.
error: cannot open `/dev/sdb' while attempting to get disk size.
error: cannot open `/dev/sdb' while attempting to get disk size.
error: cannot open `/dev/sdb' while attempting to get disk size.
error: cannot open `/dev/sdb' while attempting to get disk size.
Installation finished. No error reported.
Changed in grub2 (Debian): | |
status: | Unknown → Confirmed |
Seth (bugs-sehe) wrote : | #13 |
Any news on this? Any time I accidentally shut down the system while having a snapshot (e.g. of my /home fs) I get a borked grub boot. _annoying_
I need to boot a rescue CD to lvremove my snapshot before I can boot again
Changed in grub2 (Debian): | |
status: | Confirmed → Fix Released |
Colin Watson (cjwatson) wrote : | #14 |
Fixed in Maverick by merging this upstream change:
grub2 (1.98+20100702-1) unstable; urgency=low
* New Bazaar snapshot.
[...]
- Skip LVM snapshots (closes: #574863).
[...]
-- Colin Watson <email address hidden> Fri, 02 Jul 2010 17:42:56 +0100
Changed in grub2 (Ubuntu): | |
status: | Confirmed → Fix Released |
this issue should be mentioned in the documentation of grub2 until the fix arrives in LTS.
Linus van Geuns (nirkus) wrote : | #16 |
Upgraded from maverick to natty and had a similar issue:
- bootfs is ext3, rootfs ext4
- both are logical volumes on top of raid1
- did snapshots of root & boot before upgrade and it worked for the first reboot
After that, grub2 just booted the menu entry, did a lot of hdd access and stopped w/o any errors or messages.
I could get the kernel & initramfs loaded by stripping the menu entry down to:
insmod part_msdos, raid, lvm, ext2
root (vg-device)
kernel...
initramfs...
but mounting the rootfs failed within initramfs and it dropped to a shell.
mounting the logical volume (rootfs) within that shell worked.
changing the filesystem UUIDs within my snapshots of rootfs & boot didn't change anything.
after deleting both snapshots, grub2 & initramfs booted w/o any error.
Torsten Landschoff (torsten) wrote : | #17 |
While I was installing security updates on my Lucid system last week, this bug made the system unbootable. I spent half an hour today making it bootable again.
I booted from supergrubdisk which also failed to detect LVM (it usually did). I ended up using Knoppix and noticed the leftover snapshot (created via schroot) on boot, deleting it. I then tried to chroot into my Lucid system which failed because Knoppix is i386 and my Lucid install is amd64.
Rebooting with the Lucid installation medium, I was surprised that the grub installation on my hard drive magically started to work again and booted fine into my Lucid install.
I would deem this a really important problem and I am all for fixing it in Lucid given that a patch exists.
BTW: The snapshot name created by schroot is quite long, as it contains a UUID.
fred (ubuntu-launchpad-lk2) wrote : | #18 |
not having a fix for this in "LTS" after more than a year makes me sad
wasted 3 hours of my life on this earlier this year - bug is known, confirmed and fixed - why not push a new grub version for lucid?
Benaiah (dougie-hobson) wrote : | #19 |
I am also wondering why a bug fix is not being pushed to lucid. I know that with LTS versions you do the whole feature freeze thing and only release security updates and I think bug fixes. This is not a feature addition, it is a bug that needs to be fixed. Is this not possible?
The answer to "why didn't this get into Lucid?" is nobody did the SRU process:
Attached debdiff for Lucid
description: | updated |
tags: | added: testcase |
Attached debdiff for Lucid (this time using "lucid-proposed" pocket)
Created PPA to host patched grub-pc package:
https:/
description: | updated |
description: | updated |
Confirmed patch from PPA
https:/
fixes issue.
VM with patch via PPA installed booted despite snapshot:
nutz@lp-563895:~$ dpkg -l |grep grub
ii grub-common 1.98-1ubuntu12.
ii grub-pc 1.98-1ubuntu12.
nutz@lp-563895:~$ sudo lvs
LV VG Attr LSize Origin Snap% Move Log Copy% Convert
lv0 vg0 owi-ao 3.81g
lvol0 vg0 swi-a- 1.71g lv0 0.17
swap vg0 -wi-ao 488.00m
nutz@lp-563895:~$ df -h /boot
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/vg0-lv0 3.9G 901M 3.0G 24% /
No one has responded to my effort to convert this bug report into an SRU request so I did the only thing I know, I opened LP: #888069
I applied the patched grub-pc and grub-common packages from my PPA
https:/
to an HP DL-165 G5 today. Prior to applying the patches the system was unbootable.
One thing that I did have to do was:
1. aptitude purge grub-pc grub-common
2. cd /boot/grub
3. rm -r *
4. cd -
Then install the patched packages. For some reason the APT "purge" command leaves many files under /boot/grub
Changed in grub2 (Ubuntu Lucid): | |
milestone: | none → ubuntu-10.04.4 |
Launchpad Janitor (janitor) wrote : | #29 |
Status changed to 'Confirmed' because the bug affects multiple users.
Changed in grub2 (Ubuntu Lucid): | |
status: | New → Confirmed |
I will be on vacation through Jan 5, 2012. Please do not ask for testing until after that date, thanks.
Changed in grub2 (Ubuntu Lucid): | |
status: | Confirmed → Triaged |
importance: | Undecided → High |
assignee: | nobody → Colin Watson (cjwatson) |
Hello Alvin, or anyone else affected,
Accepted grub2 into lucid-proposed, the package will build now and be available in a few hours. Please test and give feedback here. See https:/
Changed in grub2 (Ubuntu Lucid): | |
status: | Triaged → Fix Committed |
tags: | added: verification-needed |
Interesting dates in the life of LP: #563895
March 21, 2010 - Reported as Debian #574863
April 4, 2010 (elapsed 14 days) Reported as LP: #563895
June 2, 2010 (elapsed 73 days) Fix committed into Debian and Debian bug closed.
July 5, 2010 (elapsed 106 days) Fix committed in Ubuntu 10.10 (Maverick)
September 10, 2011 (538 days) Complaint (not by me) in LP #563895 using the words "not having a fix for this in "LTS" after more than a year makes me sad"
November 8, 2011 (597 days) I convert bug into SRU request with debdiff of patch and PPA with patched grub2 packages.
November 14, 2011 (603 days) By submitting this question I am able to get someone to acknowledge that this bug needs work.
January 20, 2012 (670 days) The day on which I, the only person on Earth who is going to know enough to do the testing, am referred to by Martin Pitt as "anyone else affected".
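The elapsed-day figures in this timeline can be re-derived with a few lines of shell. This is only a sketch and assumes GNU `date` (for the `-d` option). `days_between` is a hypothetical helper name.

```shell
# Hypothetical helper: whole days between two YYYY-MM-DD dates.
# Assumes GNU date; -u pins both timestamps to UTC so daylight-saving
# changes cannot shift the result.
days_between() {
    start=$(date -u -d "$1" +%s) || return 1
    end=$(date -u -d "$2" +%s) || return 1
    echo $(( (end - start) / 86400 ))
}

# Days from the Debian report to the Launchpad report.
days_between 2010-03-21 2010-04-04
```

The call at the end prints 14, matching the second entry in the list.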
Colin Watson (cjwatson) wrote : | #33 |
@nutznboltz: Thanks for your patch, and sorry I took so long to deal with it. It needed to be reformatted into a patch with proper headers and attribution in debian/patches/, to match the rest of the package. Since this has been waiting so long, I just went ahead and did this rather than walking you through it, but I've left your name in the changelog.
And yes, it did take a long time. That's because we have more work than it's humanly possible to do. I'm not sure that recriminations are productive at this point?
Colin Watson (cjwatson) wrote : | #34 |
(Also, I think I can probably do the testing if necessary, so I think you're exaggerating about "only person on Earth", not to mention that Martin's comment was generated by the sru-accept.py script rather than written by hand; but given said inhuman amounts of work it would probably stand a better chance of happening quickly if somebody else did it.)
Sorry, Colin, it's not about you. I'll explain later if I get the chance.
The following non-developers have made comments in this bug indicating they understand at least a little bit about this problem:
* https:/
* https:/
* https:/
* https:/
* https:/
* https:/
* https:/
* https:/
* https:/
* https:/
Why aren't they testing now? What could be wrong?
> Why aren't they testing now? What could be wrong?
Wait! Maybe it's that they are all stupid! Yeah, that's it they're too stupid to test. Don't worry stupid people, I'll do your testing for you.
nutz@lp-563895:~$ apt-cache policy grub-pc
grub-pc:
Installed: 1.98-1ubuntu13
Candidate: 1.98-1ubuntu13
Version table:
*** 1.98-1ubuntu13 0
100 /var/lib/
1.98-1ubuntu12 0
900 http://
1.98-1ubuntu5 0
500 http://
ksta@lp-563895:~$ apt-cache policy grub-pc grub-common
grub-pc:
Installed: 1.98-1ubuntu13
Candidate: 1.98-1ubuntu13
Version table:
*** 1.98-1ubuntu13 0
100 /var/lib/
1.98-1ubuntu12 0
900 http://
1.98-1ubuntu5 0
500 http://
grub-common:
Installed: 1.98-1ubuntu13
Candidate: 1.98-1ubuntu13
Version table:
*** 1.98-1ubuntu13 0
100 /var/lib/
1.98-1ubuntu12 0
900 http://
1.98-1ubuntu5 0
500 http://
nutz@lp-563895:~$ sudo grub-install /dev/vda
Installation finished. No error reported.
nutz@lp-563895:~$ lsb_release -ds ; uname -a
Ubuntu 10.04.3 LTS
Linux lp-563895 2.6.32-37-server #81-Ubuntu SMP Fri Dec 2 20:49:12 UTC 2011 x86_64 GNU/Linux
nutz@lp-563895:~$ sudo lvcreate -s -l 437 /dev/vg0/lv0
Logical volume "lvol0" created
nutz@lp-563895:~$ sudo vgs
VG #PV #LV #SN Attr VSize VFree
vg0 1 3 1 wz--n- 6.00g 0
nutz@lp-563895:~$ sudo lvs
LV VG Attr LSize Origin Snap% Move Log Copy% Convert
lv0 vg0 owi-ao 3.81g
lvol0 vg0 swi-a- 1.71g lv0 0.00
swap vg0 -wi-ao 488.00m
nutz@lp-563895:~$ df -h / /boot
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/vg0-lv0 3.9G 955M 2.9G 25% /
/dev/mapper/vg0-lv0 3.9G 955M 2.9G 25% /
nutz@lp-563895:~$ sudo reboot
The system rebooted with the snapshot. Everything works. Thanks.
tags: |
added: verification-done removed: verification-needed |
dblade (listmail) wrote : | #40 |
nutznboltz please post information that pertains to the bug. All the extra stuff you are tossing in serves no purpose.
Thanks.
@dblade it's the price that you pay for not doing the SRU work yourself. The reward for being the one to do the work is that you don't have to listen to the ones who are doing it for you. Why not spend some time learning how to do the work?
I have some information for you:
http://
Are you an Ubuntu Linux Bug Fool? http://
dblade (listmail) wrote : | #43 |
Most people applied the fixed package manually and moved on over a year ago. Are you really surprised?
I'm not here to debate whether it is right or wrong to report a bug, or provide info regarding a bug, and then not be involved every step of the way. People have the right to contribute or not contribute as they see fit. Your attitude is not going to affect this fact in a positive way.
All you have really achieved here is making me pull my name off the CC list for this bug. I did it because the latest updates you've made are the equivalent of spam.
Maybe you need to take a step back and realize how little control you actually have on this matter.
Torsten Landschoff (torsten) wrote : | #44 |
I actually applied the fix manually but did not set the fixed package on hold as I had the hope that any update to the grub package in lucid would fix this.
Yesterday my system crashed while working with schroot and snapshots and is now unbootable again :-) So I am interested in a fix once again and hope it goes into Lucid. I will try and install the new proposed fix.
Torsten Landschoff (torsten) wrote : | #45 |
Okay, I just installed the version from lucid-proposed:
torsten@sharokan:~$ dpkg -l|grep grub
ii grub-common 1.98-1ubuntu13 GRand Unified Bootloader, version 2 (common files)
ii grub-pc 1.98-1ubuntu13 GRand Unified Bootloader, version 2 (PC/BIOS version)
I had to run
# grub-install /dev/sda
to actually update the grub installation (it would be nice if installing the new package would do that automatically, but this is of course a risk wrt. possible regressions).
I created a snapshot again and lo and behold: I can boot just fine even with existing LVM snapshots.
Thanks! I am all for moving this to lucid-updates.
Colin Watson (cjwatson) wrote : Re: [Bug 563895] Re: grub2 fails to boot or install when an LVM snapshot exists | #46 |
On Tue, Jan 24, 2012 at 10:02:37AM -0000, Torsten Landschoff wrote:
> I had to run
>
> # grub-install /dev/sda
>
> to actually update the grub installation
Run 'dpkg-reconfigure grub-pc' to set this up to run automatically on
future upgrades.
@Colin: that will not work for /dev/vda on Ubuntu 10.04.
Oh, right after I wrote that I remembered LP: #623609
Does 1.98-1ubuntu13 fix that?
Colin Watson (cjwatson) wrote : | #49 |
Torsten asked for /dev/sda, not /dev/vda, so that is moot anyway.
Yes, as indicated in the changelog, 1.98-1ubuntu13 should also fix bug
623609.
Launchpad Janitor (janitor) wrote : | #50 |
This bug was fixed in the package grub2 - 1.98-1ubuntu13
---------------
grub2 (1.98-1ubuntu13) lucid-proposed; urgency=low
[ Colin Watson ]
* Handle partition devices without corresponding disk devices
(LP: #623609).
[ Ken Stailey ]
* Backport upstream patch to skip LVM snapshots (LP: #563895).
-- Colin Watson <email address hidden> Fri, 20 Jan 2012 12:08:36 +0000
Changed in grub2 (Ubuntu Lucid): | |
status: | Fix Committed → Fix Released |
Thanks Colin Watson, you do such great work!
I tried to use supergrubdisk to debug this, but I'm having difficulty getting logs. It's just too much for a serial console.
Using supergrubdisk:
- insmod raid
- insmod lvm
- detect OS.
No OS will be detected