New 12.04 beta Kernel doesn't support booting from dmraid

Bug #948291 reported by Paul Hannah
This bug affects 2 people
Affects          Status     Importance  Assigned to   Milestone
linux (Ubuntu)   Confirmed  Medium      Stefan Bader

Bug Description

After upgrading my 11.10-alternative (due to the need for dmraid) install, the boot process stalls at the:

ubuntu
. . . . .

page and eventually kicks me out to a partial shell.

The issue seems to be an inability to see the dmraid root partition.

I've managed to keep using the system by choosing one of the older kernels.

I've tried dpkg-reconfigure dmraid and remove/reinstall dmraid, but that didn't seem to help.
---
AlsaVersion: Advanced Linux Sound Architecture Driver Version 1.0.24.
ApportVersion: 1.94-0ubuntu2
Architecture: i386
ArecordDevices:
 **** List of CAPTURE Hardware Devices ****
 card 0: PCH [HDA Intel PCH], device 0: ALC275 Analog [ALC275 Analog]
   Subdevices: 1/1
   Subdevice #0: subdevice #0
AudioDevicesInUse:
 USER PID ACCESS COMMAND
 /dev/snd/controlC0: pkh 3110 F.... pulseaudio
Card0.Amixer.info:
 Card hw:0 'PCH'/'HDA Intel PCH at 0xc9400000 irq 50'
   Mixer name : 'Intel CougarPoint HDMI'
   Components : 'HDA:10ec0275,104d5000,00100005 HDA:80862805,104d5000,00100000'
   Controls : 21
   Simple ctrls : 10
DistroRelease: Ubuntu 12.04
HibernationDevice: RESUME=UUID=da251fe8-aa2a-45a9-87c6-9a52435c7ead
InstallationMedia: Ubuntu 11.10 "Oneiric Ocelot" - Release i386 (20111011)
MachineType: Sony Corporation VPCSA36GG
Package: linux (not installed)
ProcFB: 0 inteldrmfb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-3.0.0-16-generic root=UUID=cb969f9f-14b2-4373-a271-c859de36f7c8 ro acpi=force quiet splash vt.handoff=7
ProcVersionSignature: Ubuntu 3.0.0-16.29-generic 3.0.20
RelatedPackageVersions:
 linux-restricted-modules-3.0.0-16-generic N/A
 linux-backports-modules-3.0.0-16-generic N/A
 linux-firmware 1.71
StagingDrivers: rts_pstor mei
Tags: precise staging
Uname: Linux 3.0.0-16-generic i686
UpgradeStatus: Upgraded to precise on 2012-03-04 (2 days ago)
UserGroups: adm admin cdrom dialout lpadmin plugdev sambashare users vboxusers
dmi.bios.date: 08/02/2011
dmi.bios.vendor: INSYDE
dmi.bios.version: R2080H4
dmi.board.asset.tag: N/A
dmi.board.name: VAIO
dmi.board.vendor: Sony Corporation
dmi.board.version: N/A
dmi.chassis.asset.tag: N/A
dmi.chassis.type: 10
dmi.chassis.vendor: Sony Corporation
dmi.chassis.version: N/A
dmi.modalias: dmi:bvnINSYDE:bvrR2080H4:bd08/02/2011:svnSonyCorporation:pnVPCSA36GG:pvrC609KUAN:rvnSonyCorporation:rnVAIO:rvrN/A:cvnSonyCorporation:ct10:cvrN/A:
dmi.product.name: VPCSA36GG
dmi.product.version: C609KUAN
dmi.sys.vendor: Sony Corporation

Revision history for this message
Paul Hannah (pkhannah) wrote :
Brad Figg (brad-figg) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 948291

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
tags: added: precise
Changed in linux (Ubuntu):
importance: Undecided → Medium
Paul Hannah (pkhannah) wrote : AcpiTables.txt

apport information

tags: added: apport-collected staging
description: updated
Paul Hannah (pkhannah) wrote : AlsaDevices.txt

apport information

Paul Hannah (pkhannah) wrote : AplayDevices.txt

apport information

Paul Hannah (pkhannah) wrote : BootDmesg.txt

apport information

Paul Hannah (pkhannah) wrote : CRDA.txt

apport information

Paul Hannah (pkhannah) wrote : Card0.Amixer.values.txt

apport information

Paul Hannah (pkhannah) wrote : Card0.Codecs.codec.0.txt

apport information

Paul Hannah (pkhannah) wrote : Card0.Codecs.codec.3.txt

apport information

Paul Hannah (pkhannah) wrote : CurrentDmesg.txt

apport information

Paul Hannah (pkhannah) wrote : IwConfig.txt

apport information

Paul Hannah (pkhannah) wrote : Lspci.txt

apport information

Paul Hannah (pkhannah) wrote : Lsusb.txt

apport information

Paul Hannah (pkhannah) wrote : PciMultimedia.txt

apport information

Paul Hannah (pkhannah) wrote : ProcCpuinfo.txt

apport information

Paul Hannah (pkhannah) wrote : ProcEnviron.txt

apport information

Paul Hannah (pkhannah) wrote : ProcInterrupts.txt

apport information

Paul Hannah (pkhannah) wrote : ProcModules.txt

apport information

Paul Hannah (pkhannah) wrote : PulseSinks.txt

apport information

Paul Hannah (pkhannah) wrote : PulseSources.txt

apport information

Paul Hannah (pkhannah) wrote : RfKill.txt

apport information

Paul Hannah (pkhannah) wrote : UdevDb.txt

apport information

Paul Hannah (pkhannah) wrote : UdevLog.txt

apport information

Paul Hannah (pkhannah) wrote : WifiSyslog.txt

apport information

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Brad Figg (brad-figg) wrote : Test with newer development kernel (3.2.0-18.28)

Thank you for taking the time to file a bug report on this issue.

However, given the number of bugs that the Kernel Team receives during any development cycle it is impossible for us to review them all. Therefore, we occasionally resort to using automated bots to request further testing. This is such a request.

We have noted that there is a newer version of the development kernel than the one you last tested when this issue was found. Please test again with the newer kernel and indicate in the bug if this issue still exists or not.

You can update to the latest development kernel by simply running the following commands in a terminal window:

    sudo apt-get update
    sudo apt-get upgrade

If the bug still exists, change the bug status from Incomplete to Confirmed. If the bug no longer exists, change the bug status from Incomplete to Fix Released.

If you want this bot to quit automatically requesting kernel tests, add a tag named: bot-stop-nagging.

 Thank you for your help, we really do appreciate it.

Changed in linux (Ubuntu):
status: Confirmed → Incomplete
tags: added: kernel-request-3.2.0-18.28
Paul Hannah (pkhannah) wrote :

I just did a full update, re-ran dpkg-reconfigure dmraid (just to be sure) and rebooted.

Same problem, ended up in busybox with the kernel complaining it couldn't find the device.

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Stefan Bader (smb) wrote :

This really sounds like a problem with how the initrd is constructed, but it would be good to have some more information. One problem is that I cannot really see what the exact setup is. I would suspect sda contains /boot (the partition table looks suspicious there, too), and sd[bcd] probably make up the RAID?
First thing to try is to boot the new kernel, but from the grub menu remove "quiet splash" and replace it with "debug". Then, when dropping into busybox, check whether the dmraid-45 module is loaded and whether there is a dmraid-activate. It may be possible to progress manually by calling that and then vgchange -ay; should that bring up the root vg, exit the busybox shell.
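
The checks above could be scripted roughly as follows once the busybox prompt appears. This is a minimal sketch, not a tested recipe: `module_loaded` and `have_tool` are hypothetical helper names, and `dm_raid45` is only an assumption about how the module would be listed in /proc/modules (hyphens in module names show up as underscores there).

```shell
# Hypothetical helpers for the initramfs busybox shell (names are illustrative).

# module_loaded NAME [FILE]: succeed if NAME is listed in a /proc/modules-style file.
module_loaded() {
    grep -q "^$1 " "${2:-/proc/modules}"
}

# have_tool NAME: succeed if NAME can be found as a command in the current shell.
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# Intended use at the busybox prompt; the module name is an assumption:
#   module_loaded dm_raid45 && echo "module loaded"
#   have_tool dmraid-activate && echo "dmraid-activate present"
#   dmraid -ay    # activate all RAID sets by hand
#   exit          # let the boot continue
```

Both helpers only report status through their exit code, so they can be chained with `&&` without printing anything themselves.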

Changed in linux (Ubuntu):
assignee: nobody → Stefan Bader (stefan-bader-canonical)
Paul Hannah (pkhannah) wrote :

Stefan,

I'm guessing the same thing re initrd. The system is a vaio vpcsa which has hardwired onboard 4x64GB ssd. So there's only the raid, nothing else.

I don't know how to copy/paste out of busybox, so I've taken a couple of phone photos -- hopefully they're clear enough.

The upshot is that dmraid is loading (though it's the last thing it does...). In busybox, I can (once I worked out how) activate and mount the raid, with one weird message that probably is completely unrelated.

As for vgchange, I couldn't work out what was needed there -- vgchange doesn't seem to be a valid command in the busybox shell.

I'll have another go if you can give me the baby-steps post-mount.

Paul Hannah (pkhannah) wrote :
Stefan Bader (smb) wrote :

Paul, ah ok, so the RAID setup is just a JBOD one (basically glueing together the individual parts of the SSD). That explains the boot messages (because the partition table was created over the length of the RAID). And just a word of warning: at least in my experience grub in the past had no concept of RAID setups. So it was essential to have at least /boot on a partition that is accessible natively. In your case this might just about work since the first partition covers the whole first disk. But it also continues after that, and I am not sure that the files in /boot will always remain in the areas of the first disk. I must admit that grub2 may be more capable than I think. These limitations are from a while back. But being cautious here cannot go wrong. ;)

But for the current problem: the simplest way to gather output in such cases is by using a usb stick which you can mount in the busybox shell and then redirect output there.
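
A minimal sketch of that approach, under stated assumptions: `capture` is a hypothetical helper name, and the device node `/dev/sde1` and mount point `/mnt/usb` in the comments are placeholders (the stick's real device name has to be read from dmesg after plugging it in, and the initramfs may lack the filesystem module for the stick).

```shell
# capture DIR NAME CMD...: run CMD and store its stdout and stderr in DIR/NAME.txt.
capture() {
    dir=$1
    name=$2
    shift 2
    "$@" > "$dir/$name.txt" 2>&1
}

# Intended use in busybox; /dev/sde1 and /mnt/usb are assumptions:
#   mkdir -p /mnt/usb && mount /dev/sde1 /mnt/usb
#   capture /mnt/usb dmesg dmesg
#   capture /mnt/usb mapper ls -l /dev/mapper
#   umount /mnt/usb
```

Unmounting before pulling the stick matters here, since otherwise the redirected output may not have been written out yet.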

The rename error is weird. Even more since your later ls shows the node is correctly named. About the vgchange command. I think that was just an incorrect assumption from my side. I was just assuming you would be using lvm volumes on top of the RAID (which I was guessing to be in some other mode like striping, mirror or raid5). But it looks like Volume0p1 could just be your / and Volume0p5 be your swap partition (note "dmsetup ls" is another nice way to list currently available device mapper volumes). So maybe you were just about there and exit would have completed the boot.
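
Where "dmsetup ls" is not at hand, the nodes under /dev/mapper give a similar picture. The following is only a sketch: `mapper_volumes` is a hypothetical helper name, and it assumes that every entry other than `control` in that directory is an activated volume.

```shell
# mapper_volumes [DIR]: print every entry in DIR (default /dev/mapper) except "control".
mapper_volumes() {
    dir="${1:-/dev/mapper}"
    for node in "$dir"/*; do
        # Skip the unexpanded glob when the directory is empty.
        [ -e "$node" ] || continue
        name=$(basename "$node")
        [ "$name" = "control" ] || echo "$name"
    done
}

# Intended use in busybox:
#   mapper_volumes    # no output means the RAID set has not been activated yet
```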

If that works the next step would be to have --debug on the grub command line which enables debugging output of the upstart system. That should show up in /var/log/boot.log and/or /var/log/syslog. Maybe that will show what is going wrong. Oh, in the busybox, maybe have a look at scripts/init-premount, there should be a dmraid file.

Paul Hannah (pkhannah) wrote :

Stefan,

You were right -- tried again and exit from busybox after 'dmraid -ay' successfully completed the boot.

As far as the raid setup, essentially that's the end result of throwing in the 11.10 alternative cd in the machine the day it arrived with windows installed and following the prompts.

So, not sure what to do next. I had a look in /var/log/boot.log, and there really doesn't appear to be anything that looks like it's going to be useful.

If this is just a case of 'you shouldn't have done it that way', then I'm happy for you to just mark it 'won't fix' or 'invalid' -- on the other hand, if this is something that others will hit, I'm happy to continue working with you to find the cause and get a fix made.

I've attached my syslog, maybe there's something useful in there.

Stefan Bader (smb) wrote :

Paul,

have not yet looked at the syslog but wanted to give an answer right away. First, I think the potentially dangerous layout and your boot not completing are different issues. Whenever the layout becomes an issue I would suspect you would not even get as far as busybox without booting from a rescue system. But I could also be wrong and grub2 maybe has no problems there. But then this should be tracked in a different bug and be used to make the installer aware of this pitfall.

Anyway, for the current problem, we should try to find out why the dmraid activation is apparently not done before trying to mount the rootfs. Thinking about it, maybe it is related to the unusual setup, but if the old release handled it correctly, then an upgrade should not break that.

Paul Hannah (pkhannah) wrote :

Stefan,

Not sure it makes a difference, but just to confirm: the dmraid setup is in fact a 4-disk raid0 striped set.

Stefan Bader (smb) wrote :

Hm, ok. So it's more likely that grub2 either is more capable than I believe or gets presented a BIOS-emulated device and all is magically ok for that stage. So we should not worry too much about that part.

The syslog is a bit confusing with respect to timing. It is probably normal because there is one point in time where messages from early boot are merged into what takes place after / is mounted. I think this is from shortly before dropping into busybox:

Mar 8 19:11:52 VPCSA dmraid-activate: ERROR: Cannot retrieve RAID set information for isw_cibcjijfgb_Volume0
Mar 8 19:11:52 dmraid-activate: last message repeated 2 times
Mar 8 19:11:52 VPCSA failsafe: Failsafe of 120 seconds reached.
Mar 8 19:11:52 VPCSA dmraid-activate: ERROR: Cannot retrieve RAID set information for isw_cibcjijfgb_Volume0
Mar 8 19:11:52 VPCSA laptop-mode: Laptop mode
Mar 8 19:11:52 VPCSA laptop-mode: disabled, not active

So for some reason reading the RAID meta-data fails. But I am not sure about the timing. It all seems to be going on at the same time, and I am not sure this is right.
The dmraid script in scripts/init-premount seems to call "dmraid -r -c" and then dmraid-activate $dev for each of the devices printed. Maybe you could try that in the busybox to see whether this acts differently by the time you drop there. If that works too, then the question is why those failures happen in the normal boot...
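
The loop described above could be reproduced by hand roughly like this. It is a sketch of the described behaviour, not the script as shipped: `activate_all` is a hypothetical helper name, and the optional first argument exists only so that a harmless command such as `echo` can stand in for dmraid-activate during a dry run.

```shell
# activate_all [ACTIVATOR]: read one device per line on stdin and call
# ACTIVATOR (default: dmraid-activate) once for each of them.
activate_all() {
    activator="${1:-dmraid-activate}"
    while read -r dev; do
        # Redirect stdin so the activator cannot swallow the remaining device list.
        "$activator" "$dev" < /dev/null
    done
}

# Intended use in busybox:
#   dmraid -r -c | activate_all echo    # dry run, only prints the device names
#   dmraid -r -c | activate_all         # real run
```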

Paul Hannah (pkhannah) wrote :

dmraid -r -c produces

/dev/sdd
/dev/sdc
/dev/sdb
/dev/sda

dmraid-activate does absolutely nothing. When called with /dev/sd? it returns to the prompt with no obvious result, and in fact with any text ('-h', sadfkljadkfah, ...) it does the same.

Without parameter it does complain.

Not sure what I should be seeing from this, but /dev/mapper definitely only contains control afterwards.

Interestingly, dmraid -ay after that process doesn't mention the problem with the rename from earlier.

And exit from busybox again results in successful boot.

Stefan Bader (smb) wrote :

Looking at /sbin/dmraid-activate (in the installed system)... There seems to be a mismatch of expectations. dmraid -r -c prints the full path /dev/sda and so on. But dmraid-activate seems to only expect sda and otherwise exits. I need to check whether there have been recent changes to the userspace parts. I strongly suspect that the old kernel only works because the initramfs has not been re-created yet. So that still contains the older versions...

Stefan Bader (smb) wrote :

Could you try the following change to /usr/share/initramfs-tools/scripts/init-premount/dmraid after manually having booted the 12.04 kernel:

- dmraid-activate $dev
+dmraid-activate $(basename $dev)

Then run

update-initramfs -u (this regenerates the initrd for the currently running! kernel)

Then reboot the 12.04 kernel and see whether it works now without manual intervention?
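
To illustrate what the basename change does to the device list: it strips the directory part, so /dev/sda becomes sda. A small sketch, where `short_names` is a hypothetical helper name and the input lines imitate "dmraid -r -c" output:

```shell
# short_names: read device paths on stdin and print only their last path component.
short_names() {
    while read -r dev; do
        basename "$dev"
    done
}

printf '/dev/sdd\n/dev/sda\n' | short_names
# prints:
# sdd
# sda
```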

Paul Hannah (pkhannah) wrote :

Stefan,

That file doesn't exist on my machine (but that folder exists and contains only brltty.)

pkh@VPCSA:/usr/share/initramfs-tools$ find . -name dmraid
./scripts/local-top/dmraid
./hooks/dmraid

The dmraid in local-top seems to be equivalent:

pkh@VPCSA:/usr/share/initramfs-tools$ grep dmraid-activate ./scripts/local-top/dmraid ./hooks/dmraid
./scripts/local-top/dmraid: dmraid-activate $dev
./hooks/dmraid: copy_exec /sbin/dmraid-activate sbin

So I made the change there, and afterwards exactly the same behaviour. syslog attached again in case there's something to see there.

N.B. The dmraid rename issue is completely unrelated -- it didn't happen this time either -- it'll be related to one of the string of semi-random dmraid commands I made to get the raid booted that first time.

Stefan Bader (smb) wrote :

Oh that sounds a bit as if I might still have remainders of some fiddling around I did for either Oneiric or earlier... :/ Ok, need to see how that looks on a fresh install and figure out how to set it right. local-top sounds to me like something done after local fs are mounted and that would be too late...

Stefan Bader (smb) wrote :

Ok, so scripts/local-top/dmraid is actually executed even sooner (right before trying to find the root fs). So it should be the right place. If possible, I would prefer to have a look at /var/log/boot.log instead of syslog. That seems to contain the messages that are interesting in a clearer form.

The messages about "ERROR: Cannot retrieve RAID set information for..." I realize are bogus in most cases. The command used to find info uses -si which limits information to only inactive sets and at least in my case the first call to dmraid-activate with one of the raid disks as an argument will bring up the array already. So subsequent calls will find the same Group already active. But that just as a side note.
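
To see how often that message was logged, the syslog can simply be counted. This is a sketch only: `count_activate_errors` is a hypothetical helper name, the default path /var/log/syslog is an assumption, and "last message repeated N times" lines are not included in the count.

```shell
# count_activate_errors [FILE]: print the number of lines in FILE that carry
# the dmraid-activate "Cannot retrieve RAID set information" error.
count_activate_errors() {
    grep -c 'dmraid-activate: ERROR: Cannot retrieve RAID set information' "${1:-/var/log/syslog}"
}

# Intended use on the booted system:
#   count_activate_errors /var/log/syslog
```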

Paul Hannah (pkhannah) wrote :
Stefan Bader (smb) wrote :

Darn, seems that going through the manual stage to boot does destroy (or at least not propagate) the information that usually is there. Seems to basically start after the raid becomes available and gets mounted. :/

Paul, maybe you can find something when you are in the busybox shell. I would hope somewhere in /var/log then. You could temporarily mount your / somewhere and copy files over then unmount and exit to finish the boot. But while you are there, also check whether in the initramfs you have the modified /scripts/local-top/dmraid (the one with basename). Is the raid discovered if you manually start that script?
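
Checking whether the initramfs really carries the modified script could look roughly like this. A sketch under assumptions: `has_basename_fix` is a hypothetical helper name, and it assumes the change was written exactly as `dmraid-activate $(basename ...)`.

```shell
# has_basename_fix [FILE]: succeed if FILE (default /scripts/local-top/dmraid)
# calls dmraid-activate with a $(basename ...) argument.
has_basename_fix() {
    grep -q 'dmraid-activate \$(basename' "${1:-/scripts/local-top/dmraid}"
}

# Intended use in busybox:
#   has_basename_fix && echo "modified script is in the initramfs"
#   sh /scripts/local-top/dmraid    # start the script manually
#   ls /dev/mapper                  # see whether the raid was discovered
```

If the check fails, the initrd was probably not regenerated for the kernel being booted, which would explain identical behaviour after the edit.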

Paul Hannah (pkhannah) wrote :

Stefan,

Just to let you know I'll get back to this next week -- had to be called away with work.

Paul Hannah (pkhannah) wrote :

Stefan,

Looks like it's being dealt with by the right people.

Thanks for your time,
Paul.
