system fails to boot without ramdisk and incorrect / in fstab

Bug #509841 reported by Scott Moser on 2010-01-19
This bug affects 2 people
Affects: mountall (Ubuntu)
Importance: Low
Assigned to: Unassigned

Bug Description

Binary package hint: upstart

I'm trying to boot a system without a ramdisk. Bug 503212 was fixed, so mountall no longer segfaults, but the system still fails to boot.

I'm attaching boot logs with '--debug', and then will give some comments on the differences.

ProblemType: Bug
Architecture: amd64
Date: Tue Jan 19 20:56:43 2010
DistroRelease: Ubuntu 10.04
Package: upstart 0.6.3-11
ProcEnviron: SHELL=/bin/bash
ProcVersionSignature: User Name 2.6.32-10.14-server
SourcePackage: upstart
Tags: lucid
Uname: Linux 2.6.32-10-server x86_64

Scott Moser (smoser) wrote :
Scott Moser (smoser) wrote :

I filtered the 2 attached files above through a sed, sort, and uniq -u to find which lines were only in each log.

The output is attached here. Obviously, the boot with ramdisk has lots of messages as it gets further along.
The things that I thought were interesting in this attachment were:

a.) only the noramdisk boot has messages with:
   network-interface (eth0)
 and
   network-interface (lo)

  I don't know why these would not be present in the ramdisk log.

b.) The 'filesystem' and 'local-filesystem' boot events are never emitted by mountall in the noramdisk boot.
   Most of the other messages in the successful boot depend on local-filesystem or filesystem.

Scott Moser (smoser) wrote :

Just for the record, the logs are from the uec-image (20100118) booted in kvm. In the attached logs, I had disabled the ec2-init jobs (/etc/init/{ec2init,cloud}-*.conf), but I have since verified that they are not at fault by booting the unmodified image.

I did this with:
$ tar -Sxvzf lucid-server-uec-amd64.tar.gz
$ mkdir test; cd test
$ qemu-img create -f qcow2 -b \
  ../lucid-server-uec-amd64.img lucid-server-uec-amd64.img
$ ln -s ../lucid*-virtual .
$ kvm -m 512 -echr 5 -serial stdio -nographic \
  -drive file=lucid-server-uec-amd64.img,if=ide \
  -kernel lucid-server-uec-amd64-vmlinuz-virtual \
  -initrd lucid-server-uec-amd64-initrd-virtual \
  -append "root=/dev/sda console=ttyS0 --debug" |
  tee boot.ramdisk.log

The only difference for the boot without ramdisk was leaving out the '-initrd <arg>' option.

Scott Moser (smoser) wrote :

Cleaned "only" files.
produced by 'go no-ramdisk.log ramdisk.log' where 'go' has:
----
#!/bin/sh

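# Strip carriage returns (\r needs GNU sed), PIDs and timestamped lines; keep lines seen only once.
clean(){ sed 's,\r,,g; s,([0-9]*),,g; s,process [0-9]*,process,g; /^[[]/d' | sort | uniq -u; }
cat "${1}" "${2}" "${2}" | clean > "only.${1##*/}"
cat "${2}" "${1}" "${1}" | clean > "only.${2##*/}"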
----

These logs were collected from boots as described in comment 5
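
For anyone reusing the trick, here is a minimal sketch of the core idea without the log cleanup. The name only_in is just for illustration. The second file is listed twice so that each of its lines has a count of at least 2 and is dropped by 'uniq -u'. Note that lines repeated within the first file are dropped too.

```shell
#!/bin/sh
# only_in A B: print lines that occur exactly once in A and never in B.
# (only_in is a hypothetical helper name, not part of the 'go' script.)
only_in() {
    # B appears twice, so none of its lines can be unique in the sorted stream.
    cat "$1" "$2" "$2" | sort | uniq -u
}
```

Running 'only_in no-ramdisk.log ramdisk.log' would then list the lines found only in the no-ramdisk log.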

Scott Moser (smoser) wrote :

Steve asked for /etc/init from the system, here is /etc for completeness.

Steve Langasek (vorlon) wrote :

The problem here is that mountall never finishes in the non-initramfs case. How straightforward would it be for you to edit /etc/init/mountall.conf to run mountall with --debug?

Steve Langasek (vorlon) wrote :

Notably, with the initramfs, the mounted-tmp job triggers and it doesn't without. What is mounted on /tmp at the end of boot in the with-initramfs case?

Changed in upstart (Ubuntu):
assignee: nobody → Steve Langasek (vorlon)
Steve Langasek (vorlon) wrote :

Diagnosed to a mountall issue involving the rootfs; moving to correct package and unassigning myself. Scott to follow up with debug console output.

affects: upstart (Ubuntu) → mountall (Ubuntu)
Changed in mountall (Ubuntu):
assignee: Steve Langasek (vorlon) → nobody
Scott Moser (smoser) wrote :

An update here. In the above logs, the kernel command line for the ramdisk boot was 'root=LABEL=uec-rootfs'. For the ramdisk-less boot, it was 'root=/dev/sda'. (I have verified that the initramfs boot also succeeds if 'root=/dev/sda' is given.)

Inside the image, /etc/fstab contained:
    /dev/sda1 / ext3 defaults 0 0

After adding '--debug' to the mountall.conf invocation of mountall, we saw that it was waiting on /:
| boredom_timeout: Waiting for /
| boredom_timeout: Could not prompt user; no action
| boredom_timeout: Waiting for /tmp
| ...
| try_mount: / waiting for device

So, mountall was waiting for /dev/sda1, which would never appear. I fixed /etc/fstab in the image and tested that an entry of 'LABEL=uec-rootfs' or 'sda' there resulted in successful boot. It was like this because the image is made for ec2 or UEC, where the filesystem image is put onto a partition first. In kvm testing I boot it directly.
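
A rough sketch of how this kind of mismatch could be spotted from a running system follows. The helper names fstab_root_spec and check_root_spec are made up for illustration, and the check only understands plain /dev/... entries; LABEL= and UUID= entries are reported but not resolved.

```shell
#!/bin/sh
# fstab_root_spec [FSTAB]: print the device field of the entry mounted on "/".
# (hypothetical helper; defaults to /etc/fstab)
fstab_root_spec() {
    awk '$1 !~ /^#/ && $2 == "/" { print $1; exit }' "${1:-/etc/fstab}"
}

# check_root_spec [FSTAB]: report whether that device node currently exists.
# Returns 0 if it exists or cannot be checked, 1 if missing, 2 if no entry.
check_root_spec() {
    spec=$(fstab_root_spec "$1")
    case "$spec" in
        "")
            echo "no entry for / found"
            return 2
            ;;
        /dev/*)
            if [ -b "$spec" ]; then
                echo "$spec exists"
            else
                echo "$spec is missing"
                return 1
            fi
            ;;
        *)
            echo "$spec is not a device path, not checked"
            ;;
    esac
}
```

With the fstab shown above and the image booted directly in kvm, I would expect check_root_spec to report that /dev/sda1 is missing, which matches what mountall was waiting for.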

Interestingly, if I boot with ramdisk and 'root=/dev/sda' with '/dev/sda1' in /etc/fstab, it does successfully boot. So, the initramfs is doing something that ends up making mountall happy.

I think there's still an issue here, though, as '/' was correctly mounted by the kernel. Waiting for the device named in the '/' entry of /etc/fstab could often result in a timeout. This could be the case if the device name changed between kernels, or possibly in some other scenario.

I also don't know if there is a foolproof way to get the device that a mount point is on.
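
The closest thing I know of is reading /proc/mounts, sketched below. This is not foolproof: when the kernel mounts the root itself, the entry for / may be reported as /dev/root rather than the real device name. The helper name mount_source is only for illustration.

```shell
#!/bin/sh
# mount_source MOUNTPOINT [MOUNTS_FILE]: print the source recorded for a
# mount point. (hypothetical helper; defaults to /proc/mounts)
mount_source() {
    # Fields in /proc/mounts: source mountpoint fstype options dump pass.
    # Skip the kernel's "rootfs" placeholder and keep the last match, which
    # is the mount most recently stacked on that path.
    awk -v mp="$1" '
        $2 == mp && $1 != "rootfs" { dev = $1 }
        END { if (dev != "") print dev; else exit 1 }
    ' "${2:-/proc/mounts}"
}
```

Calling 'mount_source /' prints whatever the kernel recorded, and the function returns non-zero if nothing is mounted at the given path.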

Scott Moser (smoser) wrote :

I set this to low priority, because it will not affect the UEC images when booted under ec2 or UEC even without ramdisk as /etc/fstab will be correct in those cases.

If you believe that this might be higher priority due to other cases hitting it, please modify accordingly.

summary: - system fails to boot without ramdisk
+ system fails to boot without ramdisk and incorrect / in fstab
Changed in mountall (Ubuntu):
importance: Undecided → Low
Yves Lavoie (yves-lavoie-ing) wrote :

I am experiencing the same problem on:

ProblemType: Bug
Architecture: i686
DistroRelease: Ubuntu 10.04
Package: upstart 0.6.3-11
ProcEnviron: SHELL=/bin/bash
ProcVersionSignature: User Name 2.6.33-rc4
SourcePackage: upstart
Tags: lucid
Uname: Linux 2.6.33-rc4 i686

I am not using ramdisk and the system fails to boot with root=/dev/hda6
I was able to boot with mountall 1.0
mountall 2.4 waits forever, after showing a few event failures.

Yves Lavoie (yves-lavoie-ing) wrote :

Diagnosed to waiting for /usr and swap (respectively /dev/hda5 and /dev/hda7). Switching to UUID doesn't do any good.

Yves Lavoie (yves-lavoie-ing) wrote :

This bug prevents booting, and the latest updates to libc6 prevent going back to mountall 1.0, so the PC doesn't boot anymore.
I suggest considering a higher priority for this bug.

Yves Lavoie (yves-lavoie-ing) wrote :

Booting with root rw allows mountall to succeed with /.
Somehow, when root is ro, mountall fails to detect that it must be remounted rw for the system to operate properly.

Scott Moser (smoser) wrote :

Yves, does your custom kernel have
CONFIG_DEVTMPFS=y ?

It appears this might be a requirement. It would explain working fine with rw / (as /dev would be writable) but not without.
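
A quick way to check, sketched under the assumption that the kernel's config file is available: Ubuntu kernels install it as /boot/config-$(uname -r), and kernels built with CONFIG_IKCONFIG_PROC expose it as /proc/config.gz. A custom kernel may have neither. The helper name has_devtmpfs is made up for illustration.

```shell
#!/bin/sh
# has_devtmpfs [CONFIG_FILE]: succeed if the kernel config enables devtmpfs.
# (hypothetical helper; defaults to /boot/config-<running kernel release>)
has_devtmpfs() {
    cfg=${1:-/boot/config-$(uname -r)}
    # Read gzip-compressed configs (e.g. /proc/config.gz) as well as plain ones.
    case "$cfg" in
        *.gz) gzip -dc "$cfg" ;;
        *)    cat "$cfg" ;;
    esac | grep -q '^CONFIG_DEVTMPFS=y$'
}
```

For example: has_devtmpfs && echo "devtmpfs enabled" || echo "devtmpfs not enabled or config not found"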

Yves Lavoie (yves-lavoie-ing) wrote :

Nope, it wasn't set. I'll fix it and will report back.
Meanwhile, if it is mandatory, I suggest warning the user upon installation that some requirements are not met.
Thanks.

Yves Lavoie (yves-lavoie-ing) wrote :

It works! CONFIG_DEVTMPFS=y solved the case, without the automount feature.
Adding the automount resulted in immediate crash while mounting drives.
Thanks for the help, it was very welcome.
