Hang while booting

Bug #32597 reported by rsidd
Affects: linux (Ubuntu)
Importance: Medium
Assigned to: Unassigned

Bug Description

udevplug is not waiting for devices, the hang seems to be actually in the kernel after loading a particular driver. All the drivers loaded and population of /sys can be obtained from the udev.logs attached to this bug.

Given what the submitter has said, I'm pretty convinced this isn't a udev bug.

This has been reported earlier, but those bug reports are closed: I posted to one of them (28439) without response, so am opening a new one. It is still true with today's ubuntu (kernel 2.6.15-16).

I have a hang in "detecting hardware", after which
(sometimes) the system locks solid
(other times) the system continues to boot after a while, but the touchpad doesn't work.

breezy worked fine on this system. I saw this around Jan 25, but it continues to be true after a dist-upgrade today (kernel 2.6.15-15-686, udev 079-0ubuntu14).

I tried the suggestions in bug 28439, and the answers are:
on booting with init=/bin/bash,
contents of /dev/.udev : db failed (no queue)

/dev is tmpfs, fully populated (afaict)

last line of udevplug -s -v:
/sys/devices/pci0000:00

contents of "failed":
lrwxrwxrwx 1 root root 36 Feb 8 13:38 devices@pci0000:00@0000:00:00.0 -> /sys/devices/pci0000:00/0000:00:00.0
lrwxrwxrwx 1 root root 36 Feb 8 13:38 devices@pci0000:00@0000:00:1e.0 -> /sys/devices/pci0000:00/0000:00:1e.0
lrwxrwxrwx 1 root root 49 Feb 8 13:38 devices@pci0000:00@0000:00:1e.0@0000:01:09.0 -> /sys/devices/pci0000:00/0000:00:1e.0/0000:01:09.0
lrwxrwxrwx 1 root root 49 Feb 8 13:38 devices@pci0000:00@0000:00:1e.0@0000:01:09.2 -> /sys/devices/pci0000:00/0000:00:1e.0/0000:01:09.2
lrwxrwxrwx 1 root root 49 Feb 8 13:38 devices@pci0000:00@0000:00:1e.0@0000:01:09.3 -> /sys/devices/pci0000:00/0000:00:1e.0/0000:01:09.3
lrwxrwxrwx 1 root root 36 Feb 8 13:38 devices@pci0000:00@0000:00:1f.0 -> /sys/devices/pci0000:00/0000:00:1f.0
lrwxrwxrwx 1 root root 36 Feb 8 13:38 devices@pci0000:00@0000:00:1f.3 -> /sys/devices/pci0000:00/0000:00:1f.3
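
As an aside, each name in the listing above looks like the sysfs path with "/" replaced by "@" and the leading /sys/ dropped. A minimal sketch of that mapping follows. It is inferred only from the seven entries shown, and the function names are made up for illustration; they are not part of udev.

```python
def failed_entry_to_sysfs(name):
    """Map a name from /dev/.udev/failed back to a sysfs path.

    Assumption (inferred from the listing, not from udev documentation):
    the name is the devpath with every "/" written as "@".
    """
    return "/sys/" + name.replace("@", "/")


def listing_line_is_consistent(line):
    """Check one `ls -l` line: does the decoded name equal the symlink target?"""
    left, target = line.split(" -> ")
    name = left.split()[-1]  # last whitespace-separated field is the link name
    return failed_entry_to_sysfs(name) == target.strip()
```

Feeding each line of the listing to listing_line_is_consistent() is a quick way to confirm that the failed entries point at the PCI devices they are named after.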

These seem to be the PCI bridge and ISA bridge:
0000:00:1e.0 PCI bridge: Intel Corporation 82801 Mobile PCI Bridge (rev d3) (prog-if 01 [Subtractive decode])
        Flags: bus master, fast devsel, latency 0
        Bus: primary=00, secondary=01, subordinate=05, sec-latency=216
        I/O behind bridge: 00003000-00003fff
        Memory behind bridge: b0100000-b01fffff
        Prefetchable memory behind bridge: 0000000020000000-0000000021f00000
        Capabilities: <available only to root>

0000:00:1f.0 ISA bridge: Intel Corporation 82801FBM (ICH6M) LPC Interface Bridge (rev 03)
        Subsystem: Hewlett-Packard Company: Unknown device 3080
        Flags: bus master, medium devsel, latency 0

The above is booting with "pci=assign-busses" (otherwise my PCMCIA doesn't work). But the udevplug hang happens even without this option.

Any ideas?

One more observation: I did the following which seems to fix things (mostly -- it still happens sometimes but not so often)

(1) comment out the udevplug lines from /etc/init.d/udev
(2) Create /etc/init.d/udevplug with *just* the udevplug lines in "start)" and nothing in the other sections
(3) Symlink this as /etc/rcS.d/S99udevplug (so it's the last thing to start in runlevel 1)

Scott James Remnant (Canonical) (canonical-scott) wrote :

Firstly it's worth pointing out that if you've fully upgraded, you won't see a "Detecting hardware" message anymore; instead you should see "Loading drivers", is that true?

Please attach /var/log/udev

Changed in udev:
assignee: nobody → keybuk
status: Unconfirmed → Needs Info
Scott James Remnant (Canonical) (canonical-scott) wrote :

Obviously you'll need to undo your changes.

rsidd (rsidd) wrote :

Well, it now says "Loading hardware drivers..."

It took seven attempts to boot after undoing my changes (with my changes it seems to hang about once in ten times). /var/log/udev (with undone changes) is at
http://rsidd.online.fr/udevlog.gz

Now I'm redoing the changes...

Scott James Remnant (Canonical) (canonical-scott) wrote :

You misinterpreted me.

I need the copy of /var/log/udev from the boot where it "hangs", not one that works.

It isn't hanging, it will time out after 3 minutes and carry on.

rsidd (rsidd) wrote :

Ah. Here you go
http://rsidd.online.fr/udevlog2.gz

By the way, sometimes it times out after 3 minutes but sometimes it hangs solid -- caps lock LEDs, etc stop working and I have to hard-reboot. Also, sometimes it times out and continues booting but the touchpad doesn't work. In the above log, it timed out and booted, and the touchpad still worked.

Scott James Remnant (Canonical) (canonical-scott) wrote :

Ok, this shows:

1) all events were processed by udev; there are no UEVENTs for which there are no UDEVs

2) all events are processed pretty much as and when received from the kernel

3) entire processing of kernel events takes 23 seconds

4) there's a delay in processing block devices (fairly normal)

Nothing here is hanging.
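
Checks 1) and 3) can be repeated against any of the attached logs. The sketch below assumes that each line of /var/log/udev has the shape "UEVENT[seconds.micros] action@devpath" for a kernel event and "UDEV  [seconds.micros] action@devpath" once udev has handled it. That layout is an assumption here, not something shown in this thread, and the function names are illustrative only.

```python
import re
from collections import Counter

# Assumed line shape (not confirmed by the thread):
#   UEVENT[1139400000.123456] add@/devices/pci0000:00
#   UDEV  [1139400000.234567] add@/devices/pci0000:00
LINE_RE = re.compile(r"^(UEVENT|UDEV)\s*\[(\d+\.\d+)\]\s+(\S+)")


def parse(lines):
    """Yield (kind, timestamp, event) for every line matching the assumed shape."""
    for line in lines:
        m = LINE_RE.match(line)
        if m:
            yield m.group(1), float(m.group(2)), m.group(3)


def unmatched_uevents(lines):
    """Return kernel events (UEVENT) that never got a corresponding UDEV line."""
    pending = Counter()
    for kind, _ts, event in parse(lines):
        pending[event] += 1 if kind == "UEVENT" else -1
    return sorted(event for event, count in pending.items() if count > 0)


def processing_span(lines):
    """Seconds between the first and the last logged event (0.0 if none)."""
    stamps = [ts for _kind, ts, _event in parse(lines)]
    return max(stamps) - min(stamps) if stamps else 0.0
```

An empty result from unmatched_uevents() corresponds to point 1), and processing_span() gives the figure quoted in point 3). A gzipped log can be read line by line with gzip.open(path, "rt").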

After a boot when it times out, what's in /dev/.udev/queue ?

This is looking very much like a kernel driver bug somewhere.

rsidd (rsidd) wrote :

Will let you know in a day or so.

In the earlier test (init=/bin/bash) there was no /dev/.udev/queue ...

rsidd (rsidd) wrote :

Ok there's no queue in /dev/.udev (the directory doesn't exist)...

With this boot the touchpad failed to work, so for good measure here's /var/log/udev again:
http://rsidd.online.fr/udevlog3.gz

Another datapoint: while it's hanging the hard drive LED stays on. Even if the computer locks up completely, it stays on.

Any ideas? If it's a kernel driver bug, why does it happen much less often with my delayed-udevplug hack? Are there parameters (subsystems to ignore) that I can safely pass to udevplug at that stage in the boot process?

Scott James Remnant (Canonical) (canonical-scott) wrote :

Ben, as noticed in the summary, I'm convinced this isn't a udev bug but a hang in the kernel after loading a particular driver.

The attached udev.logs don't show udevplug waiting around, and the user confirms that /dev/.udev/queue doesn't exist -- so it's not a timeout there either.

Given the "Caps Lock staying on" I'm punting it in your direction. You can obtain a list of everything udev loads from the log files attached.

summary: + udevplug is not waiting for devices, the hang seems to be actually in
+ the kernel after loading a particular driver. All the drivers loaded
+ and population of /sys can be obtained from the udev.logs attached to
+ this bug.
+
+ Given what the submitter has said, I'm pretty convinced this isn't a
+ udev bug.
Changed in udev:
assignee: keybuk → kernel-team
status: Needs Info → Unconfirmed
Nicktastic (nicktastic) wrote :

I am also experiencing failures while booting at "Loading hardware drivers", which I understand to be udev-related. I had this problem with Flight 4, I had it with Flight 5, and I still have it after dist-upgrading from Flight 5, a few hours ago.

Specifically, "Loading hardware drivers" blocks the boot process for 2-3 minutes every time I boot, without fail. But that is where the consistent behavior ends.

Sometimes "Loading hardware drivers" reports failure, sometimes it reports success. The most common scenario, when it fails and sometimes when it succeeds, is that the boot continues until EVMS tries to start, and there the boot halts for good. The kernel is not locked, as the Caps/Num lights operate, and I can use Alt+SysRq to unmount my disks and reboot, but I cannot Ctrl+C to halt the EVMS init script; it just sits there forever (I've let it sit for five minutes).

The other, rarer scenario is that "Loading hardware drivers" and EVMS both report success, but when I log in to GDM I am presented with a single terminal, which scrolls a repeating error message about permissions on /dev/null, presumably an infinite loop somewhere in a bash profile/rc file. I can Ctrl+C to break the loop and get a usable shell. The problem, obviously, is that /dev/null has 0640 permissions. Issuing /etc/init.d/udev restart fixes the problem. After doing so, I can re-login to GDM and get the GNOME desktop. This is the only scenario which results in a usable system. (Incidentally, this happens on both of my Breezy boxes every time I boot, and has happened since before Breezy was released. It also happened a few months back when I ran Debian unstable. Infinitely frustrating.)

I have booted with init=/bin/bash as described earlier in this thread, and here are my findings. No queue directory exists in /dev/.udev before or after starting udevd, but it does appear after running udevplug. The last line printed by udevplug is /sys/devices/pci0000:00.

I have no udev log to offer, as every time "Loading hardware drivers" fails, EVMS blocks the boot indefinitely, and I presume that the loggers are not running when I boot with init=/bin/bash. I have posted output from dmesg and lspci, along with the contents of my /dev/.udev directory after running udevplug, at the following URL:

http://pasture.ath.cx/~nick/udev/

My kernel never locks, so I don't know if I should be reporting under a bug issued to the kernel team. Please advise.

And please help! Just tell me what you need from me. I really don't want to miss out on Dapper.

Nicktastic (nicktastic) wrote :

My problem is due to hdparm, which is invoked via /lib/udev/hdparm.

When /lib/udev/hdparm is run, the kernel reports the following errors:

hda: dma_intr: status=0x58 { DriveReady SeekComplete DataRequest }
ide: failed opcode was: unknown
hda: set_drive_speed_status: status=0x58 { DriveReady SeekComplete DataRequest }
ide: failed opcode was: unknown

The hdparm processes block forever in io-wait, and thus can't be killed.
This is the reason for udevplug's long runtime, and also the reason EVMS doesn't start.

My system booted perfectly after setting the first line of /lib/udev/hdparm to 'exit 0'.
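
Processes stuck like this are in uninterruptible sleep, which ps reports as state D. A small sketch for spotting them, assuming the text output of "ps -eo pid,stat,comm" is passed in (the helper name is illustrative, not an existing tool):

```python
def blocked_processes(ps_output):
    """Return (pid, command) pairs for processes in uninterruptible sleep.

    Expects the text printed by `ps -eo pid,stat,comm`, header line included.
    A STAT field starting with "D" marks uninterruptible (usually I/O) sleep.
    """
    blocked = []
    for line in ps_output.splitlines()[1:]:  # skip the header line
        parts = line.split(None, 2)          # pid, stat, rest of line
        if len(parts) == 3 and parts[1].startswith("D"):
            blocked.append((int(parts[0]), parts[2].strip()))
    return blocked
```

If hdparm shows up in the result while "Loading hardware drivers" is stalled, that points at the same cause as described above.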

Scott James Remnant (Canonical) (canonical-scott) wrote :

Nicktastic: can you attach your /etc/hdparm.conf file

Nicktastic (nicktastic) wrote : Nick's /etc/hdparm.conf

For Scott.

rsidd (rsidd) wrote :

After a dist-upgrade today (March 30) I booted twice without incident, and without my kludge, so I hoped the bug had gone away -- but alas, it hung on the third reboot...

The fourth time it booted again, so at least the bug seems much less severe than earlier.

I have put a "sleep 2" before the udevplug section in /etc/init.d/udev, and am seeing if that helps.

ubuntu_demon (ubuntu-demon) wrote :

This bug might be related to certain ASUS motherboards.

see also this thread : http://www.ubuntuforums.org/showthread.php?t=167722

Christian Kujau (christiank) wrote :

Well, I still notice a 3-minute delay during bootup with udev-079-0ubuntu34. I thought it was a kernel bug and reported it here: http://lkml.org/lkml/2006/7/12/301 but Kay told me that it might be a udev/userspace bug. The thing is: it goes away with 2.6.14 and was introduced somewhere during the 2.6.15 development (see my report for the exact timeline, along with more logs)

The last thing I see on the console is "Loading hardware drivers...". Then I have to wait 3 minutes until booting continues as usual and all is fine and working....

Thank you for your time,
Christian.

Changed in linux-source-2.6.15:
status: Unconfirmed → Confirmed
hmc8 (hmc8) wrote :

I also have the same problem. The boot stops 9 out of 10 times at "Loading hardware drivers". After a few minutes the booting continues.
After that everything works fine except for my internet connection (Realtek network card with DSL modem).
pppoeconfig says it can't find the "access concentrator" of my provider.

I have to reboot until the "Loading hardware drivers" bug doesn't appear. Then everything works fine.

magilus (magilus) wrote :

I have the same problem. I booted without splash and quiet and took a photo, which is attached.

Ben Collins (ben-collins) wrote :

The last kernel version I see tested is 2.6.15-16. Have you updated to the latest dapper-security kernel (-26)? We had other bug reports with this condition, and they were fixed in the latest release.

Christian Kujau (christiank) wrote :

Well, the problem is slightly more complicated, methinks: I am booting vmlinuz-2.6.15-26-amd64-k8 (and former ubuntu-kernels) just fine *without* the delay. I noticed the delay when I upgraded to a vanilla 2.6.17. I traced it back to a single kernel-patch introduced in the 2.6.14->2.6.15 development (http://lkml.org/lkml/2006/7/12/308), but Kay said that it's not kernel but init-script related....

When I comment out the "Loading hardware drivers..." it seems to go away, but then no hw-drivers are loaded. But I have to reproduce this one for a current kernel...

magilus (magilus) wrote :

Ben, the screenshot was made with kernel version 2.6.15-26-k7 so yeah, the latest kernel release has been tried.

Ben Collins (ben-collins) wrote :

Scott, you may want to check the link to lkml in Christian's comment.

Scott James Remnant (Canonical) (canonical-scott) wrote :

That patch is just the one that adds the uevent attribute, if you back that out it'll prevent udevplug from re-sending the hotplug event, which will prevent the driver from being loaded in the first place, thus preventing the hang.

You'd get the same effect by changing the config to be =n :)

Christian Kujau (christiank) wrote :

> You'd get the same effect by changing the config to be =n :)

...to be "=n" as in "CONFIG_HOTPLUG=n"?

rsidd (rsidd) wrote :

Ben, is the kernel source package also updated? I'm running a kernel without preemption and with a larger stack size, because of ndiswrapper. My wifi card is (or was) very flaky with the stock kernel. But the source package reported itself as 2.6.15-9 even when the binary was up to 2.6.15-23.

Christian Kujau (christiank) wrote :

well, I've upgraded from 2.6.18-rc2-mm1/2.6.18-rc3 to 2.6.18-rc4-mm1 and the behaviour seems to be gone, no delays during bootup anymore (there are other issues now, as my kbd/mouse is way too fast now, but that's another story and that's what I deserve for tracking -mm :)). I am running LTS (6.06.1), so no changes in userland occurred. I plan to test vanilla as well....later.

Ben Collins (ben-collins) wrote :

Testing with Edgy's 2.6.17.6 (and upcoming 2.6.17.8) would be much more helpful.

If it got fixed in 2.6.18 devel, then the only way that's going to help is if you are willing to use git-bisect to find the exact commit that fixes it.

Christian Kujau (christiank) wrote :

well, I've bisected already to find the patch that introduced it but I'm a bit short on time to do so for the fix again. I'll stick to the ubuntu kernel for now and I might upgrade to edgy soon. FWIW, 2.6.18-rc4 still has the delays (2.6.18-rc4-mm1 has not, but *everything* was way too fast, so maybe the timeout which triggers the delay did not "succeed")

Christian Kujau (christiank) wrote :

somehow "Dapper" (LTS) has been put on the machine (x86-64) and the issue has been resolved _here_. No apt-pinning involved, stock-ubuntu packages with ubuntu-kernel and/or latest vanilla (-git and -mm) are booting without any delays.

Thanks,
Christian.

towsonu2003 (towsonu2003) wrote :

sorry to interrupt. just writing what's written in http://librarian.launchpad.net/3866033/P8090127.JPG to ease finding dupes of this (using "search" / google):
------quote starts------
* Setting LVM group
udevd-event[3299]: wait_for_sysfs: waiting for 'sys/devices/platform/i82365.0/bus' failed
------quote ends------

rsidd (rsidd) wrote :

I upgraded that laptop to edgy, and the hang seems to be gone now.

magilus (magilus) wrote :

For me it has gone, too.

towsonu2003 (towsonu2003) wrote :

so can we mark this as "fix released"? (means: works for everyone)

Changed in linux-source-2.6.15:
status: Confirmed → Needs Info
hmc8 (hmc8) wrote :

For me it still doesn't work 6 out of 7 times. I have to reboot until Ubuntu Edgy boots correctly.
If I wait a few minutes Ubuntu boots into the desktop, but the network connection/Internet doesn't work.
Ubuntu Edgy with dist-upgrade

towsonu2003 (towsonu2003) wrote :

:(

Changed in linux-source-2.6.15:
status: Needs Info → Confirmed
magilus (magilus) wrote :

I think that you are running into another problem. Could you please boot without the splash and then look where it hangs?

hmc8 (hmc8) wrote :

I didn't know how to boot without the splash, so I booted in "single user mode" because it happens there too.
I took a photo of my TFT; this is where it hangs while booting:

Bernd Gaßmann (bernd-gassmann) wrote : Re: Hang while booting -- my solution: noapic

Hello out there,

after installing Ubuntu on my new notebook yesterday evening, I encountered the same problem: a hang while booting. Reading this and the corresponding bug 28439, I found that it also seemed to be the udev problem mentioned here.
0) Boot up with init=/bin/bash in the kernel arguments
1) Run "udevd --daemon"
2) Check that /dev/.udev/queue does NOT exist
-> no
3) Run "udevplug -s -v"
-> hangs for 3 minutes after "/sys/devices/pci0000:00"
4) Report the last line before it hangs
5) Wait for up to three minutes, to see whether it times out and drops you back to the shell.

I really thought this must be a kernel problem, and if I had gotten the network running I would have compiled a 2.6.18 kernel to try it out.
So I commented out /sbin/udevplug in /etc/init.d/udev, hoping to get network access ... but now the computer froze at the line "Configuring network interfaces".
Then I thought about which fragile kernel features could be switched off.
Because APIC support often causes problems on notebooks, I tried that one first.
And bingo! No hang, everything boots OK.
So now the only problem left is that I don't have APIC support for now.
But that isn't too bad for the moment, because I can really work with my notebook ;-)

Perhaps this simple solution will help others having this problem.

Good luck,

Bernd.

Bernd Gaßmann (bernd-gassmann) wrote : Re: Hang while booting -- solution noapic

Oh,

in the last paragraph I forgot something.
Perhaps I should tell those who are not that familiar with Linux
that the flag "noapic" I was talking about has to be set in the kernel arguments.
To test whether it works, you can add it directly at the grub command line.
Later on, just append it to the corresponding /boot/grub/menu.lst entries.

Bye,

Bernd.

Changed in linux-source-2.6.15:
assignee: kernel-team → ubuntu-kernel-team
Jan M. (fijam7) wrote :

I experience the same error as Nicktastic (hda: dma_intr: status=0x58 { DriveReady SeekComplete DataRequest } followed by lost interrupt errors) running Feisty with the 2.6.20-15 kernel, and indeed the problem is related to hdparm and udev. Commenting out entries in /etc/hdparm.conf or passing nohdparm at boot time fixes the issue but disables DMA mode. What puzzles me is that invoking hdparm from a running system with exactly the same commands as in hdparm.conf works just fine. Is there any way to get udev to process the file properly?

Regards,
Jan

Jan M. (fijam7) wrote :

This is also related to https://bugs.launchpad.net/ubuntu/+source/linux-source-2.6.20/+bug/107460 where I put the rest of my logs.

Christian Kujau (christiank) wrote :

Well, no more comments here since 04/2007, fixed_for_me and 8.04 LTS has been released anyway - can't we close this one?

Launchpad Janitor (janitor) wrote : This bug is now reported against the 'linux' package

Beginning with the Hardy Heron 8.04 development cycle, all open Ubuntu kernel bugs need to be reported against the "linux" kernel package. We are automatically migrating this linux-source-2.6.15 kernel bug to the new "linux" package. We appreciate your patience and understanding as we make this transition. Also, if you would be interested in testing the upcoming Intrepid Ibex 8.10 release, it is available at http://www.ubuntu.com/testing . Please let us know your results. Thanks!

Leann Ogasawara (leannogasawara) wrote :

The Ubuntu Kernel Team is planning to move to the 2.6.27 kernel for the upcoming Intrepid Ibex 8.10 release. As a result, the kernel team would appreciate it if you could please test this newer 2.6.27 Ubuntu kernel. There are two ways you should be able to test:

1) If you are comfortable installing packages on your own, the linux-image-2.6.27-* package is currently available for you to install and test.

--or--

2) The upcoming Alpha5 for Intrepid Ibex 8.10 will contain this newer 2.6.27 Ubuntu kernel. Alpha5 is set to be released Thursday Sept 4. Please watch http://www.ubuntu.com/testing for Alpha5 to be announced. You should then be able to test via a LiveCD.

Please let us know immediately if this newer 2.6.27 kernel resolves the bug reported here or if the issue remains. More importantly, please open a new bug report for each new bug/regression introduced by the 2.6.27 kernel and tag the bug report with 'linux-2.6.27'. Also, please specifically note if the issue does or does not appear in the 2.6.26 kernel. Thanks again, we really appreciate your help and feedback.

Thom Pischke (thom-pischke) wrote :

I'm seeing this bug as a regression in Intrepid. Had no problems on Hardy. 2 out of 3 boots simply hang at Loading Hardware Drivers although I have a Dell Inspiron 1420n.

Leann Ogasawara (leannogasawara) wrote :

Hi Thom,

You might have been seeing bug 263059 which was resolved with a 2.6.27-7.12 or newer kernel.

Can anyone else here comment if this is still an issue with Intrepid since it's set to be released today. rsidd, since you're the original bug reporter it would be great to get your feedback. Thanks.

Changed in linux:
status: Confirmed → Incomplete
Thom Pischke (thom-pischke) wrote :

Hi Leann,

Yes, you're right. bug 263059 IS the bug I was seeing (iwl3945 driver), and it is fixed now in intrepid. Should have posted again here, sorry.

Launchpad Janitor (janitor) wrote : Kernel team bugs

Per a decision made by the Ubuntu Kernel Team, bugs will no longer be assigned to the ubuntu-kernel-team in Launchpad as part of the bug triage process. The ubuntu-kernel-team is being unassigned from this bug report. Refer to https://wiki.ubuntu.com/KernelTeamBugPolicies for more information. Thanks.

Manoj Iyer (manjo) wrote :

Moving the status of the bug to Fix Released; the last comment shows that this is reported as fixed in Intrepid.

Changed in linux (Ubuntu):
status: Incomplete → Fix Released