system hangs on reboots after upgrading lucid alpha 2 hal packages

Bug #524135 reported by ppanon

This bug report was converted into a question: question #102242: system hangs on reboots after upgrading lucid alpha 2 hal packages.

This bug affects 1 person
Affects: hal (Ubuntu)
Status: Invalid
Importance: Undecided
Assigned to: Unassigned
Milestone: —

Bug Description

Binary package hint: hal

After applying updates for Lucid Alpha 2, the system hangs on reboot after file system detection. I've narrowed the issue down to the upgrade of the hal/libhal1/libhal-storage1 packages from 0.5.14-0ubuntu2 to 0.5.14-0ubuntu3.

ppanon@whiteygu:~$ apt-cache policy hal
hal:
  Installed: 0.5.14-0ubuntu2
  Candidate: 0.5.14-0ubuntu3
  Version table:
     0.5.14-0ubuntu3 0
        500 http://ca.archive.ubuntu.com lucid/main Packages
 *** 0.5.14-0ubuntu2 0
        100 /var/lib/dpkg/status
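
For reference, one way to test that hypothesis is to roll the three packages back to the known-good version and hold them there so a later upgrade can't pull in 0ubuntu3. A minimal sketch, assuming the old version is still available in the archive or local apt cache:

$ sudo apt-get install hal=0.5.14-0ubuntu2 libhal1=0.5.14-0ubuntu2 libhal-storage1=0.5.14-0ubuntu2
# Hold the packages so upgrade/dist-upgrade leaves them alone
$ for p in hal libhal1 libhal-storage1; do echo "$p hold" | sudo dpkg --set-selections; done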

Originally upgraded from a Karmic install; the system was hanging in the graphical boot while displaying "Waiting for /tmp". I've since had to try re-installing from the Alpha 2 CD, and it appears to hang around file system initialization. This seems to happen whether the file systems are ext3 (original setup) or nearly all ext4 (after the reinstall). I'm also using LVM on a RAID1 md device.
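
For context, the storage stack described above can be inspected from a running system (a sketch; the lvm2 tools are assumed to be installed):

$ cat /proc/mdstat         # md RAID1 array status
$ sudo pvs && sudo lvs     # LVM physical and logical volumes on top of the array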

Revision history for this message
ppanon (ppanon) wrote :

This seems to have actually occurred because I was using the Update Manager administrative app. It appears that some updates were being held back (they required a dist-upgrade) while others came through as a regular upgrade even though they depended on some of the held-back packages. The Update Manager app didn't provide an option for the dist-upgrade, and the resulting incomplete, inconsistent upgrade caused the boot problem. Doing an apt-get dist-upgrade immediately after the apt-get upgrade from the command line on an Alpha 2 re-install seems to get past the problem.
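
The command-line sequence referred to above, run on a fresh Alpha 2 install, is roughly:

$ sudo apt-get update          # refresh the package lists
$ sudo apt-get upgrade         # applies updates that add or remove no packages
$ sudo apt-get dist-upgrade    # also installs the new dependencies held back above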

Revision history for this message
ppanon (ppanon) wrote :

Sigh, this still happens after all. It just doesn't always happen right away after an update: sometimes it takes a few reboots, but once it happens I can't seem to clear it up without reinstalling Alpha 2 (and apparently wiping at least my root partition). I've lost count of how many reinstalls I've done, but it's over 5. Since my first re-install of 10.04 Alpha 2, it's been /var that hangs, not /tmp: I just get the Ubuntu logo with "Waiting for /var [SM]" underneath. Before that logo I see an error about uvcvideo, followed by the e2fsck results for /, /boot, /home, and /tmp, but not /var. I'm also seeing the ureadahead errors described in bug #484677, but those go away after applying the "start on local-filesystems" change indicated in bug #432360. None of those changes fixes the wait for /var. If I boot from the Alpha 2 alternate CD (which supports RAID and LVM) and use rescue mode, I can run fsck on /var with no problems.
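
For reference, the rescue-mode check looks roughly like this (a sketch; the volume-group and logical-volume names are placeholders, and the real device path depends on the LVM layout):

$ umount /var 2>/dev/null          # make sure the filesystem is not mounted
$ e2fsck -f /dev/mapper/vg0-var    # force a full check of the /var logical volume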

That said, if the init startup order did change, as is implied in bug #484677, could it be that something now starts earlier, takes a lock on /var, and somehow prevents the fsck from running? What is running those fscks anyway? It doesn't look like mountall should be doing it by default, but something is. I no longer think this is hal-related, since it seems to happen before hald ever starts, but I'm not sure what it is related to. Init scripts? Can anybody help me figure out what's breaking here?
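
A few commands that help answer "what is running those fscks" on an upstart-based boot (on Lucid, mountall itself runs fsck on each filesystem before mounting it; these just confirm that locally):

$ initctl list                   # upstart jobs and their current states
$ cat /etc/init/mountall.conf    # the job that mounts filesystems at boot
$ grep -rl fsck /etc/init/       # any other job that mentions fsck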

Revision history for this message
ppanon (ppanon) wrote :

I went through one more install/upgrade cycle and found that a) there appear to be no significant configuration changes in /etc that would be likely to cause this problem, and b) reformatting /var doesn't appear to be necessary during a reinstall to clear up the "Waiting for /var" hang. That seems to indicate the problem is in one of the 570+ packages that get upgraded when an apt-get dist-upgrade is applied to Alpha 2. Since the only visible major differences between the working pre-update boot sequence and the broken post-update one are the uvcvideo error (unlikely) and the missing fsck startup, the problem seems likely to be either in fsck or in the process that invokes it while mounting file systems during init.
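
One way to see exactly what a dist-upgrade changed is dpkg's log, which records every action with old and new versions (a sketch; the field positions assume the standard /var/log/dpkg.log format of date, time, action, package, old version, new version):

$ grep ' upgrade ' /var/log/dpkg.log | awk '{print $4, $5, "->", $6}'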

I'm not sure how to figure out how upstart starts file system mounting. The only reference to fsck I've seen is in /etc/init/mountall.conf. At first glance it would seem mountall shouldn't be trying to run fsck before mounting file systems, since mountall has a parameter to force that action, but maybe that parameter forces a full check and mountall calls fsck by default to increment a mount count. That would make mountall and e2fsck the most likely culprits for my woes, so I'll try to grab copies of the Alpha 2 versions of those packages and roll back to them after the dist-upgrade. I'm just wondering whether I need to rebuild them against the new kernel and headers installed by the dist-upgrade before rebooting. If anybody knowledgeable would be willing to give me some pointers, or even confirmation that I'm finally looking in the right direction, I'd really appreciate it.
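
A sketch of that rollback, assuming the Alpha 2 alternate CD follows the usual archive pool layout (the paths and .deb filenames are placeholders to be checked against the actual disc):

$ sudo mount /dev/cdrom /media/cdrom
$ sudo dpkg -i /media/cdrom/pool/main/m/mountall/mountall_*.deb
$ sudo dpkg -i /media/cdrom/pool/main/e/e2fsprogs/e2fsprogs_*.deb \
              /media/cdrom/pool/main/e/e2fsprogs/e2fslibs_*.deb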

Revision history for this message
ppanon (ppanon) wrote :

Hmm. Well, it doesn't seem to be the updates to e2fsprogs/libs or mountall. I obtained/prepared packages to roll back those updates after doing the dist-upgrade, and I still had the interminable "Waiting for /var [SM]" behaviour, with one less e2fsck invocation shown on bootup. So presumably something else is changing something, or is mounting and locking the /var filesystem in a way that prevents it from being checked and mounted during the boot sequence.

Revision history for this message
ppanon (ppanon) wrote :

Summary: I had upgraded my Karmic system to Lucid Alpha 2 and everything went fine. I had also been applying updates with Update Manager successfully until about mid-February, when a whole bunch of updates came out. After those updates, my machine would fail to boot completely. Ever since, I've found I can re-install with the Lucid Alpha 2 alternate install CD to maintain my multi-filesystem setup (/boot, /, /home, /tmp, /var), which uses LVM over a RAID 1 (mirrored) disk pair. However, once I apply all the current updates with Update Manager, the system boots successfully once (although perhaps slower than usual) and then fails to complete booting on the subsequent reboot. There also seems to be an issue with the grub2 configuration in the install CD, but I managed to bypass that by using LILO, generating a GRUB configuration once Alpha 2 was running, and keeping a tarball of the /boot/grub directory around that I could expand during subsequent Alpha 2 re-installs.
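
The /boot/grub tarball workaround mentioned above amounts to something like this (a sketch; the backup path is a placeholder and should live somewhere that survives the re-install, such as /home):

$ sudo tar czf /home/ppanon/grub-backup.tar.gz -C / boot/grub    # after a working boot
$ sudo tar xzf /home/ppanon/grub-backup.tar.gz -C /              # after a re-install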

The boot sequence hangs with the Ubuntu logo and a "Waiting for /var [SM]" message. One thing I have noticed: a successful Lucid Alpha 2 boot shows banners for 5 e2fsck startups, but after the update only 4 show up. I also see the fsck results for all the file systems on Alpha 2, but the results for /var are missing after the dist-upgrade to what I presume is an early release of the Alpha 3 packages. I've tried rolling back the mountall and e2fsprogs/libs packages to the Alpha 2 versions after doing the dist-upgrade, but that hasn't changed anything. After the upgrade, I've tried booting from the Alpha 2 CD into rescue mode and successfully fsck'd /var (the only problem it picked up was a "time stamp in the future" issue), but the system still wouldn't boot completely. With 500+ update packages in the queue, it just isn't feasible for me to figure out which one is the culprit, especially when the problem takes 2 reboots to manifest, so I need help. What should I be doing at this point to narrow down the cause when I can't get the system to boot to a shell prompt?
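
One way to narrow 500+ suspects without testing them one at a time is a binary search over the pending upgrades: apply half, reboot twice, and recurse into whichever half reproduces the hang. A sketch (the count of 250 is a placeholder for half the actual total, and apt may pull in extra dependencies that blur the split slightly):

$ apt-get -s dist-upgrade | awk '/^Inst/ {print $2}' > pending.txt
$ wc -l < pending.txt                                        # total pending upgrades
$ head -n 250 pending.txt | xargs sudo apt-get install -y    # upgrade the first half
# Reboot twice; if the hang appears, the culprit is in this half -- recurse into it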

I just don't know what to try next, but I expect an Alpha 3 install will have similar problems for me when it comes out. There probably aren't too many people running Lucid Alpha on LVM/RAID with a separate /var filesystem, so I'm presuming one of those items is a factor in this problem. Since those setups are more common on production systems, it would be good to address this before Lucid gets deployed to them.

Changed in hal (Ubuntu):
status: New → Invalid