Aborted upgrade process left laptop in a non-functional state

Bug #365485 reported by Sam Morris on 2009-04-23
Affects Status Importance Assigned to Milestone
update-manager (Ubuntu)

Bug Description

While upgrading a laptop from Ubuntu 8.10 to 9.04, I noticed that openoffice.org-writer had taken about 15 minutes to unpack. Investigating further, I found that the dpkg process was in the uninterruptible sleep state (D). Examining the kernel message buffer with dmesg, I saw that several OOPSes had occurred (I'll try to attach the output once I get the laptop up and running again). At this point I decided to reboot; that was a bad idea...
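For anyone wanting to repeat the state check described above without ps, here is a minimal sketch that reads the process state straight from /proc on Linux. The function names `parse_stat_line` and `processes_in_state` are made up for illustration and are not part of any Ubuntu tool.

```python
import os


def parse_stat_line(line):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # The command name is wrapped in parentheses and may itself contain
    # spaces or ')', so split on the first '(' and the last ')'.
    open_paren = line.index("(")
    close_paren = line.rindex(")")
    pid = int(line[:open_paren].strip())
    comm = line[open_paren + 1:close_paren]
    state = line[close_paren + 1:].split()[0]
    return pid, comm, state


def processes_in_state(wanted="D", proc_root="/proc"):
    """List (pid, comm) for every process whose state letter matches."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a per-process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                pid, comm, state = parse_stat_line(fh.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited meanwhile, or entry unreadable
        if state == wanted:
            found.append((pid, comm))
    return found


if __name__ == "__main__":
    # 'D' is the uninterruptible sleep state mentioned in the report.
    for pid, comm in processes_in_state("D"):
        print(pid, comm)
```

A process that stays in this list for minutes, together with OOPS messages in dmesg, matches the situation described here.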

The laptop would no longer boot up. Instead of X starting at the end of the bootup process, there was simply a blank screen. After a few moments I realised that the system was simply sitting on a blank virtual console, and was able to switch to another tty and log in.

At this point, I ran startx, but the system froze completely with a black screen and a mouse pointer. I had to hard-reboot; this time, I ran 'update-manager -d' to try to resume the upgrade, but got an error saying that pygtk could not be imported.

I doubt the exact cause of this problem will ever be identified, even if it is possible for it to be fixed. Instead, I file this bug to draw your attention to the fact that the present upgrade process is very unreliable. If an upgrade is interrupted, it must not leave the system in a totally unusable state!

As a Debian user it is extremely disappointing to see that you have taken the traditional and very reliable 'dist-upgrade' process, and replaced it with some weird, fragile upgrade script that requires working python and X installations.

It is also frustrating that there is basically no end-user documentation for the upgrade process. The user gets a funny icon in their notification area, which they click on to trigger the upgrade script... but where are they presented with the release notes, including upgrade and recovery procedures? I'm talking about a document like Debian's: <http://www.debian.org/releases/lenny/i386/release-notes/ch-upgrading.en.html>.

It is also very frustrating to see that the fragility of the upgrade process has not improved since I reported bug #108276 for the Feisty upgrade process, two years ago.

Martin-Éric Racine (q-funk) wrote :

It's really frustrating to see Debian users come here and display a complete lack of basic English reading skills by reporting a bug against the entirely wrong package (upgrade-system), despite mentioning in their bug report that they really mean to complain about another package (update-manager).

affects: upgrade-system (Ubuntu) → update-manager (Ubuntu)
Sam Morris (yrro) wrote :

Some particularly amusing leftovers from the upgrade script were:

 * existence of a mysterious '/usr/shareFeisty' directory;
 * the permissions of /dev/null were reset such that only root may write to it;
 * screwed up ttf-uralic package that complained it could not be removed because its fonts had already been de-registered;
 * failure to configure ubuntu-standard package because atd package would not configure; this was because /etc/init.d/atd start failed, but it did not print out why. Strace revealed that it was trying to connect to /dev/log, but the connection was refused. After I manually started sysklogd, I could read the real error message in syslog: apparently atd did not have permission to access /var/spool/cron/atjobs. This directory is now owned by user/group bin(!), as was /var/spool/cron... what the hell?
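
The ownership and permission problems in the list above can be checked with a short script. This is only a sketch: the helper names `is_world_writable` and `describe` are invented here, and the paths are the ones mentioned in this comment. On a healthy system /dev/null is normally a character device with mode crw-rw-rw-.

```python
import os
import stat


def is_world_writable(mode):
    """True if the 'other' write bit is set in a st_mode value."""
    return bool(mode & stat.S_IWOTH)


def describe(path):
    """Collect owner, group and mode information for one path."""
    st = os.stat(path)
    return {
        "path": path,
        "uid": st.st_uid,
        "gid": st.st_gid,
        "mode": stat.filemode(st.st_mode),
        "world_writable": is_world_writable(st.st_mode),
    }


if __name__ == "__main__":
    # Paths taken from the list above; missing ones are reported, not fatal.
    for path in ("/dev/null", "/var/spool/cron", "/var/spool/cron/atjobs"):
        try:
            print(describe(path))
        except OSError as exc:
            print(path, "could not be inspected:", exc)
```

The script only reports what it finds; deciding which owner is correct for the cron spool directories is left to the package documentation.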

Sam Morris (yrro) wrote :

Jesus, I'm sorry I don't have a magical built-in knowledge of which package to file bugs such as this against. I'm sorry for naively typing in 'upgrade' into the unhelpful package search box and picking a package that sounded relevant to the problem at hand!

Daniel Holbach (dholbach) wrote :

Martin-Éric: your comment is completely out of order. It's a mistake very easy to make.

I'm sure I don't need to remind you of http://www.ubuntu.com/community/conduct

Michael Vogt (mvo) wrote :

Hello Sam, sorry for the trouble you experienced during the upgrade. I agree with you that the lack of documentation is a problem. However, I would like to point out that we use the same underlying technology (apt/dpkg) as Debian to perform the upgrade. A kernel oops in the middle of the upgrade (when udev/X/dbus are in not well-defined states) is something that a Debian system would not take lightly either.

There is an option in the "Recovery" boot menu called "dpkg - Repair broken packages" that should be able to help. Please let me know if that helps with the problem.

Michael Vogt (mvo) wrote :

I understand your frustration. Another observation:

  * existence of a mysterious '/usr/shareFeisty' directory;

That looks like file system corruption (a result of the oops?) more than anything else. Could you please run a filesystem check?

Sam Morris (yrro) wrote :

The difference is that on Debian, I can always resume an upgrade done with apt-get (or aptitude) dist-upgrade. Even though the dpkg process had gotten wedged in state D, the (first) reboot was a normal one; after I rebooted, I fully expected 'dpkg --configure -a' to resume where it left off, as it did.

Now although it turns out that I can do that on Ubuntu too, I had no way of knowing that; all I had to go on was that the upgrade was started from a mysterious icon in the notification area that no longer appears; after some research I found out that this was update-manager, but since that requires pygtk and X, I was not able to run it. For all I knew, this upgrade script did important things in addition to a dist-upgrade, things that wouldn't be done if the script was not used...

The problem here is basically the fragility of the upgrade process caused by relying on the update-manager script. From the PoV of an experienced Debian user, it seems like a pretty, but fragile front-end to running 'aptitude dist-upgrade'. And since it does not present any upgrade documentation to the user, the user is powerless to fix their system when an upgrade is aborted for whatever reason.

I have now got the system to a working state by doing the old 'dpkg --configure --pending' followed by a few invocations of 'aptitude dist-upgrade' punctuated by fixing the issues I listed above. Working enough to download an install CD, anyway, and do a fresh install. :)
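
For reference, the recovery sequence just described could be scripted roughly as follows. This is a sketch under assumptions, not what update-manager does: the wrapper `run_recovery` and the `RECOVERY_STEPS` list are hypothetical, only the two commands come from this comment, and a real run needs root and will probably still require manual fixes between steps.

```python
import subprocess

# The two commands quoted in the comment above, in the order they were used.
RECOVERY_STEPS = [
    ["dpkg", "--configure", "--pending"],
    ["aptitude", "dist-upgrade"],
]


def run_recovery(steps=RECOVERY_STEPS, dry_run=True):
    """Run each step in order, stopping at the first failure.

    With dry_run=True (the default) nothing is executed; the commands
    are only printed so they can be reviewed first.
    """
    results = []
    for cmd in steps:
        if dry_run:
            print("would run:", " ".join(cmd))
            results.append((cmd, None))
            continue
        completed = subprocess.run(cmd)
        results.append((cmd, completed.returncode))
        if completed.returncode != 0:
            break  # fix the reported problem by hand, then run again
    return results


if __name__ == "__main__":
    run_recovery()  # dry run: prints the plan without changing anything
```

Calling `run_recovery(dry_run=False)` as root would actually execute the steps; repeating it after each manual fix mirrors the "few invocations" mentioned above.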

As for /usr/shareFeisty, I assumed that was an artifact of buggy maintainer scripts, rather than filesystem corruption; although, since I didn't run fsck before re-installing, I can't rule that out.

Jason White (tinystoy28) on 2013-01-30
summary: - Aborted upgrade process left laptop in totally fucked state
+ Aborted upgrade process left laptop in a non-functional state