Upgrade from 16.04 to 18.04 crashed, required dpkg --configure -a to recover (multiple machines)

Bug #1789977 reported by Deborah Hooker
This bug affects 2 people
Affects: ubuntu-release-upgrader (Ubuntu)
Status: Expired
Importance: Undecided
Assigned to: Unassigned
Milestone:

Bug Description

I've now had three systems that crashed during the upgrade from 16.04 to 18.04.1 and had to be recovered with dpkg --configure -a. I was able to recover two of them with a bit of manual care, but I am now hesitant to upgrade other machines until we know what's going on here. I have included the /var/log/dist-upgrade files from my machine (since I used my own machine as the most recent guinea pig). It does look like I have lvm2-activation segfault messages in the kern.log on at least two machines right around the time of the upgrade crash, but I don't know if that's coincidental or not.

Sample of the segfault messages from my machine:
Aug 29 15:10:27 freyja kernel: [195160.629514] lvm2-activation[15715]: segfault at d0 ip 00007fd827fca856 sp 00007ffde5d35680 error 4 in liblvm2app.so.2.2[7fd827fb9000+101000]
Aug 29 15:10:27 freyja kernel: [195160.713440] lvm2-activation[15733]: segfault at d0 ip 00007f44b3988856 sp 00007ffdb6e5d4c0 error 4 in liblvm2app.so.2.2[7f44b3977000+101000]
Aug 29 15:10:27 freyja kernel: [195160.798651] lvm2-activation[15752]: segfault at d0 ip 00007f346d42f856 sp 00007ffd5768e4f0 error 4 in liblvm2app.so.2.2[7f346d41e000+101000]
Aug 29 15:10:29 freyja kernel: [195161.966851] lvm2-activation[15828]: segfault at d0 ip 00007f777861b856 sp 00007fff93d2c630 error 4 in liblvm2app.so.2.2[7f777860a000+101000]
Aug 29 15:10:29 freyja kernel: [195162.486662] lvm2-activation[15917]: segfault at d0 ip 00007f9103082856 sp 00007fff41154300 error 4 in liblvm2app.so.2.2[7f9103071000+101000]
Aug 29 15:10:29 freyja kernel: [195162.563455] lvm2-activation[15935]: segfault at d0 ip 00007f52b6bbc856 sp 00007fff1870edf0 error 4 in liblvm2app.so.2.2[7f52b6bab000+101000]
Aug 29 15:10:30 freyja kernel: [195163.412017] lvm2-activation[15979]: segfault at d0 ip 00007f6936a39856 sp 00007ffd51ffb310 error 4 in liblvm2app.so.2.2[7f6936a28000+101000]
Aug 29 15:10:33 freyja kernel: [195166.198145] lvm2-activation[16105]: segfault at d0 ip 00007f3f237e0856 sp 00007ffc39518a00 error 4 in liblvm2app.so.2.2[7f3f237cf000+101000]
Aug 29 15:11:05 freyja kernel: [195198.249543] lvm2-activation[21607]: segfault at d0 ip 00007f49dd5f5856 sp 00007fff75685b10 error 4 in liblvm2app.so.2.2[7f49dd5e4000+101000]
Aug 29 15:11:09 freyja kernel: [195202.640113] lvm2-activation[21835]: segfault at d0 ip 00007fc43ec0e856 sp 00007ffe6658e450 error 4 in liblvm2app.so.2.2[7fc43ebfd000+101000]
Aug 29 15:11:11 freyja kernel: [195204.413656] lvm2-activation[21970]: segfault at d0 ip 00007f0f3bd83856 sp 00007ffdef97c380 error 4 in liblvm2app.so.2.2[7f0f3bd72000+101000]
Aug 29 15:11:11 freyja kernel: [195204.485270] lvm2-activation[21988]: segfault at d0 ip 00007f6ea5b4f856 sp 00007fff9e026e90 error 4 in liblvm2app.so.2.2[7f6ea5b3e000+101000]
Aug 29 15:11:11 freyja kernel: [195204.554083] lvm2-activation[22006]: segfault at d0 ip 00007f93b3d47856 sp 00007ffcdef72de0 error 4 in liblvm2app.so.2.2[7f93b3d36000+101000]
Aug 29 15:11:14 freyja kernel: [195206.946621] lvm2-activation[22115]: segfault at d0 ip 00007fb5f1816856 sp 00007fffe506a9e0 error 4 in liblvm2app.so.2.2[7fb5f1805000+101000]
Aug 29 15:11:25 freyja kernel: [195218.473850] lvm2-activation[22749]: segfault at d0 ip 00007f277405f856 sp 00007ffe3c670390 error 4 in liblvm2app.so.2.2[7f277404e000+101000]
Aug 29 15:11:54 freyja kernel: [195246.961884] lvm2-activation[23901]: segfault at d0 ip 00007fbfda96f856 sp 00007fff601f88f0 error 4 in liblvm2app.so.2.2[7fbfda95e000+101000]
Aug 29 15:11:57 freyja kernel: [195250.704760] lvm2-activation[24042]: segfault at d0 ip 00007f833389c856 sp 00007ffdc0a0fcd0 error 4 in liblvm2app.so.2.2[7f833388b000+101000]
Aug 29 15:11:57 freyja kernel: [195250.787236] lvm2-activation[24060]: segfault at d0 ip 00007fc4aa5be856 sp 00007fff226ab300 error 4 in liblvm2app.so.2.2[7fc4aa5ad000+101000]
Aug 29 15:11:58 freyja kernel: [195251.543320] lvm2-activation[24091]: segfault at d0 ip 00007ff974d37856 sp 00007ffcf7801690 error 4 in liblvm2app.so.2.2[7ff974d26000+101000]
Aug 29 15:11:58 freyja kernel: [195251.665093] lvm2-activation[24110]: segfault at d0 ip 00007f3d949bc856 sp 00007ffc6d0cc830 error 4 in liblvm2app.so.2.2[7f3d949ab000+101000]
Aug 29 15:13:11 freyja kernel: [195324.098299] lvm2-activation[27039]: segfault at d0 ip 00007faabf509856 sp 00007ffeb550cc20 error 4 in liblvm2app.so.2.2[7faabf4f8000+101000]
Aug 29 15:13:11 freyja kernel: [195324.275785] lvm2-activation[27067]: segfault at d0 ip 00007fe5456d9856 sp 00007ffd5bb51b90 error 4 in liblvm2app.so.2.2[7fe5456c8000+101000]
Aug 29 15:13:13 freyja kernel: [195326.100150] lvm2-activation[27182]: segfault at d0 ip 00007f16d56ec856 sp 00007ffd44857010 error 4 in liblvm2app.so.2.2[7f16d56db000+101000]
Aug 29 15:13:13 freyja kernel: [195326.420312] lvm2-activation[27233]: segfault at d0 ip 00007f0bf29de856 sp 00007fff286ea440 error 4 in liblvm2app.so.2.2[7f0bf29cd000+101000]
Aug 29 15:13:13 freyja kernel: [195326.503807] lvm2-activation[27251]: segfault at d0 ip 00007f6476061856 sp 00007ffd422afac0 error 4 in liblvm2app.so.2.2[7f6476050000+101000]
Aug 29 15:13:19 freyja kernel: [195331.944316] lvm2-activation[27513]: segfault at d0 ip 00007f2be539d856 sp 00007ffc2f03c940 error 4 in liblvm2app.so.2.2[7f2be538c000+101000]
Aug 29 15:14:11 freyja kernel: [195383.946410] lvm2-activation[29411]: segfault at d0 ip 00007fc93d19f856 sp 00007ffd6ae018c0 error 4 in liblvm2app.so.2.2[7fc93d18e000+101000]
Aug 29 15:15:46 freyja kernel: [195479.584771] lvm2-activation[32464]: segfault at d0 ip 00007f35108a2856 sp 00007ffe3b345490 error 4 in liblvm2app.so.2.2[7f3510891000+101000]
Aug 29 15:15:48 freyja kernel: [195481.652139] lvm2-activation[32554]: segfault at d0 ip 00007f6b476a1856 sp 00007ffdde2fa720 error 4 in liblvm2app.so.2.2[7f6b47690000+101000]
Aug 29 15:15:49 freyja kernel: [195482.225092] lvm2-activation[32620]: segfault at d0 ip 00007fb7f84ab856 sp 00007ffdf59f8840 error 4 in liblvm2app.so.2.2[7fb7f849a000+101000]
Aug 29 15:16:06 freyja kernel: [195499.517210] lvm2-activation[1075]: segfault at d0 ip 00007fbbe2286856 sp 00007fff4fbc96b0 error 4 in liblvm2app.so.2.2[7fbbe2275000+101000]
Aug 29 15:16:06 freyja kernel: [195499.748871] lvm2-activation[1112]: segfault at d0 ip 00007fb41a38b856 sp 00007ffd8b1edb80 error 4 in liblvm2app.so.2.2[7fb41a37a000+101000]
Aug 29 15:16:08 freyja kernel: [195501.755037] lvm2-activation[1197]: segfault at d0 ip 00007f7e5515c856 sp 00007ffe67271a20 error 4 in liblvm2app.so.2.2[7f7e5514b000+101000]
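
To compare entries like these across machines, it can help to reduce each one to the faulting offset inside the library, that is, the instruction pointer minus the mapping base shown in brackets. The following is only a minimal Python sketch, assuming the kernel line layout seen in the sample above; the names parse_segfaults and summarize are hypothetical helpers, not part of any Ubuntu tool.

```python
import re
from collections import Counter

# Assumed layout, taken from the sample kern.log lines above:
#   proc[pid]: segfault at ADDR ip IP sp SP error N in LIB[BASE+SIZE]
SEGFAULT_RE = re.compile(
    r"(?P<proc>[\w.-]+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>\d+) "
    r"in (?P<lib>\S+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)

def parse_segfaults(lines):
    """Yield (process, library, offset within library) per segfault line."""
    for line in lines:
        m = SEGFAULT_RE.search(line)
        if m:
            # The mapping base changes per process (ASLR); the offset does not.
            offset = int(m["ip"], 16) - int(m["base"], 16)
            yield m["proc"], m["lib"], offset

def summarize(lines):
    """Count how many segfaults hit each (process, library, offset)."""
    return Counter(parse_segfaults(lines))
```

Passing the lines of a kern.log to summarize gives one count per distinct fault location. Worked by hand on the sample above, every entry reduces to liblvm2app.so.2.2+0x11856, so this looks like one repeated fault rather than several different ones. On x86, error code 4 means a user-mode read of an unmapped page, and a faulting address of 0xd0 is consistent with reading a field through a null pointer.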

Any ideas? We can conceivably manually take each employee's system and assume the upgrade will crash, and just manually fix each one (I think) but that seems like a terrible solution.

ProblemType: Bug
DistroRelease: Ubuntu 18.04
Package: ubuntu-release-upgrader-core 1:18.04.25
ProcVersionSignature: Ubuntu 4.15.0-33.36-generic 4.15.18
Uname: Linux 4.15.0-33-generic x86_64
NonfreeKernelModules: nvidia_modeset nvidia
ApportVersion: 2.20.9-0ubuntu7.2
Architecture: amd64
CrashDB: ubuntu
CurrentDesktop: ubuntu:GNOME
Date: Thu Aug 30 13:46:49 2018
PackageArchitecture: all
ProcEnviron:
 TERM=xterm-256color
 PATH=(custom, no user)
 XDG_RUNTIME_DIR=<set>
 LANG=en_US.UTF-8
 SHELL=/bin/tcsh
SourcePackage: ubuntu-release-upgrader
Symptom: ubuntu-release-upgrader
UpgradeStatus: Upgraded to bionic on 2018-08-29 (0 days ago)
VarLogDistupgradeTermlog:

Deborah Hooker (dhookersl) wrote :
tags: added: xenial2bionic
tags: added: third-party-packages
Brian Murray (brian-murray) wrote :

Are there any crash files in /var/crash/ which are related to the upgrade process?

I tried recreating the failure by using the apt-clone file attached to this bug report and then upgrading but I did not receive a crash.

Changed in ubuntu-release-upgrader (Ubuntu):
status: New → Incomplete
Deborah Hooker (dhookersl) wrote :

I haven't found any crash files in /var/crash on either system that I looked at in detail. It actually sort of acts like a normal 'install for a while, then reboot'; it just comes back up in a half-installed state after that first reboot, which is why I think it crashed rather than rebooted normally. lastlog also shows a crash rather than a normal reboot, at least on my system.
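
One way to see what "half-installed" amounts to on a given machine is to list the packages that dpkg still records in an incomplete state, which are the ones dpkg --configure -a would try to finish. The following is only a rough Python sketch that reads the text of dpkg's status database (normally /var/lib/dpkg/status); the function name incomplete_packages is hypothetical, and the parsing is deliberately simplified.

```python
# dpkg package states that indicate an interrupted install or configure step.
INCOMPLETE_STATES = {
    "half-installed", "unpacked", "half-configured",
    "triggers-awaited", "triggers-pending",
}

def incomplete_packages(status_text):
    """Return [(package, state)] for stanzas left in an incomplete state."""
    found = []
    # Stanzas in the status file are separated by blank lines.
    for stanza in status_text.split("\n\n"):
        fields = {}
        for line in stanza.splitlines():
            # Continuation lines start with whitespace; skip them.
            if line and not line[0].isspace() and ":" in line:
                key, _, value = line.partition(":")
                fields[key] = value.strip()
        # Status is three words: want, error flag, current state.
        words = fields.get("Status", "").split()
        if len(words) == 3 and words[2] in INCOMPLETE_STATES:
            found.append((fields.get("Package", "?"), words[2]))
    return found
```

Feeding it the contents of /var/lib/dpkg/status right after the unexpected reboot, and again after recovery, would show which packages were interrupted and confirm that none remain afterwards.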

Deborah Hooker (dhookersl) wrote :

Honestly, if this were one system I'd be of the opinion that I recovered it and it's fine. What's concerning me is that we've seen it on three systems now and we have a bunch more that need to be upgraded.

The only thing I can think of if it's not reproducible outside our office is that it's something to do with our local debian cache. I'm going to add our local debian guru to the bug. I'm not sure how to diagnose if that's the cause or what part of that might be the cause.

Erich E. Hoover (ehoover) wrote :

How are your LVM2 volumes configured? Based on that segfault, I would guess that it has something to do with that (and would not necessarily be reproducible if the volume configuration is not similar).

I would have a hard time believing that it has anything to do with our custom Debian packages. However, it might be worth the trouble to configure a fresh 16.04 system and upgrade it, then do it again and try upgrading after adding our packages.

Deborah Hooker (dhookersl) wrote :

Neither system had LVM configured at all, which is why I thought it might be coincidental, but I wanted to include it anyway.

I did have to manually deal with our custom edits to /etc/ssh/ssh_config and /etc/strongswan.d/charon.conf and System76's custom edits to /etc/default/grub in order to recover the system (as part of the dpkg --configure -a process). I don't have any evidence that those custom edits cause the upgrade to fail, though.

Brian Murray (brian-murray) wrote :

I found a couple of crash reports (bug 1767747) in Launchpad regarding lvm2 and the crash happening on reboot after upgrading to 18.04, although it's possible they only received the notification after rebooting. Do you happen to know whether the lvm2 crash occurred after reboot or during the upgrade?

There is also this bucket, which is private, in the Ubuntu Error Tracker but that stacktrace (while incomplete) looks rather different.

https://errors.ubuntu.com/problem/c53304c6d22dcaad9f9b61ac48cc8e1cfa66ef17

Deborah Hooker (dhookersl) wrote :

I am almost certain that the lvm2 messages happen in the couple of minutes before the box crashes/reboots itself. The fact that they happen over several minutes is part of what makes me think that it's coincidental and just something that's unhappy because we're in the middle of installing stuff underneath it. But they were significant and they did happen at the right time, so I didn't want to leave them out.

I can try to get some time from someone here to set up a machine, remove just our custom edits, and try an upgrade if that seems valuable. I'd also appreciate any other suggestions for things to try -- I don't think we can try every possible combination but we can try at least a few things if anyone has any ideas.

tags: added: id-5b91488653e37e4a1e6b8095
Erich E. Hoover (ehoover) wrote :

@dhookersl I believe that if he ran apt-clone on your file, it would have installed our custom packages for him, so his environment should be identical to yours except for the charon config file (that is the only config file we don't package).

Deborah Hooker (dhookersl) wrote :

So in that case the only possible culprits are the charon config file and the System76 stuff (since presumably it won't actually affect a non-System76 machine). I can poke around with System76 and see if they know anything, and we should be able to try a run with the charon config put back to normal to see if that helps anything.

Launchpad Janitor (janitor) wrote :

[Expired for ubuntu-release-upgrader (Ubuntu) because there has been no activity for 60 days.]

Changed in ubuntu-release-upgrader (Ubuntu):
status: Incomplete → Expired