16.04 recovery shell works only for two minutes

Bug #1662137 reported by Detlev Zundel on 2017-02-06
This bug affects 14 people
Affects: friendly-recovery (Ubuntu), Importance: High, Assigned to: Dimitri John Ledkov
Affects: Bionic, Importance: High, Assigned to: Dimitri John Ledkov

Bug Description

Selecting the "Rescue" shell from the "Advanced options" menu in GRUB enters the "friendly-recovery" service and lets the user drop into a root shell. After ~120 seconds, systemd sees a timeout and starts another "friendly-recovery" whiptail process. This process and the running shell then compete for the tty (input and output), making the shell nearly unusable.

Diagnosing this, it becomes clear that in the first root shell entered from friendly-recovery, "systemctl list-jobs" shows the generated "wait for disk" jobs for the /boot partition and the swap partition as still running. Only when they time out is the next friendly-recovery spawned. I'll attach a log of such a boot, which contains two manual "logger" entries:

Feb 06 10:54:57 harry root[605]: now in root shell from friendly-recovery
...
Feb 06 10:56:41 harry root[1254]: now running in parallel

This clearly shows the timing of the startup.
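The gap between the two logger entries can be computed directly. Below is a minimal Python sketch, assuming the journal's short output format (month, day, time, host, ...); because that format omits the year, the year 2017 is supplied from the report date, and the function names are hypothetical. The result is 104 seconds, somewhat less than the ~120-second timeout, presumably because the "wait for disk" jobs were queued earlier in the boot than the moment the shell was entered.

```python
from datetime import datetime

# The two manual "logger" entries quoted in the report.
FIRST = "Feb 06 10:54:57 harry root[605]: now in root shell from friendly-recovery"
SECOND = "Feb 06 10:56:41 harry root[1254]: now running in parallel"

# The short journal format has no year; 2017 is taken from the report date.
YEAR = 2017


def journal_timestamp(line: str) -> datetime:
    """Parse the leading 'Mon DD HH:MM:SS' stamp of a journal line."""
    stamp = " ".join(line.split()[:3])  # e.g. "Feb 06 10:54:57"
    return datetime.strptime(f"{YEAR} {stamp}", "%Y %b %d %H:%M:%S")


def seconds_between(first: str, second: str) -> float:
    """Seconds elapsed between two journal lines from the same boot."""
    return (journal_timestamp(second) - journal_timestamp(first)).total_seconds()


if __name__ == "__main__":
    print(seconds_between(FIRST, SECOND))  # 104.0
```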

I have no idea why the "wait for disk" units time out, as the system boots correctly in normal (non-rescue) mode. I'll also attach the output of "systemd-analyze blame" from a regular boot to show that there is no trace of such a timeout there.
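To check from inside the recovery shell whether the generated "wait for disk" jobs are still queued, the output of "systemctl list-jobs" can be filtered for .device units. This is a minimal Python sketch, assuming the JOB UNIT TYPE STATE column layout printed by `systemctl list-jobs --no-legend` (which omits the header and footer); the function names and the sample unit names are hypothetical and not taken from the attached logs.

```python
import subprocess
from typing import List, Tuple


def pending_device_jobs(list_jobs_output: str) -> List[Tuple[str, str, str]]:
    """Return (unit, type, state) for every queued job on a .device unit.

    Assumes the JOB UNIT TYPE STATE layout of `systemctl list-jobs --no-legend`.
    """
    jobs = []
    for line in list_jobs_output.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or unexpected lines
        _job_id, unit, job_type, state = fields[:4]
        if unit.endswith(".device"):
            jobs.append((unit, job_type, state))
    return jobs


def current_pending_device_jobs() -> List[Tuple[str, str, str]]:
    """Query the live system (requires systemd) and filter its job list."""
    out = subprocess.run(
        ["systemctl", "list-jobs", "--no-legend"],
        check=True, capture_output=True, text=True,
    ).stdout
    return pending_device_jobs(out)


if __name__ == "__main__":
    # Hypothetical sample output; the unit names are made up for illustration.
    sample = (
        "  97 dev-sdX1.device     start running\n"
        " 102 dev-sdX6.device     start running\n"
        " 110 example.service     start waiting\n"
    )
    for unit, job_type, state in pending_device_jobs(sample):
        print(unit, job_type, state)
```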

Furthermore, I see this behaviour both on a real machine and on an independent VM, so it affects more than the machine this bug was reported from.

ProblemType: Bug
DistroRelease: Ubuntu 16.04
Package: systemd 229-4ubuntu16
ProcVersionSignature: Ubuntu 4.4.0-62.83-generic 4.4.40
Uname: Linux 4.4.0-62-generic x86_64
ApportVersion: 2.20.1-0ubuntu2.5
Architecture: amd64
CurrentDesktop: GNOME
Date: Mon Feb 6 11:02:37 2017
InstallationDate: Installed on 2015-03-27 (681 days ago)
InstallationMedia: Ubuntu 14.04.2 LTS "Trusty Tahr" - Release amd64 (20150218.1)
MachineType: Hewlett-Packard HP EliteBook 8460p
ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-4.4.0-62-generic root=UUID=310be2be-96ea-4fe9-b929-75605e718fdc ro resume=/dev/sda6 loop.max_part=63 quiet splash vt.handoff=7
SourcePackage: systemd
SystemdDelta:
 [EXTENDED] /etc/systemd/system/display-manager.service → /lib/systemd/system/display-manager.service.d/xdiagnose.conf
 [EXTENDED] /lib/systemd/system/rc-local.service → /lib/systemd/system/rc-local.service.d/debian.conf
 [EXTENDED] /lib/systemd/system/systemd-timesyncd.service → /lib/systemd/system/systemd-timesyncd.service.d/disable-with-time-daemon.conf

 3 overridden configuration files found.
UpgradeStatus: Upgraded to xenial on 2016-12-01 (66 days ago)
dmi.bios.date: 12/22/2011
dmi.bios.vendor: Hewlett-Packard
dmi.bios.version: 68SCF Ver. F.22
dmi.board.name: 161C
dmi.board.vendor: Hewlett-Packard
dmi.board.version: KBC Version 97.4A
dmi.chassis.asset.tag: 617H
dmi.chassis.type: 10
dmi.chassis.vendor: Hewlett-Packard
dmi.modalias: dmi:bvnHewlett-Packard:bvr68SCFVer.F.22:bd12/22/2011:svnHewlett-Packard:pnHPEliteBook8460p:pvrA0001D02:rvnHewlett-Packard:rn161C:rvrKBCVersion97.4A:cvnHewlett-Packard:ct10:cvr:
dmi.product.name: HP EliteBook 8460p
dmi.product.version: A0001D02
dmi.sys.vendor: Hewlett-Packard

Detlev Zundel (laodzu) wrote :
Detlev Zundel (laodzu) wrote :
Changed in systemd (Ubuntu):
assignee: nobody → Dimitri John Ledkov (xnox)
milestone: none → ubuntu-17.02
importance: Undecided → High
status: New → Confirmed
Nathan Dorfman (ndorf) wrote :

Although I couldn't reproduce it in a VM, this reliably happens on both of my hardware installs, an old ThinkPad and an Asus Z87-based desktop. The recovery shell is completely unusable. Luckily, Ctrl+Alt+Del still reboots the system cleanly.

The workaround seems to be to boot directly into the systemd rescue (or emergency) target instead of recovery mode. This can be done simply by adding "rescue" or "emergency" to the default Linux kernel command line.
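For illustration, here is a small Python sketch of how the added token selects a boot target. It assumes the shorthands documented in systemd's kernel-command-line(7) man page ("rescue", "single", "s", "S", "1" for rescue.target; "emergency", "-b" for emergency.target) plus the explicit systemd.unit= option. The function name is hypothetical, and the "last recognised token wins" rule is a simplification, not a precise model of how every systemd version resolves conflicting options.

```python
from typing import Optional

# Shorthand tokens that systemd documents as equivalents of systemd.unit=...
ALIASES = {
    "rescue": "rescue.target",
    "single": "rescue.target",
    "s": "rescue.target",
    "S": "rescue.target",
    "1": "rescue.target",
    "emergency": "emergency.target",
    "-b": "emergency.target",
}


def requested_target(cmdline: str) -> Optional[str]:
    """Return the boot target a kernel command line asks for, or None.

    Simplification: the last recognised token wins; initrd-only (rd.*)
    options and SysV runlevels 2-5 are ignored.
    """
    target = None
    for token in cmdline.split():
        if token.startswith("systemd.unit="):
            target = token.split("=", 1)[1]
        elif token in ALIASES:
            target = ALIASES[token]
    return target


if __name__ == "__main__":
    # Shortened form of the command line from this report.
    base = "BOOT_IMAGE=/vmlinuz-4.4.0-62-generic ro resume=/dev/sda6 quiet splash"
    print(requested_target(base))               # None (normal default target)
    print(requested_target(base + " rescue"))   # rescue.target
```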

Gordon (kmputerguy) wrote :

I have slightly different behavior but the same ultimate result.

After two minutes in the root shell, the friendly-recovery menu shows up again and conflicts in a bizarre way with the existing prompt: sometimes keystrokes affect one process, sometimes the other, and sometimes somehow both. The two write over each other, breaking both the shell and the menu, and usually making the whole server unusable until a reboot.

My fix was to uninstall friendly-recovery, which at least gives me another root shell that breaks less when it starts on top of my existing one after two minutes.

Can confirm @kmputerguy's report.
I have seen this bug on multiple hardware configurations; it basically renders rescue mode useless for involved maintenance operations.

tags: added: rls-bb-incoming
Steve Langasek (vorlon) on 2018-03-15
tags: removed: rls-bb-incoming
tags: added: id-5aaa925072882f79f72239f6
Dimitri John Ledkov (xnox) wrote :

I think I can solve this in friendly-recovery.

affects: systemd (Ubuntu Bionic) → friendly-recovery (Ubuntu Bionic)
Changed in friendly-recovery (Ubuntu Bionic):
milestone: ubuntu-17.02 → none
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package friendly-recovery - 0.2.38

---------------
friendly-recovery (0.2.38) unstable; urgency=medium

  * Make friendly-recovery block the systemd emergency / rescue shell
    modes. LP: #1662137
  * Bump debhelper and standards version, drop dh-systemd build-dep.
  * Support activating networking using systemd units for resolved,
    networkd, NetworkManager, ifupdown. LP: #1682637

 -- Dimitri John Ledkov <email address hidden> Thu, 29 Mar 2018 14:38:38 +0100

Changed in friendly-recovery (Ubuntu Bionic):
status: Confirmed → Fix Released
Joi Owen (jlellis) wrote :

I have this issue on a Hyper-V Gen2 Secure Boot Ubuntu 16.04 Desktop install. Because it's a Gen2 Hyper-V VM, I can't boot my usual SystemRescueCD, and dropping into recovery mode from the GRUB menu is also worthless due to this bug. My last option is to boot from an install CD and try to resize the partition from there. A five-minute process has become a two-day nightmare.

Kyle J. McKeown (drift91) wrote :

This is still happening for me in 18.04.1 LTS. Can anyone else confirm the bug is still present? I'm not sure whether I just did a botched update that didn't replace the systemd package or whether the bug is back.

Another oddity I'm experiencing is the text cursor jumping to the left of "ubuntu login:" or "kyle@ubuntu:~$" after roughly the same delay as in the recovery bug.

Tihomir Heidelberg (9a4gl) wrote :

This annoying bug was fixed 7 months ago and the fix is still not released for the LTS version (16.04)?

Dimitri John Ledkov (xnox) wrote :

@Tihomir

The fix has been released for all systemd-based LTS releases. Have you retested this with an up-to-date friendly-recovery?

Tihomir Heidelberg (9a4gl) wrote :

Hm, weird. I did update, upgrade and dist-upgrade. friendly-recovery was not upgraded and is still 0.2.31, but a new kernel was installed and now the problem is gone, with both the old kernel and the new one. Is there a chance that the new kernel and its post-install scripts fixed the issue?

Mathew Hodson (mathew-hodson) wrote :

@Tihomir

There was a Xenial update for friendly-recovery done in bug 1766872 that likely fixed your issue.

Alan Mimms (alanmjabil) wrote :

I have Bionic with friendly-recovery 0.2.38ubuntu1 and the problem still occurs. I apparently had three filesystems that were in "timeout". I went into recovery mode to see what was causing this and got ~120 seconds before the session was destroyed and the recovery options menu was redisplayed. I couldn't get anything to work after that. I pressed the down arrow many times to try to reach the root shell item; after a few presses the selection moved down one line to the second menu item and then wouldn't budge.

To me it seems there are SEVERAL (at least two, but maybe more) shells all competing for keystrokes.

FYI: I think the filesystem errors were caused by my installing a new NVMe SSD and knocking loose a SATA interface cable or two. In fact, I think this is a good way to reproduce the problem: set up several filesystems (EXT4, BTRFS or even VFAT) on a drive, power off, pull the drive's cable and boot again. You'll likely see the same effect I had.
