rc-sysinit run twice due to failsafe race condition

Bug #1557761 reported by Will Bryant
This bug affects 1 person
Affects           Status     Importance  Assigned to  Milestone
upstart (Ubuntu)  Confirmed  Undecided   Unassigned

Bug Description

We have noticed that since upgrading from Ubuntu 12.04 to 14.04, daemons started from rc.d scripts are sometimes run twice.

We've tracked this down to a race condition in the failsafe script. Here is the normal sequence of events:

Mar 16 10:39:37 failsafe: Failsafe of 120 seconds reached.
Mar 16 10:39:37 failsafe: net-device-up start event emitted
Mar 16 10:39:37 failsafe: starting failsafe script
Mar 16 10:39:37 failsafe: sleeping in failsafe script
Mar 16 10:39:37 failsafe: static-network-up start event emitted
Mar 16 10:39:37 failsafe: rc-sysinit starting event emitted
Mar 16 10:39:37 kernel: [ 2.056689] init: failsafe main process (642) killed by TERM signal

(Note that the "Failsafe of 120 seconds reached." message is misleading: it is actually logged immediately at boot, so it is best ignored. The "killed by TERM signal" message is also harmless; it is the normal result of the failsafe job being stopped.)

Here is what we see on a bad boot, where the rc.d scripts are started twice:

Mar 16 10:24:47 failsafe: static-network-up start event emitted
Mar 16 10:24:47 failsafe: rc-sysinit starting event emitted
Mar 16 10:24:47 failsafe: Failsafe of 120 seconds reached.
Mar 16 10:24:47 failsafe: net-device-up start event emitted
Mar 16 10:24:47 failsafe: starting failsafe script
Mar 16 10:24:47 failsafe: sleeping in failsafe script
Mar 16 10:26:47 failsafe: emitting from failsafe script
Mar 16 10:26:47 failsafe: rc-sysinit starting event emitted
Mar 16 10:26:47 kernel: [ 122.229597] init: failsafe main process (797) killed by TERM signal

The rc-sysinit starting event has been emitted twice.
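To tell good boots from bad ones without reading every log by eye, a minimal Python sketch like the one below could count the debug messages. It assumes the syslog lines look like the excerpts above; the "failsafe: rc-sysinit starting event emitted" text is assumed to come from logging added while investigating this bug, and the helper names are hypothetical.

```python
import re

# Matches the debug line seen in both excerpts above. The message text is
# assumed to come from logging added while investigating this bug.
RC_SYSINIT_MARKER = re.compile(r"failsafe: rc-sysinit starting event emitted$")


def count_rc_sysinit_starts(lines):
    """Count how many times rc-sysinit was reported as starting."""
    return sum(1 for line in lines if RC_SYSINIT_MARKER.search(line.rstrip("\n")))


def is_bad_boot(lines):
    """Hypothetical check: a boot is bad if rc-sysinit started more than once."""
    return count_rc_sysinit_starts(lines) > 1


# Excerpts copied from the two boots above.
GOOD_BOOT = [
    "Mar 16 10:39:37 failsafe: static-network-up start event emitted",
    "Mar 16 10:39:37 failsafe: rc-sysinit starting event emitted",
]
BAD_BOOT = [
    "Mar 16 10:24:47 failsafe: rc-sysinit starting event emitted",
    "Mar 16 10:24:47 failsafe: sleeping in failsafe script",
    "Mar 16 10:26:47 failsafe: emitting from failsafe script",
    "Mar 16 10:26:47 failsafe: rc-sysinit starting event emitted",
]

if __name__ == "__main__":
    print(count_rc_sysinit_starts(GOOD_BOOT), is_bad_boot(GOOD_BOOT))  # 1 False
    print(count_rc_sysinit_starts(BAD_BOOT), is_bad_boot(BAD_BOOT))  # 2 True
```

Fed a whole day's syslog, the same count would need to be taken per boot, since every boot logs the message at least once.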

Note that the rc-sysinit starting event was emitted before the failsafe job had even started, because in this boot the static-network-up event happened to be emitted before the net-device-up event.

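As a result, the usual stop on "starting rc-sysinit" rule in the failsafe job definition has no effect: that event has already fired before the failsafe job is running, so the failsafe script sleeps for the full 120 seconds and then emits its event anyway.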
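The ordering problem can be reproduced with a toy model. The Python sketch below is a simplification built only from the rules quoted in this report, not upstart's real event engine. It assumes the filesystem event has already happened, that failsafe starts on net-device-up and stops on "starting rc-sysinit", that rc-sysinit starts on static-network-up or failsafe-boot, and that a failsafe job still running at the end emits failsafe-boot when its timer expires. The class and function names are hypothetical.

```python
class BootModel:
    """Toy model of the failsafe and rc-sysinit jobs (simplified, hypothetical)."""

    def __init__(self):
        self.failsafe_running = False
        self.rc_sysinit_starts = 0

    def emit(self, event):
        if event == "net-device-up":
            # failsafe: start on filesystem and net-device-up
            # (filesystem is assumed to be satisfied already)
            self.failsafe_running = True
        elif event in ("static-network-up", "failsafe-boot"):
            # rc-sysinit: start on (filesystem and static-network-up) or failsafe-boot
            self.start_rc_sysinit()

    def start_rc_sysinit(self):
        self.rc_sysinit_starts += 1
        # failsafe: stop on starting rc-sysinit. This only helps if failsafe
        # is already running; an event that fired earlier is not remembered.
        self.failsafe_running = False

    def timer_expires(self):
        # If nothing stopped failsafe, it sleeps out its timeout and emits failsafe-boot.
        if self.failsafe_running:
            self.failsafe_running = False
            self.emit("failsafe-boot")


def run(order):
    """Emit the events in the given order and return how often rc-sysinit started."""
    model = BootModel()
    for event in order:
        model.emit(event)
    model.timer_expires()
    return model.rc_sysinit_starts


if __name__ == "__main__":
    print(run(["net-device-up", "static-network-up"]))  # normal boot: 1
    print(run(["static-network-up", "net-device-up"]))  # bad boot: 2
```

Swapping the order of the two network events is enough to change the result, which matches the two log excerpts above.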

Another way to look at the issue is that the rc-sysinit job definition's "start on (filesystem and static-network-up) or failsafe-boot" means that it will always start twice if it finishes before the failsafe handler fires.
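This second view can be sketched the same way. The Python function below evaluates the start on expression quoted above against a sequence of events. As simplifying assumptions, the expression's remembered events reset each time it matches, and a match while rc-sysinit is still running does not start another instance; FINISHED is a hypothetical marker meaning the rc-sysinit task has run to completion.

```python
# Hypothetical marker: the rc-sysinit task has run to completion.
FINISHED = "<rc-sysinit finished>"


def rc_sysinit_starts(events):
    """Count rc-sysinit starts for the rule
    start on (filesystem and static-network-up) or failsafe-boot.
    """
    seen = set()
    running = False
    starts = 0
    for event in events:
        if event == FINISHED:
            running = False
            continue
        seen.add(event)
        matched = ("filesystem" in seen and "static-network-up" in seen) or (
            "failsafe-boot" in seen
        )
        if matched:
            seen.clear()  # simplification: the expression resets once it matches
            if not running:
                running = True
                starts += 1
    return starts


if __name__ == "__main__":
    # failsafe-boot arrives after rc-sysinit has finished: it starts again.
    print(rc_sysinit_starts(["filesystem", "static-network-up", FINISHED, "failsafe-boot"]))  # 2
    # failsafe-boot arrives while rc-sysinit is still running: no second start.
    print(rc_sysinit_starts(["filesystem", "static-network-up", "failsafe-boot"]))  # 1
```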

Will Bryant (willbryant) wrote:

Current versions:

will@nz-stg-app-wlg-d7:~$ cat /etc/lsb-release
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=14.04
DISTRIB_CODENAME=trusty
DISTRIB_DESCRIPTION="Ubuntu 14.04.4 LTS"
will@nz-stg-app-wlg-d7:~$ uname -a
Linux nz-stg-app-wlg-d7 3.13.0-79-generic #123-Ubuntu SMP Fri Feb 19 14:27:58 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
will@nz-stg-app-wlg-d7:~$ dpkg -l upstart
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name Version Architecture Description
+++-=============================================-===========================-===========================-===============================================================================================
ii upstart 1.12.1-0ubuntu4.2 amd64 event-based init daemon

Will Bryant (willbryant)
description: updated
Launchpad Janitor (janitor) wrote:

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in upstart (Ubuntu):
status: New → Confirmed