Comment 0 for bug 1557761

Will Bryant (willbryant) wrote :

We have noticed that the upgrade from 12.04 to 14.04 has resulted in daemons that are started from rc.d scripts sometimes being run twice.

We've tracked this down to a race condition in the failsafe script. Here is the normal sequence of events:

Mar 16 10:39:37 failsafe: Failsafe of 120 seconds reached.
Mar 16 10:39:37 failsafe: net-device-up start event emitted
Mar 16 10:39:37 failsafe: starting failsafe script
Mar 16 10:39:37 failsafe: sleeping in failsafe script
Mar 16 10:39:37 failsafe: static-network-up start event emitted
Mar 16 10:39:37 failsafe: rc-sysinit starting event emitted
Mar 16 10:39:37 kernel: [ 2.056689] init: failsafe main process (642) killed by TERM signal

(Note that the message about the 120 seconds being reached is inaccurate: it is actually logged immediately on boot, so it is best to ignore it. The TERM warning is also harmless - that is the normal result.)

Here is what we see on a bad boot, where the rc.d scripts are started twice:

Mar 16 10:24:47 failsafe: static-network-up start event emitted
Mar 16 10:24:47 failsafe: rc-sysinit starting event emitted
Mar 16 10:24:47 failsafe: Failsafe of 120 seconds reached.
Mar 16 10:24:47 failsafe: net-device-up start event emitted
Mar 16 10:24:47 failsafe: starting failsafe script
Mar 16 10:24:47 failsafe: sleeping in failsafe script
Mar 16 10:26:47 failsafe: emitting from failsafe script
Mar 16 10:26:47 failsafe: rc-sysinit starting event emitted
Mar 16 10:26:47 kernel: [ 122.229597] init: failsafe main process (797) killed by TERM signal

rc-sysinit has been emitted twice.
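
As a rough way to spot affected boots, the sketch below counts how often the rc-sysinit message appears in a set of log lines. It assumes the log lines look like the excerpts above; the function name and the two sample lists are made up for illustration, and the lines would need to be split per boot before counting.

```python
# Minimal sketch: count rc-sysinit emissions in the log lines of ONE boot.
# Assumes the message text matches the excerpts quoted in this report.
MARKER = "rc-sysinit starting event emitted"


def count_rc_sysinit_emissions(log_lines):
    """Return how many log lines report the rc-sysinit starting event."""
    return sum(1 for line in log_lines if MARKER in line)


# Sample data copied from the two excerpts above (abbreviated).
good_boot = [
    "Mar 16 10:39:37 failsafe: static-network-up start event emitted",
    "Mar 16 10:39:37 failsafe: rc-sysinit starting event emitted",
]
bad_boot = [
    "Mar 16 10:24:47 failsafe: rc-sysinit starting event emitted",
    "Mar 16 10:26:47 failsafe: emitting from failsafe script",
    "Mar 16 10:26:47 failsafe: rc-sysinit starting event emitted",
]

good_count = count_rc_sysinit_emissions(good_boot)
bad_count = count_rc_sysinit_emissions(bad_boot)
```

A count above 1 for a single boot would indicate the double start described here.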

Note that the rc-sysinit event has been emitted before the failsafe script has started, because in this boot it happens that the static-network-up event was emitted before the net-device-up event.

As a result, the normal stop on "starting rc-sysinit" rule in the failsafe job definition has no effect, because the failsafe job is not yet running when rc-sysinit starts.

Another way to look at the issue is that the rc-sysinit job definition's "start on (filesystem and static-network-up) or failsafe-boot" means that rc-sysinit will always start twice if its first run finishes before the failsafe handler fires.
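
The ordering dependence can be illustrated with a toy model. This is only a sketch of the reasoning above, not of Upstart's actual event handling: it assumes the filesystem event has already happened, that the failsafe job starts on net-device-up, and that rc-sysinit has finished by the time the failsafe timeout expires. The function name is made up for illustration.

```python
# Toy model of the race; NOT Upstart's real semantics.
# Assumptions: filesystem is already mounted, failsafe starts on
# net-device-up, and rc-sysinit finishes before the failsafe timeout.
def simulate_boot(events):
    """Return how many times rc-sysinit starts for a given event order."""
    failsafe_running = False
    rc_sysinit_starts = 0
    for event in events:
        if event == "net-device-up":
            # The failsafe job starts and begins its sleep.
            failsafe_running = True
        elif event == "static-network-up":
            # rc-sysinit: start on (filesystem and static-network-up)
            rc_sysinit_starts += 1
            # failsafe: stop on starting rc-sysinit. This only has an
            # effect if the failsafe job is already running.
            failsafe_running = False
    if failsafe_running:
        # The timeout elapses, failsafe emits failsafe-boot, and the
        # "or failsafe-boot" branch starts rc-sysinit again.
        rc_sysinit_starts += 1
    return rc_sysinit_starts


normal_order = simulate_boot(["net-device-up", "static-network-up"])
bad_order = simulate_boot(["static-network-up", "net-device-up"])
```

In this model the normal order gives one start and the reversed order gives two, matching the two log excerpts.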