init: does not wait for parent to exit when following forks

Reported by Scott James Remnant (Canonical) on 2010-03-02
This bug affects 67 people
Affects: upstart. Importance: Medium. Assigned to: James Hunt.
  Declined for 0.1, 0.2, 0.3, 0.5, 0.6 and Trunk by Scott James Remnant (Canonical)
Affects: upstart (Ubuntu). Importance: Medium. Assigned to: Steve Langasek.
Affects: upstart (Ubuntu Precise). Importance: Medium. Assigned to: Steve Langasek.

Bug Description

When following a fork, once Upstart receives the new pid, it immediately starts following that and forgets about the parent entirely, not even waiting for it to exit.

There are two problems with this:

The first is the kind of daemon that forks early but then maintains communication with the parent, so that the parent can exit with an appropriate exit code in case of error (rather than the kind which only forks once it is ready). Daemons of this kind aren't ready until the parent exits, not at the point of the fork. Since Upstart follows the fork, it will think they are ready "too early".

The second is simply any daemon that writes a pid file (which is perhaps a simplification of the above). If it writes the pid file in the parent (as even nih_main_daemonise does!), then there's no guarantee the pid file actually exists when the "start" command returns.

(Except that the fact that we use ptrace tends to mean that the race works out in our favour.)
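
To make the first case concrete, here is a minimal sketch in Python (not Upstart or libnih code) of a daemon that forks early and whose parent exits only after the child reports readiness over a pipe. The function names and the one-byte status protocol are assumptions made for illustration. A supervisor that waits for the parent to exit, rather than reacting to the fork, sees both the exit code and the pid file.

```python
import os


def daemonise_with_handshake(pidfile):
    """Fork early; the parent exits only after the child reports readiness.

    Hypothetical protocol: the child writes one zero byte when it is ready.
    This function never returns in either process.
    """
    read_fd, write_fd = os.pipe()
    child = os.fork()
    if child == 0:
        # Child: the future daemon. Real initialisation would happen here.
        os.close(read_fd)
        os.write(write_fd, b"\0")  # report "ready" to the parent
        os.close(write_fd)
        os._exit(0)  # a real daemon would enter its main loop instead
    # Parent: block until the child reports in (or dies, giving EOF).
    os.close(write_fd)
    status = os.read(read_fd, 1)
    os.close(read_fd)
    # The pid file is written in the parent, after the fork.
    with open(pidfile, "w") as f:
        f.write(f"{child}\n")
    os._exit(0 if status == b"\0" else 1)


def start_and_wait_for_parent(pidfile):
    """Supervisor side: wait for the parent to exit, not just for the fork."""
    parent = os.fork()
    if parent == 0:
        try:
            daemonise_with_handshake(pidfile)
        finally:
            os._exit(1)  # only reached if the call above raised
    _, wait_status = os.waitpid(parent, 0)
    return os.WEXITSTATUS(wait_status), os.path.exists(pidfile)
```

A supervisor that switched to the child pid at the moment of the fork would have no such guarantee about the pid file, and would never see the parent's exit code.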

Changed in upstart:
status: New → Triaged
importance: Undecided → Medium
David Robert Lewis (afrodeity) wrote :

Jun 17 12:50:26 afrodeity-desktop init: plymouth-stop pre-start process (1948) terminated with status 1

Rovano (rovano) wrote :

I tested Kubuntu 10.04 (fully updated as of 9.7.) on a slow USB flash drive in an old HP notebook with Intel GMA945, and hit this problem with the plymouth pre-start process.
I must start with "single" and no splash, then run startx manually.

On the notebook's hard disk there is Ubuntu 10.04 (fully updated) and no problem.

Or is the Kubuntu installation on the USB flash drive bad (crashed)? One unique problem was solved with fsck on /.

Andrew Edmunds (andrew-edmunds) wrote :

Just to note that statd is affected by this bug and this results in NFS mounts failing sometimes on boot. I have posted a patch to address that particular issue in https://bugs.launchpad.net/ubuntu/+source/nfs-utils/+bug/610863

ErikShreve (eshreve) wrote :

Also, this bug (I believe) impacts Stunnel4. Upstart ends up not tracking the first PID. But the first stunnel process continues to run. Thus, when Upstart tries to stop stunnel, one process is left running.

Scott James Remnant (scott) wrote :

This and bug #406397 are fundamental due to Upstart's use of ptrace() for fork following - a rewrite of Upstart is in progress which uses other newer kernel mechanisms to follow forks and would not be vulnerable to this kind of issue. However there's no "quick fix" to it, and no way to backport that code.

The way strace does it should work. I don't recall the mechanism, but it
shouldn't be too hard to use. Maybe the rewrite in question is in that
direction, dunno.

On Wed, Sep 1, 2010 at 6:34 AM, Scott James Remnant <email address hidden> wrote:

> This and bug #406397 are fundamental due to Upstart's use of ptrace()
> for fork following - a rewrite of Upstart is in progress which uses
> other newer kernel mechanisms to follow forks and would not be
> vulnerable to this kind of issue. However there's no "quick fix" to it,
> and no way to backport that code.

dyuen (daniellyyuen) wrote :

In Bug #406397

Scott James Remnant wrote on 2009-12-13: #13

>On Sat, 2009-12-12 at 09:55 +0000, Lars Düsing wrote:
>
>> I'm wondering why this bug has a importance of "low", as it renders
>> using upstart for many daemons (including apache, postfix and others) as
>> impossible.
>>
>Because conversion of those daemons over to Upstart is not a priority;
>just carry on using the existing init script.

>Scott

What if someone wrote a daemon, wants to keep it up all the time, ran into this bug, and Ubuntu has replaced inittab with Upstart? What should he/she do? Any suggestions?

Bård Lind (bard-lind) wrote :

Hi.
I'm trying to set up UEC with CC and NC on the same hw.

Unfortunately I'm not able to launch any instances due to the bug that's a duplicate of this.

Any hints on how I can get a version of UEC which actually works?

Bård

AlainKnaff (kubuntu-misc) wrote :

> The first is the kind of daemon that forks early, but then maintains communication with the parent so that the parent can exit
> with an appropriate exit code in case of error (rather than the kind which only fork once they are ready). These kinds aren't
> ready until the parent exits, not at the point of the fork. Since upstart follows the fork, it will think they are ready "too early"

Waiting until the parent dies will not work in cases such as squid, where the parent never dies.

A more appropriate solution would be to add the possibility to track by pidfile (to be specified in /etc/init/<service>.conf)
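
As a rough illustration of what tracking by pidfile would involve on the supervisor side, here is a short Python sketch. It is not an existing Upstart feature or stanza; the helper names are hypothetical. It reads a pid from a file and probes whether that process exists using signal 0.

```python
import os


def read_pidfile(path):
    """Return the pid stored in `path`, or None if missing or malformed."""
    try:
        with open(path) as f:
            pid = int(f.read().strip())
    except (OSError, ValueError):
        return None
    return pid if pid > 0 else None


def is_running(pid):
    """Probe for a process with signal 0, which delivers nothing."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False  # no such process
    except PermissionError:
        return True  # process exists but belongs to another user
    return True
```

Note that this only shows that some process with that pid exists; it cannot tell whether the pid has been reused by an unrelated process, which is one of the weaknesses of pid files.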

Ben Hekster (heksterb) wrote :

I'd like to add my two cents to this discussion, as it bears some relation to the mechanism Upstart uses to determine the PID to monitor.

I was attempting to Upstart-enable PBS/TORQUE-- which is a suite of cluster resource management software. Because of the way it was packaged (not wanting to mess with the RPM specification too much), and to maintain maximum compatibility with the SysV version, I tried doing this by having the Upstart '.conf' directly invoke the SysV scripts by doing something like:

exec /etc/init.d/pbs_mom start

After struggling with this for a while I finally understood why this can't work. The 'pbs_mom' SysV script calls LSB 'start-stop-daemon', which in turn invokes the actual PBS daemon executable. In other words, there are *three* fork()s involved in starting the daemon-- so neither 'expect fork' (one fork) nor 'expect daemon' (two forks) find the right PID in this case.

In fact, the situation is actually worse than that: the 'pbs_mom' script calls 'start-stop-daemon' once early on in a 'test' mode, to verify that the daemon isn't actually already running. So in actual fact, there are *five* fork()s involved in starting the daemon.
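
The fork-counting problem can be shown with a simplified model in Python. This is not Upstart's actual ptrace logic; the pids and the exact process tree below are hypothetical, chosen only to resemble a script that runs a helper in test mode before starting the real daemon. The tracker follows the child each time the pid it currently tracks forks, up to the expected number of forks.

```python
def follow_forks(main_pid, fork_events, expected_forks):
    """Return the pid a fork-counting tracker would end up following.

    fork_events is an ordered list of (parent_pid, child_pid) pairs.
    """
    tracked = main_pid
    remaining = expected_forks
    for parent, child in fork_events:
        if remaining and parent == tracked:
            tracked = child
            remaining -= 1
    return tracked


# Hypothetical process tree; pid 100 is the init.d script itself.
HYPOTHETICAL_FORKS = [
    (100, 101),  # script runs the helper once in test mode
    (100, 102),  # script runs the helper again to really start the daemon
    (102, 103),  # helper starts the daemon executable
    (103, 104),  # daemon forks into the background; 104 is the real service
]

print(follow_forks(100, HYPOTHETICAL_FORKS, 1))  # one fork expected: 101
print(follow_forks(100, HYPOTHETICAL_FORKS, 2))  # two forks expected: 101
```

In this model both settings end up following the short-lived test-mode helper (101), never the real service (104), because the first fork the tracker sees is the wrong one.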

It's obvious that the way Upstart is designed, it can never reliably support starting services in this way. The right approach is just to start the daemon directly, without going through intermediate SysV scripts.

It is somewhat concerning to me, though, that there seem to be valid use cases and other boundary cases that cannot be supported by Upstart under its current design. The alternative, already proposed by others, would be the ability to specify the PID (or a PID file) to Upstart. The assumption that Upstart will always be able to 'reverse engineer' the daemon PID, by some clever mechanism or other, seems to me somewhat questionable.

Andrew Pollock (apollock) wrote :

I've grappled with similar issues in the past, and just conceded defeat, and let the process Upstart's managing run in the foreground. This offends my sensibilities, in that I feel that services should daemonize themselves, but aside from that, seems to work fine.

Scott James Remnant (scott) wrote :

The problem with the approach of informing the init daemon about the
pid is that daemon authors have historically proven themselves
incapable of discovering it. I've seen very few examples that actually
generate a pid file correctly (including without race conditions).

That's not to say that finding them out is without its problems
either, though I believe we have a solution now.

Scott
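
One of the races in question is a reader seeing a pid file that exists but is still empty or half-written. A minimal sketch in Python of one common way to avoid that particular race, assuming a POSIX filesystem: write to a temporary file in the same directory, then rename it into place. The function name is hypothetical, and this does not address the separate question of when the file is written relative to the fork.

```python
import os
import tempfile


def write_pidfile_atomically(path, pid):
    """Write `pid` to `path` so that readers never see a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file must be on the same filesystem as the target,
    # otherwise the final rename would not be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".pid-")
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.write(f"{pid}\n")
            tmp.flush()
            os.fsync(tmp.fileno())
            os.fchmod(tmp.fileno(), 0o644)  # mkstemp creates the file 0600
        os.replace(tmp_path, path)  # atomic rename on POSIX
    except BaseException:
        # Clean up the temporary file if anything went wrong before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

A reader then sees either no pid file at all or a complete one.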

Clint Byrum (clint-fewbar) wrote :

You can absolutely use the init.d script from upstart if there is a need
to depend on upstart events to time the startup:

-----------

# start-pbs

start on upstart-event-to-start-on

task

exec /etc/init.d/pbs start

-----------

# stop-pbs

start on upstart-event-to-stop-on

task

exec /etc/init.d/pbs stop

-----------

This is a bit of a hack, but it works if the init.d script must be used.

Scott James Remnant (scott) wrote :

Also

start on upstart-event-to-start-on
stop on upstart-event-to-stop-on

pre-start exec /etc/init.d/pbs start
post-stop exec /etc/init.d/pbs stop

Steve Langasek (vorlon) on 2012-01-18
Changed in upstart (Ubuntu):
status: New → Triaged
importance: Undecided → Medium
assignee: nobody → James Hunt (jamesodhunt)
Andrey Andreev (andyceo) wrote :

Can confirm this behavior. My dmesg shows:

....
[ 77.294511] init: plymouth-stop pre-start process (1944) terminated with status 1
[ 125.756619] init: bootchart post-stop process (3866) terminated with status 141
[ 248.972426] Valid eCryptfs headers not found in file header region or xattr region
....

Dell Inspiron 1525, Ubuntu 11.10 amd64

Steve Langasek (vorlon) wrote :

> Waiting until the parent dies will not work in cases such as squid, where the parent
> never dies.

This didn't make sense to me, so I had a look. The current squid package in Ubuntu runs with -N, which is "no daemonize" (i.e., foreground) mode. So in that case, certainly, the parent doesn't die... because the parent doesn't fork. And the upstart job uses neither 'expect fork' nor 'expect daemon'. However, if run without -N, squid certainly does fork and exit as expected.

We still certainly need to think carefully about changing upstart to track exits instead of just forks; but for my part I can't see any way that this behavior change would break existing jobs.

Steve Langasek (vorlon) on 2012-06-01
Changed in upstart (Ubuntu Precise):
assignee: James Hunt (jamesodhunt) → Steve Langasek (vorlon)
Changed in upstart (Ubuntu):
assignee: James Hunt (jamesodhunt) → Steve Langasek (vorlon)
James Hunt (jamesodhunt) on 2013-10-01
Changed in upstart:
assignee: nobody → James Hunt (jamesodhunt)
Steve Langasek (vorlon) on 2013-11-12
Changed in upstart:
status: Triaged → In Progress