
init: job stuck with expect fork/daemon when parent reaps child

Reported by Esmil on 2009-07-29
This bug affects 73 people
Affects           Status     Importance  Assigned to   Milestone
upstart           -          Medium      James Hunt    -
PLD Linux         -          Undecided   Unassigned    -
upstart (Debian)  Confirmed  Unknown     -             -
upstart (Ubuntu)  -          Low         Unassigned    -

Bug Description

Hi

Incorrect use of the expect fork stanza can create a job with status
  job stop/killed, process nnn
without any process nnn running on the system.

As an example the following avahi.conf should have used
"expect daemon", but will instead create a stuck job.

stop on stopping dbus-system
respawn
expect fork
exec avahi-daemon -D

/Emil Renner Berthing

Confirmed on test system here. I guess that a side-effect of ptrace() is that we don't get the SIGCHLD signal for the process, or could we be ignoring it?

Changed in upstart:
importance: Undecided → High
status: New → Confirmed

Unfortunately what we're running into here turns out to be good old-fashioned UNIX semantics.

Becoming a daemon involves calling fork() twice to completely detach from your calling session and terminal before carrying on in the grandchild process while the parent and child both exit.

When the parent exits, init receives the SIGCHLD for it, since init is the process that spawned it. The orphaned child process is then reparented to the init daemon.

When the child exits, if the parent has *not yet* exited (remember that after a fork() things do not happen in a deterministic order) the parent receives the SIGCHLD for it because it spawned it, otherwise init receives the SIGCHLD because it's the new parent.

This isn't normally a problem because daemons don't worry about handling SIGCHLD before daemonising, so the SIGCHLD is still pending when the parent exits and the child is reparented. The kernel notices this and sends SIGCHLD to the init daemon after reparenting the zombie.
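The reparenting described above can be sketched with shell job control (a rough analogue of what daemons do with fork() in C; the path /tmp/grandchild.pid is just an illustrative scratch file):

```shell
#!/bin/sh
# The subshell plays the "child": it backgrounds a "grandchild" and exits.
(
    sleep 5 &
    echo "$!" > /tmp/grandchild.pid   # record the grandchild's pid
)
# The subshell has now exited, so the grandchild has been orphaned and
# reparented -- to pid 1 (init), or to a subreaper on modern systems.
gc=$(cat /tmp/grandchild.pid)
ps -o ppid= -p "$gc"   # new parent pid, no longer this script
```

The grandchild's new parent, not its original one, will receive the SIGCHLD when it eventually exits, which is exactly the signal upstart relies on.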

But for some reason known only to himself, Lennart has chosen to explicitly and deliberately install a SIGCHLD handler during the daemonisation (inside the actual call to daemon_fork() in the libdaemon library avahi uses).

We simply can't handle this through ptrace and signals. Fixing this properly is going to need the netlink-based code. Fortunately it probably only affects Lennart's code, and only when you get the "expect" line wrong.

It's somewhat unfortunate that this results in a stuck job, because init is basically still waiting for the SIGCHLD that never comes. This isn't a situation that's really possible to deal with; if you like, it's an assertion failure.

The only alternative would be to have some kind of timeout after sending SIGKILL for the process to die, but then we'd hit other problems when the process is truly still running (e.g. NFS timeout in kernel deadlock).

I strongly dislike the idea of a "just forget about it" flag, but I guess we'll need one of those too.

Changed in upstart:
status: Confirmed → Triaged
Johan Kiviniemi (ion) wrote :

...or perhaps patch software that uses the SIGCHLD handler in question, at least until the proc connector implementation? :-P

Demoting this since it only affects avahi

Changed in upstart:
importance: High → Low
summary: - init: job stuck after wrong use of expect fork
+ init: job stuck expect fork/daemon when parent reaps child (avahi-
+ daemon)
summary: - init: job stuck expect fork/daemon when parent reaps child (avahi-
+ init: job stuck with expect fork/daemon when parent reaps child (avahi-
daemon)

This also happens when you use "expect fork" or "expect daemon" combined with "script": Upstart ends up following the first spawned child whose exit status is reaped by the shell. For example:

  script
      ARGS=$(cat /etc/default/myservice)
      exec /sbin/myservice $ARGS
  end script

Upstart ends up with the pid of "cat" and never receives SIGCHLD for it, so the job stays in the running state indefinitely; when you try to stop it, it hangs in "stop/killed".
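A workaround for this particular pattern (my sketch, not from the report) is to avoid forking any helper before the exec, for instance by using the shell builtin read instead of cat, so the daemon's own fork is the first one Upstart observes:

```
script
    read ARGS < /etc/default/myservice
    exec /sbin/myservice $ARGS
end script
```

Since read is a shell builtin, no intermediate child process is created before the exec.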

summary: - init: job stuck with expect fork/daemon when parent reaps child (avahi-
- daemon)
+ init: job stuck with expect fork/daemon when parent reaps child
Changed in upstart (Ubuntu):
status: New → Triaged
importance: Undecided → Low
Daniel Hahler (blueyed) wrote :

A good workaround for this might be pidfile handling, where upstart would use a given pidfile for monitoring the job.

On Sat, 2009-10-03 at 15:10 +0000, Daniel Hahler wrote:

> A good workaround for this might be pidfile handling, where upstart
> would use a given pidfile for monitoring the job.
>
Ugh, god no.

Do you know how many bugs there are with a system of requiring
applications to write the correct pid to a file? Do you know how many
major problems you introduce if they get it wrong?

The "solution" is to use something like cgroups or the proc_connector to
track multiple processes.

Scott
--
Have you ever, ever felt like this?
Had strange things happen? Are you going round the twist?

Daniel Hahler (blueyed) wrote :

It would be a good workaround in this case though..

I'm not saying that "pidfile" should get used by default, but it's a nice option to have in case the program's pidfile behavior is sane.

Lars Düsing (lars.duesing) wrote :

I ran into the same problem on https://bugs.edge.launchpad.net/ubuntu/+source/aiccu/+bug/223825
(comment #35 onwards).
Any hints so far?

Lars Düsing (lars.duesing) wrote :

After looking into aiccu more deeply, I see you have a real problem: you keep the first pid you get from the first fork/clone. At least aiccu spawns several processes (clones) during initialization:

lars@artus:~$ sudo strace -o aiccu.log aiccu start
lars@artus:~$ grep clone aiccu.log
clone(child_stack=0, flags=CLONE_PARENT_SETTID|SIGCHLD, parent_tidptr=0xbfe8ca94) = 2992
clone(child_stack=0, flags=CLONE_PARENT_SETTID|SIGCHLD, parent_tidptr=0xbfe8ca94) = 2994
clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0xb773e938) = 2996
lars@artus:~$ ps aux|grep aiccu
root 2996 0.0 0.0 19732 832 ? Ssl 10:39 0:00 aiccu start
lars@artus:~$ service aiccu status
aiccu start/running, process 2992
lars@artus:~$ ps aux|grep 2992
lars 3049 0.0 0.0 3040 792 pts/0 R+ 10:41 0:00 grep 2992

That means you track pid 2992, which is only used during initialization. The real pid is 2996.

On "service aiccu stop" you try to kill pid 2992, which is no longer there, and upstart gets stuck in an infinite loop.

Lars Düsing (lars.duesing) wrote :

I'm wondering why this bug has an importance of "low", as it makes using upstart for many daemons (including apache, postfix and others) impossible.

Changed in aiccu (Ubuntu):
status: New → In Progress
status: In Progress → New
Lars Düsing (lars.duesing) wrote :

Back to the bug itself: a very evil solution would be tracking the "lowest" pid of the daemon (though you have to watch out for pid numbers wrapping around...).

On Sat, 2009-12-12 at 09:55 +0000, Lars Düsing wrote:

> I'm wondering why this bug has an importance of "low", as it makes
> using upstart for many daemons (including apache, postfix and others)
> impossible.
>
Because conversion of those daemons over to Upstart is not a priority;
just carry on using the existing init script.

Scott
--
Have you ever, ever felt like this?
Had strange things happen? Are you going round the twist?

Adam Nelson (adam-varud) wrote :

I would vote for this to be higher priority. For anybody who doesn't know 'old' init, it's frustrating to have to learn all the old syntax just to get common stuff like Apache running properly. In my case, I need a custom Xvfb service and there's nothing out there right now. Apache already has maintained service scripts, but other programs don't.

If not increasing the priority of the ticket, can the Ubuntu docs be updated to reflect that Upstart is not ready for mainline users to start moving init scripts to the format?

PFudd (kernel-pkts) wrote :

Hi...

I got bitten by this bug earlier today.

I was attempting to set 'autossh' to be respawned, via a new autossh.conf file. I copied the cron.conf file, and changed the exec line to the appropriate autossh command line, and left the rest the same. The mistake was that autossh doesn't fork into the background, but 'expect fork' was set.

The outcome was that 'initctl stop autossh' and 'initctl start autossh' both hang, and I had to copy autossh.conf to autossh2.conf and delete the 'expect fork' line before I could get autossh to run. I can't get rid of the old autossh entry in 'initctl list', except possibly by rebooting. Since I'm remotely logged in, I don't want to do that unless I'm confident my connection is going to work... yay, catch-22.

exec /usr/bin/autossh -M 81 -N -R 29:127.0.0.1:22 servername

For sanity's sake, I'm closing the Ubuntu tasks for upstream Upstart bugs. I've experimented with having both, but it is just making bugs hard to find now. Will use the policy whereby bugs on the Ubuntu package exist in the Ubuntu packaging or patches only, any bugs in the Upstart code are Upstream bugs.

Changed in upstart (Ubuntu):
status: Triaged → Invalid

On Wed, 2010-04-07 at 18:47 +0000, Lars Düsing wrote:

> Has anybody forwarded this problem to upstream?
>
Ubuntu are upstream for Upstart (I wrote it!)

Scott
--
Scott James Remnant
<email address hidden>

Changed in upstart:
importance: Low → Medium

I had the same problem with Upstart 0.6.5. The job locked in the 'killed' state when the process Upstart waited for didn't even exist. This is quite a serious problem as:

1. It can happen when the job's 'expect fork|daemon' stanza does not match what the process really does. And when it happens, fixing the job description won't help, as a job with the same name cannot be started any more.

2. Even worse: the system won't even shut down properly – it will stay waiting for the 'killed' job to finish, which will never happen. Not good when the shutdown is a reboot during remote trouble-shooting.

The solution seems 'obvious' to me: don't wait for a process that doesn't even exist (not even in a 'zombie' state). Or am I missing something?

This and bug #530779 are fundamental due to Upstart's use of ptrace() for fork following. A rewrite of Upstart is in progress which uses newer kernel mechanisms to follow forks and would not be vulnerable to this kind of issue. However there's no "quick fix", and no way to backport that code.

Peter Júnoš (petoju) wrote :

When will that rewrite be complete?

Or does anyone have a workaround for killing a start/killed or stop/killed job? Even a one-time job or script would do.

Tim Nicholas (tjn) wrote :

I've been bitten by this with squid after having to kill the squid processes as a result of the upgrade process borking on 10.04 (LTS).

I was upgrading to this: https://launchpad.net/ubuntu/+source/squid/2.7.STABLE7-1ubuntu12.2 replacing squid 2.7.STABLE7-1ubuntu12.

I assume I'm missing something, cos it looks a lot like I'm fucked and I need to reboot to get squid to start from init again.

Surely not...

Clint Byrum (clint-fewbar) wrote :

Tim, I'm sorry that you're experiencing the issue. I believe that squid may have always had this issue and we only stirred it up by updating squid.

There is a really scary workaround available, which is to create a program that exhausts the pid space until the tracked pid exists again; then upstart will be able to re-attach and kill it...

http://heh.fi/tmp/workaround-upstart-snafu

I do think that we may need something a little less perfect than a full rewrite of the ptrace tracker, something we could possibly backport to lucid. There seems to be a missing command that would allow an administrator to tell upstart to just forget about an instance.

Tim Nicholas (tjn) wrote :

Thanks for your help Clint.

Is Ubuntu still intending to stick with upstart in future releases?

Johan Kiviniemi (ion) wrote :

This specific issue will be fixed in a future release of Upstart with robust fork tracking. Is there a specific reason to switch? It’s not as if there are better alternatives out there. :-)

Tim Nicholas (tjn) wrote :

I personally think being different from everyone else is a problem.

Also, this is the sort of bug that should be architecturally impossible in an init system. init shouldn't have to track state perfectly to function on the basic level of 'start jobs' 'stop jobs'.

I understand there are benefits for workstations with this (boot speed etc), but for a server upstart is notably less good than old-school init - even without the bugs, it's still very opaque - what with its lack of logging etc.

Anand Chitipothu (anandology) wrote :

If I notice a start/killed or stop/killed job, is there any way to get rid of it other than rebooting the machine?

Guido Scalise (guido-scalise) wrote :

Anand, there's a rather harsh workaround using a ruby script you can download here:
http://heh.fi/tmp/workaround-upstart-snafu

It worked for me, but, as I said, the approach is quite "aggressive".

Basically it forks new dummy, short-lived processes until the PID sequence wraps around. Then, when one of these subprocesses gets the "blocked" PID, it waits until upstart kills it, thus unlocking the job's state.

I didn't see any side effects, but YMMV.

Anand Chitipothu (anandology) wrote :

Thanks Guido Scalise. That fixed it.

David Ressman (davidressman) wrote :

Out of curiosity, why can't a piece be written into init that allows one to manually remove processes from the state table?

Rich Wales (richw) wrote :

I think I'm running into this bug on an Ubuntu Lucid server. I tried to install the rsyslog package on this server, but the start / stop / restart functions would never complete (hangs forever). In the end, I had to abandon rsyslog and go back to sysklogd — bad solution, I need rsyslog, but I obviously don't have a choice if it simply won't work.

There are a bunch of other packages on this same server which use upstart-job as their init scripts; they seem to work OK as far as I'm aware, but the fact that rsyslog won't work makes me nervous.

Lucid is supposed to be an LTS release with support until 2015 (server edition), but is that fact going to help me in practice (i.e., is upstart ever going to be fixed for Lucid)? If I were to decide to upgrade this server to Maverick, is there any realistic reason to suppose that would help?

Should I simply scrounge around for an old-style init script for rsyslog to run on this server?

Is there some other solution I should be looking at?

Changed in aiccu (Ubuntu):
status: New → Confirmed

As with the script stanza, this is also a problem when the command given to `exec' is a script that itself starts a daemon.

Although it wouldn't help with the original problem (avahi-daemon), for scripts a workaround/fix could be to tell upstart the name of the final executable, so it could ignore forks that exec a different executable. I don't know if that's feasible?

Scott, would it be possible to add the thing that David Ressman suggested, please? Something to manually reset a job state without restarting the machine.

I'm a newbie to upstart, and this bug is really annoying as it requires me to restart my machine whenever I get the conf wrong... It makes debugging a nightmare.

While implementing a new upstart script, I also ran into this problem, and am now stuck with a job waiting for a non-existent pid.

I think the worst-case scenario has not been mentioned yet, which is that when the actual pid does appear, it will be killed by this job thinking it is the process it has been waiting to kill. This could be a harmless process (or even created for that purpose by a script mentioned above), but it could also be a critical process. Thus the bug is less harmless than it appears to be.

Peter Júnoš (petoju) wrote :

According to Mark Hurenkamp's comment, the severity of this bug should be set to Critical.
Imagine your very important app got the same PID the non-existent process had.
Then Upstart:
- causes data corruption: imagine you are editing your photos and something kills your application while it is rewriting a photo
- severely affects applications beyond the package responsible for the root cause: any application can be killed ("crash") randomly

Clint Byrum (clint-fewbar) wrote :

Peter, I understand that this is a dangerous bug. However, it is quite detectable when a user has made the mistake, and avoidable by taking care when creating new jobs. So I think Medium is appropriate, as there are workarounds, and the potential for these problems is quite low.

Given that the PID would be killed pretty much as soon as it started, and that you're most likely to trigger this while writing an upstart job for a daemon, I'd agree that the potential for serious problems is low.

However, I don't think this is a user error/mistake; some daemons are just more awkward than others and may require scripts, etc. I don't think it's reasonable to expect every daemon out there to be rewritten to play well.

A short idea to overcome most of these problems at once: why not give upstart a
config line like "pidfile /var/run/xyz.pid" and have it interpret that?

Lars Düsing (lars.duesing) wrote :

Oh, I see... this idea was turned down. Sorry.

Yeah, as Scott already mentioned, pidfiles introduce their own set of problems, and they also aren't a solution for all use cases. E.g. the daemon I was dealing with does not have any way to write out a pidfile, forks multiple times, requires a script to do some setup before it's launched, and is third-party and closed source, so I can't do anything about this mess ;)

Changed in upstart (Debian):
status: Unknown → New
no longer affects: aiccu (Ubuntu)
Clint Byrum (clint-fewbar) wrote :

It seems to me that upstart could forget about a pid if it can verify that it has in fact disappeared. I've seen quite a few people bitten by this bug in #upstart on Freenode and on various forum sites. Because I think the impact is perhaps higher than we might have originally anticipated, I'd suggest that the upstart devs raise the priority to High.

john miller (johnmille1) wrote :

Could upstart track an arbitrary number of forks? Instead of limiting it to 1 (expect fork) or 2 (expect daemon), could expect N be supported?

Also, here is a bash script to exhaust the pid space if your system does not have ruby.

Pass the pid upstart is waiting for as an argument, like this: exhaustPIDspace.sh 11920

#!/bin/bash

usleep 1 &
firstPID=$!
#first lets exhaust the space
while [ $! -ge $firstPID ]
do
    usleep 1 &
done

while [ $! -le $1 ]
do
    usleep 1 &
done

john miller (johnmille1) wrote :

I came up with a hack for using upstart with applications that fork more than twice, to use until the rewrite makes it downstream. It works for my application on my system. YMMV.

1. start the application in the pre-start section
2. in the script section run a script that runs as long as the application runs. The pid of this script is what upstart will track.
3. in the post-stop section kill the application

example

env DAEMON=/usr/bin/forky-application

pre-start script
    su -s /bin/sh -c "$DAEMON" joeuseraccount
end script

script
    sleepWhileAppIsUp(){
        while pidof $1 >/dev/null; do
            sleep 1
        done
    }

    sleepWhileAppIsUp $DAEMON
end script

post-stop script
    if pidof $DAEMON;
    then
        kill `pidof $DAEMON`
        #pkill $DAEMON # post-stop process (19300) terminated with status 1
    fi
end script

a similar approach could be taken with pid files.
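For reference, that pid-file variant might look like this (a purely hypothetical sketch: the path /var/run/forky.pid, and the assumption that the daemon writes it itself, are mine, not the commenter's):

```
script
    # hypothetical: wait until the daemon has written its pidfile,
    # then sleep for as long as that pid is alive
    while [ ! -f /var/run/forky.pid ]; do sleep 1; done
    while kill -0 "$(cat /var/run/forky.pid)" 2>/dev/null; do
        sleep 1
    done
end script

post-stop script
    if [ -f /var/run/forky.pid ]; then
        kill "$(cat /var/run/forky.pid)" || true
    fi
end script
```

Matching on a pidfile instead of pidof avoids killing an unrelated process that happens to share the executable name, though it inherits all the stale-pidfile problems Scott mentioned earlier.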

Changed in upstart (Debian):
status: New → Confirmed
Jeff Lambert (jl-newtraxtech) wrote :

John Miller: THANK YOU!!! The only thing is I'm not sure if this is compatible with respawn.

john miller (johnmille1) wrote :

I use it with respawn without an issue; just make sure you kill everything/clean up in post-stop. I do the same kill/cleanup in pre-start just to be sure.

Wade Fitzpatrick (wade-8) wrote :

I had a problem with john miller's bash script because bash didn't die quickly enough for the final usleep to get re-parented to init (upstart), so here is a better version:

#!/bin/bash

usleep 1 &
firstPID=$!
#first lets exhaust the space
while (( $! >= $firstPID ))
do
    usleep 1 &
done

# [ will use testPID itself, we want to use the next pid
declare -i testPID
testPID=$(($1 - 1))
while (( $! < $testPID ))
do
    usleep 1 &
done

# fork a background process then die so init reaps its pid
sleep 3 &
echo "Init will reap PID=$!"
kill -9 $$
# EOF

Example usage:
# sh /tmp/upstart_fix.sh 19915
Init will reap PID=19915
Killed

James Hunt (jamesodhunt) on 2013-10-01
Changed in upstart:
assignee: nobody → James Hunt (jamesodhunt)
J G Miller (jgmiller) wrote :

This issue also seems to be the cause of difficulties with minidlna if the -R flag is added to rescan the media files directory, because minidlnad creates an additional process to do the rescanning. So if expect daemon is used for the two processes at startup, the shutdown runs into difficulties because there is only one process left to be killed, the rescan process having ended by itself.

So until a fix is found for upstart, users should be advised not to add -R for rescan to the upstart configuration file, and that if they need to do a rescan at startup, use the traditional sysV rc init.d script and not upstart to start and stop the daemon.

It would also be very useful if there were some initctl command to wipe the state of service jobs which are in a bad state, e.g.
 minidlna stop/killed, process 5114 or minidlna start/killed, process 5114
since once the upstart job gets into this state, using start or stop results in upstart hanging on

connect(3, {sa_family=AF_FILE, path=@"/com/ubuntu/upstart"}, 22) = 0
fcntl64(3, F_GETFL) = 0x2 (flags O_RDWR)
fcntl64(3, F_SETFL, O_RDWR|O_NONBLOCK) = 0
geteuid32() = 0
getsockname(3, {sa_family=AF_FILE, NULL}, [2]) = 0
poll([{fd=3, events=POLLOUT}], 1, 0) = 1 ([{fd=3, revents=POLLOUT}])
send(3, "\0", 1, MSG_NOSIGNAL) = 1
send(3, "AUTH EXTERNAL 30\r\n", 18, MSG_NOSIGNAL) = 18
poll([{fd=3, events=POLLIN}], 1, -1) = 1 ([{fd=3, revents=POLLIN}])
read(3, "OK 37d274edad794a392790c969525d3"..., 2048) = 37
poll([{fd=3, events=POLLOUT}], 1, -1) = 1 ([{fd=3, revents=POLLOUT}])
send(3, "NEGOTIATE_UNIX_FD\r\n", 19, MSG_NOSIGNAL) = 19
poll([{fd=3, events=POLLIN}], 1, -1) = 1 ([{fd=3, revents=POLLIN}])
read(3, "AGREE_UNIX_FD\r\n", 2048) = 15
poll([{fd=3, events=POLLOUT}], 1, -1) = 1 ([{fd=3, revents=POLLOUT}])
send(3, "BEGIN\r\n", 7, MSG_NOSIGNAL) = 7
poll([{fd=3, events=POLLIN|POLLOUT}], 1, -1) = 1 ([{fd=3, revents=POLLOUT}])
sendmsg(3, {msg_name(0)=NULL, msg_iov(2)=[{"l\1\2\1\r\0\0\0\1\0\0\0_\0\0\0\1\1o\0\23\0\0\0/com/ubu"..., 112}, {"\10\0\0\0minidlna\0", 13}], msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL) = 125
clock_gettime(CLOCK_MONOTONIC, {13369, 479497006}) = 0
poll([{fd=3, events=POLLIN}], 1, 25000) = 1 ([{fd=3, revents=POLLIN}])
recvmsg(3, {msg_name(0)=NULL, msg_iov(1)=[{"l\2\1\1&\0\0\0\1\0\0\0\17\0\0\0\5\1u\0\1\0\0\0\10\1g\0\1o\0\0"..., 2048}], msg_controllen=0, msg_flags=MSG_CMSG_CLOEXEC}, MSG_CMSG_CLOEXEC) = 70
recvmsg(3, 0xbfd42610, MSG_CMSG_CLOEXEC) = -1 EAGAIN (Resource temporarily unavailable)
sendmsg(3, {msg_name(0)=NULL, msg_iov(2)=[{"l\1\2\1\4\0\0\0\2\0\0\0x\0\0\0\1\1o\0!\0\0\0/com/ubu"..., 136}, {"\0\0\0\0", 4}], msg_controllen=0, msg_flags=0}, MSG_NOSIGNAL) = 140
clock_gettime(CLOCK_MONOTONIC, {13369, 504154425}) = 0
poll([{fd=3, events=POLLIN}], 1, 25000) = 1 ([{fd=3, revents=POLLIN}])
recvmsg(3, {msg_name(0)=NULL, msg_iov(1)=[{"l\2\1\1(\0\0\0\2\0\0\0\17\0\0\0\5\1u\0\2\0\0\0\10\1g\0\1o\0\0"..., 2048}], msg_controllen=0, msg_flags=MSG_CMSG_CLOEXEC}, MSG_CMSG_CLOEXEC) = 72
recvmsg(3, 0xbfd425d0, MSG_CMSG_CLOEXEC) = -1 EAGAIN (Resource temporarily unavailable)
sendmsg(3, {msg_name(0)=NULL, msg_iov(2)=[{"l\1\2\1\10\0\0\0\3\0...

