init: expect fork/daemon do not work as expected

Bug #264711 reported by Rolf Offermanns on 2008-09-04
10
This bug affects 1 person
Affects Status Importance Assigned to Milestone
upstart
Medium
Scott James Remnant (Canonical)
upstart (Ubuntu)
Undecided
Unassigned

Bug Description

I use upstart-0.5.0.

I see the same problem as described by Adam Sampson on the upstart-devel list:
https://lists.ubuntu.com/archives/upstart-devel/2008-August/000738.html

I try to start the dbus daemon with the "expect daemon" stanza (I also tried "fork"), but everytime I get the "main process killed by TRAP signal" message on the console.

The dbus job is in the "running" state after that! It this intended?

My job looks like this:
pre-start script
        /bin/dbus-uuidgen --ensure
end script

respawn
expect daemon
exec /bin/dbus-daemon --system

Kelvin Chan (kelvin-t-chan) wrote :

I got similar problem.

I found that the dbus-daemon did not got respawn when it was kill. After I killed dbus-daemon, the dbus-daemon pid gone (not shown in ps command), but upstart initctl reported that dbus-daemon is still running.

Here are the steps I performed:

1) I listed the dbus-daemon pid to make sure it's running, and it had pid 1031.
bash-3.00# ps axg | grep dbus-daemon
 1031 ? Ss 0:00 /usr/bin/dbus-daemon --fork --system
 1060 ? R+ 0:00 grep dbus-daemon

2) Then checked whether the job is running:
bash-3.00# /opt/upstart/sbin/initctl status systembus
systembus running

3) Killed the dbus-daemon process, and verified pid 1031 is gone:
bash-3.00# pidof dbus-daemon | xargs kill
bash-3.00# ps axg | grep dbus-daemon
 1066 ? R+ 0:00 grep dbus-daemon

4) After the kill above, upstart did not respawn dbus-daemon. Upstart think it's still alive:
bash-3.00# /opt/upstart/sbin/initctl status systembus
systembus running

5) Even when I had explicitly stop & start the job again, I did not see the dbus-daemon launched:
bash-3.00# /opt/upstart/sbin/initctl stop systembus
bash-3.00# /opt/upstart/sbin/initctl status systembus
systembus not running
bash-3.00# ps axg | grep dbus-daemon
bash-3.00# /opt/upstart/sbin/initctl start systembus
init: systembus main process (1076) killed by TRAP signal
init: systembus main process ended, respawning
bash-3.00# /opt/upstart/sbin/initctl status systembus
systembus running
bash-3.00# ps axg | grep dbus-daemon
 1079 ? R+ 0:00 grep dbus-daemon

Here is my 'systembus' job script that start the dbus-daemon:
bash-3.00# cat /etc/init/jobs.d/systembus
# DBUS system daemon
#
# started until it is shut down again.

start on stopped rc2 or stopped rc3 or stopped rc4 or stopped rc5

stop on runlevel 0 or runlevel 6

export PATH=/opt/upstart/sbin:$PATH

respawn
respawn limit 4 60

expect fork
exec /usr/bin/dbus-daemon --fork --system

Jeff Oliver (jeffrey-oliver) wrote :

I wrote a simple job file for xinetd.

# xinetd - on-demand internet services
#

start on started network
stop on stopping network

expect fork

exec /usr/sbin/xinetd -stayalive -pidfile /var/run/xinetd.pid

xinetd ends up forking, and from what I can tell, I can't prevent it. when attempting to start it i get:

2008-10-09T02:38:23.103593+00:00 fs03 init: xinetd goal changed from stop to start
2008-10-09T02:38:23.103593+00:00 fs03 init: xinetd state changed from waiting to starting
2008-10-09T02:38:23.103593+00:00 fs03 init: Handling starting event
2008-10-09T02:38:23.103593+00:00 fs03 init: xinetd state changed from starting to pre-start
2008-10-09T02:38:23.104057+00:00 fs03 init: xinetd state changed from pre-start to spawned
2008-10-09T02:38:23.104347+00:00 fs03 init: xinetd main process (18133)
2008-10-09T02:38:23.104481+00:00 fs03 init: xinetd main process (18133) killed by TRAP signal
2008-10-09T02:38:23.105130+00:00 fs03 init: xinetd goal changed from start to stop
2008-10-09T02:38:23.105234+00:00 fs03 init: xinetd state changed from spawned to stopping
2008-10-09T02:38:23.105450+00:00 fs03 init: Handling stopping event
2008-10-09T02:38:23.105642+00:00 fs03 init: xinetd state changed from stopping to killed
2008-10-09T02:38:23.105680+00:00 fs03 init: xinetd state changed from killed to post-stop
2008-10-09T02:38:23.105680+00:00 fs03 init: xinetd state changed from post-stop to waiting
2008-10-09T02:38:23.106179+00:00 fs03 init: Handling stopped event

Which is all fine and good, except that the process was killed by SIGTRAP. It's almost like a signal handler is not installed like it ought to be.

Jeff Oliver (jeffrey-oliver) wrote :

Turns out you can prevent both dbus-daemon and xinetd from actually forking. dbus-daemon needs configuration change (/etc/dbus-1/system.conf) and xinetd needs a -dontfork added to the command line.

This doesn't fix the apparent incapability of upstart to properly trace a child process through a fork.

I added some debug statements to the code the help solve this problem. It appears there is some wrong information out there about how the waitid() syscall works. In many places i see the use of CLD_TRAPPED, however the man page wait(2) says nothing about it. I do see it mentioned in siginfo.h (kernel headers).

So I added a simple nih_warn() at the top of the nih_child_poll() function. This is what i see for my xinetd example:

2008-10-09T21:36:02.187937+00:00 fs03 init: waitid - pid=0, signo=0, code=0, status=0
2008-10-09T21:36:02.188896+00:00 fs03 init: waitid - pid=0, signo=0, code=0, status=0
2008-10-09T21:36:02.188931+00:00 fs03 init: waitid - pid=0, signo=0, code=0, status=0
2008-10-09T21:36:02.188931+00:00 fs03 init: waitid - pid=0, signo=0, code=0, status=0
2008-10-09T21:36:02.188931+00:00 fs03 init: waitid - pid=0, signo=0, code=0, status=0
2008-10-09T21:36:02.188931+00:00 fs03 init: xinetd goal changed from stop to start
2008-10-09T21:36:02.188931+00:00 fs03 init: xinetd state changed from waiting to starting
2008-10-09T21:36:02.188931+00:00 fs03 init: waitid - pid=0, signo=0, code=0, status=0
2008-10-09T21:36:02.188931+00:00 fs03 init: Handling starting event
2008-10-09T21:36:02.189743+00:00 fs03 init: xinetd state changed from starting to pre-start
2008-10-09T21:36:02.189778+00:00 fs03 init: xinetd state changed from pre-start to spawned
2008-10-09T21:36:02.189837+00:00 fs03 init: xinetd main process (19246)
2008-10-09T21:36:02.189837+00:00 fs03 init: waitid - pid=19246, signo=17, code=2, status=5
2008-10-09T21:36:02.189837+00:00 fs03 init: xinetd main process (19246) killed by TRAP signal
2008-10-09T21:36:02.189837+00:00 fs03 init: xinetd main process ended, respawning
2008-10-09T21:36:02.189837+00:00 fs03 init: xinetd state changed from spawned to post-start
2008-10-09T21:36:02.189837+00:00 fs03 init: xinetd state changed from post-start to running
2008-10-09T21:36:02.190748+00:00 fs03 init: waitid - pid=19245, signo=17, code=1, status=0
2008-10-09T21:36:02.190748+00:00 fs03 init: waitid - pid=0, signo=0, code=0, status=0
2008-10-09T21:36:02.190748+00:00 fs03 init: Handling started event
2008-10-09T21:36:02.191710+00:00 fs03 init: waitid - pid=0, signo=0, code=0, status=0

This indicates that the code is a CLD_KILLED and the status is SIGTRAP. There is CLD_TRAPPED used here. The question I have, then, is this due to the kernel im using (2.6.24)? Is a newer kernel/older kernel behaving differently?

Jeff Oliver (jeffrey-oliver) wrote :

After playing around with it more and tracing through the code, the child fork state machine doesn't work right if the job file does not have the 'session leader' stanza.

Without session leader, the new process is not a directly forked from upstart, so the traps seem to get lost, or at least incorrectly interpreted. The trace gets setup on the correct pid, but the signals go to the parent, which is not upstart.

In any case, it looks like it'd be a mess to try and trace non-session leaders, if that was even a legitimate use case. Its just easier to set session leader on the jobs that need to be traced.

Is it possible to imply session leader if expect fork exists? Or is my logic all wrong?

Btw, a dump of what a working fork looks like: I used portmap this time:
2008-10-10T02:13:25.086613+00:00 fs03 init: portmap goal changed from stop to start
2008-10-10T02:13:25.086645+00:00 fs03 init: portmap state changed from waiting to starting
2008-10-10T02:13:25.086677+00:00 fs03 init: waitid - pid=0, signo=0, code=0, status=0
2008-10-10T02:13:25.086709+00:00 fs03 init: Handling starting event
2008-10-10T02:13:25.086742+00:00 fs03 init: portmap state changed from starting to pre-start
2008-10-10T02:13:25.086774+00:00 fs03 init: portmap state changed from pre-start to spawned
2008-10-10T02:13:25.087224+00:00 fs03 init: setup trace on pid 0
2008-10-10T02:13:25.087400+00:00 fs03 init: portmap main process (3838)
2008-10-10T02:13:25.087514+00:00 fs03 init: waitid - pid=3838, signo=17, code=4, status=5
2008-10-10T02:13:25.087614+00:00 fs03 init: portmap main process (3838) trapped
2008-10-10T02:13:25.088015+00:00 fs03 init: waitid - pid=3839, signo=17, code=4, status=19
2008-10-10T02:13:25.088114+00:00 fs03 init: waitid - pid=0, signo=0, code=0, status=0
2008-10-10T02:13:25.088245+00:00 fs03 init: waitid - pid=0, signo=0, code=0, status=0
2008-10-10T02:13:25.088358+00:00 fs03 init: waitid - pid=3838, signo=17, code=4, status=261
2008-10-10T02:13:25.088460+00:00 fs03 init: portmap main process (3838) became new process (3839)
2008-10-10T02:13:25.088560+00:00 fs03 init: waitid - pid=0, signo=0, code=0, status=0
2008-10-10T02:13:25.088707+00:00 fs03 init: waitid - pid=3838, signo=17, code=1, status=0
2008-10-10T02:13:25.088810+00:00 fs03 init: waitid - pid=0, signo=0, code=0, status=0
2008-10-10T02:13:50.086811+00:00 fs03 init: waitid - pid=0, signo=0, code=0, status=0

Jeff Oliver (jeffrey-oliver) wrote :

Still something wrong with the tracing state machine.

From my previous log, it looks like things are coming out of order compared to how the state machine is expecting it. So here's my attempt of explaining the state machine...correct me if im wrong.

1) Upstart forks and sets up the trace on the new pid before exec'ing the new process. Set state to NEW.
2) Exec causes SIGTRAP, job_process_trace_new configures ptrace to look for fork's and exec's and continues the trace. Set state to NORMAL.
3) Process hits fork, upstart gets SIGTRAP with more information. job_process_trace_fork retrieves new pid info and saves it. Detaches from previous pid. Set state to NEW_CHILD.
4) The new child causes SIGSTOP. job_process_trace_new_child ultimately detaches from the new child. Set state to NONE.

What I'm seeing however, as seen in the previous log, is that the SIGSTOP and SIGTRAP at the point where the new fork happened is coming out of order. Basically, SIGSTOP is first (status 19) then SIGTRAP (status 261) second. Which then breaks the statemachine. The end result is that the final process is never allowed to continue is stays in the trace stopped state.

I think the state machine needs to be rewritten to assume those signals could happen in any order. The problem there is now you have to be careful of the variable trace_state, it needs to be protected.

Jeff Oliver (jeffrey-oliver) wrote :
Download full text (4.9 KiB)

Ok, i think I have something here. I feel like its a major hack, but as far as I can tell it works. Here's what I did:

I used xinetd as my sample job with the following job file:

# xinetd -
#

start on started network
stop on stopping network

respawn
session leader
expect fork

exec /usr/sbin/xinetd -stayalive -pidfile /var/run/xinetd.pid

***********

I made some changes to the code to handle what I was seeing when attempting to run this job:

1) Changed the tracing state machine.
Instead of using distinct states, i changes the trace_state variable to be a set of bits that indicate the progress through the state machine. The field is zero if the state machine is inactive, bit 0 on if active, bit 1 on if we've hit the first exec, bit 2 on if we've hit the fork in the parent process, and bit 3 on if we've hit the fork in the child process. When bit2 and bit 3 are both on, i figure the process has forked enough and we continue on setting the job state to the next state. That seems to work fine, with only 1 really visible problem....

2) Changes how to search for jobs in the pid list. The job hash table has a list of pids that upstart is currently keeping track of. When a signal is caught from a child, if the pid is not in the list, it ignores it. Unfortunately, in my situation, i see a signal for a pid we're not keeping track of yet, since the signal to transition to the new pid has not been processed yet. As a result, I added a function that searches for a pid's parent pid in the job list. If that is found, then we'll handle that signal as well, since it's related to the job we're working on. There is another catch to that...

3) A place to keep the pid of the newly forked process...for that i added another job->pid[] entry to keep track of the newly forked pid. This seems awefully kludgey, but at the moment, I can't think of a better way to deal with the problem. Somehow, the tracing state machine needs to anticipate the pid of the new process, and it obviously cannot do that...it can only anticipate that there will be a pid at some point.

In any case, it seems to work the way it ought to. I can start a forking process, let it fork, then kill it and see it get restarted automatically:

2008-10-15T21:23:44.061227+00:00 fs03 xinetd[19501]: Exiting...
2008-10-15T21:23:44.061624+00:00 fs03 init: waitid - pid=19501, signo=17, code=1, status=0
2008-10-15T21:23:44.061658+00:00 fs03 init: xinetd main process (19501) exited normally
2008-10-15T21:23:44.061691+00:00 fs03 init: xinetd main process ended, respawning
2008-10-15T21:23:44.061724+00:00 fs03 init: xinetd state changed from running to stopping
2008-10-15T21:23:44.061756+00:00 fs03 init: waitid - pid=0, signo=0, code=0, status=0
2008-10-15T21:23:44.061787+00:00 fs03 init: Handling stopping event
2008-10-15T21:23:44.061820+00:00 fs03 init: xinetd state changed from stopping to killed
2008-10-15T21:23:44.061853+00:00 fs03 init: xinetd state changed from killed to post-stop
2008-10-15T21:23:44.061885+00:00 fs03 init: xinetd state changed from post-stop to starting
2008-10-15T21:23:44.061916+00:00 fs03 init: Handling starting event
2008-10-15T21:23:44.061948+00:00 fs03 init: xi...

Read more...

Changed in upstart:
importance: Undecided → Medium

Thanks for all your work Jeff, looks like you've managed to get all of this nailed.

With your branch, do you find that it works for non-session leaders as well?

Changed in upstart:
status: New → Triaged
summary: - expect fork/daemon do not work as expected
+ init: expect fork/daemon do not work as expected
Jeff Oliver (jeffrey-oliver) wrote :

I don't know. Once i had figured out that the process had to be a session leader in order to get the fork tracking to work, I ended up changing my job files to ensure that it did what it was supposed to. My environment is very controlled, so I don't have to deal with anyone messing with writing job files that don't ultimately work.

I figure I could do some more testing. Also, is there some unit test code that deals with this section of the code? Maybe updating the unit test code would be in order. I haven't looked at this stuff in many months...i'll have to get my head wrapped around it again.

Jeff Oliver (jeffrey-oliver) wrote :

Do you want me to post a patch? That way people can comment on the patch instead of having to dig in my tree.

On Thu, 2009-06-18 at 20:20 +0000, Jeff Oliver wrote:

> I don't know. Once i had figured out that the process had to be a
> session leader in order to get the fork tracking to work, I ended up
> changing my job files to ensure that it did what it was supposed to. My
> environment is very controlled, so I don't have to deal with anyone
> messing with writing job files that don't ultimately work.
>
I suspect I simply never tested it - one of the problems with the test
suite is that, because it's not pid 1, it actually has to run all the
tests as "session leader"!

In theory, pid 1 can ptrace() any process it feels like, but it might be
failing to follow the double-fork that init does itself to fork
non-session-leaders.

The rationale for why init double-forks its children where sysvinit
doesn't is to make processes more comfortable about staying in the
foreground. Without the double-fork, the process would be a session
leader, so if it opened any tty it would control it.

This is a side-effect of bad daemon support.

Now, both problems can be killed with a nice stone - dropping the
ptrace() madness and instead switching to use the netlink proc connector
as planned.

This means Upstart can freely supervise any process on the system
without tracing, and it also means the test suite can do so as well
(albeit with a setuid() helper right now).

Plus this code branch would mean that there would be no hassle with any
process becoming a daemon, Upstart would follow anyway. This means that
we wouldn't need to enforce "stay in the foreground", and it'd make
sense again for the default to be services being session leaders.

> I figure I could do some more testing. Also, is there some unit test
> code that deals with this section of the code? Maybe updating the unit
> test code would be in order. I haven't looked at this stuff in many
> months...i'll have to get my head wrapped around it again.
>
There's a lot of unit tests for this section of code, I try and keep
Upstart as near to 100% coverage as possible. See
init/tests/test_job_process.c test_handler()

Scott
--
Scott James Remnant
<email address hidden>

On Thu, 2009-06-18 at 20:21 +0000, Jeff Oliver wrote:

> Do you want me to post a patch? That way people can comment on the
> patch instead of having to dig in my tree.
>
If you've got a patch handy.

Don't spend too much time though, unless you think otherwise, I'm
increasingly convinced that the ptrace() approach was the wrong road to
follow.

Scott
--
Scott James Remnant
<email address hidden>

Jeff Oliver (jeffrey-oliver) wrote :

I attached my patch for this "fix".

I'm not sure that changing from ptrace() to the netlink connector changes the complexity of process management. You still need to watch for the same things and make sure the pid table is kept up to date as a process is followed along. In other words, you still need to look for a fork (or two, or three), look for an exec, etc. Hopefully the netlink information is more predictable than the ptrace() information.

You're definitely right that the SIGTRAP and SIGSTOP across the fork() are coming in out of order, I see the child event before the TRAP event as well.

I think the fix is even easier than using state flags. At the point we get trap the fork, one of the two following is true:

 - the child process does not yet fully exist and trapping its initial SIGSTOP has yet to happen

 - the child process already exists, and we've missed its initial SIGSTOP

In the former case, we don't need to do anything; in the latter case we need to call job_process_trace_new_child() for the process.

But that's easy; we just try and call ptrace() on the new pid, if it returns with an error we can assume that the process doesn't exist yet, and return. If ptrace() succeeds, we just fall out of the bottom of job_process_trace_fork() into job_process_trace_new_child() and handle the already missed case.

The patch is just to add the following to the bottom of job_process_trace_fork ()

    if (ptrace (PTRACE_SET_OPTIONS, job->pid[process], NULL, 0) < 0))
            return;

    job_process_trace_new_child (job, process);

Changed in upstart:
status: Triaged → Fix Committed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package upstart - 0.6.0-5

---------------
upstart (0.6.0-5) karmic; urgency=low

  * Cherry-pick patch from -r1188 to fix "expect fork" and "expect daemon"
    LP: #264711.

 -- Scott James Remnant <email address hidden> Tue, 14 Jul 2009 15:19:17 +0100

Changed in upstart (Ubuntu):
status: New → Fix Released
Jeff Oliver (jeffrey-oliver) wrote :

Is the plan still to rework this code to use the netlink connector instead of using ptrace()?

On Wed, 2009-07-15 at 00:00 +0000, Jeff Oliver wrote:

> Is the plan still to rework this code to use the netlink connector
> instead of using ptrace()?
>
Absolutely!

Scott
--
Scott James Remnant
<email address hidden>

Changed in upstart:
assignee: nobody → Scott James Remnant (scott)
milestone: none → 0.6.1

Fixed in Upstart 0.6.1

0.6.1 2009-07-16 "Born in the wagon of a travelling show"

        * D-Bus 1.2.16 is now required.

        * Updated ptrace() code to handle receiving the trapped stopped
          child _before_ we receive the trapped fork() syscall, which
          seems to be the most common way the race goes when running
          as init. This seems to make "expect fork/daemon" largely work.
          (Bug: #264711)

        * runlevel will now output "unknown" and exit with an error when
          called before the first telinit call. (Bug: #400248)

        * runlevel will prefix error messages with the utmp filename being
          used. (Bug: #400241)

Changed in upstart:
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers