init: restart command fails to restart main process when pre-stop stanza exists
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
| upstart |
Medium
|
Unassigned | ||
| upstart (Ubuntu) |
Medium
|
James Hunt |
Bug Description
While testing the portmap daemon's recent changes in Ubuntu which cause statd to properly follow it on stop/start, I tried to restart portmap, only to find that it was not in fact restarted.
This happens with any job that has a pre-stop. The reason is that in job_restart(), the code does this:
job_change_goal (job, JOB_STOP);
job_change_goal (job, JOB_START);
job_change_goal will return as soon as the *first* state change has been completed. If there is no pre-stop, that is the change from JOB_RUNNING to JOB_KILLED, which does dutifully kill the main process.
However, if there is a pre-stop, the pre-stop is run, but then job_change_state returns to job_change_goal, which then returns to job_restart, which then changes the goal back to start, which makes the new job_next_state() one that will bypass the change to JOB_KILLED.
This was found on upstart v0.6.7-3 in Ubuntu, but also exists in the code in the current trunk.
TEST CASE
1. create job file /etc/init/
# test-restart-
pre-stop exec /bin/true
exec /bin/sleep 3600
2. run "start test-restart-
3. immediately run "restart test-restart-
Clint Byrum (clint-fewbar) wrote : | #1 |
tags: | added: glucid lucid |
I can confirm I see this behavior in upstart (0.6.5-8) on lucid. On several of our custom tasks, if I change pre-stop to post-stop, then restart behaves normally. I will see if I can craft an upstart job to mimic this behavior using stock tools.
Scott James Remnant (scott) wrote : Re: [Bug 703800] Re: restart command fails to restart main process when pre-stop stanza exists | #3 |
Yeah there are quite a few bugs related to pre-stop scripts, they don't
really work like any other, so there are definite gaps in the coverage
On Thu, Feb 24, 2011 at 8:00 AM, scm <email address hidden> wrote:
> I can confirm I see this behavior in upstart (0.6.5-8) on lucid. On
> several of our custom tasks, if I change pre-stop to post-stop, then
> restart behaves normally. I will see if I can craft an upstart job to
> mimic this behavior using stock tools.
>
> --
> You received this bug notification because you are a member of Upstart
> Developers, which is subscribed to upstart .
> https:/
>
> Title:
> restart command fails to restart main process when pre-stop stanza
> exists
>
> Status in Upstart:
> New
> Status in “upstart” package in Ubuntu:
> New
>
> Bug description:
> While testing the portmap daemon's recent changes in Ubuntu which
> cause statd to properly follow it on stop/start, I tried to restart
> portmap, only to find that it was not in fact restarted.
>
> This happens with any job that has a pre-stop. The reason is that in
> job_restart(), the code does this:
>
>
> job_change_goal (job, JOB_STOP);
> job_change_goal (job, JOB_START);
>
> job_change_goal will return as soon as the *first* state change has
> been completed. If there is no pre-stop, that is the change from
> JOB_RUNNING to JOB_KILLED, which does dutifully kill the main process.
>
> However, if there is a pre-stop, the pre-stop is run, but then
> job_change_state returns to job_change_goal, which then returns to
> job_restart, which then changes the goal back to start, which makes
> the new job_next_state() one that will bypass the change to
> JOB_KILLED.
>
> This was found on upstart v0.6.7-3 in Ubuntu, but also exists in the
> code in the current trunk.
>
> TEST CASE
>
> 1. create job file /etc/init/
> # test-restart-
>
> pre-stop exec /bin/true
>
> exec /bin/sleep 3600
> 2. run "start test-restart-
> 3. immediately run "restart test-restart-
> pid remains the same when it should be a new pid.
>
Clint Byrum (clint-fewbar) wrote : Re: restart command fails to restart main process when pre-stop stanza exists | #4 |
Having found another instance of this problem, the powernap package in Ubuntu, I am marking this as Confirmed in Upstart, and Triaged in Ubuntu.
Changed in upstart: | |
status: | New → Confirmed |
Changed in upstart (Ubuntu): | |
status: | New → Confirmed |
importance: | Undecided → Medium |
status: | Confirmed → Triaged |
summary: |
- restart command fails to restart main process when pre-stop stanza + init: restart command fails to restart main process when pre-stop stanza exists |
Changed in upstart: | |
status: | Confirmed → Triaged |
importance: | Undecided → Medium |
Clint Byrum (clint-fewbar) wrote : | #5 |
FYI, xinetd in Ubuntu is also affected by this.
tags: | added: testcase |
Changed in upstart (Ubuntu): | |
status: | Triaged → Incomplete |
status: | Incomplete → Confirmed |
David Weber (wb-munzinger) wrote : | #6 |
Any timeline to fix this? It's still in Precise...
Changed in upstart (Ubuntu): | |
status: | Confirmed → Triaged |
Changed in upstart (Ubuntu): | |
assignee: | nobody → James Hunt (jamesodhunt) |
Patrick Hemmer (78luphr0rnk2nuqimstywepozxn9kl19tqh0tx66b5dki1xxsh5mkz9gl-launchpad-a811i2i3ytqlsztthjth0svbccw8inm65tmkqp9sarr553jq53in4xm1m) wrote : | #7 |
It is now 2014. Any chance on getting this fixed?
Mike Power (mpower) wrote : | #8 |
Is there a workaround to get pre-stop and restart to work together? I want to use pre-stop to leave a pid file behind should the process die. I can't do that with a post-stop because there is no way to tell if the post-stop is being called because a stop was issued or if the respawn limit was reached.
Ákos (axos88) wrote : | #9 |
Guys, it has been 5(!!!) years, since this bug has been reported. Any chance for a fix?
Please raise severity to major, to reflect that this is a MAJOR issue with a CORE component in ubuntu, that affects a lot of services that would use pre-stop
Workaround-ish: service xxx restart works, because it invokes stop and then start.
Patrick Hemmer (78luphr0rnk2nuqimstywepozxn9kl19tqh0tx66b5dki1xxsh5mkz9gl-launchpad-a811i2i3ytqlsztthjth0svbccw8inm65tmkqp9sarr553jq53in4xm1m) wrote : | #10 |
Given the age of this bug and the fact that the switch is being made to systemd, I doubt this will ever be fixed.
Paul Draper (paulddraper) wrote : | #11 |
How is systemd relevant?
Precise is still supported. Trusty will be supported until 2019.
Neither of those have systemd.
tags: |
added: trusty removed: glucid lucid |
On further investigation, this does *not* affect jobs that block the 'stopping' event. So I believe this may be related more to the way job_process_run treates pre-stop. According to comments in init/job_process.c:
/* We don't change the state if we're in post-start and there's >process[ PROCESS_ POST_START] PROCESS_ POST_START] > 0)) { >process[ PROCESS_ PRE_STOP] PROCESS_ PRE_STOP] > 0)) {
* a post-start process running, or if we're in pre-stop and
* there's a pre-stop process running; we wait for those to
* finish instead.
*/
if ((job->state == JOB_POST_START)
&& job->class-
&& (job->pid[
state = FALSE;
} else if ((job->state == JOB_PRE_STOP)
&& job->class-
&& (job->pid[
state = FALSE;
}
Later on, if (!state) calls job_change_ goal(job, JOB_RESPAWN) ..
Digging into job_next_state a bit, I found that a goal of JOB_RESPAWN causes job_next_state to change the goal only if state is JOB_POST_START or JOB_PRE_STOP. JOB_POST_START changes the goal to JOB_START, which makes sense. But JOB_PRE_STOP changes the goal to JOB_START as well! This doesn't make much sense to me, and I thought the obvious fix would be to change it to be JOB_STOP, however this caused test_job to fail here:
...with post-start job and a goal of respawn job.c:4129 (test_next_state).
BAD: wrong value for job->goal, expected 1 got 0
at tests/test_
/bin/bash: line 5: 29784 Aborted ${dir}$tst
FAIL: test_job
This was confusing until I realized that the test code is wrong:
/* Check that the next state if we're respawning after a pre-stop
* job is stopping with the goal changed.
*/
TEST_FEATURE ("with post-start job and a goal of respawn");
job->goal = JOB_RESPAWN;
job->state = JOB_PRE_STOP;
TEST_EQ (job_next_state (job), JOB_STOPPING);
TEST_EQ (job->goal, JOB_START);
That should read 'with pre-stop job and a goal of respawn'.
So, this brings to the edge case that I think is causing this problem, and my perceived fix.
If you want a running job to be restarted, it should move from wherever it was, through stopping- >pre-stop- >killed- >post-stop- >starting- > ...
However, this doesn't happen. Instead the goal is changed to STOP, but job_change_goal() does not wait for that to happen, which its docblock would suggest.
Instead, it returns with the state set to JOB_PRE_STOP ..
This is my proposed fix.. though I'm not entirely confident this is the right way to handle it, and I haven't had a chance to test this yet.
=== modified file 'init/job.c'
--- init/job.c 2010-12-14 15:32:41 +0000
+++ init/job.c 2011-01-17 07:56:24 +0000
@@ -1265,6 +1265,12 @@
nih_list_add (&job->blocking, &blocked->entry);
job_change_goal (job, JOB_STOP);
+ if (job->state == JOB_PRE_STOP)
+ {
+ /* hack: should not happen but does */
+ job->goal = JOB_RESPAWN;
+ job_change_goal (job, JOB_STOP);
+ }
job_change_goal (job, JOB_START);
if (! wait)