service hangs forever in state start/running with no running process
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
upstart | New | Undecided | Unassigned |
Bug Description
Reproduced with upstart 1.5 and upstart 1.10 (Linux Mint 16).
If `initctl start` is called on a job (in my case a service) whose goal is "stop" and whose only running process is "post-start", then once the "post-start" process terminates the job stays in state "start/running" forever, with no process running.
This may be the same issue as the one triaged in bug 712351.
Steps to reproduce:
> cat /etc/init/test.conf
script
sleep 1
exit 1
end script
post-start script
sleep 30
end script
> initctl start test
initctl: Job failed to start
> initctl status test
test stop/post-start, (post-start) process 24658 # goal is stop, only post-start is running
> start test # must be run before post-start exits; `initctl start` blocks here until post-start exits
test start/running
> status test
test start/running
...
> initctl stop test
test stop/waiting
# logs with `initctl log-priority info`
# job is started with `initctl start test`
[20521.715706] init: test goal changed from stop to start
[20521.715763] init: test state changed from waiting to starting
[20521.715932] init: test state changed from starting to security
[20521.717156] init: test state changed from security to pre-start
[20521.717400] init: test state changed from pre-start to spawned
[20521.718937] init: test main process (24657)
[20521.718994] init: test state changed from spawned to post-start
[20521.721216] init: test post-start process (24658)
# `exit 1` in main
[20522.723356] init: test main process (24657) terminated with status 1
[20522.723885] init: test goal changed from start to stop
# `initctl start test` is called again
[20536.879340] init: test goal changed from stop to start
# post-start terminates
[20551.722367] init: test post-start process (24658) exited normally
# all processes have terminated; goal is start, state is running
[20551.722584] init: test state changed from post-start to running
[20551.723122] init: startpar-bridge (test--started) goal changed from stop to start
[20551.723314] init: startpar-bridge (test--started) state changed from waiting to starting
[20551.723827] init: startpar-bridge (test--started) state changed from starting to security
[20551.723961] init: startpar-bridge (test--started) state changed from security to pre-start
[20551.724024] init: startpar-bridge (test--started) state changed from pre-start to spawned
[20551.730947] init: startpar-bridge (test--started) main process (24663)
[20551.731033] init: startpar-bridge (test--started) state changed from spawned to post-start
[20551.731579] init: startpar-bridge (test--started) state changed from post-start to running
[20551.732965] init: startpar-bridge (test--started) main process (24663) exited normally
[20551.733048] init: startpar-bridge (test--started) goal changed from start to stop
[20551.733255] init: startpar-bridge (test--started) state changed from running to stopping
[20551.733539] init: startpar-bridge (test--started) state changed from stopping to killed
[20551.733724] init: startpar-bridge (test--started) state changed from killed to post-stop
[20551.733910] init: startpar-bridge (test--started) state changed from post-stop to waiting
# the job stays in this state until it is stopped manually
[21435.203642] init: job_class_register: Registered job /com/ubuntu/
[21435.203661] init: job_register: Registered instance /com/ubuntu/
[21435.206662] init: test goal changed from start to stop
Thanks.
I checked the code of upstart 1.10; upstart 1.11 has the same code in the init part.
I guess the problem is in the job's finite state machine.
Here is my investigation, based on the upstart 1.10 code and the logs from the bug description.
# goal is start (the status was stop/post-start before the second `initctl start`)
[20551.722367] init: test post-start process (24658) exited normally
job_process.c/job_process_terminated
1688| case PROCESS_POST_START:
1689| nih_assert (job->state == JOB_POST_START);
...
1810| if (state)
1811| job_change_state (job, job_next_state (job));
job.c/job_next_state
651| case JOB_POST_START:
652| switch (job->goal) {
653| case JOB_STOP:
654| return JOB_STOPPING;
655| case JOB_START:
656| return JOB_RUNNING; <---
657| case JOB_RESPAWN:
658| job_change_goal (job, JOB_START);
659| return JOB_STOPPING;
660| default:
661| nih_assert_not_reached ();
662| }
job.c/job_change_state (job in state JOB_POST_START, new state JOB_RUNNING)
--> but the main process has not been restarted
351| nih_info (_("%s state changed from %s to %s"), job_name (job),
352| job_state_name (job->state), job_state_name (state));
--> test state changed from post-start to running
...
354| old_state = job->state; --> JOB_POST_START
355| job->state = state; --> JOB_RUNNING
...
460| case JOB_RUNNING:
461| nih_assert (job->goal == JOB_START);
462| nih_assert ((old_state == JOB_POST_START)
463| || (old_state == JOB_PRE_STOP));
464|
465| if (old_state == JOB_PRE_STOP) { <-- no, POST_START
466| /* Throw away the stop environment */
467| if (job->stop_env) {
468| nih_unref (job->stop_env, job);
469| job->stop_env = NULL;
470| }
471|
472| /* Cancel the stop attempt */
473| job_finished (job, FALSE);
474| } else {
475| job_emit_event (job); --> JOB_STARTED_EVENT
476|
477| /* If we're not a task, our goal is to be
478| * running.
479| */
480| if (! job->class->task)
481| job_finished (job, FALSE);
482| }
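
To make the walk-through above easier to follow, here is a minimal sketch that replays the same sequence on a toy model. It is not upstart code: `ModelJob`, `model_next_state`, `model_replay` and `model_is_stuck` are hypothetical names, and only the JOB_POST_START branch of `job_next_state` for the start and stop goals is mirrored. The assumption being illustrated is that the transition out of post-start looks only at the goal and never checks whether a main process exists.

```c
#include <assert.h>

/* Simplified, hypothetical model; these are not the real upstart types. */
typedef enum { GOAL_STOP, GOAL_START } Goal;
typedef enum { STATE_POST_START, STATE_RUNNING, STATE_STOPPING } State;

typedef struct {
    Goal  goal;
    State state;
    int   main_pid;        /* 0 means there is no main process */
    int   post_start_pid;  /* 0 means there is no post-start process */
} ModelJob;

/* Mirrors only the JOB_POST_START branch quoted from job_next_state:
 * the result depends on the goal alone, not on main_pid. */
State model_next_state(const ModelJob *job)
{
    if (job->state == STATE_POST_START)
        return job->goal == GOAL_START ? STATE_RUNNING : STATE_STOPPING;
    return job->state;
}

/* Replays the sequence from the logs and returns the final job. */
ModelJob model_replay(void)
{
    /* After the first `initctl start test`: main and post-start are up. */
    ModelJob job = { GOAL_START, STATE_POST_START, 24657, 24658 };

    /* main exits with status 1 while post-start is still running. */
    job.main_pid = 0;
    job.goal = GOAL_STOP;

    /* The second `initctl start test` only flips the goal. */
    job.goal = GOAL_START;

    /* post-start exits; like the nih_assert in the PROCESS_POST_START
     * case, the job must still be in post-start at this point. */
    job.post_start_pid = 0;
    assert(job.state == STATE_POST_START);
    job.state = model_next_state(&job);
    return job;
}

/* A job is "stuck" when it claims to be running without a main process. */
int model_is_stuck(const ModelJob *job)
{
    return job->state == STATE_RUNNING && job->main_pid == 0;
}
```

In this model, `model_replay()` ends with `STATE_RUNNING` and `main_pid == 0`, so `model_is_stuck()` returns 1. That corresponds to the `start/running` status with no process seen in the report.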