service becomes hanging in state start/running without any section forever

Bug #1268029 reported by Alex Petrov
Affects   Status   Importance   Assigned to   Milestone
upstart   New      Undecided    Unassigned

Bug Description

Reproduced with upstart 1.5 and upstart 1.10 (Linux Mint 16).
If `initctl start` is called on a job (in my case a service) whose goal is "stop" and whose only running section is "post-start", then once the "post-start" section terminates the job stays in state "start/running" forever with no section running.
This may be the same issue as the one triaged in bug 712351.

Steps to reproduce:
> cat /etc/init/test.conf
script
    sleep 1
    exit 1
end script

post-start script
    sleep 30
end script

> initctl start test
initctl: Job failed to start
> initctl status test
test stop/post-start, (post-start) process 24658 # goal is stop, only post-start is running
> start test # must be run before post-start exits; `initctl start` blocks here while post-start is running
test start/running
> status test
test start/running
...
> initctl stop test
test stop/waiting
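
To spot the symptom from a script, the first line of `initctl status` can be parsed and checked for "start/running" with no process attached. This is only a rough sketch: `parse_status` and `is_stuck` are names I made up, the pattern is derived solely from the status lines shown in this report, and it only handles simple job names without an instance part. A job that defines no main section can legitimately be start/running without a process, so the check only makes sense for jobs like `test` that do have one.

```python
import re

# Matches the first line of `initctl status` output as it appears in this
# report, e.g. "test stop/post-start, (post-start) process 24658".
# Assumption: simple job names only (no "name (instance)" form).
STATUS_RE = re.compile(
    r"^(?P<name>\S+) (?P<goal>\w+)/(?P<state>[\w-]+)"
    r"(?:, (?:\((?P<section>[\w-]+)\) )?process (?P<pid>\d+))?$"
)


def parse_status(line):
    """Split one status line into name, goal, state, section and pid."""
    m = STATUS_RE.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognised status line: {line!r}")
    pid = int(m["pid"]) if m["pid"] else None
    # A pid without a "(section)" prefix belongs to the main process.
    section = m["section"] or ("main" if pid is not None else None)
    return {
        "name": m["name"],
        "goal": m["goal"],
        "state": m["state"],
        "section": section,
        "pid": pid,
    }


def is_stuck(status):
    """True for start/running with no process listed (the reported symptom)."""
    return (
        status["goal"] == "start"
        and status["state"] == "running"
        and status["pid"] is None
    )
```

With the transcript above, `is_stuck(parse_status("test start/running"))` is true, while the earlier `test stop/post-start, (post-start) process 24658` line is not flagged.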

# logs with `initctl log-priority info`
# job is started with `initctl start test`
[20521.715706] init: test goal changed from stop to start
[20521.715763] init: test state changed from waiting to starting
[20521.715932] init: test state changed from starting to security
[20521.717156] init: test state changed from security to pre-start
[20521.717400] init: test state changed from pre-start to spawned
[20521.718937] init: test main process (24657)
[20521.718994] init: test state changed from spawned to post-start
[20521.721216] init: test post-start process (24658)
# `exit 1` in main
[20522.723356] init: test main process (24657) terminated with status 1
[20522.723885] init: test goal changed from start to stop
# `initctl start test` is called again
[20536.879340] init: test goal changed from stop to start
# post-start terminates
[20551.722367] init: test post-start process (24658) exited normally
# all sections are terminated, goal is start, state is running
[20551.722584] init: test state changed from post-start to running
[20551.723122] init: startpar-bridge (test--started) goal changed from stop to start
[20551.723314] init: startpar-bridge (test--started) state changed from waiting to starting
[20551.723827] init: startpar-bridge (test--started) state changed from starting to security
[20551.723961] init: startpar-bridge (test--started) state changed from security to pre-start
[20551.724024] init: startpar-bridge (test--started) state changed from pre-start to spawned
[20551.730947] init: startpar-bridge (test--started) main process (24663)
[20551.731033] init: startpar-bridge (test--started) state changed from spawned to post-start
[20551.731579] init: startpar-bridge (test--started) state changed from post-start to running
[20551.732965] init: startpar-bridge (test--started) main process (24663) exited normally
[20551.733048] init: startpar-bridge (test--started) goal changed from start to stop
[20551.733255] init: startpar-bridge (test--started) state changed from running to stopping
[20551.733539] init: startpar-bridge (test--started) state changed from stopping to killed
[20551.733724] init: startpar-bridge (test--started) state changed from killed to post-stop
[20551.733910] init: startpar-bridge (test--started) state changed from post-stop to waiting
# it stays in this state until it is stopped manually
[21435.203642] init: job_class_register: Registered job /com/ubuntu/Upstart/jobs/test
[21435.203661] init: job_register: Registered instance /com/ubuntu/Upstart/jobs/test/_
[21435.206662] init: test goal changed from start to stop
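
The same conclusion can be reached mechanically by replaying the log. The sketch below tracks goal, state and live processes for one job and records every transition to "running" that happens while no main process is alive. The names (`replay`, `LINE_RE` and so on) are illustrative only, and the patterns are taken from the log lines quoted in this report, not from upstart's full message set.

```python
import re

# Patterns for the `init:` log lines quoted in this report (assumed format).
LINE_RE = re.compile(r"^\[\s*(?P<ts>[\d.]+)\] init: (?P<job>\S+) (?P<msg>.*)$")
GOAL_RE = re.compile(r"^goal changed from (\S+) to (\S+)$")
STATE_RE = re.compile(r"^state changed from (\S+) to (\S+)$")
SPAWN_RE = re.compile(r"^(\S+) process \((\d+)\)$")
EXIT_RE = re.compile(r"^(\S+) process \((\d+)\) (?:terminated|exited)\b")


def replay(lines, job):
    """Replay log lines for `job`; return (goal, state, live, findings)."""
    goal, state = "stop", "waiting"
    live = {}       # section name -> pid of the process still running
    findings = []   # (timestamp, goal, state) for each suspicious transition
    for line in lines:
        m = LINE_RE.match(line)
        if m is None or m["job"] != job:
            continue  # comments, other jobs, unrelated messages
        msg = m["msg"]
        if (g := GOAL_RE.match(msg)):
            goal = g[2]
        elif (e := EXIT_RE.match(msg)):
            live.pop(e[1], None)
        elif (p := SPAWN_RE.match(msg)):
            live[p[1]] = int(p[2])
        elif (s := STATE_RE.match(msg)):
            state = s[2]
            if state == "running" and "main" not in live:
                findings.append((float(m["ts"]), goal, state))
    return goal, state, live, findings


SAMPLE = """\
[20521.715706] init: test goal changed from stop to start
[20521.717400] init: test state changed from pre-start to spawned
[20521.718937] init: test main process (24657)
[20521.718994] init: test state changed from spawned to post-start
[20521.721216] init: test post-start process (24658)
[20522.723356] init: test main process (24657) terminated with status 1
[20522.723885] init: test goal changed from start to stop
[20536.879340] init: test goal changed from stop to start
[20551.722367] init: test post-start process (24658) exited normally
[20551.722584] init: test state changed from post-start to running
[20551.723122] init: startpar-bridge (test--started) goal changed from stop to start
""".splitlines()

goal, state, live, findings = replay(SAMPLE, "test")
```

On this excerpt the replay ends with goal "start", state "running", no live processes, and one finding at the post-start to running transition.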

Thanks.

Alex Petrov (alexxxbt)
summary: - service is in state start/running without any section
+ service is hanging in state start/running without any section forever
summary: - service is hanging in state start/running without any section forever
+ service becomes hanging in state start/running without any section
+ forever
Alex Petrov (alexxxbt) wrote :

I checked the code of upstart 1.10; upstart 1.11 has the same code in the init part.
I suspect a problem in the job's finite state machine.
Here is my investigation, based on the upstart 1.10 code and the logs from the bug description.

# goal: start (status was stop/post-start before the second `initctl start`)
[20551.722367] init: test post-start process (24658) exited normally

job_process.c/job_process_terminated
 1688| case PROCESS_POST_START:
 1689| nih_assert (job->state == JOB_POST_START);
...
 1810| if (state)
 1811| job_change_state (job, job_next_state (job));

    job.c/job_next_state
      651| case JOB_POST_START:
      652| switch (job->goal) {
      653| case JOB_STOP:
      654| return JOB_STOPPING;
      655| case JOB_START:
      656| return JOB_RUNNING; <---
      657| case JOB_RESPAWN:
      658| job_change_goal (job, JOB_START);
      659| return JOB_STOPPING;
      660| default:
      661| nih_assert_not_reached ();
      662| }

job.c/job_change_state (job * JOB_POST_START, JOB_RUNNING)
  351| nih_info (_("%s state changed from %s to %s"), job_name (job),
  352| job_state_name (job->state), job_state_name (state));
    --> test state changed from post-start to running
...
  354| old_state = job->state; --> JOB_POST_START
  355| job->state = state; --> JOB_RUNNING
...
  460| case JOB_RUNNING:
  461| nih_assert (job->goal == JOB_START);
  462| nih_assert ((old_state == JOB_POST_START)
  463| || (old_state == JOB_PRE_STOP));
  464|
  465| if (old_state == JOB_PRE_STOP) { <-- no, POST_START
  466| /* Throw away the stop environment */
  467| if (job->stop_env) {
  468| nih_unref (job->stop_env, job);
  469| job->stop_env = NULL;
  470| }
  471|
  472| /* Cancel the stop attempt */
  473| job_finished (job, FALSE);
  474| } else {
  475| job_emit_event (job); --> JOB_STARTED_EVENT
                                             --> well, but we haven't started main
  476|
  477| /* If we're not a task, our goal is to be
  478| * running.
  479| */
  480| if (! job->class->task)
  481| job_finished (job, FALSE);
  482| }
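
To make the trace easier to follow, here is a small model of the path above. It is not upstart code: `Goal`, `State`, `Job` and `next_state_after_post_start` are names invented for this sketch, and only the JOB_POST_START branch of job_next_state quoted above is modeled. The point it illustrates is that the next state depends on the goal alone, so a goal flipped back to start while post-start is still running leads to "running" even though the main process is already gone.

```python
from enum import Enum


class Goal(Enum):
    STOP = "stop"
    START = "start"
    RESPAWN = "respawn"


class State(Enum):
    POST_START = "post-start"
    RUNNING = "running"
    STOPPING = "stopping"


def next_state_after_post_start(goal):
    """Model of the JOB_POST_START branch; returns (next_state, new_goal)."""
    if goal is Goal.STOP:
        return State.STOPPING, goal
    if goal is Goal.START:
        return State.RUNNING, goal
    if goal is Goal.RESPAWN:
        # The quoted branch resets the goal to start before stopping.
        return State.STOPPING, Goal.START
    raise AssertionError("unreachable")


class Job:
    """Minimal stand-in for a job instance (hypothetical fields)."""

    def __init__(self):
        self.goal = Goal.START
        self.state = State.POST_START
        self.main_pid = 24657
        self.post_start_pid = 24658


# Replay of the sequence from the bug description.
job = Job()
job.main_pid = None          # main runs `exit 1` ...
job.goal = Goal.STOP         # ... and the goal changes from start to stop
job.goal = Goal.START        # second `initctl start test`
job.post_start_pid = None    # post-start exits normally
job.state, job.goal = next_state_after_post_start(job.goal)
```

After the last line the model is in start/running with `main_pid` unset, which matches what `initctl status test` reports.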
