init: main process can respawn even when the job is stopping

Reported by Tvrtko Ursulin on 2010-04-22
This bug affects 8 people
Affects           Status         Importance   Assigned to           Milestone
upstart           Fix Released   Medium       Dimitri John Ledkov
upstart (Ubuntu)  Fix Released   Undecided    Unassigned

Bug Description

Binary package hint: upstart

Version: 10.4 Beta, latest as of 21.04.2010.

Sometimes Upstart seems to get confused when the "respawn" stanza is used for a service: it respawns the process even when the process exits normally in response to an explicit stop event. The log then looks like this:

Apr 22 09:39:08 ubuntu1004alpha3crescendo init: Connection from private client
Apr 22 09:39:08 ubuntu1004alpha3crescendo init: sav-protect goal changed from start to stop
Apr 22 09:39:08 ubuntu1004alpha3crescendo init: sav-protect state changed from running to pre-stop
Apr 22 09:39:08 ubuntu1004alpha3crescendo init: sav-protect pre-stop process (3415)
Apr 22 09:39:09 ubuntu1004alpha3crescendo savd: savd.daemon: Sophos Anti-Virus daemon stopped.
Apr 22 09:39:09 ubuntu1004alpha3crescendo init: sav-protect main process (3379) exited normally
Apr 22 09:39:09 ubuntu1004alpha3crescendo init: sav-protect main process ended, respawning
Apr 22 09:39:09 ubuntu1004alpha3crescendo init: sav-protect goal changed from stop to respawn
Apr 22 09:39:10 ubuntu1004alpha3crescendo init: sav-protect pre-stop process (3415) exited normally
Apr 22 09:39:10 ubuntu1004alpha3crescendo init: sav-protect goal changed from respawn to start
Apr 22 09:39:10 ubuntu1004alpha3crescendo init: sav-protect state changed from pre-stop to stopping
Apr 22 09:39:10 ubuntu1004alpha3crescendo init: Handling stopping event
Apr 22 09:39:10 ubuntu1004alpha3crescendo init: sav-protect state changed from stopping to killed
Apr 22 09:39:10 ubuntu1004alpha3crescendo init: sav-protect state changed from killed to post-stop
Apr 22 09:39:10 ubuntu1004alpha3crescendo init: sav-protect state changed from post-stop to starting
Apr 22 09:39:10 ubuntu1004alpha3crescendo init: Handling starting event
Apr 22 09:39:10 ubuntu1004alpha3crescendo init: sav-protect state changed from starting to pre-start
Apr 22 09:39:10 ubuntu1004alpha3crescendo init: sav-protect pre-start process (3431)
Apr 22 09:39:11 ubuntu1004alpha3crescendo init: sav-protect pre-start process (3431) exited normally
Apr 22 09:39:11 ubuntu1004alpha3crescendo init: sav-protect state changed from pre-start to spawned
Apr 22 09:39:11 ubuntu1004alpha3crescendo init: sav-protect main process (3453)
Apr 22 09:39:11 ubuntu1004alpha3crescendo init: sav-protect state changed from spawned to post-start
Apr 22 09:39:11 ubuntu1004alpha3crescendo init: sav-protect post-start process (3454)
Apr 22 09:39:11 ubuntu1004alpha3crescendo init: sav-protect post-start process (3454) exited normally
Apr 22 09:39:11 ubuntu1004alpha3crescendo init: sav-protect state changed from post-start to running
Apr 22 09:39:11 ubuntu1004alpha3crescendo init: Handling started event
Apr 22 09:39:15 ubuntu1004alpha3crescendo savd: savd.daemon: Sophos Anti-Virus daemon started.

According to my observations, this happens most of the time but not always. Just this morning I got it to stop the service cleanly; then I started it again, killed the service abnormally, and after it respawned, stop only respawns it from then on. I thought it might be related to the earlier abnormal kill, but even after a fresh boot, stopping the service now only respawns it.

This looks to be the case if you have a pre-stop script: the state will not be "stopping", so the respawn code is allowed to run (it never checks the goal).

Obviously we should never respawn if the goal is "stop".

Changed in upstart (Ubuntu):
status: New → Invalid
summary: - Upstart respawn handling seems sometimes broken
+ init: main process can respawn even when the job is stopping
Changed in upstart:
status: New → Triaged
importance: Undecided → Medium
Tvrtko Ursulin (tvrtko-ursulin) wrote :

Or should it even respawn at all if the main process has exited normally? I guess it depends on whether you want a service to ever be able to exit without respawning if the goal is not stop.

On Fri, 2010-04-23 at 08:13 +0000, Tvrtko Ursulin wrote:

> Or should it even respawn at all if main process has exited normally? I
> guess it depends on whether you want service to ever be able to exit
> without respawning if goal is not stop.
>
No, but "normal exit" already covers that. The problem here is that it
checks the state of the job rather than the goal to determine whether to
inhibit respawn.

Scott

Pascal Hartig (passy) wrote :

Apparently, this is the desired behavior. There is a test for this exact case:

 /* Check that we can handle the running process of a respawn job
  * exiting before the pre-stop process finishes. This should
  * mark the job to be respawned when the pre-stop script finishes
  * instead of making any state change.
  */
 TEST_FEATURE ("with respawn of running while pre-stop process");

I don't understand how this is beneficial, though. Isn't pre-stop the right way to stop a process that does not react correctly to signals?

alexius ludeman (lexinator) wrote :

This bug is affecting our application.

We configure upstart to respawn so that in production it will automatically restart the application if it crashes. We also use pre-stop to send an HTTP request asking the process to exit. The pre-stop procedure allows the process to finish serving current client requests and perform other clean-up duties before it exits. However, the combination of these two options causes the application to be restarted upon running "stop <service>". This seems like unusual behavior to us.

So now we are faced with either abandoning upstart or putting the respawn logic somewhere else.

thanks

Marcus Sundberg (adamel) wrote :

The attached patch makes stopping a job from pre-stop work properly.

Both stop and restart actions work as intended, and the test case in comment #4 still succeeds.

Scott James Remnant (scott) wrote :

Here's a far simpler patch - it just skips the respawn handling entirely (including all the "failed" stuff) if the job is being stopped anyway.

I think this is probably "right"

Changed in upstart:
assignee: nobody → Dmitrijs Ledkovs (xnox)
Matthew Hall (mhall-9) wrote :

This bug is 3 years old and has a patch. Can we patch it and get this over with?

Changed in upstart:
status: Triaged → Fix Committed
Dimitri John Ledkov (xnox) wrote :

Part of upstart 1.10 release.

Changed in upstart:
status: Fix Committed → Fix Released
Changed in upstart (Ubuntu):
status: Invalid → Fix Released