upstart should log better when respawning to help the admin understand what's happening

Bug #1210242 reported by Dan Kegel
This bug affects 1 person
Affects           Status  Importance  Assigned to  Milestone
upstart (Ubuntu)  New     Wishlist    Unassigned

Bug Description

This is an enhancement request. I have not yet checked 13.04 or upstream to see if things have changed since 12.04.

Consider the job false.conf:

respawn
#respawn limit 10 15
script
  sleep 1
  false
end script

First problem:
After "initctl start false", this job will respawn forever, because the default respawn limit of "10 5"
does not consider jobs that take one second to be runaway jobs, even if they always fail.

Second problem:
With that job still respawning, uncomment the respawn limit line.
It will continue respawning forever. Wasn't upstart supposed to watch config files?

Third problem:
Watching syslog only tells you that a job ended with status 1 and is being respawned.
It would be lovely if that line showed how much of the respawn limit was left, to give upstart novices a little help in understanding what's going on.

All three of these issues hit me, a relative upstart novice, this morning. It took about an hour to get unconfused.

ProblemType: Bug
DistroRelease: Ubuntu 12.04
Package: upstart 1.5-0ubuntu7.2
ProcVersionSignature: Ubuntu 3.2.0-49.75-generic 3.2.46
Uname: Linux 3.2.0-49-generic x86_64
NonfreeKernelModules: nvidia
ApportVersion: 2.0.1-0ubuntu17.3
Architecture: amd64
Date: Thu Aug 8 10:46:26 2013
MarkForUpload: True
ProcEnviron:
 LANGUAGE=en_US:
 TERM=xterm
 PATH=(custom, no user)
 LANG=en_US.UTF-8
 SHELL=/bin/bash
SourcePackage: upstart
UpgradeStatus: No upgrade log present (probably fresh install)

Steve Langasek (vorlon) wrote :

> After "initctl start false", this job will respawn forever, because
> the default respawn limit of "10 5" does not consider jobs that
> take one second to be runaway jobs, even if they always fail.

Yes, because there's no magic value for the respawn limit that would allow upstart to automatically detect failing jobs in all cases. The current value is therefore as correct as any other for this purpose, and won't be changed.

The primary purpose of having a default respawn limit is to put the brakes on runaway jobs. A job that sleeps for a second and then respawns is not going to kill your system, so upstart doesn't need to intervene.
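
To make the arithmetic concrete, here is a rough shell sketch of a fixed-window respawn counter. It is a simplified model, not upstart's actual implementation; the function name respawns_before_stop and its arguments are made up for illustration. It assumes that a limit of "COUNT INTERVAL" means the job is stopped once it respawns more than COUNT times within INTERVAL seconds, and that every run of the job takes the same whole number of seconds.

```shell
# Simplified model of a respawn limit; NOT upstart's actual code.
# Usage: respawns_before_stop LIMIT INTERVAL RUN_SECONDS [MAX_ATTEMPTS]
# Prints how many respawns are allowed before the job would be stopped,
# or "never" if the limit is not hit within MAX_ATTEMPTS (default 1000).
respawns_before_stop() {
    limit=$1 interval=$2 run_seconds=$3 max_attempts=${4:-1000}
    now=0 window_start=-1 count=0 attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        now=$((now + run_seconds))      # the job runs, then exits
        if [ "$window_start" -ge 0 ] && [ $((now - window_start)) -lt "$interval" ]; then
            count=$((count + 1))        # still inside the current window
            if [ "$count" -gt "$limit" ]; then
                echo $((attempt - 1))   # this respawn is refused
                return 0
            fi
        else
            window_start=$now           # window expired: start a new one
            count=1
        fi
        attempt=$((attempt + 1))
    done
    echo never
}

respawns_before_stop 10 5 1     # default "10 5", job runs 1 second: prints never
respawns_before_stop 10 15 1    # "respawn limit 10 15": prints 10
```

Under this model, a job that takes one second per run never accumulates more than five respawns in a five-second window, so "10 5" never trips, whereas "10 15" stops it after ten respawns.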

> With that job still respawning, uncomment the respawn limit line.
> It will continue respawning forever. Wasn't upstart supposed to watch config files?

Upstart does watch the config files, but the new config is only *applied* once the job is *stopped* (you wouldn't want a rewrite of a job file while the job is running to result in the wrong post-stop script being run for the current service). Since this job is respawning, it's never actually stopped and the config changes are not applied. 'stop $service && start $service' should be sufficient to reset.
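
The reset described above could be wrapped in a small helper, sketched below. The function name reload_job and the INITCTL override variable are hypothetical; only "initctl stop" and "initctl start" come from upstart itself.

```shell
# Hypothetical helper: stop a job so that upstart applies the edited job
# file, then start it again. INITCTL can be overridden, e.g. for a dry run.
reload_job() {
    job=$1
    ${INITCTL:-initctl} stop "$job" && ${INITCTL:-initctl} start "$job"
}

# Example (needs root on a system running upstart):
#   reload_job false
# Dry run that only prints the arguments initctl would receive:
#   INITCTL=echo reload_job false
```

As with the one-liner it wraps, the start step only runs if the stop step succeeds.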

> Watching syslog only tells you that a job ended with status 1 and is being respawned.
> It would be lovely if that line showed how much of the respawn limit was left, to give
> upstart novices a little help in understanding what's going on.

This is the only part that seems to be a real actionable issue in upstart.

Changed in upstart (Ubuntu):
importance: Undecided → Wishlist
summary: - respawn behavior confusing, default limit insufficient
+ upstart should log better when respawning to help the admin understand
+ what's happening