Comment 3 for bug 693264

Scott James Remnant (scott) wrote : Re: [Bug 693264] Re: Restricted oom_adj causes job to fail starting completely

That's probably a better solution; for things like this having a hard
value and a soft value seems to make sense.

On Wed, Dec 22, 2010 at 2:07 PM, Stéphane Graber <email address hidden> wrote:
> Hmm, ok ... I still think that oom_adj isn't something that should make
> the whole job fail, as its goal is to make the job more robust (by not
> getting killed); having it block the whole job seems a bit weird to me.
>
> I guess I won't have much choice then but to update the jobs that show
> this issue to set oom_adj in post-start instead of using the oom_adj option.
>
> Would it be possible to implement some way of making oom_adj an optional attribute?
> Something like "oom never soft", which would ignore a failure to set oom_adj?
> Or alternatively something like "oom -17 -15", meaning that oom_adj should be -17 but can be raised as far as -15 if -17 and -16 both fail?
>
> https://bugs.launchpad.net/bugs/693264
>
> Title:
>  Restricted oom_adj causes job to fail starting completely
>
> Status in Upstart:
>  Opinion
>
> Bug description:
>  The current version of upstart implements a great feature to prevent critical services from being killed by the Out Of Memory killer.
> The issue with the oom_adj option is that if setting the priority fails, the whole job fails.
>
> This doesn't happen in most cases as upstart runs as root (obviously) and so should have access to values from -17 (== never) to 15.
> On containers (at least with OpenVZ), oom_adj is restricted so that a container can't start processes that are exempt from the OOM killer; this prevents a container from bringing down the host. In this case, the value "-17" is rejected and returns "Operation not permitted" when the user tries to set it.
>
> In the case of the current ssh job (and others) in Ubuntu, this basically means that the container will start just fine but sshd will never start, as setting oom_adj fails and therefore so does the whole job.
>
> The attached branch changes oom_adj handling a bit so that if it gets an "Operation not permitted" error, it'll increase the score and try again until it reaches 15, in which case it'll just fail as it currently does. Every time the score is increased, it logs the old and new score as a warning.
>
> I've been testing this change in an Ubuntu 10.04 container and it works as expected.
>
>
>
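
The retry behaviour described in the bug report above, and the "oom -17 -15" variant proposed in the reply, can be sketched roughly as follows. This is only an illustration in Python, not the code from the attached branch and not Upstart's implementation (Upstart itself is written in C). The names set_oom_adj_with_fallback, write_oom_adj, limit and setter are hypothetical, and the sketch assumes the legacy /proc/<pid>/oom_adj interface with values from -17 to 15.

```python
import logging

OOM_ADJ_MAX = 15  # highest oom_adj value mentioned in the bug description


def write_oom_adj(pid, value):
    """Write value to /proc/<pid>/oom_adj (legacy interface, assumed here)."""
    with open(f"/proc/{pid}/oom_adj", "w") as f:
        f.write(f"{value}\n")


def set_oom_adj_with_fallback(pid, wanted, limit=OOM_ADJ_MAX,
                              setter=write_oom_adj):
    """Try `wanted`, then wanted+1, and so on up to `limit`.

    A step is retried only when the setter raises PermissionError, which
    is how Python reports "Operation not permitted" (EPERM) and EACCES.
    Returns the value that was accepted. If `limit` is also refused, the
    error propagates, so the caller fails as it would have without the
    fallback.
    """
    value = wanted
    while True:
        try:
            setter(pid, value)
            return value
        except PermissionError:
            if value >= limit:
                raise
            # Log the old and new score as a warning, as the report describes.
            logging.warning("oom_adj %d not permitted, trying %d",
                            value, value + 1)
            value += 1
```

With a setter that refuses anything below -15, as a restricted container might, a request for -17 would settle on -15 after two warnings. Passing limit=-15 corresponds to the proposed "oom -17 -15", while the default limit of 15 matches the behaviour of the attached branch as described.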