init: failure to set oom_adj fails process (implement soft value?)

Reported by Stéphane Graber on 2010-12-22
This bug affects 3 people
Affects: upstart
Importance: Low
Assigned to: Unassigned

Bug Description

The current version of upstart implements a great feature that keeps critical services from being killed by the Out Of Memory (OOM) killer.
The issue with the oom_adj option is that if setting the priority fails, the whole job fails.

This doesn't happen in most cases, as upstart runs as root and so should have access to values from -17 (== never) to 15.
In containers (at least with OpenVZ), oom_adj is restricted so that a container can't start processes that are exempt from the OOM killer; this prevents a container from bringing down the host. In this case, the value "-17" is invalid and the attempt to set it returns "Operation not permitted".

In the case of the current ssh job (and others) in Ubuntu, this means that the container starts just fine but sshd never starts, because setting oom_adj fails and therefore the whole job does.

The attached branch changes oom_adj handling so that if it gets an "Operation not permitted" error, it increases the score and tries again until it reaches 15, at which point it fails as it currently does. Every time the score is increased, it logs the old and new score as a warning.
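
The retry behaviour described above can be sketched roughly as follows. This is an illustration in Python, not the code from the attached branch (upstart itself is written in C). The function name `set_oom_adj_with_fallback` and the `write_value` callable are assumptions made for the example; `write_value` stands in for whatever actually writes the value for the process and is expected to raise `PermissionError` when the kernel refuses it.

```python
OOM_ADJ_MAX = 15  # highest oom_adj value mentioned in this report


def set_oom_adj_with_fallback(write_value, requested, log=print):
    """Try `requested`; after each permission error, raise the value by one.

    `write_value(value)` is a hypothetical callable that applies the value
    and raises PermissionError if it is not permitted.
    Returns the value that was finally accepted.
    """
    value = requested
    while True:
        try:
            write_value(value)
            return value
        except PermissionError:
            if value >= OOM_ADJ_MAX:
                # Nothing higher left to try: fail, as the job does today.
                raise
            # Log the old and the new score as a warning before retrying.
            log(f"warning: oom_adj {value} not permitted, trying {value + 1}")
            value += 1
```

With a writer that refuses anything below -15, a request for -17 would settle on -15 after two warnings; a writer that refuses everything would still make the call fail once 15 has been tried.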

I've been testing this change in an Ubuntu 10.04 container and it works as expected.

Scott James Remnant (scott) wrote :

Disagree here. If something is listed in an Upstart job, it's not advisory: Upstart should not start the job if it is unable to provide the requested environment.

Changed in upstart:
status: New → Opinion
Stéphane Graber (stgraber) wrote :

Hmm, ok... I still think that oom_adj isn't something that should make the whole job fail. Its goal is to make the job more robust (by not getting killed), so having it block the whole job seems a bit weird to me.

I guess I won't have much choice then but to update the jobs that show this issue to set oom_adj in post-start instead of using the oom_adj option.

Would it be possible to implement some way of making oom_adj an optional attribute?
Something like "oom never soft", which would ignore a failure to set oom_adj?
Or alternatively something like "oom -17 -15", meaning that oom_adj should be -17 but can be raised as far as -15 if -17 and -16 both fail?

That's probably a better solution; for things like this having a hard
value and a soft value seems to make sense.
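
To make the two proposed forms concrete, here is a minimal sketch of how "oom never soft" and "oom -17 -15" might be interpreted. Neither form exists in upstart; the syntax is only what was suggested in this thread, and the function name `parse_oom_stanza` and its return shape are assumptions made for illustration. The sketch is in Python rather than upstart's C.

```python
OOM_ADJ_MIN = -17  # "never", per the bug description
OOM_ADJ_MAX = 15


def _to_adj(word):
    """Convert one word of the stanza to an oom_adj value in -17..15."""
    value = OOM_ADJ_MIN if word == "never" else int(word)
    if not OOM_ADJ_MIN <= value <= OOM_ADJ_MAX:
        raise ValueError(f"oom_adj out of range: {value}")
    return value


def parse_oom_stanza(line):
    """Parse a (hypothetical) oom stanza.

    Returns (preferred, limit, ignore_failure):
      "oom -17"        -> (-17, -17, False)  value is mandatory, as today
      "oom never soft" -> (-17, -17, True)   failure to set it is ignored
      "oom -17 -15"    -> (-17, -15, False)  may be raised as far as -15
    """
    words = line.split()
    if len(words) not in (2, 3) or words[0] != "oom":
        raise ValueError(f"not an oom stanza: {line!r}")
    preferred = _to_adj(words[1])
    if len(words) == 2:
        return (preferred, preferred, False)
    if words[2] == "soft":
        return (preferred, preferred, True)
    limit = _to_adj(words[2])
    if limit < preferred:
        raise ValueError("fallback limit must not be lower than the preferred value")
    return (preferred, limit, False)
```

The third element separates the two ideas: "soft" means a failure is tolerated outright, while a second number only bounds how far the value may be raised before the job fails.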

Changing back to "New". It should become Triaged/Wishlist probably, given the request for a new feature (soft and hard values).

Changed in upstart:
status: Opinion → New
summary: - Restricted oom_adj causes job to fail starting completely
+ Restricted oom_adj causes job to fail starting completely; add support
+ for hard/soft value
Changed in upstart:
status: New → Triaged
importance: Undecided → Low
summary: - Restricted oom_adj causes job to fail starting completely; add support
- for hard/soft value
+ init: failure to set oom_adj fails process
summary: - init: failure to set oom_adj fails process
+ init: failure to set oom_adj fails process (implement soft value?)