twistd executive can be paged out by memory-heavy builds leading to timeouts and the build being killed

Bug #677069 reported by Julian Edwards on 2010-11-18
This bug affects 1 person
Affects: launchpad-buildd
Importance: Low
Assigned to: Unassigned

Bug Description

If a memory-heavy build such as the kernel is running, there's a high chance that the manager will get swapped out. This can cause undue delays on the manager side when it tries to poll the slave.

mlockall() prevents paging out, but I don't know how to do this in Python.
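To confirm that the manager really is being paged out during a heavy build, one option is to watch the VmSwap field in /proc. This is a diagnostic sketch, not something from the thread: `parse_vmswap` and `swapped_out_kib` are hypothetical helper names, and it assumes a Linux kernel that reports VmSwap in /proc/<pid>/status.

```python
import os


def parse_vmswap(status_text):
    """Return the VmSwap value (kB) from /proc/<pid>/status text, or None.

    The field is absent on kernels that do not report it, so None means
    "unknown", not "nothing swapped out".
    """
    for line in status_text.splitlines():
        if line.startswith("VmSwap:"):
            # Example line: "VmSwap:\t    1234 kB"
            return int(line.split()[1])
    return None


def swapped_out_kib(pid="self"):
    """Read /proc/<pid>/status and report how much of the process is in swap."""
    path = os.path.join("/proc", str(pid), "status")
    try:
        with open(path) as status:
            return parse_vmswap(status.read())
    except OSError:
        # No such process, or not Linux (no /proc).
        return None


if __name__ == "__main__":
    print("VmSwap (kB):", swapped_out_kib())
```

Sampling this for the manager's PID while a kernel build runs would show whether its pages are actually being pushed out.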

Changed in launchpad-buildd:
status: New → Triaged
importance: Undecided → High

> mlockall() prevents paging out but I don't know how to do this in
> Python.

ctypes

(Read up on that; it will be straightforward.)

-Rob

Julian Edwards (julian-edwards) wrote :

This works:
{{{
In [1]: from ctypes import cdll
In [2]: libc = cdll.LoadLibrary("libc.so.6")
In [3]: libc.mlockall(3)  # 3 == MCL_CURRENT | MCL_FUTURE on Linux x86/ARM
Out[3]: 0
}}}

However, it requires the CAP_IPC_LOCK capability.
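The IPython session above passes the raw value 3 and ignores failures. A slightly more defensive variant might name the flags and raise on error, so a missing capability is reported instead of silently leaving the process swappable. This is a sketch assuming Linux on x86 or ARM, where MCL_CURRENT is 1 and MCL_FUTURE is 2; `lock_all_memory` is a hypothetical name. Per mlockall(2), an unprivileged process can also succeed if everything it locks fits within its RLIMIT_MEMLOCK soft limit.

```python
import ctypes
import ctypes.util
import os

# Assumed values from <sys/mman.h> on Linux x86 and ARM; some other
# architectures (e.g. powerpc, sparc) use different numbers.
MCL_CURRENT = 1  # lock every page currently mapped
MCL_FUTURE = 2   # also lock pages mapped later (heap growth, new libraries)


def lock_all_memory():
    """Pin all current and future pages of this process in RAM.

    Raises OSError carrying the C errno if mlockall() fails, e.g. EPERM or
    ENOMEM when the process lacks CAP_IPC_LOCK and RLIMIT_MEMLOCK is too
    small for its address space.
    """
    libc_name = ctypes.util.find_library("c") or "libc.so.6"
    # use_errno=True lets ctypes.get_errno() see the errno set by the call.
    libc = ctypes.CDLL(libc_name, use_errno=True)
    if libc.mlockall(MCL_CURRENT | MCL_FUTURE) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
```

Calling this early during process startup, inside a `try`/`except OSError` that logs a warning, would let the manager keep running even on hosts where the permissions have not been granted.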

Robert Collins (lifeless) wrote :

So the patch is straightforward; we just need to ensure the script runs with sufficient permissions, which is an RT-level issue.
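Before relying on those permissions, a startup check could report whether they are in place. This is a best-effort sketch, not part of the proposed patch: it assumes Linux, reads the effective capability mask from /proc/self/status (CAP_IPC_LOCK is bit 14), and otherwise treats only an unlimited RLIMIT_MEMLOCK as sufficient, since with MCL_FUTURE any finite limit can be exceeded later. `has_cap_ipc_lock` and `can_lock_memory` are hypothetical names.

```python
import resource

CAP_IPC_LOCK = 14  # capability bit number from <linux/capability.h>


def has_cap_ipc_lock(status_text):
    """Return True if the CapEff line of /proc/<pid>/status includes CAP_IPC_LOCK."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            mask = int(line.split()[1], 16)  # hex bitmask of effective caps
            return bool(mask & (1 << CAP_IPC_LOCK))
    return False


def can_lock_memory():
    """Best-effort guess whether mlockall() can keep this process resident."""
    try:
        with open("/proc/self/status") as status:
            if has_cap_ipc_lock(status.read()):
                return True
    except OSError:
        pass  # no /proc: fall back to the rlimit check
    soft, _hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
    return soft == resource.RLIM_INFINITY
```

A False result would point at the deployment (capabilities or limits) rather than the code, which matches where this thread places the fix.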

tags: added: easy
summary: - The twistd executive manager process should never get paged out
+ twistd executive can be paged out by memory-heavy builds leading to timeouts and the build being killed

On Wed Jan 11 20:12:57 2012, Robert Collins wrote:
> ** Summary changed:
>
> - The twistd executive manager process should never get paged out
> + twistd executive can be paged out by memory-heavy builds leading to timeouts and the build being killed
>

This is getting more important to fix. To work around the problem, which
exists mostly on ARM buildds, the buildd-manager's (global) timeout is
now measured in days. This is suboptimal when we need to clear up stuck
buildds of other architectures!

Julian Edwards (julian-edwards) wrote :

Bumping to critical due to increased problems lately, mostly caused by the 2-day timeout configured on buildd-manager. Webops are getting frustrated.

Changed in launchpad-buildd:
importance: High → Critical
Tom Haddon (mthaddon) on 2012-01-24
tags: added: canonical-losa-lp

Adam Conrad (adconrad) wrote :

While this bug/misfeature is still around, its impact in production was mitigated last year via kernel tweaks on the ARM buildds. We're about to get a whole new set of ARM builders with different kernels, and this may or may not crop back up. If it does, I'll see what I can do to make it go away but I'm inclined right now to drop the priority of this bug.

Changed in launchpad-buildd:
importance: Critical → Low