twistd executive can be paged out by memory-heavy builds leading to timeouts and the build being killed

Bug #677069 reported by Julian Edwards
This bug affects 1 person
Affects           Status   Importance  Assigned to  Milestone
launchpad-buildd  Triaged  Low         Unassigned

Bug Description

If a memory-heavy build such as the kernel is running, there's a high chance that the manager will get swapped out. This can cause undue delays on the manager side when it's trying to poll the slave.

mlockall() prevents paging out but I don't know how to do this in Python.

Changed in launchpad-buildd:
status: New → Triaged
importance: Undecided → High
Robert Collins (lifeless) wrote : Re: [Bug 677069] [NEW] The twistd executive manager process should never get paged out

> mlockall() prevents paging out but I don't know how to do this in
> Python.

ctypes

(Read up on that; it will be straightforward.)

-Rob

Julian Edwards (julian-edwards) wrote : Re: The twistd executive manager process should never get paged out

See also bug 586359 which talks about the timeout problems.

Julian Edwards (julian-edwards) wrote :

This works:
{{{
In [1]: from ctypes import cdll
In [2]: libc = cdll.LoadLibrary("libc.so.6")
In [3]: libc.mlockall(3)  # 3 == MCL_CURRENT | MCL_FUTURE
Out[3]: 0
}}}

However, it requires the CAP_IPC_LOCK capability.
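
A minimal sketch of how the call above might be wrapped so that a failure (for example, a missing CAP_IPC_LOCK) is reported rather than silently ignored. The function name `lock_all_memory` and its optional `libc` parameter are hypothetical, not part of launchpad-buildd, and the flag values are the ones used on x86 and ARM Linux (some other architectures define them differently).

```python
import ctypes
import os

# Values from <sys/mman.h> on x86 and ARM Linux (assumption: a few other
# architectures use different numbers, so check the headers there).
MCL_CURRENT = 1  # lock pages that are currently mapped
MCL_FUTURE = 2   # lock pages that get mapped later


def lock_all_memory(libc=None):
    """Call mlockall(MCL_CURRENT | MCL_FUTURE); raise OSError on failure.

    `libc` is a hypothetical hook that lets a caller pass in an already
    loaded library object; by default libc.so.6 is loaded with errno capture.
    Returns the flags that were passed to mlockall().
    """
    if libc is None:
        libc = ctypes.CDLL("libc.so.6", use_errno=True)
    flags = MCL_CURRENT | MCL_FUTURE
    if libc.mlockall(flags) != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return flags
```

Calling `lock_all_memory()` early in the daemon's startup, and logging the `OSError` rather than aborting, would let the process keep running unlocked on hosts where the permission is not granted.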

Robert Collins (lifeless) wrote :

So the patch is straightforward; we just need to ensure the script will run with sufficient perms, which is an RT-level issue.
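
On the permissions side, one thing that could be checked from Python is the process's RLIMIT_MEMLOCK: on Linux an unprivileged process may lock memory up to that limit, while a process with CAP_IPC_LOCK is not bound by it. The sketch below uses the standard `resource` module; the helper names are hypothetical and this is only an illustration of the check, not how the buildds are configured.

```python
import resource


def current_memlock_limit():
    """Return the soft RLIMIT_MEMLOCK of this process, in bytes."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
    return soft


def memlock_allows(bytes_needed, soft_limit):
    """Return True if soft_limit would permit locking bytes_needed bytes.

    soft_limit is a value such as the one from current_memlock_limit();
    resource.RLIM_INFINITY means there is no limit.
    """
    return soft_limit == resource.RLIM_INFINITY or bytes_needed <= soft_limit
```

For example, `memlock_allows(needed, current_memlock_limit())` could be logged at startup to show whether an unprivileged mlockall() has any chance of succeeding.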

tags: added: easy
summary: - The twistd executive manager process should never get paged out
+ twistd executive can be paged out by memory-heavy builds leading to
+ timeouts and the build being killed
Julian Edwards (julian-edwards) wrote : Re: [Bug 677069] Re: twistd executive can be paged out by memory-heavy builds leading to timeouts and the build being killed

On Wed Jan 11 20:12:57 2012, Robert Collins wrote:
> ** Summary changed:
>
> - The twistd executive manager process should never get paged out
> + twistd executive can be paged out by memory-heavy builds leading to timeouts and the build being killed
>

This is getting more important to fix. To work around the problem, which
exists mostly on ARM buildds, the buildd-manager's (global) timeout is
now set to days. This is suboptimal when we need to clear up stuck
buildds of other architectures!

Julian Edwards (julian-edwards) wrote :

Bumping to critical due to increased problems lately, mostly caused by the 2-day timeout configured on buildd-manager. Webops are getting frustrated.

Changed in launchpad-buildd:
importance: High → Critical
Tom Haddon (mthaddon)
tags: added: canonical-losa-lp
Adam Conrad (adconrad) wrote :

While this bug/misfeature is still around, its impact in production was mitigated last year via kernel tweaks on the ARM buildds. We're about to get a whole new set of ARM builders with different kernels, and this may or may not crop back up. If it does, I'll see what I can do to make it go away, but I'm inclined right now to drop the priority of this bug.

Changed in launchpad-buildd:
importance: Critical → Low