Bazaar Version Control System

fails to build recipe with "bzr: out of memory"

Reported by Martin Pool on 2011-03-31
This bug affects 9 people
Affects            Status  Importance  Assigned to  Milestone
Bazaar                     High        Unassigned
Launchpad itself           Critical    Unassigned

Bug Description

https://launchpadlibrarian.net/67824936/buildlog.txt.gz

> Retrieving 'lp:maria' to put at '/home/buildd/build-e0819dac0d749b3228f2dfbb915ad4c43c9a16fe/chroot-autobuild/home/buildd/work/tree/recipe-{debupstream}-0~{revno}'.
> bzr: out of memory

This is like a repeat of bug 681582, but I guess an additional fix is needed beyond what has been done there.

Martin Pool (mbp) on 2011-03-31
Changed in bzr:
status: New → Confirmed
importance: Undecided → High
Ian Booth (wallyworld) on 2011-04-01
Changed in launchpad:
importance: Undecided → High
status: New → Triaged
Martin Pool (mbp) on 2011-04-01
tags: added: memory performance
Steve Magoun (smagoun) on 2011-04-04
tags: added: oem-services
Jelmer Vernooij (jelmer) wrote :

Some other things that we can do to improve the memory usage:

 * deploy a newer version of bzr which uses less memory (this is in the pipeline)
 * increase the ulimits for the recipe build jobs on the buildds; if I remember correctly they were set pretty low because of responsiveness issues with the buildds that should be less of an issue now the buildd-manager has moved to twisted

Robert Collins wrote :

On Wed, Jun 8, 2011 at 9:07 PM, Jelmer Vernooij
<email address hidden> wrote:
> Some other things that we can do to improve the memory usage:
>
>  * deploy a newer version of bzr which uses less memory (this is in the pipeline)

+1

>  * increase the ulimits for the recipe build jobs on the buildds; if I remember correctly they were set pretty low because of responsiveness issues with the buildds that should be less of an issue now the buildd-manager has moved to twisted

The buildd manager was always twistd. The buildds need to be
responsive or the manager will consider them dead.

Jelmer Vernooij (jelmer) wrote :

On 08/06/11 10:26, Robert Collins wrote:
> On Wed, Jun 8, 2011 at 9:07 PM, Jelmer Vernooij
> <email address hidden> wrote:
>> Some other things that we can do to improve the memory usage:
>>
>> * deploy a newer version of bzr which uses less memory (this is in the pipeline)
> +1
I'll have a look at this - bzr-builder also needs to be updated, might
as well do both.

>> * increase the ulimits for the recipe build jobs on the buildds; if I
>> remember correctly they were set pretty low because of responsiveness
>> issues with the buildds that should be less of an issue now the
>> buildd-manager has moved to twisted
>
> The buildd manager was always twistd. The buildds need to be
> responsive or the manager will consider them dead.
Sorry, I meant until the buildd manager was made properly twisted by
bigjools and jml last year.

I recall there were issues with builders going AWOL when they were
building large recipes, and taking a really long time (> 30 seconds
or so) to respond to simple "are you alive" queries. The strict
ulimits were put in place specifically to try to mitigate that, but
the issue later disappeared because of another fix, roughly around
the same time the buildd manager was made fully twisted. IMBW.

I don't see any ulimits in place (in the lib/canonical/buildd scripts)
for non-recipe builds; I would expect the limits to be the same for both
regular and recipe builds.

Cheers,

Jelmer

Scott Ritchie (scottritchie) wrote :

Any updates? My daily recipes started failing 2 weeks ago (they were succeeding before then), so I seem to have crossed the threshold (or the available memory on the builders decreased).

This is still quite annoying.

Jelmer Vernooij (jelmer) wrote :

We're still waiting for a deployment of a newer bzr-builder and bzr to the builders.

Martin Pool (mbp) wrote :

Is there an RT or some similar handle for the deployment?

Jelmer Vernooij (jelmer) wrote :

This is RT #46345

Martin Pool (mbp) wrote :

I've asked for the RT to be completed.

Scott Ritchie (scottritchie) wrote :

This is your 3-week nag reminder politely asking that this be completed, as it's still blocking my work :)

Martin Pool (mbp) wrote :

Thanks for the reminder, Scott. I'm sorry it is taking so long.

This is still waiting on the RT, which is apparently waiting on Jelmer reworking the packages to not require an updated quilt. Jelmer, if you can push that I will try to get IS to actually install them.

Martin Pool (mbp) wrote :

We think this is now fixed, as a more efficient version of bzr has been rolled out to the buildds. I have retried some of the builds that were previously reported to have failed and they either passed or failed for non-bzr-related reasons during the actual build.

(Retrospectively critical as a stakeholder escalation.)

Please let us know if this works well or not on your jobs.

Changed in launchpad:
importance: High → Critical
status: Triaged → Fix Released
Changed in bzr:
status: Confirmed → Fix Released
tags: added: affects-linaro
Changed in launchpad:
status: Fix Released → Triaged
Francis J. Lacoste (flacoste) wrote :

Reopening in Launchpad; it should probably be reopened in bzr too. It's failing on at least 4 different builders (actinium, dubnium, lemon, uranium). Not sure whether it works on any.

I'll investigate the RAM available on these builders.

Martin Pool (mbp) wrote :

There's some suggestion it's not actually deployed on all buildds,
which I'm investigating. Otherwise we'll have to see about
reproducing this within a limited amount of memory locally and
investigating more.

Changed in bzr:
status: Fix Released → Confirmed
Martin Pool (mbp) wrote :

It looks like the deployment of this was buggy (causing bug 884516) and the setup on qastaging was not reproduced on lpnet. So, the deployment to lpnet was inconsistent across machines and also inconsistent with what was done on qas.

We have gone back to bzr 2.4.0 on lpnet, as used last week.

We will try again to deploy the bzr* updates on qastaging, test it properly, then again deploy to lpnet.

See also bug 693524, which is a different out-of-memory error that is apparently not related to bzr.

Martin Pool (mbp) wrote :

I've tested 'bzr build' on the projectneon recipes locally in a 1GB ulimit, and it worked, so I think this is ok in bzr and just needs to be actually rolled out.
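A local check like this can be scripted with a small wrapper. The sketch below is an illustration and not the procedure used here. It assumes the buildd limit behaves like `ulimit -v` (RLIMIT_AS on Linux), which the thread does not confirm, and the helper name `run_with_memory_limit` is hypothetical. The cap applies only to the child process, so an oversized allocation there fails with Python's MemoryError. That is presumably the condition behind the `bzr: out of memory` line in the build log.

```python
import resource
import subprocess
import sys

ONE_GIB = 1024 ** 3


def run_with_memory_limit(argv, limit_bytes):
    """Run argv with its address space capped at limit_bytes (POSIX only).

    Returns (exit status, captured stderr). The cap is set with RLIMIT_AS,
    the limit `ulimit -v` sets on Linux, and only inside the child process.
    """
    def apply_limit():
        # Runs in the child between fork() and exec(); the existing hard
        # limit is kept, so limit_bytes must not exceed it.
        _, hard = resource.getrlimit(resource.RLIMIT_AS)
        resource.setrlimit(resource.RLIMIT_AS, (limit_bytes, hard))

    result = subprocess.run(argv, preexec_fn=apply_limit,
                            capture_output=True, text=True)
    return result.returncode, result.stderr


if __name__ == "__main__":
    # Demonstration: a 2 GiB allocation under a 512 MiB cap should fail.
    code, err = run_with_memory_limit(
        [sys.executable, "-c", "x = bytearray(2 * 1024 ** 3)"], 512 * 1024 ** 2)
    print("exit status:", code)
    print(err.strip().splitlines()[-1] if err else "(no stderr)")
```

For a recipe, the call would look like `run_with_memory_limit(["bzr", "build", "project-neon-kdesdk.recipe", "./projectneon-build-2"], ONE_GIB)`. Setting the limit in the child keeps the calling shell or test harness unaffected.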

Changed in bzr:
status: Confirmed → Fix Released
Changed in launchpad:
status: Triaged → In Progress

local test:

=time -v bzr2.4 build project-neon-kdesdk.recipe ./projectneon-build-2 -v
Building tree.
Retrieving 'lp:kdesdk' to put at './projectneon-build-2'.
Retrieving 'lp:~neon/project-neon/kdesdk-ubuntu' to put at './projectneon-build-2/debian'.
       Command being timed: "bzr2.4 build project-neon-kdesdk.recipe ./projectneon-build-2 -v"
       User time (seconds): 0.51
       System time (seconds): 0.27
       Percent of CPU this job got: 1%
       Elapsed (wall clock) time (h:mm:ss or m:ss): 0:53.15
       Average shared text size (kbytes): 0
       Average unshared data size (kbytes): 0
       Average stack size (kbytes): 0
       Average total size (kbytes): 0
       Maximum resident set size (kbytes): 164800
       Average resident set size (kbytes): 0
       Major (requiring I/O) page faults: 1
       Minor (reclaiming a frame) page faults: 26367
       Voluntary context switches: 103
       Involuntary context switches: 97
       Swaps: 0
       File system inputs: 56
       File system outputs: 6448
       Socket messages sent: 0
       Socket messages received: 0
       Signals delivered: 0
       Page size (bytes): 4096
       Exit status: 0

So the maximum resident set size was about 164 MB, and we should be
safely within the 1GB limit for buildds.
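This check can be automated by pulling the peak figure out of `time -v` output. A minimal sketch follows, assuming the GNU time `-v` line format shown above; `max_rss_kib` and `fits_under` are hypothetical helper names. An address-space ulimit caps virtual memory, which is never smaller than RSS. Passing this check is therefore a lower-bound sanity check, not proof that a build fits.

```python
import re

# Matches the line GNU `time -v` prints for peak resident memory.
_MAX_RSS = re.compile(r"Maximum resident set size \(kbytes\):\s*(\d+)")


def max_rss_kib(time_v_output):
    """Return the 'Maximum resident set size (kbytes)' value as an int."""
    match = _MAX_RSS.search(time_v_output)
    if match is None:
        raise ValueError("no 'Maximum resident set size' line found")
    return int(match.group(1))


def fits_under(time_v_output, limit_bytes):
    """True if the reported peak RSS (kbytes * 1024) is below limit_bytes."""
    return max_rss_kib(time_v_output) * 1024 < limit_bytes
```

For example, `fits_under(output, 1024 ** 3)` applied to the run above compares its 164800 kbytes peak against a 1 GiB limit.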

Marcin Juszkiewicz (hrw) wrote :

15:56 hrw@puchatek:ci$ /usr/bin/time -v bzr build recipes/gcc-linaro.bzr build/gcc-linaro -v
Building tree.
Retrieving '/home/hrw/devel/canonical/2011-oneiric/ci/bazary/packaging-gcc-linaro/' to put at 'build/gcc-linaro'.
Retrieving '/home/hrw/devel/canonical/2011-oneiric/ci/bazary/gcc-linaro/' to put at 'build/gcc-linaro/src'.
        Command being timed: "bzr build recipes/gcc-linaro.bzr build/gcc-linaro -v"
        User time (seconds): 47.65
        System time (seconds): 9.59
        Percent of CPU this job got: 18%
        Elapsed (wall clock) time (h:mm:ss or m:ss): 5:16.39
        Average shared text size (kbytes): 0
        Average unshared data size (kbytes): 0
        Average stack size (kbytes): 0
        Average total size (kbytes): 0
        Maximum resident set size (kbytes): 2893248
        Average resident set size (kbytes): 0
        Major (requiring I/O) page faults: 25
        Minor (reclaiming a frame) page faults: 275194
        Voluntary context switches: 19156
        Involuntary context switches: 2809
        Swaps: 0
        File system inputs: 1590280
        File system outputs: 320
        Socket messages sent: 0
        Socket messages received: 0
        Signals delivered: 0
        Page size (bytes): 4096
        Exit status: 0

hrw@puchatek:ci$ bzr --version
Bazaar (bzr) 2.4.1
  Python interpreter: /usr/bin/python 2.7.2
  Python standard library: /usr/lib/python2.7
  Platform: Linux-3.0.0-12-generic-x86_64-with-Ubuntu-11.10-oneiric
  bzrlib: /usr/lib/python2.7/dist-packages/bzrlib
  Bazaar configuration: /home/hrw/.bazaar
  Bazaar log file: /home/hrw/.bzr.log

Jelmer Vernooij (jelmer) wrote :

Please note that launchpad actually uses "bzr dailydeb", rather than "bzr build". The former uses stacked branches, which can have a significant impact on performance.

Good point, I'll re-test with dailydeb.

Martin Pool (mbp) wrote :

dailydeb on the projectneon recipe uses

Maximum resident set size (kbytes): 623696

and works correctly under a 700MB ulimit, which is stricter than the limit on the buildds.
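The following sketch puts the peak RSS figures quoted in this thread side by side with the two limits mentioned. It computes the margin for each pair. The helper name and labels are hypothetical, and reading "700MB" and "1GB" as MiB and GiB is an assumption. Virtual size is never smaller than RSS, so a positive margin is necessary but not sufficient under an address-space ulimit.

```python
# Peak resident set sizes (kbytes) reported by `time -v` in this thread.
PEAK_RSS_KIB = {
    "bzr build, projectneon kdesdk": 164800,
    "bzr dailydeb, projectneon": 623696,
    "bzr build, gcc-linaro (bzr 2.4.1, local)": 2893248,
}

# Limits mentioned in the thread; binary units (MiB/GiB) are an assumption.
LIMITS_KIB = {
    "700MB test ulimit": 700 * 1024,
    "1GB buildd limit": 1024 * 1024,
}


def headroom_mib(peaks, limits):
    """Map (run, limit) -> limit minus peak RSS, in MiB (negative = over)."""
    return {
        (run, limit_name): (limit - peak) / 1024
        for run, peak in peaks.items()
        for limit_name, limit in limits.items()
    }


if __name__ == "__main__":
    for (run, limit_name), margin in headroom_mib(PEAK_RSS_KIB, LIMITS_KIB).items():
        print(f"{run:42} vs {limit_name:18}: {margin:+9.1f} MiB")
```

By this rough measure, the dailydeb run leaves about 91 MiB of RSS headroom under the 700MB test limit. The local gcc-linaro `bzr build` run peaks well above 1 GiB.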

Martin Pool (mbp) wrote :

Follow-on bug: bug 884997

tags: added: escalated
Martin Pool (mbp) wrote :

QA investigations:

One bzr daily recipe that was previously passing now fails because of
bug 885497, which does seem to be an actual integration bug rather
than a deployment issue.

We could possibly fix this bug (bug 746822) and avoid bug 885497 by
upgrading only bzr and not bzr-builder, so we're going to try that.

Our next attempt (with only bzr and not bzr-builder) has worked OK on QA,
and it should be rolled out to lpnet next week, after UDS.

--
Martin

Jelmer Vernooij (jelmer) wrote :

This has now been rolled out, thanks to Lamont and Martin.

I've confirmed this fixes the issue for the Samba 4 recipes. The widelands recipe mentioned above seems to be working now too.

Jelmer Vernooij (jelmer) wrote :

The Linaro gcc source package built fine too, with this recipe: https://code.launchpad.net/~linaro-pkg/+recipe/gcc-linaro-native-daily

https://code.launchpad.net/~linaro-pkg/+archive/testing-daily-builds/+recipebuild/115540

So I think this can be considered fixed now.

Jelmer Vernooij (jelmer) on 2011-11-09
Changed in launchpad:
status: In Progress → Fix Released
Marcin Juszkiewicz (hrw) wrote :

Thanks, everyone, for getting this fixed. Now I can proceed with my blueprints and get gcc-linaro daily/request builds working ;)
