Launchpad itself

sendbranchmail with lp:~vcs-imports/linux/trunk is eating memory

Bug #585126 reported by Steve McInerney on 2010-05-24

This bug affects 5 people

Affects	Status	Importance	Assigned to
Bazaar	Confirmed	High	Unassigned
Launchpad itself	Triaged	Critical	Unassigned
nodejs (Ubuntu)	New	Undecided	Unassigned

Bug Description

Found an oops for this: OOPS-1612BM1
branch: ~vcs-imports/linux/trunk
branch_job_id: 1658563

We've had to kill off a sendbranchmail run, as it was driving the server into swap and eating all of it.

bzrsyncd 1629 0.0 0.0 3944 468 ? Ss 20:52 0:00 /bin/sh -c $LP_PY /srv/bzrsyncd.launchpad.net/production/launchpad/cronscripts/sendbranchmail.py >> /srv/bzrsyncd.launchpad.net/production-logs/sendbranchmail.log 2>&1
bzrsyncd 1630 53.0 61.4 6871152 5033008 ? Dl 20:52 6:43 \_ /usr/bin/python2.5 /srv/bzrsyncd.launchpad.net/production/launchpad/cronscripts/sendbranchmail.py

5 GB RSS is rather uncool.
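
For reading the numbers out of that process listing, here is a minimal sketch. It assumes the output is in `ps aux` column order (USER, PID, %CPU, %MEM, VSZ, RSS, TTY, STAT, START, TIME, COMMAND) with VSZ and RSS in KiB, as on Linux. The helper names are made up for illustration.

```python
def parse_ps_line(line):
    """Split one `ps aux`-style line into the fields of interest."""
    # The command is the 11th field and may contain spaces, so split at most 10 times.
    fields = line.split(None, 10)
    return {
        "user": fields[0],
        "pid": int(fields[1]),
        "cpu_percent": float(fields[2]),
        "mem_percent": float(fields[3]),
        "vsz_kib": int(fields[4]),
        "rss_kib": int(fields[5]),
        "command": fields[10],
    }


def kib_to_gib(kib):
    """Convert KiB (the unit ps reports) to GiB."""
    return kib / (1024.0 * 1024.0)


# The python2.5 line from the report above, copied verbatim.
PS_LINE = (
    r"bzrsyncd 1630 53.0 61.4 6871152 5033008 ? Dl 20:52 6:43 "
    r"\_ /usr/bin/python2.5 /srv/bzrsyncd.launchpad.net/production/"
    r"launchpad/cronscripts/sendbranchmail.py"
)

proc = parse_ps_line(PS_LINE)
print("pid %d: RSS %.1f GiB, VSZ %.1f GiB" % (
    proc["pid"], kib_to_gib(proc["rss_kib"]), kib_to_gib(proc["vsz_kib"])))
```

The RSS column works out to roughly 4.8 GiB resident, which matches the "5 GB" figure quoted in the description.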

Related branches

lp:~abentley/launchpad/memlimit-sendbranchmail

Merged into lp:launchpad at revision 13420

Aaron Bentley (community): Approve on 2011-07-11

Steve McInerney (spm) on 2010-05-24

Changed in launchpad-code:
importance:	Undecided → Critical

Tim Penhey (thumper) wrote on 2010-05-25:

OK, first step is to identify the culprit branch. This is almost certainly a bzr issue too, but let's work out the branch first.

Given the way jobs work, the same job would have been run again shortly after, so we can't tell exactly which job it was, or whether there were multiple jobs. We need to add a line to the job script that prints out the job it is running... although I have a gut feeling that I added this info before and we may just need an extra '-v' parameter.

Tim Penhey (thumper) on 2010-05-25

Changed in launchpad-code:
status:	New → Triaged
assignee:	nobody → Tim Penhey (thumper)

Martin Pool (mbp) wrote on 2010-05-25:

Perhaps we should run this with a ulimit set to something the machine can tolerate.

To get a view of where it's using memory, try following http://jam-bazaar.blogspot.com/2009/11/memory-debugging-with-meliae.html - though this may be a bit hard if it's run non-interactively.
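
meliae is a third-party tool, so as a stand-in here is a minimal sketch of the same idea (find out which lines hold the memory) using the standard library's `tracemalloc`. This is a substitute technique, not what the linked post describes, and it needs Python 3 rather than the Python 2.5 shown in the process listing. `measure_allocations` is a hypothetical helper name.

```python
import tracemalloc


def measure_allocations(func, top=3):
    """Run func() under tracemalloc and return (result, biggest allocation sites)."""
    tracemalloc.start()
    try:
        result = func()
        # Take the snapshot while `result` is still alive so its memory is counted.
        snapshot = tracemalloc.take_snapshot()
    finally:
        tracemalloc.stop()
    # Group live allocations by source file and line, largest first.
    stats = snapshot.statistics("lineno")
    return result, stats[:top]


def build_big_list():
    # Stand-in workload: about 1 MB of small byte strings.
    return [bytes(1000) for _ in range(1000)]


_, top_sites = measure_allocations(build_big_list)
for stat in top_sites:
    print(stat)
```

Because it runs inside the process and only prints a summary, something along these lines could be logged from a non-interactive cron run, at the cost of some overhead while tracing is on.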

Tim Penhey (thumper) wrote on 2010-05-25:

Hmm... as well as adding '-v' to the mail job, we need the following cowboy. We will then get information about the jobs as they are being run in the log file. With that we can debug more.

=== modified file 'lib/lp/code/model/branchjob.py'
--- lib/lp/code/model/branchjob.py 2010-04-23 05:29:30 +0000
+++ lib/lp/code/model/branchjob.py 2010-05-25 03:56:09 +0000
@@ -183,6 +183,13 @@
     def __init__(self, branch_job):
         self.context = branch_job
 
+    def __repr__(self):
+        branch = self.branch
+        return '<%(job_type)s job for %(branch)s>' % {
+            'job_type': self.context.job_type.name,
+            'branch': branch.unique_name,
+        }
+
     # XXX: henninge 2009-02-20 bug=331919: These two standard operators
     # should be implemented by delegates().
     def __eq__(self, other):

Tim Penhey (thumper) wrote on 2010-05-25:

Spectacular formatting fail there.

Tim Penhey (thumper) wrote on 2010-05-25:

__repr__ method for BranchJobDerived (641 bytes, text/plain)
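
To show what that `__repr__` would put in the log, here is a self-contained sketch with stand-in classes. Everything named `Fake...` or `...Sketch` is hypothetical, and the job type name `REVISION_MAIL` is an assumption used only for illustration. Only the body of `__repr__` follows the patch.

```python
class FakeJobType(object):
    """Stand-in for the job type enum item; only .name is needed."""
    def __init__(self, name):
        self.name = name


class FakeBranch(object):
    """Stand-in for a branch; only .unique_name is needed."""
    def __init__(self, unique_name):
        self.unique_name = unique_name


class FakeBranchJob(object):
    """Stand-in for the underlying BranchJob row."""
    def __init__(self, job_type, branch):
        self.job_type = job_type
        self.branch = branch


class BranchJobDerivedSketch(object):
    def __init__(self, branch_job):
        self.context = branch_job

    @property
    def branch(self):
        # Assumption: the real class exposes the context's branch in some way.
        return self.context.branch

    def __repr__(self):
        branch = self.branch
        return '<%(job_type)s job for %(branch)s>' % {
            'job_type': self.context.job_type.name,
            'branch': branch.unique_name,
        }


job = BranchJobDerivedSketch(FakeBranchJob(
    FakeJobType('REVISION_MAIL'), FakeBranch('~vcs-imports/linux/trunk')))
print(repr(job))
```

With a line like that logged per job, the log would name the branch being processed when the memory jump happens.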

Tim Penhey (thumper) on 2010-05-26

Changed in launchpad-code:
importance:	Critical → High

Steve McInerney (spm) wrote on 2010-05-26:

Hrm. Didn't realise you could set a hard memory limit thru ulimit. Have done so via a funky wrapper script:
ulimit -v 1843200
fwiw.

Have added the -v in the script as well; but need approval for the cowboy to go ahead, then we can roll that.

Pls do beware: we have had at least 2 repeat instances where this has nearly caused loganberry to faceplant.
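
The same cap can also be applied from inside a Python script instead of a wrapper. The following is a minimal sketch using the standard `resource` module, assuming a Unix host. `ulimit -v` takes its value in KiB, whereas `RLIMIT_AS` is in bytes. The function names are hypothetical, and this is not necessarily how the eventual Launchpad fix was written.

```python
import resource

KIB = 1024


def ulimit_v_to_bytes(kib):
    """Convert a `ulimit -v` value (KiB) to the byte count setrlimit expects."""
    return kib * KIB


def limit_address_space(max_bytes):
    """Lower this process's soft address-space limit.

    Allocations beyond the limit then fail (MemoryError in Python) instead
    of pushing the machine into swap. Unlike the shell's `ulimit -v`, this
    only lowers the soft limit and leaves the hard limit alone.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_AS)
    if hard != resource.RLIM_INFINITY:
        # The soft limit may not exceed the hard limit.
        max_bytes = min(max_bytes, hard)
    resource.setrlimit(resource.RLIMIT_AS, (max_bytes, hard))
    return max_bytes


# 1843200 KiB is the value used in the wrapper script above: 1800 MiB.
LIMIT_BYTES = ulimit_v_to_bytes(1843200)
print("limit would be %d bytes (%d MiB)" % (LIMIT_BYTES, LIMIT_BYTES // (1024 * 1024)))
```

Calling `limit_address_space(LIMIT_BYTES)` near the top of a script would apply the cap to that process. It is deliberately not called here.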

Martin Pool (mbp) wrote on 2010-05-26: Re: [Bug 585126] Re: sendbranchmail is eating memory

On 27 May 2010 06:24, Steve McInerney <email address hidden> wrote:
> Hrm. Didn't realise you could set a hard memory limit thru ulimit. Have done so via a funky wrapper script:
> ulimit -v 1843200
> fwiw.
>
> Have added the -v in the script as well; but need approval for the
> cowboy to go ahead, then we can roll that.
>
> Pls do beware: we have had at least 2 repeat instances where this has
> nearly caused loganberry to faceplant.

'this' meaning this bug, or using ulimit?

--
Martin <http://launchpad.net/~mbp/>

Steve McInerney (spm) wrote on 2010-05-26: Re: sendbranchmail is eating memory

"'this' meaning this bug, or using ulimit?"

Oops. This meaning the bug.

Tim, you mentioned that the same job runs on the next iteration of sendbranchmail?
If so, we aren't seeing a repeat of this memory gobble until a day or so later, so far. I have no idea if that's sheer coincidence or expected. AFAIA, we've had 3 instances of this gobbling incident.

But will do a scan thru our ps(1) history looking for a more complete analysis. Details to follow.

Steve McInerney (spm) wrote on 2010-05-26:

The memory jump is generally sudden and rapid, as seen from roughly minute to minute.
A given process will happily do its thing for a while (== minutes), then within 120 seconds go from ~500 MB RSS to 5+ GB RSS.

A longer history look (ifs, buts, maybes here) over ~9300 records of the sendbranchmail script tells us:
~97.5% of the time, it's using < 300 MB RSS.
~2% of the time, it's using > 1 GB RSS, most of which is > 2 GB.

Beware: I'd suggest the higher numbers are skewed upward by the bigger gobbles running for longer periods, and the lower ones are skewed in their favour by multiple attempts to start up while a gobbling is in progress.

Not sure if this'll help in *solving* but perhaps in defining the impact. :-)
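
A breakdown like the one above can be computed from recorded RSS samples along these lines. This is a minimal sketch: the sample list is synthetic, built only to reproduce the quoted proportions, and is not the real ps(1) history. The thresholds are read as 300 MiB and 1 GiB, which is an assumption about the units meant.

```python
MIB_IN_KIB = 1024


def rss_distribution(samples_kib, low_mib=300, high_mib=1024):
    """Return the fractions of samples below low_mib and above high_mib."""
    if not samples_kib:
        raise ValueError("need at least one sample")
    total = len(samples_kib)
    low = sum(1 for s in samples_kib if s < low_mib * MIB_IN_KIB)
    high = sum(1 for s in samples_kib if s > high_mib * MIB_IN_KIB)
    return low / float(total), high / float(total)


# Synthetic data: 975 samples at 200 MiB, 5 at 500 MiB, 20 at 3 GiB.
SAMPLES_KIB = (
    [200 * MIB_IN_KIB] * 975
    + [500 * MIB_IN_KIB] * 5
    + [3 * 1024 * MIB_IN_KIB] * 20
)

low_fraction, high_fraction = rss_distribution(SAMPLES_KIB)
print("under 300 MiB: %.1f%%, over 1 GiB: %.1f%%" % (
    low_fraction * 100, high_fraction * 100))
```

Counting samples rather than distinct runs carries the same bias noted above: a long-running gobble contributes many samples.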

Tim Penhey (thumper) wrote on 2010-05-27: Re: [Bug 585126] Re: sendbranchmail is eating memory

#10

On Thu, 27 May 2010 10:26:40 you wrote:
> The memory jump is generally sudden and rapid, as seen from roughly
> minute to minute. A given process will happily do its thing for a
> while (== minutes), then within 120 seconds go from ~500 MB RSS to
> 5+ GB RSS.
>
> A longer history look (ifs, buts, maybes here) over ~9300 records of the
> sendbranchmail script tells us: ~97.5% of the time, it's using < 300 MB
> RSS.
> ~2% of the time, it's using > 1 GB RSS, most of which is > 2 GB.
>
> Beware: I'd suggest the higher numbers are skewed upward by the
> bigger gobbles running for longer periods, and the lower ones are skewed
> in their favour by multiple attempts to start up while a gobbling is in
> progress.
>
> Not sure if this'll help in *solving* but perhaps in defining the
> impact. :-)

The additional logging will at least tell us which branches were being
processed when the memory jump occurs.

There are some known big memory requirements for diff, like when the underlying
changed file is very large. There may also be leaks. Until we can get some
sample branches that are causing problems, it is hard to diagnose.

Tom Haddon (mthaddon) on 2010-05-28

tags:

added: canonical-losa-lp

Tim Penhey (thumper) wrote on 2010-06-01: Re: sendbranchmail is eating memory

#11

Let's get the details of branch job 1658563:

select json_data from branchjob where id = 1658563

This will give us the revisions it was trying to generate diffs for.

With this info, we can pass this on to the bazaar team to investigate.

tags:	added: oops
description:	updated
description:	updated

Tom Haddon (mthaddon) wrote on 2010-06-01:

#12

https://pastebin.canonical.com/32808/

Robert Collins (lifeless) on 2010-06-24

Changed in bzr:
status:	New → Incomplete

Martin Pool (mbp) on 2010-06-24

Changed in bzr:
status:	Incomplete → Confirmed
importance:	Undecided → Medium
importance:	Medium → High

Robert Collins (lifeless) wrote on 2010-06-24:

#13

So, what branch has revisions {"last_revision_id": "git-v1:67a3e12b05e055c0415c556a315a3d3eb637e29e", "last_scanned_id": "git-v1:b3f2f6cd1ff935ecac9a5346904b899d7af689fe", "from_address": "<email address hidden>"}
(1 row)?

I'm guessing linux.
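
For anyone picking the revision range out of that `json_data` value, a minimal sketch follows. It assumes the job covers the revisions from `last_scanned_id` to `last_revision_id`, and that the part after the `git-v1:` prefix is the git commit hash. Both are readings of the data shown here rather than confirmed behaviour, and the function names are made up.

```python
import json

# The json_data value quoted above, copied verbatim (address already redacted).
JSON_DATA = (
    '{"last_revision_id": "git-v1:67a3e12b05e055c0415c556a315a3d3eb637e29e", '
    '"last_scanned_id": "git-v1:b3f2f6cd1ff935ecac9a5346904b899d7af689fe", '
    '"from_address": "<email address hidden>"}'
)

GIT_PREFIX = "git-v1:"


def revision_range(json_data):
    """Return (last_scanned_id, last_revision_id) from a job's json_data."""
    data = json.loads(json_data)
    return data["last_scanned_id"], data["last_revision_id"]


def git_sha(revision_id):
    """Strip the assumed 'git-v1:' prefix to get the underlying git hash."""
    if not revision_id.startswith(GIT_PREFIX):
        raise ValueError("not a git-v1 revision id: %r" % (revision_id,))
    return revision_id[len(GIT_PREFIX):]


old_id, new_id = revision_range(JSON_DATA)
print("diff range: %s..%s" % (git_sha(old_id), git_sha(new_id)))
```

The two hashes could then be handed to the bazaar team, or looked up in the upstream git repository, to see how large the range is.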

Tim Penhey (thumper) wrote on 2010-06-28:

#14

Rob, the description was changed when I found the branch.

+ Found an oops for this: OOPS-1612BM1
+ branch: ~vcs-imports/linux/trunk
+ branch_job_id: 1658563

Robert Collins (lifeless) on 2010-06-28

summary:

- sendbranchmail is eating memory
+ sendbranchmail with lp:~vcs-imports/linux/trunk is eating memory

Robert Collins (lifeless) on 2011-01-12

Changed in launchpad:
importance:	High → Critical

Tim Penhey (thumper) on 2011-02-24

Changed in launchpad:
assignee:	Tim Penhey (thumper) → nobody

Aaron Bentley (abentley) wrote on 2011-05-27:

#15

A quick note: "Tim, you mentioned that the same job runs on the next iteration of sendbranchmail?"

Actually, no. That job will be in the RUNNING state. (Or, if killed nicely with SIGINT, in the FAILED state.) Only WAITING jobs will be run.
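
The selection rule described above can be sketched like this. It is a toy model only: the `JobStatus` enum and `Job` tuple here are stand-ins with just the three states named in this comment, not Launchpad's actual classes.

```python
from collections import namedtuple
from enum import Enum


class JobStatus(Enum):
    WAITING = "waiting"
    RUNNING = "running"
    FAILED = "failed"


# Hypothetical minimal job record.
Job = namedtuple("Job", ["job_id", "status"])


def runnable_jobs(jobs):
    """Only WAITING jobs are picked up by the next run."""
    return [job for job in jobs if job.status is JobStatus.WAITING]


jobs = [
    Job(1, JobStatus.WAITING),
    # A job whose process was killed hard stays RUNNING and is not retried.
    Job(2, JobStatus.RUNNING),
    # A job stopped nicely with SIGINT ends up FAILED and is not retried either.
    Job(3, JobStatus.FAILED),
]

print([job.job_id for job in runnable_jobs(jobs)])
```

Under this rule a killed job would not come back on the next cron iteration, which fits the observation earlier in the thread that the gobble did not repeat immediately.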

Aaron Bentley (abentley) on 2011-06-23

Changed in launchpad:
assignee:	nobody → Aaron Bentley (abentley)

Launchpad QA Bot (lpqabot) wrote on 2011-06-24:

#16

Fixed in stable r13292 <http://bazaar.launchpad.net/~launchpad-pqm/launchpad/stable/revision/13292>.

tags:	added: qa-needstesting
Changed in launchpad:
status:	Triaged → Fix Committed

Aaron Bentley (abentley) on 2011-06-24

tags:

added: qa-untestable
removed: qa-needstesting

William Grant (wgrant) on 2011-06-27

Changed in launchpad:
status:	Fix Committed → Fix Released

Aaron Bentley (abentley) on 2011-07-18

tags:

added: qa-ok
removed: qa-untestable

Aaron Bentley (abentley) wrote on 2011-07-28:

#17

The branch that landed a fix for Launchpad was rolled back.

Changed in launchpad:
status:	Fix Released → Triaged

Curtis Hovey (sinzui) on 2012-09-11

Changed in launchpad:
assignee:	Aaron Bentley (abentley) → nobody

William Grant (wgrant) on 2012-11-20

tags:

added: bzr

Wojciech (wojaugustow) on 2015-12-07

Changed in bzr:
status:	Confirmed → Fix Committed

Colin Watson (cjwatson) on 2015-12-07

Changed in bzr:
status:	Fix Committed → Confirmed

Jelmer Vernooij (jelmer) on 2017-11-08

tags:

added: check-for-breezy

Jelmer Vernooij (jelmer) on 2017-11-12

tags:

removed: check-for-breezy

Juhász Dominika (dominika92) on 2020-08-09

summary:
- sendbranchmail with lp:~vcs-imports/linux/trunk is eating memory
+ sendbranchmail lp-vel: ~ vcs-import / linux / trunk memóriát eszik
information type:	Public → Private

Colin Watson (cjwatson) on 2020-08-09

information type:	Private → Public
summary:
- sendbranchmail lp-vel: ~ vcs-import / linux / trunk memóriát eszik
+ sendbranchmail with lp:~vcs-imports/linux/trunk is eating memory

Gazaliy Alade (al42925) on 2021-04-04

Changed in bzr:
status:	Confirmed → Fix Committed

Colin Watson (cjwatson) on 2021-04-04

Changed in bzr:
status:	Fix Committed → Confirmed

Gazaliy Alade (al42925) on 2021-04-29

Changed in bzr:
status:	Confirmed → Fix Committed

Colin Watson (cjwatson) on 2021-05-05

Changed in bzr:
status:	Fix Committed → Confirmed

Ubuntu Foundations Team Bug Bot (crichton) wrote on 2024-08-03:

#18

The attachment "__repr__ method for BranchJobDerived" seems to be a patch. If it isn't, please remove the "patch" flag from the attachment, remove the "patch" tag, and if you are a member of the ~ubuntu-reviewers, unsubscribe the team.

[This is an automated message performed by a Launchpad user owned by ~brian-murray, for any issues please contact him.]

tags:

added: patch