Bug #416990 “Memory usage of codehosting processes is excessive” : Bugs : Launchpad itself

Revision history for this message

Tom Haddon (mthaddon) wrote on 2009-08-21:

#1

Currently seeing one of the bzr lp-serve processes using 1.3G RES, 2.5G VIRT:

https://pastebin.canonical.com/21400/

Revision history for this message

Michael Hudson-Doyle (mwhudson) wrote on 2009-08-26:

#2

The problem in the description is really a bzr issue. We just run the smart server.

If the memory usage per process doesn't go down (and I don't think it will significantly in the near term) and usage increases (as we surely hope it will), then we'll need to start using multiple front-end boxes (something I've been banging on about for months). I don't think we have much choice really.

Revision history for this message

Robert Collins (lifeless) wrote on 2009-08-26: Re: [Bug 416990] Re: Memory usage of codehosting processes is excessive

#3

We know bzr can use a lot of memory, but telling us that won't help us
fix it - we already know.

To fix it we need to know what *that specific process was doing*, so
that we can reproduce, test, analyse and fix.

Getting:
- the branch being operated on
- how many packs and indices the branch had, its format
- what smart server verbs were called

would be great.

The first is essential to have any hope of debugging. the latter is also
extremely important. The middle detail would be gravy.

-Rob

tags memory
importance high
status confirmed
done

Changed in bzr:
importance:	Undecided → High
status:	New → Confirmed

Jonathan Lange (jml) on 2009-08-26

Changed in launchpad-code:
status:	Confirmed → Triaged

Revision history for this message

Tom Haddon (mthaddon) wrote on 2009-08-26:

#4

On Wed, 2009-08-26 at 04:49 +0000, Robert Collins wrote:
> We know bzr can use a lot of memory, but telling us that won't help us
> fix it - we already know.
>
> To fix it we need to know what *that specific process was doing*, so
> that we can reproduce, test, analyse and fix.
>
> Getting:
> - the branch being operated on
> - how many packs and indices the branch had, its format

Here's an example (taking the first process listing in that top output
as an example):

https://pastebin.canonical.com/21515/

> - what smart server verbs were called

I asked Aaron about this and he said he doesn't think we log what verbs
were used. Let me know if that's wrong and how to find it.

Revision history for this message

John A Meinel (jameinel) wrote on 2009-08-26: Re: [Bug 416990] Re: Memory usage of codehosting processes is excessive

#5

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Tom Haddon wrote:
> On Wed, 2009-08-26 at 04:49 +0000, Robert Collins wrote:
>> We know bzr can use a lot of memory, but telling us that won't help us
>> fix it - we already know.
>>
>> To fix it we need to know what *that specific process was doing*, so
>> that we can reproduce, test, analyse and fix.
>>
>> Getting:
>> - the branch being operated on
>> - how many packs and indices the branch had, its format
>
> Here's an example (taking the first process listing in that top output
> as an example):
>
> https://pastebin.canonical.com/21515/
>
>> - what smart server verbs were called
>
> I asked Aaron about this and he said he doesn't think we log what verbs
> were used. Let me know if that's wrong and how to find it.
>

Well, you could run with "-Dhpss" which would log them. I'm not sure how
to inject that via the setup. Perhaps setting the config entry:

debug_flags = hpss

In ~/.bazaar/bazaar.conf

However, when you encounter one of these processes, if you could a
terminal attached, then you could use SIGQUIT to break into the
debugger, and figure out what is currently happening.

I wonder if we would want to hook up SIGHUP or something along those
lines to just dump the current backtrace into .bzr.log, as a method for
debugging service processes...

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAkqVjk0ACgkQJdeBCYSNAAPcCACfUMTWljiDKh1WIgortHZxjJhZ
TiwAn0ai68WwQbLY7UN7Y8i/lq2r+A8T
=heI6
-----END PGP SIGNATURE-----

Revision history for this message

John A Meinel (jameinel) wrote on 2009-08-26:

#6

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

...

>> Getting:
>> - the branch being operated on
>> - how many packs and indices the branch had, its format
>
> Here's an example (taking the first process listing in that top output
> as an example):
>
> https://pastebin.canonical.com/21515/

So that tells us:

1) It is a mysql maria branch, presumably: ~maria-captains/maria/1.5
2) It is a 1.9 format branch (so not --2a)
3) It currently has open one of the very large text index files:

python2.4 13024 codehost 4r REG 254,0 14913650 97550603
/srv.../indices/05ff4b81be41c0ce966dc98a45e87208.tix
python2.4 13024 codehost 12r FIFO 0,5 325267728 pipe

But that seems to be the only file open as part of the actual branch.
All the rest appear to just be python eggs or source, or extensions, etc.

4) You are running on python2.4

I know of at least one fairly serious memory related fix in python 2.5:
http://evanjones.ca/python-memory-part3.html

Whether that is at play here or not, I don't know. But long-lived
processes and python2.4 don't play nicely with memory.

5) Individually no process seems to be more that 7-800MB, but since you
have several running at once, it adds up.

6) The process has been running for ~6min. My guess is that it is
someone doing a full checkout (from scratch) of the code.

7) 28046 codehost 20 0 410m 96m 2300 R 44 1.2 293:03.42
/usr/bin/python2.4
/srv/bazaar.launchpad.net/production/launchpad/bin/twistd --pidfile=/srv/baz

The service process that is spawning all of this has a surprising (to
me) amount of memory consumed. I honestly don't know what it is doing,
but it has allocated 410MB of ram. (If I understand VIRT correctly.)
96MB resident is reasonable, though.

8) 542MB resident == 6.8% ram, so you have approx 8GB of ram on the
system. So you can run <20 of these processes before having very serious
issues. (And probably more like <10 before you are no longer caching
disk buffers, and starting to have problems.) (I guess that was also in
the meminfo at the beginning.)

9) I assume 'bzr lp-serve' is used to send/receive data from the outside
world, not for the mirroring process?

I'll try a little bit of experimenting here, to see if I can pinpoint
the memory consumption a little bit better. It certainly would help if
we knew what actual action is being performed...

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAkqVmCsACgkQJdeBCYSNAAPRawCdEdgYGU7QfxpRrDQ2RtyOrOE6
hfcAn2Vp3We4SmZ6q3LmEdNLTDpWC6kM
=g2sQ
-----END PGP SIGNATURE-----

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

...

>> Getting:
>>  - the branch being operated on
>>  - how many packs and indices the branch had, its format
> 
> Here's an example (taking the first process listing in that top output
> as an example):
> 
> https://pastebin.canonical.com/21515/

So that tells us:

1) It is a mysql maria branch, presumably: ~maria-captains/maria/1.5
2) It is a 1.9 format branch (so not --2a)
3) It currently has open one of the very large text index files:

python2.4 13024 codehost    4r   REG  254,0 14913650  97550603
   /srv.../indices/05ff4b81be41c0ce966dc98a45e87208.tix
python2.4 13024 codehost   12r  FIFO    0,5          325267728 pipe

But that seems to be the only file open as part of the actual branch.
All the rest appear to just be python eggs or source, or extensions, etc.

4) You are running on python2.4

I know of at least one fairly serious memory related fix in python 2.5:
http://evanjones.ca/python-memory-part3.html

Whether that is at play here or not, I don't know. But long-lived
processes and python2.4 don't play nicely with memory.

5) Individually no process seems to be more that 7-800MB, but since you
have several running at once, it adds up.

6) The process has been running for ~6min. My guess is that it is
someone doing a full checkout (from scratch) of the code.

7) 28046 codehost  20   0  410m  96m 2300 R   44  1.2 293:03.42
/usr/bin/python2.4
/srv/bazaar.launchpad.net/production/launchpad/bin/twistd --pidfile=/srv/baz

The service process that is spawning all of this has a surprising (to
me) amount of memory consumed. I honestly don't know what it is doing,
but it has allocated 410MB of ram. (If I understand VIRT correctly.)
96MB resident is reasonable, though.

8) 542MB resident == 6.8% ram, so you have approx 8GB of ram on the
system. So you can run <20 of these processes before having very serious
issues. (And probably more like <10 before you are no longer caching
disk buffers, and starting to have problems.) (I guess that was also in
the meminfo at the beginning.)

9) I assume 'bzr lp-serve' is used to send/receive data from the outside
world, not for the mirroring process?

I'll try a little bit of experimenting here, to see if I can pinpoint
the memory consumption a little bit better. It certainly would help if
we knew what actual action is being performed...

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAkqVmCsACgkQJdeBCYSNAAPRawCdEdgYGU7QfxpRrDQ2RtyOrOE6
hfcAn2Vp3We4SmZ6q3LmEdNLTDpWC6kM
=g2sQ
-----END PGP SIGNATURE-----

Revision history for this message

Tom Haddon (mthaddon) wrote on 2009-08-26: Re: [Bug 416990] Re: Memory usage of codehosting processes is excessive

#7

On Wed, 2009-08-26 at 20:16 +0000, John A Meinel wrote:
> >> Getting:
> >> - the branch being operated on
> >> - how many packs and indices the branch had, its format
> >
> > Here's an example (taking the first process listing in that top output
> > as an example):
> >
> > https://pastebin.canonical.com/21515/
>
>
> So that tells us:
>
> 1) It is a mysql maria branch, presumably: ~maria-captains/maria/1.5
> 2) It is a 1.9 format branch (so not --2a)
> 3) It currently has open one of the very large text index files:
>
> python2.4 13024 codehost 4r REG 254,0 14913650 97550603
> /srv.../indices/05ff4b81be41c0ce966dc98a45e87208.tix
> python2.4 13024 codehost 12r FIFO 0,5 325267728 pipe
>
>
> But that seems to be the only file open as part of the actual branch.
> All the rest appear to just be python eggs or source, or extensions, etc.
>
> 4) You are running on python2.4
>
> I know of at least one fairly serious memory related fix in python 2.5:
> http://evanjones.ca/python-memory-part3.html
>
> Whether that is at play here or not, I don't know. But long-lived
> processes and python2.4 don't play nicely with memory.

Ouch. All of LP runs on 2.4...

> 5) Individually no process seems to be more that 7-800MB, but since you
> have several running at once, it adds up.

I'm not sure 7-800MB is reasonable for even a single process, but fwiw,
we've seen individual processes with up to 1.4GB before.

> 6) The process has been running for ~6min. My guess is that it is
> someone doing a full checkout (from scratch) of the code.
>
>
> 7) 28046 codehost 20 0 410m 96m 2300 R 44 1.2 293:03.42
> /usr/bin/python2.4
> /srv/bazaar.launchpad.net/production/launchpad/bin/twistd --pidfile=/srv/baz
>
> The service process that is spawning all of this has a surprising (to
> me) amount of memory consumed. I honestly don't know what it is doing,
> but it has allocated 410MB of ram. (If I understand VIRT correctly.)
> 96MB resident is reasonable, though.
>
> 8) 542MB resident == 6.8% ram, so you have approx 8GB of ram on the
> system. So you can run <20 of these processes before having very serious
> issues. (And probably more like <10 before you are no longer caching
> disk buffers, and starting to have problems.) (I guess that was also in
> the meminfo at the beginning.)

Yep, the server has 8GB RAM.

> 9) I assume 'bzr lp-serve' is used to send/receive data from the outside
> world, not for the mirroring process?
>
>
> I'll try a little bit of experimenting here, to see if I can pinpoint
> the memory consumption a little bit better. It certainly would help if
> we knew what actual action is being performed...
>
> John
> =:->

On Wed, 2009-08-26 at 20:16 +0000, John A Meinel wrote:
> >> Getting:
> >>  - the branch being operated on
> >>  - how many packs and indices the branch had, its format
> > 
> > Here's an example (taking the first process listing in that top output
> > as an example):
> > 
> > https://pastebin.canonical.com/21515/
> 
> 
> So that tells us:
> 
> 1) It is a mysql maria branch, presumably: ~maria-captains/maria/1.5
> 2) It is a 1.9 format branch (so not --2a)
> 3) It currently has open one of the very large text index files:
> 
> python2.4 13024 codehost    4r   REG  254,0 14913650  97550603
>    /srv.../indices/05ff4b81be41c0ce966dc98a45e87208.tix
> python2.4 13024 codehost   12r  FIFO    0,5          325267728 pipe
> 
> 
> But that seems to be the only file open as part of the actual branch.
> All the rest appear to just be python eggs or source, or extensions, etc.
> 
> 4) You are running on python2.4
> 
> I know of at least one fairly serious memory related fix in python 2.5:
> http://evanjones.ca/python-memory-part3.html
> 
> Whether that is at play here or not, I don't know. But long-lived
> processes and python2.4 don't play nicely with memory.

Ouch. All of LP runs on 2.4...

> 5) Individually no process seems to be more that 7-800MB, but since you
> have several running at once, it adds up.

I'm not sure 7-800MB is reasonable for even a single process, but fwiw,
we've seen individual processes with up to 1.4GB before.

> 6) The process has been running for ~6min. My guess is that it is
> someone doing a full checkout (from scratch) of the code.
> 
> 
> 7) 28046 codehost  20   0  410m  96m 2300 R   44  1.2 293:03.42
> /usr/bin/python2.4
> /srv/bazaar.launchpad.net/production/launchpad/bin/twistd --pidfile=/srv/baz
> 
> The service process that is spawning all of this has a surprising (to
> me) amount of memory consumed. I honestly don't know what it is doing,
> but it has allocated 410MB of ram. (If I understand VIRT correctly.)
> 96MB resident is reasonable, though.
> 
> 8) 542MB resident == 6.8% ram, so you have approx 8GB of ram on the
> system. So you can run <20 of these processes before having very serious
> issues. (And probably more like <10 before you are no longer caching
> disk buffers, and starting to have problems.) (I guess that was also in
> the meminfo at the beginning.)

Yep, the server has 8GB RAM.

> 9) I assume 'bzr lp-serve' is used to send/receive data from the outside
> world, not for the mirroring process?
> 
> 
> I'll try a little bit of experimenting here, to see if I can pinpoint
> the memory consumption a little bit better. It certainly would help if
> we knew what actual action is being performed...
> 
> John
> =:->

Revision history for this message

John A Meinel (jameinel) wrote on 2009-08-26: Re: [Bug 416990] Re: Memory usage of codehosting processes is excessive

#8

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

John A Meinel wrote:
...

> 5) Individually no process seems to be more that 7-800MB, but since you
> have several running at once, it adds up.

So in doing a local bzr:// branch of a mysql-5.1 repo, I see the server
hit a peak of 299MB, and the client hit a peak of 178MB. (using
python2.5 and bzr.dev 4595.

Some other things come to mind, like they are using an older bzr client
which is causing more VFS style operations than expected.

299 is still a bit higher than "strictly necessary", though I wonder by
how much...

John
=:->

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAkqVnpsACgkQJdeBCYSNAAMhfQCgvep99aSn6hGneliEsBMZh/X8
U+AAn2X4LGPgDTdzXsmCXRIBnWinsnj9
=+bQ8
-----END PGP SIGNATURE-----

Revision history for this message

Jonathan Lange (jml) wrote on 2009-08-26:

#9

John, 'bzr lp-serve' is almost exactly the same thing as 'bzr serve'. The difference is that we set up the smart server to have our virtual file system.

The process that spawns these bzr processes is also an SFTP server. This might explain the high memory usage.

I'd be more than happy to talk about ways to get better logging.

Revision history for this message

Tom Haddon (mthaddon) wrote on 2009-08-26: Re: [Bug 416990] Re: Memory usage of codehosting processes is excessive

#10

On Wed, 2009-08-26 at 20:44 +0000, John A Meinel wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> John A Meinel wrote:
> ...
>
> > 5) Individually no process seems to be more that 7-800MB, but since you
> > have several running at once, it adds up.
>
> So in doing a local bzr:// branch of a mysql-5.1 repo, I see the server
> hit a peak of 299MB, and the client hit a peak of 178MB. (using
> python2.5 and bzr.dev 4595.

Can you do the same with the versions we're using for LP (of python and
bzr) to confirm there's a difference and not just that it's a difference
of environment?

> Some other things come to mind, like they are using an older bzr client
> which is causing more VFS style operations than expected.
>
> 299 is still a bit higher than "strictly necessary", though I wonder by
> how much...
>
> John
> =:->
>
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.9 (Cygwin)
> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/
>
> iEYEARECAAYFAkqVnpsACgkQJdeBCYSNAAMhfQCgvep99aSn6hGneliEsBMZh/X8
> U+AAn2X4LGPgDTdzXsmCXRIBnWinsnj9
> =+bQ8
> -----END PGP SIGNATURE-----
>

Revision history for this message

Robert Collins (lifeless) wrote on 2009-08-26:

#11

Also, do you have the C extensions built for bzr?

Revision history for this message

Tom Haddon (mthaddon) wrote on 2009-08-26:

#12

We do have the C extensions built:

https://pastebin.canonical.com/21534/

Revision history for this message

John A Meinel (jameinel) wrote on 2009-08-27: Re: [Bug 416990] Re: Memory usage of codehosting processes is excessive

#13

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Tom Haddon wrote:
> On Wed, 2009-08-26 at 20:44 +0000, John A Meinel wrote:
> John A Meinel wrote:
> ...
>
>>>> 5) Individually no process seems to be more that 7-800MB, but since you
>>>> have several running at once, it adds up.
> So in doing a local bzr:// branch of a mysql-5.1 repo, I see the server
> hit a peak of 299MB, and the client hit a peak of 178MB. (using
> python2.5 and bzr.dev 4595.
>
>> Can you do the same with the versions we're using for LP (of python and
>> bzr) to confirm there's a difference and not just that it's a difference
>> of environment?

I cannot trivially do so.

Assuming you are using bzr 1.17 on LP, we don't really have any idea
what version the client is, without a log. But we could also assume 1.17
for now.

Going further, I'm also not using the exact same branch. I'm using a
mysql-5.1 not mysql-maria. Nor will it have exactly the same
packs-on-disk layout, etc.

But the real reason is that for whatever reason python2.4 doesn't like
my bzr. In fact my python 2.4 has some real problems:

$ python2.4 -c 'import locale'
Traceback (most recent call last):
File "<string>", line 1, in ?
ImportError: No module named locale

Maybe I'll look closer at it tomorrow, but I'd rather just look at the
current situation.

I just realized... if you have 8GB of ram you almost certainly are
running a 64-bit platform. And memory consumption on 64-bit python is
pretty much always 2x. Not exactly, but it gives a really good ballpark.
(Almost every object is a pointer or a 'long' both of which go from
32-bit => 64-bit...)

So probably my 299MB fits the 540MB resident that you were seeing.

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAkqV9C4ACgkQJdeBCYSNAAO8EQCePHpdDzrWIAbjJbGzAhWfyqRN
dGMAoIjB1xNucqbZqm/o3C/HZaZP9t/p
=CY3B
-----END PGP SIGNATURE-----

Revision history for this message

Tom Haddon (mthaddon) wrote on 2009-08-27: Re: [Bug 416990] Re: Memory usage of codehosting processes is excessive

#14

On Thu, 2009-08-27 at 02:49 +0000, John A Meinel wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Tom Haddon wrote:
> > On Wed, 2009-08-26 at 20:44 +0000, John A Meinel wrote:
> > John A Meinel wrote:
> > ...
> >
> >>>> 5) Individually no process seems to be more that 7-800MB, but since you
> >>>> have several running at once, it adds up.
> > So in doing a local bzr:// branch of a mysql-5.1 repo, I see the server
> > hit a peak of 299MB, and the client hit a peak of 178MB. (using
> > python2.5 and bzr.dev 4595.
> >
> >> Can you do the same with the versions we're using for LP (of python and
> >> bzr) to confirm there's a difference and not just that it's a difference
> >> of environment?
>
> I cannot trivially do so.
>
> Assuming you are using bzr 1.17 on LP, we don't really have any idea
> what version the client is, without a log. But we could also assume 1.17
> for now.
>
> Going further, I'm also not using the exact same branch. I'm using a
> mysql-5.1 not mysql-maria. Nor will it have exactly the same
> packs-on-disk layout, etc.
>
> But the real reason is that for whatever reason python2.4 doesn't like
> my bzr. In fact my python 2.4 has some real problems:
>
> $ python2.4 -c 'import locale'
> Traceback (most recent call last):
> File "<string>", line 1, in ?
> ImportError: No module named locale
>
> Maybe I'll look closer at it tomorrow, but I'd rather just look at the
> current situation.
>
> I just realized... if you have 8GB of ram you almost certainly are
> running a 64-bit platform.

Yes, pretty much all our servers are 64-bit.

> And memory consumption on 64-bit python is
> pretty much always 2x. Not exactly, but it gives a really good ballpark.
> (Almost every object is a pointer or a 'long' both of which go from
> 32-bit => 64-bit...)
>
> So probably my 299MB fits the 540MB resident that you were seeing.

In that case, that suggests that there's no improvement from moving from
python2.4 -> python2.5, right?

>
> John
> =:->
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.9 (Cygwin)
> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/
>
> iEYEARECAAYFAkqV9C4ACgkQJdeBCYSNAAO8EQCePHpdDzrWIAbjJbGzAhWfyqRN
> dGMAoIjB1xNucqbZqm/o3C/HZaZP9t/p
> =CY3B
> -----END PGP SIGNATURE-----
>

On Thu, 2009-08-27 at 02:49 +0000, John A Meinel wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> Tom Haddon wrote:
> > On Wed, 2009-08-26 at 20:44 +0000, John A Meinel wrote:
> > John A Meinel wrote:
> > ...
> > 
> >>>> 5) Individually no process seems to be more that 7-800MB, but since you
> >>>> have several running at once, it adds up.
> > So in doing a local bzr:// branch of a mysql-5.1 repo, I see the server
> > hit a peak of 299MB, and the client hit a peak of 178MB. (using
> > python2.5 and bzr.dev 4595.
> > 
> >> Can you do the same with the versions we're using for LP (of python and
> >> bzr) to confirm there's a difference and not just that it's a difference
> >> of environment?
> 
> I cannot trivially do so.
> 
> Assuming you are using bzr 1.17 on LP, we don't really have any idea
> what version the client is, without a log. But we could also assume 1.17
> for now.
> 
> Going further, I'm also not using the exact same branch. I'm using a
> mysql-5.1 not mysql-maria. Nor will it have exactly the same
> packs-on-disk layout, etc.
> 
> But the real reason is that for whatever reason python2.4 doesn't like
> my bzr. In fact my python 2.4 has some real problems:
> 
> $ python2.4 -c 'import locale'
> Traceback (most recent call last):
>   File "<string>", line 1, in ?
> ImportError: No module named locale
> 
> Maybe I'll look closer at it tomorrow, but I'd rather just look at the
> current situation.
> 
> I just realized... if you have 8GB of ram you almost certainly are
> running a 64-bit platform.

Yes, pretty much all our servers are 64-bit.

> And memory consumption on 64-bit python is
> pretty much always 2x. Not exactly, but it gives a really good ballpark.
> (Almost every object is a pointer or a 'long' both of which go from
> 32-bit => 64-bit...)
> 
> So probably my 299MB fits the 540MB resident that you were seeing.

In that case, that suggests that there's no improvement from moving from
python2.4 -> python2.5, right?

> 
> John
> =:->
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.9 (Cygwin)
> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/
> 
> iEYEARECAAYFAkqV9C4ACgkQJdeBCYSNAAO8EQCePHpdDzrWIAbjJbGzAhWfyqRN
> dGMAoIjB1xNucqbZqm/o3C/HZaZP9t/p
> =CY3B
> -----END PGP SIGNATURE-----
>

Revision history for this message

Andrew Bennetts (spiv) wrote on 2009-10-13:

#15

John's just landed some work that cuts down memory use a fair bit for large branches. IIRC we expect savings of roughly 20MB for a branch the size of Launchpad, depending on factors like exactly how much of the repository indices are loaded. So if we're lucky it might cut loggerhead's memory consumption by 10%.

So this bug certainly isn't fixed yet, but hopefully when Launchpad rolls out 2.1 the problem will be eased a little bit. It would be interesting to know just how much the new version helps, I guess if we monitor the lp graphs we'll be able to see what difference (if any) it makes?

Revision history for this message

John A Meinel (jameinel) wrote on 2009-10-13: Re: [Bug 416990] Re: Memory usage of codehosting processes is excessive

#16

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Andrew Bennetts wrote:
> John's just landed some work that cuts down memory use a fair bit for
> large branches. IIRC we expect savings of roughly 20MB for a branch the
> size of Launchpad, depending on factors like exactly how much of the
> repository indices are loaded. So if we're lucky it might cut
> loggerhead's memory consumption by 10%.
>
> So this bug certainly isn't fixed yet, but hopefully when Launchpad
> rolls out 2.1 the problem will be eased a little bit. It would be
> interesting to know just how much the new version helps, I guess if we
> monitor the lp graphs we'll be able to see what difference (if any) it
> makes?
>

Just to mention, the fixes won't be in the 2.1.0b1 release, but will be
in 2.1.0b2. There has been a fair amount of fallout between different
platforms, etc. So I'm going to be conservative and land it at the
beginning of 2.1.0b2, rather than at the end of 2.1.0b1.

John
=:->
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Cygwin)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/

iEYEARECAAYFAkrUl/IACgkQJdeBCYSNAAOYZQCg0yBZ/TrOfveTPRumXokVEtHZ
B34An01X/VQBCh1AKNJAqDxfcwICMld4
=8xSl
-----END PGP SIGNATURE-----

Revision history for this message

Michael Hudson-Doyle (mwhudson) wrote on 2009-11-23:

#17

We've landed the change to use bzr 2.1b3 so this should improve a bunch after the next rollout.

We can also probably put a cap in place on the consumption of each process like we do for the puller. The user experience won't be great when their pull which will presumably have been running for a long time suddenly croaks because the process at the far end got aborted though.

I don't see the issue that each process takes a noticeable hunk of ram and we run one process per connection going away.

The long term solution still has to be running the bzr lp-serve process on multiple hosts.

Revision history for this message

Tom Haddon (mthaddon) wrote on 2009-12-07:

#18

This is becoming more of a problem - see https://pastebin.canonical.com/25430/. We've had numerous swap alerts for crowberry over the last week or so.

Revision history for this message

Chris Jones (cmsj) wrote on 2009-12-07:

#19

We have actually seen a few instances now of the OOM Killer stepping in:

Dec 4 22:08:49 crowberry kernel: [1335795.418553] Out of memory: kill process 16826 (python2.5) score 661096 or a child
Dec 6 14:17:47 crowberry kernel: [1480269.669812] Out of memory: kill process 25020 (python2.5) score 483323 or a child
Dec 7 09:48:23 crowberry kernel: [1550427.592971] Out of memory: kill process 28630 (python2.5) score 382509 or a child
Dec 7 09:50:54 crowberry kernel: [1550630.331084] Out of memory: kill process 1500 (python2.5) score 270699 or a child
Dec 7 09:54:51 crowberry kernel: [1550855.178426] Out of memory: kill process 26136 (twistd) score 270708 or a child

Revision history for this message

Michael Hudson-Doyle (mwhudson) wrote on 2009-12-07:

#20

Ugh, I had hoped that the situation wasn't this bad yet :(

I had also hoped that the last rollout would help.

Can we add more RAM to the box? I realize this is Not A Solution, but it might give us some breathing room.

I sense a reprioritization in my future...

Revision history for this message

John A Meinel (jameinel) wrote on 2009-12-07:

#21

Obviously the bzr version looks like it includes the latest memory improvements. Are we sure that all of the extensions got compiled properly? I wouldn't be surprised if memory consumption was >= before if _static_tuple_c.c didn't get compiled properly. There were more fixes than just that, which should have reduced memory consumption.

As for OOM... isn't this python in 64bit mode? If it was 32-bit python then I could certainly see us hitting the limit. If this is python2.4, then I'll also not guarantee too much. SInce in python2.4 allocations of "integer" objects are never released back to the system. And I would offer that there are probably some other memory allocation improvements in python 2.5 and newer.

Also, is there any way to probe what actual command is running on a process that hits 1GB of RAM? (What branch is it accessing? What action is it performing, etc.)

I wonder if it would be useful to hook up a SIGHUP handler that would dump the current stack trace into .bzr.log. Do you think it would help debugging this? (It seems like something that we could do in a small plugin if it would make it easier.)

Revision history for this message

Michael Hudson-Doyle (mwhudson) wrote on 2009-12-07:

#22

John A Meinel wrote:
> Obviously the bzr version looks like it includes the latest memory
> improvements. Are we sure that all of the extensions got compiled
> properly? I wouldn't be surprised if memory consumption was >= before if
> _static_tuple_c.c didn't get compiled properly. There were more fixes
> than just that, which should have reduced memory consumption.

It looks like everything got built correctly:
https://pastebin.canonical.com/25447/

> As for OOM... isn't this python in 64bit mode? If it was 32-bit python
> then I could certainly see us hitting the limit.

The machine ran out of swap.

> If this is python2.4,
> then I'll also not guarantee too much.

You don't have to look very hard at the log of the OOM killer Chris
provided to see that it's clearly python 2.5 being killed.

> SInce in python2.4 allocations of
> "integer" objects are never released back to the system. And I would
> offer that there are probably some other memory allocation improvements
> in python 2.5 and newer.
>
> Also, is there any way to probe what actual command is running on a
> process that hits 1GB of RAM? (What branch is it accessing? What action
> is it performing, etc.)

No :( Unless bzr serve does some of this. Is there a server side
equivalent of -Dhpss?

> I wonder if it would be useful to hook up a SIGHUP handler that would
> dump the current stack trace into .bzr.log. Do you think it would help
> debugging this? (It seems like something that we could do in a small
> plugin if it would make it easier.)

It would be nice if we could have each lp-serve process not dump
everything into the same .bzr.log file I guess, but otherwise that
sounds helpful.

Cheers,
mwh

Revision history for this message

John A Meinel (jameinel) wrote on 2009-12-07: Re: [Bug 416990] Re: Memory usage of codehosting processes is excessive

#23

Download full text (3.5 KiB)

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Michael Hudson wrote:
> John A Meinel wrote:
>> Obviously the bzr version looks like it includes the latest memory
>> improvements. Are we sure that all of the extensions got compiled
>> properly? I wouldn't be surprised if memory consumption was >= before if
>> _static_tuple_c.c didn't get compiled properly. There were more fixes
>> than just that, which should have reduced memory consumption.
>
> It looks like everything got built correctly:
> https://pastebin.canonical.com/25447/

This doesn't look right to me: vv
- -rwxr-xr-x 1 ... 285 ... _simple_set_pyx.py*
- -rw-r--r-- 1 ... 699 ... _simple_set_pyx.pyc
- -rw-r--r-- 1 ... 699 ... _simple_set_pyx.pyo
- -rwxr-xr-x 1 ... 99534 ... _simple_set_pyx.so*

On my system I see:
$ ls bzrlib/_simple_set_pyx*

bzrlib/_simple_set_pyx.c
bzrlib/_simple_set_pyx.pxd
bzrlib/_simple_set_pyx.pyx
bzrlib/_simple_set_pyx.h
bzrlib/_simple_set_pyx.pyd
bzrlib/_simple_set_pyx_api.h

Similarly here:
- -rwxr-xr-x 1 ... 285 ... _static_tuple_c.py*
- -rw-r--r-- 1 ... 699 ... _static_tuple_c.pyc
- -rw-r--r-- 1 ... 699 ... _static_tuple_c.pyo
- -rwxr-xr-x 1 ... 56062 ... _static_tuple_c.so*
- -rwxr-xr-x 1 ... 1557 ... static_tuple.py*
- -rw-r--r-- 1 ... 1301 ... static_tuple.pyc
- -rw-r--r-- 1 ... 1301 ... static_tuple.pyo
- -rwxr-xr-x 1 ... 2760 ... _static_tuple_py.py*
- -rw-r--r-- 1 ... 3529 ... _static_tuple_py.pyc
- -rw-r--r-- 1 ... 3529 ... _static_tuple_py.pyo

I have 0 idea what is creating a 285 byte .py file for a compiled
extension. However, it could very easily cause problems using the .so...

>
>> As for OOM... isn't this python in 64bit mode? If it was 32-bit python
>> then I could certainly see us hitting the limit.
>
> The machine ran out of swap.
>
>> If this is python2.4,
>> then I'll also not guarantee too much.
>
> You don't have to look very hard at the log of the OOM killer Chris
> provided to see that it's clearly python 2.5 being killed.
>
>> SInce in python2.4 allocations of
>> "integer" objects are never released back to the system. And I would
>> offer that there are probably some other memory allocation improvements
>> in python 2.5 and newer.
>>
>> Also, is there any way to probe what actual command is running on a
>> process that hits 1GB of RAM? (What branch is it accessing? What action
>> is it performing, etc.)
>
> No :( Unless bzr serve does some of this. Is there a server side
> equivalent of -Dhpss?

No, and it has been something we wanted to get for the lp guys. I think
jml was originally wanting more than that. (It is one thing to be able
to tell that 'get_stream' was requested, it is another to know that this
is caused by 'bzr branch', versus 'bzr pull', versus ...)

>
>> I wonder if it would be useful to hook up a SIGHUP handler that would
>> dump the current stack trace into .bzr.log. Do you think it would help
>> debugging this? (It seems like something that we could do in a small
>> plugin if it would make it easier.)
>
> It would be nice if we could have each lp-serve process not dump
> everything into the same .bzr.log file I guess, but otherwise that
> sounds helpful.
...

Affects		Status	Importance	Assigned to	Milestone
	Bazaar	Confirmed	High	Unassigned
	Launchpad itself	Triaged	High	Unassigned

Launchpad itself

Memory usage of codehosting processes is excessive

Bug Description

Other bug subscribers

Remote bug watches