celery branch scanners' memory usage keeps growing

Bug #1017754 reported by Haw Loeung
Affects: Launchpad itself
Status: Fix Released
Importance: Critical
Assigned to: Unassigned

Bug Description

Hi,

As per https://pastebin.canonical.com/68835/, it seems the celery branch scanner workers' memory usage continues to grow. The LP incident logs show that it was restarted once on the 22nd. lifeless suggests that this is a regression, as the previous branch scanners had memory caps in place.

https://pastebin.canonical.com/68836/ shows the current limits of one of the celery worker processes. Note that 'Max resident set' is unlimited.

Could you please look into this?

Thanks,

Haw
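(Editorial note: the pastebin isn't reproduced here, but the check it shows amounts to reading the worker's /proc/<pid>/limits. A minimal sketch, assuming a Linux host; the PID is supplied on the command line and is purely illustrative:)

#!/usr/bin/env python
# Sketch: inspect a process's memory-related limits by parsing /proc/<pid>/limits.
# Lines look like: "Max resident set   unlimited   unlimited   bytes"
import sys

def print_memory_limits(pid):
    with open('/proc/%d/limits' % pid) as limits_file:
        for line in limits_file:
            if line.startswith('Max resident set') or line.startswith('Max address space'):
                print(line.rstrip())

if __name__ == '__main__':
    print_memory_limits(int(sys.argv[1]))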

Haw Loeung (hloeung)
tags: added: canonical-losa-lp
Revision history for this message
Haw Loeung (hloeung) wrote :

11:07 <hloeung> right, and where is it done for the existing scan_branches.py?
11:07 <hloeung> I've tried grepping for 'ulimit' in the whole source tree
11:09 <wgrant> hloeung: Hahaha
11:09 <wgrant> It's in a wrapper
11:09 <wgrant> I'm pretty sure LP doesn't do it
11:09 <wgrant> Ah no
11:09 <wgrant> There we are
11:10 <wgrant> JobRunnerProcess.runJobCommand
11:10 <wgrant> if self.job_source.memory_limit is not None:
11:10 <wgrant> soft_limit, hard_limit = getrlimit(RLIMIT_AS)
11:10 <wgrant> if soft_limit != self.job_source.memory_limit:
11:10 <wgrant> limits = (self.job_source.memory_limit, hard_limit)
11:10 <wgrant> setrlimit(RLIMIT_AS, limits)
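(Editorial note: the logic wgrant quotes from JobRunnerProcess.runJobCommand boils down to the following pattern. This is a minimal standalone sketch, not LP's actual code; the 1 GiB figure is illustrative, not what LP configures:)

from resource import RLIMIT_AS, getrlimit, setrlimit

def cap_address_space(memory_limit):
    # Apply an address-space cap in the style of JobRunnerProcess.runJobCommand:
    # set the soft RLIMIT_AS to memory_limit (bytes) while keeping the existing
    # hard limit, and do nothing if no limit is configured or it already matches.
    if memory_limit is not None:
        soft_limit, hard_limit = getrlimit(RLIMIT_AS)
        if soft_limit != memory_limit:
            setrlimit(RLIMIT_AS, (memory_limit, hard_limit))

# Illustrative usage: cap the current process at 1 GiB of address space.
cap_address_space(1024 * 1024 * 1024)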

Changed in launchpad:
status: New → Triaged
importance: Undecided → Critical
Curtis Hovey (sinzui)
tags: added: celeryd
Revision history for this message
Haw Loeung (hloeung) wrote :

Still happening.

Before:

hloeung@ackee:~$ top
top - 08:04:32 up 112 days, 2:58, 1 user, load average: 2.15, 2.45, 2.41
Tasks: 240 total, 2 running, 238 sleeping, 0 stopped, 0 zombie
Cpu(s): 5.8%us, 0.3%sy, 5.6%ni, 87.7%id, 0.0%wa, 0.0%hi, 0.5%si, 0.0%st
Mem: 6112648k total, 4666020k used, 1446628k free, 16620k buffers
Swap: 2964472k total, 1391676k used, 1572796k free, 225752k cached

  PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
12526 bzrsyncd 20 0 1693m 1.2g 3384 S 0 21.3 20:12.90 [celeryd@ackee:
12528 bzrsyncd 20 0 1754m 862m 3420 S 0 14.5 17:24.91 [celeryd@ackee:
15598 launchpa 20 0 991m 641m 3300 R 39 10.7 17:03.14 python2.6
12527 bzrsyncd 20 0 813m 358m 3424 S 0 6.0 17:03.34 [celeryd@ackee:
25404 launchpa 36 16 626m 278m 9588 S 46 4.7 6:12.32 python2.6
28488 launchpa 20 0 608m 189m 9632 S 0 3.2 0:12.74 python2.6
13642 launchpa 20 0 608m 182m 3308 S 0 3.1 0:13.05 python2.6
13130 rabbitmq 20 0 471m 174m 1296 S 0 2.9 248:42.16 beam.smp
12484 bzrsyncd 20 0 418m 15m 2092 S 0 0.3 0:48.97 [celeryd@ackee:
 1544 launchpa 20 0 162m 10m 1116 S 0 0.2 94:26.06 txlongpoll: acc
15876 bzrsyncd 20 0 317m 9.9m 1944 S 0 0.2 0:03.09 [celerybeat] --
21252 launchpa 20 0 646m 7476 2040 S 0 0.1 13:46.76 python2.6
12503 bzrsyncd 20 0 346m 6164 1940 S 0 0.1 0:34.49 [celeryd@ackee:
30661 hloeung 20 0 29164 4748 2172 S 0 0.1 0:00.22 bash

After restarting bzrsyncd celeryd:

top - 08:08:38 up 112 days, 3:02, 1 user, load average: 2.26, 2.47, 2.43
Tasks: 228 total, 1 running, 227 sleeping, 0 stopped, 0 zombie
Cpu(s): 6.1%us, 0.1%sy, 5.5%ni, 87.7%id, 0.2%wa, 0.0%hi, 0.3%si, 0.0%st
Mem: 6112648k total, 2719780k used, 3392868k free, 21220k buffers
Swap: 2964472k total, 491552k used, 2472920k free, 233112k cached

  PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
15598 launchpa 20 0 991m 641m 3300 S 47 10.7 18:46.45 python2.6
25404 launchpa 36 16 626m 277m 9588 S 44 4.6 7:44.93 python2.6
31006 bzrsyncd 20 0 502m 200m 5852 S 0 3.4 0:07.68 [celeryd@ackee:
31008 bzrsyncd 20 0 501m 200m 5696 S 0 3.4 0:07.96 [celeryd@ackee:
31007 bzrsyncd 20 0 499m 197m 5792 S 0 3.3 0:06.45 [celeryd@ackee:
30715 laun...


Revision history for this message
Colin Watson (cjwatson) wrote :

I'm not sure exactly when this was fixed, or whether it has just become much less noticeable now that the scripts unit has more memory, but https://grafana.admin.canonical.com/d/000000044/telegraf-host?orgId=1&var-juju_controller=All&var-juju_model=All&var-service=launchpad-scripts&var-juju_unit=All&var-host=All&var-mountpoint=All&from=now-30d&to=now&viewPanel=4 looks flat enough to suggest that this isn't a problem in practice any more. I'm therefore going to close this bug.

Changed in launchpad:
status: Triaged → Fix Released