rabbit died and everything else died

Bug #1772236 reported by Iain Lane
This bug affects 1 person
Affects                   Status        Importance  Assigned to  Milestone
Auto Package Testing      Fix Released  Undecided   Unassigned
rabbitmq-server (Ubuntu)  New           Undecided   Unassigned

Bug Description

Why did it die?

Should it have self-restarted?

ubuntu@juju-prod-ues-proposed-migration-machine-1:~$ journalctl -u rabbitmq-server.service -n1000 | cat
-- Logs begin at Sun 2018-05-20 00:18:25 UTC, end at Sun 2018-05-20 08:58:27 UTC. --
May 20 04:00:11 juju-prod-ues-proposed-migration-machine-1 systemd[1]: rabbitmq-server.service: Main process exited, code=exited, status=137/n/a
May 20 04:00:12 juju-prod-ues-proposed-migration-machine-1 rabbitmq[28971]: Stopping and halting node 'rabbit@ps45-10-25-180-146' ...
May 20 04:00:12 juju-prod-ues-proposed-migration-machine-1 rabbitmq[28971]: Error: unable to connect to node 'rabbit@ps45-10-25-180-146': nodedown
May 20 04:00:12 juju-prod-ues-proposed-migration-machine-1 rabbitmq[28971]: DIAGNOSTICS
May 20 04:00:12 juju-prod-ues-proposed-migration-machine-1 rabbitmq[28971]: ===========
May 20 04:00:12 juju-prod-ues-proposed-migration-machine-1 rabbitmq[28971]: attempted to contact: ['rabbit@ps45-10-25-180-146']
May 20 04:00:12 juju-prod-ues-proposed-migration-machine-1 rabbitmq[28971]: rabbit@ps45-10-25-180-146:
May 20 04:00:12 juju-prod-ues-proposed-migration-machine-1 rabbitmq[28971]: * connected to epmd (port 4369) on ps45-10-25-180-146
May 20 04:00:12 juju-prod-ues-proposed-migration-machine-1 rabbitmq[28971]: * epmd reports: node 'rabbit' not running at all
May 20 04:00:12 juju-prod-ues-proposed-migration-machine-1 rabbitmq[28971]: other nodes on ps45-10-25-180-146: ['rabbitmq-cli-28979']
May 20 04:00:12 juju-prod-ues-proposed-migration-machine-1 rabbitmq[28971]: * suggestion: start the node
May 20 04:00:12 juju-prod-ues-proposed-migration-machine-1 rabbitmq[28971]: current node details:
May 20 04:00:12 juju-prod-ues-proposed-migration-machine-1 rabbitmq[28971]: - node name: 'rabbitmq-cli-28979@juju-prod-ues-proposed-migration-machine-1'
May 20 04:00:12 juju-prod-ues-proposed-migration-machine-1 rabbitmq[28971]: - home dir: .
May 20 04:00:12 juju-prod-ues-proposed-migration-machine-1 rabbitmq[28971]: - cookie hash: 7+AChRZDewWFJK8SEUhx+Q==
May 20 04:00:12 juju-prod-ues-proposed-migration-machine-1 systemd[1]: rabbitmq-server.service: Control process exited, code=exited status=2
May 20 04:00:12 juju-prod-ues-proposed-migration-machine-1 systemd[1]: rabbitmq-server.service: Unit entered failed state.
May 20 04:00:12 juju-prod-ues-proposed-migration-machine-1 systemd[1]: rabbitmq-server.service: Failed with result 'exit-code'.

Revision history for this message
Iain Lane (laney) wrote :

OK, it got OOM killed...

[Sun May 20 03:58:24 2018] snmpd invoked oom-killer: gfp_mask=0x24201ca, order=0, oom_score_adj=0
[Sun May 20 03:58:24 2018] snmpd cpuset=/ mems_allowed=0
[Sun May 20 03:58:24 2018] CPU: 0 PID: 27747 Comm: snmpd Tainted: G OE 4.4.0-116-generic #140-Ubuntu
[Sun May 20 03:58:24 2018] Hardware name: OpenStack Foundation OpenStack Nova, BIOS Bochs 01/01/2011
[Sun May 20 03:58:24 2018] 0000000000000286 e1b1ea220281c262 ffff8800785079d8 ffffffff813ffc13
[Sun May 20 03:58:24 2018] ffff880078507b90 ffff88007b410e00 ffff880078507a48 ffffffff8121012e
[Sun May 20 03:58:24 2018] ffff880011910e00 ffff88007b411150 ffff88007b410e00 ffff880078507b90
[Sun May 20 03:58:24 2018] Call Trace:
[Sun May 20 03:58:24 2018] [<ffffffff813ffc13>] dump_stack+0x63/0x90
[Sun May 20 03:58:24 2018] [<ffffffff8121012e>] dump_header+0x5a/0x1c5
[Sun May 20 03:58:24 2018] [<ffffffff811968f2>] oom_kill_process+0x202/0x3c0
[Sun May 20 03:58:24 2018] [<ffffffff81196d19>] out_of_memory+0x219/0x460
[Sun May 20 03:58:24 2018] [<ffffffff8119cd45>] __alloc_pages_slowpath.constprop.88+0x965/0xb00
[Sun May 20 03:58:24 2018] [<ffffffff8119d168>] __alloc_pages_nodemask+0x288/0x2a0
[Sun May 20 03:58:24 2018] [<ffffffff811e6ccc>] alloc_pages_current+0x8c/0x110
[Sun May 20 03:58:24 2018] [<ffffffff81192e0b>] __page_cache_alloc+0xab/0xc0
[Sun May 20 03:58:24 2018] [<ffffffff81195390>] filemap_fault+0x150/0x400
[Sun May 20 03:58:24 2018] [<ffffffff812a81d6>] ext4_filemap_fault+0x36/0x50
[Sun May 20 03:58:24 2018] [<ffffffff811c2216>] __do_fault+0x56/0xf0
[Sun May 20 03:58:24 2018] [<ffffffff811c5d75>] handle_mm_fault+0xfa5/0x1820
[Sun May 20 03:58:24 2018] [<ffffffff811bf088>] ? list_lru_add+0x58/0x120
[Sun May 20 03:58:24 2018] [<ffffffff81215c33>] ? __fput+0x193/0x230
[Sun May 20 03:58:24 2018] [<ffffffff8106c747>] __do_page_fault+0x197/0x400
[Sun May 20 03:58:24 2018] [<ffffffff8106ca17>] trace_do_page_fault+0x37/0xe0
[Sun May 20 03:58:24 2018] [<ffffffff81064fb9>] do_async_page_fault+0x19/0x70
[Sun May 20 03:58:24 2018] [<ffffffff81851a08>] async_page_fault+0x28/0x30
[Sun May 20 03:58:24 2018] Mem-Info:
[Sun May 20 03:58:24 2018] active_anon:464532 inactive_anon:6207 isolated_anon:0
                            active_file:94 inactive_file:82 isolated_file:0
                            unevictable:913 dirty:0 writeback:0 unstable:0
                            slab_reclaimable:6604 slab_unreclaimable:4637
                            mapped:855 shmem:6557 pagetables:2796 bounce:0
                            free:14212 free_pcp:113 free_cma:0
[Sun May 20 03:58:24 2018] Node 0 DMA free:10180kB min:356kB low:444kB high:532kB active_anon:5320kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15992kB managed:15908kB mlocked:0kB dirty:0kB writeback:0kB mapped:8kB shmem:4kB slab_reclaimable:40kB slab_unreclaimable:76kB kernel_stack:32kB pagetables:76kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
[Sun May 20 03:58:24 2018] lowmem_reserve[]: 0 1945 1945 1945 1945
[Sun May 20 03:58:24 2018] Node 0 DMA32 fr...

Revision history for this message
Iain Lane (laney) wrote :

also: we should have been notified by something other than the workers falling over

maybe consider multiple servers as a mitigation: https://www.rabbitmq.com/ha.html
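
For reference, the mirroring that guide describes is set up with a policy, roughly like the following, assuming a second node were already clustered with this one (the policy name and the match-everything pattern are only illustrative):

 # mirror every queue onto all nodes in the cluster
 $ sudo rabbitmqctl set_policy ha-all "^" '{"ha-mode":"all"}'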

Revision history for this message
Iain Lane (laney) wrote :

I've added delivery_mode=2 which is supposed to make the server write the messages to disk, so maybe next time we won't lose all queued requests.
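
One thing worth double-checking: persistent messages only survive a broker restart if the queues holding them are declared durable as well. That's quick to verify (a sketch; the real queue names will differ):

 $ sudo rabbitmqctl list_queues name durable messages_persistent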

Revision history for this message
Iain Lane (laney) wrote :

Worth noting that we ran for many months before hitting this, and then:

ubuntu@juju-prod-ues-proposed-migration-machine-1:~$ dmesg -T | grep "Out of memory: Kill" | uniq
[Wed May 9 16:06:23 2018] Out of memory: Kill process 1408 (beam) score 215 or sacrifice child
[Sun May 20 03:58:24 2018] Out of memory: Kill process 19495 (beam) score 437 or sacrifice child
[Fri Jun 1 02:28:15 2018] Out of memory: Kill process 6569 (beam) score 428 or sacrifice child
[Fri Jun 8 05:37:54 2018] Out of memory: Kill process 1142 (beam) score 434 or sacrifice child

4 times in a month.

One thing that happened "sort of" around this time is

  https://launchpad.net/ubuntu/+source/erlang/1:18.3-dfsg-1ubuntu3.1

but that's only a vague correlation, not causation.

Does seem like rabbit's memory usage grows over time until it's eventually killed.

Revision history for this message
Iain Lane (laney) wrote :

For autopkgtest-cloud I just cowboyed a change to add Restart=on-failure to rabbitmq-server.service. Maybe that will help us mitigate this, in combination with delivery_mode=2.
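
For the record, roughly what that change looks like done as a systemd drop-in instead of editing the unit directly, so it survives package upgrades (the RestartSec value is just an example):

 $ sudo mkdir -p /etc/systemd/system/rabbitmq-server.service.d
 $ printf '[Service]\nRestart=on-failure\nRestartSec=10\n' | \
     sudo tee /etc/systemd/system/rabbitmq-server.service.d/override.conf
 $ sudo systemctl daemon-reload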

Revision history for this message
Christian Ehrhardt  (paelzer) wrote :

I first thought we could log some data like:
 $ rabbitmqctl list_queues name durable owner_pid messages_ready messages_unacknowledged messages messages_ready_ram messages_unacknowledged_ram messages_ram messages_persistent message_bytes message_bytes_ram message_bytes_persistent memory state
via cron.
But then, we don't yet know exactly what we're looking for.

I found that the service-oriented
 $ rabbitmqctl report
has all the data you could want.
If we don't gather it too often, and maybe even gzip it, that should be manageable.

In my test it came to 7.5k raw and 2.6k gzipped.
A real case might be bigger, but if we do that hourly or so, we would see which element grows over time.
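
Something like this dropped into /etc/cron.hourly would do it (the paths and retention here are just a sketch):

 #!/bin/sh
 # /etc/cron.hourly/rabbitmq-report: snapshot the full report, gzipped and timestamped
 rabbitmqctl report | gzip > /var/log/rabbitmq/report-$(date +%Y%m%d%H%M).gz
 # prune old snapshots so the reports don't fill the disk themselves
 find /var/log/rabbitmq -name 'report-*.gz' -mtime +30 -delete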

Especially interesting is the definition of the base memory counter:
  memory Bytes of memory consumed by the Erlang process associated with the queue, including
         stack, heap and internal structures.

Yeah, that could be useful next time this happens.

Revision history for this message
Skia (hyask) wrote :

There have been many improvements on that front:
* RabbitMQ's memory consumption has been studied closely, bringing many fixes in autopkgtest-cloud.
* We now have a watchdog that restarts RabbitMQ when things go bad.
* That watchdog was exercised a lot before the memory-consumption fixes landed, and the code deals with a RabbitMQ restart correctly.

Changed in auto-package-testing:
status: New → Fix Released