gnocchi-metricd uses all memory and gets killed by the OOM killer
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Gnocchi | Fix Released | Critical | Julien Danjou |
1.3 | Fix Released | Critical | Julien Danjou |
Bug Description
After gnocchi-metricd starts, its memory usage grows steadily until it segfaults and exits, leaving behind defunct processes:
(...)
[60290.151162] gnocchi-
[60860.395947] ntpd invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=0
[60860.395956] ntpd cpuset=/ mems_allowed=0
[60860.395969] CPU: 0 PID: 1761 Comm: ntpd Tainted: G D 3.13.0-46-generic #75-Ubuntu
[60860.395983] Hardware name: OpenStack Foundation OpenStack Nova, BIOS 1.7.5-20150310_
[60860.395986] 0000000000000000 ffff880427b7f968 ffffffff817212c6 ffff88042793b000
[60860.395992] ffff880427b7f9f0 ffffffff8171bb81 0000000000000000 0000000000000000
[60860.395994] 0000000000000000 0000000000000000 0000000000000000 0000000000000000
[60860.395996] Call Trace:
[60860.396062] [<ffffffff81721
[60860.396074] [<ffffffff8171b
[60860.396091] [<ffffffff81152
[60860.396113] [<ffffffff812d7
[60860.396117] [<ffffffff81153
[60860.396127] [<ffffffff81159
[60860.396141] [<ffffffff81197
[60860.396148] [<ffffffff8114f
[60860.396151] [<ffffffff81150
[60860.396158] [<ffffffff81175
[60860.396173] [<ffffffff8108e
[60860.396176] [<ffffffff81179
[60860.396184] [<ffffffff81077
[60860.396190] [<ffffffff8172d
[60860.396194] [<ffffffff8107a
[60860.396197] [<ffffffff8107a
[60860.396212] [<ffffffff81013
[60860.396215] [<ffffffff81077
[60860.396219] [<ffffffff81077
[60860.396222] [<ffffffff8172d
[60860.396225] [<ffffffff8172c
[60860.396228] [<ffffffff81729
[60860.396231] Mem-Info:
[60860.396233] Node 0 DMA per-cpu:
[60860.396236] CPU 0: hi: 0, btch: 1 usd: 0
[60860.396238] CPU 1: hi: 0, btch: 1 usd: 0
[60860.396239] CPU 2: hi: 0, btch: 1 usd: 0
[60860.396241] CPU 3: hi: 0, btch: 1 usd: 0
[60860.396242] Node 0 DMA32 per-cpu:
[60860.396244] CPU 0: hi: 186, btch: 31 usd: 0
[60860.396246] CPU 1: hi: 186, btch: 31 usd: 11
[60860.396247] CPU 2: hi: 186, btch: 31 usd: 30
[60860.396249] CPU 3: hi: 186, btch: 31 usd: 0
[60860.396250] Node 0 Normal per-cpu:
[60860.396252] CPU 0: hi: 186, btch: 31 usd: 0
[60860.396253] CPU 1: hi: 186, btch: 31 usd: 79
[60860.396255] CPU 2: hi: 186, btch: 31 usd: 0
[60860.396256] CPU 3: hi: 186, btch: 31 usd: 0
[60860.396261] active_anon:3996921 inactive_anon:134 isolated_anon:0
[60860.396261] active_file:165 inactive_file:175 isolated_file:0
[60860.396261] unevictable:0 dirty:52 writeback:0 unstable:0
[60860.396261] free:33844 slab_reclaimabl
[60860.396261] mapped:206 shmem:160 pagetables:9651 bounce:0
[60860.396261] free_cma:0
[60860.396265] Node 0 DMA free:15908kB min:64kB low:80kB high:96kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15992kB managed:15908kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimabl
[60860.396272] lowmem_reserve[]: 0 2847 15902 15902
[60860.396275] Node 0 DMA32 free:64108kB min:12088kB low:15108kB high:18132kB active_
[60860.396281] lowmem_reserve[]: 0 0 13054 13054
[60860.396284] Node 0 Normal free:55360kB min:55424kB low:69280kB high:83136kB active_
[60860.396306] lowmem_reserve[]: 0 0 0 0
[60860.396309] Node 0 DMA: 1*4kB (U) 0*8kB 0*16kB 1*32kB (U) 2*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (R) 3*4096kB (M) = 15908kB
[60860.396322] Node 0 DMA32: 91*4kB (UE) 94*8kB (UEM) 65*16kB (UEM) 96*32kB (UEM) 70*64kB (UEM) 50*128kB (UEM) 32*256kB (U) 30*512kB (UE) 24*1024kB (UM) 0*2048kB 0*4096kB = 64236kB
[60860.396335] Node 0 Normal: 294*4kB (UEM) 183*8kB (UEM) 203*16kB (UE) 168*32kB (UEM) 93*64kB (UEM) 40*128kB (UE) 53*256kB (UEM) 35*512kB (UE) 2*1024kB (UM) 0*2048kB 0*4096kB = 55872kB
[60860.396347] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_
[60860.396349] 558 total pagecache pages
[60860.396350] 0 pages in swap cache
[60860.396351] Swap cache stats: add 0, delete 0, find 0/0
[60860.396352] Free swap = 0kB
[60860.396353] Total swap = 0kB
[60860.396354] 4194173 pages RAM
[60860.396355] 0 pages HighMem/MovableOnly
[60860.396355] 65843 pages reserved
[60860.396356] [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
[60860.396363] [ 386] 0 386 4868 69 13 0 0 upstart-udev-br
[60860.396365] [ 402] 0 402 12443 197 27 0 -1000 systemd-udevd
[60860.396368] [ 543] 0 543 3814 57 12 0 0 upstart-socket-
[60860.396372] [ 583] 0 583 5855 74 17 0 0 rpcbind
[60860.396374] [ 612] 108 612 5385 123 16 0 0 rpc.statd
[60860.396376] [ 645] 0 645 2555 575 7 0 0 dhclient
[60860.396378] [ 895] 102 895 9804 94 23 0 0 dbus-daemon
[60860.396380] [ 911] 0 911 3818 64 12 0 0 upstart-file-br
[60860.396382] [ 942] 0 942 10862 96 26 0 0 systemd-logind
[60860.396384] [ 1041] 0 1041 3634 47 12 0 0 getty
[60860.396385] [ 1043] 0 1043 3634 45 12 0 0 getty
[60860.396387] [ 1045] 101 1045 65018 437 30 0 0 rsyslogd
[60860.396389] [ 1046] 0 1046 6926 58 18 0 0 rpc.idmapd
[60860.396391] [ 1051] 0 1051 3634 48 11 0 0 getty
[60860.396392] [ 1053] 0 1053 3634 48 12 0 0 getty
[60860.396394] [ 1055] 0 1055 3634 47 12 0 0 getty
[60860.396396] [ 1086] 0 1086 15342 178 33 0 -1000 sshd
[60860.396398] [ 1094] 0 1094 5913 62 18 0 0 cron
[60860.396399] [ 1095] 0 1095 4784 43 13 0 0 atd
[60860.396401] [ 1109] 0 1109 1091 42 8 0 0 acpid
[60860.396403] [ 1184] 0 1184 4796 65 15 0 0 irqbalance
[60860.396404] [ 1436] 109 1436 243611 1008 64 0 0 icinga2
[60860.396406] [ 1485] 107 1485 11417 694 26 0 0 snmpd
[60860.396408] [ 1569] 0 1569 3634 46 12 0 0 getty
[60860.396409] [ 1585] 0 1585 3196 46 12 0 0 getty
[60860.396411] [ 1761] 106 1761 6804 138 18 0 0 ntpd
[60860.396413] [13118] 0 13118 26408 251 55 0 0 sshd
[60860.396415] [21356] 1000 21356 26408 249 52 0 0 sshd
[60860.396424] [22434] 1000 22434 5390 562 15 0 0 bash
[60860.396426] [ 5471] 0 5471 17492 126 38 0 0 sudo
[60860.396428] [ 6430] 0 6430 16330 126 37 0 0 su
[60860.396429] [ 6467] 0 6467 5409 593 15 0 0 bash
[60860.396432] [26130] 0 26130 22118 438 46 0 0 apache2
[60860.396434] [27421] 1002 27421 1037105 373121 1000 0 0 apache2
[60860.396436] [27423] 1002 27423 1100679 405184 1052 0 0 apache2
[60860.396437] [27425] 33 27425 112125 2102 79 0 0 apache2
[60860.396439] [27426] 33 27426 111911 1736 77 0 0 apache2
[60860.396441] [14799] 0 14799 43879 11714 90 0 0 gnocchi-metricd
[60860.396442] [15022] 0 15022 3408476 3167088 6308 0 0 gnocchi-metricd
[60860.396445] [26084] 1002 26084 705711 33719 301 0 0 apache2
[60860.396446] Out of memory: Kill process 15022 (gnocchi-metricd) score 755 or sacrifice child
[60860.401764] Killed process 15022 (gnocchi-metricd) total-vm:
[60863.305984] init: gnocchi-metricd main process ended, respawning
This issue happens with 1, 2, and 3 workers on an instance with 16 GB of RAM. I'm using Ceph for storage. Please let me know if I can add any more information (I'm uploading gnocchi.conf).
Thanks, regards.
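For anyone reading the process table in the log above: the `rss` column is counted in pages, not bytes. The following is a minimal sketch (not part of Gnocchi) that converts those rows to GiB and ranks them. The 4 KiB page size is an assumption that holds for typical x86-64 kernels, and `ROW`, `SAMPLE`, and `top_rss` are hypothetical names made up for this illustration.

```python
import re

# Assumption: 4 KiB pages, the usual default on x86-64 kernels.
PAGE_SIZE = 4096

# Matches one row of the OOM-killer process table:
# [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
ROW = re.compile(
    r"\[\s*(?P<pid>\d+)\]\s+(?P<uid>\d+)\s+(?P<tgid>\d+)\s+"
    r"(?P<total_vm>\d+)\s+(?P<rss>\d+)\s+(?P<nr_ptes>\d+)\s+"
    r"(?P<swapents>\d+)\s+(?P<oom_score_adj>-?\d+)\s+(?P<name>\S+)"
)

# Three rows copied from the kernel log in this report.
SAMPLE = """\
[60860.396436] [27423] 1002 27423 1100679 405184 1052 0 0 apache2
[60860.396441] [14799] 0 14799 43879 11714 90 0 0 gnocchi-metricd
[60860.396442] [15022] 0 15022 3408476 3167088 6308 0 0 gnocchi-metricd
"""


def top_rss(log_text, n=3):
    """Return up to n (name, pid, rss_gib) tuples, largest resident set first."""
    rows = []
    for line in log_text.splitlines():
        m = ROW.search(line)
        if m:
            rss_gib = int(m["rss"]) * PAGE_SIZE / 2**30
            rows.append((m["name"], int(m["pid"]), rss_gib))
    return sorted(rows, key=lambda r: r[2], reverse=True)[:n]


if __name__ == "__main__":
    for name, pid, gib in top_rss(SAMPLE):
        print(f"{name:<16} pid={pid:<6} rss={gib:.2f} GiB")
```

Under that page-size assumption, PID 15022 (gnocchi-metricd) holds roughly 12.1 GiB resident at the moment it is killed, which is most of the instance's 16 GB.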
Changed in gnocchi:
milestone: none → 2.0.0
status: Fix Committed → Fix Released
Which version of Gnocchi?