Comment 0 for bug 1598229

Richard (rkuo) wrote :

Bug Description:

Encountered a memory leak with corosync on all three nodes in a cluster:

Jun 13 20:36:35 XXXXXXXXX1 kernel: [929808.525991] Out of memory: Kill process 4846 (corosync) score 941 or sacrifice child
Jun 13 20:36:35 XXXXXXXXX1 kernel: [929808.620411] Killed process 4846 (corosync) total-vm:267928256kB, anon-rss:257475632kB, file-rss:37816kB
Jun 29 02:26:17 XXXXXXXXX1 kernel: [2247790.069557] Out of memory: Kill process 27791 (corosync) score 938 or sacrifice child
Jun 29 02:26:17 XXXXXXXXX1 kernel: [2247790.166524] Killed process 27791 (corosync) total-vm:265216168kB, anon-rss:255941644kB, file-rss:28580kB

Jun 14 14:00:03 XXXXXXXXX2 kernel: [993027.615377] Out of memory: Kill process 5167 (corosync) score 943 or sacrifice child
Jun 14 14:00:03 XXXXXXXXX2 kernel: [993027.709419] Killed process 5167 (corosync) total-vm:265023016kB, anon-rss:256668244kB, file-rss:33844kB
Jun 28 22:56:30 XXXXXXXXX2 kernel: [2235753.617203] Out of memory: Kill process 27073 (corosync) score 941 or sacrifice child
Jun 28 22:56:30 XXXXXXXXX2 kernel: [2235753.713521] Killed process 27073 (corosync) total-vm:261875792kB, anon-rss:255939160kB, file-rss:24760kB
Mar 21 22:19:17 XXXXXXXXX2 kernel: [956727.096937] Out of memory: Kill process 5422 (corosync) score 942 or sacrifice child
Mar 21 22:19:17 XXXXXXXXX2 kernel: [956727.191025] Killed process 5422 (corosync) total-vm:264643868kB, anon-rss:256189360kB, file-rss:33976kB
Apr 26 00:30:04 XXXXXXXXX2 kernel: [1017203.359940] Out of memory: Kill process 5183 (corosync) score 927 or sacrifice child
Apr 26 00:30:04 XXXXXXXXX2 kernel: [1017203.455015] Killed process 5183 (corosync) total-vm:271136904kB, anon-rss:251953372kB, file-rss:33760kB

Jun 29 09:00:02 XXXXXXXXX3 kernel: [2276334.347836] Out of memory: Kill process 24183 (corosync) score 937 or sacrifice child
Jun 29 09:00:02 XXXXXXXXX3 kernel: [2276334.444000] Killed process 24183 (corosync) total-vm:270476488kB, anon-rss:255257476kB, file-rss:32248kB
Mar 22 04:58:18 XXXXXXXXX3 kernel: [979377.041372] Out of memory: Kill process 5088 (corosync) score 941 or sacrifice child
Mar 22 04:58:18 XXXXXXXXX3 kernel: [979377.135414] Killed process 5088 (corosync) total-vm:265582012kB, anon-rss:255851792kB, file-rss:36000kB
Apr 26 09:26:02 XXXXXXXXX3 kernel: [1014911.175029] Out of memory: Kill process 5255 (corosync) score 925 or sacrifice child
Apr 26 09:26:02 XXXXXXXXX3 kernel: [1014911.270203] Killed process 5255 (corosync) total-vm:269154272kB, anon-rss:251736288kB, file-rss:35740kB
Jun 13 22:46:23 XXXXXXXXX3 kernel: [942502.987771] Out of memory: Kill process 5230 (corosync) score 940 or sacrifice child
Jun 13 22:46:23 XXXXXXXXX3 kernel: [942503.081826] Killed process 5230 (corosync) total-vm:265560916kB, anon-rss:256339740kB, file-rss:33788kB

The memory leak was confirmed through analysis of atop logs: corosync's memory utilization grew from 47% to 97% over the course of several days, after which the OOM killer terminated the process.
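For anyone triaging similar logs, the following is a minimal sketch of how the "Killed process" lines above could be summarized per host. The regular expression is an assumption based only on the line format shown in this report, and the function name `parse_oom_kills` is hypothetical, not part of any corosync or kernel tooling.

```python
import re

# Assumed line format, taken from the kernel log excerpts in this report.
KILLED_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+[\d:]+)\s+(?P<host>\S+)\s+kernel:.*"
    r"Killed process (?P<pid>\d+) \((?P<name>[^)]+)\) "
    r"total-vm:(?P<vm>\d+)kB, anon-rss:(?P<rss>\d+)kB"
)

def parse_oom_kills(lines, process="corosync"):
    """Return one dict per OOM kill of `process` found in `lines`."""
    kills = []
    for line in lines:
        m = KILLED_RE.search(line)
        if m and m.group("name") == process:
            kills.append({
                "stamp": m.group("stamp"),
                "host": m.group("host"),
                "pid": int(m.group("pid")),
                # The kernel reports kB; convert to GiB for readability.
                "anon_rss_gib": int(m.group("rss")) / (1024 * 1024),
            })
    return kills

# Two lines copied from the log excerpt above.
sample = [
    "Jun 13 20:36:35 XXXXXXXXX1 kernel: [929808.525991] Out of memory: "
    "Kill process 4846 (corosync) score 941 or sacrifice child",
    "Jun 13 20:36:35 XXXXXXXXX1 kernel: [929808.620411] Killed process 4846 "
    "(corosync) total-vm:267928256kB, anon-rss:257475632kB, file-rss:37816kB",
]

kills = parse_oom_kills(sample)
for k in kills:
    print(f'{k["stamp"]} {k["host"]} pid={k["pid"]} anon-rss={k["anon_rss_gib"]:.1f} GiB')
```

Only the "Killed process" line is parsed, since it carries the resident-set figures; the preceding "Out of memory" line is skipped. For the sample above the anonymous RSS works out to roughly 245 GiB at the time of the kill.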

There are many memory leaks identified for the current version of corosync in MOS6.1:

# dpkg -l | grep corosync
ii corosync 2.3.4-0u~u14.04+mos1 amd64 Standards-based cluster framework (daemon and modules)
ii libcorosync-common4 2.3.4-0u~u14.04+mos1 amd64 Standards-based cluster framework, common library

Steps to reproduce:

Unsure how to reproduce at this point, as logging is not detailed enough.
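Until a reproducer exists, one option is to record corosync's resident memory at a fixed interval so that the growth rate can be correlated with cluster events. The following is only a sketch under assumptions: it reads the standard `VmRSS` field from `/proc/<pid>/status` on Linux, and the helper names, the interval, and the sample count are hypothetical choices, not something used in this environment.

```python
import re
import time

def parse_vmrss_kb(status_text):
    """Return the VmRSS value in kB from /proc/<pid>/status text, or None."""
    m = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(m.group(1)) if m else None

def read_vmrss_kb(pid):
    """Read the current resident set size of `pid` in kB (Linux only)."""
    with open(f"/proc/{pid}/status") as fh:
        return parse_vmrss_kb(fh.read())

def log_rss(pid, interval_s=60, samples=5):
    """Print a timestamped RSS sample every `interval_s` seconds."""
    for _ in range(samples):
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        print(stamp, read_vmrss_kb(pid), "kB")
        time.sleep(interval_s)
```

Calling `log_rss(<pid of corosync>)` from a long-running session and redirecting the output to a file would give a finer-grained growth curve than the OOM killer messages alone.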

Expected results:

Impact:

corosync has crashed relatively frequently on all three nodes, however

Environment description:

- Operating system: Ubuntu 14.04.2 LTS - 3.13.0-61-generic
- Versions of components:

# dpkg -l | egrep 'corosync|pacemaker'
ii corosync 2.3.4-0u~u14.04+mos1 amd64 Standards-based cluster framework (daemon and modules)
ii crmsh 2.1.0-1~u14.04+mos1 all CRM shell for the pacemaker cluster manager
ii libcorosync-common4 2.3.4-0u~u14.04+mos1 amd64 Standards-based cluster framework, common library
ii pacemaker 1.1.12-0u~u14.04+mos6.1 amd64 HA cluster resource manager
ii pacemaker-cli-utils 1.1.12-0u~u14.04+mos6.1 amd64 Command line interface utilities for Pacemaker

# uname -r
3.13.0-61-generic
- Reference architecture:
MOS6.1 - unable to provide more information due to restrictions, but at scale
- Network model:
Neutron+GRE+vlan
- Related projects installed:
N/A