Memory Leak when new cluster configuration is formed.
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
corosync (Ubuntu) | Fix Released | High | Jorge Niedbalski |
Trusty | Fix Released | High | Jorge Niedbalski |
Wily | Won't Fix | High | Jorge Niedbalski |
Bug Description
[Environment]
Trusty 14.04.3
Packages:
ii corosync 2.3.3-1ubuntu1 amd64 Standards-based cluster framework (daemon and modules)
ii libcorosync-common4 2.3.3-1ubuntu1 amd64 Standards-based cluster framework, common library
[Reproducer]
1) I deployed an HA environment using this bundle (http://
with a 3 nodes installation of cinder related to an HACluster subordinate unit.
$ juju-deployer -c next-ha.yaml -w 600 trusty-kilo
2) I changed the default corosync transport mode to unicast.
$ juju set cinder-hacluster corosync_
3) I verified that all 3 units were quorate
cinder/0# corosync-quorumtool
Votequorum information
-------
Expected votes: 3
Highest expected: 3
Total votes: 3
Quorum: 2
Flags: Quorate
Membership information
-------
Nodeid Votes Name
1002 1 10.5.1.57 (local)
1001 1 10.5.1.58
1000 1 10.5.1.59
The primary unit was holding the VIP resource 10.5.105.1/16
root@juju-
2: eth0: <BROADCAST,
link/ether fa:16:3e:d2:19:6f brd ff:ff:ff:ff:ff:ff
inet 10.5.1.57/16 brd 10.5.255.255 scope global eth0
valid_lft forever preferred_lft forever
inet 10.5.105.1/16 brd 10.5.255.255 scope global secondary eth0
valid_lft forever preferred_lft forever
4) I manually added a TC queue for the eth0 interface on the node holding the VIP resource, introducing a 350 ms delay.
$ sudo tc qdisc add dev eth0 root netem delay 350ms
5) Right after adding the 350 ms delay on the cinder/0 unit, the corosync process reports that one of the processors failed and that it is forming a new cluster configuration.
Mar 28 21:57:41 juju-niedbalski
Mar 28 22:00:48 juju-niedbalski
Mar 28 22:00:48 juju-niedbalski
Mar 28 22:00:48 juju-niedbalski
This happens on all of the units.
6) After receiving this message, I removed the queue from eth0:
$ sudo tc qdisc del dev eth0 root netem
Then the following messages are logged on the master node:
Mar 28 22:00:48 juju-niedbalski
Mar 28 22:00:48 juju-niedbalski
Mar 28 22:00:48 juju-niedbalski
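
Steps 5 and 6 are repeated by hand in this report. As a rough sketch only, the alternation could be scripted as below. The function name `flap`, the variables `IFACE`, `DELAY`, `HOLD` and `TC`, and the 60 second hold time are assumptions made for illustration; they are not part of the original reproducer. Only the two `tc qdisc` commands come from the steps above.

```shell
# Hypothetical helper: alternate step 5 (add the netem delay) and
# step 6 (remove it) a fixed number of times.
IFACE="${IFACE:-eth0}"     # interface holding the VIP (eth0 in this report)
DELAY="${DELAY:-350ms}"    # delay used in step 4
HOLD="${HOLD:-60}"         # assumed seconds to wait between changes
TC="${TC:-sudo tc}"        # set TC=echo to print the commands instead of running them

flap() {
  rounds="${1:-10}"
  i=0
  while [ "$i" -lt "$rounds" ]; do
    # Step 5: add the delay so corosync forms a new configuration.
    $TC qdisc add dev "$IFACE" root netem delay "$DELAY"
    sleep "$HOLD"
    # Step 6: remove the delay so the membership re-forms.
    $TC qdisc del dev "$IFACE" root netem
    sleep "$HOLD"
    i=$((i + 1))
  done
}
```

Calling `flap 5` on the node holding the VIP would run five add/remove cycles. The hold time may need tuning so that each change lasts long enough for corosync to log a new configuration.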
7) While executing 5 and 6 repeatedly, I ran the following command to track the VSZ and RSS memory usage of the
corosync process:
root@juju-
root@juju-
$ while true; do ps -o vsz,rss -p $(pgrep corosync) 2>&1 | grep -E '.*[0-9]+.*' | tee -a memory-usage.log && sleep 1; done
The results show that both VSZ and RSS increase over time at a high rate.
25476 4036
... (after 5 minutes).
135644 10352
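
To put a number on the growth, the first and last samples in memory-usage.log can be compared. The following is a minimal sketch, assuming the log contains one "VSZ RSS" pair per line as written by the loop in step 7; the function name `summarize_growth` and its output format are hypothetical. `ps` reports both columns in KiB.

```shell
# summarize_growth FILE
# Print the first and last VSZ/RSS samples (KiB) and their difference.
# Lines that are not two plain integers are ignored.
summarize_growth() {
  awk '
    $1 ~ /^[0-9]+$/ && $2 ~ /^[0-9]+$/ {
      if (!seen) { v0 = $1; r0 = $2; seen = 1 }
      v1 = $1; r1 = $2; n++
    }
    END {
      if (!seen) { print "no samples"; exit 1 }
      printf "samples=%d vsz_kib=%d->%d delta=%d rss_kib=%d->%d delta=%d\n",
             n, v0, v1, v1 - v0, r0, r1, r1 - r0
    }
  ' "$1"
}
```

For the two samples quoted above, `summarize_growth memory-usage.log` would report a VSZ delta of 110168 KiB and an RSS delta of 6316 KiB.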
[Fix]
Based on this reproducer, my preliminary conclusion is that this commit (https:/
is a good candidate to be backported to Ubuntu Trusty.
[Test Case]
* See reproducer
[Backport Impact]
* Not identified
summary: |
- Memory Leak when new configuration is formed.
+ Memory Leak when new cluster configuration is formed.
tags: | added: sts-needs-review |
description: | updated |
Changed in corosync (Ubuntu): | |
status: | New → In Progress |
Changed in corosync (Ubuntu Trusty): | |
status: | New → In Progress |
Changed in corosync (Ubuntu): | |
importance: | Undecided → High |
Changed in corosync (Ubuntu Trusty): | |
importance: | Undecided → High |
Changed in corosync (Ubuntu): | |
assignee: | nobody → Jorge Niedbalski (niedbalski) |
Changed in corosync (Ubuntu Trusty): | |
assignee: | nobody → Jorge Niedbalski (niedbalski) |
Changed in corosync (Ubuntu Wily): | |
status: | New → In Progress |
importance: | Undecided → High |
assignee: | nobody → Jorge Niedbalski (niedbalski) |
The attachment "Xenial Patch" seems to be a debdiff. The ubuntu-sponsors team has been subscribed to the bug report so that they can review and hopefully sponsor the debdiff. If the attachment isn't a patch, please remove the "patch" flag from the attachment, remove the "patch" tag, and if you are member of the ~ubuntu-sponsors, unsubscribe the team.
[This is an automated message performed by a Launchpad user owned by ~brian-murray, for any issue please contact him.]