amqp_changed is invoked by update_clients on every run of the update-status hook.
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
OpenStack RabbitMQ Server Charm | Fix Released | High | Jorge Niedbalski |
Bug Description
[Environment]
Xenial
Charms 17.02 (upgraded from 16.07)
[Description]
While chasing down a problem with a long-running update-status hook (for reference, the update-status hook process was spawned at 21:42:08 (wait4(194023, [{WIFEXITED(s) && WEXITSTATUS(s) == 0}], 0, NULL) = 194023) and terminated nearly one hour later at 22:40:33 (exit_group(0))), I traced the process using this script [0] and identified a continuous loop over the following system calls:
21:53:28 fcntl(3, F_SETLK, {l_type=F_WRLCK, l_whence=SEEK_SET, l_start=1073741825, l_len=1}) = 0
21:53:28 stat("/
21:53:28 stat("/
21:53:28 open("/
21:53:28 fstat(4, {st_mode=
21:53:28 geteuid() = 0
21:53:28 fchown(4, 0, 0) = 0
21:53:28 fstat(4, {st_mode=
21:53:28 lseek(4, 0, SEEK_SET) = 0
--
21:53:28 unlink(
21:53:28 fcntl(3, F_SETLK, {l_type=F_RDLCK, l_whence=SEEK_SET, l_start=1073741826, l_len=510}) = 0
21:53:28 fcntl(3, F_SETLK, {l_type=F_UNLCK, l_whence=SEEK_SET, l_start=1073741824, l_len=2}) = 0
21:53:28 fcntl(3, F_SETLK, {l_type=F_UNLCK, l_whence=SEEK_SET, l_start=0, l_len=0}) = 0
21:53:28 stat("/
21:53:28 pipe([4, 5]) = 0
--
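The byte offsets in the fcntl(F_SETLK) calls above line up with SQLite's documented lock page, which starts at 0x40000000 (PENDING_BYTE), followed by RESERVED_BYTE and a 510-byte SHARED range. This suggests, though the report does not confirm it, that fd 3 is an SQLite database, plausibly a charm-helpers unitdata store. The small helper below (describe_lock is a hypothetical name) decodes the ranges seen in the trace:

```python
# SQLite keeps a "lock page" at byte 0x40000000 (1 GiB) and takes POSIX
# advisory locks on specific bytes inside it.
PENDING_BYTE = 0x40000000          # 1073741824
RESERVED_BYTE = PENDING_BYTE + 1   # 1073741825
SHARED_FIRST = PENDING_BYTE + 2    # 1073741826
SHARED_SIZE = 510


def describe_lock(l_start: int, l_len: int) -> str:
    """Map an fcntl(F_SETLK) byte range to SQLite's lock-byte names."""
    if l_start == 0 and l_len == 0:
        # l_len == 0 means "to end of file", so this covers the whole file.
        return "whole file"
    end = l_start + l_len  # exclusive upper bound of the locked range
    names = []
    if l_start <= PENDING_BYTE < end:
        names.append("PENDING")
    if l_start <= RESERVED_BYTE < end:
        names.append("RESERVED")
    if l_start < SHARED_FIRST + SHARED_SIZE and end > SHARED_FIRST:
        names.append("SHARED")
    return "+".join(names) if names else "unknown"


if __name__ == "__main__":
    # (lock type, l_start, l_len) tuples copied from the strace excerpt.
    trace = [
        ("F_WRLCK", 1073741825, 1),
        ("F_RDLCK", 1073741826, 510),
        ("F_UNLCK", 1073741824, 2),
        ("F_UNLCK", 0, 0),
    ]
    for l_type, start, length in trace:
        print(l_type, describe_lock(start, length))
```

Read this way, each pass of the loop takes a write (RESERVED) lock, downgrades to a shared read lock, and then releases everything, which is the pattern of one short database transaction per iteration.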
The only part of the code making extensive use of the peer store/retrieve semantics
is located here [1].
Also, the following log entries are displayed every 5 seconds or so:
unit-rabbitmq-
unit-rabbitmq-
unit-rabbitmq-
unit-rabbitmq-
unit-rabbitmq-
unit-rabbitmq-
unit-rabbitmq-
unit-rabbitmq-
unit-rabbitmq-
unit-rabbitmq-
unit-rabbitmq-
unit-rabbitmq-
These are printed by this line of code [2].
[Observations and possible fix]
Commit [3] added an explicit call to update_clients, which is executed
on every hook run (including update-status). The update_clients function iterates
over all the units related through the amqp relation and calls the amqp_changed
function for each of them, then iterates over the peer units, setting the
relation_settings for each relation_id.
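To illustrate why the cost of this loop grows with deployment size, here is a simplified, hypothetical model of the iteration described above. It is not the charm's code: FakeModel, the "cluster" peer relation name, and the settings payloads are assumptions made for illustration, and each relation_set call stands in for one relation-set process spawned by the hook.

```python
from dataclasses import dataclass, field


@dataclass
class FakeModel:
    """Hypothetical in-memory stand-in for Juju relation data."""
    # relation name -> {relation_id: [remote unit names]}
    relations: dict
    relation_set_calls: list = field(default_factory=list)

    def relation_ids(self, name):
        return list(self.relations.get(name, {}))

    def related_units(self, rid):
        for rels in self.relations.values():
            if rid in rels:
                return rels[rid]
        return []

    def relation_set(self, rid, settings):
        # One entry per (simulated) relation-set invocation.
        self.relation_set_calls.append((rid, settings))


def amqp_changed(model, rid, unit):
    # Hypothetical: re-publish connection settings for one client unit.
    model.relation_set(rid, {"hostname": "10.0.0.1"})


def update_clients(model):
    """Simplified model of the loop described in the report."""
    for rid in model.relation_ids("amqp"):
        for unit in model.related_units(rid):
            amqp_changed(model, rid, unit)
    # Assumed peer relation name; one settings update per peer unit.
    for rid in model.relation_ids("cluster"):
        for _unit in model.related_units(rid):
            model.relation_set(rid, {"clients": "updated"})


model = FakeModel(relations={
    "amqp": {
        "amqp:1": [f"nova-compute/{i}" for i in range(5)],
        "amqp:2": [f"neutron-gateway/{i}" for i in range(10)],
        "amqp:3": [f"ceilometer-agent/{i}" for i in range(20)],
    },
    "cluster": {"cluster:0": ["rabbitmq-server/1", "rabbitmq-server/2"]},
})
update_clients(model)
print(len(model.relation_set_calls))  # 35 client units + 2 peers
```

In this model the number of relation-set calls per hook run is the total number of related client units plus the number of peers, so it scales linearly with the size of the deployment and is paid again on every update-status run.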
Since the update-status hook is set to run every 5 minutes, it isn't
convenient to perform this iteration in this hook, particularly in large deployments
where the amqp relation is shared among multiple units, as this behavior might
cause unwanted workload overhead on the cloud.
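One possible direction, sketched here under stated assumptions and not necessarily what the eventual fix does, is to skip the update_clients call when the running hook is update-status. The sketch assumes each hook is dispatched through a file or symlink named after the hook (so the basename of the invoked executable is the hook name); current_hook_name, should_update_clients, and SKIP_UPDATE_CLIENTS_HOOKS are hypothetical names.

```python
import os
import sys

# Hooks in which re-publishing client settings is assumed to be unnecessary.
SKIP_UPDATE_CLIENTS_HOOKS = frozenset({"update-status"})


def current_hook_name(argv0=None):
    """Best-effort hook name: basename of the executable Juju invoked.

    Assumption: hooks are dispatched via files or symlinks named after
    the hook, e.g. hooks/update-status.
    """
    return os.path.basename(argv0 if argv0 is not None else sys.argv[0])


def should_update_clients(hook_name):
    return hook_name not in SKIP_UPDATE_CLIENTS_HOOKS


def main(update_clients, argv0=None):
    hook = current_hook_name(argv0)
    # ... dispatch the hook's own handler here ...
    if should_update_clients(hook):
        update_clients()
```

For example, main(update_clients, "hooks/update-status") would leave the client relations untouched, while main(update_clients, "hooks/config-changed") would still refresh them. Keeping the skip list as a set makes it easy to exclude other frequently run, read-only hooks later.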
For reference, 504 relation-set executions were recorded during the execution
of this update-status hook:
$ ack-grep 'relation-set' strace.194010 | wc -l
504
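As a rough back-of-the-envelope check, the hook ran from 21:42:08 to 22:40:33 (both timestamps are assumed to fall on the same day). The sketch below computes that duration and the average spacing of the 504 relation-set calls if they were spread evenly across the run, and includes a hypothetical count_matches helper as a Python equivalent of the ack-grep | wc -l pipeline above.

```python
from datetime import datetime


def count_matches(lines, needle="relation-set"):
    """Count lines containing needle, like `ack-grep 'relation-set' FILE | wc -l`."""
    return sum(1 for line in lines if needle in line)


start = datetime.strptime("21:42:08", "%H:%M:%S")
end = datetime.strptime("22:40:33", "%H:%M:%S")
duration_s = (end - start).total_seconds()  # 3505 s, about 58 minutes
relation_set_calls = 504
# Average gap between calls, assuming they were spread evenly over the run.
avg_interval_s = duration_s / relation_set_calls

print(f"hook duration: {duration_s:.0f} s")
print(f"average gap between relation-set calls: {avg_interval_s:.1f} s")

# Usage against the trace file from the report:
#   with open("strace.194010") as f:
#       print(count_matches(f))
```

An average of roughly 7 seconds per call is in the same range as the "every 5 seconds or so" log cadence noted earlier, which is consistent with most of the hook's runtime being spent in this loop.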
[0] https:/
[1] https:/
[2] https:/
[3] https:/
tags: added: sts
Changed in charm-rabbitmq-server:
status: New → In Progress
importance: Undecided → High
assignee: nobody → Jorge Niedbalski (niedbalski)
tags: added: backport-potential
Changed in charm-rabbitmq-server:
milestone: none → 17.11
Changed in charm-rabbitmq-server:
status: Fix Committed → Fix Released
Fix proposed to branch: master (review.openstack.org/504511)
Review: https:/