Number of heat queues will keep growing forever after restarting heat-engine

Bug #1925436 reported by Hua Zhang
This bug affects 1 person
Affects                          Status        Importance  Assigned to  Milestone
OpenStack Heat Charm             Fix Released  Undecided   Hua Zhang    21.10
OpenStack RabbitMQ Server Charm  Fix Released  Undecided   Hua Zhang    21.10

Bug Description

Neither lp:1599104 [1] nor lp:1414674 [2] fixes the problem.

The problem can be easily reproduced by following their bug descriptions or comment [3]:

- Launch a heat test environment.

- Get the number of heat queues before the test:

# rabbitmqctl list_queues -p openstack | grep -E 'engine_worker|heat-engine-listener' |wc -l
100

- Restart heat-engine to trigger the problem:

juju ssh heat/0 -- sudo systemctl restart heat-engine

- The queue count increases to 108:

rabbitmqctl list_queues -p openstack | grep -E 'engine_worker|heat-engine-listener' |wc -l
108

- Then the queue count drops to 104:

rabbitmqctl list_queues -p openstack | grep -E 'engine_worker|heat-engine-listener' |wc -l
104

That's because 4 of the 8 new queues are fanout queues, which disappear due to rabbit_transient_queues_ttl=600 [4], while the other 4 are topic queues without a TTL setting, so they stay there forever:

rabbitmqctl list_queues -p openstack | grep -E 'engine_worker|heat-engine-listener' |grep -E '42cee820-4f0c-4aef-b8b6-705e7db3253a|8c12d70c-b00c-4e9b-b33e-7cbf0cb8c510'
engine_worker.42cee820-4f0c-4aef-b8b6-705e7db3253a 0
engine_worker.8c12d70c-b00c-4e9b-b33e-7cbf0cb8c510 0
heat-engine-listener.42cee820-4f0c-4aef-b8b6-705e7db3253a 0
heat-engine-listener.8c12d70c-b00c-4e9b-b33e-7cbf0cb8c510 0

- Every further restart leaves behind more topic queues without a TTL setting, so the queue count keeps growing in a vicious circle.
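The counting and inspection in the steps above can be scripted. Below is a minimal POSIX sh sketch, assuming the listing is requested with the `name consumers` columns (`rabbitmqctl list_queues -p openstack name consumers`). The function names are hypothetical, and treating a per-engine topic queue with zero consumers as stale is an assumption: a running engine is expected to keep a consumer on its own queues.

```shell
#!/bin/sh
# Sketch: summarise heat RPC queues from a queue listing read on stdin.
# Assumed input format: "<queue-name> <consumers>" per line, as printed by
#   rabbitmqctl list_queues -p openstack name consumers
# Header/informational lines are skipped because they never match the pattern.

# Count all heat queues (per-engine topic queues and fanout queues alike)
# whose names start with one of the two prefixes used in the steps above.
count_heat_queues() {
    grep -cE '^(engine_worker|heat-engine-listener)'
}

# Print per-engine topic queues ("<topic>.<uuid>") that have no consumers.
# Assumption: after a restart these belong to engine workers that no longer
# exist, so nothing will ever read from them again.
stale_heat_topic_queues() {
    awk '$1 ~ /^(engine_worker|heat-engine-listener)\./ && $2 == 0 { print $1 }'
}
```

For example, `rabbitmqctl list_queues -p openstack name consumers | stale_heat_topic_queues` would list the leftover `engine_worker.<uuid>` and `heat-engine-listener.<uuid>` queues, while fanout queues are ignored because they expire on their own.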

Why does this problem happen? Heat uses a random UUID in the queue names, and that is consistent with its design: heat creates workers for some tasks, each worker is a separate process, and every worker consumes RPC calls via oslo.messaging. Since several workers can run in parallel on one node, each worker is identified by its own UUID, generated at the moment the worker is created, and this UUID is used in its queue names. After heat-engine restarts, the new workers get new UUIDs, so the topic queues named after the old UUIDs are never consumed again.
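To illustrate why a restart always produces new queues, here is a sketch that derives the per-engine topic queue names from an engine UUID. The function names are hypothetical; it assumes, as the listing above suggests, that each engine appends its UUID to both `engine_worker` and `heat-engine-listener`. Reading `/proc/sys/kernel/random/uuid` is Linux-specific and only stands in for however heat actually generates the UUID.

```shell
#!/bin/sh
# Sketch: per-engine topic queue names derived from an engine UUID.
# Assumption: each heat-engine worker appends its own UUID to both topics,
# as seen in the listing (engine_worker.<uuid>, heat-engine-listener.<uuid>).

# Print the two per-engine topic queue names for the UUID given as $1.
heat_engine_queue_names() {
    printf 'engine_worker.%s\nheat-engine-listener.%s\n' "$1" "$1"
}

# Generate a fresh random UUID, standing in for a newly started worker
# (Linux only; not heat's own code path).
new_engine_id() {
    cat /proc/sys/kernel/random/uuid
}
```

Calling `heat_engine_queue_names "$(new_engine_id)"` once per simulated start yields a different pair of names each time: the pair declared before the restart is never redeclared or consumed afterwards, which is exactly the leak described above.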

So I think the best solution is to do what patch [6] does and also modify the charm to support the following TTL setting:

rabbitmqctl set_policy -p openstack --apply-to queues --priority 1 heat_expiry "heat-engine-listener|engine_worker" '{"expires":3600000}'
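The policy definition must be valid JSON with double-quoted keys, and `expires` is given in milliseconds, so 3600000 means one hour. A queue matched by such a policy is deleted once it has been unused (no consumers, not redeclared) for that long, which removes the topic queues of engines that have gone away. Below is a small sketch that builds the definition from a TTL in seconds and applies it; the helper names and the seconds-based interface are hypothetical, and the apply step needs `rabbitmqctl` on a RabbitMQ node.

```shell
#!/bin/sh
# Sketch: build and apply the heat queue-expiry policy.
# Hypothetical helpers; the TTL is taken in seconds and converted to the
# milliseconds that the RabbitMQ "expires" key expects.

# Print the policy definition JSON for a TTL given in seconds ($1).
heat_expiry_definition() {
    printf '{"expires":%d}\n' "$(( $1 * 1000 ))"
}

# Apply the policy to vhost $1 with a TTL of $2 seconds.
# Requires rabbitmqctl and sufficient privileges on a RabbitMQ node.
apply_heat_expiry_policy() {
    rabbitmqctl set_policy -p "$1" --apply-to queues --priority 1 \
        heat_expiry 'heat-engine-listener|engine_worker' \
        "$(heat_expiry_definition "$2")"
}
```

For example, `apply_heat_expiry_policy openstack 3600` applies a one-hour expiry to the same queues, and `rabbitmqctl list_policies -p openstack` shows the resulting policy.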

[1] https://bugs.launchpad.net/heat/+bug/1599104
[2] https://bugs.launchpad.net/heat/+bug/1414674
[3] https://bugs.launchpad.net/heat/+bug/1599104/comments/25
[4] https://review.opendev.org/c/openstack/oslo.messaging/+/243845/5/oslo_messaging/_drivers/impl_rabbit.py#170
[5] https://github.com/openstack/oslo.messaging/blob/master/oslo_messaging/_drivers/impl_rabbit.py#L1164
[6] https://review.opendev.org/c/openstack/fuel-library/+/356272/3/files/fuel-ha-utils/policy/set_rabbitmq_policy

Tags: sts
Hua Zhang (zhhuabj)
tags: added: sts
Hua Zhang (zhhuabj)
Changed in charm-heat:
assignee: nobody → Hua Zhang (zhhuabj)
Hua Zhang (zhhuabj)
Changed in charm-heat:
status: New → Invalid
OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-rabbitmq-server (master)
Changed in charm-rabbitmq-server:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-heat (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/charm-heat/+/792795

Changed in charm-heat:
status: Invalid → In Progress
Hua Zhang (zhhuabj) wrote :

Following gnuoy's comment on patchset 3 in [1], I've refactored the code and implemented his idea, so now we have two patches:

1. heat side, see [2]
2. rabbitmq-server side, still see [1]

[1] https://review.opendev.org/c/openstack/charm-rabbitmq-server/+/787909
[2] https://review.opendev.org/c/openstack/charm-heat/+/792795

Changed in charm-rabbitmq-server:
assignee: nobody → Hua Zhang (zhhuabj)
OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-rabbitmq-server (master)

Reviewed: https://review.opendev.org/c/openstack/charm-rabbitmq-server/+/787909
Committed: https://opendev.org/openstack/charm-rabbitmq-server/commit/707fa0e093af3b6ab636b65b018c29049f7a51ff
Submitter: "Zuul (22348)"
Branch: master

commit 707fa0e093af3b6ab636b65b018c29049f7a51ff
Author: Zhang Hua <email address hidden>
Date: Thu May 20 19:50:31 2021 +0800

    Number of heat queues will keep growing forever after heat-engine restarts

    Set TTL as a solution for topic queues engine_worker and heat-engine-listener
    to avoid them growing all the time after heat-engine restarts.

    This is the rabbitmq-server part. For example, the heat TTL can be set with:

    juju config heat ttl=3600000

    Closes-Bug: 1925436
    Change-Id: I7b826fe965a200da29020a8f2c6148f76d10a2b0

Changed in charm-rabbitmq-server:
status: In Progress → Fix Committed
OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-heat (master)

Reviewed: https://review.opendev.org/c/openstack/charm-heat/+/792795
Committed: https://opendev.org/openstack/charm-heat/commit/de88ad5344c4411dd81c600f724b760a0e8cf03f
Submitter: "Zuul (22348)"
Branch: master

commit de88ad5344c4411dd81c600f724b760a0e8cf03f
Author: Zhang Hua <email address hidden>
Date: Thu May 20 20:06:13 2021 +0800

    Number of heat queues will keep growing forever after heat-engine restarts

    Set TTL as a solution for topic queues engine_worker and heat-engine-listener
    to avoid them growing all the time after heat-engine restarts.

    This is the heat part.

    Closes-Bug: 1925436
    Change-Id: I196346e4ca869efab45d1c2aafb1420b2a917d39

Changed in charm-heat:
status: In Progress → Fix Committed
OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-heat (stable/21.04)

Fix proposed to branch: stable/21.04
Review: https://review.opendev.org/c/openstack/charm-heat/+/798329

OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-rabbitmq-server (stable/21.04)
OpenStack Infra (hudson-openstack) wrote : Change abandoned on charm-heat (stable/21.04)

Change abandoned by "Zhang Hua <email address hidden>" on branch: stable/21.04
Review: https://review.opendev.org/c/openstack/charm-heat/+/798329
Reason: The charm-rabbitmq-server fix 453b8e9 is not in stable/21.04, so this CI can't pass. The customer has applied the workaround and the code is already in master, so let's abandon it now.

OpenStack Infra (hudson-openstack) wrote : Change abandoned on charm-rabbitmq-server (stable/21.04)

Change abandoned by "Zhang Hua <email address hidden>" on branch: stable/21.04
Review: https://review.opendev.org/c/openstack/charm-rabbitmq-server/+/798410
Reason: The charm-rabbitmq-server fix 453b8e9 is not in stable/21.04, so this CI can't pass. The customer has applied the workaround and the code is already in master, so let's abandon it now.

Changed in charm-heat:
milestone: none → 21.10
Changed in charm-rabbitmq-server:
milestone: none → 21.10
Changed in charm-heat:
status: Fix Committed → Fix Released
Changed in charm-rabbitmq-server:
status: Fix Committed → Fix Released