Heartbeat not sent for listen connections waiting for a message

Bug #2035113 reported by Arnaud Morin
This bug affects 1 person
Affects: oslo.messaging
Status: Fix Released
Importance: High
Assigned to: Arnaud Morin
Milestone: (none)

Bug Description

When using heartbeat_in_pthreads = True, the threading and queue libraries are no longer monkey patched by eventlet for heartbeats (see [1]).

Because of this, when waiting for a message, the queue.get(block=True) call (see [2]) completely blocks the thread, preventing it from sending heartbeats.

After a while, the connection could be dropped by rabbitmq because of missing heartbeats.

Note that this also depends on the rpc_response_timeout value, which happens to be 60 seconds by default (the same as the heartbeat timeout), so the bug is not triggered with default values. If you increase rpc_response_timeout to 300 seconds and stop nova-conductor, however, you will see some nova-compute RPC connections being killed by the RabbitMQ servers because of missed heartbeats.

[1] https://github.com/openstack/oslo.messaging/blob/7705b4f3023e0e63f3b37e9a25c774f309fec55e/oslo_messaging/_drivers/impl_rabbit.py#L630-L657
[2] https://github.com/openstack/oslo.messaging/blob/7705b4f3023e0e63f3b37e9a25c774f309fec55e/oslo_messaging/_drivers/amqpdriver.py#L441
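
To make the blocking behaviour concrete, here is a minimal standard-library sketch. It is not the oslo.messaging code: `blocked_duration` is a hypothetical helper, and the timeout is scaled down from the values discussed above. It only shows that a blocking `get` on an unpatched `queue.Queue` holds the calling thread for the whole timeout when no message arrives.

```python
import queue
import time


def blocked_duration(timeout):
    """Return how long a blocking get() holds the calling thread.

    Hypothetical helper for illustration only. The queue stays empty,
    so get() returns only once the timeout expires.
    """
    q = queue.Queue()
    start = time.monotonic()
    try:
        # Nothing else can run in this thread until get() returns.
        q.get(block=True, timeout=timeout)
    except queue.Empty:
        pass
    return time.monotonic() - start


if __name__ == "__main__":
    # Scaled-down stand-in for a long rpc_response_timeout.
    print("thread was blocked for %.2f s" % blocked_duration(0.5))
```

With the values from the description, the same wait would last up to 300 seconds, well past the 60-second heartbeat timeout.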

Changed in oslo.messaging:
assignee: nobody → Arnaud Morin (arnaud-morin)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to oslo.messaging (master)
Changed in oslo.messaging:
status: New → In Progress
Changed in oslo.messaging:
importance: Undecided → High
OpenStack Infra (hudson-openstack) wrote : Fix merged to oslo.messaging (master)

Reviewed: https://review.opendev.org/c/openstack/oslo.messaging/+/894731
Committed: https://opendev.org/openstack/oslo.messaging/commit/b62208a54c4cfabf641bd9e6e5b91e3e24d54455
Submitter: "Zuul (22348)"
Branch: master

commit b62208a54c4cfabf641bd9e6e5b91e3e24d54455
Author: Arnaud Morin <email address hidden>
Date: Tue Sep 12 11:18:58 2023 +0200

    Use StopWatch timer when waiting for message

    When waiting for a message in a queue, the queue.get(block=True) prevent
    the heartbeats to be sent at correct interval.

    So instead of blocking the thread, doing a loop using a StopWatch timer
    until the timeout is reached.

    Closes-Bug: #2035113

    Signed-off-by: Arnaud Morin <email address hidden>
    Change-Id: Ie5cf5d2bd281508bcd2db1409f18ad96b0822639
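
The approach described in the commit message (loop until a timer expires instead of blocking) could look roughly like the sketch below. This is an assumption-laden illustration, not the merged patch: it uses `time.monotonic()` as a stand-in for the StopWatch timer, and `wait_for_message`, `poll_interval` and `on_idle` are hypothetical names introduced here.

```python
import queue
import time


def wait_for_message(q, timeout, poll_interval=0.05, on_idle=None):
    """Poll q without blocking until a message arrives or timeout elapses.

    Illustrative sketch only: a monotonic deadline stands in for the
    StopWatch timer, and all names here are hypothetical.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # Non-blocking get: returns at once or raises queue.Empty.
            return q.get(block=False)
        except queue.Empty:
            # The thread is free between polls; on_idle stands in for
            # any periodic work that must keep running.
            if on_idle is not None:
                on_idle()
            time.sleep(poll_interval)
    raise TimeoutError("no message received within %s seconds" % timeout)


if __name__ == "__main__":
    q = queue.Queue()
    q.put("reply")
    print(wait_for_message(q, timeout=1.0))
```

The trade-off is a small added latency (at most one poll interval) in exchange for never holding the thread for the full timeout.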

Changed in oslo.messaging:
status: In Progress → Fix Released