Heartbeat in pthreads still using greenthreads

Bug #2009138 reported by Arnaud Morin
This bug affects 7 people
Affects               Status        Importance  Assigned to   Milestone
Ubuntu Cloud Archive  New           Undecided   Unassigned
oslo.messaging        Fix Released  Undecided   Arnaud Morin

Bug Description

Context
=======
OpenStack Yoga
Nova API behind apache2 with mod_wsgi
RabbitMQ 3.9.12

Explanation
===========
When running nova-api under apache2/mod_wsgi, we need to set 'heartbeat_in_pthread=True' so that heartbeats are sent from a native Python thread instead of a green thread (an eventlet monkey-patched thread).

A native Python thread is required to keep sending heartbeats, so that rabbit does not close the connection.

Another option is to disable heartbeats completely, so that the connection relies only on TCP keepalive. But having both is better.
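To check quickly whether a given config file actually enables the option, something like the sketch below can help. It assumes the option lives in the [oslo_messaging_rabbit] section and that an unset option means disabled (the default has differed between oslo.messaging releases, so that fallback is an assumption). The helper name and the sample text are illustrative only.

```python
import configparser

# Illustrative sample; a real nova.conf holds many more sections.
SAMPLE_CONF = """\
[oslo_messaging_rabbit]
heartbeat_in_pthread = True
"""


def heartbeat_in_pthread_enabled(conf_text):
    """Return True if heartbeat_in_pthread is enabled in the given config text."""
    # interpolation=None: values may contain literal '%' characters.
    # strict=False: tolerate options that are repeated in the same section.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)
    # Assumption: treat a missing section or option as "disabled".
    return parser.getboolean(
        "oslo_messaging_rabbit", "heartbeat_in_pthread", fallback=False
    )


if __name__ == "__main__":
    print(heartbeat_in_pthread_enabled(SAMPLE_CONF))
```

To check a real file, read it with open(path).read() and pass the text to the helper.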

The problem with the current heartbeat_in_pthread implementation is that some threads are still greenthreads.
As a result, some connections send heartbeats correctly while others do not (and are still killed by rabbitmq after the heartbeat timeout).

We identified that oslo_messaging connects to rabbit for two different purposes:
- send
- listen

The current heartbeat_in_pthread=True parameter switches the heartbeat from a greenthread to a Python thread *only for the send purpose* (done in impl_rabbit.py).
For the listen purpose, the thread is created by the parent class (in amqpdriver.py), which still uses greenthreads.

As a result, rabbit connections used for the listen purpose are killed.
We can see in rabbit logs:
missed heartbeats from client, timeout: 60s

We can see in nova-api logs:
Server unexpectedly closed connection.
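The distinction between the two kinds of threads can be sketched as follows. This is not the oslo.messaging code, only a minimal illustration of the idea: when the flag is set, ask eventlet for the original (unpatched) "threading" and "queue" modules, otherwise use whatever is importable, which is the green version under eventlet.monkey_patch(). The helper names are hypothetical, and the sketch falls back to the stdlib if eventlet is not installed.

```python
import importlib


def stdlib_module(name, heartbeat_in_pthread):
    """Pick the module to use for `name` ("threading" or "queue")."""
    if not heartbeat_in_pthread:
        # Whatever is currently importable; after eventlet.monkey_patch()
        # this is the green (cooperative) version.
        return importlib.import_module(name)
    try:
        from eventlet import patcher
    except ImportError:
        # Without eventlet nothing can have been monkey patched.
        return importlib.import_module(name)
    # patcher.original() returns the unpatched module.
    return patcher.original(name)


def run_in_thread(heartbeat_in_pthread):
    """Start one worker and collect its result through a queue."""
    threading = stdlib_module("threading", heartbeat_in_pthread)
    queue = stdlib_module("queue", heartbeat_in_pthread)
    results = queue.Queue()
    worker = threading.Thread(
        target=results.put, args=("heartbeat",), daemon=True
    )
    worker.start()
    worker.join(timeout=5)
    return results.get(timeout=5)


if __name__ == "__main__":
    print(run_in_thread(heartbeat_in_pthread=True))
```

The point is that both the thread and the queue have to come from the same (original) side. Choosing the original module in only one code path leaves the other path on greenthreads, which matches the send/listen split described above.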

How to reproduce
================
Start nova-api with apache mod_wsgi and set heartbeat_in_pthread=True

Monitor the current rabbitmq connection from nova:
$ ss -tnep |grep 5672

(this can be empty if nova has not done anything yet)

Do a nova API call that needs rabbit, e.g. ask for a console url:
$ openstack console url show 5700ecbc-adff-41d3-88a4-f24e0b885b2e

This will create two connections:
ESTAB 0 0 10.42.1.165:58206 10.43.216.243:5672 timer:(keepalive,46sec,0) uid:42436 ino:422570487 sk:1a cgroup:/ <->
ESTAB 0 0 10.42.1.165:58204 10.43.216.243:5672 timer:(keepalive,46sec,0) uid:42436 ino:422570486 sk:1b cgroup:/ <->

One is for the "send" purpose, the other is for the "listen" purpose.
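When watching many workers, it can be easier to extract the connection pairs from the ss output with a script. The sketch below is only an illustration: it assumes the column layout shown above (State, Recv-Q, Send-Q, local address, peer address), and the function name is hypothetical.

```python
SAMPLE_SS = """\
ESTAB 0 0 10.42.1.165:58206 10.43.216.243:5672 timer:(keepalive,46sec,0) uid:42436 ino:422570487 sk:1a cgroup:/ <->
ESTAB 0 0 10.42.1.165:58204 10.43.216.243:5672 timer:(keepalive,46sec,0) uid:42436 ino:422570486 sk:1b cgroup:/ <->
"""


def amqp_connections(ss_output, port=5672):
    """Return (local, peer) pairs for established connections to `port`."""
    pairs = []
    for line in ss_output.splitlines():
        fields = line.split()
        # Assumed layout: State Recv-Q Send-Q Local:Port Peer:Port ...
        if len(fields) < 5 or fields[0] != "ESTAB":
            continue
        local, peer = fields[3], fields[4]
        # Compare only the port part of the peer address.
        if peer.rsplit(":", 1)[-1] == str(port):
            pairs.append((local, peer))
    return pairs


if __name__ == "__main__":
    for local, peer in amqp_connections(SAMPLE_SS):
        print(local, "->", peer)
```

Running it periodically and comparing the local ports makes it easy to notice when one of the two connections disappears and is re-created.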

You can also see them in rabbit logs:
connection <0.21408.594> (10.42.1.165:58206 -> 10.42.0.21:5672 - mod_wsgi:88239:41e4b74d-c3be-47f5-8b8f-d3bd99871f46): user 'openstack' authenticated and granted access to vhost '/'
connection <0.21390.594> (10.42.1.165:58204 -> 10.42.0.21:5672 - mod_wsgi:88239:2b8345ca-fc75-442f-9271-1448352bb2d2): user 'openstack' authenticated and granted access to vhost '/'

You can also monitor the heartbeats going from/to rabbit:
$ tcpdump -i eth0 -nn port 5672
...
You will see that both connections receive heartbeats every 30 seconds, but *only one* sends heartbeats (the one in pthread).

After a few minutes, rabbit kills the "listen" connection, as seen in rabbit logs:
2023-03-03 09:54:27.932885+00:00 [erro] <0.21390.594> closing AMQP connection <0.21390.594> (10.42.1.165:58204 -> 10.42.0.21:5672 - mod_wsgi:88239:2b8345ca-fc75-442f-9271-1448352bb2d2):
2023-03-03 09:54:27.932885+00:00 [erro] <0.21390.594> missed heartbeats from client, timeout: 60s
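To count how often this happens, the two log lines above can be paired by their connection id. The following is a rough sketch that assumes the log format shown in this report; the regular expressions and the function name are illustrative and may need adjusting for other RabbitMQ versions.

```python
import re

SAMPLE_LOG = """\
2023-03-03 09:54:27.932885+00:00 [erro] <0.21390.594> closing AMQP connection <0.21390.594> (10.42.1.165:58204 -> 10.42.0.21:5672 - mod_wsgi:88239:2b8345ca-fc75-442f-9271-1448352bb2d2):
2023-03-03 09:54:27.932885+00:00 [erro] <0.21390.594> missed heartbeats from client, timeout: 60s
"""

# "closing AMQP connection <id> (client -> server ..."
CLOSING = re.compile(r"closing AMQP connection (<[\d.]+>) \((\S+) -> (\S+)")
# "<id> missed heartbeats from client, timeout: Ns"
MISSED = re.compile(r"(<[\d.]+>) missed heartbeats from client, timeout: (\d+)s")


def heartbeat_drops(log_text):
    """Return one dict per connection closed because of missed heartbeats."""
    peers = {}  # connection id -> (client address, server address)
    drops = []
    for line in log_text.splitlines():
        match = CLOSING.search(line)
        if match:
            peers[match.group(1)] = (match.group(2), match.group(3))
            continue
        match = MISSED.search(line)
        if match:
            conn_id = match.group(1)
            client, server = peers.get(conn_id, (None, None))
            drops.append({
                "id": conn_id,
                "client": client,
                "server": server,
                "timeout": int(match.group(2)),
            })
    return drops


if __name__ == "__main__":
    for drop in heartbeat_drops(SAMPLE_LOG):
        print(drop)
```

The client address in the result can then be matched against the ss output to confirm that the dropped connection is the "listen" one.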

Changed in oslo.messaging:
assignee: nobody → Arnaud Morin (arnaud-morin)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to oslo.messaging (master)
Changed in oslo.messaging:
status: New → In Progress
Arnaud Morin (arnaud-morin) wrote :

This graph shows the number of "AMQP server is unreachable" errors collected from nova logs.

These errors are directly related to connections closed by the rabbit server when heartbeats are missed.

Green and yellow lines belong to two different control planes.

When the patch is applied (around 09:44 for yellow, around 10:00 for green), the errors disappear.

OpenStack Infra (hudson-openstack) wrote : Fix merged to oslo.messaging (master)

Reviewed: https://review.opendev.org/c/openstack/oslo.messaging/+/876318
Committed: https://opendev.org/openstack/oslo.messaging/commit/fd2381c723fe805b17aca1f80bfff4738fbe9628
Submitter: "Zuul (22348)"
Branch: master

commit fd2381c723fe805b17aca1f80bfff4738fbe9628
Author: Arnaud Morin <email address hidden>
Date: Fri Mar 3 11:16:56 2023 +0100

    Disable greenthreads for RabbitDriver "listen" connections

    When enabling heartbeat_in_pthread, we were restoring the "threading"
    python library from eventlet to original one in RabbitDriver but we
    forgot to do the same in AMQPDriverBase (RabbitDriver is subclass of
    AMQPDriverBase).

    We also need to use the original "queue" so that queues are not going to
    use greenthreads as well.

    Related-bug: #1961402
    Related-bug: #1934937
    Closes-bug: #2009138

    Signed-off-by: Arnaud Morin <email address hidden>
    Change-Id: I34ea0d1381e934297df2f793e0d2594ef8254f00

Changed in oslo.messaging:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to oslo.messaging (stable/2023.1)

Fix proposed to branch: stable/2023.1
Review: https://review.opendev.org/c/openstack/oslo.messaging/+/880187

OpenStack Infra (hudson-openstack) wrote : Fix proposed to oslo.messaging (stable/zed)

Fix proposed to branch: stable/zed
Review: https://review.opendev.org/c/openstack/oslo.messaging/+/880188

OpenStack Infra (hudson-openstack) wrote : Fix proposed to oslo.messaging (stable/yoga)

Fix proposed to branch: stable/yoga
Review: https://review.opendev.org/c/openstack/oslo.messaging/+/880189

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/oslo.messaging 14.3.0

This issue was fixed in the openstack/oslo.messaging 14.3.0 release.

OpenStack Infra (hudson-openstack) wrote : Fix merged to oslo.messaging (stable/2023.1)

Reviewed: https://review.opendev.org/c/openstack/oslo.messaging/+/880187
Committed: https://opendev.org/openstack/oslo.messaging/commit/3645839d162a989da46ba1c0fcbe7f32b48b9fb1
Submitter: "Zuul (22348)"
Branch: stable/2023.1

commit 3645839d162a989da46ba1c0fcbe7f32b48b9fb1
Author: Arnaud Morin <email address hidden>
Date: Fri Mar 3 11:16:56 2023 +0100

    Disable greenthreads for RabbitDriver "listen" connections

    When enabling heartbeat_in_pthread, we were restoring the "threading"
    python library from eventlet to original one in RabbitDriver but we
    forgot to do the same in AMQPDriverBase (RabbitDriver is subclass of
    AMQPDriverBase).

    We also need to use the original "queue" so that queues are not going to
    use greenthreads as well.

    Related-bug: #1961402
    Related-bug: #1934937
    Closes-bug: #2009138

    Signed-off-by: Arnaud Morin <email address hidden>
    Change-Id: I34ea0d1381e934297df2f793e0d2594ef8254f00
    (cherry picked from commit fd2381c723fe805b17aca1f80bfff4738fbe9628)

OpenStack Infra (hudson-openstack) wrote : Fix merged to oslo.messaging (stable/zed)

Reviewed: https://review.opendev.org/c/openstack/oslo.messaging/+/880188
Committed: https://opendev.org/openstack/oslo.messaging/commit/4d15b7c4fe0c14e285484d23c15fe5531e952679
Submitter: "Zuul (22348)"
Branch: stable/zed

commit 4d15b7c4fe0c14e285484d23c15fe5531e952679
Author: Arnaud Morin <email address hidden>
Date: Fri Mar 3 11:16:56 2023 +0100

    Disable greenthreads for RabbitDriver "listen" connections

    When enabling heartbeat_in_pthread, we were restoring the "threading"
    python library from eventlet to original one in RabbitDriver but we
    forgot to do the same in AMQPDriverBase (RabbitDriver is subclass of
    AMQPDriverBase).

    We also need to use the original "queue" so that queues are not going to
    use greenthreads as well.

    Related-bug: #1961402
    Related-bug: #1934937
    Closes-bug: #2009138

    Signed-off-by: Arnaud Morin <email address hidden>
    Change-Id: I34ea0d1381e934297df2f793e0d2594ef8254f00
    (cherry picked from commit 3645839d162a989da46ba1c0fcbe7f32b48b9fb1)

tags: added: in-stable-zed
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/oslo.messaging 14.2.3

This issue was fixed in the openstack/oslo.messaging 14.2.3 release.

Dejan SANADER (d-sanader) wrote :

Hello,

How should we proceed to get this version picked up in 2023.1, for example?

https://opendev.org/openstack/requirements/src/branch/stable/2023.1/upper-constraints.txt#L160

Regards.

OpenStack Infra (hudson-openstack) wrote : Fix merged to oslo.messaging (stable/yoga)

Reviewed: https://review.opendev.org/c/openstack/oslo.messaging/+/880189
Committed: https://opendev.org/openstack/oslo.messaging/commit/15779aa0733f3c9bd1f85fa8aea25e3bd8915a1c
Submitter: "Zuul (22348)"
Branch: stable/yoga

commit 15779aa0733f3c9bd1f85fa8aea25e3bd8915a1c
Author: Arnaud Morin <email address hidden>
Date: Fri Mar 3 11:16:56 2023 +0100

    Disable greenthreads for RabbitDriver "listen" connections

    When enabling heartbeat_in_pthread, we were restoring the "threading"
    python library from eventlet to original one in RabbitDriver but we
    forgot to do the same in AMQPDriverBase (RabbitDriver is subclass of
    AMQPDriverBase).

    We also need to use the original "queue" so that queues are not going to
    use greenthreads as well.

    Related-bug: #1961402
    Related-bug: #1934937
    Closes-bug: #2009138

    Signed-off-by: Arnaud Morin <email address hidden>
    Change-Id: I34ea0d1381e934297df2f793e0d2594ef8254f00
    (cherry picked from commit 4d15b7c4fe0c14e285484d23c15fe5531e952679)

tags: added: in-stable-yoga
Christian Rohmann (christian-rohmann) wrote :

Thanks for working on this so intensively and fixing things for so many releases!

This still requires new point releases of oslo.messaging for all those branches now. And then also a bump of e.g. the Ubuntu Cloud Archive for these packages to reach those deployments.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/oslo.messaging yoga-eom

This issue was fixed in the openstack/oslo.messaging yoga-eom release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/oslo.messaging 14.0.3

This issue was fixed in the openstack/oslo.messaging 14.0.3 release.
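For anyone checking a deployment, the release numbers mentioned in this thread (14.3.0, 14.2.3 and 14.0.3) can be turned into a small version check. This is only a sketch: it assumes plain X.Y.Z version strings, treats every series newer than 14.3 as fixed, and reports anything else as "not known to be fixed" (the yoga fix is only identified here by the yoga-eom tag, so it cannot be checked by number). The table and function names are illustrative.

```python
# First fixed patch release per (major, minor) series, as listed in this thread.
FIRST_FIXED_PATCH = {
    (14, 0): 3,
    (14, 2): 3,
    (14, 3): 0,
}


def known_to_have_fix(version):
    """Return True if `version` is known, from this thread, to contain the fix."""
    # Assumption: plain "X.Y.Z" strings without dev/rc suffixes.
    major, minor, patch = (int(part) for part in version.split(".")[:3])
    if (major, minor) > (14, 3):
        # Assumption: later series are cut from master after the fix merged.
        return True
    needed = FIRST_FIXED_PATCH.get((major, minor))
    return needed is not None and patch >= needed


if __name__ == "__main__":
    for candidate in ("14.0.2", "14.0.3", "14.2.3", "14.3.0"):
        print(candidate, known_to_have_fix(candidate))
```

The installed version can be obtained with importlib.metadata.version("oslo.messaging") and passed to the function. Distribution packages may backport the patch without changing the upstream version number, so a False result is not proof that the fix is missing.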

dazhaoyu (lufus) wrote :

Can we fix this issue in the xena branch?
