Heartbeat in pthreads still using greenthreads
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Ubuntu Cloud Archive | New | Undecided | Unassigned |
Antelope | New | Undecided | Unassigned |
Bobcat | New | Undecided | Unassigned |
Caracal | New | Undecided | Unassigned |
Dalmation | New | Undecided | Unassigned |
Yoga | New | Undecided | Unassigned |
Zed | New | Undecided | Unassigned |
oslo.messaging | Fix Released | Undecided | Arnaud Morin |
oslo.messaging (Ubuntu) | New | Undecided | Unassigned |
Jammy | New | Undecided | Unassigned |
Noble | New | Undecided | Unassigned |
Oracular | New | Undecided | Unassigned |
Bug Description
Context
=======
OpenStack Yoga
Nova API behind apache2 with mod_wsgi
RabbitMQ 3.9.12
Explanation
===========
When using nova with apache2/mod_wsgi, we need to set 'heartbeat_
The python thread is mandatory to keep sending heartbeats so rabbit will not close the connection.
Another option is to disable heartbeats completely, so that the connection relies only on TCP keepalive. Having both mechanisms is preferable, though.
The problem with the current heartbeat_
The result is that some connections correctly send heartbeats, while others do not (and are still killed by rabbitmq after the heartbeat timeout).
We identified that oslo_messaging connects to rabbit for two different purposes:
- send
- listen
The current heartbeat_
For the listen purpose, the thread is created by the parent class (in amqpdriver.py), which still uses greenthreads.
As a result, for listen purpose, rabbit connections are killed.
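To make the difference concrete, here is a toy model written with the standard library only. It is not oslo.messaging code: the class and function names are hypothetical, and it assumes that a greenthread-style heartbeat only fires when its scheduler gets a chance to run, which does not happen while the worker sits idle.

```python
import threading
import time


class CooperativeHeartbeat:
    """Toy stand-in for a greenthread heartbeat (hypothetical class).

    It only sends a heartbeat when something calls run_pending(),
    i.e. when the cooperative scheduler gets control.
    """

    def __init__(self, interval):
        self.interval = interval
        self.sent = 0
        self._next = time.monotonic() + interval

    def run_pending(self):
        now = time.monotonic()
        if now >= self._next:
            self.sent += 1
            self._next = now + self.interval


class NativeHeartbeat:
    """Toy stand-in for a heartbeat running in a real OS thread."""

    def __init__(self, interval):
        self.interval = interval
        self.sent = 0
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()

    def _run(self):
        # Event.wait() returns False on timeout, True once stop() is called.
        while not self._stop.wait(self.interval):
            self.sent += 1


def simulate_idle_worker(idle_seconds=0.5, interval=0.05):
    """Return (cooperative_sent, native_sent) after an idle period."""
    coop = CooperativeHeartbeat(interval)
    native = NativeHeartbeat(interval)
    native.start()
    # The worker is idle: nothing ever calls coop.run_pending().
    time.sleep(idle_seconds)
    native.stop()
    return coop.sent, native.sent
```

In this model the cooperative heartbeat sends nothing during the idle period, while the native thread keeps sending. That matches the symptom seen on the "listen" connection.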
We can see in rabbit logs:
missed heartbeats from client, timeout: 60s
We can see in nova-api logs:
Server unexpectedly closed connection.
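To find these events quickly, a small helper can scan the rabbit log lines for the message above. This is only a sketch: the function name is hypothetical, and the pattern matches nothing but the message text quoted in this report, so it may need adjusting for other rabbit versions.

```python
import re

# Matches the message quoted above, e.g.
# "missed heartbeats from client, timeout: 60s"
MISSED_HEARTBEAT_RE = re.compile(
    r"missed heartbeats from client, timeout: (\d+)s"
)


def find_missed_heartbeats(lines):
    """Return the timeout (in seconds) of every missed-heartbeat line."""
    timeouts = []
    for line in lines:
        match = MISSED_HEARTBEAT_RE.search(line)
        if match:
            timeouts.append(int(match.group(1)))
    return timeouts
```

Usage would be something like `find_missed_heartbeats(open(path))`, where `path` points to the rabbit log file on your deployment.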
How to reproduce
================
Start nova-api with apache mod_wsgi and set heartbeat_
Monitor the current rabbitmq connections from nova:
$ ss -tnep |grep 5672
(this can be empty if nova did nothing yet)
Make a nova API call that needs rabbit, e.g. ask for a console url:
$ openstack console url show 5700ecbc-
This will create two connections:
ESTAB 0 0 10.42.1.165:58206 10.43.216.243:5672 timer:(
ESTAB 0 0 10.42.1.165:58204 10.43.216.243:5672 timer:(
One is for the "send" purpose, the other is for the "listen" purpose.
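If you want to track these connections from a script, the ss output can be parsed as below. This is a minimal sketch: the function name is hypothetical, and it assumes the usual `ss -tn` column order (state, Recv-Q, Send-Q, local address, peer address).

```python
def parse_ss_connections(output, port=5672):
    """Return (local, peer) pairs for ESTAB connections to the given port."""
    connections = []
    for line in output.splitlines():
        fields = line.split()
        # Expected columns: State Recv-Q Send-Q Local:Port Peer:Port ...
        if len(fields) < 5 or fields[0] != "ESTAB":
            continue
        local, peer = fields[3], fields[4]
        if peer.rsplit(":", 1)[-1] == str(port):
            connections.append((local, peer))
    return connections
```

Comparing the result before and after the rabbit timeout shows which local port (and therefore which connection) was dropped.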
You can also see them in rabbit logs:
connection <0.21408.594> (10.42.1.165:58206 -> 10.42.0.21:5672 - mod_wsgi:
connection <0.21390.594> (10.42.1.165:58204 -> 10.42.0.21:5672 - mod_wsgi:
You can also monitor the heartbeats going from/to rabbit:
$ tcpdump -i eth0 -nn port 5672
...
You will see that both connections receive heartbeats every 30 seconds, but *only one* sends heartbeats (the one in pthread).
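To recognize heartbeats in a capture, it helps to know what they look like on the wire. The sketch below assumes AMQP 0-9-1 framing, where a heartbeat is an 8-byte frame: type 8, channel 0, empty payload, then the frame-end octet 0xCE. The function names are hypothetical and this is not taken from oslo.messaging.

```python
import struct

FRAME_HEARTBEAT = 8   # AMQP 0-9-1 frame type for heartbeats
FRAME_END = 0xCE      # AMQP 0-9-1 frame-end octet


def build_heartbeat_frame():
    """Build a heartbeat frame: type, channel 0, payload size 0, frame-end."""
    header = struct.pack(">BHI", FRAME_HEARTBEAT, 0, 0)
    return header + bytes([FRAME_END])


def is_heartbeat_frame(data):
    """Return True if data is exactly one AMQP 0-9-1 heartbeat frame."""
    if len(data) != 8:
        return False
    frame_type, channel, size = struct.unpack(">BHI", data[:7])
    return (
        frame_type == FRAME_HEARTBEAT
        and channel == 0
        and size == 0
        and data[7] == FRAME_END
    )
```

In a tcpdump capture, these show up as 8-byte payloads. On the broken connection they only travel from rabbit to the client.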
After a few minutes, rabbit kills the "listen" connection, as seen in rabbit logs:
2023-03-03 09:54:27.
2023-03-03 09:54:27.
Changed in oslo.messaging: | |
assignee: | nobody → Arnaud Morin (arnaud-morin) |
summary: |
- Heartbeat in pthreads still using greenthreads + [SRU] Heartbeat in pthreads still using greenthreads |
summary: |
- [SRU] Heartbeat in pthreads still using greenthreads + Heartbeat in pthreads still using greenthreads |
Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/oslo.messaging/+/876318