greenlet.error: cannot switch to a different thread

Bug #2039346 reported by Gregory Thiemonge
This bug affects 4 people
Affects    Status         Importance   Assigned to   Milestone
octavia    New            Undecided    Unassigned
oslo.log   Fix Released   Critical     Unassigned

Bug Description

Observed in a CI job: octavia-v2-dsvm-scenario-traffic-ops-jobboard on master

The creation of a LB failed (LB stuck in PENDING_CREATE), then the worker was stuck

https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_7a5/898232/1/check/octavia-v2-dsvm-scenario-traffic-ops-jobboard/7a5e18f/controller/logs/screen-o-cw.txt

Oct 13 14:42:12.058434 np0035492616 octavia-worker[100079]: DEBUG cotyledon._service [-] Run service ConsumerService(1) [100079] {{(pid=100079) wait_forever /opt/stack/data/venv/lib/python3.10/site-packages/cotyledon/_service.py:241}}
Oct 13 14:42:12.059230 np0035492616 octavia-worker[100079]: INFO octavia.controller.queue.v2.consumer [-] Starting V2 consumer...
Oct 13 14:45:31.303244 np0035492616 octavia-worker[100077]: INFO octavia.controller.queue.v2.endpoints [-] Creating load balancer 'f327ad7e-d41f-4271-9aa2-b8d3c2ce645a'...
Oct 13 14:45:31.309590 np0035492616 octavia-worker[100077]: Traceback (most recent call last):
Oct 13 14:45:31.309590 np0035492616 octavia-worker[100077]: File "/opt/stack/data/venv/lib/python3.10/site-packages/eventlet/hubs/hub.py", line 476, in fire_timers
Oct 13 14:45:31.309590 np0035492616 octavia-worker[100077]: timer()
Oct 13 14:45:31.309590 np0035492616 octavia-worker[100077]: File "/opt/stack/data/venv/lib/python3.10/site-packages/eventlet/hubs/timer.py", line 59, in __call__
Oct 13 14:45:31.309590 np0035492616 octavia-worker[100077]: cb(*args, **kw)
Oct 13 14:45:31.309590 np0035492616 octavia-worker[100077]: File "/opt/stack/data/venv/lib/python3.10/site-packages/eventlet/semaphore.py", line 147, in _do_acquire
Oct 13 14:45:31.309590 np0035492616 octavia-worker[100077]: waiter.switch()
Oct 13 14:45:31.310105 np0035492616 octavia-worker[100077]: greenlet.error: cannot switch to a different thread

Revision history for this message
Gregory Thiemonge (gthiemonge) wrote :

Based on OpenSearch, it has happened twice in the last 30 days:

(Note that the run reported in this Launchpad bug doesn't appear in OpenSearch.)

octavia-v2-dsvm-cinder-amphora master
octavia-v2-dsvm-tls-barbican master

Revision history for this message
Gregory Thiemonge (gthiemonge) wrote :

It also happened in a 2023.2 job: https://zuul.opendev.org/t/openstack/build/ab5f5f781550426e8e4cdbeb3900ecf1

octavia-grenade
FAILURE
Change 896384,1
Project openstack/octavia
Branch stable/2023.2
Pipeline check

Revision history for this message
Takashi Kajinami (kajinamit) wrote :

OK, further discussions made me aware that the issue is a regression caused by https://review.opendev.org/c/openstack/oslo.log/+/852443 .

Changed in oslo.log:
status: New → In Progress
Revision history for this message
Daniel Bengtsson (damani42) wrote :

Hi,

I will take care of the backports to 2024.1, 2023.2 and 2023.1.

Changed in oslo.log:
importance: Undecided → Critical
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to oslo.log (stable/2024.1)

Fix proposed to branch: stable/2024.1
Review: https://review.opendev.org/c/openstack/oslo.log/+/914262

Revision history for this message
Christian Rohmann (christian-rohmann) wrote :

Any reason this could NOT be backported to Zed as well?

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to oslo.log (stable/2023.2)

Fix proposed to branch: stable/2023.2
Review: https://review.opendev.org/c/openstack/oslo.log/+/914266

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to oslo.log (stable/2023.1)

Fix proposed to branch: stable/2023.1
Review: https://review.opendev.org/c/openstack/oslo.log/+/914267

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to oslo.log (master)

Reviewed: https://review.opendev.org/c/openstack/oslo.log/+/914190
Committed: https://opendev.org/openstack/oslo.log/commit/a1fe1b9cfb841d632aa582a52c106b4ebab1c159
Submitter: "Zuul (22348)"
Branch: master

commit a1fe1b9cfb841d632aa582a52c106b4ebab1c159
Author: Vasyl Saienko <email address hidden>
Date: Tue Mar 26 12:33:16 2024 +0200

    Fix eventlet detection

    Eventlet may be installed, but not used for example projects
    like octavia. Improve autodetection mechanism by trying to import
    module and check if it is actually patched.

    Closes-Bug: #2039346
    Change-Id: I860abe953ce945bb5152c77a7daeb6aa1003512b
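
The detection idea described in the commit message above (try to import the module, then check whether it is actually patched) might look roughly like the sketch below. This is not the code from the oslo.log change; the helper names `eventlet_thread_patched` and `pick_lock_kind` are hypothetical and exist only for illustration. The one eventlet call used is `eventlet.patcher.is_monkey_patched`.

```python
import importlib


def eventlet_thread_patched() -> bool:
    """Return True only if eventlet is importable AND 'thread' is patched.

    Hypothetical helper: merely having eventlet installed is not enough,
    because a service may ship it as a dependency without ever calling
    eventlet.monkey_patch().
    """
    try:
        patcher = importlib.import_module("eventlet.patcher")
    except ImportError:
        # eventlet is not installed at all, so nothing can be patched.
        return False
    # Installed is not the same as in use: ask eventlet whether the
    # threading primitives have actually been monkey patched.
    return bool(patcher.is_monkey_patched("thread"))


def pick_lock_kind() -> str:
    """Illustrative only: choose which kind of lock a handler should use."""
    if eventlet_thread_patched():
        return "eventlet-aware"
    return "native-threading"


if __name__ == "__main__":
    print(pick_lock_kind())
```

In a process that uses native threads, such as the octavia worker in this report, a check of this shape should report "native-threading" even when eventlet is present in the virtualenv.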

Changed in oslo.log:
status: In Progress → Fix Released
Revision history for this message
Takashi Kajinami (kajinamit) wrote :

> Any reason this could NOT be backported to Zed as well?
The change which we suspect introduced this problem has never been backported to zed.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to oslo.log (stable/2024.1)

Reviewed: https://review.opendev.org/c/openstack/oslo.log/+/914262
Committed: https://opendev.org/openstack/oslo.log/commit/a6c4f8a9c62ae264f819716177f3da899eaf1a54
Submitter: "Zuul (22348)"
Branch: stable/2024.1

commit a6c4f8a9c62ae264f819716177f3da899eaf1a54
Author: Vasyl Saienko <email address hidden>
Date: Tue Mar 26 12:33:16 2024 +0200

    Fix eventlet detection

    Eventlet may be installed, but not used for example projects
    like octavia. Improve autodetection mechanism by trying to import
    module and check if it is actually patched.

    Closes-Bug: #2039346
    Change-Id: I860abe953ce945bb5152c77a7daeb6aa1003512b
    (cherry picked from commit a1fe1b9cfb841d632aa582a52c106b4ebab1c159)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to oslo.log (stable/2023.2)

Reviewed: https://review.opendev.org/c/openstack/oslo.log/+/914266
Committed: https://opendev.org/openstack/oslo.log/commit/287e138adf66c28b7c127dce99a8af05df00225e
Submitter: "Zuul (22348)"
Branch: stable/2023.2

commit 287e138adf66c28b7c127dce99a8af05df00225e
Author: Vasyl Saienko <email address hidden>
Date: Tue Mar 26 12:33:16 2024 +0200

    Fix eventlet detection

    Eventlet may be installed, but not used for example projects
    like octavia. Improve autodetection mechanism by trying to import
    module and check if it is actually patched.

    Closes-Bug: #2039346
    Change-Id: I860abe953ce945bb5152c77a7daeb6aa1003512b
    (cherry picked from commit a1fe1b9cfb841d632aa582a52c106b4ebab1c159)
    (cherry picked from commit a6c4f8a9c62ae264f819716177f3da899eaf1a54)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to oslo.log (stable/2023.1)

Reviewed: https://review.opendev.org/c/openstack/oslo.log/+/914267
Committed: https://opendev.org/openstack/oslo.log/commit/b159daea8c0453e99cea901b7e01793bb244fde6
Submitter: "Zuul (22348)"
Branch: stable/2023.1

commit b159daea8c0453e99cea901b7e01793bb244fde6
Author: Vasyl Saienko <email address hidden>
Date: Tue Mar 26 12:33:16 2024 +0200

    Fix eventlet detection

    Eventlet may be installed, but not used for example projects
    like octavia. Improve autodetection mechanism by trying to import
    module and check if it is actually patched.

    Closes-Bug: #2039346
    Change-Id: I860abe953ce945bb5152c77a7daeb6aa1003512b
    (cherry picked from commit a1fe1b9cfb841d632aa582a52c106b4ebab1c159)
    (cherry picked from commit a6c4f8a9c62ae264f819716177f3da899eaf1a54)
    (cherry picked from commit 287e138adf66c28b7c127dce99a8af05df00225e)

Revision history for this message
Christian Rohmann (christian-rohmann) wrote :

>> Any reason this could NOT be backported to Zed as well?
>The change which we suspect introduced this problem has never been backported to zed.

OK, maybe the cause is something else then, as we regularly observe issues across multiple daemons.
(This is on Yoga, but we are just about to upgrade to Zed, hence my question about backporting any fixes to Zed.)

nova-conductor:

```
Mar 06 13:59:29 ctrl-03 nova-conductor[6503]: Traceback (most recent call last):
Mar 06 13:59:29 ctrl-03 nova-conductor[6503]: File "/usr/lib/python3/dist-packages/eventlet/hubs/hub.py", line 476, in fire_timers
Mar 06 13:59:29 ctrl-03 nova-conductor[6503]: timer()
Mar 06 13:59:29 ctrl-03 nova-conductor[6503]: File "/usr/lib/python3/dist-packages/eventlet/hubs/timer.py", line 59, in __call__
Mar 06 13:59:29 ctrl-03 nova-conductor[6503]: cb(*args, **kw)
Mar 06 13:59:29 ctrl-03 nova-conductor[6503]: File "/usr/lib/python3/dist-packages/eventlet/semaphore.py", line 152, in _do_acquire
Mar 06 13:59:29 ctrl-03 nova-conductor[6503]: waiter.switch()
Mar 06 13:59:29 ctrl-03 nova-conductor[6503]: greenlet.error: cannot switch to a different thread

```

nova-compute:

```
Mar 26 14:42:29 comp-23 nova-compute[3154276]: Traceback (most recent call last):
Mar 26 14:42:29 comp-23 nova-compute[3154276]: File "/usr/lib/python3/dist-packages/eventlet/hubs/hub.py", line 476, in fire_timers
Mar 26 14:42:29 comp-23 nova-compute[3154276]: timer()
Mar 26 14:42:29 comp-23 nova-compute[3154276]: File "/usr/lib/python3/dist-packages/eventlet/hubs/timer.py", line 59, in __call__
Mar 26 14:42:29 comp-23 nova-compute[3154276]: cb(*args, **kw)
Mar 26 14:42:29 comp-23 nova-compute[3154276]: File "/usr/lib/python3/dist-packages/eventlet/semaphore.py", line 152, in _do_acquire
Mar 26 14:42:29 comp-23 nova-compute[3154276]: waiter.switch()
Mar 26 14:42:29 comp-23 nova-compute[3154276]: greenlet.error: cannot switch to a different thread
```

neutron-server:
```
Mar 26 14:42:32 ctrl-01 neutron-server[7192]: Traceback (most recent call last):
Mar 26 14:42:32 ctrl-01 neutron-server[7192]: File "/usr/lib/python3/dist-packages/eventlet/hubs/hub.py", line 476, in fire_timers
Mar 26 14:42:32 ctrl-01 neutron-server[7192]: timer()
Mar 26 14:42:32 ctrl-01 neutron-server[7192]: File "/usr/lib/python3/dist-packages/eventlet/hubs/timer.py", line 59, in __call__
Mar 26 14:42:32 ctrl-01 neutron-server[7192]: cb(*args, **kw)
Mar 26 14:42:32 ctrl-01 neutron-server[7192]: File "/usr/lib/python3/dist-packages/eventlet/semaphore.py", line 152, in _do_acquire
Mar 26 14:42:32 ctrl-01 neutron-server[7192]: waiter.switch()
Mar 26 14:42:32 ctrl-01 neutron-server[7192]: greenlet.error: cannot switch to a different thread
Mar 26 14:56:04 ctrl-01 neutron-server[7191]: Traceback (most recent call last):
Mar 26 14:56:04 ctrl-01 neutron-server[7191]: File "/usr/lib/python3/dist-packages/eventlet/hubs/hub.py", line 476, in fire_timers
Mar 26 14:56:04 ctrl-01 neutron-server[7191]: timer()
Mar 26 14:56:04 ctrl-01 neutron-server[7191]: File "/usr/lib/python3/dist-packages/eventlet/hubs/timer.py", line 59, in __call__
Mar 26 14:56:04 ctrl-01 neutron-server[7191]: cb(*arg...

```

Revision history for this message
Herve Beraud (herveberaud) wrote :

Here is a summary of the recent discussions we had about this topic.

The root cause of this eventlet issue is: https://github.com/eventlet/eventlet/issues/432

Apparently the problem comes from the patched version of RLock maintained in eventlet.

The problem is that this RLock module is nonsense. This RLock issue is one of the main reasons that motivated us to move away from eventlet; see the community goal proposal for more details: https://review.opendev.org/c/openstack/governance/+/902585

So we shouldn't wait for a fix of the root cause on the eventlet side. This RLock clone is too far from the current CPython implementation, and we don't have the resources to take on such a colossal task.

In parallel, oslo.log received a couple of fixes to address the issue on the OpenStack side:

- https://opendev.org/openstack/oslo.log/commit/94b9dc32ec1f52a582adbd97fe2847f7c87d6c17
- https://opendev.org/openstack/oslo.log/commit/de615d9370681a2834cebe88acfa81b919da340c
- https://review.opendev.org/c/openstack/oslo.log/+/914190

Unfortunately, the stars are not aligned on this topic...

Years ago we transitioned a couple of stable oslo libraries to the independent release model, and oslo.log was one of those deliverables. So during Yoga and Zed, oslo.log was considered an independent deliverable, outside the coordinated releases, and therefore has no stable branches for Yoga and Zed.

So we are not able to backport these fixes to these nonexistent stable branches. Operators should either rely on recent versions of the oslo.log library or manually patch their runtimes with the 3 patches listed above.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/oslo.log 5.3.1

This issue was fixed in the openstack/oslo.log 5.3.1 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/oslo.log 5.5.1

This issue was fixed in the openstack/oslo.log 5.5.1 release.
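
For operators who want to see which oslo.log version a runtime actually has, a minimal sketch using only the standard library could look like this. The helper names (`parse_version`, `installed_version`, `is_at_least`) are hypothetical. Note the assumption: comparing against a single minimum does not prove the fix is present, because the fix was released per branch (5.3.1 and 5.5.1 above) and versions in between may not contain it.

```python
from importlib import metadata
from itertools import takewhile
from typing import Optional, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn '5.5.1' into (5, 5, 1); stop at the first non-numeric segment."""
    parts = []
    for segment in text.split("."):
        digits = "".join(takewhile(str.isdigit, segment))
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def installed_version(package: str = "oslo.log") -> Optional[str]:
    """Return the installed version string, or None if not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def is_at_least(installed: str, minimum: str) -> bool:
    """Numeric comparison, so that '5.10.0' sorts after '5.5.1'."""
    return parse_version(installed) >= parse_version(minimum)


if __name__ == "__main__":
    found = installed_version()
    if found is None:
        print("oslo.log is not installed in this environment")
    else:
        # 5.5.1 is one of the two fix releases named in this report.
        print(found, "is at least 5.5.1:", is_at_least(found, "5.5.1"))
```

Run it with the same interpreter or virtualenv as the affected service, since a system-wide Python may see a different oslo.log.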
