Comment 11 for bug 2015065

Balazs Gibizer (balazs-gibizer) wrote :

I looked at the stack trace of the blocked thread from https://bugs.launchpad.net/neutron/+bug/2015065/comments/8 (thanks Yatin for collecting the trace!)

Based on https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_bdb/882413/2/check/grenade/bdb653c/job-output.txt the environment uses eventlet 0.33.1 and urllib3 1.26.12.

The first interesting frame in the stack trace: /usr/local/lib/python3.10/dist-packages/urllib3/util/connection.py:28 in is_connection_dropped

Which is reached from https://github.com/urllib3/urllib3/blob/a5b29ac1025f9bb30f2c9b756f3b171389c2c039/src/urllib3/connectionpool.py#L272

So urllib3 tries to check whether the existing client connection is still usable or has been disconnected:
https://github.com/urllib3/urllib3/blob/a5b29ac1025f9bb30f2c9b756f3b171389c2c039/src/urllib3/util/connection.py#L28

It calls wait_for_read(sock, timeout=0.0).
So it checks whether it can read from the socket with a 0.0 timeout:
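
As a rough illustration of that check, here is a minimal sketch using only the unpatched standard library. The helper name looks_dropped is hypothetical and is not urllib3's API; it only mimics the idea of a zero-timeout readability poll on an idle socket.

```python
import select
import socket


def looks_dropped(sock):
    """Hypothetical helper: poll an idle socket for readability without waiting.

    An idle keep-alive connection should have nothing to read, so a readable
    socket usually means the peer closed it (EOF) or sent unexpected data.
    """
    readable, _, _ = select.select([sock], [], [], 0.0)
    return bool(readable)


# A connected socket pair stands in for a pooled client connection.
client, server = socket.socketpair()

alive_before = looks_dropped(client)   # peer still open, nothing to read
server.close()
dropped_after = looks_dropped(client)  # peer closed, EOF makes it readable

client.close()
```

The point of the 0.0 timeout is that this check is meant to be a cheap poll done before reusing a pooled connection, not a wait.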

https://github.com/urllib3/urllib3/blob/a5b29ac1025f9bb30f2c9b756f3b171389c2c039/src/urllib3/util/wait.py#L84-L85

That 0.0 timeout is passed to Python's select.select:
https://docs.python.org/3.10/library/select.html#select.select
"The optional timeout argument specifies a time-out as a floating point number in seconds. When the timeout argument is omitted the function blocks until at least one file descriptor is ready. A time-out value of zero specifies a poll and never blocks."
So select.select called with a 0.0 timeout should never block.
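
The documented behaviour can be checked with a small sketch, assuming the standard library select module without any monkey patching. The socket pair and variable names are only for illustration.

```python
import select
import socket
import time

# A connected pair of sockets; nothing has been written yet.
a, b = socket.socketpair()

# Poll an idle socket with a zero timeout and measure how long the call takes.
start = time.monotonic()
idle_result = select.select([a], [], [], 0.0)
elapsed = time.monotonic() - start

# Once the peer has written data, the same zero-timeout poll reports it.
b.sendall(b"x")
ready_result = select.select([a], [], [], 0.0)

a.close()
b.close()
```

With the unpatched module, the first call returns three empty lists right away and the second one lists the socket as readable. Neither call waits for anything.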

BUT

In our env the eventlet monkey patching replaces Python's select.select, hence the stack trace points to /usr/local/lib/python3.10/dist-packages/eventlet/green/select.py:80 in select

https://github.com/eventlet/eventlet/blob/88ec603404b2ed25c610dead75d4693c7b3e8072/eventlet/green/select.py#L30-L80C32

Looking at that code, it seems eventlet sets a timer with the timeout value via hub.schedule_call_global. Here I'm getting lost in the eventlet code, but I assume scheduling a timer with a 0.0 timeout in eventlet can be racy, based on the comment in https://github.com/eventlet/eventlet/blob/88ec603404b2ed25c610dead75d4693c7b3e8072/eventlet/green/select.py#L62-L69

So one could argue that what we see is an eventlet bug: select.select with timeout=0.0 should never block, but it does block in our case.