py3.7 deadlock in threading

Bug #1782647 reported by Corey Bryant
This bug affects 1 person
Affects    Status        Importance  Assigned to   Milestone
Designate  Fix Released  Undecided   Graham Hayes
Python     Unknown       Unknown
eventlet   Fix Released  Unknown

Bug Description

I have a feeling this is an issue with a dependency that designate is using, or even with py3.7 itself, and not directly a designate issue. I'm hitting something similar in heat and will update this bug with details from there once I have narrowed them down to a simple recreate.

Things seem to get hung up in:

  File "/usr/lib/python3.7/threading.py", line 296, in wait
    waiter.acquire()

and:

  File "/usr/lib/python3.7/threading.py", line 1048, in _wait_for_tstate_lock
    elif lock.acquire(block, timeout):
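
Stack traces like the two above can also be captured from a hung run without waiting for a manual Control-C. This is a minimal sketch, assuming the standard library faulthandler module is acceptable in the test environment. The helper name run_with_hang_dump and the 30-second default are illustrative only and are not part of designate or of how the traces in this report were obtained:

```python
import faulthandler
import sys


def run_with_hang_dump(func, timeout=30.0, file=None):
    """Call func(); if it is still running after `timeout` seconds,
    dump the stack of every thread to `file` (default: sys.stderr).

    Hypothetical helper for illustration. The process is not killed
    (exit=False), so a real deadlock still has to be interrupted.
    """
    out = file if file is not None else sys.stderr
    # Arm a watchdog that writes all thread tracebacks after the timeout.
    faulthandler.dump_traceback_later(timeout, exit=False, file=out)
    try:
        return func()
    finally:
        # Disarm the watchdog once func() has returned or raised.
        faulthandler.cancel_dump_traceback_later()
```

Wrapping the hanging call (for example the body of a single test) in run_with_hang_dump should show which threading.py frames each thread is blocked in.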

To reproduce with designate, update tox.ini with the following:

--- a/tox.ini
+++ b/tox.ini
@@ -1,6 +1,6 @@
 [tox]
 minversion = 2.0
-envlist = py35,py27,flake8
+envlist = py35,py37,py27,flake8
 skipsdist = True

 [testenv]
@@ -39,6 +39,12 @@ commands =
   {[testenv]commands}
   stestr run '{posargs}'

+[testenv:py37]
+basepython = python3.7
+commands =
+ {[testenv]commands}
+ stestr run 'designate\.tests\.test_workers\.test_processing\.TestProcessingExecutor\.(test_execute_multiple_tasks)'
+
 [testenv:docs]
 basepython = python3
 deps =

Note that this change adds a py3.7 target and also restricts the run to a single test, test_execute_multiple_tasks. The test hangs and requires a Control-C to cancel it. The results look like this: https://paste.ubuntu.com/p/SwXsCcghjt/

To recreate with py3.7 you can use Ubuntu Cosmic like so:

lxc launch ubuntu-daily:cosmic c1
lxc exec c1 /bin/bash
root@c1:~# git clone https://github.com/openstack/designate
root@c1:~# #update tox.ini as shown above
root@c1:~# tox -e py37

Tags: patch
description: updated
Corey Bryant (corey.bryant) wrote :

Just tested with py3.6 and the test runs successfully without any issues.

Corey Bryant (corey.bryant) wrote :

OK, making some headway. If I run:

sudo cp /usr/lib/python3.6/concurrent/futures/thread.py /usr/lib/python3.7/concurrent/futures/thread.py

and then run tox -e py37, the test runs successfully without any issues.

Corey Bryant (corey.bryant) wrote :

Attaching a patch, just to show the delta between:

/usr/lib/python3.6/concurrent/futures/thread.py
/usr/lib/python3.7/concurrent/futures/thread.py

diff -Naur /usr/lib/python3.6/concurrent/futures/thread.py /usr/lib/python3.7/concurrent/futures/thread.py

Corey Bryant (corey.bryant) wrote :

This could be an issue with py3.7 itself, still not sure, but adding python3.7 Ubuntu package as an affected project.

Corey Bryant (corey.bryant) wrote :

I was going to add heat to this bug as well, but they've moved their bugs to storyboard, which is not that convenient. Anyway, heat is affected too if you add a py37 target to tox.ini and run the tests.

Corey Bryant (corey.bryant) wrote :

This appears to be the line that causes the difference. If I revert back to using queue.Queue() in /usr/lib/python3.7/concurrent/futures/thread.py, the py37 tests run successfully without any issues.

- self._work_queue = queue.Queue()
+ self._work_queue = queue.SimpleQueue()
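
To check which queue type a given interpreter's ThreadPoolExecutor uses, the following sketch inspects the private _work_queue attribute named in the diff above. This relies on a CPython implementation detail that may change between versions, so treat it as a diagnostic aid only; work_queue_type_name is a hypothetical helper name:

```python
from concurrent.futures import ThreadPoolExecutor


def work_queue_type_name(executor):
    """Return the class name of the executor's internal work queue.

    _work_queue is a private attribute of CPython's ThreadPoolExecutor;
    if it is absent, this returns "NoneType".
    """
    return type(getattr(executor, "_work_queue", None)).__name__


with ThreadPoolExecutor(max_workers=1) as executor:
    name = work_queue_type_name(executor)
    # Without eventlet monkeypatching, a trivial task should complete.
    result = executor.submit(sum, [1, 2, 3]).result(timeout=5)

print("work queue type:", name)
print("task result:", result)
```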

Corey Bryant (corey.bryant) wrote :

It seems this may be a py3.7 bug so I've opened: https://bugs.python.org/issue34173

description: updated
tags: added: patch
Corey Bryant (corey.bryant) wrote :

Karthikeyan Singaravelan (xtreak) has provided a smaller test case, https://bugs.python.org/file47707/bpo34173.py, that doesn't deadlock. I'm not sure what the difference is.

Corey Bryant (corey.bryant) wrote :

This has further been narrowed down to a conflict with eventlet monkeypatching of standard library thread modules. It recreates with: https://bugs.python.org/file47709/bpo34173-recreate.py.

Corey Bryant (corey.bryant) wrote :

I've also opened the following issue for eventlet: https://github.com/eventlet/eventlet/issues/508

summary: - py3.7 possible race condition in threading
+ py3.7 deadlock in threading
no longer affects: designate/stein
Changed in designate:
milestone: none → 8.0.0.0b1
Corey Bryant (corey.bryant) wrote :

Swapping out ThreadPoolExecutor with GreenThreadPoolExecutor gets rid of the deadlock. Joshua Harlow suggests [1]: If you have the ability to specify which executor your code is using, and you are running under eventlet I'd give preference to the green thread pool executor under that situation (and if not running under eventlet then prefer the threadpool executor variant).

[1] http://lists.openstack.org/pipermail/openstack-dev/2018-July/132473.html
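
The suggestion above could be sketched roughly as follows. This is not the change that was merged into designate. It assumes that eventlet's patcher.is_monkey_patched("thread") is a good enough signal for "running under eventlet" and that futurist is installed whenever eventlet is in use; make_executor and running_under_eventlet are hypothetical helper names:

```python
from concurrent.futures import ThreadPoolExecutor


def running_under_eventlet():
    """Best-effort check: has eventlet monkeypatched the thread module?"""
    try:
        from eventlet import patcher
    except ImportError:
        # eventlet is not installed, so nothing can be monkeypatched.
        return False
    return bool(patcher.is_monkey_patched("thread"))


def make_executor(max_workers=4):
    """Prefer a green thread pool under eventlet, a native one otherwise."""
    if running_under_eventlet():
        # Imported lazily so the native path has no futurist dependency.
        import futurist
        return futurist.GreenThreadPoolExecutor(max_workers=max_workers)
    return ThreadPoolExecutor(max_workers=max_workers)
```

Both executor types expose submit() and shutdown(), so callers should not need to know which one they were given.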

OpenStack Infra (hudson-openstack) wrote : Fix proposed to designate (master)

Fix proposed to branch: master
Review: https://review.openstack.org/586336

Changed in designate:
assignee: nobody → Graham Hayes (grahamhayes)
status: New → In Progress
Changed in eventlet:
status: Unknown → New
Changed in designate:
assignee: Graham Hayes (grahamhayes) → Erik Olof Gunnar Andersson (eandersson)
Changed in designate:
assignee: Erik Olof Gunnar Andersson (eandersson) → Graham Hayes (grahamhayes)
OpenStack Infra (hudson-openstack) wrote : Fix merged to designate (master)

Reviewed: https://review.openstack.org/586336
Committed: https://git.openstack.org/cgit/openstack/designate/commit/?id=72e4e13d8ee681cae5564ef538a44ddd96ca586c
Submitter: Zuul
Branch: master

commit 72e4e13d8ee681cae5564ef538a44ddd96ca586c
Author: Graham Hayes <email address hidden>
Date: Thu Jul 26 20:57:10 2018 +0100

    Move to GreenThreadPoolExecutor

    python3.7 and eventlet cause the `future.ThreadPoolExecutor` to hang
    indefinitely. Moving to `futurist.GreenThreadPoolExecutor` allows the
    `designate-worker` process to use native eventlet greenthreads, which bypasses
    the hanging issue.

    Closes-Bug: #1782647

    Related-Bug: https://bugs.python.org/issue34173
    Related-Bug: eventlet/eventlet#508

    Change-Id: I36c79ca72635d81cfcc8d3cc87b1bc5e0657d9e8
    Signed-off-by: Graham Hayes <email address hidden>

Changed in designate:
status: In Progress → Fix Released
Changed in python:
status: Unknown → New
Corey Bryant (corey.bryant) wrote :

This was narrowed down to eventlet [1], so I'm dropping Python from the affected projects.

[1] https://github.com/eventlet/eventlet/issues/508

no longer affects: python3.7 (Ubuntu)
Corey Bryant (corey.bryant) wrote :

I'm unable to drop Python from being affected but I've updated the upstream issue to note that it can be closed.

Changed in python:
status: New → Unknown
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/designate 8.0.0.0rc1

This issue was fixed in the openstack/designate 8.0.0.0rc1 release candidate.

Changed in eventlet:
status: New → Fix Released