Logging blocks on race condition under eventlet

Bug #1420788 reported by Alexander Makarov on 2015-02-11
8
This bug affects 1 person
Affects Status Importance Assigned to Milestone
OpenStack Identity (keystone)
Medium
Alexander Makarov
Icehouse
Medium
Unassigned
Juno
Medium
Boris Bobrov

Bug Description

Wrong initialization order makes logging block on race condition under eventlet.

bin/keystone-all launcher initialize logging first and after that does eventlet patching leaving logging system with generic thred.lock in critical sections what leads to infinite thread locks under high load.

Boris Bobrov (bbobrov) on 2015-02-11
Changed in keystone:
status: New → Confirmed
Changed in keystone:
assignee: nobody → Alexander Makarov (amakarov)
status: Confirmed → In Progress
David Stanek (dstanek) wrote :

What exactly is the race condition? I'd like a little more info so that I can understand the patch/problem.

Boris Bobrov (bbobrov) wrote :

David,

Copying slightly modified Alexander's explanation from origin bug to Mirantis Openstack:

Python logging used in oslo.log consists of a bunch of logger objects for different modules.
Any logger object calls handle() method of every logging handler registered in the logger or any of it's parents.
All Handlers are derived from python logging.Handler class.
logging.Handler.handle() implemented as a critical section using a lock.
The lock used in logging.Handler.handle() is created by logging.Handler.createLock().
The lock is threading.RLock() which is reentrant allowing owner thread to attempt aquiring lock without blocking.
Thread is identified using _get_ident() call which is patched by eventlet.

If we patch threading late, _get_ident() gets patched and internals of RLock don't, causing deadlocks on greenthread switch.

Alexander Makarov (amakarov) wrote :

Boris, as we discovered this is not the root cause of the problem.

David, as I wrote in the bug description the issue is in initialization sequence:

- logging system initializes all it's stuff including loks (thread.lock) objects
- eventlet monkey-patching replaces thread module, including locks
But the locks, initialized before and linked to objects by reference are not replaced.

In outcome we have logging system using standard, not patched thread.lock objects in an eventlet environment.

Now the case:
here el - eventlet

el1: aquire lock
el1: begin writing log record. This is an IO operation so execution is passed to the next eventlet el2
el2: aquire lock
entire thread: "I'm waiting for lock to release, what do you want of me?"

The solution is to initialize logging after eventlet monkey-patching

Boris Bobrov (bbobrov) wrote :

> But the locks, initialized before and linked to objects by reference are not replaced.

And I say that in my comment ;)

Alexander Makarov (amakarov) wrote :

Boris, ok, I yield: let "the locks, initialized before" be "Internals of RLock" :) It's true.

Alexander Makarov (amakarov) wrote :

This shows that lock remains unpatched

import logging
logging.basicConfig()
logger = logging.getLogger()
import eventlet
eventlet.monkey_patch()
print logger.handlers[0].lock._RLock__block.__class__

Reviewed: https://review.openstack.org/154915
Committed: https://git.openstack.org/cgit/openstack/keystone/commit/?id=6fefe724695122754b9034c3f6434633877c4326
Submitter: Jenkins
Branch: master

commit 6fefe724695122754b9034c3f6434633877c4326
Author: Alexander Makarov <email address hidden>
Date: Wed Feb 11 18:42:32 2015 +0300

    Correct initialization order for logging to use eventlet locks

    oslo logging needs to be reconfigured after eventlet patching

    Change-Id: I717ecfd1c0f9b791a57f69325365144692f1e0b6
    Closes-Bug: 1420788

Changed in keystone:
status: In Progress → Fix Committed

Reviewed: https://review.openstack.org/155292
Committed: https://git.openstack.org/cgit/openstack/keystone/commit/?id=3db10d06828b39e7dbdb2938a581526a78d2957b
Submitter: Jenkins
Branch: stable/juno

commit 3db10d06828b39e7dbdb2938a581526a78d2957b
Author: Boris Bobrov <email address hidden>
Date: Thu Feb 12 15:55:24 2015 +0300

    Correct initialization order for logging to use eventlet locks

    oslo logging needs to be reconfigured after eventlet patching

    This patch follows the approach suggested in change
    I717ecfd1c0f9b791a57f69325365144692f1e0b6

    Change-Id: I84cac685726bbcf1285c0f6d9172439aa186466a
    Closes-Bug: 1420788

Change abandoned by Alan Pevec (<email address hidden>) on branch: stable/icehouse
Review: https://review.openstack.org/156649
Reason: Replaced by https://review.openstack.org/163387

Reviewed: https://review.openstack.org/163387
Committed: https://git.openstack.org/cgit/openstack/keystone/commit/?id=d72e6faca06e9470498860f16e64ed161815bb44
Submitter: Jenkins
Branch: stable/icehouse

commit d72e6faca06e9470498860f16e64ed161815bb44
Author: Boris Bobrov <email address hidden>
Date: Thu Feb 12 15:55:24 2015 +0300

    Correct initialization order for logging to use eventlet locks

    oslo logging needs to be reconfigured after eventlet patching

    This patch follows the approach suggested in change
    I717ecfd1c0f9b791a57f69325365144692f1e0b6

    Change-Id: I84cac685726bbcf1285c0f6d9172439aa186466a
    Closes-Bug: 1420788
    (cherry picked from commit 3db10d06828b39e7dbdb2938a581526a78d2957b)

tags: added: in-stable-icehouse
Thierry Carrez (ttx) on 2015-03-19
Changed in keystone:
milestone: none → kilo-3
status: Fix Committed → Fix Released
Changed in keystone:
importance: Undecided → Medium
Thierry Carrez (ttx) on 2015-04-30
Changed in keystone:
milestone: kilo-3 → 2015.1.0

Change abandoned by Morgan Fainberg (<email address hidden>) on branch: master
Review: https://review.openstack.org/159172
Reason: Administratively abandoning based on the conversations. We aren't going to continue with refactoring just to throw out the eventlet code next cycle.

Dr. Jens Harbott (j-harbott) wrote :

So do I understand this correctly that the release fix for kilo was only a partial fix for the apache version of keystone and the wsgi is still broken and also won't get fixed?

To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers