Can't log within tpool.execute

Bug #1983863 reported by Gorka Eguileor
14
This bug affects 2 people
Affects Status Importance Assigned to Milestone
OpenStack Compute (nova)
Fix Released
Low
sean mooney
oslo.log
Fix Released
High
Gorka Eguileor

Bug Description

There is a bug in eventlet where logging within a native thread can lead to a deadlock situation: https://github.com/eventlet/eventlet/issues/432

When encountered with this issue some projects in OpenStack using oslo.log, eg. Cinder, resolve them by removing any logging withing native threads.

There is actually a better approach. The Swift team came up with a solution a long time ago, and it would be great if oslo.log could use this workaround automaticaly: https://opendev.org/openstack/swift/commit/69c715c505cf9e5df29dc1dff2fa1a4847471cb6

Gorka Eguileor (gorka)
Changed in oslo.log:
assignee: nobody → Gorka Eguileor (gorka)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to oslo.log (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/oslo.log/+/852443

Changed in oslo.log:
status: New → In Progress
Changed in oslo.log:
importance: Undecided → High
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to oslo.log (master)

Reviewed: https://review.opendev.org/c/openstack/oslo.log/+/852443
Committed: https://opendev.org/openstack/oslo.log/commit/94b9dc32ec1f52a582adbd97fe2847f7c87d6c17
Submitter: "Zuul (22348)"
Branch: master

commit 94b9dc32ec1f52a582adbd97fe2847f7c87d6c17
Author: Gorka Eguileor <email address hidden>
Date: Mon Aug 8 16:44:20 2022 +0200

    Fix logging in eventlet native threads

    There is a bug in eventlet where logging within a native thread can lead
    to a deadlock situation: https://github.com/eventlet/eventlet/issues/432

    When encountered with this issue some projects in OpenStack using
    oslo.log, eg. Cinder, resolve them by removing any logging withing
    native threads.

    There is actually a better approach. The Swift team came up with a
    solution a long time ago [1], and in this patch that fix is included as
    part of the setup method, but will only be run if the eventlet library
    has already been loaded.

    This patch adds the eventlet library as a testing dependency for the
    PipeMutext unit tests.

    [1]: https://opendev.org/openstack/swift/commit/69c715c505cf9e5df29dc1dff2fa1a4847471cb6

    Closes-Bug: #1983863
    Change-Id: Iac1b0891ae584ce4b95964e6cdc0ff2483a4e57d

Changed in oslo.log:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/oslo.log 5.0.1

This issue was fixed in the openstack/oslo.log 5.0.1 release.

Revision history for this message
sean mooney (sean-k-mooney) wrote :

adding nova as the change to fix this is breaking our unit tests.
https://review.opendev.org/c/openstack/nova/+/894538 corrects this
setting as critical as this is blocking the bump of upper constratis to include oslo.log 5.3.0

i don't think there is any real-world impact beyond that.

Changed in nova:
status: New → In Progress
assignee: nobody → sean mooney (sean-k-mooney)
importance: Undecided → Critical
Revision history for this message
sean mooney (sean-k-mooney) wrote :

note this was "fixed" in 5.0.1

the reason this matters for 5.3.0 is it was reverted out of 5.2.0 but not out of master
so 5.3.0 has the fix first released in 5.0.1 again and we now need to adapt nova to it

we have been using 5.2.0 since it was released without the eventlet native threads patch

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (master)

Reviewed: https://review.opendev.org/c/openstack/nova/+/894538
Committed: https://opendev.org/openstack/nova/commit/ea07f96eb10a4fb7c4c00cbde34868a3d21b6c22
Submitter: "Zuul (22348)"
Branch: master

commit ea07f96eb10a4fb7c4c00cbde34868a3d21b6c22
Author: Sean Mooney <email address hidden>
Date: Mon Sep 11 14:19:45 2023 +0100

    adapt to oslo.log changes

    This change refactors prvisep util test cases to account for the
    fact that oslo.log now conditionally uses an internal pipe mutex
    when logging under eventlet.

    This was added by Iac1b0891ae584ce4b95964e6cdc0ff2483a4e57d
    which is part of oslo.log 5.3.0

    As a result we need to mock all calls to oslo.log in unit tests
    that are assertign if os.write is called. when the internal
    pipe mutex is used oslo.log calls os.write when the mutex is
    released.

    Related-Bug: #1983863
    Change-Id: Id313669df80f9190b79690fff25f8e3fce2a4aca

Changed in nova:
status: In Progress → Invalid
status: Invalid → Fix Committed
Revision history for this message
melanie witt (melwitt) wrote :

Setting this to Fix Released because the fix landed on the master branch which is continuously released.

Changed in nova:
status: Fix Committed → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to oslo.log (stable/2023.1)

Fix proposed to branch: stable/2023.1
Review: https://review.opendev.org/c/openstack/oslo.log/+/911940

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to oslo.log (stable/2023.1)

Reviewed: https://review.opendev.org/c/openstack/oslo.log/+/911940
Committed: https://opendev.org/openstack/oslo.log/commit/95461ed9dfb4fb02378ef9e5782ce6642a1a102b
Submitter: "Zuul (22348)"
Branch: stable/2023.1

commit 95461ed9dfb4fb02378ef9e5782ce6642a1a102b
Author: Gorka Eguileor <email address hidden>
Date: Mon Aug 8 16:44:20 2022 +0200

    Fix logging in eventlet native threads

    There is a bug in eventlet where logging within a native thread can lead
    to a deadlock situation: https://github.com/eventlet/eventlet/issues/432

    When encountered with this issue some projects in OpenStack using
    oslo.log, eg. Cinder, resolve them by removing any logging withing
    native threads.

    There is actually a better approach. The Swift team came up with a
    solution a long time ago [1], and in this patch that fix is included as
    part of the setup method, but will only be run if the eventlet library
    has already been loaded.

    This patch adds the eventlet library as a testing dependency for the
    PipeMutext unit tests.

    [1]: https://opendev.org/openstack/swift/commit/69c715c505cf9e5df29dc1dff2fa1a4847471cb6

    Closes-Bug: #1983863
    Change-Id: Iac1b0891ae584ce4b95964e6cdc0ff2483a4e57d
    (cherry picked from commit 94b9dc32ec1f52a582adbd97fe2847f7c87d6c17)

Revision history for this message
Daniel Bengtsson (damani42) wrote :

Hi there,

We have an issue with the current fix some use a debug option from eventlet hub_prevent_multiple_readers, this debug convenience should never been used in production context, and even less in runtime code.
Using that feature it's a also a blocker for the migration to asyncio[1], because it's not supported by the asyncio hub.
We currently work with Hervé on that jira card[2], can we please have the way to reproduce the original issue? We will change the fix to implement a feeder instead.

[1] https://review.opendev.org/c/openstack/governance/+/902585
[2] https://issues.redhat.com/browse/OSPRH-6122?filter=-1

Revision history for this message
sean mooney (sean-k-mooney) wrote :

as an fyi the "fix" was in oslo so if oslo is updated to not use the unsupported function call it should not require any codechange in nova.

we would inherit it i belive form oslo if we updated the version we used.

Revision history for this message
Daniel Bengtsson (damani42) wrote :

H sean,

I'm not sure to understand what do you mean. Do you mean the new change doesn't have to have any side effect on nova?

Revision history for this message
sean mooney (sean-k-mooney) wrote :

nova does not depend on the multi reader behavior and never has.
the modification to oslo.log was done mainly because of issues reported in cinder and while the
same error could happen in nova if you tried to logs form within an eventlet.tpool we were not doing that.

so yes I'm saying this is an implementation detail of how oslo.logs works form a nova perspective and modifying the internals of oslo log to not call eventlet.debug.hub_prevent_multiple_readers(False)

does not require code changes in nova.

nothing in nova depends on "eventlet.debug.hub_prevent_multiple_readers(False)"

nova is only listed on this bug report because the oslo change broke our unit tests
https://review.opendev.org/c/openstack/nova/+/894538
and it did so during a code freeze period coming up to a release so it was blocking our gate.

the code change i made to nova was a purely test only change adding additional mocking to avoid running
the PipeMutex oslo.log code in our unit tets.

Revision history for this message
sean mooney (sean-k-mooney) wrote :

dropping the nova severity back to low as it was only critical originally because it blocked our ci system during a code freeze period.

Changed in nova:
importance: Critical → Low
Revision history for this message
Daniel Bengtsson (damani42) wrote :

Hi Sean,

Thanks a lot for your answer.

Revision history for this message
Daniel Bengtsson (damani42) wrote :

Hi there,

We have a question about that initial bug. We have seen at the usage of log was removed in cinder, with that commit[1], so it seems it's not used anymore.
The question is do you have reintroduced the usage of logging in the native thread in cinder?
Are you aware if another project use the logging in the native thread?
A question for @Sean do you know if nova use logging in the the native thread somewhere?

[1] https://github.com/openstack/cinder/commit/c6cb84bd63135d62ccd7ed9b23b245c41e9f105c

Revision history for this message
Gorka Eguileor (gorka) wrote :

In Cinder we may have tried to remove the use of logging in native threads, but that doesn't mean that we have located all the places where it is used or that newer code will not introduce it. And this new code could be even backported, as it is common that we fix some concurrency issue using native threads and then backport it.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to oslo.log (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/oslo.log/+/918426

Revision history for this message
sean mooney (sean-k-mooney) wrote :

currently nova rarely uses native threads.

all project that have oslo.messaging use logging in native threads if they use

https://github.com/openstack/oslo.messaging/blob/master/oslo_messaging/_drivers/impl_rabbit.py#L1311-L1354

with the heartbeat_in_pthread option set to true

because the heartbeat thread logs

we do log eventlet monkey patched threads

https://github.com/openstack/nova/blob/master/nova/virt/libvirt/host.py#L224-L242
https://github.com/openstack/nova/blob/master/nova/virt/libvirt/host.py#L500-L504

that interact with a native thread

https://github.com/openstack/nova/blob/master/nova/virt/libvirt/host.py#L495-L498
https://github.com/openstack/nova/blob/master/nova/virt/libvirt/host.py#L197-L211

but i think we do not currently log in a native thread.
we also don't have a coding polic to not do that so if we are fixing oslo.log we should not regress the ability to do that. we should just use something supported by eventlet.

we my want to start logging form native threads going forward as we start removing your use of eventlet.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to oslo.log (master)

Reviewed: https://review.opendev.org/c/openstack/oslo.log/+/918426
Committed: https://opendev.org/openstack/oslo.log/commit/8705f67bfbd733edf923003c0a5850e7a4a5735b
Submitter: "Zuul (22348)"
Branch: master

commit 8705f67bfbd733edf923003c0a5850e7a4a5735b
Author: damani42 <email address hidden>
Date: Tue May 7 12:03:59 2024 +0200

    Remove the usage of the Eventlet debug feature from oslo.log.

    We initially started to work on this topic with the intention to solve
    the problem of using the Eventlet hub_prevent_multiple_readers debug feature.
    This feature is a debug convenience. This feature is not supported by the new
    asyncio hub[1] of Eventlet. You can also check the documentation here[2] from eventlet. The problem with disabling this procedure is that
    it exposes you to risk. Deactivation applies to the entire stack. If a project
    uses oslo.log, for example nova, then it exposes all threads to concurrent access
    on the process file descriptors.
    As explained in the function's documentation[3], when several greenlets are reading from
    the same socket, it's difficult to predict which greenlet will receive which data.
    You really need to be sure of what you're doing to use this function. Also explained
    in the raise condition added by this commit[4] and we have a reproducer[5] that exposes
    the problem. According to our tests, removing this line does not change the fix provided.
    It also seems that cinder no longer uses[6] logging native thread.

    Related-Bug: #1983863

    [1] https://review.opendev.org/c/openstack/governance/+/902585/17..18
    [2] https://eventlet.readthedocs.io/en/latest/asyncio/migration.html#known-limitations-and-work-in-progress
    [3] https://github.com/eventlet/eventlet/blob/master/eventlet/debug.py#L149
    [4] https://github.com/eventlet/eventlet/commit/cb7c8c0196ed70665b0382909141ac743d7633a2
    [5] https://github.com/4383/oslo.log/blob/reproducer2/reproducer.py
    [6] https://bugs.launchpad.net/oslo.log/+bug/1983863/comments/18

    Co-authored-by: Hervé Beraud <email address hidden>
    Change-Id: Ia4c8e82f957c4ef3c176246fea0bc0adab4f4dc4

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to oslo.log (stable/2024.1)

Related fix proposed to branch: stable/2024.1
Review: https://review.opendev.org/c/openstack/oslo.log/+/920267

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on oslo.log (stable/2024.1)

Change abandoned by "Daniel Bengtsson <email address hidden>" on branch: stable/2024.1
Review: https://review.opendev.org/c/openstack/oslo.log/+/920267
Reason: Merge other patches first.

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.