Bug #1825584 “eventlet monkey-patching breaks AMQP heartbeat on ...” : Bugs : OpenStack Compute (nova)

iain MacDonnell (imacdonn) on 2019-04-19

description:

updated

Revision history for this message

Lee Yarwood (lyarwood) wrote on 2019-04-23:

#1

https://review.opendev.org/#/c/647310/ hasn't landed in stable/stein yet, have you tested with it applied?

Revision history for this message

Matthew Booth (mbooth-9) wrote on 2019-04-23:

#2

I'm guessing so, because he mentions patching nova.monkey_patch, which is only introduced in that change.

The mod_wsgi sleuthing in the linked ML post is excellent, btw. However, note that the only reason we continue to do eventlet monkey patching for wsgi callers is that we unfortunately still require it: it has an explicit caller in multi-cell instance list, and possibly other places. Nobody likes this, and I think Mel Witt was working on patches to fix it, but until we get rid of it the focus needs to be on running wsgi such that it's not broken with eventlet: there's not a lot we can do if mod_wsgi (or uwsgi if it turns out to work the same way) isn't running us at all.

Can mod_wsgi be configured such that it *will* run timers?

Revision history for this message

iain MacDonnell (imacdonn) wrote on 2019-04-23:

#3

Yes, although that change is not in stable/stein yet, apparently it has been back-ported in RDO. I tried reverting it, but the problem persisted, until I also reverted the change that originally added monkey-patching to API.

Note that I am using uWSGI, not mod_wsgi. It's not yet completely clear to me if mod_wsgi has the same issue, or if the other case has a different root-cause.

Revision history for this message

iain MacDonnell (imacdonn) wrote on 2019-04-23:

#4

Reproduced the problem (easily) with mod_wsgi, and confirmed that removing the monkey-patching "fixes" it.

Revision history for this message

iain MacDonnell (imacdonn) wrote on 2019-04-23:

#5

Also, FWIW (probably not much), eventlet.monkey_patch(thread=False) seems to allow it to work OK.

Revision history for this message

Damien Ciabrini (dciabrin) wrote on 2019-04-25:

#6

As a follow up of the the thread on the ML [1], it appears that the AMQP hearbeat issues that we were seeing in other OpenStack services running under Apache mod_wsgi were due to a change in container healthcheck.

In Stein, TripleO configures the interval between two healthchecks to be 60s + random(45)s, which is too long to ensure that mod_wsgi can always schedule eventlet timers and answers to AMQP heartbeat packets in a timely manner.

The work to fix that specific problem is tracked in https://bugs.launchpad.net/tripleo/+bug/1826281

[1] http://lists.openstack.org/pipermail/openstack-discuss/2019-April/005310.html

Revision history for this message

Damien Ciabrini (dciabrin) wrote on 2019-05-04:

#7

Actually I'm taking comment #6 back... the issue I'm experiencing when running nova under mod_wsgi is exactly the one reported here.

Running the AMQP heartbeat thread under mod_wsgi doesn't work when the threading library is monkey_patched, because the thread waits on a data structure [1] that has been monkey patched [2], which makes it yield its execution instead of sleeping for 15s.

Because mod_wsgi stops the execution of its embedded interpreter, the AMQP heartbeat thread can't be resumed until there's a message to be processed in the mod_wsgi queue, which would resume the python interpreter and make eventlet resume the thread.

Disabling monkey-patching in nova_api makes the scheduling issue go away.
Note: other services like heat-api do not use monkey patching and aren't affected, so this seem to confirm that monkey-patching shouldn't happen in nova_api running under mod_wsgi in the first place.

[1] https://github.com/openstack/oslo.messaging/blob/master/oslo_messaging/_drivers/impl_rabbit.py#L904
[2] https://github.com/openstack/oslo.utils/blob/master/oslo_utils/eventletutils.py#L182

Bogdan Dobrelya (bogdando) on 2019-05-27

Changed in nova:
status:	New → Confirmed

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2019-05-29: Related fix proposed to nova (stable/stein)

#8

Related fix proposed to branch: stable/stein
Review: https://review.opendev.org/662095

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2019-10-02: Related fix merged to nova (stable/stein)

#9

Reviewed: https://review.opendev.org/662095
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=a694952eacfe3a2dac34957cf95d5529eb89d4b2
Submitter: Zuul
Branch: stable/stein

commit a694952eacfe3a2dac34957cf95d5529eb89d4b2
Author: melanie witt <email address hidden>
Date: Wed May 29 21:32:11 2019 +0000

Add reno about nova-api eventlet monkey-patching and rabbitmq

This adds a known issue release note about eventlet monkey-patching in
nova-api and workarounds.

Related-Bug: #1825584
Related-Bug: #1829062

Change-Id: I22abd1f5377489dd809eb705c8e7aec2814ced0e

tags:

added: in-stable-stein

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2020-04-28: Fix proposed to nova (master)

#10

Fix proposed to branch: master
Review: https://review.opendev.org/724188

Changed in nova:
assignee:	nobody → sean mooney (sean-k-mooney)
status:	Confirmed → In Progress

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2020-04-30: Related fix proposed to nova (master)

#11

Related fix proposed to branch: master
Review: https://review.opendev.org/724725

OpenStack Infra (hudson-openstack) on 2020-05-07

Changed in nova:
assignee:	sean mooney (sean-k-mooney) → Dan Smith (danms)

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2020-05-08: Fix merged to nova (master)

#12

Reviewed: https://review.opendev.org/724188
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=d6f664524dfa4eecbe3e16fda0fec5359faf84c6
Submitter: Zuul
Branch: master

commit d6f664524dfa4eecbe3e16fda0fec5359faf84c6
Author: Sean Mooney <email address hidden>
Date: Tue Apr 28 23:20:16 2020 +0100

Silence amqp heartbeat warning

    When the nova api is executing under uWSGI or MOD_WSGI
    the lifetime of the amqp heartbeat thread is controlled
    by the wsgi server. As a result when the nova api is run
    in this configuration we expect that the heartbeat thread
    will be suspended and heartbeats will be missed when the wsgi
    server suspends execution of the wsgi application.

    This change adds a python logging filter to suppress
    the reporting of heartbeat warnings as this behavior is
    expected. Since the operator cannot do anything to address
    the issue the warning is just noise and many operators
    and customers find it to be off-putting.

Change-Id: I642b1e3ed6de2be4dcc19fe214f84095d2e1d31a
Closes-Bug: #1825584

Changed in nova:
status:	In Progress → Fix Released

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2020-05-08: Related fix merged to nova (master)

#13

Reviewed: https://review.opendev.org/724725
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=ca2fe01cc1c73e9ca7b2376ea566db0fe45a44ea
Submitter: Zuul
Branch: master

commit ca2fe01cc1c73e9ca7b2376ea566db0fe45a44ea
Author: Sean Mooney <email address hidden>
Date: Thu Apr 30 19:09:17 2020 +0100

remove support of oslo.messaging 9.8.0 warning message

This change removes support for supressing the
heartbeat message from olso.messaging before 9.8.0

Change-Id: I2035d5df31e43b730cd472cc438ec863bb538d62
Related-Bug: #1825584

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2020-05-14: Fix proposed to nova (stable/ussuri)

#14

Fix proposed to branch: stable/ussuri
Review: https://review.opendev.org/728054

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2020-05-14: Related fix proposed to nova (stable/ussuri)

#15

Related fix proposed to branch: stable/ussuri
Review: https://review.opendev.org/728055

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2020-05-14: Fix proposed to nova (stable/train)

#16

Fix proposed to branch: stable/train
Review: https://review.opendev.org/728057

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2020-05-14: Related fix proposed to nova (stable/train)

#17

Related fix proposed to branch: stable/train
Review: https://review.opendev.org/728058

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2020-05-14: Fix proposed to nova (stable/stein)

#18

Fix proposed to branch: stable/stein
Review: https://review.opendev.org/728059

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2020-05-22: Fix merged to nova (stable/ussuri)

#19

Reviewed: https://review.opendev.org/728054
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=1caddc4250b4acbdb8a945c8f26a0792bee5e2ed
Submitter: Zuul
Branch: stable/ussuri

commit 1caddc4250b4acbdb8a945c8f26a0792bee5e2ed
Author: Sean Mooney <email address hidden>
Date: Tue Apr 28 23:20:16 2020 +0100

Silence amqp heartbeat warning

    When the nova api is executing under uWSGI or MOD_WSGI
    the lifetime of the amqp heartbeat thread is controlled
    by the wsgi server. As a result when the nova api is run
    in this configuration we expect that the heartbeat thread
    will be suspended and heartbeats will be missed when the wsgi
    server suspends execution of the wsgi application.

    This change adds a python logging filter to suppress
    the reporting of heartbeat warnings as this behavior is
    expected. Since the operator cannot do anything to address
    the issue the warning is just noise and many operators
    and customers find it to be off-putting.

    Change-Id: I642b1e3ed6de2be4dcc19fe214f84095d2e1d31a
    Closes-Bug: #1825584
    (cherry picked from commit d6f664524dfa4eecbe3e16fda0fec5359faf84c6)

tags:

added: in-stable-ussuri

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2020-05-22: Related fix merged to nova (stable/ussuri)

#20

Reviewed: https://review.opendev.org/728055
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=ed0524606998847a7b1ea9103579d027db092e83
Submitter: Zuul
Branch: stable/ussuri

commit ed0524606998847a7b1ea9103579d027db092e83
Author: Sean Mooney <email address hidden>
Date: Thu Apr 30 19:09:17 2020 +0100

remove support of oslo.messaging 9.8.0 warning message

This change removes support for supressing the
heartbeat message from olso.messaging before 9.8.0

    Change-Id: I2035d5df31e43b730cd472cc438ec863bb538d62
    Related-Bug: #1825584
    (cherry picked from commit ca2fe01cc1c73e9ca7b2376ea566db0fe45a44ea)

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2020-06-11: Change abandoned on nova (stable/train)

#21

Change abandoned by sean mooney (<email address hidden>) on branch: stable/train
Review: https://review.opendev.org/728058
Reason: i based this on upper-constraitns not lower so this should not be backported to train

sean mooney (sean-k-mooney) on 2020-07-08

Changed in nova:
importance:	Undecided → Low

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2020-08-05: Fix merged to nova (stable/train)

#22

Reviewed: https://review.opendev.org/728057
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=280007c870b671de9439efba71fd3916403d15da
Submitter: Zuul
Branch: stable/train

commit 280007c870b671de9439efba71fd3916403d15da
Author: Sean Mooney <email address hidden>
Date: Tue Apr 28 23:20:16 2020 +0100

Silence amqp heartbeat warning

    When the nova api is executing under uWSGI or MOD_WSGI
    the lifetime of the amqp heartbeat thread is controlled
    by the wsgi server. As a result when the nova api is run
    in this configuration we expect that the heartbeat thread
    will be suspended and heartbeats will be missed when the wsgi
    server suspends execution of the wsgi application.

    This change adds a python logging filter to suppress
    the reporting of heartbeat warnings as this behavior is
    expected. Since the operator cannot do anything to address
    the issue the warning is just noise and many operators
    and customers find it to be off-putting.

    Changes:
      nova/config.py
    in python 3 log filters can simply be a function that
    returns true or false and accepts single argument which
    is the record to filter. To support python 2 the
    rabbit_heartbeat_filter has been converted into an instance
    of a new HeartbeatFilter class with its implementation moved
    to the filter method.

    Change-Id: I642b1e3ed6de2be4dcc19fe214f84095d2e1d31a
    Closes-Bug: #1825584
    (cherry picked from commit d6f664524dfa4eecbe3e16fda0fec5359faf84c6)
    (cherry picked from commit 1caddc4250b4acbdb8a945c8f26a0792bee5e2ed)

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2022-11-11: Change abandoned on nova (stable/stein)

#23

Change abandoned by "Elod Illes <email address hidden>" on branch: stable/stein
Review: https://review.opendev.org/c/openstack/nova/+/728059
Reason: This branch transitioned to End of Life for this project, open patches needs to be closed to be able to delete the branch.

Revision history for this message

OpenStack Infra (hudson-openstack) wrote on 2024-02-22: Related fix proposed to nova (master)

#24

Related fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/nova/+/909880

OpenStack Compute (nova)

eventlet monkey-patching breaks AMQP heartbeat on uWSGI

Bug Description

Duplicates of this bug

Other bug subscribers

Remote bug watches

	Status	Importance	Assigned to
OpenStack Compute (nova)	Fix Released	Low	Dan Smith
Stein	In Progress	Low	sean mooney
Train	Fix Released	Low	sean mooney
Ussuri	Fix Released	Low	sean mooney