Neutron ml2/ovn does not exit when killed with SIGTERM

Bug #2056366 reported by Terry Wilson
This bug affects 2 people
Affects   Status         Importance   Assigned to    Milestone
neutron   Fix Released   Medium       Terry Wilson

Bug Description

When Neutron is killed with SIGTERM (for example, via systemctl) while using ML2/OVN, the neutron workers do not exit. Instead, they are eventually killed with SIGKILL when the graceful timeout is reached (often around 1 minute).

This happens because of how the signal handlers for SIGTERM are set up. There are multiple issues:

1) oslo_service, the ml2/ovn mech_driver, and ml2/ovo_rpc.py all call signal.signal(signal.SIGTERM, ...), overwriting each other's signal handlers.
2) SIGTERM is handled in the main thread, and running blocking code there causes AssertionErrors in eventlet, which also prevents the process from exiting.
3) The ml2/ovn cleanup code doesn't cause the process to end, so it interrupts the killing of the process.

oslo_service has a singleton SignalHandler class that solves all of these issues, and we should use it instead of calling signal.signal() ourselves.
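
To illustrate issue 1, here is a minimal standard-library sketch. It shows that a second signal.signal() call silently replaces the first handler, and how a single shared registry avoids that by dispatching to every registered callback. HandlerRegistry and the demo_* functions are hypothetical names made up for this example; this is not oslo_service's SignalHandler implementation, only the general idea of one owner for the OS-level handler. It assumes it is run from the main thread, since Python only allows signal.signal() there.

```python
import signal


def _restore(signo, previous):
    # getsignal() may return None if the handler was not installed from Python;
    # fall back to the default action in that case.
    signal.signal(signo, previous if previous is not None else signal.SIG_DFL)


def demo_overwrite():
    """Two independent signal.signal() calls: only the last one survives."""
    calls = []
    previous = signal.getsignal(signal.SIGTERM)
    try:
        signal.signal(signal.SIGTERM, lambda signo, frame: calls.append("first"))
        signal.signal(signal.SIGTERM, lambda signo, frame: calls.append("second"))
        signal.raise_signal(signal.SIGTERM)
    finally:
        _restore(signal.SIGTERM, previous)
    return calls


class HandlerRegistry:
    """Hypothetical shared registry: one OS-level handler, many callbacks."""

    def __init__(self):
        self._handlers = {}

    def add_handler(self, signo, handler):
        if signo not in self._handlers:
            self._handlers[signo] = []
            # Install the dispatcher once per signal number.
            signal.signal(signo, self._dispatch)
        self._handlers[signo].append(handler)

    def _dispatch(self, signo, frame):
        # Call every registered callback in registration order.
        for handler in list(self._handlers.get(signo, ())):
            handler(signo, frame)


def demo_registry():
    """Both callbacks run because they share one registry."""
    calls = []
    previous = signal.getsignal(signal.SIGTERM)
    try:
        registry = HandlerRegistry()
        registry.add_handler(signal.SIGTERM, lambda s, f: calls.append("first"))
        registry.add_handler(signal.SIGTERM, lambda s, f: calls.append("second"))
        signal.raise_signal(signal.SIGTERM)
    finally:
        _restore(signal.SIGTERM, previous)
    return calls
```

Calling demo_overwrite() records only the second handler, while demo_registry() records both. The sketch does not address issues 2 and 3 (blocking work in the main thread, and cleanup code that never lets the process exit); those depend on how the real handler schedules callbacks.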

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/neutron/+/911625

Changed in neutron:
status: New → In Progress
Changed in neutron:
importance: Undecided → Medium
assignee: nobody → Terry Wilson (otherwiseguy)
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/911625
Committed: https://opendev.org/openstack/neutron/commit/a4e49b6b8fcf9acfa4e84c65de19ffd56b9022e7
Submitter: "Zuul (22348)"
Branch: master

commit a4e49b6b8fcf9acfa4e84c65de19ffd56b9022e7
Author: Terry Wilson <email address hidden>
Date: Wed Mar 6 20:13:58 2024 +0000

    Use oslo_service's SignalHandler for signals

    When Neutron is killed with SIGTERM (like via systemctl), when using
    ML2/OVN neutron workers do not exit and instead are eventually killed
    with SIGKILL when the graceful timeout is reached (often around 1
    minute).

    This is happening due to the signal handlers for SIGTERM. There are
    multiple issues.

    1) oslo_service, ml2/ovn mech_driver, and ml2/ovo_rpc.py all call
       signal.signal(signal.SIGTERM, ...) overwriting each others signal
       handlers.
    2) SIGTERM is handled in the main thread, and running blocking code
       there causes AssertionErrors in eventlet which also prevents the
       process from exiting.
    3) The ml2/ovn cleanup code doesn't cause the process to end, so it
       interrupts the killing of the process.

    oslo_service has a singleton SignalHandler class that solves all of
    these issues

    Closes-Bug: #2056366
    Depends-On: https://review.opendev.org/c/openstack/oslo.service/+/911627
    Change-Id: I730a12746bceaa744c658854e38439420efc4629
    Signed-off-by: Terry Wilson <email address hidden>

Changed in neutron:
status: In Progress → Fix Released