[OVN] The API worker fails during "post_fork_initialize" call

Bug #2036607 reported by Rodolfo Alonso
This bug affects 1 person
Affects: neutron
Status: Fix Released
Importance: Medium
Assigned to: Rodolfo Alonso

Bug Description

Bugzilla reference: https://bugzilla.redhat.com/show_bug.cgi?id=2233797

This issue was reproduced with the Tobiko framework. The test, which is executed several times, reboots the controllers and therefore the Neutron API. At random, one Neutron API worker fails while running the "post_fork_initialize" event method, during the "_setup_hash_ring" call [1].

The API worker starts regardless of the result of the "post_fork_initialize" method. When the method fails, however, some methods (mainly those related to the OVN agents) are not patched, so the results of API calls such as "agent show" and "agent list" are wrong.

This bug proposes:
* To properly handle any possible error in the "_setup_hash_ring" call.
* To log a message at the end of the "post_fork_initialize" method to check that this event method has finished properly.
* To catch any possible error during the "post_fork_initialize" execution and, if the error is not retried, to fail and exit.
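
The three proposals above can be sketched together as follows. This is a minimal illustration, not the Neutron code: `RetryableError`, `start_worker`, the `max_attempts` policy and the shape of `post_fork_initialize` shown here are all assumptions made for the example.

```python
import logging

LOG = logging.getLogger("worker")


class RetryableError(Exception):
    """Hypothetical marker for errors that are worth retrying."""


def post_fork_initialize(setup_hash_ring, max_attempts=3):
    """Run the post-fork setup, retrying only RetryableError."""
    for attempt in range(1, max_attempts + 1):
        try:
            setup_hash_ring()
            break
        except RetryableError:
            LOG.warning("hash ring setup failed (attempt %d/%d)",
                        attempt, max_attempts)
            if attempt == max_attempts:
                raise
    # Reached only on success: this message confirms the method completed.
    LOG.info("post_fork_initialize finished")


def start_worker(setup_hash_ring):
    """Return a process exit code: 0 if initialized, 1 otherwise."""
    try:
        post_fork_initialize(setup_hash_ring)
    except Exception:
        # Any error that was not successfully retried stops the worker.
        LOG.exception("post_fork_initialize failed; worker will exit")
        return 1
    return 0
```

In this sketch a half-initialized worker never serves requests: either the final log message is emitted, or the caller gets a non-zero exit code it can pass to `sys.exit`.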

[1]https://paste.opendev.org/show/bqzDPR5TukLq9d1GIcnz/

Tags: ovn
tags: added: ovn
Changed in neutron:
importance: Undecided → Medium
status: New → Confirmed
Changed in neutron:
assignee: nobody → Rodolfo Alonso (rodolfo-alonso-hernandez)
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron-lib (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/neutron-lib/+/895940

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/neutron/+/895946

Changed in neutron:
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/neutron/+/896009

OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (master)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/896009
Committed: https://opendev.org/openstack/neutron/commit/236f8d0b97193cebc6d165e8d6af2da9ea06e2f1
Submitter: "Zuul (22348)"
Branch: master

commit 236f8d0b97193cebc6d165e8d6af2da9ea06e2f1
Author: Rodolfo Alonso Hernandez <email address hidden>
Date: Mon Sep 11 19:50:47 2023 +0000

    [OVN] Add a log message after the "post_fork_initialize" method

    Related-Bug: #2036607
    Change-Id: I84a7a6cff5921488686ebf9ab95aa270d22b4e31

OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron-lib (master)

Reviewed: https://review.opendev.org/c/openstack/neutron-lib/+/895940
Committed: https://opendev.org/openstack/neutron-lib/commit/08b8c6d33b34de4be96116ed78835645dca9493a
Submitter: "Zuul (22348)"
Branch: master

commit 08b8c6d33b34de4be96116ed78835645dca9493a
Author: Rodolfo Alonso Hernandez <email address hidden>
Date: Mon Sep 11 14:30:47 2023 +0000

    Add the "cancellable" flag to the ``CallbacksManager`` events

    The ``CallbacksManager`` class considers, by default, that the events
    starting with "before_" and "precommit_" can raise an Exception
    (``CallbackFailure``) in case that the callbacks associated to these
    methods exit with an error.

    However there are some other events (those starting with "after_") that
    won't generate an exception in case of error. The error will be logged
    but the process will continue.

    This new functionality adds the possibility of adding any kind of event
    and marking it as "cancellable". The ``CallbacksManager`` instance will check
    the errors returned by the callback methods and if any of them is marked
    as "cancellable", the manager will raise a ``CallbackFailure`` exception,
    terminating the process.

    In case of being a Neutron worker, for example, the
    ``oslo_service.service.Services`` class will restart the process again.

    Related-Bug: #2036607
    Change-Id: Ie1e7be6d70cca957c1b1b6c15b402e8bc6523865
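
The behaviour described in this commit message can be illustrated with a toy model. The sketch below is not the neutron-lib implementation: `ToyCallbacksManager`, its `subscribe` and `publish` signatures and the event names used with it are assumptions made for illustration. Only the `CallbackFailure` name and the "before_" and "precommit_" prefixes come from the message above.

```python
class CallbackFailure(Exception):
    """Raised when a failing callback must abort the operation."""

    def __init__(self, errors):
        super().__init__("callbacks failed: %s" % errors)
        self.errors = errors


class ToyCallbacksManager:
    """Illustrative stand-in, not the neutron-lib class."""

    # Events with these prefixes always raise on callback failure.
    ABORT_PREFIXES = ("before_", "precommit_")

    def __init__(self):
        self._callbacks = {}  # event name -> list of (callback, cancellable)

    def subscribe(self, callback, event, cancellable=False):
        self._callbacks.setdefault(event, []).append((callback, cancellable))

    def publish(self, event, payload=None):
        errors = []
        must_raise = event.startswith(self.ABORT_PREFIXES)
        for callback, cancellable in self._callbacks.get(event, []):
            try:
                callback(payload)
            except Exception as exc:
                errors.append((callback.__name__, exc))
                # A failing cancellable callback aborts any kind of event.
                must_raise = must_raise or cancellable
        if errors and must_raise:
            raise CallbackFailure(errors)
        # Otherwise the errors are only reported back to the caller.
        return errors
```

With this model, a failing callback subscribed to an "after_" style event is merely reported unless it was registered with `cancellable=True`, in which case `publish` raises `CallbackFailure`.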

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/895946
Committed: https://opendev.org/openstack/neutron/commit/bb1114c8b1b99834f2cb782dae796fa778ec2319
Submitter: "Zuul (22348)"
Branch: master

commit bb1114c8b1b99834f2cb782dae796fa778ec2319
Author: Rodolfo Alonso Hernandez <email address hidden>
Date: Mon Sep 11 17:26:56 2023 +0000

    Make ``OVNMechanismDriver.post_fork_initialize`` callback cancellable

    If the callback method ``OVNMechanismDriver.post_fork_initialize``
    fails, the callback manager must raise an exception and finish the
    process. If that happens in a Neutron worker, the
    ``oslo_service.service.Services`` class will restart the process.

    The neutron-lib version is bumped to 3.9.0. It contains [1], needed
    for this patch.

    [1]https://review.opendev.org/c/openstack/neutron-lib/+/895940

    Partial-Bug: #2036607
    Change-Id: I2aca9a522bda2d69962369748b70fa9270fbe245
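
As a rough illustration of the restart behaviour mentioned above, the following sketch supervises a child process and relaunches it until it exits cleanly. It does not use oslo.service and makes no claim about how that library is implemented. `supervise`, `child_command` and the rule that the child only succeeds on its third launch are hypothetical, chosen to mimic a worker whose initialization fails at random.

```python
import subprocess
import sys

# Hypothetical child: exits 1 (failed init) on launches 1 and 2, then exits 0.
CHILD_CODE = "import sys; sys.exit(0 if int(sys.argv[1]) >= 3 else 1)"


def child_command(launch):
    """Build the command line for the given launch number."""
    return [sys.executable, "-c", CHILD_CODE, str(launch)]


def supervise(command_for_launch, max_restarts=5):
    """Relaunch the child until it exits with 0; return the launch count."""
    for launch in range(1, max_restarts + 2):
        returncode = subprocess.run(command_for_launch(launch)).returncode
        if returncode == 0:
            return launch
    raise RuntimeError("child kept failing after %d restarts" % max_restarts)
```

The point of the design is that a worker which exits on a failed initialization gets a fresh attempt, instead of staying up in a partially initialized state.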

Rodolfo Alonso (rodolfo-alonso-hernandez) wrote :

The following patches have been merged:
* https://review.opendev.org/c/openstack/neutron/+/896009
* https://review.opendev.org/c/openstack/neutron/+/895946 (depends on the n-lib patch)
* https://review.opendev.org/c/openstack/neutron-lib/+/895940

This bug can be considered fixed. Any further improvement to "_setup_hash_ring" will be pushed with a reference to this bug, but we can now catch any error and raise an exception.

Changed in neutron:
status: In Progress → Fix Committed
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (stable/2023.2)

Related fix proposed to branch: stable/2023.2
Review: https://review.opendev.org/c/openstack/neutron/+/903332

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (stable/2023.1)

Related fix proposed to branch: stable/2023.1
Review: https://review.opendev.org/c/openstack/neutron/+/903333

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron-lib (stable/2023.2)

Related fix proposed to branch: stable/2023.2
Review: https://review.opendev.org/c/openstack/neutron-lib/+/903334

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron-lib (stable/2023.1)

Related fix proposed to branch: stable/2023.1
Review: https://review.opendev.org/c/openstack/neutron-lib/+/903335

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/2023.2)

Fix proposed to branch: stable/2023.2
Review: https://review.opendev.org/c/openstack/neutron/+/903336

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/2023.1)

Fix proposed to branch: stable/2023.1
Review: https://review.opendev.org/c/openstack/neutron/+/903337

OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron-lib (stable/2023.2)

Reviewed: https://review.opendev.org/c/openstack/neutron-lib/+/903334
Committed: https://opendev.org/openstack/neutron-lib/commit/2224f23ca50fc0b305e3d5ac232a74bd2e9770a6
Submitter: "Zuul (22348)"
Branch: stable/2023.2

commit 2224f23ca50fc0b305e3d5ac232a74bd2e9770a6
Author: Rodolfo Alonso Hernandez <email address hidden>
Date: Mon Sep 11 14:30:47 2023 +0000

    Add the "cancellable" flag to the ``CallbacksManager`` events

    The ``CallbacksManager`` class considers, by default, that the events
    starting with "before_" and "precommit_" can raise an Exception
    (``CallbackFailure``) in case that the callbacks associated to these
    methods exit with an error.

    However there are some other events (those starting with "after_") that
    won't generate an exception in case of error. The error will be logged
    but the process will continue.

    This new functionality adds the possibility of adding any kind of event
    and marking it as "cancellable". The ``CallbacksManager`` instance will check
    the errors returned by the callback methods and if any of them is marked
    as "cancellable", the manager will raise a ``CallbackFailure`` exception,
    terminating the process.

    In case of being a Neutron worker, for example, the
    ``oslo_service.service.Services`` class will restart the process again.

    Related-Bug: #2036607
    Change-Id: Ie1e7be6d70cca957c1b1b6c15b402e8bc6523865
    (cherry picked from commit 08b8c6d33b34de4be96116ed78835645dca9493a)

OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (stable/2023.2)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/903332
Committed: https://opendev.org/openstack/neutron/commit/dd3ba9a6cffd49f701b2a217000d7138873fbe1c
Submitter: "Zuul (22348)"
Branch: stable/2023.2

commit dd3ba9a6cffd49f701b2a217000d7138873fbe1c
Author: Rodolfo Alonso Hernandez <email address hidden>
Date: Mon Sep 11 19:50:47 2023 +0000

    [OVN] Add a log message after the "post_fork_initialize" method

    Related-Bug: #2036607
    Change-Id: I84a7a6cff5921488686ebf9ab95aa270d22b4e31
    (cherry picked from commit 236f8d0b97193cebc6d165e8d6af2da9ea06e2f1)

OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (stable/2023.1)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/903333
Committed: https://opendev.org/openstack/neutron/commit/0de8e70d295a4f6f23583f7f2bf50434a4a24d1b
Submitter: "Zuul (22348)"
Branch: stable/2023.1

commit 0de8e70d295a4f6f23583f7f2bf50434a4a24d1b
Author: Rodolfo Alonso Hernandez <email address hidden>
Date: Mon Sep 11 19:50:47 2023 +0000

    [OVN] Add a log message after the "post_fork_initialize" method

    Related-Bug: #2036607
    Change-Id: I84a7a6cff5921488686ebf9ab95aa270d22b4e31
    (cherry picked from commit 236f8d0b97193cebc6d165e8d6af2da9ea06e2f1)

OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron-lib (stable/2023.1)

Reviewed: https://review.opendev.org/c/openstack/neutron-lib/+/903335
Committed: https://opendev.org/openstack/neutron-lib/commit/1e0f5983ccfa9b0c04e17a01876eed10187ec1ee
Submitter: "Zuul (22348)"
Branch: stable/2023.1

commit 1e0f5983ccfa9b0c04e17a01876eed10187ec1ee
Author: Rodolfo Alonso Hernandez <email address hidden>
Date: Mon Sep 11 14:30:47 2023 +0000

    Add the "cancellable" flag to the ``CallbacksManager`` events

    The ``CallbacksManager`` class considers, by default, that the events
    starting with "before_" and "precommit_" can raise an Exception
    (``CallbackFailure``) in case that the callbacks associated to these
    methods exit with an error.

    However there are some other events (those starting with "after_") that
    won't generate an exception in case of error. The error will be logged
    but the process will continue.

    This new functionality adds the possibility of adding any kind of event
    and marking it as "cancellable". The ``CallbacksManager`` instance will check
    the errors returned by the callback methods and if any of them is marked
    as "cancellable", the manager will raise a ``CallbackFailure`` exception,
    terminating the process.

    In case of being a Neutron worker, for example, the
    ``oslo_service.service.Services`` class will restart the process again.

    Related-Bug: #2036607
    Change-Id: Ie1e7be6d70cca957c1b1b6c15b402e8bc6523865
    (cherry picked from commit 08b8c6d33b34de4be96116ed78835645dca9493a)

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/2023.2)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/903336
Committed: https://opendev.org/openstack/neutron/commit/d5945fcaa5733470a3801a07f1fb9d63673f14b9
Submitter: "Zuul (22348)"
Branch: stable/2023.2

commit d5945fcaa5733470a3801a07f1fb9d63673f14b9
Author: Rodolfo Alonso Hernandez <email address hidden>
Date: Mon Sep 11 17:26:56 2023 +0000

    Make ``OVNMechanismDriver.post_fork_initialize`` callback cancellable

    If the callback method ``OVNMechanismDriver.post_fork_initialize``
    fails, the callback manager must raise an exception and finish the
    process. If that happens in a Neutron worker, the
    ``oslo_service.service.Services`` class will restart the process.

    The neutron-lib version is bumped to 3.8.1. It contains [1], needed
    for this patch.

    [1]https://review.opendev.org/c/openstack/neutron-lib/+/903334

    Conflicts:
        requirements.txt

    Partial-Bug: #2036607
    Change-Id: I2aca9a522bda2d69962369748b70fa9270fbe245
    (cherry picked from commit bb1114c8b1b99834f2cb782dae796fa778ec2319)

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/2023.1)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/903337
Committed: https://opendev.org/openstack/neutron/commit/20ba4477874728b8ce2d8a44468f153ed60e1268
Submitter: "Zuul (22348)"
Branch: stable/2023.1

commit 20ba4477874728b8ce2d8a44468f153ed60e1268
Author: Rodolfo Alonso Hernandez <email address hidden>
Date: Mon Sep 11 17:26:56 2023 +0000

    Make ``OVNMechanismDriver.post_fork_initialize`` callback cancellable

    If the callback method ``OVNMechanismDriver.post_fork_initialize``
    fails, the callback manager must raise an exception and finish the
    process. If that happens in a Neutron worker, the
    ``oslo_service.service.Services`` class will restart the process.

    NOTE: update n-lib version
    The neutron-lib version is bumped to 3.4.2. It contains [1], needed
    for this patch.

    [1]https://review.opendev.org/c/openstack/neutron-lib/+/903335

    Conflicts:
        requirements.txt

    Partial-Bug: #2036607
    Change-Id: I2aca9a522bda2d69962369748b70fa9270fbe245
    (cherry picked from commit bb1114c8b1b99834f2cb782dae796fa778ec2319)

Changed in neutron:
status: Fix Committed → Fix Released