Improve stability and robustness of periodic agent checks

Bug #1458119 reported by Eugene Nikanorov
This bug affects 2 people
Affects   Status         Importance   Assigned to        Milestone
neutron   Fix Released   High         Eugene Nikanorov
Kilo      New            Undecided    Unassigned

Bug Description

In some cases, a DB controller failure can interrupt DB connections.
The resulting exceptions leak into the looping call method, effectively shutting the loop down and preventing any further failover for that resource type.
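
The failure mode described above can be sketched as follows. This is a toy illustration, not the FixedIntervalLoopingCall implementation: `run_fixed_interval_loop`, `FlakyAgentCheck`, and `DBConnectionError` are hypothetical names, and the loop is assumed to stop for good on the first exception that escapes the checked method.

```python
class DBConnectionError(Exception):
    """Hypothetical stand-in for an interrupted DB connection."""


def run_fixed_interval_loop(func, iterations):
    """Toy loop (assumption): call func repeatedly, stop for good on any exception."""
    completed = 0
    try:
        for _ in range(iterations):
            func()
            completed += 1
    except Exception:
        # The exception ends the loop; nothing restarts it afterwards.
        pass
    return completed


class FlakyAgentCheck:
    """Hypothetical periodic check whose DB query fails on one chosen call."""

    def __init__(self, fail_on_call):
        self.fail_on_call = fail_on_call
        self.calls = 0

    def __call__(self):
        self.calls += 1
        if self.calls == self.fail_on_call:
            raise DBConnectionError("DB connection interrupted")


check = FlakyAgentCheck(fail_on_call=2)
completed = run_fixed_interval_loop(check, iterations=5)
print(completed, check.calls)
```

In this sketch a single failure on the second call means the remaining three intervals never run, even though the DB connection might have recovered by then.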

tags: added: l3-ipam-dhcp
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/185722

Changed in neutron:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.openstack.org/185722
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=ae8c1c5f80fd4fb7b4ab116677f4cff988c67cf1
Submitter: Jenkins
Branch: master

commit ae8c1c5f80fd4fb7b4ab116677f4cff988c67cf1
Author: Eugene Nikanorov <email address hidden>
Date: Tue May 26 20:17:20 2015 +0400

    Catch broad exception in methods used in FixedIntervalLoopingCall

    Unlike other places where it might make sense to catch specific
    exceptions, methods that are used to check L3 and DHCP agents
    liveness via FixedIntervalLoopingCall should never allow exceptions
    to leak to calling method and interrupt the loop.

    Further improvement of FixedIntervalLoopingCall might be needed,
    but for the sake of easy backporting it makes sense to fix the issue
    in neutron before pushing refactoring to 3rd-party library.

    Change-Id: I6a61e99a6f4e445e26ea4a9923b47e35559e5703
    Closes-Bug: #1458119
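
The approach in the commit message above, catching a broad exception inside the method handed to the looping call, might look roughly like the sketch below. It is not the merged patch (see the review link for that): `AgentLivenessChecker`, `check_agents`, and the injected `list_dead_agents` and `reschedule` callables are hypothetical names used only for illustration.

```python
import logging

LOG = logging.getLogger(__name__)


class AgentLivenessChecker:
    """Hypothetical periodic check that never lets an exception escape."""

    def __init__(self, list_dead_agents, reschedule):
        self._list_dead_agents = list_dead_agents
        self._reschedule = reschedule
        self.failures = 0

    def check_agents(self):
        try:
            for agent in self._list_dead_agents():
                self._reschedule(agent)
        except Exception:
            # Deliberately broad: log and return so the caller's loop
            # keeps running and retries on the next interval.
            self.failures += 1
            LOG.exception("Agent liveness check failed; retrying on the next interval")


def make_flaky_lister():
    """Return a lister that fails once (simulated DB outage), then recovers."""
    state = {"calls": 0}

    def list_dead_agents():
        state["calls"] += 1
        if state["calls"] == 1:
            raise RuntimeError("DB connection interrupted")
        return ["agent-1"]

    return list_dead_agents


rescheduled = []
checker = AgentLivenessChecker(make_flaky_lister(), rescheduled.append)
for _ in range(3):  # stands in for three loop intervals
    checker.check_agents()
```

The first interval fails and is only logged; the next two intervals run normally, which is the behavior the commit message asks for. Swallowing every exception is normally discouraged, which is why the commit message limits it to methods driven by the periodic loop.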

Changed in neutron:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in neutron:
milestone: none → liberty-1
status: Fix Committed → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (feature/pecan)

Fix proposed to branch: feature/pecan
Review: https://review.openstack.org/196701

OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (feature/pecan)

Change abandoned by Kyle Mestery (<email address hidden>) on branch: feature/pecan
Review: https://review.openstack.org/196701
Reason: This is lacking the functional fix [1], so I'll propose a new merge commit which includes that one.

[1] https://review.openstack.org/#/c/196711/

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (feature/pecan)

Fix proposed to branch: feature/pecan
Review: https://review.openstack.org/196920

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (feature/pecan)

Reviewed: https://review.openstack.org/196920
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=7f759c077f8f860c13db92d2ea6b353ef6b70900
Submitter: Jenkins
Branch: feature/pecan

commit 8123144fadd7c5d5e6e56a76ea860512619a2cf6
Author: Moshe Levi <email address hidden>
Date: Sun Jun 28 14:37:14 2015 +0300

    Fix Consolidate sriov agent and driver code

    This patch adds the missing __init to mech_sriov/mech_driver/
    and updates setup.cfg to the new agent entrypoint

    Trivial Fix

    Change-Id: I53a527081feb78472f496675bbb3c5121d38a14a

commit 8942fccf02e6e179d47582fdb2792a1ca972da21
Author: Assaf Muller <email address hidden>
Date: Mon Jun 29 11:38:51 2015 -0400

    Remove failing SafeFixture tests

    The fixtures 1.3 release attempted to fix the fixtures resource
    leak issue, but failed to do so completely. Our own SafeFixture
    is still needed: The 1.3 release broke our SafeFixture tests,
    but not the usage of SafeFixture itself. This patch removes
    those failing tests for now to unbreak the gate. Jakub reported
    a bug on fixtures 1.3:
    https://bugs.launchpad.net/python-fixtures/+bug/1469759

    We will continue to use SafeFixture until that bug is fixed
    in fixtures, at which point we will be able to require
    fixtures > 1.3.

    Change-Id: I59457c3bb198ff86d5ad55a1e623d008f0034b8f
    Closes-Bug: #1469734

commit 71dffb0a2c1720cd8233a329d32958a0160dd6f5
Author: Kevin Benton <email address hidden>
Date: Mon Jun 29 08:27:41 2015 +0000

    Revert "Removed test_lib module"

    This reverts commit 9a6536de6e1a7fe9b2552adc142e254426b82b6f.

    We pulled all of the plugins out of the tree, many of which still inherit
    from neutron test classes. This change then stated that we no longer
    support testing other plugins. I think this is a bit premature and should
    have been discussed under the subject
    "Neutron plugins can't use neutron plugin unit tests" or something
    similar.

    Change-Id: I68318589f010b731574ea3bfa8df98492bab31fc

commit b20fd81dbd497e058384a0af065dd0f1fdc4c728
Author: Jakub Libosvar <email address hidden>
Date: Fri Jun 5 14:32:51 2015 +0000

    Refactor NetcatTester class

    Following capabilities were added:
       - used transport protocol is passed as a constant instead of bool
       - src port for testing was added
       - connection can be established explicitly
       - change constructor parameters of NetcatTester

    As a part of removing bool for protocol definition
    get_free_namespace_port() was also modified to match the behavior.

    Change-Id: Id2ec322e7f731c05a3754a65411c9a5d8b258126

commit 83e37980dcd0b2bad6d64dd2cb23bcd2891cafca
Author: jingliuqing <email address hidden>
Date: Sat Jun 27 13:41:54 2015 +0800

    Use REST rather than ReST

    Change-Id: I06c9deaab58c5ec13bfeec39fb8fd4b1fe21f42d

commit 1b60df85ba3ad442c2e4e7e52538e1b9a1bf9378
Author: Kevin Benton <email address hidden>
Date: Thu Jun 25 18:34:38 2015 -0700

    Add a double-mock guard to the base test case

    Use mock to patch mock with a check to prevent multiple active
    patches to the...

tags: added: in-feature-pecan
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/kilo)

Fix proposed to branch: stable/kilo
Review: https://review.openstack.org/198809

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/kilo)

Reviewed: https://review.openstack.org/198809
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=97448d5d132bcc64a95e20a24c73587ffa9e913c
Submitter: Jenkins
Branch: stable/kilo

commit 97448d5d132bcc64a95e20a24c73587ffa9e913c
Author: Eugene Nikanorov <email address hidden>
Date: Tue May 26 20:17:20 2015 +0400

    Catch broad exception in methods used in FixedIntervalLoopingCall

    Unlike other places where it might make sense to catch specific
    exceptions, methods that are used to check L3 and DHCP agents
    liveness via FixedIntervalLoopingCall should never allow exceptions
    to leak to calling method and interrupt the loop.

    Further improvement of FixedIntervalLoopingCall might be needed,
    but for the sake of easy backporting it makes sense to fix the issue
    in neutron before pushing refactoring to 3rd-party library.

    Change-Id: I6a61e99a6f4e445e26ea4a9923b47e35559e5703
    Closes-Bug: #1458119
    (cherry picked from commit ae8c1c5f80fd4fb7b4ab116677f4cff988c67cf1)

tags: added: in-stable-kilo
JohnsonYi (yichengli) wrote :

I found this bug while searching for fixes related to https://bugs.launchpad.net/mos/+bug/1457123. In our OpenStack environment (Fuel 6.0/Juno, 3-controller HA), when we restarted the master controller node, some VM instances could no longer reach the external network (the Xshell session lost its connection); the internal network was not interrupted.
Is the master fix available for Juno?
I will test it soon.

Roman Rufanov (rrufanov)
tags: added: customer-found
Thierry Carrez (ttx)
Changed in neutron:
milestone: liberty-1 → 7.0.0