DVR: Self recover from the loss of 'fg' ports in FIP Namespace

Bug #1776984 reported by Swaminathan Vasudevan
This bug affects 1 person
Affects: neutron
Status: Fix Released
Importance: Medium
Assigned to: Swaminathan Vasudevan
Milestone: none listed

Bug Description

We have sometimes seen the 'fg' port within the fip-namespace go down, not get created in time, or get deleted because of race conditions.
When this happens, the code tries to recover after a couple of exceptions, once a router_update message arrives.

After recovery, however, the fip-namespace is recreated and the 'fg-' port is plugged in and active, but the 'fpr' and 'rfp' ports are missing, which causes FloatingIP failures.

To fix this, whenever this happens the code should check all the ports associated with the fip-namespace and recreate the necessary plumbing.
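
As a rough illustration of the check proposed above, here is a minimal sketch in Python. It is not the code from the actual patch. The helper names (device_name, expected_plumbing, find_missing, on_router_update), the 14-character device name limit, and the placement of 'fpr' in the fip-namespace and 'rfp' in the router namespace are assumptions made for the example. The list_devices and recreate callables are hypothetical stand-ins for whatever the agent uses to inspect and rebuild a namespace.

```python
from typing import Callable, Dict, Iterable, List, Set

# Assumed prefixes and name length limit, for illustration only.
FG_PREFIX = "fg-"
FPR_PREFIX = "fpr-"
RFP_PREFIX = "rfp-"
DEV_NAME_LEN = 14


def device_name(prefix: str, uuid: str) -> str:
    """Build a device name by truncating prefix + id to the length limit."""
    return (prefix + uuid)[:DEV_NAME_LEN]


def expected_plumbing(fip_ns: str, router_ns: str,
                      gw_port_id: str, router_id: str) -> Dict[str, Set[str]]:
    """Map each namespace to the devices that should exist in it."""
    return {
        fip_ns: {device_name(FG_PREFIX, gw_port_id),
                 device_name(FPR_PREFIX, router_id)},
        router_ns: {device_name(RFP_PREFIX, router_id)},
    }


def find_missing(expected: Dict[str, Set[str]],
                 present: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Return, per namespace, the expected devices that are not present."""
    missing = {}
    for ns, devices in expected.items():
        gone = devices - present.get(ns, set())
        if gone:
            missing[ns] = sorted(gone)
    return missing


def on_router_update(expected: Dict[str, Set[str]],
                     list_devices: Callable[[str], Iterable[str]],
                     recreate: Callable[[str, str], None]) -> Dict[str, List[str]]:
    """On every router update, look for missing devices and recreate them."""
    present = {ns: set(list_devices(ns)) for ns in expected}
    missing = find_missing(expected, present)
    for ns, devices in missing.items():
        for dev in devices:
            recreate(ns, dev)
    return missing
```

The point of the sketch is that the check runs on every router update rather than only when the namespace is first created, so a partially rebuilt namespace (fg present, fpr and rfp absent) is detected and repaired.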

Here is the error log we see when the 'fg' port is missing:

http://paste.openstack.org/show/723505/

tags: added: pike-backport-potential queens-backport-potential
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/575562

Changed in neutron:
assignee: nobody → Swaminathan Vasudevan (swaminathan-vasudevan)
status: New → In Progress
Changed in neutron:
importance: Undecided → Medium
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.openstack.org/575562
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=5a7c12f245fd665de5a0364059e4ad918def3e12
Submitter: Zuul
Branch: master

commit 5a7c12f245fd665de5a0364059e4ad918def3e12
Author: Swaminathan Vasudevan <email address hidden>
Date: Thu Jun 14 13:49:23 2018 -0700

    DVR: Self recover from the loss of 'fg' ports in FIP Namespace

    Sometimes we have seen the 'fg' ports within the fip-namespace
    either goes down, not created in time or getting deleted due to
    some race conditions.
    When this happens, the code tries to recover itself after couple
    of exceptions when there is a router_update message.
    But after recovery we could see that the fip-namespace is
    recreated and the 'fg-' port is plugged in and active, but the
    'fpr' and the 'rfp' ports are missing which leads to the
    FloatingIP failure.

    This patch will fix this issue by checking for the missing devices
    in all router_updates.

    Change-Id: I78c7ea9f3b6a1cf5b208286eb372da05dc1ba379
    Closes-Bug: #1776984

Changed in neutron:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/queens)

Fix proposed to branch: stable/queens
Review: https://review.openstack.org/578494

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/pike)

Fix proposed to branch: stable/pike
Review: https://review.openstack.org/578495

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/queens)

Reviewed: https://review.openstack.org/578494
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=e5f292aa8a4388d4cf0f70692d85d830704511fc
Submitter: Zuul
Branch: stable/queens

commit e5f292aa8a4388d4cf0f70692d85d830704511fc
Author: Swaminathan Vasudevan <email address hidden>
Date: Thu Jun 14 13:49:23 2018 -0700

    DVR: Self recover from the loss of 'fg' ports in FIP Namespace

    Sometimes we have seen the 'fg' ports within the fip-namespace
    either goes down, not created in time or getting deleted due to
    some race conditions.
    When this happens, the code tries to recover itself after couple
    of exceptions when there is a router_update message.
    But after recovery we could see that the fip-namespace is
    recreated and the 'fg-' port is plugged in and active, but the
    'fpr' and the 'rfp' ports are missing which leads to the
    FloatingIP failure.

    This patch will fix this issue by checking for the missing devices
    in all router_updates.

    Change-Id: I78c7ea9f3b6a1cf5b208286eb372da05dc1ba379
    Closes-Bug: #1776984
    (cherry picked from commit 5a7c12f245fd665de5a0364059e4ad918def3e12)

tags: added: in-stable-queens
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/pike)

Reviewed: https://review.openstack.org/578495
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=a7c9352681a38ea67d28314f50dbcd55d4aa726c
Submitter: Zuul
Branch: stable/pike

commit a7c9352681a38ea67d28314f50dbcd55d4aa726c
Author: Swaminathan Vasudevan <email address hidden>
Date: Thu Jun 14 13:49:23 2018 -0700

    DVR: Self recover from the loss of 'fg' ports in FIP Namespace

    Sometimes we have seen the 'fg' ports within the fip-namespace
    either goes down, not created in time or getting deleted due to
    some race conditions.
    When this happens, the code tries to recover itself after couple
    of exceptions when there is a router_update message.
    But after recovery we could see that the fip-namespace is
    recreated and the 'fg-' port is plugged in and active, but the
    'fpr' and the 'rfp' ports are missing which leads to the
    FloatingIP failure.

    This patch will fix this issue by checking for the missing devices
    in all router_updates.

    Change-Id: I78c7ea9f3b6a1cf5b208286eb372da05dc1ba379
    Closes-Bug: #1776984
    (cherry picked from commit 5a7c12f245fd665de5a0364059e4ad918def3e12)

tags: added: in-stable-pike
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 13.0.0.0b3

This issue was fixed in the openstack/neutron 13.0.0.0b3 development milestone.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 12.0.4

This issue was fixed in the openstack/neutron 12.0.4 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 11.0.6

This issue was fixed in the openstack/neutron 11.0.6 release.
