Restarting l3 agent results in loss of centralized fip in snat ns

Bug #1740450 reported by sunzuohua
This bug affects 1 person
Affects: neutron
Status: Fix Released
Importance: Medium
Assigned to: Brian Haley

Bug Description

Steps to reproduce:
l3 agent mode:
    network node: dvr_snat
    compute node: dvr_no_external
1. Create a dvr+ha router.
2. Set the router gateway and add a router interface.
3. Create a vm and associate a fip.
4. Restart the l3 agent.
5. Restart the l3 agent again.

After step 3, the fip can be found in the snat ns on the network node.
After step 4, the fip cannot be found in the snat ns on the network node.
After step 5, the fip can be found again in the snat ns on the network node.

The reason may be that, for an ha router, the router cidrs should be retrieved from the keepalived instance, not from the device.
Adding the following code to [1] can solve this problem:
    def _get_centralized_fip_cidr_set(self, device):
        """Returns the fip_cidr set for centralized floatingips."""
        return set(self._get_cidrs_from_keepalived(device.name))

[1] https://github.com/openstack/neutron/blob/master/neutron/agent/l3/dvr_edge_ha_router.py
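
To illustrate the shape of the proposed change, here is a minimal, self-contained sketch of the override. It is not Neutron code: `ToyDevice`, `ToyEdgeRouter`, `ToyEdgeHaRouter` and the dictionary of keepalived addresses are hypothetical stand-ins, and the assumption that the non-ha class reads the cidrs from the device is taken from the description above.

```python
class ToyDevice:
    """Hypothetical stand-in for an interface in the snat namespace."""

    def __init__(self, name, cidrs):
        self.name = name
        self.cidrs = list(cidrs)  # addresses currently configured on the device


class ToyEdgeRouter:
    """Hypothetical non-ha router: reads fip cidrs from the device itself."""

    def _get_centralized_fip_cidr_set(self, device):
        return set(device.cidrs)


class ToyEdgeHaRouter(ToyEdgeRouter):
    """Hypothetical ha router: keepalived owns the addresses."""

    def __init__(self, keepalived_vips):
        # Maps interface name -> cidrs tracked by the keepalived instance.
        self._keepalived_vips = keepalived_vips

    def _get_cidrs_from_keepalived(self, interface_name):
        return list(self._keepalived_vips.get(interface_name, []))

    def _get_centralized_fip_cidr_set(self, device):
        """Returns the fip_cidr set for centralized floatingips."""
        return set(self._get_cidrs_from_keepalived(device.name))


# The device carries no address, but keepalived tracks one for it.
device = ToyDevice("qg-1234", cidrs=[])
ha_router = ToyEdgeHaRouter({"qg-1234": ["203.0.113.10/32"]})
plain_router = ToyEdgeRouter()
```

In this toy setup the ha router reports what keepalived tracks, while the plain router reports only what is on the device, so the two can disagree.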

sunzuohua (zuohuasun)
description: updated
tags: added: l3-dvr-backlog
Swaminathan Vasudevan (swaminathan-vasudevan) wrote :

So in Step 3 you are restarting the 'dvr_snat' agent that hosts the centralized fip.
In Step 4, you mentioned that after restarting, the fip is not found. Since you have configured HA, do you see the fip in the snat namespace of the alternate network node? Can you confirm?

In Step 5, when you say 'restart' again, are you restarting on the same node or on a different node? Can you confirm?

sunzuohua (zuohuasun) wrote :

@swaminathan-vasudevan, I am sure.
The reason is:
    After step 4, the fips could be found on the devices and were not added to the keepalived instance.
    After step 5, the fips could not be found on the devices and were added to the keepalived instance.
For an ha router, fips are configured by keepalived, so fips should be read from and added to the keepalived instance.
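
The alternating behaviour can be reproduced with a small toy model. This is only a sketch under stated assumptions, not Neutron code: it assumes that the agent rebuilds its keepalived address list from scratch on every restart, that addresses on the device survive the restart, and that keepalived then makes the device match its list. `ToySnatNamespace`, `restart_agent` and `simulate` are hypothetical names.

```python
class ToySnatNamespace:
    """Toy model of the qg- device and the agent's keepalived address list."""

    def __init__(self, fip_cidrs):
        # State after step 3: the fip is on the device and known to keepalived.
        self.device_cidrs = set(fip_cidrs)
        self.keepalived_vips = set(fip_cidrs)

    def restart_agent(self, fip_cidrs, read_from_keepalived):
        # Assumption: the restarted agent starts with an empty address list.
        self.keepalived_vips = set()
        if read_from_keepalived:
            existing = set(self.keepalived_vips)  # proposed behaviour
        else:
            existing = set(self.device_cidrs)     # reported behaviour
        # Only fips that look "missing" are added to keepalived.
        for cidr in fip_cidrs:
            if cidr not in existing:
                self.keepalived_vips.add(cidr)
        # Assumption: keepalived makes the device match its address list.
        self.device_cidrs = set(self.keepalived_vips)
        return set(fip_cidrs) <= self.device_cidrs


def simulate(read_from_keepalived, restarts=2):
    """Return, per restart, whether the fip is present on the device."""
    fips = {"203.0.113.10/32"}
    ns = ToySnatNamespace(fips)
    return [ns.restart_agent(fips, read_from_keepalived)
            for _ in range(restarts)]


print(simulate(read_from_keepalived=False))
print(simulate(read_from_keepalived=True))
```

Under these assumptions, reading from the device loses the fip on the first restart and restores it on the second, which matches steps 4 and 5; reading from the keepalived instance keeps the fip across both restarts.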

Swaminathan Vasudevan (swaminathan-vasudevan) wrote :

Yes, I was able to reproduce the problem with the steps provided above.

Changed in neutron:
status: New → Confirmed
importance: Undecided → Medium
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/555952

Changed in neutron:
assignee: nobody → Swaminathan Vasudevan (swaminathan-vasudevan)
status: Confirmed → In Progress
Changed in neutron:
assignee: Swaminathan Vasudevan (swaminathan-vasudevan) → Brian Haley (brian-haley)
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.openstack.org/555952
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=64028a389ff904f15e471b44bd5b3979c5db2cd2
Submitter: Zuul
Branch: master

commit 64028a389ff904f15e471b44bd5b3979c5db2cd2
Author: Swaminathan Vasudevan <email address hidden>
Date: Fri Mar 23 15:11:13 2018 -0700

    DVR: Restarting l3 agent loses centralized fip ip on qg-interface

    When l3 agent is restarted on a dvr_snat node that is configured
    for L3_HA and has a centralized FloatingIP configured to the
    qg-interface in the snat_namespace, that FloatingIP is not
    re-configured to the qg-interface when agent starts.

    The reason being, the cidr is not being retrieved from the
    keepalived instance and only retrieved from the
    centralized_fip_cidr_set.

    If 'L3_HA' is configured we need to retrieve it from the keepalived
    instance.

    This patch fixes the problem by retrieving the cidrs from the
    keepalived instance for the qg-interface.

    Change-Id: I848a20d06e2d344503a4cb1776dbe2617d91bc41
    Closes-Bug: #1740450

Changed in neutron:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/queens)

Fix proposed to branch: stable/queens
Review: https://review.openstack.org/559864

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/pike)

Fix proposed to branch: stable/pike
Review: https://review.openstack.org/559865

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/queens)

Reviewed: https://review.openstack.org/559864
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=63886613fd56624c2dbdf7f7415f8cc767e1f95e
Submitter: Zuul
Branch: stable/queens

commit 63886613fd56624c2dbdf7f7415f8cc767e1f95e
Author: Swaminathan Vasudevan <email address hidden>
Date: Fri Mar 23 15:11:13 2018 -0700

    DVR: Restarting l3 agent loses centralized fip ip on qg-interface

    When l3 agent is restarted on a dvr_snat node that is configured
    for L3_HA and has a centralized FloatingIP configured to the
    qg-interface in the snat_namespace, that FloatingIP is not
    re-configured to the qg-interface when agent starts.

    The reason being, the cidr is not being retrieved from the
    keepalived instance and only retrieved from the
    centralized_fip_cidr_set.

    If 'L3_HA' is configured we need to retrieve it from the keepalived
    instance.

    This patch fixes the problem by retrieving the cidrs from the
    keepalived instance for the qg-interface.

    Change-Id: I848a20d06e2d344503a4cb1776dbe2617d91bc41
    Closes-Bug: #1740450
    (cherry picked from commit 64028a389ff904f15e471b44bd5b3979c5db2cd2)

tags: added: in-stable-queens
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 13.0.0.0b1

This issue was fixed in the openstack/neutron 13.0.0.0b1 development milestone.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 12.0.2

This issue was fixed in the openstack/neutron 12.0.2 release.

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/pike)

Reviewed: https://review.openstack.org/559865
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=cf14d20b543662847e2be00af84633b5b3bfd9b5
Submitter: Zuul
Branch: stable/pike

commit cf14d20b543662847e2be00af84633b5b3bfd9b5
Author: Swaminathan Vasudevan <email address hidden>
Date: Fri Mar 23 15:11:13 2018 -0700

    DVR: Restarting l3 agent loses centralized fip ip on qg-interface

    When l3 agent is restarted on a dvr_snat node that is configured
    for L3_HA and has a centralized FloatingIP configured to the
    qg-interface in the snat_namespace, that FloatingIP is not
    re-configured to the qg-interface when agent starts.

    The reason being, the cidr is not being retrieved from the
    keepalived instance and only retrieved from the
    centralized_fip_cidr_set.

    If 'L3_HA' is configured we need to retrieve it from the keepalived
    instance.

    This patch fixes the problem by retrieving the cidrs from the
    keepalived instance for the qg-interface.

    Change-Id: I848a20d06e2d344503a4cb1776dbe2617d91bc41
    Closes-Bug: #1740450
    (cherry picked from commit 64028a389ff904f15e471b44bd5b3979c5db2cd2)

tags: added: in-stable-pike
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 11.0.6

This issue was fixed in the openstack/neutron 11.0.6 release.
