dvr fips state variables not initialized correctly across restart

Bug #1367039 reported by Mike Smith
This bug affects 2 people
Affects | Status | Importance | Assigned to | Milestone
neutron | Fix Released | High | Mike Smith |

Bug Description

When the l3-agent is restarted, the runtime variables self.agent_fip_count and ri.dist_fip_count are not properly initialized. Existing namespaces and floating IPs (FIPs) are not counted, so both variables start at 0. This results in unwanted behavior such as stale or duplicate fg ports or fip namespaces.
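
To make the failure mode concrete, here is a minimal sketch in Python. It is not the neutron code: ToyAgent, host_fips, and the rule that cleanup triggers when the counter reaches zero are all assumptions made for illustration. It only shows how a counter that restarts at 0 drifts away from what is actually configured on the host.

```python
class ToyAgent:
    """Hypothetical stand-in for the l3-agent; not the neutron implementation."""

    def __init__(self, host_fips):
        # host_fips models state that survives a restart (FIPs configured on the host).
        self.host_fips = host_fips
        # In-memory counter: starts at 0 on every (re)start, regardless of host state.
        self.agent_fip_count = 0

    def add_fip(self, fip):
        self.host_fips.add(fip)
        self.agent_fip_count += 1

    def remove_fip(self, fip):
        self.host_fips.discard(fip)
        self.agent_fip_count -= 1
        # Assumption: namespace/port cleanup is triggered when the counter hits zero.
        return self.agent_fip_count == 0


def demo():
    host_fips = set()
    agent = ToyAgent(host_fips)
    agent.add_fip("172.24.4.6")
    agent.add_fip("172.24.4.7")

    # "Restart": a new agent object sees the same host state but a fresh counter.
    restarted = ToyAgent(host_fips)
    count_after_restart = restarted.agent_fip_count
    cleanup_triggered = restarted.remove_fip("172.24.4.6")
    return (count_after_restart, restarted.agent_fip_count,
            cleanup_triggered, len(host_fips))
```

In this toy model the counter goes negative after the restart, so the zero check never fires and cleanup would be skipped even after the last FIP is removed.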

tags: added: l3-dvr-backlog
Changed in neutron:
assignee: nobody → Mike Smith (michael-smith6)
Changed in neutron:
importance: Undecided → Medium
Revision history for this message
Brian Haley (brian-haley) wrote :

Is this bug describing the following? A duplicate fg- port appears after every restart of the l3/vpn agent:

# ip netns exec fip-5d7fe649-1b56-48a6-8481-9a784a6d25d2 ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: fpr-f70a43d4-b: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether be:6f:f6:5c:f9:a1 brd ff:ff:ff:ff:ff:ff
    inet 169.254.30.21/31 scope global fpr-f70a43d4-b
       valid_lft forever preferred_lft forever
    inet6 fe80::bc6f:f6ff:fe5c:f9a1/64 scope link
       valid_lft forever preferred_lft forever
44: fg-33df118a-e9: <BROADCAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default
    link/ether fa:16:3e:61:7d:3f brd ff:ff:ff:ff:ff:ff
    inet 172.24.4.6/24 brd 172.24.4.255 scope global fg-33df118a-e9
       valid_lft forever preferred_lft forever
    inet6 fe80::f816:3eff:fe61:7d3f/64 scope link
       valid_lft forever preferred_lft forever
45: fg-9427748b-ba: <BROADCAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default
    link/ether fa:16:3e:b8:1c:a7 brd ff:ff:ff:ff:ff:ff
    inet 172.24.4.7/24 brd 172.24.4.255 scope global fg-9427748b-ba
       valid_lft forever preferred_lft forever
    inet6 fe80::f816:3eff:feb8:1ca7/64 scope link
       valid_lft forever preferred_lft forever
47: fg-68c06dfa-99: <BROADCAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default
    link/ether fa:16:3e:0f:dd:96 brd ff:ff:ff:ff:ff:ff
    inet 172.24.4.8/24 brd 172.24.4.255 scope global fg-68c06dfa-99
       valid_lft forever preferred_lft forever
    inet6 fe80::f816:3eff:fe0f:dd96/64 scope link
       valid_lft forever preferred_lft forever
48: fg-65a101a1-83: <BROADCAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default
    link/ether fa:16:3e:e6:21:c9 brd ff:ff:ff:ff:ff:ff
    inet 172.24.4.9/24 brd 172.24.4.255 scope global fg-65a101a1-83
       valid_lft forever preferred_lft forever
    inet6 fe80::f816:3eff:fee6:21c9/64 scope link
       valid_lft forever preferred_lft forever

If so, we need to raise the priority, since it's consuming floating IP space that tenants are going to need.
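
A quick way to check a fip namespace for this symptom is to count the fg- devices in the `ip a` output. The sketch below is only an illustration: the function names are made up, SAMPLE is an abbreviated excerpt of the output above, and it assumes that more than one fg- device in a single fip namespace indicates duplicates.

```python
import re

# Matches interface header lines such as "44: fg-33df118a-e9: <BROADCAST,...>".
# Address lines are indented, so they never match the leading "^\d+:".
FG_DEVICE = re.compile(r"^\d+:\s+(fg-[^:@\s]+)", re.MULTILINE)

# Abbreviated excerpt of the `ip a` output quoted in the comment above.
SAMPLE = """\
2: fpr-f70a43d4-b: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500
    inet 169.254.30.21/31 scope global fpr-f70a43d4-b
44: fg-33df118a-e9: <BROADCAST,UP,LOWER_UP> mtu 1500
    inet 172.24.4.6/24 brd 172.24.4.255 scope global fg-33df118a-e9
45: fg-9427748b-ba: <BROADCAST,UP,LOWER_UP> mtu 1500
    inet 172.24.4.7/24 brd 172.24.4.255 scope global fg-9427748b-ba
"""


def fg_devices(ip_addr_output):
    """Return the names of all fg- devices found in `ip a` output."""
    return FG_DEVICE.findall(ip_addr_output)


def has_duplicate_fg_ports(ip_addr_output):
    """Assumption: a healthy fip namespace holds exactly one fg- port."""
    return len(fg_devices(ip_addr_output)) > 1
```

The text passed in would come from running the `ip netns exec fip-<id> ip a` command shown above and capturing its output.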

Revision history for this message
Mike Smith (michael-smith6) wrote :

Brian - it very well could be the same issue.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/124879

Changed in neutron:
status: New → In Progress
Revision history for this message
Eugene Nikanorov (enikanorov) wrote : Re: dvr fips are not handled properly on l3-agent restart

Raising importance as this seems to be a severe issue.

Changed in neutron:
importance: Medium → High
summary: - dvr fips are not handled properly on l3-agent restart
+ dvr fips state variables not reinitialized correctly across restart
summary: - dvr fips state variables not reinitialized correctly across restart
+ dvr fips state variables not initialized correctly across restart
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.openstack.org/124879
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=ee4bae211309e0f1fcee5565ddd2379997e1de13
Submitter: Jenkins
Branch: master

commit ee4bae211309e0f1fcee5565ddd2379997e1de13
Author: Michael Smith <email address hidden>
Date: Wed Sep 10 16:59:14 2014 -0700

    Initialize dist_fip_count after agent restart

    Runtime router variable dist_fip_count has been
    used to keep track of FIPs for DVR routers.
    This variable is not re-initialized correctly on
    agent restart and can get stale from other errors
    which cause problems with namespace and port cleanup.

    This patch will initialize the ri.dist_fip_count
    once in process_router for dvr routers only. This
    method was selected instead of the _router_added or
    _router_removed path because it is the one central
    entry point for router add, delete, and update.

    The object self.agent_gateway_port also needs to be
    properly handled after an agent restart and this
    patch will handle that as well.

    When needed, the system will be read via system
    calls to determine the state of namespaces and ports
    since the variables cannot be relied on.
    System calls will be kept to a minimum to reduce
    any possible performance hits.

    Change-Id: Iae5ebf5249f8e16ab57df78e042293ca2855ddf1
    Closes-bug: #1367039
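
The approach described in the commit message can be sketched roughly as follows. This is not the merged patch: RouterInfo here is a toy class, and list_configured_fips is a hypothetical callable standing in for whatever system call reads the real state. The sketch only illustrates deriving the count once, in one central entry point, for DVR routers only.

```python
class RouterInfo:
    """Toy router record; not neutron's RouterInfo."""

    def __init__(self, router_id, distributed):
        self.router_id = router_id
        self.distributed = distributed
        # None means "not yet derived from the host since the agent started".
        self.dist_fip_count = None


def process_router(ri, list_configured_fips):
    """Central entry point for router add, delete, and update in this sketch.

    list_configured_fips is a hypothetical callable that queries the host
    (for example through system calls) and returns the FIPs for a router.
    """
    if ri.distributed and ri.dist_fip_count is None:
        # Query the host once; later calls reuse the value to limit system calls.
        ri.dist_fip_count = len(list_configured_fips(ri.router_id))
    return ri.dist_fip_count
```

Using None as the "unknown" marker, rather than 0, is a design choice of this sketch: it keeps "no FIPs configured" distinguishable from "not counted yet".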

Changed in neutron:
status: In Progress → Fix Committed
Revision history for this message
Xu Han Peng (xuhanp) wrote :

After this fix was merged, migrating a legacy router to a distributed router fails with "AttributeError: 'LegacyRouter' object has no attribute 'dist_fip_count'", because the ri object in the l3 agent is a LegacyRouter instead of a DvrRouter. LegacyRouter doesn't have the attribute dist_fip_count.

I will propose a small patch to address this problem.
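
The error can be reproduced with two toy classes. The sketch below is an assumption-laden illustration, not the proposed patch: the classes are simplified stand-ins, and needs_fip_count_init is a hypothetical helper showing one possible guard (a type check before touching the attribute).

```python
class LegacyRouter:
    """Toy stand-in: has no dist_fip_count attribute."""

    def __init__(self, router_id):
        self.router_id = router_id


class DvrRouter:
    """Toy stand-in: carries the DVR-only counter."""

    def __init__(self, router_id):
        self.router_id = router_id
        self.dist_fip_count = None


def unguarded_access_fails(ri):
    """Return True if reading ri.dist_fip_count raises AttributeError."""
    try:
        ri.dist_fip_count
    except AttributeError:
        return True
    return False


def needs_fip_count_init(ri):
    """Hypothetical guard: only DVR router objects are inspected."""
    if not isinstance(ri, DvrRouter):
        return False
    return ri.dist_fip_count is None
```

With the guard in place, a LegacyRouter object is simply skipped instead of raising during the migration.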

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/151153

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/juno)

Fix proposed to branch: stable/juno
Review: https://review.openstack.org/151494

Stephen Ma (stephen-ma)
tags: added: juno-backport-potential
Thierry Carrez (ttx)
Changed in neutron:
milestone: none → kilo-2
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in neutron:
milestone: kilo-2 → 2015.1.0
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (stable/juno)

Change abandoned by Ihar Hrachyshka (<email address hidden>) on branch: stable/juno
Review: https://review.openstack.org/151494
Reason: Author had enough time to reply. Abandoning the patch.
