L3 Agent in DVR mode is removing FIP namespace on startup

Bug #1482521 reported by Artur Korzeniewski
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
neutron | Fix Released | High | Oleg Bondarev |

Bug Description

Restarting the L3 agent in DVR mode causes network downtime for VMs with a configured floating IP.
The cause is the removal of the FIP namespace at startup.

Reproduction steps:
1. Configure OpenStack with a tenant network and an external network with floating IPs.
2. Launch a VM and assign a floating IP to it.
3. Ping the VM from a machine on the external network.
4. Restart the L3 agent on the compute node where the VM is running.
5. Observe that a few pings are lost.

I guess the problem occurs at startup, when the network namespaces are parsed: the FIP namespace is not included in the L3 server message, so it is treated as stale and removed.

The traceback when I raise an exception in /neutron/neutron/agent/l3/dvr_fip_ns.py FipNamespace.delete():
2015-08-06 06:35:28.469 DEBUG neutron.agent.linux.utils [-] Unable to access /opt/openstack/data/neutron/external/pids/8223e12e-837b-49d4-9793-63603fccbc9f.pid from (pid=70216) get_value_from_file /opt/openstack/neutron/neutron/agent/linux/utils.py:222
2015-08-06 06:35:28.469 DEBUG neutron.agent.linux.utils [-] Unable to access /opt/openstack/data/neutron/external/pids/8223e12e-837b-49d4-9793-63603fccbc9f.pid from (pid=70216) get_value_from_file /opt/openstack/neutron/neutron/agent/linux/utils.py:222
2015-08-06 06:35:28.469 DEBUG neutron.agent.linux.external_process [-] No process started for 8223e12e-837b-49d4-9793-63603fccbc9f from (pid=70216) disable /opt/openstack/neutron/neutron/agent/linux/external_process.py:113
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/eventlet/queue.py", line 117, in switch
    self.greenlet.switch(value)
  File "/usr/local/lib/python2.7/dist-packages/eventlet/greenthread.py", line 214, in main
    result = function(*args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/oslo_service/service.py", line 612, in run_service
    service.start()
  File "/opt/openstack/neutron/neutron/service.py", line 233, in start
    self.manager.after_start()
  File "/opt/openstack/neutron/neutron/agent/l3/agent.py", line 641, in after_start
    self.periodic_sync_routers_task(self.context)
  File "/opt/openstack/neutron/neutron/agent/l3/agent.py", line 519, in periodic_sync_routers_task
    self.fetch_and_sync_all_routers(context, ns_manager)
  File "/opt/openstack/neutron/neutron/agent/l3/namespace_manager.py", line 91, in __exit__
    self._cleanup(_ns_prefix, ns_id)
  File "/opt/openstack/neutron/neutron/agent/l3/namespace_manager.py", line 140, in _cleanup
    ns.delete()
  File "/opt/openstack/neutron/neutron/agent/l3/dvr_fip_ns.py", line 147, in delete
    raise TypeError("ss")
TypeError: ss

full log here:
http://pastebin.com/xRa77kk6

Changed in neutron:
assignee: nobody → Oleg Bondarev (obondarev)
tags: added: l3-dvr-backlog
Oleg Bondarev (obondarev) wrote :

The bug was introduced by me in commit 46608806aa7a9c60214e28429ca5a8b87b2a15de.
The difference with the fip namespace is that its name is "fip-<external network id>", as opposed to the qrouter and snat namespaces, where the id is taken from the router. On resync the agent keeps only the namespaces whose ids are known router ids. The FIP namespace is removed because its id is not a known router id. I will come up with a fix shortly.
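
The resync behaviour described in this comment can be illustrated with a minimal sketch. It is a simplified model, not the actual neutron code: the function names `split_namespace` and `find_stale` and the sample ids are hypothetical, and the "qrouter-" and "snat-" prefix strings are assumed by analogy with the "fip-" form quoted above.

```python
# Simplified model of the stale-namespace check; all names are illustrative.
ROUTER_PREFIX = "qrouter-"  # assumed form of the router namespace prefix
SNAT_PREFIX = "snat-"       # assumed form of the snat namespace prefix
FIP_PREFIX = "fip-"         # from the report: "fip-<external network id>"
PREFIXES = (ROUTER_PREFIX, SNAT_PREFIX, FIP_PREFIX)


def split_namespace(name):
    """Return (prefix, id) for a recognised namespace name, else None."""
    for prefix in PREFIXES:
        if name.startswith(prefix):
            return prefix, name[len(prefix):]
    return None


def find_stale(existing_namespaces, known_router_ids):
    """Return namespaces whose id part is not a known router id."""
    stale = []
    for name in existing_namespaces:
        parsed = split_namespace(name)
        if parsed is None:
            continue  # not a namespace this check manages
        _prefix, ns_id = parsed
        # The id is compared against router ids only, whatever the prefix.
        if ns_id not in known_router_ids:
            stale.append(name)
    return stale


existing = ["qrouter-r1", "snat-r1", "fip-net1"]
print(find_stale(existing, known_router_ids={"r1"}))
```

In this model the fip namespace is always reported as stale, because its id is an external network id and can never appear in the set of router ids.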

Changed in neutron:
importance: Undecided → Medium
status: New → Confirmed
tags: added: l3-ipam-dhcp
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/210539

Changed in neutron:
status: Confirmed → In Progress
Assaf Muller (amuller)
Changed in neutron:
importance: Medium → High
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.openstack.org/210539
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=b220d10125c859367069eeb43ea45e3650a782f9
Submitter: Jenkins
Branch: master

commit b220d10125c859367069eeb43ea45e3650a782f9
Author: Oleg Bondarev <email address hidden>
Date: Fri Aug 7 19:56:13 2015 +0300

    Do not delete fip namespace during l3 dvr agent resync

    This was introduced by commit 46608806aa7a9c60214e28429ca5a8b87b2a15de
    which didn't take into account that fip namespace name is composed
    from external network id rather than router id.
    The fix is to ensure fip namespaces for the known routers are kept
    by namespace manager on agent resync.

    Closes-Bug: #1482521
    Change-Id: I0ffd0a3f6d83f7356638827a1cfe4dabef24b891
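
The idea behind this commit message can be sketched as follows. This is a simplified model under stated assumptions, not the merged patch: the function `find_stale_fixed`, its parameters, and the sample ids are hypothetical, and the "qrouter-" and "snat-" prefix strings are assumed by analogy with the "fip-" form.

```python
# Simplified model of the fixed check; all names are illustrative.
ROUTER_PREFIX = "qrouter-"  # assumed form of the router namespace prefix
SNAT_PREFIX = "snat-"       # assumed form of the snat namespace prefix
FIP_PREFIX = "fip-"         # from the report: "fip-<external network id>"


def find_stale_fixed(existing_namespaces, known_router_ids, known_ext_net_ids):
    """Return namespaces that match neither a known router nor, for fip
    namespaces, the external network of a known router."""
    stale = []
    for name in existing_namespaces:
        if name.startswith(FIP_PREFIX):
            # fip namespaces are keyed by external network id.
            keep = name[len(FIP_PREFIX):] in known_ext_net_ids
        elif name.startswith(ROUTER_PREFIX):
            keep = name[len(ROUTER_PREFIX):] in known_router_ids
        elif name.startswith(SNAT_PREFIX):
            keep = name[len(SNAT_PREFIX):] in known_router_ids
        else:
            continue  # not a namespace this check manages
        if not keep:
            stale.append(name)
    return stale


existing = ["qrouter-r1", "snat-r1", "fip-net1", "fip-net2"]
# Router r1 is known and its gateway is on external network net1.
print(find_stale_fixed(existing, {"r1"}, {"net1"}))
```

Here the fip namespace for the external network of a known router survives the resync, while a fip namespace for an unknown network is still cleaned up.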

Changed in neutron:
status: In Progress → Fix Committed
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (feature/pecan)

Fix proposed to branch: feature/pecan
Review: https://review.openstack.org/211492

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (feature/pecan)

Reviewed: https://review.openstack.org/211492
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=a7b91632fc65ab9d2687298c68b1d715866d0356
Submitter: Jenkins
Branch: feature/pecan

commit 966203f89dee8fe61fb2dce654e36e510e80380f
Author: Sukhdev Kapur <email address hidden>
Date: Wed Jul 1 16:30:44 2015 -0700

    Neutron-Ironic integration patch

    This patch is in preparation for the integration
    of Ironic and Neutron. A new vnic_type is being
    added so that ML2 drivers can filter for all
    Ironic ports based upon match for 'baremetal'.
    Nova/Ironic will set this vnic_type when issuing
    port-create request to neutron.
    (e.g. binding:vnic_type = 'baremetal' )

    Change-Id: I25dc9472b31db052719db503a10c1fb1a55572ef
    Partial-Implements: blueprint neutron-ironic-integration

commit 236e408272bcb9b8e957524864e571b5afdc4623
Author: Oleg Bondarev <email address hidden>
Date: Tue Jul 7 12:02:58 2015 +0300

    DVR: fix router scheduling

    Fix scheduling of DVR routers to not stop scheduling once
    csnat portion was scheduled. See bug report for failing
    scenario.

    This partially reverts
    commit 3794b4a83e68041e24b715135f0ccf09a5631178
    and fixes bug 1374473 by moving csnat scheduling
    after general dvr router scheduling, so double binding does
    not happen.

    Closes-Bug: #1472163
    Related-Bug: #1374473
    Change-Id: I57c06e2be732e47b6cce7c724f6b255ea2d8fa32

commit e152f93878b9bb6af7cfedc9e045892fcf7d0615
Author: Assaf Muller <email address hidden>
Date: Sat Aug 8 21:15:03 2015 +0300

    TESTING.rst love

    Change-Id: I64b569048f8f87ea2fe63d861302b4020d36493d

commit 633c52cca1b383af2c900e1663c8682114acd177
Author: sridhargaddam <email address hidden>
Date: Wed Aug 5 10:49:33 2015 +0000

    Avoid dhcp_release for ipv6 addresses

    dhcp_release is only supported for IPv4 addresses [1] and not for
    IPv6 addresses [2]. There will be no effect when it is called with
    IPv6 address. This patch adds a corresponding note and avoids calling
    dhcp_release for IPv6 addresses.

    [1] http://manpages.ubuntu.com/manpages/trusty/man1/dhcp_release.1.html
    [2] http://lists.thekelleys.org.uk/pipermail/dnsmasq-discuss/2013q2/007084.html

    Change-Id: I8b8316c9d3d011c2a687a3a1e2a4da5cf1b5d604

commit 2de8fad17402f38bbc30204ee2f4f99cf21cb69d
Author: OpenStack Proposal Bot <email address hidden>
Date: Mon Aug 10 06:11:06 2015 +0000

    Imported Translations from Transifex

    For more information about this automatic import see:
    https://wiki.openstack.org/wiki/Translations/Infrastructure

    Change-Id: I2b423e83a7d0ac8b23239f81fe33dd8382c6fff6

commit fef79dc7b9162e03c8891645494c115b52d4d014
Author: Henry Gessau <email address hidden>
Date: Mon Aug 3 23:30:34 2015 -0400

    Consistent layout and headings for devref

    The lack of convention for heading levels among the independently
    written devref documents was starting to make the Table of Contents
    look rather messy when rendered in HTML.

    This patch does not cover the "Neutron Internals" section since its
    layo...

tags: added: in-feature-pecan
Thierry Carrez (ttx)
Changed in neutron:
milestone: none → liberty-3
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in neutron:
milestone: liberty-3 → 7.0.0