L3 agent constantly resyncing deleted router

Bug #1606844 reported by Oleg Bondarev
This bug affects 5 people
Affects: neutron
Status: Fix Released
Importance: High
Assigned to: Oleg Bondarev

Bug Description

There is no need to constantly resync a router that was deleted and for which there is no namespace.

Observed: the L3 agent log is full of:

2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent [-] Error while deleting router 81ef46de-f7f9-4c5e-b787-c935e0af253a
2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent Traceback (most recent call last):
2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/dist-packages/neutron/agent/l3/agent.py", line 359, in _safe_router_removed
2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent self._router_removed(router_id)
2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/dist-packages/neutron/agent/l3/agent.py", line 377, in _router_removed
2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent ri.delete(self)
2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/dist-packages/neutron/agent/l3/router_info.py", line 347, in delete
2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent self.process_delete(agent)
2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/dist-packages/neutron/common/utils.py", line 385, in call
2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent self.logger(e)
2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent self.force_reraise()
2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent six.reraise(self.type_, self.value, self.tb)
2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/dist-packages/neutron/common/utils.py", line 382, in call
2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent return func(*args, **kwargs)
2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/dist-packages/neutron/agent/l3/router_info.py", line 947, in process_delete
2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent self._process_internal_ports(agent.pd)
2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/dist-packages/neutron/agent/l3/router_info.py", line 530, in _process_internal_ports
2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent existing_devices = self._get_existing_devices()
2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/dist-packages/neutron/agent/l3/router_info.py", line 413, in _get_existing_devices
2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent ip_devs = ip_wrapper.get_devices(exclude_loopback=True)
2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/dist-packages/neutron/agent/linux/ip_lib.py", line 130, in get_devices
2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent log_fail_as_error=self.log_fail_as_error
2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/dist-packages/neutron/agent/linux/utils.py", line 140, in execute
2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent raise RuntimeError(msg)
2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent RuntimeError: Exit code: 1; Stdin: ; Stdout: ; Stderr: Cannot open network namespace "qrouter-81ef46de-f7f9-4c5e-b787-c935e0af253a": No such file or directory
2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent
2016-07-26 14:00:45.224 13360 ERROR neutron.agent.l3.agent
2016-07-26 14:00:45.236 13360 ERROR neutron.agent.linux.utils [-] Exit code: 1; Stdin: ; Stdout: ; Stderr: Cannot open network namespace "qrouter-81ef46de-f7f9-4c5e-b787-c935e0af253a": No such file or directory

This consumes memory, CPU, and disk.
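To gauge how often the agent retries, one option is to count the "Error while deleting router" lines per router ID. The snippet below is a minimal sketch that assumes the log line format quoted in this report; the function name `count_failed_deletes` is hypothetical and not part of neutron.

```python
import re
from collections import Counter

# Matches the L3 agent message quoted in this report and captures the router UUID.
DELETE_ERROR = re.compile(
    r"ERROR neutron\.agent\.l3\.agent \[-\] Error while deleting router "
    r"([0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})"
)


def count_failed_deletes(lines):
    """Return a Counter mapping router ID to the number of failed delete attempts."""
    counts = Counter()
    for line in lines:
        match = DELETE_ERROR.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Any iterable of lines works as input, for example an open log file. A router whose count keeps growing between runs is likely stuck in the loop described here.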

Revision history for this message
Jakub Libosvar (libosvar) wrote :

Sounds like a dup of https://bugs.launchpad.net/neutron/+bug/1605546 . Oleg, can you please confirm?

Revision history for this message
John Schwarz (jschwarz) wrote :

This does seem like a dup of the aforementioned bug to me.

Revision history for this message
Jakub Libosvar (libosvar) wrote :

Oleg nacked the duplicate: this bug is about failed router deletion, while bug 1605546 is about HA routers.

Revision history for this message
John Schwarz (jschwarz) wrote :

This is not a duplicate after all - it deals with the client side.

To reproduce: pause (using pdb) the code that is responsible for creating a namespace, create a router, delete it, and then let the L3 agent's code run. The agent should now loop indefinitely (until it is restarted).

This is easily reproducible on a loaded machine with a create_and_delete_routers rally task. It is not directly connected to l3-ha but is made worse by it (for example, a missing keepalived_manager also causes this issue), since the l3-ha cleanup code is less robust.
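The looping behaviour can be modelled in a few lines. This is a toy simulation, not neutron code: it assumes a failed removal puts the router back on the work queue, and all names (`remove_router`, `run_agent`, the `max_iterations` cap) are hypothetical.

```python
from collections import deque


def remove_router(router_id, namespaces):
    """Stand-in for router removal: fails when the namespace is missing."""
    ns = "qrouter-" + router_id
    if ns not in namespaces:
        raise RuntimeError('Cannot open network namespace "%s"' % ns)
    namespaces.discard(ns)


def run_agent(pending, namespaces, max_iterations):
    """Process deletions, re-queueing on failure; return the number of failures."""
    queue = deque(pending)
    failures = 0
    # The cap exists only so the simulation terminates; the real loop has none.
    for _ in range(max_iterations):
        if not queue:
            break
        router_id = queue.popleft()
        try:
            remove_router(router_id, namespaces)
        except RuntimeError:
            failures += 1
            queue.append(router_id)  # resync: the same router is retried later
    return failures
```

With an empty namespace set, every iteration fails the same way, so the failure count equals `max_iterations`. Nothing inside the loop can change the outcome, which matches the observation that only an agent restart stops it.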

Revision history for this message
John Schwarz (jschwarz) wrote :

And by "client side" (from the first line of comment #4) I mean "agent side".

Miguel Lavalle (minsel)
Changed in neutron:
importance: Undecided → Medium
summary: - Neutron constantly resyncing deleted router
+ L3 agent constantly resyncing deleted router
Revision history for this message
John Schwarz (jschwarz) wrote :

This will be used to track non-HA agent loops. A new bug report will be opened for the HA case by Liu Yulong soon.

Changed in neutron:
status: New → Confirmed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/348372

Changed in neutron:
status: Confirmed → In Progress
Revision history for this message
LIU Yulong (dragon889) wrote :

The absence of ha_port may also cause an infinite loop. Traces:
http://paste.openstack.org/show/523757/
http://paste.openstack.org/show/528407/

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.openstack.org/348372
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=31a7feea6b60dac138b00652d2f16982a3b25f78
Submitter: Jenkins
Branch: master

commit 31a7feea6b60dac138b00652d2f16982a3b25f78
Author: Oleg Bondarev <email address hidden>
Date: Thu Jul 28 17:03:22 2016 +0300

    L3 agent: check router namespace existence before delete

    Router namespace absence may lead to infinite loop in l3 agent trying
    to delete the router.
    This patch adds checks before going into namespace to prevent RuntimeError
    and following infinite loop.

    Closes-Bug: #1606844
    Change-Id: Iae95ccb8eeb06d0fd5fc7d71e63408b3f843b371
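
The idea in the commit message (check that the namespace exists before doing any work inside it) can be illustrated as follows. This is a simplified sketch, not the merged patch: `FakeNamespace` and `delete_router` are hypothetical stand-ins for the agent's classes.

```python
class FakeNamespace:
    """Hypothetical stand-in for a router namespace."""

    def __init__(self, name, present):
        self.name = name
        self.present = present

    def exists(self):
        return self.present

    def list_devices(self):
        # Mirrors the failure in the traceback: entering a missing namespace raises.
        if not self.present:
            raise RuntimeError('Cannot open network namespace "%s"' % self.name)
        return ["qr-example"]

    def delete(self):
        self.present = False


def delete_router(namespace):
    """Clean up a router, skipping in-namespace work if the namespace is gone."""
    removed_devices = []
    if namespace.exists():
        removed_devices = namespace.list_devices()
        namespace.delete()
    return removed_devices
```

Because the guard runs before any call that enters the namespace, a missing namespace makes the delete a no-op instead of raising the RuntimeError that triggers another resync.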

Changed in neutron:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/mitaka)

Fix proposed to branch: stable/mitaka
Review: https://review.openstack.org/353010

tags: added: mitaka-backport-potential
tags: added: l3-dvr-backlog
tags: added: liberty-backport-potential
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/mitaka)

Reviewed: https://review.openstack.org/353010
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=3f02209b77bb5d0718818e73dcb3aa28ea542580
Submitter: Jenkins
Branch: stable/mitaka

commit 3f02209b77bb5d0718818e73dcb3aa28ea542580
Author: Oleg Bondarev <email address hidden>
Date: Thu Jul 28 17:03:22 2016 +0300

    L3 agent: check router namespace existence before delete

    Router namespace absence may lead to infinite loop in l3 agent trying
    to delete the router.
    This patch adds checks before going into namespace to prevent RuntimeError
    and following infinite loop.

    Closes-Bug: #1606844
    Change-Id: Iae95ccb8eeb06d0fd5fc7d71e63408b3f843b371
    (cherry picked from commit 31a7feea6b60dac138b00652d2f16982a3b25f78)

tags: added: in-stable-mitaka
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/liberty)

Fix proposed to branch: stable/liberty
Review: https://review.openstack.org/354406

Revision history for this message
Ihar Hrachyshka (ihar-hrachyshka) wrote :

Infinite loop is not cool. Raising to High.

Changed in neutron:
importance: Medium → High
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/liberty)

Reviewed: https://review.openstack.org/354406
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=affccd131bbf0ae899c451538708c728169a62ea
Submitter: Jenkins
Branch: stable/liberty

commit affccd131bbf0ae899c451538708c728169a62ea
Author: Oleg Bondarev <email address hidden>
Date: Thu Jul 28 17:03:22 2016 +0300

    L3 agent: check router namespace existence before delete

    Router namespace absence may lead to infinite loop in l3 agent trying
    to delete the router.
    This patch adds checks before going into namespace to prevent RuntimeError
    and following infinite loop.

    Closes-Bug: #1606844
    (cherry picked from commit 31a7feea6b60dac138b00652d2f16982a3b25f78)

    Conflicts:
     neutron/agent/l3/namespaces.py
     neutron/tests/unit/agent/l3/test_dvr_fip_ns.py
     neutron/tests/unit/agent/l3/test_router_info.py

    Change-Id: Iae95ccb8eeb06d0fd5fc7d71e63408b3f843b371

tags: added: in-stable-liberty
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/361799

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 9.0.0.0b3

This issue was fixed in the openstack/neutron 9.0.0.0b3 development milestone.

tags: removed: liberty-backport-potential mitaka-backport-potential
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 7.2.0

This issue was fixed in the openstack/neutron 7.2.0 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 8.3.0

This issue was fixed in the openstack/neutron 8.3.0 release.

Revision history for this message
John Schwarz (jschwarz) wrote :

This seems to have resurfaced in https://bugs.launchpad.net/neutron/+bug/1635554. I think it's best we deal with the resurfacing there so as not to add clutter to this bug report.

Revision history for this message
Roman Klimenko (rklimenko) wrote :

I still have the issue; the neutron logs are filled with these messages.
Fuel 9.2 / neutron

# dpkg -l | grep neutron-l3-agent
ii neutron-l3-agent 2:8.3.0-1~u14.04+mos30 all OpenStack virtual network service - l3 agent

Revision history for this message
Alexander M (st41ker) wrote :

Confirming for Fuel 9.2.

>Mar 15 05:52:31 controller1 neutron-l3-agent: 2017-03-15 05:52:31.347 13127 ERROR neutron.agent.l3.agent [-] Error while deleting router e0174f10-b5cd-484e-8505-36350b198af1
2017-03-15 05:52:31.347 13127 ERROR neutron.agent.l3.agent Traceback (most recent call last):
2017-03-15 05:52:31.347 13127 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/dist-packages/neutron/agent/l3/agent.py", line 359, in _safe_router_removed
2017-03-15 05:52:31.347 13127 ERROR neutron.agent.l3.agent self._router_removed(router_id)
2017-03-15 05:52:31.347 13127 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/dist-packages/neutron/agent/l3/agent.py", line 377, in _router_removed
2017-03-15 05:52:31.347 13127 ERROR neutron.agent.l3.agent ri.delete(self)
2017-03-15 05:52:31.347 13127 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/dist-packages/neutron/agent/l3/ha_router.py", line 397, in delete
2017-03-15 05:52:31.347 13127 ERROR neutron.agent.l3.agent self.destroy_state_change_monitor(self.process_monitor)
2017-03-15 05:52:31.347 13127 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/dist-packages/neutron/agent/l3/ha_router.py", line 344, in destroy_state_change_monitor
2017-03-15 05:52:31.347 13127 ERROR neutron.agent.l3.agent pm = self._get_state_change_monitor_process_manager()
2017-03-15 05:52:31.347 13127 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/dist-packages/neutron/agent/l3/ha_router.py", line 315, in _get_state_change_monitor_process_manager
2017-03-15 05:52:31.347 13127 ERROR neutron.agent.l3.agent default_cmd_callback=self._get_state_change_monitor_callback())
2017-03-15 05:52:31.347 13127 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/dist-packages/neutron/agent/l3/ha_router.py", line 318, in _get_state_change_monitor_callback
2017-03-15 05:52:31.347 13127 ERROR neutron.agent.l3.agent ha_device = self.get_ha_device_name()
2017-03-15 05:52:31.347 13127 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/dist-packages/neutron/agent/l3/ha_router.py", line

Revision history for this message
Kevin (kvasko) wrote :

I have a 9.0 fuel deployment that seems to be running into this issue. Is there a workaround for this to avoid having to upgrade the entire cluster?

Revision history for this message
Matt C (matthew-czajka) wrote :

I'm getting this same issue on Ocata.
