[SRU] Agent is failing to process HA router if initialize() fails

Bug #1662804 reported by venkata anil
This bug affects 1 person
Affects               Status        Importance  Assigned to         Milestone
Ubuntu Cloud Archive  Fix Released  Undecided   Unassigned
  Mitaka              Fix Released  Undecided   Edward Hope-Morley
  Newton              Fix Released  Undecided   Edward Hope-Morley
neutron               Fix Released  High        venkata anil
neutron (Ubuntu)      Fix Released  Undecided   Unassigned
  Xenial              Fix Released  Undecided   Edward Hope-Morley
  Yakkety             Fix Released  Undecided   Edward Hope-Morley

Bug Description

[Impact]

This patch resolves, among other things, a race condition between create and delete router requests when using L3 HA. At the time of backport the patch is already available from Ocata onwards and has been verified as sufficiently minimal and safe for backport to Newton and Mitaka. Essentially, the error case results from an incorrectly initialised router update action being executed without proper checks, and this patch fixes that.

[Test Case]

 * Deploy OpenStack Mitaka - http://pastebin.ubuntu.com/24637244/ - with neutron-l3-agent configured to provide HA (VRRP) routers.

 * Repeatedly create and delete routers in rapid succession and check that the l3 agent does not go into an infinite error loop, i.e. run http://pastebin.ubuntu.com/24634950/ and run tail -F /var/log/neutron/neutron-l3-agent.log on all l3 agent units. Also check that qrouter- namespaces are not stacking up. For Mitaka I typically hit the error after ~20 create/deletes.
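
As a rough aid for the log check in the test case, the following sketch counts how often the agent reports a failure for each router. The message text is taken from the trace in this report; the function names and the threshold of 5 are illustrative assumptions, not part of neutron or of the pastebin script.

```python
import re
from collections import Counter

# Message text as it appears in the l3 agent trace in this report.
FAILURE_RE = re.compile(
    r"Failed to process compatible router '?([0-9a-f-]{36})'?"
)


def count_router_failures(lines):
    """Return a Counter mapping router id -> number of failure messages."""
    counts = Counter()
    for line in lines:
        match = FAILURE_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def looping_routers(lines, threshold=5):
    """Return router ids whose failure count reaches the threshold.

    The threshold is an arbitrary assumption; a router that keeps
    reappearing in the log is a hint that the agent is stuck retrying it.
    """
    counts = count_router_failures(lines)
    return sorted(rid for rid, n in counts.items() if n >= threshold)
```

One way to use it is to pass the open log file, e.g. `looping_routers(open("/var/log/neutron/neutron-l3-agent.log"))`, on each l3 agent unit after the create/delete run.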

[Regression Potential]

 * I do not envisage any regression potential from this patch.

====

When the HA router initialize() function fails for some reason (rabbitmq restart or no ha_port), keepalived_manager or KeepalivedInstance won't be configured. In this case, _process_router_if_compatible fails with an exception, and _resync_router(update) then tries to process this router again in a loop. As initialize() is tried only once (and failed), every retry of _process_router_if_compatible will fail (no keepalived manager or instance) and the router is never configured (see the trace below).

2017-02-06 18:34:18.539 26120 DEBUG neutron.agent.linux.utils [-] Running command (rootwrap daemon): ['ip', 'netns', 'exec', 'qrouter-114a72fe-02ae-4b87-a2e7-70f962df0951', 'ip', '-o', 'link', 'show', 'qr-e63406e1-e7'] execute_rootwrap_daemon /usr/lib/python2.7/site-packages/neutron/agent/linux/utils.py:101
2017-02-06 18:34:18.544 26120 DEBUG neutron.agent.linux.utils [-]
Command: ['ip', 'netns', 'exec', u'qrouter-114a72fe-02ae-4b87-a2e7-70f962df0951', 'ip', '-o', 'link', 'show', u'qr-e63406e1-e7']
Exit code: 0
 execute /usr/lib/python2.7/site-packages/neutron/agent/linux/utils.py:156
2017-02-06 18:34:18.544 26120 ERROR neutron.agent.l3.router_info [-] 'NoneType' object has no attribute 'get_process'
2017-02-06 18:34:18.544 26120 ERROR neutron.agent.l3.router_info Traceback (most recent call last):
2017-02-06 18:34:18.544 26120 ERROR neutron.agent.l3.router_info File "/usr/lib/python2.7/site-packages/neutron/common/utils.py", line 359, in call
2017-02-06 18:34:18.544 26120 ERROR neutron.agent.l3.router_info return func(*args, **kwargs)
2017-02-06 18:34:18.544 26120 ERROR neutron.agent.l3.router_info File "/usr/lib/python2.7/site-packages/neutron/agent/l3/router_info.py", line 744, in process
2017-02-06 18:34:18.544 26120 ERROR neutron.agent.l3.router_info self._process_internal_ports(agent.pd)
2017-02-06 18:34:18.544 26120 ERROR neutron.agent.l3.router_info File "/usr/lib/python2.7/site-packages/neutron/agent/l3/router_info.py", line 394, in _process_internal_ports
2017-02-06 18:34:18.544 26120 ERROR neutron.agent.l3.router_info self.internal_network_added(p)
2017-02-06 18:34:18.544 26120 ERROR neutron.agent.l3.router_info File "/usr/lib/python2.7/site-packages/neutron/agent/l3/ha_router.py", line 275, in internal_network_added
2017-02-06 18:34:18.544 26120 ERROR neutron.agent.l3.router_info self._disable_ipv6_addressing_on_interface(interface_name)
2017-02-06 18:34:18.544 26120 ERROR neutron.agent.l3.router_info File "/usr/lib/python2.7/site-packages/neutron/agent/l3/ha_router.py", line 235, in _disable_ipv6_addressing_on_interface
2017-02-06 18:34:18.544 26120 ERROR neutron.agent.l3.router_info if self._should_delete_ipv6_lladdr(ipv6_lladdr):
2017-02-06 18:34:18.544 26120 ERROR neutron.agent.l3.router_info File "/usr/lib/python2.7/site-packages/neutron/agent/l3/ha_router.py", line 217, in _should_delete_ipv6_lladdr
2017-02-06 18:34:18.544 26120 ERROR neutron.agent.l3.router_info if manager.get_process().active:
2017-02-06 18:34:18.544 26120 ERROR neutron.agent.l3.router_info AttributeError: 'NoneType' object has no attribute 'get_process'
2017-02-06 18:34:18.544 26120 ERROR neutron.agent.l3.router_info
2017-02-06 18:34:18.549 26120 ERROR neutron.agent.l3.agent [-] Failed to process compatible router '114a72fe-02ae-4b87-a2e7-70f962df0951'
2017-02-06 18:34:18.549 26120 ERROR neutron.agent.l3.agent Traceback (most recent call last):
2017-02-06 18:34:18.549 26120 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/site-packages/neutron/agent/l3/agent.py", line 506, in _process_router_update
2017-02-06 18:34:18.549 26120 ERROR neutron.agent.l3.agent self._process_router_if_compatible(router)
2017-02-06 18:34:18.549 26120 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/site-packages/neutron/agent/l3/agent.py", line 445, in _process_router_if_compatible
2017-02-06 18:34:18.549 26120 ERROR neutron.agent.l3.agent self._process_updated_router(router)
2017-02-06 18:34:18.549 26120 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/site-packages/neutron/agent/l3/agent.py", line 459, in _process_updated_router
2017-02-06 18:34:18.549 26120 ERROR neutron.agent.l3.agent ri.process(self)
2017-02-06 18:34:18.549 26120 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/site-packages/neutron/agent/l3/ha_router.py", line 377, in process
2017-02-06 18:34:18.549 26120 ERROR neutron.agent.l3.agent super(HaRouter, self).process(agent)
2017-02-06 18:34:18.549 26120 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/site-packages/neutron/common/utils.py", line 362, in call
2017-02-06 18:34:18.549 26120 ERROR neutron.agent.l3.agent self.logger(e)
2017-02-06 18:34:18.549 26120 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 204, in __exit__
2017-02-06 18:34:18.549 26120 ERROR neutron.agent.l3.agent six.reraise(self.type_, self.value, self.tb)
2017-02-06 18:34:18.549 26120 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/site-packages/neutron/common/utils.py", line 359, in call
2017-02-06 18:34:18.549 26120 ERROR neutron.agent.l3.agent return func(*args, **kwargs)
2017-02-06 18:34:18.549 26120 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/site-packages/neutron/agent/l3/router_info.py", line 744, in process
2017-02-06 18:34:18.549 26120 ERROR neutron.agent.l3.agent self._process_internal_ports(agent.pd)
2017-02-06 18:34:18.549 26120 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/site-packages/neutron/agent/l3/router_info.py", line 394, in _process_internal_ports
2017-02-06 18:34:18.549 26120 ERROR neutron.agent.l3.agent self.internal_network_added(p)
2017-02-06 18:34:18.549 26120 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/site-packages/neutron/agent/l3/ha_router.py", line 275, in internal_network_added
2017-02-06 18:34:18.549 26120 ERROR neutron.agent.l3.agent self._disable_ipv6_addressing_on_interface(interface_name)
2017-02-06 18:34:18.549 26120 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/site-packages/neutron/agent/l3/ha_router.py", line 235, in _disable_ipv6_addressing_on_interface
2017-02-06 18:34:18.549 26120 ERROR neutron.agent.l3.agent if self._should_delete_ipv6_lladdr(ipv6_lladdr):
2017-02-06 18:34:18.549 26120 ERROR neutron.agent.l3.agent File "/usr/lib/python2.7/site-packages/neutron/agent/l3/ha_router.py", line 217, in _should_delete_ipv6_lladdr
2017-02-06 18:34:18.549 26120 ERROR neutron.agent.l3.agent if manager.get_process().active:
2017-02-06 18:34:18.549 26120 ERROR neutron.agent.l3.agent AttributeError: 'NoneType' object has no attribute 'get_process'
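
The failure mode described above can be reproduced in a small stand-alone model. This is only a sketch: FakeHaRouter, BuggyAgent and resync are hypothetical stand-ins, not neutron's classes, and the model assumes the router_info is registered with the agent before initialize() is attempted, as the description implies.

```python
class FakeHaRouter:
    """Hypothetical stand-in for an HA router_info object."""

    def __init__(self, router_id):
        self.router_id = router_id
        self.keepalived_manager = None
        self._fail_next_initialize = True

    def initialize(self):
        # Fail once, as if rabbitmq restarted or the ha_port was missing.
        if self._fail_next_initialize:
            self._fail_next_initialize = False
            raise RuntimeError("simulated initialize() failure")
        self.keepalived_manager = object()

    def process(self):
        if self.keepalived_manager is None:
            # Same error text as the trace above.
            raise AttributeError(
                "'NoneType' object has no attribute 'get_process'")


class BuggyAgent:
    """Registers the router before initialize() has succeeded."""

    def __init__(self):
        self.router_info = {}

    def process_router_if_compatible(self, router_id):
        if router_id not in self.router_info:
            ri = FakeHaRouter(router_id)
            self.router_info[router_id] = ri  # registered too early
            ri.initialize()
            ri.process()
        else:
            # Treated as an update: initialize() is never retried.
            self.router_info[router_id].process()


def resync(agent, router_id, attempts):
    """Retry processing and record the exception type of each attempt."""
    errors = []
    for _ in range(attempts):
        try:
            agent.process_router_if_compatible(router_id)
        except Exception as exc:
            errors.append(type(exc).__name__)
    return errors
```

In this model the first attempt fails inside initialize() and every later attempt fails inside process(), however many times the router is resynced.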

Changed in neutron:
assignee: nobody → venkata anil (anil-venkata)
tags: added: l3-ha
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/431026

Changed in neutron:
status: New → In Progress
Changed in neutron:
importance: Undecided → Medium
Changed in neutron:
assignee: venkata anil (anil-venkata) → Brian Haley (brian-haley)
Changed in neutron:
assignee: Brian Haley (brian-haley) → venkata anil (anil-venkata)
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.openstack.org/431026
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=3e1ed94e389c427f1da56cde43a458832078f073
Submitter: Jenkins
Branch: master

commit 3e1ed94e389c427f1da56cde43a458832078f073
Author: venkata anil <email address hidden>
Date: Wed Feb 8 15:49:47 2017 +0000

    Avoid router ri.process if initialize() fails

    When router_info initialize() fails(with trace) some resources(
    like keepalived process) may not be created. While handling this
    exception, l3 agent calls _process_updated_router instead of
    again calling _process_added_router, which also fails trying to
    access resources which are not created.

    In this change, agent will have new router_info(i.e
    self.router_info[router_id] = ri) only when initialize() succeeds.
    When initialize() fails, as router_info is not part of agent,
    "_process_router_if_compatible" will again call initialize().
    We also cleanup router_info when initialize() fails.

    Closes-bug: #1662804
    Change-Id: I278ac83de57713c93d6e50846d79034d774c5d47
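
The approach in the commit message can be sketched as follows. This is a simplified model and not the actual diff: FakeHaRouter, FixedAgent and the initialize_failures counter are hypothetical names used only to show that the router_info is stored after initialize() succeeds and cleaned up when it fails.

```python
class FakeHaRouter:
    """Hypothetical stand-in for an HA router_info object."""

    def __init__(self, router_id):
        self.router_id = router_id
        self.keepalived_manager = None
        self.deleted = False

    def initialize(self, fail):
        if fail:
            raise RuntimeError("simulated initialize() failure")
        self.keepalived_manager = object()

    def process(self):
        if self.keepalived_manager is None:
            raise AttributeError(
                "'NoneType' object has no attribute 'get_process'")
        return "processed"

    def delete(self):
        self.deleted = True


class FixedAgent:
    """Stores router_info only once initialize() has succeeded."""

    def __init__(self, initialize_failures=1):
        self.router_info = {}
        self.cleaned_up = []
        self._failures_left = initialize_failures

    def _router_added(self, router_id):
        ri = FakeHaRouter(router_id)
        fail = self._failures_left > 0
        if fail:
            self._failures_left -= 1
        try:
            ri.initialize(fail)
        except Exception:
            # Clean up the half-built router and let the caller resync.
            ri.delete()
            self.cleaned_up.append(ri)
            raise
        self.router_info[router_id] = ri  # only after success

    def process_router_if_compatible(self, router_id):
        if router_id not in self.router_info:
            # Not registered yet, so initialize() is attempted again.
            self._router_added(router_id)
        return self.router_info[router_id].process()
```

Because a failed router never enters router_info, the next resync takes the "added" path again and retries initialize() instead of calling process() on a half-built object.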

Changed in neutron:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/ocata)

Fix proposed to branch: stable/ocata
Review: https://review.openstack.org/452099

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/newton)

Fix proposed to branch: stable/newton
Review: https://review.openstack.org/452100

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/ocata)

Reviewed: https://review.openstack.org/452099
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=71c0e8940661fefbe2830258509e6c4afb887783
Submitter: Jenkins
Branch: stable/ocata

commit 71c0e8940661fefbe2830258509e6c4afb887783
Author: venkata anil <email address hidden>
Date: Wed Feb 8 15:49:47 2017 +0000

    Avoid router ri.process if initialize() fails

    When router_info initialize() fails(with trace) some resources(
    like keepalived process) may not be created. While handling this
    exception, l3 agent calls _process_updated_router instead of
    again calling _process_added_router, which also fails trying to
    access resources which are not created.

    In this change, agent will have new router_info(i.e
    self.router_info[router_id] = ri) only when initialize() succeeds.
    When initialize() fails, as router_info is not part of agent,
    "_process_router_if_compatible" will again call initialize().
    We also cleanup router_info when initialize() fails.

    Closes-bug: #1662804
    Change-Id: I278ac83de57713c93d6e50846d79034d774c5d47
    (cherry picked from commit 3e1ed94e389c427f1da56cde43a458832078f073)

tags: added: in-stable-ocata
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 10.0.1

This issue was fixed in the openstack/neutron 10.0.1 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 11.0.0.0b1

This issue was fixed in the openstack/neutron 11.0.0.0b1 development milestone.

Edward Hope-Morley (hopem) wrote : Re: Agent is failing to process HA router if initialize() fails

Queueing this for SRU since it resolves issues with create/delete ha router race conditions.

Changed in neutron (Ubuntu):
status: New → Fix Released
Edward Hope-Morley (hopem) wrote :
Changed in cloud-archive:
status: New → Fix Released
summary: - Agent is failing to process HA router if initialize() fails
+ [SRU] Agent is failing to process HA router if initialize() fails
description: updated
tags: added: sts sts-sru-needed
Edward Hope-Morley (hopem) wrote :
Ihar Hrachyshka (ihar-hrachyshka) wrote :

Broken router, and l3 agent spinning in the loop, fetching router state over and over from neutron-server. I consider it a High impact bug, setting High.

Changed in neutron:
importance: Medium → High
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/newton)

Reviewed: https://review.openstack.org/452100
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=b98267f73af5a6c6388a76a73d88e46c90f8a71e
Submitter: Jenkins
Branch: stable/newton

commit b98267f73af5a6c6388a76a73d88e46c90f8a71e
Author: venkata anil <email address hidden>
Date: Wed Feb 8 15:49:47 2017 +0000

    Avoid router ri.process if initialize() fails

    When router_info initialize() fails(with trace) some resources(
    like keepalived process) may not be created. While handling this
    exception, l3 agent calls _process_updated_router instead of
    again calling _process_added_router, which also fails trying to
    access resources which are not created.

    In this change, agent will have new router_info(i.e
    self.router_info[router_id] = ri) only when initialize() succeeds.
    When initialize() fails, as router_info is not part of agent,
    "_process_router_if_compatible" will again call initialize().
    We also cleanup router_info when initialize() fails.

    Closes-bug: #1662804
    Change-Id: I278ac83de57713c93d6e50846d79034d774c5d47
    (cherry picked from commit 3e1ed94e389c427f1da56cde43a458832078f073)
    (cherry picked from commit 71c0e8940661fefbe2830258509e6c4afb887783)

tags: added: in-stable-newton
Edward Hope-Morley (hopem) wrote :
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 9.4.0

This issue was fixed in the openstack/neutron 9.4.0 release.

Changed in neutron (Ubuntu Xenial):
assignee: nobody → Edward Hope-Morley (hopem)
Changed in neutron (Ubuntu Yakkety):
assignee: nobody → Edward Hope-Morley (hopem)
James Page (james-page)
Changed in neutron (Ubuntu Xenial):
status: New → In Progress
James Page (james-page)
Changed in neutron (Ubuntu Yakkety):
status: New → In Progress
Edward Hope-Morley (hopem) wrote :

This fix will be released as part of the upcoming Newton PR which incl. neutron 9.4.0 and is tracked in https://bugs.launchpad.net/ubuntu/+source/neutron/+bug/1696133. I'll leave this bug open until that PR is released.

Łukasz Zemczak (sil2100) wrote : Please test proposed package

Hello venkata, or anyone else affected,

Accepted neutron into xenial-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/neutron/2:8.4.0-0ubuntu3 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in neutron (Ubuntu Xenial):
status: In Progress → Fix Committed
tags: added: verification-needed
Łukasz Zemczak (sil2100) wrote :

Hello venkata, or anyone else affected,

Accepted neutron into yakkety-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/neutron/2:9.4.0-0ubuntu1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us in getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in neutron (Ubuntu Yakkety):
status: In Progress → Fix Committed
James Page (james-page) wrote :

Hello venkata, or anyone else affected,

Accepted neutron into newton-proposed. The package will build now and be available in the Ubuntu Cloud Archive in a few hours, and then in the -proposed repository.

Please help us by testing this new package. To enable the -proposed repository:

  sudo add-apt-repository cloud-archive:newton-proposed
  sudo apt-get update

Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-newton-needed to verification-newton-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-newton-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

tags: added: verification-newton-needed
Łukasz Zemczak (sil2100) wrote : Change of SRU verification policy

As part of a recent change in the Stable Release Update verification policy, we would like to inform you that for a bug to be considered verified for a given release, a verification-done-$RELEASE tag needs to be added to the bug, where $RELEASE is the name of the series of the package that was tested (e.g. verification-done-xenial). Please note that the global 'verification-done' tag can no longer be used for this purpose.

Thank you!

James Page (james-page) wrote : Please test proposed package

Hello venkata, or anyone else affected,

Accepted neutron into mitaka-proposed. The package will build now and be available in the Ubuntu Cloud Archive in a few hours, and then in the -proposed repository.

Please help us by testing this new package. To enable the -proposed repository:

  sudo add-apt-repository cloud-archive:mitaka-proposed
  sudo apt-get update

Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-mitaka-needed to verification-mitaka-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-mitaka-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

tags: added: verification-mitaka-needed
Edward Hope-Morley (hopem) wrote :

yakkety-proposed verified and lgtm

tags: added: verification-done-yakkety
Edward Hope-Morley (hopem) wrote :

xenial-newton-proposed verified and lgtm

tags: added: verification-newton-done
removed: verification-newton-needed
Edward Hope-Morley (hopem) wrote :

trusty-mitaka-proposed verified and lgtm

Edward Hope-Morley (hopem) wrote :

xenial-mitaka-proposed verified and lgtm

tags: added: verification-done-xenial verification-mitaka-done
removed: verification-mitaka-needed verification-needed
Łukasz Zemczak (sil2100) wrote :

Could you please include some additional context on what tests have been performed and on which versions of the software? Thank you.

Edward Hope-Morley (hopem) wrote :

@sil2100 the tests performed are exactly as detailed in the [Test Case] in the description of this bug and I performed a test against a deployment for each proposed series/release i.e. trusty mitaka uca proposed, xenial proposed, yakkety proposed and xenial newton uca proposed.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package neutron - 2:9.4.0-0ubuntu1

---------------
neutron (2:9.4.0-0ubuntu1) yakkety; urgency=medium

  * New upstream point release for OpenStack Newton (LP: #1696133, #1662804).

 -- James Page <email address hidden> Wed, 07 Jun 2017 13:08:13 +0100

Changed in neutron (Ubuntu Yakkety):
status: Fix Committed → Fix Released
Chris Halse Rogers (raof) wrote : Update Released

The verification of the Stable Release Update for neutron has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package neutron - 2:8.4.0-0ubuntu3

---------------
neutron (2:8.4.0-0ubuntu3) xenial; urgency=medium

  * d/p/avoid-router-ri.process-if-initialize-fails: Backport fix for
    avoid router ri process if initialize fails (LP: #1662804).

 -- Edward Hope-Morley <email address hidden> Wed, 07 Jun 2017 09:48:11 +0100

Changed in neutron (Ubuntu Xenial):
status: Fix Committed → Fix Released
James Page (james-page) wrote :

The verification of the Stable Release Update for neutron has completed successfully and the package has now been released to -updates. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

James Page (james-page) wrote :

This bug was fixed in the package neutron - 2:9.4.0-0ubuntu1~cloud0
---------------

 neutron (2:9.4.0-0ubuntu1~cloud0) xenial-newton; urgency=medium
 .
   * New upstream release for the Ubuntu Cloud Archive.
 .
 neutron (2:9.4.0-0ubuntu1) yakkety; urgency=medium
 .
   * New upstream point release for OpenStack Newton (LP: #1696133, #1662804).

James Page (james-page) wrote :

The verification of the Stable Release Update for neutron has completed successfully and the package has now been released to -updates. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

James Page (james-page) wrote :

This bug was fixed in the package neutron - 2:8.4.0-0ubuntu3~cloud0
---------------

 neutron (2:8.4.0-0ubuntu3~cloud0) trusty-mitaka; urgency=medium
 .
   * New update for the Ubuntu Cloud Archive.
 .
 neutron (2:8.4.0-0ubuntu3) xenial; urgency=medium
 .
   * d/p/avoid-router-ri.process-if-initialize-fails: Backport fix for
     avoid router ri process if initialize fails (LP: #1662804).
