[SRU] Various L3HA functional tests fails often

Bug #1818614 reported by Slawek Kaplonski
This bug affects 1 person
Affects                  Status         Importance  Assigned to        Milestone
Ubuntu Cloud Archive     Fix Released   High        Unassigned
  Pike                   Fix Released   High        Unassigned
  Queens                 Fix Released   High        Unassigned
  Rocky                  Fix Released   High        Unassigned
  Stein                  Fix Released   High        Unassigned
neutron                  Fix Released   Critical    Slawek Kaplonski
neutron (Ubuntu)         Fix Released   High        Unassigned
  Bionic                 Fix Released   High        Unassigned
  Cosmic                 Fix Released   High        Unassigned
  Disco                  Fix Released   High        Unassigned

Bug Description

[Impact]
We need to get this fix added to the Ubuntu packages in order to safeguard against missed VRRP transitions caused by "ip -o monitor" not running at the time the transition occurs. We have seen many cases in the field where neutron routers end up active on multiple l3 agents (as reported by the neutron API), which leads to a number of problems.

[Test Case]
* deploy OpenStack (any version that supports L3 HA)
* create an HA router with max-l3-agents=2
* check neutron l3-agent-list-hosting-router to find where the master is
* on both hosts that are running the l3-agent, run:

pid=$(pgrep -f "/usr/bin/neutron-keepalived-state-change --router_id=$ROUTER_UUID")
ps -f --ppid "$pid"
pkill -f "/var/lib/neutron/ha_confs/$ROUTER_UUID/keepalived.conf"
pkill -f "/usr/bin/neutron-keepalived-state-change --router_id=$ROUTER_UUID"
ps -f --ppid "$pid"  # this should return nothing now

* without this patch you should now see both agents reporting the router as "active"
* with the patch this should not happen (once neutron-keepalived-state-change has been restarted by neutron-l3-agent)
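The pass/fail criterion of the test case above can be expressed as a small check. This is a minimal sketch, assuming the ha_state values ("active"/"standby") reported by neutron l3-agent-list-hosting-router have already been collected per host; active_hosts and verify_single_master are hypothetical helper names, not part of neutron.

```python
"""Hypothetical checker for the [Test Case] outcome.

Assumption: the caller has already collected the ha_state reported for each
l3-agent hosting the router (e.g. "active" / "standby"), keyed by host name.
"""
from typing import Dict, List


def active_hosts(ha_states: Dict[str, str]) -> List[str]:
    """Return the hosts that report the router as active, sorted by name."""
    return sorted(host for host, state in ha_states.items() if state == "active")


def verify_single_master(ha_states: Dict[str, str]) -> bool:
    """True if exactly one agent reports the HA router as active.

    Without the fix, both agents can end up "active" (split brain);
    with the fix, exactly one should remain active.
    """
    return len(active_hosts(ha_states)) == 1


if __name__ == "__main__":
    unpatched = {"node-1": "active", "node-2": "active"}
    patched = {"node-1": "standby", "node-2": "active"}
    print(verify_single_master(unpatched))  # False
    print(verify_single_master(patched))    # True
```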

[Regression Potential]
These patches have already landed in the corresponding upstream branches, where they underwent review as well as unit and functional testing, so the regression potential is expected to be low.

====================================================================

Recently, many L3 HA related functional tests have been failing.
What all these failures have in common is that they fail while waiting for the L3 HA router to become master.

Example stack trace:

ft2.12: neutron.tests.functional.agent.l3.test_ha_router.LinuxBridgeL3HATestCase.test_ha_router_lifecycle_StringException: Traceback (most recent call last):
  File "neutron/tests/base.py", line 174, in func
    return f(self, *args, **kwargs)
  File "neutron/tests/base.py", line 174, in func
    return f(self, *args, **kwargs)
  File "neutron/tests/functional/agent/l3/test_ha_router.py", line 81, in test_ha_router_lifecycle
    self._router_lifecycle(enable_ha=True, router_info=router_info)
  File "neutron/tests/functional/agent/l3/framework.py", line 274, in _router_lifecycle
    common_utils.wait_until_true(lambda: router.ha_state == 'master')
  File "neutron/common/utils.py", line 690, in wait_until_true
    raise WaitTimeout(_("Timed out after %d seconds") % timeout)
neutron.common.utils.WaitTimeout: Timed out after 60 seconds

Example failure: http://logs.openstack.org/79/633979/21/check/neutron-functional-python27/ce7ef07/logs/testr_results.html.gz

Logstash query: http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A%5C%22ha_state%20%3D%3D%20'master')%5C%22
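The traceback ends in wait_until_true, which polls a predicate until it returns True or the timeout (60 seconds here) expires. The following is a simplified, hypothetical stand-in written only to illustrate that behaviour; it is not neutron's implementation.

```python
"""Simplified, hypothetical stand-in for neutron.common.utils.wait_until_true.

Not neutron's code; it only illustrates the polling-with-timeout behaviour
that turns a missed "master" notification into a WaitTimeout.
"""
import time
from typing import Callable


class WaitTimeout(Exception):
    """Raised when the predicate does not become true in time."""


def wait_until_true(predicate: Callable[[], bool], timeout: float = 60,
                    sleep: float = 1) -> None:
    """Poll predicate every `sleep` seconds until it is true or time runs out."""
    deadline = time.monotonic() + timeout
    while not predicate():
        if time.monotonic() >= deadline:
            raise WaitTimeout("Timed out after %d seconds" % timeout)
        time.sleep(sleep)


if __name__ == "__main__":
    class FakeRouter:
        ha_state = "backup"  # never updated: the L3 agent was never notified

    router = FakeRouter()
    try:
        wait_until_true(lambda: router.ha_state == "master", timeout=1, sleep=0.1)
    except WaitTimeout as exc:
        print(exc)  # Timed out after 1 seconds
```

If nothing ever updates router.ha_state, no amount of waiting helps, which is why the real failure only shows up as a timeout.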

Revision history for this message
LIU Yulong (dragon889) wrote :
Miguel Lavalle (minsel)
tags: added: l3-dvr-backlog
removed: l3-ha
Revision history for this message
Slawek Kaplonski (slaweq) wrote :

What I have found so far is that the keepalived process was probably not started for the router in such failed cases. It is visible in the logs, e.g. here: http://logs.openstack.org/74/640874/2/check/neutron-functional/37e3040/logs/dsvm-functional-logs/neutron.tests.functional.agent.l3.extensions.test_port_forwarding_extension.TestL3AgentFipPortForwardingExtensionDVR.test_dvr_ha_router_failover_without_gw.txt.gz#_2019-03-05_23_36_44_978

"No process started for 01fae686-bdfe-4709-baf4-23a564fdbd11 disable /opt/stack/new/neutron/neutron/agent/linux/external_process.py:118"

Changed in neutron:
assignee: nobody → Slawek Kaplonski (slaweq)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/641434

Revision history for this message
Slawek Kaplonski (slaweq) wrote : Re: Various L3HA functional tests fails often

I sent a DNM patch with some additional logging to find out what is going on here.
Also, the patch which adds journal.log to the functional tests job (https://review.openstack.org/#/c/641127/) is now merged and may be useful for debugging this issue.
So far I haven't spotted the failure with this additional logging. I will continue debugging as soon as I find another job that failed in the same way.

Revision history for this message
Slawek Kaplonski (slaweq) wrote :

I found one of the issues which causes the error described in this bug report.
It is now reported in a separate bug report: https://bugs.launchpad.net/neutron/+bug/1819160

But it's not the only issue which causes failures like the ones reported here. Sometimes some HA router (non-DVR) tests fail as well, and that still needs to be investigated here.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/642065

Changed in neutron:
status: Confirmed → In Progress
Revision history for this message
Slawek Kaplonski (slaweq) wrote : Re: Various L3HA functional tests fails often

Today I analyzed one more such failure from the test_ha_router module (non-DVR).

It looks like this issue is caused by a race condition between keepalived being spawned and the neutron-keepalived-state-change process spawning ip monitor.

Let's check the logs from the failed test: http://logs.openstack.org/17/641117/6/gate/neutron-functional/379d405/logs/dsvm-functional-logs/neutron.tests.functional.agent.l3.test_ha_router.LinuxBridgeL3HATestCase.test_ipv6_router_advts_and_fwd_after_router_state_change_backup.txt.gz

This test creates 2 routers, one after the other: https://github.com/openstack/neutron/blob/b847cd02c56dc8fe654f4731306dc2b5493a62eb/neutron/tests/functional/agent/l3/test_ha_router.py#L142

In our example, the first router was e20c5656-7e6f-4a29-8413-3aaad80daca1, which was properly transitioned first to backup at 2019-03-08 10:34:07.072 and then to master at 2019-03-08 10:34:19.899.

The second router had ID b357d56c-4f76-4f5d-9767-289d8cde726e. It was first transitioned to backup at 2019-03-08 10:34:26.061 but was never transitioned to master, which is why the test failed.

So let's now check in journal.log what happened with the keepalived and neutron-keepalived-state-change processes for both routers.
The first router, which worked fine:
- neutron-keepalived-state-change spawned the ip monitor process at Mar 08 10:34:17:
Mar 08 10:34:17 ubuntu-xenial-ovh-gra1-0003584991 neutron-keepalived-state-change[31497]: 2019-03-08 10:34:17.894 31497 DEBUG neutron.agent.linux.utils [-] Running command: ['ip', 'netns', 'exec', 'qroute

- keepalived switched to MASTER STATE at Mar 08 10:34:17:
ubuntu-xenial-ovh-gra1-0003584991 Keepalived_vrrp[32243]: VRRP_Instance(VR_1) Transition to MASTER STATE

- neutron-keepalived-state-change noticed the event on ip monitor's stdout and thus notified the L3 agent that the router is now master:
Mar 08 10:34:19 ubuntu-xenial-ovh-gra1-0003584991 neutron-keepalived-state-change[31497]: 2019-03-08 10:34:19.893 31497 DEBUG neutron.agent.l3.keepalived_state_change [-] Wrote router e20c5656-7e6f-4a29-8

Now let's see what happened in the case of the second router, which failed:

- keepalived switched to MASTER STATE at Mar 08 10:34:36
ubuntu-xenial-ovh-gra1-0003584991 Keepalived_vrrp[4024]: VRRP_Instance(VR_1) Transition to MASTER STATE

- neutron-keepalived-state-change spawned ip monitor process at Mar 08 10:34:52
ubuntu-xenial-ovh-gra1-0003584991 neutron-keepalived-state-change[3531]: 2019-03-08 10:34:52.313 3531 DEBUG neutron.agent.common.async_process [-] Launching async process [ip netns exec qr

- as keepalived had already changed state to master before ip monitor was started, ip monitor saw no event, so the L3 agent was never informed about the current state of the router. After one minute (the 60-second wait_until_true timeout), the test fails because of that.
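The race described above can be reproduced with a toy model. This is a sketch under stated assumptions: Namespace, run and the VIP value are hypothetical, keepalived is modelled as adding the VIP address, and "ip monitor" as a listener that only sees changes made after it starts (it reports changes, not addresses that already exist).

```python
"""Toy model of the race between keepalived and "ip monitor".

All names and values are hypothetical and for illustration only.
"""
from typing import Callable, List, Set


class Namespace:
    """Minimal stand-in for a router namespace's address table."""

    def __init__(self) -> None:
        self.addresses: Set[str] = set()
        self._listeners: List[Callable[[str], None]] = []

    def add_listener(self, listener: Callable[[str], None]) -> None:
        self._listeners.append(listener)

    def add_address(self, cidr: str) -> None:
        self.addresses.add(cidr)
        # Only listeners registered *now* are told; earlier changes are not replayed.
        for listener in self._listeners:
            listener(cidr)


def run(keepalived_first: bool, vip: str = "169.254.0.1/24") -> List[str]:
    """Return the notifications the L3 agent would receive in each ordering."""
    ns = Namespace()
    notified: List[str] = []

    def on_event(cidr: str) -> None:
        if cidr == vip:
            notified.append("master")

    if keepalived_first:
        ns.add_address(vip)        # VRRP transition to MASTER happens first...
        ns.add_listener(on_event)  # ...then "ip monitor" starts: event missed
    else:
        ns.add_listener(on_event)  # monitor is already running
        ns.add_address(vip)        # transition is observed
    return notified


if __name__ == "__main__":
    print(run(keepalived_first=False))  # ['master']  (first router)
    print(run(keepalived_first=True))   # []          (second router)
```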

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/642295

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (master)

Change abandoned by Slawek Kaplonski (<email address hidden>) on branch: master
Review: https://review.openstack.org/641434

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.openstack.org/642295
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=8fec1ffc833eba9b3fc5f812bf881f44b4beba0c
Submitter: Zuul
Branch: master

commit 8fec1ffc833eba9b3fc5f812bf881f44b4beba0c
Author: Slawek Kaplonski <email address hidden>
Date: Sun Mar 10 22:45:15 2019 +0100

    Set initial ha router state in neutron-keepalived-state-change

    Sometimes in case of HA routers it may happend that
    keepalived will set status of router to MASTER before
    neutron-keepalived-state-change daemon will spawn "ip monitor"
    to monitor changes of IPs in router's namespace.

    In such case neutron-keepalived-state-change process will never
    notice that keepalived set router to be MASTER and L3 agent will
    not be notified about that so router will not be configured properly.

    To avoid such race condition neutron-keepalived-state-change will
    now check if VIP address is already configured on ha interface
    before it will spawn "ip monitor". If it is already configured
    by keepalived, it will notify L3 agent that router is set to
    MASTER.

    Change-Id: Ie3fe825d65408fc969c478767b411fe0156e9fbc
    Closes-Bug: #1818614
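The ordering change described in this commit message can be sketched as follows. The callables get_addresses, notify_agent and start_monitor are hypothetical placeholders, not neutron's actual functions; the point is only that the current state is checked before the monitor is started.

```python
"""Sketch of the "check initial state, then monitor" ordering.

get_addresses, notify_agent and start_monitor are hypothetical callables
standing in for neutron internals.
"""
from typing import Callable, Iterable


def start_with_initial_state(
    vip: str,
    get_addresses: Callable[[], Iterable[str]],
    notify_agent: Callable[[str], None],
    start_monitor: Callable[[], None],
) -> None:
    # 1. Inspect the current addresses first, so a MASTER transition that
    #    keepalived already made is reported instead of silently missed.
    if vip in set(get_addresses()):
        notify_agent("master")
    # 2. Only then start watching for future changes.
    start_monitor()


if __name__ == "__main__":
    calls = []
    start_with_initial_state(
        "169.254.0.1/24",
        get_addresses=lambda: ["169.254.0.1/24"],  # VIP already configured
        notify_agent=calls.append,
        start_monitor=lambda: calls.append("monitor started"),
    )
    print(calls)  # ['master', 'monitor started']
```

In this sketch a narrow window still remains between the check and the start of the monitor; it only illustrates the ordering, not a complete solution.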

Changed in neutron:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/rocky)

Fix proposed to branch: stable/rocky
Review: https://review.openstack.org/643459

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/queens)

Fix proposed to branch: stable/queens
Review: https://review.openstack.org/643460

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/pike)

Fix proposed to branch: stable/pike
Review: https://review.openstack.org/643461

tags: added: neutron-proactive-backport-potential
tags: added: neutron-easy-proactive-backport-potential
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/rocky)

Reviewed: https://review.openstack.org/643459
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=56c591996bdf3a2bda8855df4af5e4011779e5a2
Submitter: Zuul
Branch: stable/rocky

commit 56c591996bdf3a2bda8855df4af5e4011779e5a2
Author: Slawek Kaplonski <email address hidden>
Date: Sun Mar 10 22:45:15 2019 +0100

    Set initial ha router state in neutron-keepalived-state-change

    Sometimes in case of HA routers it may happend that
    keepalived will set status of router to MASTER before
    neutron-keepalived-state-change daemon will spawn "ip monitor"
    to monitor changes of IPs in router's namespace.

    In such case neutron-keepalived-state-change process will never
    notice that keepalived set router to be MASTER and L3 agent will
    not be notified about that so router will not be configured properly.

    To avoid such race condition neutron-keepalived-state-change will
    now check if VIP address is already configured on ha interface
    before it will spawn "ip monitor". If it is already configured
    by keepalived, it will notify L3 agent that router is set to
    MASTER.

    Change-Id: Ie3fe825d65408fc969c478767b411fe0156e9fbc
    Closes-Bug: #1818614
    (cherry picked from commit 8fec1ffc833eba9b3fc5f812bf881f44b4beba0c)

tags: added: in-stable-rocky
tags: added: in-stable-queens
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/queens)

Reviewed: https://review.openstack.org/643460
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=5bcca13f4a58ee5541ae81a45b89f783194f1279
Submitter: Zuul
Branch: stable/queens

commit 5bcca13f4a58ee5541ae81a45b89f783194f1279
Author: Slawek Kaplonski <email address hidden>
Date: Sun Mar 10 22:45:15 2019 +0100

    Set initial ha router state in neutron-keepalived-state-change

    Sometimes in case of HA routers it may happend that
    keepalived will set status of router to MASTER before
    neutron-keepalived-state-change daemon will spawn "ip monitor"
    to monitor changes of IPs in router's namespace.

    In such case neutron-keepalived-state-change process will never
    notice that keepalived set router to be MASTER and L3 agent will
    not be notified about that so router will not be configured properly.

    To avoid such race condition neutron-keepalived-state-change will
    now check if VIP address is already configured on ha interface
    before it will spawn "ip monitor". If it is already configured
    by keepalived, it will notify L3 agent that router is set to
    MASTER.

    Change-Id: Ie3fe825d65408fc969c478767b411fe0156e9fbc
    Closes-Bug: #1818614
    (cherry picked from commit 8fec1ffc833eba9b3fc5f812bf881f44b4beba0c)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/pike)

Reviewed: https://review.openstack.org/643461
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=91c26f56586e88a75e942cd06c8f4539acfb4963
Submitter: Zuul
Branch: stable/pike

commit 91c26f56586e88a75e942cd06c8f4539acfb4963
Author: Slawek Kaplonski <email address hidden>
Date: Sun Mar 10 22:45:15 2019 +0100

    Set initial ha router state in neutron-keepalived-state-change

    Sometimes in case of HA routers it may happend that
    keepalived will set status of router to MASTER before
    neutron-keepalived-state-change daemon will spawn "ip monitor"
    to monitor changes of IPs in router's namespace.

    In such case neutron-keepalived-state-change process will never
    notice that keepalived set router to be MASTER and L3 agent will
    not be notified about that so router will not be configured properly.

    To avoid such race condition neutron-keepalived-state-change will
    now check if VIP address is already configured on ha interface
    before it will spawn "ip monitor". If it is already configured
    by keepalived, it will notify L3 agent that router is set to
    MASTER.

    Change-Id: Ie3fe825d65408fc969c478767b411fe0156e9fbc
    Closes-Bug: #1818614
    (cherry picked from commit 8fec1ffc833eba9b3fc5f812bf881f44b4beba0c)

tags: added: in-stable-pike
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/ocata)

Fix proposed to branch: stable/ocata
Review: https://review.openstack.org/645278

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 14.0.0.0rc1

This issue was fixed in the openstack/neutron 14.0.0.0rc1 release candidate.

description: updated
description: updated
summary: - Various L3HA functional tests fails often
+ [SRU] Various L3HA functional tests fails often
Changed in neutron (Ubuntu Bionic):
importance: Undecided → High
status: New → Triaged
Changed in neutron (Ubuntu Cosmic):
importance: Undecided → High
status: New → Triaged
Changed in neutron (Ubuntu Disco):
status: New → Fix Released
importance: Undecided → High
Revision history for this message
Corey Bryant (corey.bryant) wrote :

New packages versions including this fix have been uploaded to the cosmic and bionic unapproved queues awaiting SRU team review. I've also uploaded to pike-staging.

Revision history for this message
Corey Bryant (corey.bryant) wrote : Please test proposed package

Hello Slawek, or anyone else affected,

Accepted neutron into pike-proposed. The package will build now and be available in the Ubuntu Cloud Archive in a few hours, and then in the -proposed repository.

Please help us by testing this new package. To enable the -proposed repository:

  sudo add-apt-repository cloud-archive:pike-proposed
  sudo apt-get update

Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-pike-needed to verification-pike-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-pike-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

tags: added: verification-pike-needed
description: updated
Revision history for this message
Brian Murray (brian-murray) wrote :

Hello Slawek, or anyone else affected,

Accepted neutron into cosmic-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/neutron/2:13.0.2-0ubuntu2 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-cosmic to verification-done-cosmic. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-cosmic. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in neutron (Ubuntu Cosmic):
status: Triaged → Fix Committed
tags: added: verification-needed verification-needed-cosmic
Revision history for this message
Brian Murray (brian-murray) wrote :

Hello Slawek, or anyone else affected,

Accepted neutron into bionic-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/neutron/2:12.0.5-0ubuntu2 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-bionic to verification-done-bionic. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-bionic. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Changed in neutron (Ubuntu Bionic):
status: Triaged → Fix Committed
tags: added: verification-needed-bionic
Revision history for this message
Corey Bryant (corey.bryant) wrote :

Hello Slawek, or anyone else affected,

Accepted neutron into rocky-proposed. The package will build now and be available in the Ubuntu Cloud Archive in a few hours, and then in the -proposed repository.

Please help us by testing this new package. To enable the -proposed repository:

  sudo add-apt-repository cloud-archive:rocky-proposed
  sudo apt-get update

Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-rocky-needed to verification-rocky-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-rocky-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

tags: added: verification-rocky-needed
Revision history for this message
Corey Bryant (corey.bryant) wrote :

Hello Slawek, or anyone else affected,

Accepted neutron into pike-proposed. The package will build now and be available in the Ubuntu Cloud Archive in a few hours, and then in the -proposed repository.

Please help us by testing this new package. To enable the -proposed repository:

  sudo add-apt-repository cloud-archive:pike-proposed
  sudo apt-get update

Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-pike-needed to verification-pike-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-pike-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (master)

Change abandoned by Slawek Kaplonski (<email address hidden>) on branch: master
Review: https://review.openstack.org/642065
Reason: it's addressed by different patches currently

Revision history for this message
Timo Aaltonen (tjaalton) wrote : Please test proposed package

Hello Slawek, or anyone else affected,

Accepted neutron into cosmic-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/neutron/2:13.0.2-0ubuntu3 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-cosmic to verification-done-cosmic. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-cosmic. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Revision history for this message
Timo Aaltonen (tjaalton) wrote :

Hello Slawek, or anyone else affected,

Accepted neutron into bionic-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/neutron/2:12.0.5-0ubuntu3 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-bionic to verification-done-bionic. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-bionic. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Revision history for this message
Corey Bryant (corey.bryant) wrote :

Hello Slawek, or anyone else affected,

Accepted neutron into rocky-proposed. The package will build now and be available in the Ubuntu Cloud Archive in a few hours, and then in the -proposed repository.

Please help us by testing this new package. To enable the -proposed repository:

  sudo add-apt-repository cloud-archive:rocky-proposed
  sudo apt-get update

Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-rocky-needed to verification-rocky-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-rocky-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Revision history for this message
Corey Bryant (corey.bryant) wrote :

Hello Slawek, or anyone else affected,

Accepted neutron into queens-proposed. The package will build now and be available in the Ubuntu Cloud Archive in a few hours, and then in the -proposed repository.

Please help us by testing this new package. To enable the -proposed repository:

  sudo add-apt-repository cloud-archive:queens-proposed
  sudo apt-get update

Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-queens-needed to verification-queens-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-queens-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

tags: added: verification-queens-needed
tags: added: sts-sru-needed
Revision history for this message
Edward Hope-Morley (hopem) wrote :

Hi @slawek, while verifying this SRU I seem to have hit a bug in your patch: https://pastebin.ubuntu.com/p/4h9bhtB7DF/

My test does the following:

    * on master VR host, kill neutron-keepalived-state-change for router
    * on master VR host, kill keepalived for router VR
    * check that master moved to other node - confirmed
    * wait for neutron-l3-agent to respawn neutron-keepalived-state-change etc
    * then I hit this bug

tags: added: verification-failed-cosmic
removed: verification-needed-cosmic
Revision history for this message
Edward Hope-Morley (hopem) wrote :

The code does catch the exception, but the result is that the local ha_confs/<router>/state file remains set to "master".

Revision history for this message
Edward Hope-Morley (hopem) wrote :

Patch submitted to fix this issue - https://review.openstack.org/#/c/649991/

Revision history for this message
Edward Hope-Morley (hopem) wrote :

Cause of this issue:

The problem is that the following is being passed to neutron-keepalived-state-change when spawned:

'--AGENT-root_helper_daemon=%s' % self.agent_conf.AGENT.root_helper_daemon

In my environment root_helper_daemon is neither configured nor running (which is the neutron default, FWIW).

So we need to avoid passing that argument when it is not set, so that neutron-keepalived-state-change won't try to use it. For example, right now ps shows:

... --AGENT-root_helper=sudo /usr/bin/neutron-rootwrap /etc/neutron/rootwrap.conf --AGENT-root_helper_daemon=None

The "None" is what is breaking it.
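The root cause is plain Python string formatting: '%s' % None produces the string "None". A minimal sketch of the buggy and a conditional argument list follows; build_args_buggy and build_args_fixed are hypothetical names, and the conditional approach is one way to avoid the problem, not necessarily the exact upstream change.

```python
"""Why "--AGENT-root_helper_daemon=None" appears, and one way to avoid it.

Function names are hypothetical; only the formatting behaviour matters.
"""
from typing import List, Optional


def build_args_buggy(root_helper: str, root_helper_daemon: Optional[str]) -> List[str]:
    return [
        "--AGENT-root_helper=%s" % root_helper,
        # '%s' % None renders the literal string "None"
        "--AGENT-root_helper_daemon=%s" % root_helper_daemon,
    ]


def build_args_fixed(root_helper: str, root_helper_daemon: Optional[str]) -> List[str]:
    args = ["--AGENT-root_helper=%s" % root_helper]
    if root_helper_daemon:  # only pass the option when it is actually set
        args.append("--AGENT-root_helper_daemon=%s" % root_helper_daemon)
    return args


if __name__ == "__main__":
    helper = "sudo /usr/bin/neutron-rootwrap /etc/neutron/rootwrap.conf"
    print(build_args_buggy(helper, None)[-1])  # --AGENT-root_helper_daemon=None
    print(build_args_fixed(helper, None))      # only the root_helper argument
```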

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (master)

Reviewed: https://review.openstack.org/649991
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=afbbec83a2578aac6aa0f16c205c5da3a788969b
Submitter: Zuul
Branch: master

commit afbbec83a2578aac6aa0f16c205c5da3a788969b
Author: Edward Hope-Morley <email address hidden>
Date: Thu Apr 4 14:22:54 2019 +0100

    Don't pass None arg to neutron-keepalived-state-change

    The original fix for bug 1818614 added two new cli args
    when spawning neutron-keepalived-state-change but if
    e.g. self.agent_conf.AGENT.root_helper_daemon is unset
    then "None" string is passed which breaks the
    neutron-keepalived-state-change daemon.

    Change-Id: I4afcdbbf2f3d2dafcad241ba3fc0778b52b8fc85
    Related-Bug: #1818614
    Related-Bug: #1823038

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (stable/stein)

Related fix proposed to branch: stable/stein
Review: https://review.openstack.org/650574

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (stable/rocky)

Related fix proposed to branch: stable/rocky
Review: https://review.openstack.org/650575

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (stable/queens)

Related fix proposed to branch: stable/queens
Review: https://review.openstack.org/650576

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (stable/pike)

Related fix proposed to branch: stable/pike
Review: https://review.openstack.org/650596

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (stable/queens)

Reviewed: https://review.openstack.org/650576
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=21387750a9f424ef3275c2076f2152749dc5bb87
Submitter: Zuul
Branch: stable/queens

commit 21387750a9f424ef3275c2076f2152749dc5bb87
Author: Edward Hope-Morley <email address hidden>
Date: Thu Apr 4 14:22:54 2019 +0100

    Don't pass None arg to neutron-keepalived-state-change

    The original fix for bug 1818614 added two new cli args
    when spawning neutron-keepalived-state-change but if
    e.g. self.agent_conf.AGENT.root_helper_daemon is unset
    then "None" string is passed which breaks the
    neutron-keepalived-state-change daemon.

    Change-Id: I4afcdbbf2f3d2dafcad241ba3fc0778b52b8fc85
    Related-Bug: #1818614
    Related-Bug: #1823038
    (cherry picked from commit afbbec83a2578aac6aa0f16c205c5da3a788969b)
    (cherry picked from commit a7df1c458c1fcfee03abfff7e2dd5994eca3f91e)
    (cherry picked from commit 279c99ab7d69e236afe7f6cbc91b7a0586a40edd)

tags: added: in-stable-ocata
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/ocata)

Reviewed: https://review.openstack.org/645278
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=e2d3a94018b771963dad83c36d4209cfb7f7a427
Submitter: Zuul
Branch: stable/ocata

commit e2d3a94018b771963dad83c36d4209cfb7f7a427
Author: Slawek Kaplonski <email address hidden>
Date: Sun Mar 10 22:45:15 2019 +0100

    Set initial ha router state in neutron-keepalived-state-change

    In the case of HA routers, keepalived may sometimes set the
    router status to MASTER before the neutron-keepalived-state-change
    daemon spawns "ip monitor" to watch for IP changes in the
    router's namespace.

    In that case the neutron-keepalived-state-change process never
    notices that keepalived set the router to MASTER, the L3 agent is
    never notified, and the router is not configured properly.

    To avoid this race condition, neutron-keepalived-state-change now
    checks whether the VIP address is already configured on the HA
    interface before it spawns "ip monitor". If keepalived has already
    configured it, the daemon notifies the L3 agent that the router is
    MASTER.

    Change-Id: Ie3fe825d65408fc969c478767b411fe0156e9fbc
    Closes-Bug: #1818614
    (cherry picked from commit 8fec1ffc833eba9b3fc5f812bf881f44b4beba0c)
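
A minimal shell sketch of the check described above, assuming the HA interface's addresses are available in `ip -o addr show` one-line format. It is not neutron's implementation (the daemon does this in Python); the function names and example CIDRs are hypothetical. `vip_already_configured` succeeds if the given VIP CIDR appears as an `inet`/`inet6` address, and `initial_state` prints `master` in that case, which corresponds to the point where the real daemon would notify the L3 agent before starting "ip monitor".

```shell
#!/bin/sh
# Hypothetical sketch, not neutron's implementation.
# Reads `ip -o addr show` style lines on stdin and exits 0 if the VIP
# CIDR given as $1 is already configured, 1 otherwise.
vip_already_configured() {
    awk -v cidr="$1" '
        {
            # Look for an "inet <cidr>" or "inet6 <cidr>" token pair.
            for (i = 1; i < NF; i++)
                if (($i == "inet" || $i == "inet6") && $(i + 1) == cidr)
                    found = 1
        }
        END { exit (found ? 0 : 1) }'
}

# Runs before "ip monitor" would be started: if keepalived has already
# added the VIP, report the router as master straight away instead of
# waiting for an address-change event that has already happened.
initial_state() {
    if vip_already_configured "$1"; then
        echo master
    fi
}
```

On a real node the input could come from something like `ip netns exec "qrouter-$ROUTER_UUID" ip -o addr show dev "$HA_IFACE"`, where `$HA_IFACE` and the VIP CIDR are placeholders you would have to look up for the router in question.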

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (stable/ocata)

Related fix proposed to branch: stable/ocata
Review: https://review.openstack.org/650970

OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (stable/stein)

Reviewed: https://review.openstack.org/650574
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=a7df1c458c1fcfee03abfff7e2dd5994eca3f91e
Submitter: Zuul
Branch: stable/stein

commit a7df1c458c1fcfee03abfff7e2dd5994eca3f91e
Author: Edward Hope-Morley <email address hidden>
Date: Thu Apr 4 14:22:54 2019 +0100

    Don't pass None arg to neutron-keepalived-state-change

    The original fix for bug 1818614 added two new CLI args
    when spawning neutron-keepalived-state-change, but if
    e.g. self.agent_conf.AGENT.root_helper_daemon is unset,
    then the string "None" is passed, which breaks the
    neutron-keepalived-state-change daemon.

    Change-Id: I4afcdbbf2f3d2dafcad241ba3fc0778b52b8fc85
    Related-Bug: #1818614
    Related-Bug: #1823038
    (cherry picked from commit afbbec83a2578aac6aa0f16c205c5da3a788969b)

tags: added: in-stable-stein
OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (stable/rocky)

Reviewed: https://review.openstack.org/650575
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=279c99ab7d69e236afe7f6cbc91b7a0586a40edd
Submitter: Zuul
Branch: stable/rocky

commit 279c99ab7d69e236afe7f6cbc91b7a0586a40edd
Author: Edward Hope-Morley <email address hidden>
Date: Thu Apr 4 14:22:54 2019 +0100

    Don't pass None arg to neutron-keepalived-state-change

    The original fix for bug 1818614 added two new CLI args
    when spawning neutron-keepalived-state-change, but if
    e.g. self.agent_conf.AGENT.root_helper_daemon is unset,
    then the string "None" is passed, which breaks the
    neutron-keepalived-state-change daemon.

    Change-Id: I4afcdbbf2f3d2dafcad241ba3fc0778b52b8fc85
    Related-Bug: #1818614
    Related-Bug: #1823038
    (cherry picked from commit afbbec83a2578aac6aa0f16c205c5da3a788969b)
    (cherry picked from commit a7df1c458c1fcfee03abfff7e2dd5994eca3f91e)

James Page (james-page) wrote :

@ubuntu-sru: as there was a regression in the updates currently in -proposed, I'll upload new versions with the cherry-picks that have landed in the upstream neutron stable/* branches.

I'll include the original SRU changelog entries using -v so it's clear this is all one changeset.

OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (stable/pike)

Reviewed: https://review.openstack.org/650596
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=36a1e193cb73f79ffdfafe91987839de4760b2de
Submitter: Zuul
Branch: stable/pike

commit 36a1e193cb73f79ffdfafe91987839de4760b2de
Author: Edward Hope-Morley <email address hidden>
Date: Thu Apr 4 14:22:54 2019 +0100

    Don't pass None arg to neutron-keepalived-state-change

    The original fix for bug 1818614 added two new CLI args
    when spawning neutron-keepalived-state-change, but if
    e.g. self.agent_conf.AGENT.root_helper_daemon is unset,
    then the string "None" is passed, which breaks the
    neutron-keepalived-state-change daemon.

    Change-Id: I4afcdbbf2f3d2dafcad241ba3fc0778b52b8fc85
    Related-Bug: #1818614
    Related-Bug: #1823038
    (cherry picked from commit afbbec83a2578aac6aa0f16c205c5da3a788969b)

OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (stable/ocata)

Reviewed: https://review.openstack.org/650970
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=7047f0c29303ac32f23c520a514c4b11520c8a69
Submitter: Zuul
Branch: stable/ocata

commit 7047f0c29303ac32f23c520a514c4b11520c8a69
Author: Edward Hope-Morley <email address hidden>
Date: Thu Apr 4 14:22:54 2019 +0100

    Don't pass None arg to neutron-keepalived-state-change

    The original fix for bug 1818614 added two new CLI args
    when spawning neutron-keepalived-state-change, but if
    e.g. self.agent_conf.AGENT.root_helper_daemon is unset,
    then the string "None" is passed, which breaks the
    neutron-keepalived-state-change daemon.

    Change-Id: I4afcdbbf2f3d2dafcad241ba3fc0778b52b8fc85
    Related-Bug: #1818614
    Related-Bug: #1823038
    (cherry picked from commit afbbec83a2578aac6aa0f16c205c5da3a788969b)
    (cherry picked from commit a7df1c458c1fcfee03abfff7e2dd5994eca3f91e)
    (cherry picked from commit 279c99ab7d69e236afe7f6cbc91b7a0586a40edd)
    (cherry picked from commit 21387750a9f424ef3275c2076f2152749dc5bb87)

Edward Hope-Morley (hopem) wrote :

This regressed SRU is being replaced (updated) by the SRU package uploaded in https://bugs.launchpad.net/neutron/+bug/1823038, i.e. the original -proposed package has been updated to also include the recently landed and backported patch that fixes the regression found in the original SRU.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 11.0.7

This issue was fixed in the openstack/neutron 11.0.7 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 13.0.3

This issue was fixed in the openstack/neutron 13.0.3 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 12.0.6

This issue was fixed in the openstack/neutron 12.0.6 release.

Łukasz Zemczak (sil2100) wrote : Please test proposed package

Hello Slawek, or anyone else affected,

Accepted neutron into cosmic-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/neutron/2:13.0.2-0ubuntu3.1 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will help us get this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-cosmic to verification-done-cosmic. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-cosmic. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

tags: added: verification-needed-cosmic
removed: verification-failed-cosmic
Edward Hope-Morley (hopem) wrote :

Verified cosmic-proposed using the [Test Case] from the bug description.

description: updated
tags: added: verification-done-cosmic
removed: verification-needed-cosmic
James Page (james-page) wrote :

Hello Slawek, or anyone else affected,

Accepted neutron into pike-proposed. The package will build now and be available in the Ubuntu Cloud Archive in a few hours, and then in the -proposed repository.

Please help us by testing this new package. To enable the -proposed repository:

  sudo add-apt-repository cloud-archive:pike-proposed
  sudo apt-get update

Your feedback will help us get this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-pike-needed to verification-pike-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-pike-failed. In either case, details of your testing will help us make a better decision.
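
When reporting the version of the package you tested, the installed version can be read from `apt-cache policy`. The small helper below is a hypothetical convenience, not part of any Ubuntu tooling; it assumes the usual `Installed:` line in `apt-cache policy` output, and `neutron-l3-agent` is used only as an example package.

```shell
#!/bin/sh
# Hypothetical helper: read `apt-cache policy <package>` output on stdin
# and print the value of the "Installed:" line (apt prints "(none)" when
# the package is not installed). Exits 1 if no such line is found.
installed_version() {
    awk '$1 == "Installed:" { print $2; found = 1; exit }
         END { if (!found) exit 1 }'
}
```

Usage on a test node: `apt-cache policy neutron-l3-agent | installed_version`.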

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Łukasz Zemczak (sil2100) wrote :

Hello Slawek, or anyone else affected,

Accepted neutron into bionic-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/neutron/2:12.0.5-0ubuntu4 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will help us get this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested and change the tag from verification-needed-bionic to verification-done-bionic. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed-bionic. In either case, without details of your testing we will not be able to proceed.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance for helping!

N.B. The updated package will be released to -updates after the bug(s) fixed by this package have been verified and the package has been in -proposed for a minimum of 7 days.

Chris Halse Rogers (raof) wrote : Update Released

The verification of the Stable Release Update for neutron has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package neutron - 2:13.0.2-0ubuntu3.1

---------------
neutron (2:13.0.2-0ubuntu3.1) cosmic; urgency=medium

  * d/p/bug1823038.patch: Cherry pick fix to ensure that None is not
    passed as an argument when spawning the neutron-keepalived-state-change
    agent (LP: #1823038).

neutron (2:13.0.2-0ubuntu3) cosmic; urgency=medium

  * d/p/fix-KeyError-in-OVS-firewall.patch: Cherry-picked from upstream
    to prevent neutron ovs agent from crashing due to creation of two
    security groups that both use the same remote security group, where
    the first group's port range is a subset of the second (LP: #1813007).
  * d/p/set-initial-ha-router-state-in-neutron-keepalived-st.patch:
    Cherry-picked from upstream stable/rocky branch to ensure proper
    detection of MASTER HA router by neutron-keepalived-state-change
    (LP: #1818614).

 -- James Page <email address hidden> Tue, 09 Apr 2019 11:37:29 +0100

Changed in neutron (Ubuntu Cosmic):
status: Fix Committed → Fix Released
Edward Hope-Morley (hopem) wrote :

Bionic Rocky verified using [Test Case]

tags: added: verification-rocky-done
removed: verification-rocky-needed
Edward Hope-Morley (hopem) wrote :

Bionic Queens verified using [Test Case]

tags: added: verification-done-bionic
removed: verification-needed-bionic
Edward Hope-Morley (hopem) wrote :

Xenial Queens verified using [Test Case]

tags: added: verification-queens-done
removed: verification-queens-needed
Edward Hope-Morley (hopem) wrote :

Xenial Pike verified using [Test Case]

tags: added: verification-pike-done
removed: verification-pike-needed
tags: added: verification-done
removed: verification-needed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package neutron - 2:12.0.5-0ubuntu4

---------------
neutron (2:12.0.5-0ubuntu4) bionic; urgency=medium

  * d/p/bug1823038.patch: Cherry pick fix to ensure that None is not
    passed as an argument when spawning the neutron-keepalived-state-change
    agent (LP: #1823038).

neutron (2:12.0.5-0ubuntu3) bionic; urgency=medium

  * d/p/fix-KeyError-in-OVS-firewall.patch: Cherry-picked from upstream
    to prevent neutron ovs agent from crashing due to creation of two
    security groups that both use the same remote security group, where
    the first group's port range is a subset of the second (LP: #1813007).
  * d/p/set-initial-ha-router-state-in-neutron-keepalived-st.patch:
    Cherry-picked from upstream stable/rocky branch to ensure proper
    detection of MASTER HA router by neutron-keepalived-state-change
    (LP: #1818614).

 -- James Page <email address hidden> Tue, 09 Apr 2019 10:59:22 +0100

Changed in neutron (Ubuntu Bionic):
status: Fix Committed → Fix Released
Corey Bryant (corey.bryant) wrote :

This is already Fix Released for Rocky; the fix is included in neutron version 2:13.0.2-0ubuntu3.1~cloud0.

Corey Bryant (corey.bryant) wrote :

The verification of the Stable Release Update for neutron has completed successfully and the package has now been released to -updates. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Corey Bryant (corey.bryant) wrote :

This bug was fixed in the package neutron - 2:11.0.6-0ubuntu1~cloud2.1

---------------
neutron (2:11.0.6-0ubuntu1~cloud2.1) xenial; urgency=medium

  * d/p/bug1823038.patch: Cherry pick fix to ensure that None is not
    passed as an argument when spawning the neutron-keepalived-state-change
    agent (LP: #1823038).

neutron (2:11.0.6-0ubuntu1~cloud2) xenial-pike; urgency=medium

  * d/p/fix-KeyError-in-OVS-firewall.patch: Cherry-picked from upstream
    to prevent neutron ovs agent from crashing due to creation of two
    security groups that both use the same remote security group, where
    the first group's port range is a subset of the second (LP: #1813007).
  * d/p/set-initial-ha-router-state-in-neutron-keepalived-st.patch:
    Cherry-picked from upstream stable/rocky branch to ensure proper
    detection of MASTER HA router by neutron-keepalived-state-change
    (LP: #1818614).

Corey Bryant (corey.bryant) wrote :

Changed to Fix Released for Queens, as this fix is included in neutron Queens version 2:12.0.5-0ubuntu4~cloud0.

tags: removed: neutron-easy-proactive-backport-potential neutron-proactive-backport-potential
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron ocata-eol

This issue was fixed in the openstack/neutron ocata-eol release.
