test_dvr_router_lifecycle_ha_with_snat_with_fips fails occasionally in the gate

Bug #1998337 reported by Bence Romsics
This bug affects 1 person

Affects   Status   Importance   Assigned to   Milestone
neutron   New      Medium       Unassigned

Bug Description

Opening this report to track the following test that fails occasionally in the gate:

job neutron-functional-with-uwsgi
test neutron.tests.functional.agent.l3.extensions.qos.test_fip_qos_extension.TestL3AgentFipQosExtensionDVR.test_dvr_router_lifecycle_ha_with_snat_with_fips

Sample traceback:

ft1.31: neutron.tests.functional.agent.l3.extensions.qos.test_fip_qos_extension.TestL3AgentFipQosExtensionDVR.test_dvr_router_lifecycle_ha_with_snat_with_fips
testtools.testresult.real._StringException: Traceback (most recent call last):
  File "/home/zuul/src/opendev.org/openstack/neutron/neutron/tests/base.py", line 182, in func
    return f(self, *args, **kwargs)
  File "/home/zuul/src/opendev.org/openstack/neutron/neutron/tests/base.py", line 182, in func
    return f(self, *args, **kwargs)
  File "/home/zuul/src/opendev.org/openstack/neutron/neutron/tests/functional/agent/l3/test_dvr_router.py", line 208, in test_dvr_router_lifecycle_ha_with_snat_with_fips
    self._dvr_router_lifecycle(enable_ha=True, enable_snat=True)
  File "/home/zuul/src/opendev.org/openstack/neutron/neutron/tests/functional/agent/l3/test_dvr_router.py", line 626, in _dvr_router_lifecycle
    self._assert_dvr_floating_ips(router, snat_bound_fip=snat_bound_fip,
  File "/home/zuul/src/opendev.org/openstack/neutron/neutron/tests/functional/agent/l3/test_dvr_router.py", line 791, in _assert_dvr_floating_ips
    self.assertTrue(fg_port_created_successfully)
  File "/usr/lib/python3.10/unittest/case.py", line 687, in assertTrue
    raise self.failureException(msg)
AssertionError: False is not true
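
The final line of the traceback carries no detail because `assertTrue` was called without a message. The following is a minimal sketch of that behaviour using only the standard `unittest` module. It is not neutron code: the `Demo` class, the `failure_text` helper, and the message text are hypothetical and exist only for illustration.

```python
import unittest


class Demo(unittest.TestCase):
    def runTest(self):  # lets the case be instantiated directly
        pass


def failure_text(check):
    """Run check(case) and return the AssertionError text, or None if it passes."""
    case = Demo()
    try:
        check(case)
    except AssertionError as exc:
        return str(exc)
    return None


# Without a message, only the generic text seen in the traceback is produced.
bare = failure_text(lambda c: c.assertTrue(False))

# With a (hypothetical) message, unittest appends it to the generic text.
described = failure_text(
    lambda c: c.assertTrue(False, "fg port not found in namespace"))

print(bare)
print(described)
```

With the default `longMessage` setting, the second call reports both the generic text and the supplied message, which can make a gate failure easier to triage from the log alone.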

It seems to recur occasionally, for example:

https://675daf3418638bf15806-f7e1f8eddcfdd9404f4b72ab9bb1f324.ssl.cf1.rackcdn.com/865575/1/check/neutron-functional-with-uwsgi/bd983b3/testr_results.html
https://488eb2b76bde124417ee-80e67ec01f194d5b25d665df26ee3378.ssl.cf2.rackcdn.com/839066/18/check/neutron-functional-with-uwsgi/66c7fcc/testr_results.html

There may be more similar failures:

$ logsearch log --project openstack/neutron --result FAILURE --pipeline check --job neutron-functional-with-uwsgi --limit 30 'line 208, in test_dvr_router_lifecycle_ha_with_snat_with_fips'
Builds with matching logs 5/30:
+----------------------------------+---------------------+-----------------------------------+--------+
| uuid | finished | review | branch |
+----------------------------------+---------------------+-----------------------------------+--------+
| 1d265722d23548d6930486699202347d | 2022-11-30T13:42:28 | https://review.opendev.org/863881 | master |
| cb2a2d7161764d5f823a09528eedc44c | 2022-11-28T16:47:20 | https://review.opendev.org/865018 | master |
| 66c7fcc56a5347648732bfcb90341ef5 | 2022-11-27T00:55:10 | https://review.opendev.org/839066 | master |
| 85b3b709e9d54718a4f0847da5b4b2df | 2022-11-25T10:00:01 | https://review.opendev.org/865018 | master |
| bd983b367ac441c190e38dcf1fadc87f | 2022-11-24T16:17:06 | https://review.opendev.org/865575 | master |
+----------------------------------+---------------------+-----------------------------------+--------+
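
To see how the five matching builds are spread across reviews, the table rows can be tallied with a few lines of Python. This is only a sketch: the rows are copied from the output above, and the `failures_per_review` helper is a hypothetical name, not part of logsearch.

```python
from collections import Counter

# Rows copied from the logsearch output above.
TABLE = """\
| 1d265722d23548d6930486699202347d | 2022-11-30T13:42:28 | https://review.opendev.org/863881 | master |
| cb2a2d7161764d5f823a09528eedc44c | 2022-11-28T16:47:20 | https://review.opendev.org/865018 | master |
| 66c7fcc56a5347648732bfcb90341ef5 | 2022-11-27T00:55:10 | https://review.opendev.org/839066 | master |
| 85b3b709e9d54718a4f0847da5b4b2df | 2022-11-25T10:00:01 | https://review.opendev.org/865018 | master |
| bd983b367ac441c190e38dcf1fadc87f | 2022-11-24T16:17:06 | https://review.opendev.org/865575 | master |
"""


def failures_per_review(table: str) -> Counter:
    """Count matching builds per review URL in a pipe-delimited table."""
    counts = Counter()
    for line in table.splitlines():
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        # Data rows have four cells and a review URL in the third one.
        if len(cells) == 4 and cells[2].startswith("https://"):
            counts[cells[2]] += 1
    return counts


counts = failures_per_review(TABLE)
for review, hits in counts.most_common():
    print(hits, review)
```

The five failures come from four distinct reviews, with review 865018 hit twice, which suggests the failure is not tied to a single change.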

Lajos Katona (lajos-katona) wrote :

Very similar to this one: https://bugs.launchpad.net/neutron/+bug/1995031 (possibly a duplicate, but I have no time for a deeper analysis)

tags: added: functional-tests
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/neutron/+/875767

OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (master)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/875767
Committed: https://opendev.org/openstack/neutron/commit/2af5fd889b3286dcec21e2bc89f287a0e4129d0f
Submitter: "Zuul (22348)"
Branch: master

commit 2af5fd889b3286dcec21e2bc89f287a0e4129d0f
Author: Slawek Kaplonski <email address hidden>
Date: Tue Feb 28 18:27:29 2023 +0100

    Add sleep before checking if ovs port is in the namespace

    When a network device that is an OVS internal port is moved to a
    namespace, it may sometimes exhibit "shy port syndrome" [1].
    Even though the set_netns method waits for the device to be in the
    namespace, the device may be present during that check but disappear
    for a short time afterwards, which causes failures, e.g. in functional
    tests like the one described in [2].
    To avoid that, this patch proposes a simple (and ugly) sleep of
    1 second before checking whether the port really exists in the
    namespace. If the port is "shy", it should already have flapped during
    that 1 second.

    [1] https://bugs.launchpad.net/neutron/+bug/1618987
    [2] https://bugs.launchpad.net/neutron/+bug/1961740

    Related-Bug: #1961740
    Related-Bug: #1998337
    Change-Id: I442587e7ef55917f4ea873e190bf8afbc0e911e1
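
The idea in the commit message can be illustrated with a small simulation. This is a sketch under stated assumptions, not the code from the neutron change: `exists_after_settling` and `ShyPort` are hypothetical names, the flap window (gone between 0.2 s and 0.6 s) is invented, and a fake clock replaces real sleeping so the example runs instantly.

```python
import time
from typing import Callable


def exists_after_settling(
    device_exists: Callable[[], bool],
    settle_seconds: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Sleep first, then check once whether the device is visible."""
    sleep(settle_seconds)
    return device_exists()


class ShyPort:
    """Hypothetical fake: visible, briefly gone, then visible again."""

    def __init__(self, gone_from: float, gone_until: float) -> None:
        self.now = 0.0  # simulated clock, in seconds
        self.gone_from = gone_from
        self.gone_until = gone_until

    def sleep(self, seconds: float) -> None:
        self.now += seconds  # advance the fake clock instead of blocking

    def exists(self) -> bool:
        return not (self.gone_from <= self.now < self.gone_until)


# An immediate check passes, but the port vanishes shortly afterwards.
port = ShyPort(gone_from=0.2, gone_until=0.6)
seen_immediately = port.exists()   # checked at t=0.0, before the flap
port.sleep(0.3)
seen_during_flap = port.exists()   # checked at t=0.3, mid-flap

# Sleeping for 1 second first moves the single check past the flap window.
settled_port = ShyPort(gone_from=0.2, gone_until=0.6)
seen_after_settling = exists_after_settling(
    settled_port.exists, settle_seconds=1.0, sleep=settled_port.sleep)

print(seen_immediately, seen_during_flap, seen_after_settling)
```

The sketch also shows the limit of the approach: a fixed sleep only helps if the flap finishes within the sleep window, which is why the commit message calls it simple and ugly.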

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (stable/2023.1)

Related fix proposed to branch: stable/2023.1
Review: https://review.opendev.org/c/openstack/neutron/+/904829

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (stable/zed)

Related fix proposed to branch: stable/zed
Review: https://review.opendev.org/c/openstack/neutron/+/904830

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (stable/yoga)

Related fix proposed to branch: stable/yoga
Review: https://review.opendev.org/c/openstack/neutron/+/904831

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (stable/xena)

Related fix proposed to branch: stable/xena
Review: https://review.opendev.org/c/openstack/neutron/+/904832

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (stable/wallaby)

Related fix proposed to branch: stable/wallaby
Review: https://review.opendev.org/c/openstack/neutron/+/904833

OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (stable/zed)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/904830
Committed: https://opendev.org/openstack/neutron/commit/222c997022392561c2de2cb493f0f5214eb20dfc
Submitter: "Zuul (22348)"
Branch: stable/zed

commit 222c997022392561c2de2cb493f0f5214eb20dfc
Author: Slawek Kaplonski <email address hidden>
Date: Tue Feb 28 18:27:29 2023 +0100

    Add sleep before checking if ovs port is in the namespace

    When a network device that is an OVS internal port is moved to a
    namespace, it may sometimes exhibit "shy port syndrome" [1].
    Even though the set_netns method waits for the device to be in the
    namespace, the device may be present during that check but disappear
    for a short time afterwards, which causes failures, e.g. in functional
    tests like the one described in [2].
    To avoid that, this patch proposes a simple (and ugly) sleep of
    1 second before checking whether the port really exists in the
    namespace. If the port is "shy", it should already have flapped during
    that 1 second.

    [1] https://bugs.launchpad.net/neutron/+bug/1618987
    [2] https://bugs.launchpad.net/neutron/+bug/1961740

    Related-Bug: #1961740
    Related-Bug: #1998337
    Change-Id: I442587e7ef55917f4ea873e190bf8afbc0e911e1
    (cherry picked from commit 2af5fd889b3286dcec21e2bc89f287a0e4129d0f)

tags: added: in-stable-zed
tags: added: in-stable-yoga
OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (stable/yoga)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/904831
Committed: https://opendev.org/openstack/neutron/commit/f4e0b023621a922260b35b37f9940826327efc6e
Submitter: "Zuul (22348)"
Branch: stable/yoga

commit f4e0b023621a922260b35b37f9940826327efc6e
Author: Slawek Kaplonski <email address hidden>
Date: Tue Feb 28 18:27:29 2023 +0100

    Add sleep before checking if ovs port is in the namespace

    When a network device that is an OVS internal port is moved to a
    namespace, it may sometimes exhibit "shy port syndrome" [1].
    Even though the set_netns method waits for the device to be in the
    namespace, the device may be present during that check but disappear
    for a short time afterwards, which causes failures, e.g. in functional
    tests like the one described in [2].
    To avoid that, this patch proposes a simple (and ugly) sleep of
    1 second before checking whether the port really exists in the
    namespace. If the port is "shy", it should already have flapped during
    that 1 second.

    [1] https://bugs.launchpad.net/neutron/+bug/1618987
    [2] https://bugs.launchpad.net/neutron/+bug/1961740

    Related-Bug: #1961740
    Related-Bug: #1998337
    Change-Id: I442587e7ef55917f4ea873e190bf8afbc0e911e1
    (cherry picked from commit 2af5fd889b3286dcec21e2bc89f287a0e4129d0f)

OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (stable/2023.1)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/904829
Committed: https://opendev.org/openstack/neutron/commit/c5e70ad716604a7a379fa51ff16500cd0ad0094c
Submitter: "Zuul (22348)"
Branch: stable/2023.1

commit c5e70ad716604a7a379fa51ff16500cd0ad0094c
Author: Slawek Kaplonski <email address hidden>
Date: Tue Feb 28 18:27:29 2023 +0100

    Add sleep before checking if ovs port is in the namespace

    When a network device that is an OVS internal port is moved to a
    namespace, it may sometimes exhibit "shy port syndrome" [1].
    Even though the set_netns method waits for the device to be in the
    namespace, the device may be present during that check but disappear
    for a short time afterwards, which causes failures, e.g. in functional
    tests like the one described in [2].
    To avoid that, this patch proposes a simple (and ugly) sleep of
    1 second before checking whether the port really exists in the
    namespace. If the port is "shy", it should already have flapped during
    that 1 second.

    [1] https://bugs.launchpad.net/neutron/+bug/1618987
    [2] https://bugs.launchpad.net/neutron/+bug/1961740

    Related-Bug: #1961740
    Related-Bug: #1998337
    Change-Id: I442587e7ef55917f4ea873e190bf8afbc0e911e1
    (cherry picked from commit 2af5fd889b3286dcec21e2bc89f287a0e4129d0f)

OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (stable/wallaby)

Change abandoned by "Rodolfo Alonso <email address hidden>" on branch: stable/wallaby
Review: https://review.opendev.org/c/openstack/neutron/+/904833

OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (stable/xena)

Change abandoned by "Rodolfo Alonso <email address hidden>" on branch: stable/xena
Review: https://review.opendev.org/c/openstack/neutron/+/904832
