Timeout while waiting for router HA state transition

Bug #1939507 reported by Slawek Kaplonski
This bug affects 1 person
Affects: neutron
Status: Fix Released
Importance: High
Assigned to: Slawek Kaplonski
Milestone: (none listed)

Bug Description

This happens in functional tests, e.g. in neutron.tests.functional.agent.l3.test_ha_router.L3HATestCase.test_ipv6_router_advts_and_fwd_after_router_state_change_backup:

https://a1fab4006c6a1daf82f2-bd8cbc347d913753596edf9ef5797d55.ssl.cf1.rackcdn.com/786478/17/check/neutron-functional-with-uwsgi/7250dcf/testr_results.html

The error looks like this:

ft1.10: neutron.tests.functional.agent.l3.test_ha_router.L3HATestCase.test_ipv6_router_advts_and_fwd_after_router_state_change_backup
testtools.testresult.real._StringException: Traceback (most recent call last):
  File "/home/zuul/src/opendev.org/openstack/neutron/neutron/common/utils.py", line 702, in wait_until_true
    eventlet.sleep(sleep)
  File "/home/zuul/src/opendev.org/openstack/neutron/.tox/dsvm-functional/lib/python3.8/site-packages/eventlet/greenthread.py", line 36, in sleep
    hub.switch()
  File "/home/zuul/src/opendev.org/openstack/neutron/.tox/dsvm-functional/lib/python3.8/site-packages/eventlet/hubs/hub.py", line 313, in switch
    return self.greenlet.switch()
eventlet.timeout.Timeout: 60 seconds

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/zuul/src/opendev.org/openstack/neutron/neutron/tests/base.py", line 183, in func
    return f(self, *args, **kwargs)
  File "/home/zuul/src/opendev.org/openstack/neutron/neutron/tests/functional/agent/l3/test_ha_router.py", line 148, in test_ipv6_router_advts_and_fwd_after_router_state_change_backup
    self._test_ipv6_router_advts_and_fwd_helper('backup',
  File "/home/zuul/src/opendev.org/openstack/neutron/neutron/tests/functional/agent/l3/test_ha_router.py", line 118, in _test_ipv6_router_advts_and_fwd_helper
    common_utils.wait_until_true(lambda: router.ha_state == 'backup')
  File "/home/zuul/src/opendev.org/openstack/neutron/neutron/common/utils.py", line 707, in wait_until_true
    raise WaitTimeout(_("Timed out after %d seconds") % timeout)
neutron.common.utils.WaitTimeout: Timed out after 60 seconds
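
For readers unfamiliar with the helper in the traceback, here is a minimal sketch of a polling loop that behaves like the wait_until_true call shown above. It is not neutron's implementation: it uses time.sleep instead of eventlet, and the WaitTimeout class, FakeRouter and demo() are hypothetical stand-ins added only for illustration. It shows why the test ends with "Timed out after N seconds" when the router never reports the expected state.

```python
import time


class WaitTimeout(Exception):
    """Hypothetical stand-in for the WaitTimeout seen in the traceback."""


def wait_until_true(predicate, timeout=60, sleep=1):
    """Poll predicate until it is truthy, or raise once timeout expires."""
    deadline = time.monotonic() + timeout
    while not predicate():
        if time.monotonic() >= deadline:
            raise WaitTimeout("Timed out after %d seconds" % timeout)
        time.sleep(sleep)


class FakeRouter:
    """Hypothetical router whose HA state never leaves 'primary'."""
    ha_state = 'primary'


def demo():
    router = FakeRouter()
    try:
        # Same predicate shape as in the failing test, with a short timeout.
        wait_until_true(lambda: router.ha_state == 'backup',
                        timeout=1, sleep=0.05)
    except WaitTimeout as exc:
        return str(exc)
    return "state reached"
```

Calling demo() returns the timeout message because the fake router's state never changes, which mirrors the symptom reported here.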

Changed in neutron:
assignee: nobody → Slawek Kaplonski (slaweq)
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/neutron/+/804397

OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (master)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/804397
Committed: https://opendev.org/openstack/neutron/commit/82fd968011b481f11cfec6a1ec767e095d3f41db
Submitter: "Zuul (22348)"
Branch: master

commit 82fd968011b481f11cfec6a1ec767e095d3f41db
Author: Slawek Kaplonski <email address hidden>
Date: Thu Aug 12 16:48:57 2021 +0200

    [L3HA] Add extra logs to the process of ha state changes

    Some extra debug logs may be useful to understand exactly what happens
    during ha states transitions and e.g. to understand failures like
    described in the related bug.

    Related-bug: #1939507
    Change-Id: Id708b2c7a602df8d4ba1b32e58d4b152b5c58ba6

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/neutron/+/808071

Changed in neutron:
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/808071
Committed: https://opendev.org/openstack/neutron/commit/b8ef8e722af761dd394064ab70e159aa05639e56
Submitter: "Zuul (22348)"
Branch: master

commit b8ef8e722af761dd394064ab70e159aa05639e56
Author: Slawek Kaplonski <email address hidden>
Date: Thu Sep 9 15:10:40 2021 +0200

    [Functional] Wait for the initial state of ha router before test

    In functional tests of the HA and DVR HA routers, when e.g.
    failover is tested, we should always wait for routers to be in the
    expected initial state (primary or backup) before router failover
    will actually be done.
    Without that, we may hit race condition when initial router's state
    is enqueued but not processed yet and then state will be changed thus
    no any action will be performed by L3 agent and test may fail.

    Closes-Bug: #1939507
    Change-Id: Ibd8f78fc822b04965c6a79b57b13be364934f64f
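
The ordering described in the commit message above can be sketched as follows. This is a toy model under stated assumptions, not the actual test code or the L3 agent: FakeHaRouter, set_state_later and run_failover_test are hypothetical names, and a timer thread stands in for the agent processing a state change asynchronously. It only illustrates the pattern of waiting for the initial state before triggering failover.

```python
import threading
import time


def wait_until_true(predicate, timeout=5, sleep=0.01):
    """Simplified polling helper (a stand-in, not neutron's version)."""
    deadline = time.monotonic() + timeout
    while not predicate():
        if time.monotonic() >= deadline:
            raise RuntimeError("Timed out after %d seconds" % timeout)
        time.sleep(sleep)


class FakeHaRouter:
    """Hypothetical router whose state changes are applied asynchronously."""

    def __init__(self):
        self.ha_state = None

    def set_state_later(self, state, delay):
        # The timer thread plays the role of the agent processing the event.
        timer = threading.Timer(delay, setattr,
                                args=(self, 'ha_state', state))
        timer.daemon = True
        timer.start()


def run_failover_test():
    router = FakeHaRouter()
    observed = []

    # The initial state is reported, but only processed a little later.
    router.set_state_later('primary', delay=0.05)

    # Step 1: wait until the initial state has actually been processed.
    wait_until_true(lambda: router.ha_state == 'primary')
    observed.append(router.ha_state)

    # Step 2: only now trigger the failover.
    router.set_state_later('backup', delay=0.05)

    # Step 3: wait for the state expected after failover.
    wait_until_true(lambda: router.ha_state == 'backup')
    observed.append(router.ha_state)
    return observed
```

In this sketch run_failover_test() observes 'primary' and then 'backup' in that order; skipping step 1 would let the failover start while the initial state is still pending, which is the situation the commit message describes.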

Changed in neutron:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/wallaby)

Fix proposed to branch: stable/wallaby
Review: https://review.opendev.org/c/openstack/neutron/+/808497

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/victoria)

Fix proposed to branch: stable/victoria
Review: https://review.opendev.org/c/openstack/neutron/+/808498

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/ussuri)

Fix proposed to branch: stable/ussuri
Review: https://review.opendev.org/c/openstack/neutron/+/808499

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/train)

Fix proposed to branch: stable/train
Review: https://review.opendev.org/c/openstack/neutron/+/808500

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/stein)

Fix proposed to branch: stable/stein
Review: https://review.opendev.org/c/openstack/neutron/+/808501

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/rocky)

Fix proposed to branch: stable/rocky
Review: https://review.opendev.org/c/openstack/neutron/+/808502

OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/queens)

Fix proposed to branch: stable/queens
Review: https://review.opendev.org/c/openstack/neutron/+/808503

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 19.0.0.0rc1

This issue was fixed in the openstack/neutron 19.0.0.0rc1 release candidate.

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/wallaby)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/808497
Committed: https://opendev.org/openstack/neutron/commit/39a2e5e4f2659f83392b15285c35febb56809888
Submitter: "Zuul (22348)"
Branch: stable/wallaby

commit 39a2e5e4f2659f83392b15285c35febb56809888
Author: Slawek Kaplonski <email address hidden>
Date: Thu Sep 9 15:10:40 2021 +0200

    [Functional] Wait for the initial state of ha router before test

    In functional tests of the HA and DVR HA routers, when e.g.
    failover is tested, we should always wait for routers to be in the
    expected initial state (primary or backup) before router failover
    will actually be done.
    Without that, we may hit race condition when initial router's state
    is enqueued but not processed yet and then state will be changed thus
    no any action will be performed by L3 agent and test may fail.

    Closes-Bug: #1939507
    Change-Id: Ibd8f78fc822b04965c6a79b57b13be364934f64f
    (cherry picked from commit b8ef8e722af761dd394064ab70e159aa05639e56)

tags: added: in-stable-wallaby
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/victoria)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/808498
Committed: https://opendev.org/openstack/neutron/commit/c8692833612ca8cf7b8b93b0c58f39711be3594a
Submitter: "Zuul (22348)"
Branch: stable/victoria

commit c8692833612ca8cf7b8b93b0c58f39711be3594a
Author: Slawek Kaplonski <email address hidden>
Date: Thu Sep 9 15:10:40 2021 +0200

    [Functional] Wait for the initial state of ha router before test

    In functional tests of the HA and DVR HA routers, when e.g.
    failover is tested, we should always wait for routers to be in the
    expected initial state (primary or backup) before router failover
    will actually be done.
    Without that, we may hit race condition when initial router's state
    is enqueued but not processed yet and then state will be changed thus
    no any action will be performed by L3 agent and test may fail.

    Closes-Bug: #1939507
    Change-Id: Ibd8f78fc822b04965c6a79b57b13be364934f64f
    (cherry picked from commit b8ef8e722af761dd394064ab70e159aa05639e56)

tags: added: in-stable-victoria
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/ussuri)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/808499
Committed: https://opendev.org/openstack/neutron/commit/ebdf7c9f65594cefc558adad860d36b88e4a9a69
Submitter: "Zuul (22348)"
Branch: stable/ussuri

commit ebdf7c9f65594cefc558adad860d36b88e4a9a69
Author: Slawek Kaplonski <email address hidden>
Date: Thu Sep 9 15:10:40 2021 +0200

    [Functional] Wait for the initial state of ha router before test

    In functional tests of the HA and DVR HA routers, when e.g.
    failover is tested, we should always wait for routers to be in the
    expected initial state (primary or backup) before router failover
    will actually be done.
    Without that, we may hit race condition when initial router's state
    is enqueued but not processed yet and then state will be changed thus
    no any action will be performed by L3 agent and test may fail.

    Additionally in that patch there is "master" instead of "primary" used
    for router state.

    Conflicts:
        neutron/tests/functional/agent/l3/test_dvr_router.py

    Closes-Bug: #1939507
    Change-Id: Ibd8f78fc822b04965c6a79b57b13be364934f64f
    (cherry picked from commit b8ef8e722af761dd394064ab70e159aa05639e56)

tags: added: in-stable-ussuri
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/train)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/808500
Committed: https://opendev.org/openstack/neutron/commit/3ce22c7e4753e2f49157a89b86d86e07c2e4ab84
Submitter: "Zuul (22348)"
Branch: stable/train

commit 3ce22c7e4753e2f49157a89b86d86e07c2e4ab84
Author: Slawek Kaplonski <email address hidden>
Date: Thu Sep 9 15:10:40 2021 +0200

    [Functional] Wait for the initial state of ha router before test

    In functional tests of the HA and DVR HA routers, when e.g.
    failover is tested, we should always wait for routers to be in the
    expected initial state (primary or backup) before router failover
    will actually be done.
    Without that, we may hit race condition when initial router's state
    is enqueued but not processed yet and then state will be changed thus
    no any action will be performed by L3 agent and test may fail.

    Additionally in that patch there is "master" instead of "primary" used
    for router state.

    Conflicts:
        neutron/tests/functional/agent/l3/test_dvr_router.py

    Closes-Bug: #1939507
    Change-Id: Ibd8f78fc822b04965c6a79b57b13be364934f64f
    (cherry picked from commit b8ef8e722af761dd394064ab70e159aa05639e56)

tags: added: in-stable-train
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 16.4.2

This issue was fixed in the openstack/neutron 16.4.2 release.

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/stein)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/808501
Committed: https://opendev.org/openstack/neutron/commit/9243c7f3668d50d13c3db000d542a774b49a9bd4
Submitter: "Zuul (22348)"
Branch: stable/stein

commit 9243c7f3668d50d13c3db000d542a774b49a9bd4
Author: Slawek Kaplonski <email address hidden>
Date: Thu Sep 9 15:10:40 2021 +0200

    [Functional] Wait for the initial state of ha router before test

    In functional tests of the HA and DVR HA routers, when e.g.
    failover is tested, we should always wait for routers to be in the
    expected initial state (primary or backup) before router failover
    will actually be done.
    Without that, we may hit race condition when initial router's state
    is enqueued but not processed yet and then state will be changed thus
    no any action will be performed by L3 agent and test may fail.

    Additionally in that patch there is "master" instead of "primary" used
    for router state.

    Conflicts:
        neutron/tests/functional/agent/l3/test_dvr_router.py

    Closes-Bug: #1939507
    Change-Id: Ibd8f78fc822b04965c6a79b57b13be364934f64f
    (cherry picked from commit b8ef8e722af761dd394064ab70e159aa05639e56)

tags: added: in-stable-stein
tags: added: neutron-proactive-backport-potential
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/queens)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/808503
Committed: https://opendev.org/openstack/neutron/commit/fdaef0efaaa29c79af0e0307caae7fdc6d7e9a66
Submitter: "Zuul (22348)"
Branch: stable/queens

commit fdaef0efaaa29c79af0e0307caae7fdc6d7e9a66
Author: Slawek Kaplonski <email address hidden>
Date: Thu Sep 9 15:10:40 2021 +0200

    [Functional] Wait for the initial state of ha router before test

    In functional tests of the HA and DVR HA routers, when e.g.
    failover is tested, we should always wait for routers to be in the
    expected initial state (primary or backup) before router failover
    will actually be done.
    Without that, we may hit race condition when initial router's state
    is enqueued but not processed yet and then state will be changed thus
    no any action will be performed by L3 agent and test may fail.

    Additionally in that patch there is "master" instead of "primary" used
    for router state.

    Conflicts:
        neutron/tests/functional/agent/l3/test_dvr_router.py

    Depends-On: https://review.opendev.org/c/openstack/neutron-tempest-plugin/+/817933

    Closes-Bug: #1939507
    Change-Id: Ibd8f78fc822b04965c6a79b57b13be364934f64f
    (cherry picked from commit b8ef8e722af761dd394064ab70e159aa05639e56)

tags: added: in-stable-queens
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/rocky)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/808502
Committed: https://opendev.org/openstack/neutron/commit/87c2572edd9f9f916374a59a0a890d62c420961f
Submitter: "Zuul (22348)"
Branch: stable/rocky

commit 87c2572edd9f9f916374a59a0a890d62c420961f
Author: Slawek Kaplonski <email address hidden>
Date: Thu Sep 9 15:10:40 2021 +0200

    [Functional] Wait for the initial state of ha router before test

    In functional tests of the HA and DVR HA routers, when e.g.
    failover is tested, we should always wait for routers to be in the
    expected initial state (primary or backup) before router failover
    will actually be done.
    Without that, we may hit race condition when initial router's state
    is enqueued but not processed yet and then state will be changed thus
    no any action will be performed by L3 agent and test may fail.

    Additionally in that patch there is "master" instead of "primary" used
    for router state.

    Conflicts:
        neutron/tests/functional/agent/l3/test_dvr_router.py

    Depends-On: https://review.opendev.org/c/openstack/neutron-tempest-plugin/+/819109

    Closes-Bug: #1939507
    Change-Id: Ibd8f78fc822b04965c6a79b57b13be364934f64f
    (cherry picked from commit b8ef8e722af761dd394064ab70e159aa05639e56)

tags: added: in-stable-rocky
tags: removed: neutron-proactive-backport-potential
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 17.3.0

This issue was fixed in the openstack/neutron 17.3.0 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 18.2.0

This issue was fixed in the openstack/neutron 18.2.0 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron queens-eol

This issue was fixed in the openstack/neutron queens-eol release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron rocky-eol

This issue was fixed in the openstack/neutron rocky-eol release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron stein-eol

This issue was fixed in the openstack/neutron stein-eol release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron train-eol

This issue was fixed in the openstack/neutron train-eol release.
