Fullstack ha failover tests from neutron.tests.fullstack.test_l3_agent.TestHAL3Agent failing due to "destination ip 42.0.0.15 is replying to ping from namespace test-3dc5c664-ddba-4c2e-a9a9-48bbda478506, but it shouldn't"

Bug #2091021 reported by Slawek Kaplonski
This bug affects 1 person
Affects: neutron
Status: Fix Released
Importance: Critical
Assigned to: Slawek Kaplonski
Milestone: (none listed)

Bug Description

Traceback:

ft1.10: neutron.tests.fullstack.test_l3_agent.TestHAL3Agent.test_ha_router_failover_host_failure
testtools.testresult.real._StringException: Traceback (most recent call last):
  File "/home/zuul/src/opendev.org/openstack/neutron/neutron/tests/base.py", line 178, in func
    return f(self, *args, **kwargs)
  File "/home/zuul/src/opendev.org/openstack/neutron/neutron/tests/fullstack/test_l3_agent.py", line 565, in test_ha_router_failover_host_failure
    self._test_ha_router_failover('kill')
  File "/home/zuul/src/opendev.org/openstack/neutron/neutron/tests/fullstack/test_l3_agent.py", line 522, in _test_ha_router_failover
    vm.assert_no_ping(external.ip)
  File "/home/zuul/src/opendev.org/openstack/neutron/neutron/tests/common/machine_fixtures.py", line 86, in assert_no_ping
    net_helpers.assert_no_ping(self.namespace, dst_ip)
  File "/home/zuul/src/opendev.org/openstack/neutron/neutron/tests/common/net_helpers.py", line 168, in assert_no_ping
    tools.fail("destination ip %(destination)s is replying to ping from "
  File "/home/zuul/src/opendev.org/openstack/neutron/neutron/tests/tools.py", line 167, in fail
    raise unittest.TestCase.failureException(msg)
AssertionError: destination ip 42.0.0.15 is replying to ping from namespace test-3dc5c664-ddba-4c2e-a9a9-48bbda478506, but it shouldn't

Failure example: https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_e5c/periodic/opendev.org/openstack/neutron/master/neutron-fullstack/e5cb4b0/testr_results.html

Slawek Kaplonski (slaweq) wrote :

At first glance this looked similar to https://bugs.launchpad.net/neutron/+bug/2083609, but it is not the same issue. After a deeper look, it seems to me that keepalived can sometimes do the failover "too fast": by the time vm.assert_no_ping(external.ip) runs, the failover has already completed and ping is working again, so the assertion fails.

I think we should remove that assertion from the test. It depends on the timing of an external tool (keepalived) that we can't really control, so we may hit failures like this from time to time.
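
To illustrate the race described above, here is a minimal, self-contained Python sketch. It does not use neutron's fullstack helpers: FakeHARouter, racy_check, robust_check and the local wait_until_true are hypothetical names made up for this illustration, and failover_delay simply stands in for however long keepalived takes to promote the backup router.

```python
import time


class FakeHARouter:
    """Hypothetical stand-in for an HA router pair; not neutron code."""

    def __init__(self, failover_delay):
        # Seconds the (simulated) backup needs to take over after a failure.
        self.failover_delay = failover_delay
        self._failed_at = None

    def kill_active_host(self):
        # Record the moment the active host went away ungracefully.
        self._failed_at = time.monotonic()

    def ping_works(self):
        # Connectivity is back once the simulated failover has completed.
        if self._failed_at is None:
            return True
        return time.monotonic() - self._failed_at >= self.failover_delay


def wait_until_true(predicate, timeout=5.0, sleep=0.01):
    """Poll predicate until it returns True or the timeout expires."""
    deadline = time.monotonic() + timeout
    while not predicate():
        if time.monotonic() > deadline:
            raise TimeoutError("condition not met within %.1fs" % timeout)
        time.sleep(sleep)


def racy_check(router):
    """Mimics asserting 'no ping' right after the host failure."""
    router.kill_active_host()
    # Passes only if the backup has NOT taken over yet: timing dependent.
    return not router.ping_works()


def robust_check(router):
    """Only checks that connectivity is restored after the failover."""
    router.kill_active_host()
    wait_until_true(router.ping_works)
    return True
```

With a slow failover the "no ping" check passes, but with an immediate failover it fails even though the router behaves correctly. Waiting for connectivity to come back gives the same result in both cases, which is the part of the test that actually matters.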

Changed in neutron:
status: Confirmed → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/neutron/+/937097

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/937097
Committed: https://opendev.org/openstack/neutron/commit/813e743f0665b30082e3f6789850019dc35bcad7
Submitter: "Zuul (22348)"
Branch: master

commit 813e743f0665b30082e3f6789850019dc35bcad7
Author: Slawek Kaplonski <email address hidden>
Date: Thu Dec 5 10:02:22 2024 +0100

    [Fullstack] Don't assert no connectivity after host is stopped

    In the fullstack L3 HA tests, after fake host with router was killed
    or disconnected (so wasn't stopped gracefully) there was assertion that
    connectivity using Floating IP is broken for short period of time.
    This could lead to the race between test and keepalived which is doing
    failover. If keepalived was fast enough, test could fail at this
    assertion as failover would already happen and Floating IP would be
    working fine again.

    This patch removes that assertion of no connectivity as this can't be
    really controlled by the test and may lead to random failures. It's also
    not the most important thing in those tests - we should make sure that
    connectivity is working fine after host is crashed thanks to the
    failover which should happen and that is tested in the same tests.

    Closes-bug: #2091021
    Change-Id: Ib00c9823e3600bb2c234cbc90cac81723b4eec11

Changed in neutron:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/2024.2)

Fix proposed to branch: stable/2024.2
Review: https://review.opendev.org/c/openstack/neutron/+/937332

OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/2024.2)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/937332
Committed: https://opendev.org/openstack/neutron/commit/ecfcd4b3931b56fd5d4aefd1e33660efa570b5a8
Submitter: "Zuul (22348)"
Branch: stable/2024.2

commit ecfcd4b3931b56fd5d4aefd1e33660efa570b5a8
Author: Slawek Kaplonski <email address hidden>
Date: Thu Dec 5 10:02:22 2024 +0100

    [Fullstack] Don't assert no connectivity after host is stopped

    In the fullstack L3 HA tests, after fake host with router was killed
    or disconnected (so wasn't stopped gracefully) there was assertion that
    connectivity using Floating IP is broken for short period of time.
    This could lead to the race between test and keepalived which is doing
    failover. If keepalived was fast enough, test could fail at this
    assertion as failover would already happen and Floating IP would be
    working fine again.

    This patch removes that assertion of no connectivity as this can't be
    really controlled by the test and may lead to random failures. It's also
    not the most important thing in those tests - we should make sure that
    connectivity is working fine after host is crashed thanks to the
    failover which should happen and that is tested in the same tests.

    Closes-bug: #2091021
    Change-Id: Ib00c9823e3600bb2c234cbc90cac81723b4eec11
    (cherry picked from commit 813e743f0665b30082e3f6789850019dc35bcad7)
