Race condition in DHCP agent causes deletion of the DHCP port as a stale port

Bug #2007152 reported by Slawek Kaplonski
Affects: neutron
Status: Confirmed
Importance: Critical
Assigned to: Slawek Kaplonski
Milestone: none listed

Bug Description

It seems that when two subnets using different segments are created on the same network, a race condition can occur in the DHCP agent: one green thread expects the tap port to be present, while another treats it as a "stale" device and removes it from the namespace.
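
The interleaving described above can be illustrated with a toy model. This is only a sketch and not the neutron DHCP agent code: the names `FakeNamespace`, `setup_dhcp_port`, `cleanup_stale_devices` and the device name `tap0` are hypothetical, generators stand in for green threads, and the exact ordering (cleanup running between plugging the device and recording it as expected) is an assumption about how the race might play out.

```python
from collections import deque


class FakeNamespace:
    """Hypothetical stand-in for a DHCP network namespace."""

    def __init__(self):
        self.devices = set()


def setup_dhcp_port(ns, expected, device):
    # Plug the tap device into the namespace first...
    ns.devices.add(device)
    # ...then hit a cooperative switch point (blocking I/O in a real agent).
    yield
    # Only now is the device recorded as one that should exist.
    expected.add(device)


def cleanup_stale_devices(ns, expected):
    # Anything in the namespace that is not expected is treated as stale.
    for device in list(ns.devices):
        if device not in expected:
            ns.devices.discard(device)
    yield


def run_interleaved(*tasks):
    # Round-robin scheduler: each task runs until its next yield.
    queue = deque(tasks)
    while queue:
        task = queue.popleft()
        try:
            next(task)
        except StopIteration:
            continue
        queue.append(task)


def run_serialized(*tasks):
    # Each task runs to completion before the next one starts.
    for task in tasks:
        for _ in task:
            pass


def simulate(runner):
    ns = FakeNamespace()
    expected = set()
    runner(
        setup_dhcp_port(ns, expected, "tap0"),
        cleanup_stale_devices(ns, expected),
    )
    # True if the tap device survived, False if it was removed as stale.
    return "tap0" in ns.devices


if __name__ == "__main__":
    print("interleaved, device survives:", simulate(run_interleaved))
    print("serialized, device survives:", simulate(run_serialized))
```

In this model the device is lost only when the cleanup task runs between the two steps of the setup task. Running the tasks one after the other is shown purely for contrast; it is not claimed to be the fix for this bug.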

This causes frequent failures in neutron.tests.fullstack.test_multisegs.TestMultiSegs.test_multi_segs_network.

Stacktrace example:

ft1.1: neutron.tests.fullstack.test_multisegs.TestMultiSegs.test_multi_segs_network(Open vSwitch Agent)
testtools.testresult.real._StringException: Traceback (most recent call last):
  File "/home/zuul/src/opendev.org/openstack/neutron/neutron/common/utils.py", line 743, in wait_until_true
    eventlet.sleep(sleep)
  File "/home/zuul/src/opendev.org/openstack/neutron/.tox/dsvm-fullstack-gate/lib/python3.10/site-packages/eventlet/greenthread.py", line 36, in sleep
    hub.switch()
  File "/home/zuul/src/opendev.org/openstack/neutron/.tox/dsvm-fullstack-gate/lib/python3.10/site-packages/eventlet/hubs/hub.py", line 313, in switch
    return self.greenlet.switch()
eventlet.timeout.Timeout: 60 seconds

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/zuul/src/opendev.org/openstack/neutron/neutron/tests/base.py", line 182, in func
    return f(self, *args, **kwargs)
  File "/home/zuul/src/opendev.org/openstack/neutron/neutron/tests/fullstack/test_multisegs.py", line 146, in test_multi_segs_network
    self.vm1 = self._spawn_vm(neutron_port=self.port1)
  File "/home/zuul/src/opendev.org/openstack/neutron/neutron/tests/fullstack/test_multisegs.py", line 65, in _spawn_vm
    vm.block_until_dhcp_config_done()
  File "/home/zuul/src/opendev.org/openstack/neutron/neutron/tests/fullstack/resources/machine.py", line 217, in block_until_dhcp_config_done
    utils.wait_until_true(
  File "/home/zuul/src/opendev.org/openstack/neutron/neutron/common/utils.py", line 747, in wait_until_true
    raise exception
neutron.tests.common.machine_fixtures.FakeMachineException: Address 10.0.11.195/24 or gateway 10.0.11.1 not configured properly on port portbc2128

Failure example: https://750440148c983d142b4c-dd0601cbd7f0aa15c09a82437ec7e47b.ssl.cf5.rackcdn.com/872396/2/gate/neutron-fullstack-with-uwsgi/c5e45ed/testr_results.html

Lajos Katona (lajos-katona) wrote :
yatin (yatinkarel)
tags: added: fullstack
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/neutron/+/874821

OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (master)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/874821
Committed: https://opendev.org/openstack/neutron/commit/aa40aef70fac609c086c4c6511c6b17e597da044
Submitter: "Zuul (22348)"
Branch: master

commit aa40aef70fac609c086c4c6511c6b17e597da044
Author: Slawek Kaplonski <email address hidden>
Date: Thu Feb 23 08:10:12 2023 +0100

    Mark fullstack TestMultiSegs.test_multi_segs_network as unstable

    It is failing intermittently due to some issue with dhcp which we so far
    don't know. Let's make our life easier and mark this test as unstable
    for now.

    Related-bug: #2007152
    Change-Id: I4d51e5a9ece8d2265549db66e1bf31e3dd727748
