neutron.tests.functional.agent.test_dhcp_agent.DHCPAgentOVSTestCase.test_good_address_allocation is failing intermittently

Bug #1966035 reported by Slawek Kaplonski
This bug affects 1 person
Affects   Status      Importance   Assigned to     Milestone
neutron   Won't Fix   Critical     Oleg Bondarev

Bug Description

Stacktrace:

ft1.11: neutron.tests.functional.agent.test_dhcp_agent.DHCPAgentOVSTestCase.test_good_address_allocation
testtools.testresult.real._StringException: Traceback (most recent call last):
  File "/home/zuul/src/opendev.org/openstack/neutron/neutron/common/utils.py", line 718, in wait_until_true
    eventlet.sleep(sleep)
  File "/home/zuul/src/opendev.org/openstack/neutron/.tox/dsvm-functional/lib/python3.8/site-packages/eventlet/greenthread.py", line 36, in sleep
    hub.switch()
  File "/home/zuul/src/opendev.org/openstack/neutron/.tox/dsvm-functional/lib/python3.8/site-packages/eventlet/hubs/hub.py", line 313, in switch
    return self.greenlet.switch()
eventlet.timeout.Timeout: 10 seconds

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/zuul/src/opendev.org/openstack/neutron/neutron/tests/base.py", line 183, in func
    return f(self, *args, **kwargs)
  File "/home/zuul/src/opendev.org/openstack/neutron/neutron/tests/functional/agent/test_dhcp_agent.py", line 304, in test_good_address_allocation
    self.assert_good_allocation_for_port(network, port)
  File "/home/zuul/src/opendev.org/openstack/neutron/neutron/tests/functional/agent/test_dhcp_agent.py", line 224, in assert_good_allocation_for_port
    common_utils.wait_until_true(predicate, 10)
  File "/home/zuul/src/opendev.org/openstack/neutron/neutron/common/utils.py", line 723, in wait_until_true
    raise WaitTimeout(_("Timed out after %d seconds") % timeout)
neutron.common.utils.WaitTimeout: Timed out after 10 seconds

Failure example: https://12ef4cf37bb4f9cf8615-49968699300828e6c9b78fd54dff75ef.ssl.cf5.rackcdn.com/828687/8/gate/neutron-functional-with-uwsgi/3688faf/testr_results.html
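
The traceback shows a polling helper that gives up after 10 seconds and reports the failure as WaitTimeout. As a rough illustration of that pattern only, here is a stand-alone sketch. It is not Neutron's implementation (which relies on eventlet, per the traceback): the names `wait_until_true` and `WaitTimeout` mirror the traceback, while the body, the `sleep` default, and the use of `time.monotonic()` are assumptions.

```python
import time


class WaitTimeout(Exception):
    """Raised when the predicate does not become true in time."""


def wait_until_true(predicate, timeout=10, sleep=0.1):
    """Poll predicate() until it returns a true value.

    Hypothetical stand-in for the helper seen in the traceback;
    it uses plain time.sleep instead of eventlet.
    """
    deadline = time.monotonic() + timeout
    while not predicate():
        if time.monotonic() >= deadline:
            raise WaitTimeout("Timed out after %d seconds" % timeout)
        time.sleep(sleep)
```

With a helper like this, a predicate that is evaluated only rarely (for example because the process is starved of CPU time) can time out even though the condition would have become true shortly afterwards.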

Changed in neutron:
assignee: nobody → Oleg Bondarev (obondarev)
Oleg Bondarev (obondarev) wrote :

I ran the test locally many times and could not reproduce the issue.
The log of the failed test shows nothing suspicious and does not differ from the log of a successful run, except for a 10-second gap in one place.

The last patch that touched this test is https://review.opendev.org/c/openstack/neutron/+/827315, but in fact it just restored the test to the state it had before patch https://review.opendev.org/c/openstack/neutron/+/820897.
Additionally, _set_port_dead(), added by 827315, is mocked in the test, so it shouldn't have any effect (with the mocking removed, the test fails 100% of the time).

Given that the test has failed just once (unless my search missed other occurrences), I suggest monitoring it for a while and revisiting it if it fails again; then we can think about adding more logging to it.

Bence Romsics (bence-romsics) wrote :

Still no recurrence in the last 100 runs:

 $ logsearch log --project openstack/neutron --job neutron-functional-with-uwsgi --branch master --limit 100 "line 224, in assert_good_allocation_for_port"
 ...
 Builds with matching logs 0/100:
 ...

Miguel Lavalle (minsel) wrote :

Slawek Kaplonski (slaweq) wrote :

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/neutron/+/843389

OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (master)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/843389
Committed: https://opendev.org/openstack/neutron/commit/763d8af1a3c17ab123f4f4d05f5c15a7ed169283
Submitter: "Zuul (22348)"
Branch: master

commit 763d8af1a3c17ab123f4f4d05f5c15a7ed169283
Author: Oleg Bondarev <email address hidden>
Date: Thu May 26 11:06:33 2022 +0400

    Add some logging to test_good_address_allocation

    Let's see how many times the test asks for IP addr list
    during 10 sec timeout period. Probably sporadic failures
    are caused by waiting for GIL for too long.

    Related-Bug: #1966035
    Change-Id: I41679cd7e39b0f7d64f99f509605ac9bc760ac5d
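
The idea in the commit message, counting how often the condition is checked during the timeout window, can be sketched roughly as follows. This is not the code from the merged change: `CountingPredicate`, `poll`, and `poll_with_count` are hypothetical names, and the polling loop uses plain `time.sleep` rather than eventlet.

```python
import logging
import time

LOG = logging.getLogger(__name__)


class CountingPredicate:
    """Wrap a predicate and record how often it is evaluated."""

    def __init__(self, predicate):
        self._predicate = predicate
        self.calls = 0

    def __call__(self):
        self.calls += 1
        return self._predicate()


def poll(predicate, timeout=10, sleep=0.1):
    """Return True if predicate() became true within timeout seconds."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if predicate():
            return True
        time.sleep(sleep)
    return False


def poll_with_count(predicate, timeout=10, sleep=0.1):
    """Poll like poll(), then log how many evaluations took place."""
    counted = CountingPredicate(predicate)
    ok = poll(counted, timeout, sleep)
    LOG.debug("predicate evaluated %d times within %s s (success=%s)",
              counted.calls, timeout, ok)
    return ok, counted.calls
```

If a timed-out run logs only a handful of evaluations where roughly timeout / sleep were expected, that would point to the test being starved of execution time rather than to the condition genuinely never holding.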

Rodolfo Alonso (rodolfo-alonso-hernandez) wrote :

This error is not being reproduced in the CI, closing this bug for now.

Changed in neutron:
status: Confirmed → Won't Fix