Fullstack test TestUninterruptedConnectivityOnL2AgentRestart failing often with LB agent

Bug #1928764 reported by Slawek Kaplonski
This bug affects 1 person
Affects: neutron
Status: Fix Released
Importance: Critical
Assigned to: Lajos Katona
Milestone: (none listed)

Bug Description

The test neutron.tests.fullstack.test_connectivity.TestUninterruptedConnectivityOnL2AgentRestart.test_l2_agent_restart has recently been failing quite often in various LB (linuxbridge) scenarios (flat and VXLAN networks).

Examples of failures:

https://09f8e4e92bfb8d2ac89d-b41143eab52d80358d8555f964e9341b.ssl.cf5.rackcdn.com/670611/13/check/neutron-fullstack-with-uwsgi/8f51833/testr_results.html
https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_400/790288/1/check/neutron-fullstack-with-uwsgi/40025f9/testr_results.html
https://0603beb4ddbd36de1165-42644bdefd5590a8f7e4e2e8a8a4112f.ssl.cf5.rackcdn.com/787956/1/check/neutron-fullstack-with-uwsgi/7640987/testr_results.html
https://e978bdcfc0235dcd9417-6560bc3b6382c1d289b358872777ca09.ssl.cf1.rackcdn.com/787956/1/check/neutron-fullstack-with-uwsgi/779913e/testr_results.html
https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_0cb/789648/5/check/neutron-fullstack-with-uwsgi/0cb6d65/testr_results.html

Stacktrace:

ft1.1: neutron.tests.fullstack.test_connectivity.TestUninterruptedConnectivityOnL2AgentRestart.test_l2_agent_restart(LB,Flat network)
testtools.testresult.real._StringException: Traceback (most recent call last):
  File "/home/zuul/src/opendev.org/openstack/neutron/neutron/tests/base.py", line 183, in func
    return f(self, *args, **kwargs)
  File "/home/zuul/src/opendev.org/openstack/neutron/neutron/tests/fullstack/test_connectivity.py", line 236, in test_l2_agent_restart
    self._assert_ping_during_agents_restart(
  File "/home/zuul/src/opendev.org/openstack/neutron/neutron/tests/fullstack/base.py", line 123, in _assert_ping_during_agents_restart
    common_utils.wait_until_true(
  File "/usr/lib/python3.8/contextlib.py", line 120, in __exit__
    next(self.gen)
  File "/home/zuul/src/opendev.org/openstack/neutron/neutron/tests/common/net_helpers.py", line 147, in async_ping
    f.result()
  File "/usr/lib/python3.8/concurrent/futures/_base.py", line 432, in result
    return self.__get_result()
  File "/usr/lib/python3.8/concurrent/futures/_base.py", line 388, in __get_result
    raise self._exception
  File "/usr/lib/python3.8/concurrent/futures/thread.py", line 57, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/home/zuul/src/opendev.org/openstack/neutron/neutron/tests/common/net_helpers.py", line 128, in assert_async_ping
    ns_ip_wrapper.netns.execute(
  File "/home/zuul/src/opendev.org/openstack/neutron/neutron/agent/linux/ip_lib.py", line 718, in execute
    return utils.execute(cmd, check_exit_code=check_exit_code,
  File "/home/zuul/src/opendev.org/openstack/neutron/neutron/agent/linux/utils.py", line 156, in execute
    raise exceptions.ProcessExecutionError(msg,
neutron_lib.exceptions.ProcessExecutionError: Exit code: 1; Cmd: ['ip', 'netns', 'exec', 'test-af70cf3a-c531-4fdf-ab4c-31cc69cc2c56', 'ping', '-W', 2, '-c', '1', '20.0.0.212']; Stdin: ; Stdout: PING 20.0.0.212 (20.0.0.212) 56(84) bytes of data.

--- 20.0.0.212 ping statistics ---
1 packets transmitted, 0 received, 100% packet loss, time 0ms

; Stderr:
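
The traceback above passes through contextlib's `__exit__` and `Future.result()`, which suggests that the ping runs in a background thread while the agent restarts and that a lost ping is re-raised in the test thread when the context manager exits. A minimal sketch of that pattern follows. It is an illustration only: `PingFailed`, `ping_until_stopped`, and this simplified `async_ping` are hypothetical names, not the helpers in neutron/tests/common/net_helpers.py.

```python
import contextlib
import threading
from concurrent import futures


class PingFailed(Exception):
    """Hypothetical stand-in for the ProcessExecutionError in the trace."""


def ping_until_stopped(stop, ping_once, interval=0.01):
    # Keep pinging until asked to stop; the first lost ping raises.
    while not stop.is_set():
        if not ping_once():
            raise PingFailed("100% packet loss")
        stop.wait(interval)


@contextlib.contextmanager
def async_ping(ping_once):
    stop = threading.Event()
    with futures.ThreadPoolExecutor(max_workers=1) as executor:
        fut = executor.submit(ping_until_stopped, stop, ping_once)
        try:
            # The caller restarts the agent (or does other work) here.
            yield fut
        finally:
            stop.set()
        # Re-raise in the caller's thread whatever the worker raised.
        fut.result()
```

With this shape, a single lost ping during the `with` body surfaces only when the block exits, which matches the order of frames in the traceback.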

I checked the linuxbridge-agent logs for two of these cases and found errors like the one below:

2021-05-13 15:46:07.721 96421 DEBUG oslo.privsep.daemon [-] privsep: reply[139960964907248]: (4, ()) _call_back /home/zuul/src/opendev.org/openstack/neutron/.tox/dsvm-fullstack-gate/lib/python3.8/site-packages/oslo_privsep/daemon.py:510
2021-05-13 15:46:07.725 96421 DEBUG oslo.privsep.daemon [-] privsep: reply[139960964907248]: (4, None) _call_back /home/zuul/src/opendev.org/openstack/neutron/.tox/dsvm-fullstack-gate/lib/python3.8/site-packages/oslo_privsep/daemon.py:510
2021-05-13 15:46:07.728 96421 DEBUG oslo.privsep.daemon [-] privsep: Exception during request[139960964907248]: Network interface brqa235fa8c-09 not found in namespace None. _process_cmd /home/zuul/src/opendev.org/openstack/neutron/.tox/dsvm-fullstack-gate/lib/python3.8/site-packages/oslo_privsep/daemon.py:488
Traceback (most recent call last):
  File "/home/zuul/src/opendev.org/openstack/neutron/.tox/dsvm-fullstack-gate/lib/python3.8/site-packages/oslo_privsep/daemon.py", line 485, in _process_cmd
    ret = func(*f_args, **f_kwargs)
  File "/home/zuul/src/opendev.org/openstack/neutron/.tox/dsvm-fullstack-gate/lib/python3.8/site-packages/oslo_privsep/priv_context.py", line 249, in _wrap
    return func(*args, **kwargs)
  File "/home/zuul/src/opendev.org/openstack/neutron/neutron/privileged/agent/linux/ip_lib.py", line 278, in delete_ip_address
    _run_iproute_addr("delete",
  File "/home/zuul/src/opendev.org/openstack/neutron/neutron/privileged/agent/linux/ip_lib.py", line 239, in _run_iproute_addr
    idx = get_link_id(device, namespace)
  File "/home/zuul/src/opendev.org/openstack/neutron/neutron/privileged/agent/linux/ip_lib.py", line 201, in get_link_id
    raise NetworkInterfaceNotFound(device=device, namespace=namespace)
neutron.privileged.agent.linux.ip_lib.NetworkInterfaceNotFound: Network interface brqa235fa8c-09 not found in namespace None.
2021-05-13 15:46:07.730 96421 DEBUG oslo.privsep.daemon [-] privsep: reply[139960964907248]: (5, 'neutron.privileged.agent.linux.ip_lib.NetworkInterfaceNotFound', ('Network interface brqa235fa8c-09 not found in namespace None.',)) _call_back /home/zuul/src/opendev.org/openstack/neutron/.tox/dsvm-fullstack-gate/lib/python3.8/site-packages/oslo_privsep/daemon.py:510
2021-05-13 15:46:07.730 96075 ERROR oslo_messaging.rpc.server [req-6e40de24-c317-438b-914e-65ea4acea314 - - - - -] Exception during message handling: neutron.privileged.agent.linux.ip_lib.NetworkInterfaceNotFound: Network interface brqa235fa8c-09 not found in namespace None.
2021-05-13 15:46:07.730 96075 ERROR oslo_messaging.rpc.server Traceback (most recent call last):
2021-05-13 15:46:07.730 96075 ERROR oslo_messaging.rpc.server File "/home/zuul/src/opendev.org/openstack/neutron/.tox/dsvm-fullstack-gate/lib/python3.8/site-packages/oslo_messaging/rpc/server.py", line 165, in _process_incoming
2021-05-13 15:46:07.730 96075 ERROR oslo_messaging.rpc.server res = self.dispatcher.dispatch(message)
2021-05-13 15:46:07.730 96075 ERROR oslo_messaging.rpc.server File "/home/zuul/src/opendev.org/openstack/neutron/.tox/dsvm-fullstack-gate/lib/python3.8/site-packages/oslo_messaging/rpc/dispatcher.py", line 309, in dispatch
2021-05-13 15:46:07.730 96075 ERROR oslo_messaging.rpc.server return self._do_dispatch(endpoint, method, ctxt, args)
2021-05-13 15:46:07.730 96075 ERROR oslo_messaging.rpc.server File "/home/zuul/src/opendev.org/openstack/neutron/.tox/dsvm-fullstack-gate/lib/python3.8/site-packages/oslo_messaging/rpc/dispatcher.py", line 229, in _do_dispatch
2021-05-13 15:46:07.730 96075 ERROR oslo_messaging.rpc.server result = func(ctxt, **new_args)
2021-05-13 15:46:07.730 96075 ERROR oslo_messaging.rpc.server File "/home/zuul/src/opendev.org/openstack/neutron/neutron/plugins/ml2/drivers/linuxbridge/agent/linuxbridge_neutron_agent.py", line 887, in network_delete
2021-05-13 15:46:07.730 96075 ERROR oslo_messaging.rpc.server self.agent.mgr.delete_bridge(bridge_name)
2021-05-13 15:46:07.730 96075 ERROR oslo_messaging.rpc.server File "/home/zuul/src/opendev.org/openstack/neutron/neutron/plugins/ml2/drivers/linuxbridge/agent/linuxbridge_neutron_agent.py", line 600, in delete_bridge
2021-05-13 15:46:07.730 96075 ERROR oslo_messaging.rpc.server updated = self.update_interface_ip_details(interface,
2021-05-13 15:46:07.730 96075 ERROR oslo_messaging.rpc.server File "/home/zuul/src/opendev.org/openstack/neutron/neutron/plugins/ml2/drivers/linuxbridge/agent/linuxbridge_neutron_agent.py", line 418, in update_interface_ip_details
2021-05-13 15:46:07.730 96075 ERROR oslo_messaging.rpc.server self._update_interface_ip_details(destination, source, ips,
2021-05-13 15:46:07.730 96075 ERROR oslo_messaging.rpc.server File "/home/zuul/src/opendev.org/openstack/neutron/neutron/plugins/ml2/drivers/linuxbridge/agent/linuxbridge_neutron_agent.py", line 410, in _update_interface_ip_details
2021-05-13 15:46:07.730 96075 ERROR oslo_messaging.rpc.server src_device.addr.delete(cidr=ip['cidr'])
2021-05-13 15:46:07.730 96075 ERROR oslo_messaging.rpc.server File "/home/zuul/src/opendev.org/openstack/neutron/neutron/agent/linux/ip_lib.py", line 517, in delete
2021-05-13 15:46:07.730 96075 ERROR oslo_messaging.rpc.server delete_ip_address(cidr, self.name, self._parent.namespace)
2021-05-13 15:46:07.730 96075 ERROR oslo_messaging.rpc.server File "/home/zuul/src/opendev.org/openstack/neutron/neutron/agent/linux/ip_lib.py", line 811, in delete_ip_address
2021-05-13 15:46:07.730 96075 ERROR oslo_messaging.rpc.server privileged.delete_ip_address(
2021-05-13 15:46:07.730 96075 ERROR oslo_messaging.rpc.server File "/home/zuul/src/opendev.org/openstack/neutron/.tox/dsvm-fullstack-gate/lib/python3.8/site-packages/oslo_privsep/priv_context.py", line 247, in _wrap
2021-05-13 15:46:07.730 96075 ERROR oslo_messaging.rpc.server return self.channel.remote_call(name, args, kwargs)
2021-05-13 15:46:07.730 96075 ERROR oslo_messaging.rpc.server File "/home/zuul/src/opendev.org/openstack/neutron/.tox/dsvm-fullstack-gate/lib/python3.8/site-packages/oslo_privsep/daemon.py", line 224, in remote_call
2021-05-13 15:46:07.730 96075 ERROR oslo_messaging.rpc.server raise exc_type(*result[2])
2021-05-13 15:46:07.730 96075 ERROR oslo_messaging.rpc.server neutron.privileged.agent.linux.ip_lib.NetworkInterfaceNotFound: Network interface brqa235fa8c-09 not found in namespace None.

This may be related to the issue, or it may be a red herring. I don't know yet.
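
To make the logged error concrete: the agent-side traceback shows `network_delete` calling `delete_bridge`, which then tries to delete an IP address from a bridge device that no longer exists. The sketch below reproduces only that failure mode with an in-memory model. `FakeHost` and `delete_addresses_if_present` are hypothetical, and the exception class is a local stand-in; this is not Neutron code and not a proposed fix.

```python
class NetworkInterfaceNotFound(Exception):
    """Local stand-in for the exception seen in the agent log."""


class FakeHost:
    """Hypothetical in-memory model of devices and their addresses."""

    def __init__(self):
        self.devices = {}

    def add_device(self, name, cidrs=()):
        self.devices[name] = set(cidrs)

    def delete_device(self, name):
        self.devices.pop(name, None)

    def delete_ip_address(self, cidr, device):
        # Mirrors the logged behaviour: a missing device is an error.
        if device not in self.devices:
            raise NetworkInterfaceNotFound(
                "Network interface %s not found in namespace None." % device)
        self.devices[device].discard(cidr)


def delete_addresses_if_present(host, device, cidrs):
    """Delete addresses; return False if the device vanished meanwhile."""
    for cidr in cidrs:
        try:
            host.delete_ip_address(cidr, device)
        except NetworkInterfaceNotFound:
            return False
    return True
```

If the bridge is removed between the moment its addresses are listed and the moment they are deleted, the plain call raises, while the tolerant wrapper reports the disappearance to its caller instead.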

Changed in neutron:
assignee: nobody → Lajos Katona (lajos-katona)
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/neutron/+/792505

OpenStack Infra (hudson-openstack) wrote :

Related fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/neutron/+/792507

OpenStack Infra (hudson-openstack) wrote :

Related fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/neutron/+/794228

OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (master)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/794228
Committed: https://opendev.org/openstack/neutron/commit/13994d2327d02977faa638422b579912cf3dda7f
Submitter: "Zuul (22348)"
Branch: master

commit 13994d2327d02977faa638422b579912cf3dda7f
Author: elajkat <email address hidden>
Date: Wed Jun 2 15:40:35 2021 +0200

    Mark fullstack test_l2_agent_restart as unstable

    With linuxbridge agent the connectivity test fails often after l2 agent
    restart, so mark it as unstable until the root cause is found.

    Change-Id: Ida3193acac2fa22bbfd4b18ef3a3bceafe46d3ec
    Related-Bug: #1928764
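
Marking a test as unstable usually means wrapping it so that a failure is reported as a skip, with the bug reference in the skip message, instead of failing the job. The sketch below is a simplified, self-contained approximation of such a decorator built on `unittest`. The decorator body and `ExampleTest` are assumptions for illustration and may differ from the helper Neutron actually uses.

```python
import functools
import unittest


def unstable_test(reason):
    """Turn any failure of the decorated test into a skip."""
    def decorator(test_func):
        @functools.wraps(test_func)
        def wrapper(self, *args, **kwargs):
            try:
                return test_func(self, *args, **kwargs)
            except unittest.SkipTest:
                # Genuine skips pass through unchanged.
                raise
            except Exception as exc:
                self.skipTest(
                    "%s was marked as unstable because of %s, "
                    "failure was: %s" % (self.id(), reason, exc))
        return wrapper
    return decorator


class ExampleTest(unittest.TestCase):
    """Hypothetical test that always fails, to show the effect."""

    @unstable_test("bug 1928764")
    def test_flaky(self):
        self.fail("100% packet loss")
```

The trade-off is that a skipped test no longer blocks the gate, but it also stops signalling regressions, which is why the marker is meant to be temporary.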

tags: added: neutron-proactive-backport-potential
OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (master)

Change abandoned by "Lajos Katona <email address hidden>" on branch: master
Review: https://review.opendev.org/c/openstack/neutron/+/792505

OpenStack Infra (hudson-openstack) wrote :

Change abandoned by "Slawek Kaplonski <email address hidden>" on branch: master
Review: https://review.opendev.org/c/openstack/neutron/+/792507
Reason: This review is > 4 weeks without comment, and failed Zuul jobs the last time it was checked. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and leaving a 'recheck' comment to get fresh test results.

Mohammed Naser (mnaser) wrote :

I believe this is related to the following bug:

https://bugs.launchpad.net/neutron/+bug/1896734

I think that issue is still not resolved.

Mohammed Naser (mnaser) wrote :
Changed in neutron:
status: Confirmed → Fix Released
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/neutron/+/873941

Slawek Kaplonski (slaweq) wrote :
Changed in neutron:
status: Fix Released → Confirmed
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (stable/wallaby)

Related fix proposed to branch: stable/wallaby
Review: https://review.opendev.org/c/openstack/neutron/+/894428

OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (stable/wallaby)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/894428
Committed: https://opendev.org/openstack/neutron/commit/39593d86d506572ff5ade112e272c47e909f70b6
Submitter: "Zuul (22348)"
Branch: stable/wallaby

commit 39593d86d506572ff5ade112e272c47e909f70b6
Author: elajkat <email address hidden>
Date: Wed Jun 2 15:40:35 2021 +0200

    Mark fullstack test_l2_agent_restart as unstable

    With linuxbridge agent the connectivity test fails often after l2 agent
    restart, so mark it as unstable until the root cause is found.

    Change-Id: Ida3193acac2fa22bbfd4b18ef3a3bceafe46d3ec
    Related-Bug: #1928764
    (cherry picked from commit 13994d2327d02977faa638422b579912cf3dda7f)

tags: added: in-stable-wallaby
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to neutron (stable/victoria)

Related fix proposed to branch: stable/victoria
Review: https://review.opendev.org/c/openstack/neutron/+/900301

OpenStack Infra (hudson-openstack) wrote : Related fix merged to neutron (stable/victoria)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/900301
Committed: https://opendev.org/openstack/neutron/commit/452180ca9d6b280ead4adb92b21966061219268e
Submitter: "Zuul (22348)"
Branch: stable/victoria

commit 452180ca9d6b280ead4adb92b21966061219268e
Author: elajkat <email address hidden>
Date: Wed Jun 2 15:40:35 2021 +0200

    Mark fullstack test_l2_agent_restart as unstable

    With linuxbridge agent the connectivity test fails often after l2 agent
    restart, so mark it as unstable until the root cause is found.

    Change-Id: Ida3193acac2fa22bbfd4b18ef3a3bceafe46d3ec
    Related-Bug: #1928764
    (cherry picked from commit 13994d2327d02977faa638422b579912cf3dda7f)
    (cherry picked from commit 39593d86d506572ff5ade112e272c47e909f70b6)

tags: added: in-stable-victoria
Rodolfo Alonso (rodolfo-alonso-hernandez) wrote :

The test "test_l2_agent_restart" is no longer marked as unstable, and we have not seen any occurrences of this error recently. Closing this bug for now.

Changed in neutron:
status: Confirmed → Fix Released