tripleo-ci-centos-8-containers-multinode failing for the tempest TestNetworkBasicOps.test_hotplug_nic test due to Device net1 is already in the process of unplug

Bug #1926647 reported by Pooja Jadhav
This bug affects 2 people
Affects: tripleo
Status: Fix Released
Importance: Critical
Assigned to: Unassigned

Bug Description

tripleo-ci-centos-8-containers-multinode is failing the tempest test tempest.scenario.test_network_basic_ops.TestNetworkBasicOps.test_hotplug_nic on this patch [1].

Tracebacks:

traceback-1: {{{
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/tempest/lib/common/utils/test_utils.py", line 87, in call_and_ignore_notfound_exc
    return func(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/tempest/lib/services/network/subnets_client.py", line 52, in delete_subnet
    return self.delete_resource(uri)
  File "/usr/lib/python3.6/site-packages/tempest/lib/services/network/base.py", line 42, in delete_resource
    resp, body = self.delete(req_uri)
  File "/usr/lib/python3.6/site-packages/tempest/lib/common/rest_client.py", line 331, in delete
    return self.request('DELETE', url, extra_headers, headers, body)
  File "/usr/lib/python3.6/site-packages/tempest/lib/common/rest_client.py", line 704, in request
    self._error_checker(resp, resp_body)
  File "/usr/lib/python3.6/site-packages/tempest/lib/common/rest_client.py", line 825, in _error_checker
    raise exceptions.Conflict(resp_body, resp=resp)
tempest.lib.exceptions.Conflict: Conflict with state of target resource
Details: {'type': 'SubnetInUse', 'message': 'Unable to complete operation on subnet 71f02597-bd45-434a-818d-0bc21e6f33f9: One or more ports have an IP allocation from this subnet.', 'detail': ''}
}}}

traceback-2: {{{
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/tempest/lib/common/utils/test_utils.py", line 87, in call_and_ignore_notfound_exc
    return func(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/tempest/lib/services/network/networks_client.py", line 52, in delete_network
    return self.delete_resource(uri)
  File "/usr/lib/python3.6/site-packages/tempest/lib/services/network/base.py", line 42, in delete_resource
    resp, body = self.delete(req_uri)
  File "/usr/lib/python3.6/site-packages/tempest/lib/common/rest_client.py", line 331, in delete
    return self.request('DELETE', url, extra_headers, headers, body)
  File "/usr/lib/python3.6/site-packages/tempest/lib/common/rest_client.py", line 704, in request
    self._error_checker(resp, resp_body)
  File "/usr/lib/python3.6/site-packages/tempest/lib/common/rest_client.py", line 825, in _error_checker
    raise exceptions.Conflict(resp_body, resp=resp)
tempest.lib.exceptions.Conflict: Conflict with state of target resource
Details: {'type': 'NetworkInUse', 'message': 'Unable to complete operation on network f189cece-dd5d-4bca-9bde-054acc47def7. There are one or more ports still in use on the network.', 'detail': ''}
}}}
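Both Conflict errors come from the same cleanup-ordering constraint: Neutron refuses to delete a subnet while ports still hold IP allocations from it (SubnetInUse), and refuses to delete a network while ports still exist on it (NetworkInUse). A minimal sketch of the required deletion order follows; the `client` interface here is a hypothetical illustration for this bug report, not tempest's real service clients:

```python
def delete_network_tree(client, network_id):
    """Delete a network and its children in dependency order.

    Ports must go first: a DELETE on a subnet with live IP allocations
    returns 409 SubnetInUse, and a DELETE on a network with live ports
    returns 409 NetworkInUse, exactly as in the tracebacks above.
    The client methods used here are assumptions for illustration.
    """
    for port in client.list_ports(network_id=network_id):
        client.delete_port(port["id"])
    for subnet in client.list_subnets(network_id=network_id):
        client.delete_subnet(subnet["id"])
    client.delete_network(network_id)
```

In the failing run the leftover port appears to be the hot-plugged interface 2b0d3668-0f72-4522-91dd-59ee8d9c6f6c, whose own deletion never completed, so the subnet and network deletes kept returning 409.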

Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/tempest/lib/common/rest_client.py", line 916, in wait_for_resource_deletion
    raise exceptions.TimeoutException(message)
tempest.lib.exceptions.TimeoutException: Request timed out
Details: (TestNetworkBasicOps:_run_cleanups) Failed to delete resource 2b0d3668-0f72-4522-91dd-59ee8d9c6f6c within the required time (300 s).
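The 300 s figure comes from a polling wait on the DELETE: the client repeatedly re-checks whether the resource is gone and raises TimeoutException when the deadline passes. A simplified, self-contained sketch of that pattern (not tempest's actual implementation; parameter names are my own):

```python
import time

class TimeoutException(Exception):
    pass

def wait_for_resource_deletion(is_deleted, resource_id, timeout=300,
                               interval=1.0, sleep=time.sleep,
                               clock=time.monotonic):
    """Poll is_deleted() until the resource is gone, or raise
    TimeoutException after `timeout` seconds -- a simplified sketch of
    the rest_client behaviour seen in the traceback above."""
    start = clock()
    while not is_deleted():
        if clock() - start >= timeout:
            raise TimeoutException(
                "Failed to delete resource %s within the required time (%s s)."
                % (resource_id, timeout))
        sleep(interval)
```

Here the port never disappeared because the underlying interface detach on the compute side stalled, so the poll ran out the full 300 s.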

References:

http://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_f2c/785027/7/check/tripleo-ci-centos-8-containers-multinode/f2c27c8/logs/undercloud/var/log/tempest/stestr_results.html

[1] https://review.opendev.org/c/openstack/tripleo-heat-templates/+/785027

Changed in networking-midonet:
status: New → Invalid
Amol Kahat (amolkahat)
affects: networking-midonet → tripleo
Changed in tripleo:
status: Invalid → Triaged
importance: Undecided → Critical
milestone: none → xena-1
tags: added: promotion-blocker
Revision history for this message
wes hayutin (weshayutin) wrote :

Plausible cause:

At the same time the test failed, we see:
2021-04-29 05:23:23.423 ERROR /var/log/containers/nova/nova-api.log: 12 ERROR oslo.messaging._drivers.impl_rabbit [-] [1f6deee6-3bde-4f68-bcfa-841428e0f048] AMQP server on centos-8-stream-vexxhost-ca-ymq-1-0024415186.internalapi.localdomain:5672 is unreachable: [Errno 104] Connection reset by peer. Trying again in 1 seconds.: ConnectionResetError: [Errno 104] Connection reset by peer

http://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_f2c/785027/7/check/tripleo-ci-centos-8-containers-multinode/f2c27c8/logs/subnode-1/var/log/extra/errors.txt.txt

2021-04-29 05:23:25.264 [info] <0.4083.0> connection <0.4083.0> (192.168.24.3:35586 -> 192.168.24.3:5672 - mod_wsgi:12:1f6deee6-3bde-4f68-bcfa-841428e0f048): user 'guest' authenticated and granted access to vhost '/'
2021-04-29 05:34:25.275 [error] <0.4083.0> closing AMQP connection <0.4083.0> (192.168.24.3:35586 -> 192.168.24.3:5672 - mod_wsgi:12:1f6deee6-3bde-4f68-bcfa-841428e0f048):
missed heartbeats from client, timeout: 60s

http://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_f2c/785027/7/check/tripleo-ci-centos-8-containers-multinode/f2c27c8/logs/subnode-1/var/log/containers/rabbitmq/rabbit%40centos-8-stream-vexxhost-ca-ymq-1-0024415186.log
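The rabbitmq log shows the broker side of AMQP heartbeating: with a 60 s heartbeat timeout, the broker closes any connection whose client stays silent too long, which matches the "Connection reset by peer" nova-api saw. The decision itself is just a clock comparison; a tiny illustrative sketch (names are my own, not RabbitMQ or oslo.messaging code):

```python
import time

def heartbeat_expired(last_seen, timeout=60.0, now=None):
    """Illustrative only: report whether a peer has been silent longer
    than the heartbeat timeout -- the condition under which a broker
    closes the connection with "missed heartbeats from client"."""
    if now is None:
        now = time.monotonic()
    return (now - last_seen) > timeout
```

If the mod_wsgi process stopped sending heartbeats (e.g. because it was blocked), the broker-side close and the client-side connection reset are two views of the same event.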

Revision history for this message
Slawek Kaplonski (slaweq) wrote :

I looked at the logs from that failed job today. It seems to me that a related problem may be in the nova-compute log.
In the failed test, port "2b0d3668-0f72-4522-91dd-59ee8d9c6f6c" was attached to the vm at:

2021-04-29 05:18:21,198 149481 INFO [tempest.lib.common.rest_client] Request (TestNetworkBasicOps:test_hotplug_nic): 200 POST http://192.168.24.16:8774/v2.1/servers/00e072b3-56a3-4659-8a2f-1bbf2faf61d8/os-interface 3.107s

Later, connectivity was checked and everything seemed fine. Then the test detached that interface:

2021-04-29 05:18:22,415 149481 INFO [tempest.lib.common.rest_client] Request (TestNetworkBasicOps:_run_cleanups): 202 DELETE http://192.168.24.16:8774/v2.1/servers/00e072b3-56a3-4659-8a2f-1bbf2faf61d8/os-interface/2b0d3668-0f72-4522-91dd-59ee8d9c6f6c 0.078s

Just after that, the nova-compute log shows the following:

2021-04-29 05:18:22.507 7 DEBUG nova.virt.libvirt.guest [req-658c758b-7fa6-4b4e-9079-b3d7d1b28e42 1ae5541c1de04e099e2ef0c804811a22 0b8332dcd5eb4b9f981cfac4e46bf52e - default default] Attempting initial detach for device tap2b0d3668-0f detach_device_with_retry /usr/lib/python3.6/site-packages/nova/virt/libvirt/guest.py:455
2021-04-29 05:18:22.507 7 DEBUG nova.virt.libvirt.guest [req-658c758b-7fa6-4b4e-9079-b3d7d1b28e42 1ae5541c1de04e099e2ef0c804811a22 0b8332dcd5eb4b9f981cfac4e46bf52e - default default] detach device xml: <interface type="bridge">
  <mac address="fa:16:3e:43:49:68"/>
  <model type="virtio"/>
  <driver name="qemu" rx_queue_size="512"/>
  <source bridge="br-int"/>
  <mtu size="1292"/>
  <target dev="tap2b0d3668-0f"/>
  <virtualport type="openvswitch">
    <parameters interfaceid="2b0d3668-0f72-4522-91dd-59ee8d9c6f6c"/>
  </virtualport>
</interface>
 detach_device /usr/lib/python3.6/site-packages/nova/virt/libvirt/guest.py:506
2021-04-29 05:18:23.253 7 DEBUG ovsdbapp.backend.ovs_idl.vlog [-] [POLLIN] on fd 25 __log_wakeup /usr/lib64/python3.6/site-packages/ovs/poller.py:263
2021-04-29 05:18:27.530 7 DEBUG nova.virt.libvirt.guest [req-658c758b-7fa6-4b4e-9079-b3d7d1b28e42 1ae5541c1de04e099e2ef0c804811a22 0b8332dcd5eb4b9f981cfac4e46bf52e - default default] Start retrying detach until device tap2b0d3668-0f is gone. detach_device_with_retry /usr/lib/python3.6/site-packages/nova/virt/libvirt/guest.py:471
2021-04-29 05:18:27.531 7 DEBUG oslo.service.loopingcall [req-658c758b-7fa6-4b4e-9079-b3d7d1b28e42 1ae5541c1de04e099e2ef0c804811a22 0b8332dcd5eb4b9f981cfac4e46bf52e - default default] Waiting for function nova.virt.libvirt.guest.Guest.detach_device_with_retry.<locals>._do_wait_and_retry_detach to return. func /usr/lib/python3.6/site-packages/oslo_service/loopingcall.py:435
2021-04-29 05:18:27.534 7 DEBUG nova.virt.libvirt.guest [-] detach device xml: <interface type="bridge">
  <mac address="fa:16:3e:43:49:68"/>
  <model type="virtio"/>
  <driver name="qemu" rx_queue_size="512"/>
  <source bridge="br-int"/>
  <mtu size="1292"/>
  <target dev="tap2b0d3668-0f"/>
  <virtualport type="openvswitch">
    <parameters interfaceid="2b0d3668-0f72-4522-91dd-59ee8d9c6f6c"/>
  </virtualport>
</interface>
 detach_device /usr/lib/python3.6/site-packages/nova/virt/libvir...
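The excerpt above shows nova's detach_device_with_retry flow: one initial detach, then a looping call that re-sends the detach request until the tap device disappears from the guest. Re-sending while the first unplug is still pending inside the guest is what can produce libvirt's "Device net1 is already in the process of unplug" error from the bug title. A simplified sketch of that retry shape (injected callables are my own abstraction, not nova's real signatures):

```python
import time

class DeviceDetachFailed(Exception):
    pass

def detach_device_with_retry(detach, device_exists, max_retries=7,
                             interval=2, sleep=time.sleep):
    """Sketch of nova's pattern: issue an initial detach, then keep
    re-issuing it until the device is gone from the guest, giving up
    after max_retries attempts."""
    detach()                      # initial detach request to the hypervisor
    for _ in range(max_retries):
        if not device_exists():   # device gone: the unplug completed
            return
        # Re-issuing the detach here is where libvirt can answer
        # "Device ... is already in the process of unplug" if the first
        # request is still pending inside the guest.
        detach()
        sleep(interval)
    if not device_exists():
        return
    raise DeviceDetachFailed("device still attached after retries")
```

If the guest never acknowledges the unplug, the port stays bound, Neutron keeps the IP allocation, and the tempest cleanup hits the SubnetInUse/NetworkInUse conflicts and the 300 s deletion timeout shown in the bug description.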

Revision history for this message
James Slagle (james-slagle) wrote :
Revision history for this message
Lee Yarwood (lyarwood) wrote :
summary: tripleo-ci-centos-8-containers-multinode failing for the tempest
- TestNetworkBasicOps.test_hotplug_nic test
+ TestNetworkBasicOps.test_hotplug_nic test due to Device net1 is already
+ in the process of unplug
wes hayutin (weshayutin)
Changed in tripleo:
status: Triaged → Fix Released