I've noticed this happening frequently lately, causing CI gate failures. Example:
2023-06-22 19:45:50,760 93264 INFO [tempest.lib.common.ssh] Creating ssh connection to '172.24.5.17:22' as 'cirros' with public key authentication
2023-06-22 19:45:50,803 93264 INFO [paramiko.transport] Connected (version 2.0, client dropbear_2020.81)
2023-06-22 19:45:51,004 93264 INFO [paramiko.transport] Authentication (publickey) successful!
2023-06-22 19:45:51,005 93264 INFO [tempest.lib.common.ssh] ssh connection to cirros@172.24.5.17 successfully created
2023-06-22 19:45:51,005 93264 DEBUG [tempest.lib.common.utils.linux.remote_client] Remote command: set -eu -o pipefail; PATH=$PATH:/sbin:/usr/sbin; lsblk -lb --nodeps
2023-06-22 19:45:51,005 93264 INFO [tempest.lib.common.ssh] Creating ssh connection to '172.24.5.17:22' as 'cirros' with public key authentication
2023-06-22 19:45:51,032 93264 INFO [paramiko.transport] Connected (version 2.0, client dropbear_2020.81)
2023-06-22 19:45:51,222 93264 INFO [paramiko.transport] Authentication (publickey) successful!
2023-06-22 19:45:51,222 93264 INFO [tempest.lib.common.ssh] ssh connection to cirros@172.24.5.17 successfully created
2023-06-22 19:45:51,492 93264 DEBUG [tempest.lib.common.utils.test_utils] Call _wait_for_volume_available_on_system returns true in 0.486690 seconds
2023-06-22 19:45:51,492 93264 DEBUG [tempest.lib.common.utils.linux.remote_client] Remote command: set -eu -o pipefail; PATH=$PATH:/sbin:/usr/sbin; lsblk -lb --nodeps
2023-06-22 19:45:51,492 93264 INFO [tempest.lib.common.ssh] Creating ssh connection to '172.24.5.17:22' as 'cirros' with public key authentication
2023-06-22 19:45:51,523 93264 INFO [paramiko.transport] Connected (version 2.0, client dropbear_2020.81)
2023-06-22 19:45:51,708 93264 INFO [paramiko.transport] Authentication (publickey) successful!
2023-06-22 19:45:51,709 93264 INFO [tempest.lib.common.ssh] ssh connection to cirros@172.24.5.17 successfully created
2023-06-22 19:45:51,922 93264 INFO [tempest.lib.common.ssh] Creating ssh connection to '172.24.5.17:22' as 'cirros' with public key authentication
2023-06-22 19:45:51,957 93264 INFO [paramiko.transport] Connected (version 2.0, client dropbear_2020.81)
2023-06-22 19:45:52,130 93264 INFO [paramiko.transport] Authentication (publickey) successful!
2023-06-22 19:45:52,130 93264 INFO [tempest.lib.common.ssh] ssh connection to cirros@172.24.5.17 successfully created
2023-06-22 19:45:52,131 93264 DEBUG [tempest.lib.common.utils.linux.remote_client] Remote command: set -eu -o pipefail; PATH=$PATH:/sbin:/usr/sbin; sudo mke2fs -t ext4 /dev/vdb
2023-06-22 19:45:52,131 93264 INFO [tempest.lib.common.ssh] Creating ssh connection to '172.24.5.17:22' as 'cirros' with public key authentication
2023-06-22 19:45:52,155 93264 INFO [paramiko.transport] Connected (version 2.0, client dropbear_2020.81)
2023-06-22 19:45:52,335 93264 INFO [paramiko.transport] Authentication (publickey) successful!
2023-06-22 19:45:52,336 93264 INFO [tempest.lib.common.ssh] ssh connection to cirros@172.24.5.17 successfully created
2023-06-22 19:49:08,382 93264 ERROR [tempest.lib.common.utils.linux.remote_client] (TestStampPattern:test_stamp_pattern) Executing command on 172.24.5.17 failed. Error: Request timed out
Details: Command: 'set -eu -o pipefail; PATH=$PATH:/sbin:/usr/sbin; sudo mke2fs -t ext4 /dev/vdb' executed on host '172.24.5.17'.
[...]
Traceback (most recent call last):
  File "/opt/stack/tempest/tempest/lib/common/utils/test_utils.py", line 87, in call_and_ignore_notfound_exc
    return func(*args, **kwargs)
  File "/opt/stack/tempest/tempest/scenario/manager.py", line 932, in nova_volume_detach
    waiters.wait_for_volume_attachment_remove_from_server(
  File "/opt/stack/tempest/tempest/common/waiters.py", line 416, in wait_for_volume_attachment_remove_from_server
    raise lib_exc.TimeoutException(message)
tempest.lib.exceptions.TimeoutException: Request timed out
Details: Volume a24921ea-8b2e-4b55-9044-e3e41117f715 failed to detach from server 84e2b9bd-2243-40c2-aaaa-1eb533b07f48 within the required time (196 s) from the compute API perspective
Traceback (most recent call last):
  File "/opt/stack/tempest/tempest/common/utils/__init__.py", line 70, in wrapper
    return f(*func_args, **func_kwargs)
  File "/opt/stack/tempest/tempest/scenario/test_stamp_pattern.py", line 104, in test_stamp_pattern
    timestamp = self.create_timestamp(ip_for_server,
  File "/opt/stack/tempest/tempest/scenario/manager.py", line 1144, in create_timestamp
    ssh_client.make_fs(dev_name, fs=fs)
  File "/opt/stack/tempest/tempest/common/utils/linux/remote_client.py", line 188, in make_fs
    self.exec_command(cmd_mkfs)
  File "/opt/stack/tempest/tempest/lib/common/utils/linux/remote_client.py", line 31, in wrapper
    return function(self, *args, **kwargs)
  File "/opt/stack/tempest/tempest/lib/common/utils/linux/remote_client.py", line 115, in exec_command
    return self.ssh_client.exec_command(cmd)
  File "/opt/stack/tempest/tempest/lib/common/ssh.py", line 207, in exec_command
    raise exceptions.TimeoutException(
tempest.lib.exceptions.TimeoutException: Request timed out
Details: Command: 'set -eu -o pipefail; PATH=$PATH:/sbin:/usr/sbin; sudo mke2fs -t ext4 /dev/vdb' executed on host '172.24.5.17'.
The test times out while create_timestamp runs mke2fs on the guest over ssh [1]; the default command timeout is 300 seconds [2]:
def create_timestamp(self, ip_address, dev_name=None, mount_path='/mnt',
                     private_key=None, server=None, username=None,
                     fs='ext4'):
    """Creates timestamp

    This wrapper utility does ssh, creates timestamp and returns the
    created timestamp.
    """
    ssh_client = self.get_remote_client(ip_address,
                                        private_key=private_key,
                                        server=server,
                                        username=username)
    if dev_name is not None:
        ssh_client.make_fs(dev_name, fs=fs)
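For context, tempest's SSH client enforces that timeout roughly as follows. This is a simplified sketch of the logic in tempest/lib/common/ssh.py, not the actual code (which uses a poll loop and more bookkeeping), and exec_with_timeout is a name I made up for illustration:

import select
import time

import paramiko


def exec_with_timeout(host, username, key_file, cmd, timeout=300):
    # Simplified sketch of tempest.lib.common.ssh.Client.exec_command():
    # run cmd over SSH and give up if no exit status arrives within timeout.
    client = paramiko.SSHClient()
    client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
    client.connect(host, username=username, key_filename=key_file)
    try:
        channel = client.get_transport().open_session()
        channel.exec_command(cmd)
        start = time.time()
        out = b''
        while True:
            ready, _, _ = select.select([channel], [], [], 1)
            if ready and channel.recv_ready():
                out += channel.recv(4096)
            if channel.exit_status_ready():
                return channel.recv_exit_status(), out
            if time.time() - start > timeout:
                # This is the point where tempest raises TimeoutException:
                # the command (mke2fs here) is still running on the guest
                # when the client stops waiting.
                raise TimeoutError('Request timed out: %s' % cmd)
    finally:
        client.close()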
Along with the mke2fs command timeout, I also see in the n-cpu log that test cleanup fails to detach the volume that mke2fs was (or still is) running against. Example [3]:
Jun 22 19:52:32.830244 np0034409895 nova-compute[40324]: ERROR oslo_messaging.rpc.server File "/opt/stack/nova/nova/virt/block_device.py", line 379, in driver_detach
Jun 22 19:52:32.830244 np0034409895 nova-compute[40324]: ERROR oslo_messaging.rpc.server virt_driver.detach_volume(context, connection_info, instance, mp,
Jun 22 19:52:32.830244 np0034409895 nova-compute[40324]: ERROR oslo_messaging.rpc.server File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 2705, in detach_volume
Jun 22 19:52:32.830244 np0034409895 nova-compute[40324]: ERROR oslo_messaging.rpc.server self._detach_with_retry(
Jun 22 19:52:32.830244 np0034409895 nova-compute[40324]: ERROR oslo_messaging.rpc.server File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 2445, in _detach_with_retry
Jun 22 19:52:32.830244 np0034409895 nova-compute[40324]: ERROR oslo_messaging.rpc.server self._detach_from_live_with_retry(
Jun 22 19:52:32.830244 np0034409895 nova-compute[40324]: ERROR oslo_messaging.rpc.server File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 2524, in _detach_from_live_with_retry
Jun 22 19:52:32.830244 np0034409895 nova-compute[40324]: ERROR oslo_messaging.rpc.server raise exception.DeviceDetachFailed(
Jun 22 19:52:32.830244 np0034409895 nova-compute[40324]: ERROR oslo_messaging.rpc.server nova.exception.DeviceDetachFailed: Device detach failed for vdb: Run out of retry while detaching device vdb with device alias virtio-disk1 from instance 84e2b9bd-2243-40c2-aaaa-1eb533b07f48 from the live domain config. Device is still attached to the guest.
Jun 22 19:52:32.830244 np0034409895 nova-compute[40324]: ERROR oslo_messaging.rpc.server
I presume the volume detach fails because mke2fs is still running against, or hung on, the volume.
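If that's what's happening, one way to confirm it would be to ask the guest whether mke2fs is still alive right before cleanup detaches the volume. A hypothetical debugging helper (mkfs_still_running is my own name, not existing tempest code; CirrOS runs BusyBox, hence the plain ps invocation):

def mkfs_still_running(ssh_client, dev_name='vdb'):
    # ssh_client is the tempest remote client used in create_timestamp above.
    # '[m]ke2fs' keeps grep from matching its own process entry, and
    # '|| true' keeps the exit status zero when there is no match, so the
    # remote client does not raise SSHExecCommandFailed.
    cmd = "ps w | grep '[m]ke2fs .*%s' || true" % dev_name
    return bool(ssh_client.exec_command(cmd).strip())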
[1] https://github.com/openstack/tempest/blob/2a68b13/tempest/scenario/manager.py#L1144
[2] https://github.com/openstack/tempest/blob/2a68b13/tempest/lib/common/ssh.py#L207
[3] https://zuul.opendev.org/t/openstack/build/a0942015a536404799c26d95163a5374/log/compute1/logs/screen-n-cpu.txt#62841
I tried a patch (https://review.opendev.org/c/openstack/tempest/+/886991) a couple of weeks ago and CI failed on it with the same mke2fs timeout, so it didn't appear to have any effect 🫤 I can't think of any other reason it could be timing out ... could it be something inherently wrong with the volume itself? SSH succeeds, but then the mke2fs command times out.
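If the volume itself is the suspect, one more data point would be to time a small raw write to the device right before mke2fs runs. A hypothetical snippet along these lines (probe_device_write is my own name, and this assumes the guest's BusyBox dd supports conv=fsync):

import time


def probe_device_write(ssh_client, dev_name='vdb', mb=8):
    # Time a small direct write to the block device; if this alone takes
    # tens of seconds, the slowness is in the volume, not in mke2fs.
    cmd = ('sudo dd if=/dev/zero of=/dev/%s bs=1M count=%d conv=fsync'
           % (dev_name, mb))
    start = time.time()
    output = ssh_client.exec_command(cmd)
    return time.time() - start, output

If the raw write completes quickly but mke2fs still hangs, that would point at something mke2fs-specific (e.g. discard/trim behaviour on the device) rather than raw write speed.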