create_timestamp times out and fails on Command: 'set -eu -o pipefail; PATH=$PATH:/sbin:/usr/sbin; sudo mke2fs -t ext4 /dev/vdb'

Bug #2024859 reported by melanie witt
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
tempest
Confirmed
Undecided
Unassigned

Bug Description

Noticed this frequently lately causing CI gate failures, example:

2023-06-22 19:45:50,760 93264 INFO [tempest.lib.common.ssh] Creating ssh connection to '172.24.5.17:22' as 'cirros' with public key authentication
2023-06-22 19:45:50,803 93264 INFO [paramiko.transport] Connected (version 2.0, client dropbear_2020.81)
2023-06-22 19:45:51,004 93264 INFO [paramiko.transport] Authentication (publickey) successful!
2023-06-22 19:45:51,005 93264 INFO [tempest.lib.common.ssh] ssh connection to cirros@172.24.5.17 successfully created
2023-06-22 19:45:51,005 93264 DEBUG [tempest.lib.common.utils.linux.remote_client] Remote command: set -eu -o pipefail; PATH=$PATH:/sbin:/usr/sbin; lsblk -lb --nodeps
2023-06-22 19:45:51,005 93264 INFO [tempest.lib.common.ssh] Creating ssh connection to '172.24.5.17:22' as 'cirros' with public key authentication
2023-06-22 19:45:51,032 93264 INFO [paramiko.transport] Connected (version 2.0, client dropbear_2020.81)
2023-06-22 19:45:51,222 93264 INFO [paramiko.transport] Authentication (publickey) successful!
2023-06-22 19:45:51,222 93264 INFO [tempest.lib.common.ssh] ssh connection to cirros@172.24.5.17 successfully created
2023-06-22 19:45:51,492 93264 DEBUG [tempest.lib.common.utils.test_utils] Call _wait_for_volume_available_on_system returns true in 0.486690 seconds
2023-06-22 19:45:51,492 93264 DEBUG [tempest.lib.common.utils.linux.remote_client] Remote command: set -eu -o pipefail; PATH=$PATH:/sbin:/usr/sbin; lsblk -lb --nodeps
2023-06-22 19:45:51,492 93264 INFO [tempest.lib.common.ssh] Creating ssh connection to '172.24.5.17:22' as 'cirros' with public key authentication
2023-06-22 19:45:51,523 93264 INFO [paramiko.transport] Connected (version 2.0, client dropbear_2020.81)
2023-06-22 19:45:51,708 93264 INFO [paramiko.transport] Authentication (publickey) successful!
2023-06-22 19:45:51,709 93264 INFO [tempest.lib.common.ssh] ssh connection to cirros@172.24.5.17 successfully created
2023-06-22 19:45:51,922 93264 INFO [tempest.lib.common.ssh] Creating ssh connection to '172.24.5.17:22' as 'cirros' with public key authentication
2023-06-22 19:45:51,957 93264 INFO [paramiko.transport] Connected (version 2.0, client dropbear_2020.81)
2023-06-22 19:45:52,130 93264 INFO [paramiko.transport] Authentication (publickey) successful!
2023-06-22 19:45:52,130 93264 INFO [tempest.lib.common.ssh] ssh connection to cirros@172.24.5.17 successfully created
2023-06-22 19:45:52,131 93264 DEBUG [tempest.lib.common.utils.linux.remote_client] Remote command: set -eu -o pipefail; PATH=$PATH:/sbin:/usr/sbin; sudo mke2fs -t ext4 /dev/vdb
2023-06-22 19:45:52,131 93264 INFO [tempest.lib.common.ssh] Creating ssh connection to '172.24.5.17:22' as 'cirros' with public key authentication
2023-06-22 19:45:52,155 93264 INFO [paramiko.transport] Connected (version 2.0, client dropbear_2020.81)
2023-06-22 19:45:52,335 93264 INFO [paramiko.transport] Authentication (publickey) successful!
2023-06-22 19:45:52,336 93264 INFO [tempest.lib.common.ssh] ssh connection to cirros@172.24.5.17 successfully created
2023-06-22 19:49:08,382 93264 ERROR [tempest.lib.common.utils.linux.remote_client] (TestStampPattern:test_stamp_pattern) Executing command on 172.24.5.17 failed. Error: Request timed out
Details: Command: 'set -eu -o pipefail; PATH=$PATH:/sbin:/usr/sbin; sudo mke2fs -t ext4 /dev/vdb' executed on host '172.24.5.17'.

[...]

Traceback (most recent call last):
  File "/opt/stack/tempest/tempest/lib/common/utils/test_utils.py", line 87, in call_and_ignore_notfound_exc
    return func(*args, **kwargs)
  File "/opt/stack/tempest/tempest/scenario/manager.py", line 932, in nova_volume_detach
    waiters.wait_for_volume_attachment_remove_from_server(
  File "/opt/stack/tempest/tempest/common/waiters.py", line 416, in wait_for_volume_attachment_remove_from_server
    raise lib_exc.TimeoutException(message)
tempest.lib.exceptions.TimeoutException: Request timed out
Details: Volume a24921ea-8b2e-4b55-9044-e3e41117f715 failed to detach from server 84e2b9bd-2243-40c2-aaaa-1eb533b07f48 within the required time (196 s) from the compute API perspective
}}}

Traceback (most recent call last):
  File "/opt/stack/tempest/tempest/common/utils/__init__.py", line 70, in wrapper
    return f(*func_args, **func_kwargs)
  File "/opt/stack/tempest/tempest/scenario/test_stamp_pattern.py", line 104, in test_stamp_pattern
    timestamp = self.create_timestamp(ip_for_server,
  File "/opt/stack/tempest/tempest/scenario/manager.py", line 1144, in create_timestamp
    ssh_client.make_fs(dev_name, fs=fs)
  File "/opt/stack/tempest/tempest/common/utils/linux/remote_client.py", line 188, in make_fs
    self.exec_command(cmd_mkfs)
  File "/opt/stack/tempest/tempest/lib/common/utils/linux/remote_client.py", line 31, in wrapper
    return function(self, *args, **kwargs)
  File "/opt/stack/tempest/tempest/lib/common/utils/linux/remote_client.py", line 115, in exec_command
    return self.ssh_client.exec_command(cmd)
  File "/opt/stack/tempest/tempest/lib/common/ssh.py", line 207, in exec_command
    raise exceptions.TimeoutException(
tempest.lib.exceptions.TimeoutException: Request timed out
Details: Command: 'set -eu -o pipefail; PATH=$PATH:/sbin:/usr/sbin; sudo mke2fs -t ext4 /dev/vdb' executed on host '172.24.5.17'.

The test times out after create_timestamp runs mke2fs on the guest over ssh [1], where the default timeout is 300 seconds [2]:

    def create_timestamp(self, ip_address, dev_name=None, mount_path='/mnt',
                         private_key=None, server=None, username=None,
                         fs='ext4'):
        """Creates timestamp

        This wrapper utility does ssh, creates timestamp and returns the
        created timestamp.
        """
        ssh_client = self.get_remote_client(ip_address,
                                            private_key=private_key,
                                            server=server,
                                            username=username)

        if dev_name is not None:
            ssh_client.make_fs(dev_name, fs=fs)

Along with the mke2fs command timeout, I also see in the n-cpu log a failure to detach during test cleanup the volume mke2fs was/is running on, example [3]:

Jun 22 19:52:32.830244 np0034409895 nova-compute[40324]: ERROR oslo_messaging.rpc.server File "/opt/stack/nova/nova/virt/block_device.py", line 379, in driver_detach
Jun 22 19:52:32.830244 np0034409895 nova-compute[40324]: ERROR oslo_messaging.rpc.server virt_driver.detach_volume(context, connection_info, instance, mp,
Jun 22 19:52:32.830244 np0034409895 nova-compute[40324]: ERROR oslo_messaging.rpc.server File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 2705, in detach_volume
Jun 22 19:52:32.830244 np0034409895 nova-compute[40324]: ERROR oslo_messaging.rpc.server self._detach_with_retry(
Jun 22 19:52:32.830244 np0034409895 nova-compute[40324]: ERROR oslo_messaging.rpc.server File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 2445, in _detach_with_retry
Jun 22 19:52:32.830244 np0034409895 nova-compute[40324]: ERROR oslo_messaging.rpc.server self._detach_from_live_with_retry(
Jun 22 19:52:32.830244 np0034409895 nova-compute[40324]: ERROR oslo_messaging.rpc.server File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 2524, in _detach_from_live_with_retry
Jun 22 19:52:32.830244 np0034409895 nova-compute[40324]: ERROR oslo_messaging.rpc.server raise exception.DeviceDetachFailed(
Jun 22 19:52:32.830244 np0034409895 nova-compute[40324]: ERROR oslo_messaging.rpc.server nova.exception.DeviceDetachFailed: Device detach failed for vdb: Run out of retry while detaching device vdb with device alias virtio-disk1 from instance 84e2b9bd-2243-40c2-aaaa-1eb533b07f48 from the live domain config. Device is still attached to the guest.
Jun 22 19:52:32.830244 np0034409895 nova-compute[40324]: ERROR oslo_messaging.rpc.server

I'm presume that the volume detach fails because mke2fs is still running or hanging on the volume.

[1] https://github.com/openstack/tempest/blob/2a68b13/tempest/scenario/manager.py#L1144
[2] https://github.com/openstack/tempest/blob/2a68b13/tempest/lib/common/ssh.py#L207
[3] https://zuul.opendev.org/t/openstack/build/a0942015a536404799c26d95163a5374/log/compute1/logs/screen-n-cpu.txt#62841

Revision history for this message
melanie witt (melwitt) wrote :

I tried a patch https://review.opendev.org/c/openstack/tempest/+/886991 a couple of weeks ago and CI failed on it with the same mke2fs timeout, so it didn't appear to have any effect 🫤 I couldn't think of what other reason it could be timing out ... it must be something inherently wrong in the volume itself? SSH succeeds but then the mke2fs command times out.

Martin Kopec (mkopec)
Changed in tempest:
status: New → Confirmed
Revision history for this message
Martin Kopec (mkopec) wrote :
Revision history for this message
melanie witt (melwitt) wrote :
Revision history for this message
Martin Kopec (mkopec) wrote :
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.