tempest

create_timestamp times out and fails on Command: 'set -eu -o pipefail; PATH=$PATH:/sbin:/usr/sbin; sudo mke2fs -t ext4 /dev/vdb'

Bug #2024859 reported by melanie witt on 2023-06-22

This bug affects 1 person

Affects		Status	Importance	Assigned to	Milestone
	tempest	Confirmed	Undecided	Unassigned

Bug Description

Noticed this frequently lately causing CI gate failures, example:

2023-06-22 19:45:50,760 93264 INFO 2023-06-22 19:45:50,803 93264 INFO 2023-06-22 19:45:51,004 93264 INFO 2023-06-22 19:45:51,005 93264 INFO 2023-06-22 19:45:51,005 93264 DEBUG 2023-06-22 19:45:51,005 93264 INFO 2023-06-22 19:45:51,032 93264 INFO 2023-06-22 19:45:51,222 93264 INFO 2023-06-22 19:45:51,222 93264 INFO 2023-06-22 19:45:51,492 93264 DEBUG 2023-06-22 19:45:51,492 93264 DEBUG 2023-06-22 19:45:51,492 93264 INFO 2023-06-22 19:45:51,523 93264 INFO 2023-06-22 19:45:51,708 93264 INFO 2023-06-22 19:45:51,709 93264 INFO 2023-06-22 19:45:51,922 93264 INFO 2023-06-22 19:45:51,957 93264 INFO 2023-06-22 19:45:52,130 93264 INFO 2023-06-22 19:45:52,130 93264 INFO 2023-06-22 19:45:52,131 93264 DEBUG 2023-06-22 19:45:52,131 93264 INFO 2023-06-22 19:45:52,155 93264 INFO 2023-06-22 19:45:52,335 93264 INFO 2023-06-22 19:45:52,336 93264 INFO 2023-06-22 19:49:08,382 93264 ERROR Details: Command: 'set [tempest.lib.common.ssh] Creating ssh connection to '172.24.5.17:22' as 'cirros' with public key authentication
[paramiko.transport] Connected (version 2.0, client dropbear_2020.81)
[paramiko.transport] Authentication (publickey) successful!
[tempest.lib.common.ssh] ssh connection to cirros@172.24.5.17 successfully created
[tempest.lib.common.utils.linux.remote_client] Remote command: set -eu -o pipefail; PATH=$PATH:/sbin:/usr/sbin; lsblk -lb --nodeps
[tempest.lib.common.ssh] Creating ssh connection to '172.24.5.17:22' as 'cirros' with public key authentication
[paramiko.transport] Connected (version 2.0, client dropbear_2020.81)
[paramiko.transport] Authentication (publickey) successful!
[tempest.lib.common.ssh] ssh connection to cirros@172.24.5.17 successfully created
[tempest.lib.common.utils.test_utils] Call _wait_for_volume_available_on_system returns true in 0.486690 seconds
[tempest.lib.common.utils.linux.remote_client] Remote command: set -eu -o pipefail; PATH=$PATH:/sbin:/usr/sbin; lsblk -lb --nodeps
[tempest.lib.common.ssh] Creating ssh connection to '172.24.5.17:22' as 'cirros' with public key authentication
[paramiko.transport] Connected (version 2.0, client dropbear_2020.81)
[paramiko.transport] Authentication (publickey) successful!
[tempest.lib.common.ssh] ssh connection to cirros@172.24.5.17 successfully created
[tempest.lib.common.ssh] Creating ssh connection to '172.24.5.17:22' as 'cirros' with public key authentication
[paramiko.transport] Connected (version 2.0, client dropbear_2020.81)
[paramiko.transport] Authentication (publickey) successful!
[tempest.lib.common.ssh] ssh connection to cirros@172.24.5.17 successfully created
[tempest.lib.common.utils.linux.remote_client] Remote command: set -eu -o pipefail; PATH=$PATH:/sbin:/usr/sbin; sudo mke2fs -t ext4 /dev/vdb
[tempest.lib.common.ssh] Creating ssh connection to '172.24.5.17:22' as 'cirros' with public key authentication
[paramiko.transport] Connected (version 2.0, client dropbear_2020.81)
[paramiko.transport] Authentication (publickey) successful!
[tempest.lib.common.ssh] ssh connection to cirros@172.24.5.17 successfully created
[tempest.lib.common.utils.linux.remote_client] (TestStampPattern:test_stamp_pattern) Executing command on 172.24.5.17 failed. Error: Request timed out
-eu -o pipefail; PATH=$PATH:/sbin:/usr/sbin; sudo mke2fs -t ext4 /dev/vdb' executed on host '172.24.5.17'.

[...]

Traceback (most recent call last):
  File "/opt/stack/tempest/tempest/lib/common/utils/test_utils.py", line 87, in call_and_ignore_notfound_exc
    return func(*args, **kwargs)
  File "/opt/stack/tempest/tempest/scenario/manager.py", line 932, in nova_volume_detach
    waiters.wait_for_volume_attachment_remove_from_server(
  File "/opt/stack/tempest/tempest/common/waiters.py", line 416, in wait_for_volume_attachment_remove_from_server
    raise lib_exc.TimeoutException(message)
tempest.lib.exceptions.TimeoutException: Request timed out
Details: Volume a24921ea-8b2e-4b55-9044-e3e41117f715 failed to detach from server 84e2b9bd-2243-40c2-aaaa-1eb533b07f48 within the required time (196 s) from the compute API perspective
}}}

Traceback (most recent call last):
  File "/opt/stack/tempest/tempest/common/utils/__init__.py", line 70, in wrapper
    return f(*func_args, **func_kwargs)
  File "/opt/stack/tempest/tempest/scenario/test_stamp_pattern.py", line 104, in test_stamp_pattern
    timestamp = self.create_timestamp(ip_for_server,
  File "/opt/stack/tempest/tempest/scenario/manager.py", line 1144, in create_timestamp
    ssh_client.make_fs(dev_name, fs=fs)
  File "/opt/stack/tempest/tempest/common/utils/linux/remote_client.py", line 188, in make_fs
    self.exec_command(cmd_mkfs)
  File "/opt/stack/tempest/tempest/lib/common/utils/linux/remote_client.py", line 31, in wrapper
    return function(self, *args, **kwargs)
  File "/opt/stack/tempest/tempest/lib/common/utils/linux/remote_client.py", line 115, in exec_command
    return self.ssh_client.exec_command(cmd)
  File "/opt/stack/tempest/tempest/lib/common/ssh.py", line 207, in exec_command
    raise exceptions.TimeoutException(
tempest.lib.exceptions.TimeoutException: Request timed out
Details: Command: 'set -eu -o pipefail; PATH=$PATH:/sbin:/usr/sbin; sudo mke2fs -t ext4 /dev/vdb' executed on host '172.24.5.17'.

The test times out after create_timestamp runs mke2fs on the guest over ssh [1], where the default timeout is 300 seconds [2]:

    def create_timestamp(self, ip_address, dev_name=None, mount_path='/mnt',
                         private_key=None, server=None, username=None,
                         fs='ext4'):
        """Creates timestamp

        This wrapper utility does ssh, creates timestamp and returns the
        created timestamp.
        """
        ssh_client = self.get_remote_client(ip_address,
                                            private_key=private_key,
                                            server=server,
                                            username=username)

if dev_name is not None:
ssh_client.make_fs(dev_name, fs=fs)

Along with the mke2fs command timeout, I also see in the n-cpu log a failure to detach during test cleanup the volume mke2fs was/is running on, example [3]:

Jun 22 19:52:32.830244 np0034409895 nova-compute[40324]: ERROR oslo_messaging.rpc.server File "/opt/stack/nova/nova/virt/block_device.py", line 379, in driver_detach
Jun 22 19:52:32.830244 np0034409895 nova-compute[40324]: ERROR oslo_messaging.rpc.server virt_driver.detach_volume(context, connection_info, instance, mp,
Jun 22 19:52:32.830244 np0034409895 nova-compute[40324]: ERROR oslo_messaging.rpc.server File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 2705, in detach_volume
Jun 22 19:52:32.830244 np0034409895 nova-compute[40324]: ERROR oslo_messaging.rpc.server self._detach_with_retry(
Jun 22 19:52:32.830244 np0034409895 nova-compute[40324]: ERROR oslo_messaging.rpc.server File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 2445, in _detach_with_retry
Jun 22 19:52:32.830244 np0034409895 nova-compute[40324]: ERROR oslo_messaging.rpc.server self._detach_from_live_with_retry(
Jun 22 19:52:32.830244 np0034409895 nova-compute[40324]: ERROR oslo_messaging.rpc.server File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 2524, in _detach_from_live_with_retry
Jun 22 19:52:32.830244 np0034409895 nova-compute[40324]: ERROR oslo_messaging.rpc.server raise exception.DeviceDetachFailed(
Jun 22 19:52:32.830244 np0034409895 nova-compute[40324]: ERROR oslo_messaging.rpc.server nova.exception.DeviceDetachFailed: Device detach failed for vdb: Run out of retry while detaching device vdb with device alias virtio-disk1 from instance 84e2b9bd-2243-40c2-aaaa-1eb533b07f48 from the live domain config. Device is still attached to the guest.
Jun 22 19:52:32.830244 np0034409895 nova-compute[40324]: ERROR oslo_messaging.rpc.server

I'm presume that the volume detach fails because mke2fs is still running or hanging on the volume.

[1] https://github.com/openstack/tempest/blob/2a68b13/tempest/scenario/manager.py#L1144
[2] https://github.com/openstack/tempest/blob/2a68b13/tempest/lib/common/ssh.py#L207
[3] https://zuul.opendev.org/t/openstack/build/a0942015a536404799c26d95163a5374/log/compute1/logs/screen-n-cpu.txt#62841

Revision history for this message

melanie witt (melwitt) wrote on 2023-07-11:

I tried a patch https://review.opendev.org/c/openstack/tempest/+/886991 a couple of weeks ago and CI failed on it with the same mke2fs timeout, so it didn't appear to have any effect 🫤 I couldn't think of what other reason it could be timing out ... it must be something inherently wrong in the volume itself? SSH succeeds but then the mke2fs command times out.

Martin Kopec (mkopec) on 2023-07-27

Changed in tempest:
status:	New → Confirmed

Revision history for this message

Martin Kopec (mkopec) wrote on 2023-07-28:

This may help here:
https://review.opendev.org/c/openstack/tempest/+/884804

Revision history for this message

melanie witt (melwitt) wrote on 2023-07-31:

Opensearch query:

https://opensearch.logs.openstack.org/_dashboards/app/discover?security_tenant=global#/?_g=(filters:!(),refreshInterval:(pause:!t,value:0),time:(from:now-30d,to:now))&_a=(columns:!(build_branch,build_change,build_name,project,build_status,build_queue,filename),filters:!(),index:'94869730-aea8-11ec-9e6a-83741af3fdcd',interval:d,query:(language:lucene,query:'message:%22Details:%20Command:%20!'set%20-eu%20-o%20pipefail;%20PATH%3D$PATH:%2Fsbin:%2Fusr%2Fsbin;%20sudo%20mke2fs%20-t%20ext4%22%20AND%20filename:tempest_log.txt'),sort:!())

A variety of jobs are affected in the nova, cinder, and tempest projects, check and gate and periodic queues (though the vast majority are check queue), all on the master branch, all failures

Revision history for this message

Martin Kopec (mkopec) wrote on 2023-08-07:

maybe this one then? https://review.opendev.org/c/openstack/tempest/+/890350

Report a bug

This report contains Public information

Everyone can see this information.

You are

Subscribing...

Edit bug mail

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.