test_shelve_instance and test_shelve_volume_backed_instance fail because of missing /tmp/timestamp

Bug #2051268 reported by Abdullah
8
This bug affects 1 person
Affects Status Importance Assigned to Milestone
tempest
In Progress
Undecided
Martin Kopec

Bug Description

Relevant run: https://solutions.qa.canonical.com/testruns/8befe312-65b9-466c-941e-3081a3d8f950/
Relevant test results: https://oil-jenkins.canonical.com/artifacts/8befe312-65b9-466c-941e-3081a3d8f950/generated/generated/openstack/tempest/results.html

Other artifacts from the run: https://oil-jenkins.canonical.com/artifacts/8befe312-65b9-466c-941e-3081a3d8f950/index.html

Example for test_shelve_instance:
```
2024-01-24 18:09:37,854 139089 INFO [tempest.common.waiters] State transition "SHELVED_OFFLOADED/spawning" ==> "ACTIVE/None" after 58 second wait
2024-01-24 18:09:37,855 139089 INFO [tempest.lib.common.ssh] Creating ssh connection to '10.246.182.105:22' as 'ubuntu' with public key authentication
2024-01-24 18:09:45,122 139089 WARNING [tempest.lib.common.ssh] Failed to establish authenticated ssh connection to ubuntu@10.246.182.105 ([Errno None] Unable to connect to port 22 on 10.246.182.105). Number attempts: 1. Retry after 2 seconds.
2024-01-24 18:09:47,899 139089 INFO [tempest.lib.common.ssh] ssh connection to ubuntu@10.246.182.105 successfully created
2024-01-24 18:09:47,900 139089 DEBUG [tempest.lib.common.utils.linux.remote_client] Remote command: set -eu -o pipefail; PATH=$PATH:/sbin:/usr/sbin; sudo cat /tmp/timestamp
2024-01-24 18:09:47,901 139089 INFO [tempest.lib.common.ssh] Creating ssh connection to '10.246.182.105:22' as 'ubuntu' with public key authentication
2024-01-24 18:09:48,108 139089 INFO [tempest.lib.common.ssh] ssh connection to ubuntu@10.246.182.105 successfully created
2024-01-24 18:09:50,318 139089 ERROR [tempest.lib.common.utils.linux.remote_client] (TestShelveInstance:test_shelve_instance) Executing command on 10.246.182.105 failed. Error: Command 'set -eu -o pipefail; PATH=$PATH:/sbin:/usr/sbin; sudo cat /tmp/timestamp' failed, exit status: 1, stderr:
cat: /tmp/timestamp: No such file or directory
```

and at the end:
```
Traceback (most recent call last):
  File "/snap/tempest/192/lib/python3.10/site-packages/tempest/common/utils/__init__.py", line 70, in wrapper
    return f(*func_args, **func_kwargs)
  File "/snap/tempest/192/lib/python3.10/site-packages/tempest/scenario/test_shelve_instance.py", line 120, in test_shelve_volume_backed_instance
    self._create_server_then_shelve_and_unshelve(boot_from_volume=True)
  File "/snap/tempest/192/lib/python3.10/site-packages/tempest/scenario/test_shelve_instance.py", line 101, in _create_server_then_shelve_and_unshelve
    timestamp2 = self.get_timestamp(instance_ip,
  File "/snap/tempest/192/lib/python3.10/site-packages/tempest/scenario/manager.py", line 1163, in get_timestamp
    timestamp = ssh_client.exec_command('sudo cat %s/timestamp'
  File "/snap/tempest/192/lib/python3.10/site-packages/tempest/lib/common/utils/linux/remote_client.py", line 31, in wrapper
    return function(self, *args, **kwargs)
  File "/snap/tempest/192/lib/python3.10/site-packages/tempest/lib/common/utils/linux/remote_client.py", line 115, in exec_command
    return self.ssh_client.exec_command(cmd)
  File "/snap/tempest/192/lib/python3.10/site-packages/tempest/lib/common/ssh.py", line 238, in exec_command
    raise exceptions.SSHExecCommandFailed(
neutron_tempest_plugin.common.utils.SSHExecCommandFailed: Command 'set -eu -o pipefail; PATH=$PATH:/sbin:/usr/sbin; sudo cat /tmp/timestamp' failed, exit status: 1, stderr:
cat: /tmp/timestamp: No such file or directory
```

Tags: cdo-qa
Revision history for this message
Martin Kopec (mkopec) wrote :

Those tests create the timestamp on the server before reaching to this point ... if the file wasn't there, it means it wasn't created in the first place - that's the bug we need to investigate, however, from these logs we can't tell what went wrong when the file was supposed to be created (I can't open the links you shared, maybe they've expired already).

Revision history for this message
Jeffrey Chang (modern911) wrote :
Download full text (11.5 KiB)

Trying to post more log from same run

2024-01-24 18:10:52,234 139089 INFO [tempest.lib.common.ssh] Creating ssh connection to '10.246.182.211:22' as 'ubuntu' with public key authentication
2024-01-24 18:11:23,786 139089 INFO [tempest.lib.common.ssh] ssh connection to ubuntu@10.246.182.211 successfully created
2024-01-24 18:11:23,787 139089 DEBUG [tempest.lib.common.utils.linux.remote_client] Remote command: set -eu -o pipefail; PATH=$PATH:/sbin:/usr/sbin; sudo sh -c "date > /tmp/timestamp; sync"
2024-01-24 18:11:23,787 139089 INFO [tempest.lib.common.ssh] Creating ssh connection to '10.246.182.211:22' as 'ubuntu' with public key authentication
2024-01-24 18:11:23,966 139089 INFO [tempest.lib.common.ssh] ssh connection to ubuntu@10.246.182.211 successfully created
2024-01-24 18:11:26,386 139089 DEBUG [tempest.lib.common.utils.linux.remote_client] Remote command: set -eu -o pipefail; PATH=$PATH:/sbin:/usr/sbin; sudo cat /tmp/timestamp
2024-01-24 18:11:26,386 139089 INFO [tempest.lib.common.ssh] Creating ssh connection to '10.246.182.211:22' as 'ubuntu' with public key authentication
2024-01-24 18:11:26,434 139089 INFO [tempest.lib.common.ssh] ssh connection to ubuntu@10.246.182.211 successfully created

2024-01-24 18:11:35,795 139089 INFO [tempest.common.waiters] State transition "ACTIVE/shelving" ==> "SHELVED_OFFLOADED/None" after 8 second wait

2024-01-24 18:11:38,907 139089 INFO [tempest.common.waiters] State transition "SHELVED_OFFLOADED/unshelving" ==> "SHELVED_OFFLOADED/spawning" after 2 second wait

2024-01-24 18:11:44,382 139089 INFO [tempest.common.waiters] State transition "SHELVED_OFFLOADED/spawning" ==> "ACTIVE/None" after 8 second wait

2024-01-24 18:11:44,383 139089 INFO [tempest.lib.common.ssh] Creating ssh connection to '10.246.182.211:22' as 'ubuntu' with public key authentication
2024-01-24 18:11:59,778 139089 WARNING [tempest.lib.common.ssh] Failed to establish authenticated ssh connection to ubuntu@10.246.182.211 ([Errno None] Unable to connect to port 22 on 10.246.182.211). Number attempts: 1. Retry after 2 seconds.
2024-01-24 18:12:02,282 139089 WARNING [tempest.lib.common.ssh] Failed to establish authenticated ssh connection to ubuntu@10.246.182.211 ([Errno None] Unable to connect to port 22 on 10.246.182.211). Number attempts: 2. Retry after 3 seconds.
2024-01-24 18:12:06,127 139089 INFO [tempest.lib.common.ssh] ssh connection to ubuntu@10.246.182.211 successfully created
2024-01-24 18:12:06,128 139089 DEBUG [tempest.lib.common.utils.linux.remote_client] Remote command: set -eu -o pipefail; PATH=$PATH:/sbin:/usr/sbin; sudo cat /tmp/timestamp
2024-01-24 18:12:06,129 139089 INFO [tempest.lib.common.ssh] Creating ssh connection to '10.246.182.211:22' as 'ubuntu' with public key authentication
2024-01-24 18:12:06,509 139089 INFO [tempest.lib.common.ssh] ssh connection to ubuntu@10.246.182.211 successfully created
2024-01-24 18:12:07,387 139089 ERROR [tempest.lib.common.utils.linux.remote_client] (TestShelveInstance:test_shelve_volume_backed_instance) Executing command on 10.246.182.211 failed. Error: Command 'set -eu -o pipefail; PATH=$PATH:/sbin:/usr/sbin; s...

Revision history for this message
Jadon Naas (jadonn) wrote :

In the test run for this bug, the test output shows that the file /tmp/timestamp was created successfully before the instance was shelved. Here is the relevant log output:

2024-01-24 18:07:23,371 139089 INFO [tempest.lib.common.ssh] ssh connection to ubuntu@10.246.182.105 successfully created
2024-01-24 18:07:23,372 139089 DEBUG [tempest.lib.common.utils.linux.remote_client] Remote command: set -eu -o pipefail; PATH=$PATH:/sbin:/usr/sbin; sudo sh -c "date > /tmp/timestamp; sync"
2024-01-24 18:07:23,372 139089 INFO [tempest.lib.common.ssh] Creating ssh connection to '10.246.182.105:22' as 'ubuntu' with public key authentication
2024-01-24 18:07:24,102 139089 INFO [tempest.lib.common.ssh] ssh connection to ubuntu@10.246.182.105 successfully created
2024-01-24 18:07:24,620 139089 DEBUG [tempest.lib.common.utils.linux.remote_client] Remote command: set -eu -o pipefail; PATH=$PATH:/sbin:/usr/sbin; sudo cat /tmp/timestamp
2024-01-24 18:07:24,621 139089 INFO [tempest.lib.common.ssh] Creating ssh connection to '10.246.182.105:22' as 'ubuntu' with public key authentication
2024-01-24 18:07:24,798 139089 INFO [tempest.lib.common.ssh] ssh connection to ubuntu@10.246.182.105 successfully created

It looks like when the server is shelved /tmp/ is being cleared out. I replicated this behavior manually using an OpenStack instance with the image at https://cloud-images.ubuntu.com/releases/focal/release/ubuntu-20.04-server-cloudimg-amd64.img. I made the timestamp, confirmed the file was created, shelved the instance, unshelved the instance, and the timestamp file was gone.

I did see that /tmp is located on the device for cloudimg-rootfs. I think that /tmp is getting cleared out because the cloudimg-rootfs is provisioned from the original image again when unshelving the instance.

Revision history for this message
Billy Olsen (billy-olsen) wrote :

Hmm, I think that the shelve/unshelve test is in error. The /tmp directory cannot be expected to persist across reboots of an instance. https://review.opendev.org/c/openstack/tempest/+/884804 changed the timestamp logic to use the /tmp directory, which works for most if not all other usages of this function but will not for the this test case as the instance is started/rebuilt and most linux systems do not expect /tmp to persist across reboots. Systemd has a unit that does /tmp cleaning for scenarios where /tmp is located on a local disk rather than using tmpfs. Some systems use tmpfs which resides in memory only. Ultimately, this should probably have the timestamp written to an alternate location such as $HOME/timestamp rather than /tmp/timestamp.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tempest (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/tempest/+/913757

Changed in tempest:
status: New → In Progress
Revision history for this message
Martin Kopec (mkopec) wrote :

Thank you for investigating and confirming the issue. Let's see if something like this would be acceptable - https://review.opendev.org/c/openstack/tempest/+/913757

Changed in tempest:
assignee: nobody → Martin Kopec (mkopec)
Revision history for this message
Martin Kopec (mkopec) wrote :

I wonder what other location could be used. I tried ~/ but it didn't fly with the upstream jobs - https://review.opendev.org/c/openstack/tempest/+/913757

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.