test_cold_migrate_unshelved_instance failing with cat: can't open '/mnt/timestamp': No such file or directory

Bug #1906428 reported by Lee Yarwood on 2020-12-01
This bug affects 2 people
Affects | Status | Importance | Assigned to | Milestone
OpenStack Compute (nova) | Fix Released | High | Alexandre arents |
tempest | Invalid | Undecided | Unassigned |

Bug Description

https://zuul.opendev.org/t/openstack/build/13400ea7d7af4dd88fca244b82301c79/log/job-output.txt#65297

2020-12-01 11:03:11.055150 | controller | 2020-12-01 10:52:58,178 102645 ERROR [tempest.lib.common.utils.linux.remote_client] (TestShelveInstance:test_cold_migrate_unshelved_instance) Executing command on 172.24.5.37 failed. Error: Command 'set -eu -o pipefail; PATH=$PATH:/sbin:/usr/sbin; sudo cat /mnt/timestamp', exit status: 1, stderr:
2020-12-01 11:03:11.055160 | controller | cat: can't open '/mnt/timestamp': No such file or directory

Add related test to Bug #1732428
https://review.opendev.org/c/openstack/tempest/+/743708

Changed in tempest:
status: New → Confirmed
Balazs Gibizer (balazs-gibizer) wrote :

It seems to fail all the time: https://zuul.opendev.org/t/openstack/builds?job_name=nova-multi-cell&branch=master So I am marking this as critical, since it is blocking the nova CI.

Changed in nova:
status: New → Confirmed
importance: Undecided → Critical
tags: added: gate-failure
Balazs Gibizer (balazs-gibizer) wrote :
Changed in nova:
status: Confirmed → In Progress
Lee Yarwood (lyarwood) wrote :

This looks more like an issue with cross cell resize.

The following pastebin shows example qemu-img commands from a failing run where the final cold migration / resize ends up using the original image with a fresh overlay, instead of the snapshot disk from the source host:

http://paste.openstack.org/show/800637/

I'm assuming that we've not passed the snapshot_id correctly as part of the cross cell resize:

https://github.com/openstack/nova/blob/f0efcae6975a99044ef7052453f905f60fcecac6/nova/compute/manager.py#L5906

Skipping the test for now and adding a DNM debug change to troubleshoot this more.
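
To make the suspected failure mode concrete, here is a minimal toy model, not nova code. It assumes the hypothesis above: the cross cell resize takes a snapshot of the source disk, and if the snapshot reference is not used on the destination, the guest is rebuilt from the original image and loses anything written after boot. All class and function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Image:
    """Hypothetical stand-in for an image: a name plus the files it holds."""
    name: str
    files: Dict[str, str] = field(default_factory=dict)


@dataclass
class Instance:
    """Hypothetical stand-in for a guest: its original image and current disk."""
    base_image: Image
    disk: Dict[str, str]


def boot(image: Image) -> Instance:
    # A fresh overlay starts with exactly the contents of the backing image.
    return Instance(base_image=image, disk=dict(image.files))


def cross_cell_migrate(instance: Instance, use_snapshot: bool) -> Instance:
    # The source disk is captured as a snapshot image before the move.
    snapshot = Image("migration-snapshot", dict(instance.disk))
    # Assumed bug: when the snapshot is not used, the destination spawns
    # from the original image with a fresh overlay instead.
    backing = snapshot if use_snapshot else instance.base_image
    return Instance(base_image=instance.base_image, disk=dict(backing.files))


base = Image("original-image")
server = boot(base)
server.disk["/mnt/timestamp"] = "2020-12-01T10:52:58"

expected = cross_cell_migrate(server, use_snapshot=True)
suspected = cross_cell_migrate(server, use_snapshot=False)

print("/mnt/timestamp" in expected.disk)
print("/mnt/timestamp" in suspected.disk)
```

In this model the second case reproduces the symptom from the job log: the file written before the migration no longer exists on the destination disk.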

Lee Yarwood (lyarwood) wrote :

https://review.opendev.org/c/openstack/nova/+/765141 skips the test in the nova-multi-cell job.

Ghanshyam Mann (ghanshyammann) wrote :

We can either disable it explicitly in the nova-multi-cell job, or disable it in devstack for this job, but the latter needs a new job variable: https://review.opendev.org/c/openstack/nova/+/765141
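
As a rough illustration of what excluding a single test by pattern means, here is a small sketch. It is not the job configuration from the review above; the pattern variable, the helper, and the `example.` test ID prefix are hypothetical, and only the class and test names come from the log in the bug description.

```python
import re

# Hypothetical exclusion pattern; only the class and test names are taken
# from the failing log line.
EXCLUDE = re.compile(r"TestShelveInstance\.test_cold_migrate_unshelved_instance")


def select_tests(test_ids, exclude=EXCLUDE):
    """Return the test IDs that do not match the exclusion pattern."""
    return [test_id for test_id in test_ids if not exclude.search(test_id)]


# Illustrative test IDs, not a real test listing.
candidates = [
    "example.TestShelveInstance.test_cold_migrate_unshelved_instance",
    "example.TestShelveInstance.test_shelve_instance",
]

print(select_tests(candidates))
```

Matching on the class and method name keeps the exclusion narrow, so the other tests in the same class keep running in the job.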

Alexandre arents (aarents) wrote :

I agree with Lee that this is more of a bug in nova:
https://review.opendev.org/c/openstack/nova/+/765561
The tempest test is correct and reveals the issue.

Changed in nova:
assignee: nobody → Alexandre arents (aarents)
Balazs Gibizer (balazs-gibizer) wrote :

The change disabling the test has merged: https://review.opendev.org/c/openstack/nova/+/765141

Changed in nova:
importance: Critical → High
Martin Kopec (mkopec) wrote :

Gerrit is again not updating bug statuses automatically. This is supposed to be fixed for nova by https://review.opendev.org/c/openstack/nova/+/765561

Changed in nova:
status: In Progress → Fix Released
Martin Kopec (mkopec) wrote :

Based on the discussion above, it was agreed that the bug was on the nova side (and has already been fixed), so I am marking this as Invalid for Tempest. Feel free to correct me.

Changed in tempest:
status: Confirmed → Invalid