neutron-tempest-plugin jobs timing out on nested-virt nodes
Bug #1999249 reported by Slawek Kaplonski
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
neutron | Fix Released | Critical | Unassigned |
Bug Description
It seems that since we moved our nested-virt jobs to Ubuntu 22.04 with patch https:/
When such a timeout occurs, (probably) all tests that need to boot a VM and SSH into it fail, because the VMs do not seem to become ready within the allotted time, which is pretty long, e.g. 900 seconds in some cases.
I looked into it and was able to narrow it down further:
- It happens randomly on jammy nested-virt nodes from provider vexxhost-ca-ymq-1 (69 out of 150 runs TIMED_OUT). On other providers I have not seen the failure yet: ovh-gra1 (0/65), ovh-bhs1 (0/78) [3].
- Guest VMs do not boot properly when the issue happens; from the console logs they are just stuck. In one of the logs I noticed that the VM sometimes boots up to the login prompt [1] but SSH times out; maybe it was just slow.
- On the affected nodes, guest VMs boot fine when running with qemu (virt_type=qemu) [2].
- Running libguestfs-test-tool gets stuck at different steps [4] on the affected node, so the issue is outside of OpenStack/Nova.
- Running libguestfs-test-tool with LIBGUESTFS_BACKEND_SETTINGS=force_tcg passes on the affected node.
The issue may be limited to some compute nodes in the vexxhost-ca-ymq-1 provider, since it is not seen on every run; we need help from infra to figure out the root cause.
If more data is required from the L1 host, we can get a node put on hold by running [0].
Until the root cause is known on the provider side, we can temporarily do [2].
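The KVM-versus-TCG comparison described in the bullets above can be scripted so it is easy to repeat on a held node. The following is only a rough sketch: the helper names (`tool_env`, `run_test_tool`, `diagnose`), the 300-second timeout, and the result labels are assumptions made for illustration and are not part of libguestfs or of the original investigation. Only the `libguestfs-test-tool` command and the `LIBGUESTFS_*` environment variables come from this report. The original run used sudo, so the script may need to be run as root.

```python
import os
import subprocess


def tool_env(force_tcg, base_env=None):
    """Build the environment for one libguestfs-test-tool run."""
    env = dict(os.environ if base_env is None else base_env)
    env["LIBGUESTFS_DEBUG"] = "1"
    env["LIBGUESTFS_TRACE"] = "1"
    if force_tcg:
        # Same setting as in the report: force software emulation (TCG).
        env["LIBGUESTFS_BACKEND_SETTINGS"] = "force_tcg"
    else:
        env.pop("LIBGUESTFS_BACKEND_SETTINGS", None)
    return env


def run_test_tool(force_tcg, timeout_s=300, runner=subprocess.run):
    """Run the tool once and return 'passed', 'failed' or 'stuck'.

    timeout_s is an arbitrary illustrative limit, not a libguestfs default.
    """
    try:
        proc = runner(
            ["libguestfs-test-tool"],
            env=tool_env(force_tcg),
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return "stuck"
    return "passed" if proc.returncode == 0 else "failed"


def diagnose(default_result, tcg_result):
    """Interpret the pair of results (hypothetical labels)."""
    if default_result != "passed" and tcg_result == "passed":
        return "hardware-accelerated path suspect"
    if default_result == "passed" and tcg_result == "passed":
        return "not reproduced"
    return "inconclusive"


# Example use on a held node:
#   print(diagnose(run_test_tool(False), run_test_tool(True)))
```

A "stuck" default run combined with a passing `force_tcg` run matches what was observed on the affected nodes, and points at the nested-virt path rather than at Nova or Neutron.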
[0] https://review.opendev.org/c/openstack/neutron-tempest-plugin/+/867609
[1] https://860bd4f2227d1daddb95-ca0266261d8d95f33f1974de7a62fd54.ssl.cf5.rackcdn.com/866489/5/gate/neutron-tempest-plugin-linuxbridge/8decd6e/tmp74ty0f_b
[2] https://review.opendev.org/c/openstack/neutron-tempest-plugin/+/867320/2
[3]
for log in $(curl 'https://zuul.openstack.org/api/builds?job_name=neutron-tempest-plugin-openvswitch&job_name=neutron-tempest-plugin-linuxbridge&job_name=neutron-tempest-plugin-openvswitch-iptables_hybrid&job_name=neutron-tempest-plugin-ovn&result=SUCCESS&limit=225' 2>/dev/null|jq -r .[].log_url); do URL=${log}zuul-info/inventory.yaml && curl -L ${URL} 2>/dev/null|zgrep provider:;done
for log in $(curl 'https://zuul.openstack.org/api/builds?job_name=neutron-tempest-plugin-openvswitch&job_name=neutron-tempest-plugin-linuxbridge&job_name=neutron-tempest-plugin-openvswitch-iptables_hybrid&job_name=neutron-tempest-plugin-ovn&result=TIMED_OUT&limit=70' 2>/dev/null|jq -r .[].log_url); do URL=${log}zuul-info/inventory.yaml && curl -L ${URL} 2>/dev/null|zgrep provider:;done
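The two shell loops in [3] print one provider line per build, and the per-provider counts are then tallied by hand. A possible Python equivalent that does the tally directly is sketched below. It assumes, as the shell version does, that each build entry has a `log_url` ending in `/` and that `zuul-info/inventory.yaml` contains a `provider:` line. The function names are made up for this example, and taking only the first `provider:` match per inventory is a simplification.

```python
import gzip
import json
import re
import urllib.parse
import urllib.request
from collections import Counter

API = "https://zuul.openstack.org/api/builds"
JOBS = [
    "neutron-tempest-plugin-openvswitch",
    "neutron-tempest-plugin-linuxbridge",
    "neutron-tempest-plugin-openvswitch-iptables_hybrid",
    "neutron-tempest-plugin-ovn",
]


def builds_url(result, limit):
    """Build the same query the curl commands in [3] use."""
    params = [("job_name", job) for job in JOBS]
    params += [("result", result), ("limit", str(limit))]
    return API + "?" + urllib.parse.urlencode(params)


def provider_of(inventory_text):
    """Return the first 'provider:' value in an inventory, or None."""
    match = re.search(r"^\s*provider:\s*(\S+)", inventory_text, re.MULTILINE)
    return match.group(1) if match else None


def fetch_text(url):
    """Download a URL, transparently handling gzip-compressed bodies."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        data = resp.read()
    if data[:2] == b"\x1f\x8b":  # gzip magic number
        data = gzip.decompress(data)
    return data.decode("utf-8", errors="replace")


def providers_for(result, limit, fetch=fetch_text):
    """Count builds per provider for a given Zuul result."""
    counts = Counter()
    for build in json.loads(fetch(builds_url(result, limit))):
        log_url = build.get("log_url")
        if not log_url:
            continue  # build without uploaded logs
        inventory = fetch(log_url + "zuul-info/inventory.yaml")
        counts[provider_of(inventory)] += 1
    return counts


# Example use (needs network access):
#   print(providers_for("TIMED_OUT", 70))
#   print(providers_for("SUCCESS", 225))
```

Comparing the two counters per provider gives ratios of the kind quoted in the comment above, such as TIMED_OUT runs out of all runs on a given provider.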
[4]
$ sudo LIBGUESTFS_DEBUG=1 LIBGUESTFS_TRACE=1 libguestfs-test-tool
************************************************************
* IMPORTANT NOTICE
*
* When reporting bugs, include the COMPLETE, UNEDITED
* output below in your bug report.
*
************************************************************
libguestfs: trace: set_verbose true
libguestfs: trace: set_verbose = 0
libguestfs: trace: set_verbose true
libguestfs: trace: set_verbose = 0
LIBGUESTFS_DEBUG=1
LIBGUESTFS_TRACE=1
PATH=/sbin:/usr/sbin:/usr/bin:/bin:/usr/local/sbin:/usr/local/bin
SELinux: sh: 1: getenforce: not found
libguestfs: trace: add_drive_scratch 104857600
libguestfs: trace: get_tmpdir
libguestfs: trace: get_tmpdir = "/tmp"
libguestfs: trace: disk_create "/tmp/libguestfsJuHDwY/scratch1.img" "raw" 104857600
libguestfs: trace: disk_create = 0
libguestfs: trace: add_drive "/tmp/libguestfsJuHDwY/scratch1.img" "format:raw" "cachemode:unsafe"
libguestfs: trace: add_drive = 0
libguestfs: trace: add_drive...