scenario10 random tempest failures in check / gate, cloud related

Bug #1861685 reported by wes hayutin
This bug affects 1 person
Affects   Status         Importance   Assigned to   Milestone
tripleo   Fix Released   High         Unassigned

Bug Description

https://0b51250e335a63ae9cd5-42908ff48a299d4a121e7b164480f8bf.ssl.cf5.rackcdn.com/704885/2/gate/tripleo-ci-centos-7-scenario010-standalone/a80439d/logs/undercloud/home/zuul/tempest.log

(cgoncalves) yes, this is a random failure. the amphora (a Nova instance) took way too long to boot (>20 minutes).
https://0b51250e335a63ae9cd5-42908ff48a299d4a121e7b164480f8bf.ssl.cf5.rackcdn.com/704885/2/gate/tripleo-ci-centos-7-scenario010-standalone/a80439d/logs/undercloud/var/log/containers/octavia/worker.log
This sort of problem could be mitigated if VMs ran on KVM (nested virtualization) instead of QEMU/TCG.
Vexxhost, OVH, and Fortnebula have nested virtualization enabled, but the job is configured to use QEMU because the RAX nodepools do not. We could work around that; ping cgoncalves.

Not sure, but this looks like an issue in the octavia tests,
in how they register resources for cleanup? It actually seems that cleanup tries to delete the resource but cannot, since it is in the PENDING_CREATE state, and then its dependencies also fail to be deleted because they are in use:
  a) the test timed out waiting for some resource (LB) to reach ACTIVE
  b) then in cleanup some other resource (not sure what flavors are in octavia) is to be deleted but cannot be, since it is in use (by the resource from a))
2020-01-31 07:03:53 | Body: {"loadbalancer": {"provider": "octavia", ... "provisioning_status": "PENDING_CREATE", ... "flavor_id": "00631b21-a6f0-46f6-8ce5-fcb2c7157330", ... "id": "72df51a3-1b74-476e-a7c8-cf4fbea943bb", ... }}
2020-01-31 07:03:53 | Details: {u'debuginfo': None, u'faultcode': u'Client', u'faultstring': u'Flavor 00631b21-a6f0-46f6-8ce5-fcb2c7157330 is in use and cannot be modified.'}, <traceback object at 0x7f8c362db2d8>), (<class 'tempest.lib.exceptions.Conflict'>, Conflict with state of target resource

but no idea what that second 'failed-to-cleanup' resource (flavor) is ... there is no other mention of that id there
So if octavia really cannot handle deletion of resources in the 'pending_create' state, will cleanup need to wait for it? [potentially indefinitely || leave the resource behind ... could this be a bug in octavia too? or should it never get stuck there?]
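
The "wait, then delete, or give up and leave it behind" trade-off raised above can be sketched as follows. This is only an illustration under stated assumptions: `wait_for_status`, `cleanup_load_balancer`, `WaitTimeout`, and the `get_status` / `delete` callables are hypothetical names, not the tempest or octavia API, and the timeout values are arbitrary.

```python
import time


class WaitTimeout(Exception):
    """Raised when the resource never leaves its transitional state."""


def wait_for_status(get_status, wanted=("ACTIVE", "ERROR"), timeout=600.0,
                    interval=5.0, sleep=time.sleep, clock=time.monotonic):
    """Poll get_status() until it returns one of `wanted` or `timeout` expires.

    `get_status` is a hypothetical callable returning the provisioning
    status string of the load balancer (e.g. "PENDING_CREATE").
    """
    deadline = clock() + timeout
    while True:
        status = get_status()
        if status in wanted:
            return status
        if clock() >= deadline:
            raise WaitTimeout("still %s after %.0fs" % (status, timeout))
        sleep(interval)


def cleanup_load_balancer(get_status, delete, **wait_kwargs):
    """Delete the resource only once it has left PENDING_CREATE.

    Returns True if delete() was called, False if the wait was bounded out
    and the resource was left behind (so the caller can report it instead
    of blocking forever).
    """
    try:
        wait_for_status(get_status, **wait_kwargs)
    except WaitTimeout:
        return False
    delete()
    return True
```

Bounding the wait keeps cleanup from hanging indefinitely, at the cost of occasionally leaving a resource behind; dependent resources (such as the flavor above) would then still fail to delete and need to be reported separately.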

Tags: alert tempest
chandan kumar (chkumar246) wrote :

After 31st jan, the job is passing http://zuul.opendev.org/t/openstack/builds?job_name=tripleo-ci-centos-7-scenario010-standalone&pipeline=gate

If it is still needed, we can look for a proper fix.

chandan kumar (chkumar246) wrote :

copying the conversation from IRC
<cgoncalves> chkumar|rover, hey. so the failure was because Nova took too long to boot a VM. it is not an Octavia issue per se but impact its tests
<cgoncalves> chkumar|rover, my suggestion is to set libvirt type=kvm, cpu_mode=host-passthrough whenever possible (OVH, vexxhost, limestone, fortnebula) and fall back to libvirt type=qemu when not (rackspace)
<cgoncalves> devstack does this today and there is a patch to further improve it: https://review.opendev.org/#/c/703324/
<cgoncalves> a follow-up patch in octavia side is https://review.opendev.org/#/c/702921/
<cgoncalves> more context in the commit message
<chkumar|rover> cgoncalves, great, I will take a look on that
<cgoncalves> chkumar|rover, these two patches are for performance improvements. we'd still need to check what happened in Nova that made the VM not boot up in like +20 minutes
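
The "use kvm where possible, fall back to qemu" suggestion from the conversation above could look roughly like this. It is a minimal sketch, not what devstack or the linked patches actually do: the function name `pick_libvirt_settings` is hypothetical, and treating the presence of `/dev/kvm` on the CI node as the signal for usable nested virtualization is an assumption. The returned keys mirror the Nova `[libvirt]` options `virt_type` and `cpu_mode`.

```python
import os


def pick_libvirt_settings(kvm_device="/dev/kvm", exists=os.path.exists):
    """Choose libvirt settings based on whether KVM appears to be usable.

    Assumption: if the KVM device node exists on the node, hardware-assisted
    (nested) virtualization is available; otherwise fall back to plain
    QEMU/TCG emulation.
    """
    if exists(kvm_device):
        # Providers with nested virtualization enabled.
        return {"virt_type": "kvm", "cpu_mode": "host-passthrough"}
    # Providers without nested virtualization: emulation only.
    return {"virt_type": "qemu"}


if __name__ == "__main__":
    print(pick_libvirt_settings())
```

A check like this would have to run on the node itself at deploy time, since the same job can land on providers with and without nested virtualization.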

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-heat-templates (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/705638

chandan kumar (chkumar246) wrote :

Based on the tht code, from tripleo-heat-templates/ci/environments:
❯ git grep 'nova::compute::libvirt::services::libvirt_virt_type'
multinode-3nodes-registry.yaml: nova::compute::libvirt::services::libvirt_virt_type: qemu
multinode-containers.yaml: nova::compute::libvirt::services::libvirt_virt_type: qemu
ovb-ha.yaml: nova::compute::libvirt::services::libvirt_virt_type: qemu
scenario001-multinode-containers.yaml: nova::compute::libvirt::services::libvirt_virt_type: qemu
scenario002-multinode-containers.yaml: nova::compute::libvirt::services::libvirt_virt_type: qemu
scenario003-multinode-containers.yaml: nova::compute::libvirt::services::libvirt_virt_type: qemu
scenario004-multinode-containers.yaml: nova::compute::libvirt::services::libvirt_virt_type: qemu
scenario007-multinode-containers.yaml: nova::compute::libvirt::services::libvirt_virt_type: qemu
scenario010-multinode-containers.yaml: nova::compute::libvirt::services::libvirt_virt_type: qemu
scenario012-multinode-containers.yaml: nova::compute::libvirt::services::libvirt_virt_type: qemu

This setting is present in the multinode job environments. I am not sure that putting it in the standalone environment is going to work.

chandan kumar (chkumar246) wrote :
wes hayutin (weshayutin)
Changed in tripleo:
milestone: ussuri-2 → ussuri-3
wes hayutin (weshayutin) wrote :
Changed in tripleo:
status: Triaged → Fix Released
OpenStack Infra (hudson-openstack) wrote : Change abandoned on tripleo-heat-templates (master)

Change abandoned by Chandan Kumar (raukadah) (<email address hidden>) on branch: master
Review: https://review.opendev.org/705638
Reason: Not working on it currently, so abandoning it
