tripleo

scenario10 tempest random tempest failures in check / gate, cloud related

Bug #1861685 reported by wes hayutin on 2020-02-03

This bug affects 1 person

Affects		Status	Importance	Assigned to	Milestone
	tripleo	Fix Released	High	Unassigned	tripleo ussuri-3

Bug Description

https://0b51250e335a63ae9cd5-42908ff48a299d4a121e7b164480f8bf.ssl.cf5.rackcdn.com/704885/2/gate/tripleo-ci-centos-7-scenario010-standalone/a80439d/logs/undercloud/home/zuul/tempest.log

(cgoncalves) Yes, this is a random failure. The amphora (a Nova instance) took far too long to boot (>20 minutes).
https://0b51250e335a63ae9cd5-42908ff48a299d4a121e7b164480f8bf.ssl.cf5.rackcdn.com/704885/2/gate/tripleo-ci-centos-7-scenario010-standalone/a80439d/logs/undercloud/var/log/containers/octavia/worker.log
This sort of problem could be mitigated if the VMs ran on KVM (nested virtualization) instead of QEMU/TCG.
Vexxhost, OVH, and Fortnebula have nested virtualization enabled, but the job is configured to use QEMU because the RAX nodepools do not. We could work around that; ping cgoncalves.

Not sure, but this looks like an issue in the Octavia tests, in how they register resources for cleanup. It seems that cleanup tries to delete the resource but cannot, since it is in the PENDING_CREATE state, and then its dependencies also fail to be deleted because they are in use:
a) the test timed out waiting for some resource (the LB) to reach ACTIVE
b) then, in cleanup, some other resource (not sure what flavors are in Octavia) is to be deleted but cannot be, since it is in use (by the resource from a))
2020-01-31 07:03:53 | Body: {"loadbalancer": {"provider": "octavia", ... "provisioning_status": "PENDING_CREATE", ... "flavor_id": "00631b21-a6f0-46f6-8ce5-fcb2c7157330", ... "id": "72df51a3-1b74-476e-a7c8-cf4fbea943bb", ... }}
2020-01-31 07:03:53 | Details: {u'debuginfo': None, u'faultcode': u'Client', u'faultstring': u'Flavor 00631b21-a6f0-46f6-8ce5-fcb2c7157330 is in use and cannot be modified.'}, <traceback object at 0x7f8c362db2d8>), (<class 'tempest.lib.exceptions.Conflict'>, Conflict with state of target resource

But I have no idea what that second 'failed-to-cleanup' resource (the flavor) is; there is no other mention of that ID there.
So if Octavia really cannot handle deletion of resources in the PENDING_CREATE state, will cleanup need to wait for it? [Potentially indefinitely, or leave the resource behind. Could this be a bug in Octavia too? Or should it never get stuck there?]
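The "wait, then clean up in dependency order" idea raised above can be sketched roughly as follows. This is only an illustration, not the actual tempest or Octavia test code: `wait_for_status`, `cleanup_in_order`, `StatusTimeout`, and the callables passed in are hypothetical names, and the sketch assumes a load balancer can only be deleted once it has left PENDING_CREATE (here, reached ACTIVE or ERROR). The timeout bounds the wait, so cleanup cannot block indefinitely.

```python
import time


class StatusTimeout(Exception):
    """Raised when the resource never reaches a deletable state in time."""


def wait_for_status(get_status, wanted=("ACTIVE", "ERROR"),
                    timeout=600.0, interval=5.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Poll get_status() until it returns one of `wanted` or `timeout` expires.

    get_status is a hypothetical callable returning the provisioning
    status of the load balancer as a string.
    """
    deadline = clock() + timeout
    while True:
        status = get_status()
        if status in wanted:
            return status
        if clock() >= deadline:
            raise StatusTimeout(f"still {status} after {timeout}s")
        sleep(interval)


def cleanup_in_order(get_status, delete_lb, delete_flavor, **wait_kwargs):
    """Delete the load balancer first, then the flavor it references.

    Assumption: deleting the flavor while the load balancer still exists
    fails with a Conflict, as in the log excerpt above.
    """
    wait_for_status(get_status, **wait_kwargs)
    delete_lb()
    delete_flavor()
```

If the wait times out, `StatusTimeout` propagates and nothing is deleted, which corresponds to the "leave the resource behind" option and would have to be reported by the caller.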

Revision history for this message

chandan kumar (chkumar246) wrote on 2020-02-04:

#1

Since 31 Jan, the job has been passing: http://zuul.opendev.org/t/openstack/builds?job_name=tripleo-ci-centos-7-scenario010-standalone&pipeline=gate

If it is still needed, we can look for a proper fix.

chandan kumar (chkumar246) wrote on 2020-02-04:

#2

Copying the conversation from IRC:
<cgoncalves> chkumar|rover, hey. so the failure was because Nova took too long to boot a VM. it is not an Octavia issue per se but impacts its tests
<cgoncalves> chkumar|rover, my suggestion is to set libvirt type=kvm, cpu_mode=host-passthrough whenever possible (OVH, vexxhost, limestone, fortnebula) and fall back to libvirt type=qemu when not (rackspace)
<cgoncalves> devstack does this today and there is a patch to further improve it: https://review.opendev.org/#/c/703324/
<cgoncalves> a follow-up patch in octavia side is https://review.opendev.org/#/c/702921/
<cgoncalves> more context in the commit message
<chkumar|rover> cgoncalves, great, I will take a look on that
<cgoncalves> chkumar|rover, these two patches are for performance improvements. we'd still need to check what happened in Nova that made the VM not boot up in like +20 minutes
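The fallback cgoncalves suggests (KVM with host-passthrough where nested virtualization is available, QEMU otherwise) could look roughly like the sketch below. It is not the devstack or TripleO implementation; `kvm_usable` and `libvirt_settings` are hypothetical helpers, and checking for a readable and writable `/dev/kvm` is an assumed heuristic for "KVM is available on this node".

```python
import os


def kvm_usable(dev_path="/dev/kvm"):
    """Return True if the KVM device node exists and is readable/writable.

    Assumption: a usable /dev/kvm means hardware (or nested)
    virtualization is exposed to this node.
    """
    return os.path.exists(dev_path) and os.access(dev_path, os.R_OK | os.W_OK)


def libvirt_settings(kvm_available):
    """Pick libvirt settings following the suggestion in the IRC log."""
    if kvm_available:
        # Providers with nested virtualization (e.g. OVH, vexxhost).
        return {"virt_type": "kvm", "cpu_mode": "host-passthrough"}
    # Fallback for providers without it (e.g. rackspace); cpu_mode is
    # left unset so the default applies.
    return {"virt_type": "qemu"}


if __name__ == "__main__":
    print(libvirt_settings(kvm_usable()))
```

Deciding at run time rather than per provider avoids hard-coding the nodepool list, at the cost of jobs behaving differently depending on where they land.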

OpenStack Infra (hudson-openstack) wrote on 2020-02-04: Related fix proposed to tripleo-heat-templates (master)

#3

Related fix proposed to branch: master
Review: https://review.opendev.org/705638

chandan kumar (chkumar246) wrote on 2020-02-04:

#4

Based on the tripleo-heat-templates (THT) code, in tripleo-heat-templates/ci/environments:
❯ git grep 'nova::compute::libvirt::services::libvirt_virt_type'
multinode-3nodes-registry.yaml: nova::compute::libvirt::services::libvirt_virt_type: qemu
multinode-containers.yaml: nova::compute::libvirt::services::libvirt_virt_type: qemu
ovb-ha.yaml: nova::compute::libvirt::services::libvirt_virt_type: qemu
scenario001-multinode-containers.yaml: nova::compute::libvirt::services::libvirt_virt_type: qemu
scenario002-multinode-containers.yaml: nova::compute::libvirt::services::libvirt_virt_type: qemu
scenario003-multinode-containers.yaml: nova::compute::libvirt::services::libvirt_virt_type: qemu
scenario004-multinode-containers.yaml: nova::compute::libvirt::services::libvirt_virt_type: qemu
scenario007-multinode-containers.yaml: nova::compute::libvirt::services::libvirt_virt_type: qemu
scenario010-multinode-containers.yaml: nova::compute::libvirt::services::libvirt_virt_type: qemu
scenario012-multinode-containers.yaml: nova::compute::libvirt::services::libvirt_virt_type: qemu

This is set in the multinode jobs; I am not sure that putting these settings in standalone is going to work.

chandan kumar (chkumar246) wrote on 2020-02-04:

#5

On the standalone side, it can be changed by setting standalone_libvirt_type in https://opendev.org/openstack/tripleo-quickstart-extras/src/branch/master/roles/standalone/templates/standalone_config.yaml.j2#L32. Also take a look at the devstack hack: https://github.com/openstack/devstack/blob/master/lib/nova#L253-L267

wes hayutin (weshayutin) on 2020-02-10

Changed in tripleo:
milestone:	ussuri-2 → ussuri-3

wes hayutin (weshayutin) wrote on 2020-02-12:

#6

http://zuul.openstack.org/builds?job_name=tripleo-ci-centos-7-scenario010-standalone

Changed in tripleo:
status:	Triaged → Fix Released

OpenStack Infra (hudson-openstack) wrote on 2020-05-07: Change abandoned on tripleo-heat-templates (master)

#7

Change abandoned by Chandan Kumar (raukadah) (<email address hidden>) on branch: master
Review: https://review.opendev.org/705638
Reason: Not working on it currently, so abandoning it
