Live migration fails intermittently for boot-from-volume VM with volume stuck at attaching state even after vm deletion

Bug #1904594 reported by Elena Taivan
Affects: StarlingX
Status: Fix Released
Importance: Medium
Assigned to: Elena Taivan
Milestone: (none listed)

Bug Description

Brief Description
-----------------

Several live-migration tests failed silently (the VM did not move to the other host). These VMs were booted from a cinder volume, and the volumes were observed to be stuck in attaching status even after VM deletion.

Severity
--------
Major

Steps to Reproduce
------------------

Create a dedicated VM with 2 vCPUs, a 2G disk, and 1G of RAM, booted from a cinder volume using tis-centos-guest
Live migrate the VM

Expected Behavior
------------------

Live migration passes

Actual Behavior
----------------

Live migration fails intermittently with boot-from-volume vm

Reproducibility
---------------
Intermittent: at least 15 other live migrations with a cinder volume were successful

System Configuration
--------------------
Duplex

Last Pass
---------

Same lab/load

Timestamp/Logs
--------------

[2020-06-13 07:31:41,425] 314 DEBUG MainThread ssh.send :: Send 'openstack --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://keystone.openstack.svc.cluster.local/v3 --os-user-domain-name Default --os-project-domain-name Default --os-identity-api-version 3 --os-interface internal --os-region-name RegionOne flavor create --ram=1024 --disk=2 --vcpus=2 cpu_pol'
[2020-06-13 07:31:43,547] 314 DEBUG MainThread ssh.send :: Send 'openstack --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://keystone.openstack.svc.cluster.local/v3 --os-user-domain-name Default --os-project-domain-name Default --os-identity-api-version 3 --os-interface internal --os-region-name RegionOne flavor set --property hw:mem_page_size=large b4cb31a6-78c0-49f2-9bf2-d282479e3274'

[2020-06-13 07:34:51,875] 314 DEBUG MainThread ssh.send :: Send 'nova --os-username 'tenant1' --os-password 'Li69nux*' --os-project-name tenant1 --os-auth-url http://keystone.openstack.svc.cluster.local/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne boot --poll --boot-volume=132f2598-c3e5-46ce-9997-5a8ce490ba12 --flavor=b4cb31a6-78c0-49f2-9bf2-d282479e3274 --key-name=keypair-tenant1 --nic net-id=d2bf89cc-e1f9-4836-854d-cf5cdc96e752 --nic net-id=6d5d3278-4b84-4dea-b9b8-dabfdc70c31e tenant1-cpu_pol_dedicated_2-12'

[2020-06-13 07:39:00,575] 314 DEBUG MainThread ssh.send :: Send 'nova --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://keystone.openstack.svc.cluster.local/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne live-migration d1c30271-39d5-432d-938a-b7dd33efbc22'

[2020-06-13 07:41:17,587] 314 DEBUG MainThread ssh.send :: Send 'openstack --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://keystone.openstack.svc.cluster.local/v3 --os-user-domain-name Default --os-project-domain-name Default --os-identity-api-version 3 --os-interface internal --os-region-name RegionOne server list --a'

[sysadmin@controller-0 ~(keystone_admin)]$

 [2020-06-13 07:41:19,725] 314 DEBUG MainThread ssh.send :: Send 'openstack --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://keystone.openstack.svc.cluster.local/v3 --os-user-domain-name Default --os-project-domain-name Default --os-identity-api-version 3 --os-interface internal --os-region-name RegionOne volume list --a --long'
[2020-06-13 07:41:21,650] 436 DEBUG MainThread ssh.expect :: Output:
+--------------------------------------+-----------+-----------+------+------------+----------+-------------+------------+
| ID                                   | Name      | Status    | Size | Type       | Bootable | Attached to | Properties |
+--------------------------------------+-----------+-----------+------+------------+----------+-------------+------------+
| 132f2598-c3e5-46ce-9997-5a8ce490ba12 | cpu_pol-5 | attaching | 2    | ceph-store | true     |             |            |
+--------------------------------------+-----------+-----------+------+------------+----------+-------------+------------+
[sysadmin@controller-0 ~(keystone_admin)]$
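
For triage, the stuck state shown above (status "attaching" with nothing in "Attached to") could be detected automatically. The following is a minimal Python sketch, not part of the original test framework. It assumes the volume list is fetched with the client's JSON formatter (for example `openstack volume list --long -f json`) and that the JSON keys match the column headers in the table above. The function names `load_volume_list` and `find_stuck_volumes` are hypothetical.

```python
import json


def load_volume_list(json_text):
    """Parse the JSON text printed by the volume list command (assumed format)."""
    volumes = json.loads(json_text)
    if not isinstance(volumes, list):
        raise ValueError("expected a JSON list of volume records")
    return volumes


def find_stuck_volumes(volumes):
    """Return IDs of volumes in 'attaching' status that have no attachment."""
    stuck = []
    for vol in volumes:
        status = str(vol.get("Status", "")).lower()
        attached = vol.get("Attached to")  # may be "", [], or None when unattached
        if status == "attaching" and not attached:
            stuck.append(vol.get("ID"))
    return stuck
```

Feeding the row from the log above through `find_stuck_volumes` would flag volume 132f2598-c3e5-46ce-9997-5a8ce490ba12. A volume that is briefly in "attaching" during a normal boot would also be flagged, so a real check would likely poll more than once.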

Test Activity
-------------
Regression Testing

OpenStack Infra (hudson-openstack) wrote : Fix merged to config (master)

Reviewed: https://review.opendev.org/763044
Committed: https://opendev.org/starlingx/config/commit/a37d5c2da6acd9cee1bf499eface0ff891eea41f
Submitter: Zuul
Branch: master

commit a37d5c2da6acd9cee1bf499eface0ff891eea41f
Author: Elena Taivan <email address hidden>
Date: Tue Nov 17 16:53:56 2020 +0000

    Fix live-migration for boot-from-volume VM

    The reason why live-migration is failing on AIO-DX
    is because cinder-volume is hanging while running
    'ceph mon dump' command.

    'ceph mon dump' command reads ceph.conf file to get the
    addresses of all the monitors. It leads to delays
    if those addresses don't point to valid monitors.

    On AIO-DX setup, ceph.conf had 3 monitors and this
    caused delays of up to 1 minute when 'ceph mon dump'
    was executed inside cinder pod.

    To get the correct response time for 'ceph mon dump'
    in the cinder pod, a single monitor pointing to the floating
    IP should be used for AIO-DX.

    Closes-bug: 1904594
    Change-Id: I70ecd5c468383358f7566061b65bf11917fe904a
    Signed-off-by: Elena Taivan <email address hidden>
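
To illustrate the cause described in the commit message, here is a rough Python sketch that lists the monitor addresses found in a ceph.conf and probes each one with a short TCP timeout, so that entries not pointing at a live monitor show up quickly. This is not the fix itself (the fix changes which monitors are written to ceph.conf). The sketch assumes the monitors appear on a single `mon_host = ...` (or `mon host = ...`) line and fall back to port 6789 when no port is given. The helper names `parse_mon_hosts` and `probe_monitor` are hypothetical.

```python
import re
import socket
import time

DEFAULT_MON_PORT = 6789  # Ceph's default (msgr v1) monitor port


def parse_mon_hosts(conf_text):
    """Return (host, port) pairs from a 'mon_host = a,b,c' line (assumed layout)."""
    match = re.search(r"^[ \t]*mon[ _]host[ \t]*=[ \t]*(.+)$", conf_text, re.MULTILINE)
    if not match:
        return []
    monitors = []
    for entry in re.split(r"[,\s]+", match.group(1).strip()):
        if not entry:
            continue
        host, sep, port = entry.rpartition(":")
        # Accept "host:port" and "[ipv6]:port"; a bare IPv6 address has no port.
        if sep and port.isdigit() and (host.startswith("[") or ":" not in host):
            monitors.append((host.strip("[]"), int(port)))
        else:
            monitors.append((entry.strip("[]"), DEFAULT_MON_PORT))
    return monitors


def probe_monitor(host, port, timeout=2.0):
    """Return the seconds needed to open a TCP connection, or None on failure."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        return None
```

One way to use it inside the cinder pod would be to read the pod's ceph.conf, call `parse_mon_hosts` on its text, and run `probe_monitor` on each pair. Any monitor that returns `None` is a candidate for the delay seen with 'ceph mon dump'.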

Changed in starlingx:
status: In Progress → Fix Released
Ghada Khalil (gkhalil)
Changed in starlingx:
assignee: nobody → Elena Taivan (etaivan)
tags: added: stx.5.0 stx.config stx.distro.openstack
Changed in starlingx:
importance: Undecided → Medium