Neutron VM cold migrate with crypto VF can't reach VERIFY_RESIZE status

Bug #1888171 reported by Yvonne Ding
Affects    Status   Importance  Assigned to  Milestone
StarlingX  Invalid  High        Austin Sun

Bug Description

Brief Description
-----------------
A VM with a crypto VF boots successfully and becomes ACTIVE. When the VM is cold migrated off its compute node, its status never reaches VERIFY_RESIZE and the wait times out.

Severity
--------
Major

Steps to Reproduce
------------------
1. Boot a VM with pci-sriov NICs and flavor flavor_qat_vf_1
2. Cold migrate the VM off its compute node
3. Wait for the VM status to change to VERIFY_RESIZE
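
Step 3 amounts to polling the VM status until it matches an expected value or a deadline passes. The following is a minimal sketch of that wait, not the actual vm_helper code: the names wait_for_vm_status, VMTimeout and get_status are hypothetical, the 300-second timeout is only inferred from the log timestamps in this report, and the 5-second interval is arbitrary.

```python
import time


class VMTimeout(Exception):
    """Raised when the VM does not reach an expected status in time."""


def wait_for_vm_status(get_status, expected=("VERIFY_RESIZE", "ERROR"),
                       timeout=300.0, interval=5.0,
                       clock=time.monotonic, sleep=time.sleep):
    """Poll get_status() until it returns one of `expected`, or time out.

    get_status is a hypothetical zero-argument callable that returns the
    current VM status string (for example, a wrapper around a Nova client).
    clock and sleep are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + timeout
    status = get_status()
    while status not in expected:
        if clock() >= deadline:
            raise VMTimeout(
                "Timed out waiting for vm status: %s. Actual vm status: %s"
                % (list(expected), status))
        sleep(interval)
        status = get_status()
    return status
```

With a status source that always returns ACTIVE, this raises VMTimeout with a message of the same shape as the one in the logs.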

TC-name:
test_ea_vm_with_crypto_vfs

Expected Behavior
-----------------
The VM reaches VERIFY_RESIZE status after the cold migration.

Actual Behavior
----------------
The VM remains in ACTIVE status after the cold migration.

Reproducibility
---------------
reproducible

System Configuration
--------------------
Regular standard 2+2

Lab-name:
wcp_7_10

Branch/Pull Time/Commit
-----------------------
BUILD_ID="r/stx.4.0"

Timestamp/Logs
--------------
[2020-07-17 03:13:02,799] 1535 INFO MainThread vm_helper.cold_migrate_vm:: Cold migrating VM f83041ba-8b16-4c0e-81d5-0ba7f19c8133 from compute-1...
[2020-07-17 03:13:02,799] 1649 DEBUG MainThread ssh.get_active_controller:: Getting active controller client for wcp_7_10
[2020-07-17 03:13:02,800] 479 DEBUG MainThread ssh.exec_cmd:: Executing command...
[2020-07-17 03:13:02,800] 314 DEBUG MainThread ssh.send :: Send 'nova --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://keystone.openstack.svc.cluster.local/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne migrate --poll f83041ba-8b16-4c0e-81d5-0ba7f19c8133'
[2020-07-17 03:13:11,417] 436 DEBUG MainThread ssh.expect :: Output:

Server migrating... 0% complete
Server migrating... 100% complete
Finished
[sysadmin@controller-0 ~(keystone_admin)]$
[2020-07-17 03:13:11,418] 314 DEBUG MainThread ssh.send :: Send 'echo $?'
[2020-07-17 03:13:11,521] 436 DEBUG MainThread ssh.expect :: Output:
0
[sysadmin@controller-0 ~(keystone_admin)]$
[2020-07-17 03:13:11,522] 1544 INFO MainThread vm_helper.cold_migrate_vm:: Waiting for VM status change to VERIFY_RESIZE

......
[2020-07-17 03:18:12,226] 61 DEBUG MainThread conftest.update_results:: ***Failure at test call: /home/svc-cgcsauto/wassp-repos.new/testcases/cgcs/CGCSAuto/keywords/vm_helper.py:1816: utils.exceptions.VMTimeout: VM operation timed out.
***Details: _flavors = {'flavor_none': '8224f108-ddf9-4d43-a93f-03d560e7de0c', 'flavor_qat_vf_1': '2bf5f478-e897-4c82-8b24-00a1d1c7b70d', 'flavor_qat_vf_32': '176b2373-a911-48bc-ad37-851d1b1013d4', 'flavor_qat_vf_33': 'bcf8a88c-4d00-46a6-9914-856eedc76321', ...}
hosts_pci_device_info = {'compute-1': [{'device_id': '0435', 'device_name': 'dh895xcc', 'pci_address': '0000:09:00.0', 'pci_alias': 'qat-dh895xcc-vf', ...}]}

......
> raise exceptions.VMTimeout(err_msg)
E utils.exceptions.VMTimeout: VM operation timed out.
E Details: Timed out waiting for vm status: ['VERIFY_RESIZE', 'ERROR']. Actual vm status: ACTIVE

The log tarball and TIS_AUTOMATION.log are available at:
https://files.starlingx.kube.cengn.ca/launchpad/1888171

Test Activity
-------------

Yvonne Ding (yding)
description: updated
Frank Miller (sensfan22) wrote :

The TC fails, and the failure may be related to the recent upversion to Ussuri. Marking as stx.4.0 gating for now, as cold migration of VMs is main functionality. Assigning to the distro.openstack PL to determine how to proceed.

Changed in starlingx:
status: New → Triaged
importance: Undecided → High
tags: added: stx.4.0 stx.distro.openstack
Changed in starlingx:
assignee: nobody → yong hu (yhu6)
Austin Sun (sunausti)
Changed in starlingx:
assignee: yong hu (yhu6) → Austin Sun (sunausti)
Austin Sun (sunausti) wrote :

[2020-07-17 03:10:06,434] Booting VM tenant1-vm_with_pci_device-1 "pci_passthrough:alias": "qat-dh895xcc-vf:1"
[2020-07-17 03:10:43,146] tenant1-vm_with_pci_device-1 is running (tenant1-mgmt-net=192.168.121.80)
[2020-07-17 03:10:53,469] ssh connect to 192.168.121.80 and check QAT devices
[2020-07-17 03:13:02,799] start cold migration of tenant1-vm_with_pci_device-1
[2020-07-17 03:13:11,522] check status and wait for VM status to change to VERIFY_RESIZE
[2020-07-17 03:18:12,226] test script returns failure

The Nova compute log from compute-1 starts at 2020-07-17T03:21:04, after the test had already failed, so it is of no use for this issue.
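
The timestamp comparison behind this remark can be checked mechanically. This is a small sketch using the three timestamps quoted in this comment; the helper names parse_ts and log_overlaps_window are made up for illustration, and the two timestamp formats are assumed to be exactly the ones shown here.

```python
from datetime import datetime


def parse_ts(text):
    """Parse 'YYYY-MM-DD HH:MM:SS,mmm' or 'YYYY-MM-DDTHH:MM:SS'."""
    text = text.strip().replace("T", " ")
    fmt = "%Y-%m-%d %H:%M:%S,%f" if "," in text else "%Y-%m-%d %H:%M:%S"
    return datetime.strptime(text, fmt)


def log_overlaps_window(log_start, window_end):
    """True if a log that begins at log_start can contain entries
    written at or before window_end."""
    return parse_ts(log_start) <= parse_ts(window_end)


migration_start = "2020-07-17 03:13:02,799"    # cold migration issued
test_failure = "2020-07-17 03:18:12,226"       # test script gave up
compute_log_start = "2020-07-17T03:21:04"      # first entry in compute-1 log

# The compute-1 log begins after the failure, so it cannot cover
# the interval between migration_start and test_failure.
print(log_overlaps_window(compute_log_start, test_failure))  # False
```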

From the output collected by the test script:
compute-0:
[2020-07-17 03:19:17,251] 314 DEBUG MainThread ssh.send :: Send 'system --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://192.168.144.1:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-endpoint-type internalURL --os-region-name RegionOne host-device-list --nowrap compute-0'
[2020-07-17 03:19:18,900] 436 DEBUG MainThread ssh.expect :: Output:
+------------------+--------------+----------+-----------+-----------+---------------------------+---------------------------------+----------------------------------------+-----------+---------+
| name | address | class id | vendor id | device id | class name | vendor name | device name | numa_node | enabled |
+------------------+--------------+----------+-----------+-----------+---------------------------+---------------------------------+----------------------------------------+-----------+---------+
| pci_0000_08_00_0 | 0000:08:00.0 | 030000 | 102b | 0522 | VGA compatible controller | Matrox Electronics Systems Ltd. | MGA G200e [Pilot] ServerEngines (SEP1) | 0 | True |
+------------------+--------------+----------+-----------+-----------+---------------------------+---------------------------------+----------------------------------------+-----------+---------+
compute-1:
[2020-07-17 03:08:16,699] 436 DEBUG MainThread ssh.expect :: Output:
+------------------+--------------+----------+-----------+-----------+---------------------------+---------------------------------+----------------------------------------+-----------+---------+
| name | address | class id | vendor id | device id | class name | vendor name | device name | numa_node | enabled |
+------------------+--------------+----------+-----------+-----------+---------------------------+---------------------------------+----------------------------------------+-----------+---------+
| pci_0000_09_00_0 | 0000:09:00.0 | 0b4000 | 8086 | 0435 | Co-processor | Intel Corporation | DH895XCC Series QAT | 0 | True |
| pci_0000_0c_00_0 | 0000:0c:00.0 | 030000 | 102b | 0522 | VGA compatible controller | Matrox Electronics Systems Ltd. | MGA G200e [Pilot] ServerEngines (SEP1) | 0 | True |
+------------------+--------------+----------+-----------+-----------+-----------...
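
The two host-device-list outputs above can be compared programmatically. This is a sketch only, assuming the pipe-delimited layout shown in the listings; parse_device_rows and has_device are hypothetical helpers, and the vendor and device IDs are taken from the compute-1 row above. It only reports which listing contains the device and draws no conclusion about the root cause.

```python
def parse_device_rows(table_text):
    """Return one dict per data row of a '|'-delimited CLI table."""
    header = None
    rows = []
    for line in table_text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip '+---+' borders and any non-table lines
        cells = [c.strip() for c in line.strip("|").split("|")]
        if header is None:
            header = cells          # first '|' line is the header row
        elif len(cells) == len(header):
            rows.append(dict(zip(header, cells)))
    return rows


def has_device(rows, vendor_id, device_id):
    """True if any row matches both the vendor id and the device id."""
    return any(r.get("vendor id") == vendor_id and
               r.get("device id") == device_id for r in rows)


HEADER = ("| name | address | class id | vendor id | device id | class name "
          "| vendor name | device name | numa_node | enabled |")
COMPUTE_0 = HEADER + """
| pci_0000_08_00_0 | 0000:08:00.0 | 030000 | 102b | 0522 | VGA compatible controller | Matrox Electronics Systems Ltd. | MGA G200e [Pilot] ServerEngines (SEP1) | 0 | True |
"""
COMPUTE_1 = HEADER + """
| pci_0000_09_00_0 | 0000:09:00.0 | 0b4000 | 8086 | 0435 | Co-processor | Intel Corporation | DH895XCC Series QAT | 0 | True |
| pci_0000_0c_00_0 | 0000:0c:00.0 | 030000 | 102b | 0522 | VGA compatible controller | Matrox Electronics Systems Ltd. | MGA G200e [Pilot] ServerEngines (SEP1) | 0 | True |
"""

for host, text in (("compute-0", COMPUTE_0), ("compute-1", COMPUTE_1)):
    print(host, has_device(parse_device_rows(text), "8086", "0435"))
```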


Changed in starlingx:
status: Triaged → Invalid