Live migration of UEFI booted instance failing unexpectedly

Bug #1792999 reported by Wendy Mitchell
This bug affects 1 person
Affects                   Status   Importance  Assigned to  Milestone
OpenStack Compute (nova)  Invalid  Undecided   Unassigned
StarlingX                 Invalid  Medium      YU CHENGDE

Bug Description

Brief Description
-----------------
Live migration of a UEFI-booted instance fails unexpectedly (the migration rolls back)

Severity
--------
Major

Steps to Reproduce
------------------
1: Get/Create ge_edge image
2: Create a flavor with 1 vcpu
3: Add following extra specs: {'hw:cpu_policy': 'dedicated'}
4: Create a volume from ge_edge image
5: Boot a ge_edge VM with above flavor from volume
6: Check topology of vm b69eb997-0477-43f5-a2dc-6840e13cadcd on controller, hypervisor and vm
7: Live migrate ge_edge VM
[2018-09-15 21:38:21,722]
8: Ping vm from NatBox after live migration
9: Check topology of vm b69eb997-0477-43f5-a2dc-6840e13cadcd on controller, hypervisor and vm
10: Cold migrate vm and check vm is moved to different host
11: Ping vm from NatBox after cold migration
[2018-09-15 21:40:10,847]
12: Check topology of vm b69eb997-0477-43f5-a2dc-6840e13cadcd on controller, hypervisor and vm
[2018-09-15 21:40:43,316]
13: Swact active controller
[2018-09-15 21:41:05,746]
14: Ensure ge_edge vm can still be live-migrated after swact
[2018-09-15 21:42:36,155] 262 DEBUG MainThread ssh.send :: Send 'nova --os-username 'admin' --os-password 'Li69nux*' --os-project-name admin --os-auth-url http://192.168.204.2:5000/v3 --os-user-domain-name Default --os-project-domain-name Default --os-region-name RegionOne live-migration b69eb997-0477-43f5-a2dc-6840e13cadcd'

Expected Behavior
------------------
Live migration is expected to succeed in step 14

Actual Behavior
----------------
[2018-09-15 21:43:26,974] 53 DEBUG MainThread conftest.update_results::
***Failure at test call: /home/svc-cgcsauto/wassp-repos.new/testcases/cgcs/CGCSAuto/keywords/vm_helper.py:1049: utils.exceptions.VMPostCheckFailed: Check failed post VM operation.

instance: b69eb997-0477-43f5-a2dc-6840e13cadcd
id=instance-00000063, name=tenant2-ge_edge-dedicated-migrate-120

The instance is on compute-0 and attempts to live migrate to compute-2, but the migration unexpectedly aborts.
See nova-conductor.log:
2018-09-15 21:42:41.042 84948 INFO nova.conductor.tasks.live_migrate [req-5f9433b4-ff15-4886-8dcb-b20c88798125 1476c2312b6144a9ae513a66b4666a14 a7f0ca6d827847c883e47b00a2e5cb4c - default default] Live migrating instance b69eb997-0477-43f5-a2dc-6840e13cadcd: source:compute-0 dest:compute-2

2018-09-15 21:42:47.317 54712 ERROR nova.virt.libvirt.driver [req-5f9433b4-ff15-4886-8dcb-b20c88798125 1476c2312b6144a9ae513a66b4666a14 a7f0ca6d827847c883e47b00a2e5cb4c - default default] [instance: b69eb997-0477-43f5-a2dc-6840e13cadcd] Live Migration failure: Failed to open file '/etc/nova/instances/b69eb997-0477-43f5-a2dc-6840e13cadcd/instance-00000063_VARS.fd': No such file or directory

2018-09-15 21:42:47.573 54712 WARNING nova.compute.manager [req-92548739-631c-4ddc-bc13-d3b535690142 20fad0edb889469cab36e8977901c9cd 29b6c88a54a14ca8a2ff424dc0e7c1f3 - default default] [instance: b69eb997-0477-43f5-a2dc-6840e13cadcd] Received unexpected event network-vif-unplugged-b3fff5d5-6f44-48b0-8789-e5cc6592bf0d for instance
2018-09-15 21:42:47.737 54712 ERROR nova.virt.libvirt.driver [req-5f9433b4-ff15-4886-8dcb-b20c88798125 1476c2312b6144a9ae513a66b4666a14 a7f0ca6d827847c883e47b00a2e5cb4c - default default] [instance: b69eb997-0477-43f5-a2dc-6840e13cadcd] Migration operation has aborted

nova-compute.log (on destination compute-2) - reports rollback
2018-09-15 21:42:49.506 60988 WARNING nova.compute.manager [req-952fcd77-943e-4077-98b6-0efdacc3bc88 20fad0edb889469cab36e8977901c9cd 29b6c88a54a14ca8a2ff424dc0e7c1f3 - default default] [instance: b69eb997-0477-43f5-a2dc-6840e13cadcd] Received unexpected event network-vif-plugged-b3fff5d5-6f44-48b0-8789-e5cc6592bf0d for instance
2018-09-15 21:42:50.504 60988 INFO nova.compute.manager [req-5f9433b4-ff15-4886-8dcb-b20c88798125 1476c2312b6144a9ae513a66b4666a14 a7f0ca6d827847c883e47b00a2e5cb4c - default default] [instance: b69eb997-0477-43f5-a2dc-6840e13cadcd] Rollback live-migration at destination.
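
When triaging logs like the ones above, it can help to pull out the file that the libvirt driver could not open. The following is a minimal Python sketch, not part of nova or the test framework: the function names `find_missing_files` and `is_uefi_vars_file` are hypothetical, and the regular expression assumes the message format seen in the excerpt above. The `_VARS.fd` suffix check is only a heuristic, on the assumption that such a file is the guest's UEFI variable store.

```python
import re
from pathlib import Path

# Assumed message format, copied from the ERROR line in the log excerpt above.
FAILURE_RE = re.compile(
    r"\[instance: (?P<uuid>[0-9a-f-]{36})\] Live Migration failure: "
    r"Failed to open file '(?P<path>[^']+)': No such file or directory"
)


def find_missing_files(log_lines):
    """Return (instance_uuid, missing_path) pairs found in nova log lines."""
    hits = []
    for line in log_lines:
        match = FAILURE_RE.search(line)
        if match:
            hits.append((match.group("uuid"), match.group("path")))
    return hits


def is_uefi_vars_file(path):
    """Heuristic (assumption): UEFI variable store files end in _VARS.fd."""
    return Path(path).name.endswith("_VARS.fd")


# The ERROR line from the report, used here as sample input.
SAMPLE_LOG = [
    "2018-09-15 21:42:47.317 54712 ERROR nova.virt.libvirt.driver "
    "[req-5f9433b4-ff15-4886-8dcb-b20c88798125 "
    "1476c2312b6144a9ae513a66b4666a14 a7f0ca6d827847c883e47b00a2e5cb4c "
    "- default default] "
    "[instance: b69eb997-0477-43f5-a2dc-6840e13cadcd] "
    "Live Migration failure: Failed to open file "
    "'/etc/nova/instances/b69eb997-0477-43f5-a2dc-6840e13cadcd/"
    "instance-00000063_VARS.fd': No such file or directory",
]

if __name__ == "__main__":
    for uuid, path in find_missing_files(SAMPLE_LOG):
        kind = "UEFI vars file" if is_uefi_vars_file(path) else "other file"
        print(f"{uuid}: missing {kind}: {path}")
```

Running the same scan over the nova-compute logs of both the source and destination hosts would show which host reported the missing file; the excerpt alone does not say on which host the file was absent.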

Reproducibility
---------------
Reproducible

System Configuration
--------------------
2 + X node

Branch/Pull Time/Commit
-----------------------
Master as of date: 2018-09-16_21-38-00

Timestamp/Logs
--------------
see above logs

Ghada Khalil (gkhalil)
tags: added: stx.distro.openstack
Ghada Khalil (gkhalil) wrote :

Targeting stx.2019.03 as this is not a common guest use-case.
Note: this is a nova upstream bug. A nova bug needs to be opened and the fix proposed there.
Once that is done, the fix can be pushed to stx-staging until we rebase to a new version of openstack which includes the nova fix.

Changed in starlingx:
importance: Undecided → Medium
status: New → Triaged
assignee: nobody → Daniel Chavolla (dchavoll)
tags: added: stx.2019.03
Ghada Khalil (gkhalil)
Changed in starlingx:
assignee: Daniel Chavolla (dchavoll) → Jack Ding (jackding)
Ken Young (kenyis)
tags: added: stx.2019.05
removed: stx.2019.03
Frank Miller (sensfan22) wrote :

This is an OpenStack nova issue tracked by https://bugs.launchpad.net/nova/+bug/1785123

Ken Young (kenyis)
tags: added: stx.2.0
removed: stx.2019.05
Ghada Khalil (gkhalil)
Changed in starlingx:
assignee: Jack Ding (jackding) → nobody
tags: added: stx.helpwanted
tags: removed: stx.helpwanted
Changed in starlingx:
assignee: nobody → Frank Miller (sensfan22)
Ghada Khalil (gkhalil)
tags: added: stx.retestneeded
Bruce Jones (brucej)
Changed in starlingx:
assignee: Frank Miller (sensfan22) → Bruce Jones (brucej)
yong hu (yhu6) wrote :

The default plan is to rebase to Stein before RC1, or to cherry-pick the fix to Stx-Nova.
Chris is the owner of that patch.

yong hu (yhu6)
Changed in starlingx:
assignee: Bruce Jones (brucej) → Yao Zhou (yzuzhouyao-x)
yong hu (yhu6)
Changed in starlingx:
assignee: Yao Zhou (yzuzhouyao-x) → Shuquan Huang (shuquan)
yao (yaozhou)
Changed in starlingx:
assignee: Shuquan Huang (shuquan) → yao (yaozhou)
yong hu (yhu6) wrote :

I see that @Yao is working on a nova patch, https://review.opendev.org/#/c/621646/, for https://bugs.launchpad.net/nova/+bug/1785123, which is a duplicate of this LP.

Ghada Khalil (gkhalil) wrote :

As per agreement with the community, moving all unresolved medium priority bugs from stx.2.0 to stx.3.0

tags: added: stx.3.0
removed: stx.2.0
yong hu (yhu6) wrote :

@chengde, please check whether this issue still exists in the "Train" release of Nova.

Changed in starlingx:
assignee: yao (yaozhou) → YU CHENGDE (chant)
Peng Peng (ppeng) wrote :

Test Result for: testcases/functional/nova/test_migrate_vms.py::test_migrate_vm_various_guest[ge_edge-1-1024-shared-image] - Test Passed

The issue was not reproduced on Train.
Load: 2019-11-21_20-00-00
Lab: wcp_3-6

Ghada Khalil (gkhalil) wrote :

@Yong Hu, is it possible this was addressed in nova Train? The associated nova bug still seems to be open.

yong hu (yhu6) wrote :

@chengde, you mentioned this issue was not observed in recent StarlingX builds with OpenStack Train.
Please post your update here.

YU CHENGDE (chant) wrote :

After live-migrating a UEFI-mode instance with nova on Train,
the issue did not occur.

Ghada Khalil (gkhalil) wrote :

Closing. As per the notes above, the issue is no longer reproducible with OpenStack Train.

Changed in starlingx:
status: Triaged → Invalid
tags: removed: stx.retestneeded
Balazs Gibizer (balazs-gibizer) wrote :

Closing the nova side of the bug, as it originated from StarlingX and they can no longer reproduce it.

Changed in nova:
status: New → Invalid