Unable to migrate VM with attached volumes

Bug #1665407 reported by Victor Galkin
This bug affects 5 people
Affects: nova-powervm
Status: Expired
Importance: Undecided
Assigned to: Unassigned

Bug Description

Mitaka version of nova-powervm
nvram_store=swift
Cinder backend: Storwize V7000

Resizing a VM with an attached volume (to another host) leads to this error:
Message
Instance rollback performed due to: Unable to rebuild virtual machine on new host. Error is The device with UDID 01M0lCTTIxNDUxMjQ2MDA1MDc2ODAyODExMDRGMjAwMDAwMDAwMDAwMDY5NA== was not found on any of the Virtual I/O Servers.
Code
500
Details
  File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 375, in decorated_function
    return function(self, context, *args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 4055, in finish_resize
    self._set_instance_obj_error_state(context, instance)
  File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
    self.force_reraise()
  File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
    six.reraise(self.type_, self.value, self.tb)
  File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 4043, in finish_resize
    disk_info, image_meta)
  File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 4008, in _finish_resize
    old_instance_type)
  File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
    self.force_reraise()
  File "/usr/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
    six.reraise(self.type_, self.value, self.tb)
  File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 4003, in _finish_resize
    block_device_info, power_on)
  File "/usr/lib/python2.7/dist-packages/nova_powervm/virt/powervm/driver.py", line 1357, in finish_migration
    raise exception.InstanceFaultRollback(e)
Created
Feb. 16, 2017, 4:26 p.m.

Revision history for this message
Victor Galkin (vicglarson) wrote :

2017-02-16 19:25:22.530 29974 DEBUG nova.compute.manager [req-77f4a5dc-01b4-4636-87fd-89954be65786 06134a41cd1b4ba583880e6a32ef5adc 55c738a50140429493b2ab75a02ea6e7 - - -] [instance: b8d20bb5-a024-4cca-93e0-6c176b5823d1] Stashing vm_state: active _prep_resize /usr/lib/python2.7/dist-packages/nova/compute/manager.py:3747

2017-02-16 19:25:22.685 29974 DEBUG oslo_concurrency.lockutils [req-77f4a5dc-01b4-4636-87fd-89954be65786 06134a41cd1b4ba583880e6a32ef5adc 55c738a50140429493b2ab75a02ea6e7 - - -] Lock "compute_resources" acquired by "nova.compute.resource_tracker.resize_claim" :: waited 0.000s inner /usr/lib/python2.7/dist-packages/oslo_concurrency/lockutils.py:273

2017-02-16 19:25:22.709 29974 DEBUG nova.compute.resource_tracker [req-77f4a5dc-01b4-4636-87fd-89954be65786 06134a41cd1b4ba583880e6a32ef5adc 55c738a50140429493b2ab75a02ea6e7 - - -] Memory overhead for 8192 MB instance; 0 MB _move_claim /usr/lib/python2.7/dist-packages/nova/compute/resource_tracker.py:279

2017-02-16 19:25:22.732 29974 INFO nova.compute.claims [req-77f4a5dc-01b4-4636-87fd-89954be65786 06134a41cd1b4ba583880e6a32ef5adc 55c738a50140429493b2ab75a02ea6e7 - - -] [instance: b8d20bb5-a024-4cca-93e0-6c176b5823d1] Attempting claim: memory 8192 MB, disk 20 GB, vcpus 4 CPU

2017-02-16 19:25:22.732 29974 INFO nova.compute.claims [req-77f4a5dc-01b4-4636-87fd-89954be65786 06134a41cd1b4ba583880e6a32ef5adc 55c738a50140429493b2ab75a02ea6e7 - - -] [instance: b8d20bb5-a024-4cca-93e0-6c176b5823d1] Total memory: 2097152 MB, used: 51712.00 MB

2017-02-16 19:25:22.732 29974 INFO nova.compute.claims [req-77f4a5dc-01b4-4636-87fd-89954be65786 06134a41cd1b4ba583880e6a32ef5adc 55c738a50140429493b2ab75a02ea6e7 - - -] [instance: b8d20bb5-a024-4cca-93e0-6c176b5823d1] memory limit: 2097152.00 MB, free: 2045440.00 MB

2017-02-16 19:25:22.733 29974 INFO nova.compute.claims [req-77f4a5dc-01b4-4636-87fd-89954be65786 06134a41cd1b4ba583880e6a32ef5adc 55c738a50140429493b2ab75a02ea6e7 - - -] [instance: b8d20bb5-a024-4cca-93e0-6c176b5823d1] Total disk: 16383 GB, used: 220.00 GB

2017-02-16 19:25:22.733 29974 INFO nova.compute.claims [req-77f4a5dc-01b4-4636-87fd-89954be65786 06134a41cd1b4ba583880e6a32ef5adc 55c738a50140429493b2ab75a02ea6e7 - - -] [instance: b8d20bb5-a024-4cca-93e0-6c176b5823d1] disk limit: 16383.00 GB, free: 16163.00 GB

2017-02-16 19:25:22.734 29974 INFO nova.compute.claims [req-77f4a5dc-01b4-4636-87fd-89954be65786 06134a41cd1b4ba583880e6a32ef5adc 55c738a50140429493b2ab75a02ea6e7 - - -] [instance: b8d20bb5-a024-4cca-93e0-6c176b5823d1] Total vcpu: 48 VCPU, used: 25.00 VCPU

2017-02-16 19:25:22.734 29974 INFO nova.compute.claims [req-77f4a5dc-01b4-4636-87fd-89954be65786 06134a41cd1b4ba583880e6a32ef5adc 55c738a50140429493b2ab75a02ea6e7 - - -] [instance: b8d20bb5-a024-4cca-93e0-6c176b5823d1] vcpu limit: 384.00 VCPU, free: 359.00 VCPU

2017-02-16 19:25:22.735 29974 INFO nova.compute.claims [req-77f4a5dc-01b4-4636-87fd-89954be65786 06134a41cd1b4ba583880e6a32ef5adc 55c738a50140429493b2ab75a02ea6e7 - - -] [instance: b8d20bb5-a024-4cca-93e0-6c176b5823d1] Claim successful

2017-02-16 19:25:22.777 29974 INFO nova.compute.resource_tracker [req-77f4a5dc-01b4-4636-87fd-89954be65786 06134a41cd1b4ba583880...

Revision history for this message
Victor Galkin (vicglarson) wrote :

Perhaps it's here:

vif.plug() -> slot_mgr.build_map() -> slot_mgr.init_recreate_map() -> slot_mgr._pv_vscsi_vol_to_vio() -> vol_drv.is_volume_on_vios()

self.is_rebuild = (self.adapter and vol_drv_iter)

Maybe it tries to find the volumes while plugging the VIF, before the tf_stg.ConnectVolume() task runs.

I will try changing the task order tomorrow.
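The call chain above can be sketched roughly as follows. This is a hypothetical reconstruction of the nova-powervm slot manager flow, not the actual implementation; the class and method names only approximate the real code, but it illustrates why the rebuild check `self.is_rebuild = (self.adapter and vol_drv_iter)` causes a VIOS volume lookup to run before the volumes have been connected on the target host:

```python
# Hypothetical sketch of the rebuild-time slot map recreation Victor traces.
# Names approximate the nova-powervm flow; this is not the real code.

class SlotManager(object):
    def __init__(self, adapter, vol_drv_iter):
        self.adapter = adapter
        # On a rebuild/migration a volume driver iterator is supplied, so the
        # slot map is recreated from the target host's view of the disks.
        self.is_rebuild = bool(adapter and vol_drv_iter)
        self._vol_drv_iter = vol_drv_iter or []

    def build_map(self):
        # Called from vif.plug(); on a rebuild it recreates the slot map.
        if self.is_rebuild:
            return self._init_recreate_map()
        return {}

    def _init_recreate_map(self):
        # For each attached volume, ask the driver whether the volume is
        # already visible on any Virtual I/O Server.  During finish_migration
        # this runs before the ConnectVolume task, so the lookup can fail
        # with "device with UDID ... was not found on any of the VIOS".
        slot_map = {}
        for vol_drv in self._vol_drv_iter:
            udid, found = vol_drv.is_volume_on_vios()
            if not found:
                raise LookupError(
                    'The device with UDID %s was not found on any of the '
                    'Virtual I/O Servers.' % udid)
            slot_map[udid] = vol_drv
        return slot_map
```

Under this sketch, reordering the tasks so ConnectVolume runs first would make the lookup succeed, which is what the next comment tests.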

Revision history for this message
Victor Galkin (vicglarson) wrote :

Actually, it seems to be expected behavior.
Any help would be appreciated.

Revision history for this message
Drew Thorstensen (thorst) wrote :

Hey Victor.

So I must admit I'm a bit confused with this one.

The issue here appears to be that the volume with that UDID is not on the target VIOS. But we should see the volume get added to the target server via this method: https://github.com/openstack/nova/blob/stable/mitaka/nova/compute/manager.py#L3991-L3992

In IRC you noted that we can't take a peek at the env, so I will need to ask you to help us with the debugging a bit.

Can you print out the block_device_info there prior to the run? I'm thinking that perhaps the volume UDIDs changed as part of the rebuild operation. If you print out the block_device_info, we'll be able to cross-reference it with the new UDID from the error.

So we need both: a printout of the block_device_info and the error message (because the latter will have the updated UDID).
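One quick way to produce the printout Drew asks for is a small helper that walks the block_device_info structure and pulls out the volume identifiers. This is a hypothetical debugging aid, not part of Nova; the `target_UDID` key name is an assumption about what the PowerVM volume driver stores in the connection data, with a fallback to `volume_id`:

```python
# Hypothetical debugging helper: dump the volume identifiers found in a
# Nova block_device_info structure so they can be compared against the
# UDID quoted in the error message.

def dump_volume_udids(block_device_info):
    """Return the identifiers found in block_device_mapping entries."""
    udids = []
    mapping = (block_device_info or {}).get('block_device_mapping', [])
    for bdm in mapping:
        conn_info = bdm.get('connection_info', {})
        data = conn_info.get('data', {})
        # Assumption: the PowerVM driver stashes the UDID in the connection
        # data under 'target_UDID'; fall back to the volume_id if absent.
        udids.append(data.get('target_UDID') or data.get('volume_id'))
    return udids
```

Logging the result of this helper just before the `finish_migration` call would show whether the UDIDs in the BDM still match the one in the rollback error.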

Thanks for reporting this and helping us debug!

Revision history for this message
Alexey 'Armenelruth' V.A. (armenelruth) wrote :

Drew,

we cannot copy+paste, because the system is completely locked down in a DMZ,
but I can make a quote manually, like:

pvmctl cluster list has udid=<many chars>BGQQ== on both hosts,
but nova wants <the same many>ZBMg== (as I can see in the error message from nova show VMID)

Revision history for this message
Drew Thorstensen (thorst) wrote :

Thanks Alexey. That definitely indicates the issue. This is very odd. Let me ask our storage team why that volume UDID is changing. It seems fundamentally wrong that the UDID (Universal Disk ID) would change...

Revision history for this message
Drew Thorstensen (thorst) wrote :

So I talked to a storage Cinder expert about this. It is very odd to both of us that the UDID is changing. We'd really like to know why, but without access to the env it seems unlikely that we can find out.

I'm thinking we could introduce a config option that restores only the NVRAM bits of the slot store, but not the slots.

This has some issues, though: your resize/rebuild won't necessarily be 100% correct, and it could confuse workloads when they're rebuilt.
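The config-option idea Drew floats could look roughly like the sketch below. This is purely hypothetical: the option name `nvram_restore_only` and its placement in a `powervm` group are my invention for illustration, and nothing like it was merged as part of this bug. It just shows the shape such an oslo.config option would take:

```python
from oslo_config import cfg

# Hypothetical option: restore only the NVRAM portion of the slot store on
# a rebuild, skipping slot-map recreation (which requires the volume UDIDs
# to match on the target host).  Option name is illustrative only.
powervm_opts = [
    cfg.BoolOpt('nvram_restore_only',
                default=False,
                help='If True, restore only NVRAM data during a rebuild and '
                     'skip recreating the I/O slot map. Slot placement may '
                     'then differ from the source host, which can confuse '
                     'rebuilt workloads.'),
]

CONF = cfg.CONF
CONF.register_opts(powervm_opts, group='powervm')
```

With such a flag, the driver could bypass the UDID lookup that fails in this bug, at the cost of the placement-fidelity issues Drew describes.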

I'm going to confer with the team on this in IRC.

Revision history for this message
Victor Galkin (vicglarson) wrote :

Drew, I'm not sure Alexey's explanation is correct.
He showed the 'pvmctl cluster list' output; I think he compared an SSP volume with a Cinder volume.

We need some time (no access now) to recheck it.

Revision history for this message
Eric Fried (efried) wrote :

Any update on the recreate for this?

Changed in nova-powervm:
status: New → Incomplete
Revision history for this message
Launchpad Janitor (janitor) wrote :

[Expired for nova-powervm because there has been no activity for 60 days.]

Changed in nova-powervm:
status: Incomplete → Expired
