Issues with Nova Evacuate API

Bug #1362244 reported by Majid
This bug affects 2 people
Affects: OpenStack Compute (nova)
Status: Fix Released
Importance: High
Assigned to: jichenjc

Bug Description

We have deployed OpenStack Icehouse with a Legacy Networking setup, and we build our instances on shared storage (NFS). We want to protect VMs when a host (compute node) fails. When we use the Nova evacuate API with the "--on-shared-storage" option, two issues occur:
1- Sometimes the evacuated VM comes up on the new host, but it has been rebuilt from the base image.
2- Sometimes the instance does not come up at all. In this case, running "nova reboot --hard <VM>" brings the VM up on the new host.

Tags: compute
Tracy Jones (tjones-i)
tags: added: compute
Revision history for this message
melanie witt (melwitt) wrote :

Hi, thanks for your bug report. We need additional information to help diagnose the issue.

On 1, what is the behavior you expect when the VM comes up on the new host, other than being rebuilt from the base image?

On 2, can you check in the nova-compute.log on the origin host and the new host to look for ERROR messages related to the migrated VM? There should be messages there that indicate what went wrong so we can pinpoint the problem.

Changed in nova:
status: New → Incomplete
Revision history for this message
Aaron Smith (aaron-smith) wrote :

We are seeing this issue as well. In our case, however, the original disk on shared storage is always deleted and recreated.

I found a section in nova/compute/manager.py where it looks like the "recreate" flag is not being passed down to the driver level. If I add the recreate variable to the kwargs dictionary, the evacuation always completes successfully and preserves the user data.

Around line 2539:
            kwargs = dict(
                context=context,
                instance=instance,
                image_meta=image_meta,
                injected_files=files,
                admin_password=new_pass,
                bdms=bdms,
                detach_block_devices=detach_block_devices,
                attach_block_devices=self._prep_block_device,
                block_device_info=block_device_info,
                network_info=network_info,
                preserve_ephemeral=preserve_ephemeral)
            try:
                self.driver.rebuild(**kwargs)
            except NotImplementedError:
                # NOTE(rpodolyaka): driver doesn't provide specialized version
                # of rebuild, fall back to the default implementation
                self._rebuild_default_impl(**kwargs)
            instance.power_state = self._get_power_state(context, instance)

Revision history for this message
melanie witt (melwitt) wrote :

@Aaron Thanks for confirming this bug and sharing your finding!

Changed in nova:
importance: Undecided → High
status: Incomplete → Triaged
jichenjc (jichenjc)
Changed in nova:
assignee: nobody → jichenjc (jichenjc)
Revision history for this message
jichenjc (jichenjc) wrote :

I submitted a patch for this bug that adds the recreate flag.

However, I am not sure whether it fixes the problem described in the original bug report. Because the 'recreate' flag was not passed down, recreate was always False, so the instance was always destroyed:

        if not recreate:
            self.driver.destroy(context, instance, network_info,
                                block_device_info=block_device_info)

I am not sure whether this is the cause, and I don't have an NFS environment to test with, so I am using Partial-Bug rather than Closes-Bug here.
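The consequence jichenjc describes can be sketched with a toy driver (all names here are illustrative, not nova's): when `recreate` never arrives as True, the destroy branch always runs, and the disk on shared storage is lost before the rebuild.

```python
# Toy illustration of the quoted destroy guard: with recreate stuck at
# its False default, driver.destroy() always runs first, which would
# explain the rebuilt-from-base-image symptom on shared storage.

class ToyDriver:
    def __init__(self):
        self.disk_exists = True

    def destroy(self, instance):
        self.disk_exists = False   # the shared-storage disk is gone

def toy_evacuate(driver, instance, recreate=False):
    if not recreate:               # mirrors the quoted guard
        driver.destroy(instance)
    return driver.disk_exists      # was the disk preserved?

d1 = ToyDriver()
print(toy_evacuate(d1, 'vm1'))                 # False: disk destroyed (bug path)
d2 = ToyDriver()
print(toy_evacuate(d2, 'vm1', recreate=True))  # True: disk preserved
```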

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/123454

Changed in nova:
status: Triaged → In Progress
Revision history for this message
Alex Xu (xuhj) wrote :

Actually, rebuild_instance in the compute manager does check for shared storage: https://github.com/openstack/nova/blob/454760d46cbda1f8677a3d8caa5bb47e43f6d9cf/nova/compute/manager.py#L2718

But it only prints a log message; it doesn't use that variable for anything.
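The kind of probe Alex refers to can be sketched as follows. This is a hedged, simplified stand-in (hypothetical paths and function name, not nova's actual `instance_on_disk()` implementation): on shared storage such as NFS, the instance's disk directory from the failed host is already visible on the target host, and that fact could drive the preserve-vs-rebuild decision instead of just being logged.

```python
# Sketch of a shared-storage check: if the instance directory already
# exists on the target host's instances path, the disk is on shared
# storage and should be preserved; if not, a full rebuild is needed.
import os
import tempfile

def toy_instance_on_disk(instances_path, uuid):
    # On NFS-backed shared storage, the source host's instance
    # directory is visible from the target host as well.
    return os.path.exists(os.path.join(instances_path, uuid))

base = tempfile.mkdtemp()                 # stand-in for /var/lib/nova/instances
os.mkdir(os.path.join(base, 'uuid-1'))    # simulate a disk left by the dead host
print(toy_instance_on_disk(base, 'uuid-1'))   # True  -> preserve the disk
print(toy_instance_on_disk(base, 'uuid-2'))   # False -> rebuild from image
```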

Revision history for this message
jichenjc (jichenjc) wrote :

Commit 91d3272b975572d9866b7d959547e438142dc4fb fixed this problem, so I am marking this Fix Committed here.

Changed in nova:
status: In Progress → Fix Committed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by jichenjc (<email address hidden>) on branch: master
Review: https://review.openstack.org/123454
Reason: fix already committed

Thierry Carrez (ttx)
Changed in nova:
milestone: none → kilo-1
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in nova:
milestone: kilo-1 → 2015.1.0