Should rollback the instance if finish_resize fails

Bug #1396003 reported by Charlotte Han
This bug affects 1 person
Affects: OpenStack Compute (nova)
Status: Confirmed
Importance: Low
Assigned to: Nalini Varshney

Bug Description

Resize failed in the finish_resize function; the instance disappeared and cannot be rolled back.

The log is:
http://paste.openstack.org/show/137968/

Charlotte Han (hanrong)
tags: added: in-stable-icehouse
Madhurya (madhurya-jesu)
Changed in nova:
assignee: nobody → Madhurya (madhurya-jesu)
Madhurya (madhurya-jesu)
Changed in nova:
status: New → Incomplete
Madhurya (madhurya-jesu) wrote :

Hi Rong Han, can you please elaborate on your bug? When I set '-allow_resize_to_same_host=true' and '-allow_migrate_to_same_host=true' in nova.conf and restarted nova-compute and nova-scheduler, I was able to resize the instance. So can you please post your nova.conf file?

Charlotte Han (hanrong) wrote :

# Allow destination machine to match source for resize. Useful
# when testing in single-host environments. (boolean value)
#allow_resize_to_same_host=false

Charlotte Han (hanrong) wrote :

I use the default configuration of 'allow_resize_to_same_host=false' in nova.conf.

Robert Collins (lifeless) wrote :

So it sounds like the default configuration will break VMs on resize?

Changed in nova:
status: Incomplete → New
Changed in nova:
status: New → Confirmed
importance: Undecided → Low
Jian Wen (wenjianhn)
tags: removed: in-stable-icehouse
Jian Wen (wenjianhn) wrote :

Can anyone else reproduce this bug?

Jian Wen (wenjianhn)
Changed in nova:
assignee: Madhurya (madhurya-jesu) → nobody
Charlotte Han (hanrong) wrote :

To reproduce: stop neutron-openvswitch-agent before resizing the instance, then resize/migrate the instance. The instance goes into the error state.

Jian Wen (wenjianhn) wrote :

I reproduced the bug by adding an exception to finish_migration().
"vif_type=binding_failed" means neutron failed to bind the port to the compute host.
You cannot roll back the instance, since Neutron doesn't support that.
I don't think we can fix the bug in Nova.

p.s.
The error log is not friendly to users.
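
For readers who want to see the shape of that reproduction, here is a minimal, self-contained sketch. It is not nova code: FakeDriver, the toy finish_resize() and the instance dict are hypothetical stand-ins, and only the idea of raising from finish_migration() is taken from the comment above.

```python
from unittest import mock


class FakeDriver:
    """Hypothetical stand-in for a virt driver; not nova's real class."""

    def finish_migration(self, instance):
        instance["host"] = instance["dest_host"]


def finish_resize(driver, instance):
    """Toy finish step with no rollback, mirroring the reported behaviour."""
    try:
        driver.finish_migration(instance)
        instance["vm_state"] = "resized"
    except Exception:
        # Nothing reverts the instance to the source host here;
        # it is simply left in the error state.
        instance["vm_state"] = "error"
        raise


def reproduce():
    instance = {"host": "source", "dest_host": "dest", "vm_state": "active"}
    driver = FakeDriver()
    # Inject the failure instead of editing the driver by hand.
    with mock.patch.object(
        FakeDriver,
        "finish_migration",
        side_effect=RuntimeError("Unexpected vif_type=binding_failed"),
    ):
        try:
            finish_resize(driver, instance)
        except RuntimeError:
            pass
    return instance
```

Calling reproduce() leaves the toy instance in the "error" state with its host unchanged, which is the situation this report describes.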

Changed in nova:
status: Confirmed → Invalid
Charlotte Han (hanrong) wrote :

If the destination host's neutron server is unavailable, we should not resize to that destination host, or we should let the instance revert to its source host.
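
The behaviour requested above could look roughly like the sketch below. It is only an illustration under stated assumptions: BindingFailed, finish_on_destination() and resize_with_rollback() are hypothetical names, the instance is a plain dict, and no real nova or Neutron API is involved. A real fix would also have to restore the port binding on the source host.

```python
class BindingFailed(Exception):
    """Hypothetical error for a port that cannot be bound on a host."""


def finish_on_destination(instance, dest_ok):
    # Stand-in for the destination-side finish step.
    if not dest_ok:
        raise BindingFailed("vif_type=binding_failed")
    instance["host"] = instance["dest_host"]
    instance["vm_state"] = "resized"


def resize_with_rollback(instance, dest_ok):
    """Try to finish on the destination; restore the source record on failure."""
    saved = dict(instance)  # snapshot of the pre-resize state
    try:
        finish_on_destination(instance, dest_ok)
    except BindingFailed:
        # Roll back: the instance stays on its source host, still active.
        instance.clear()
        instance.update(saved)
    return instance
```

With dest_ok=False the toy instance ends up exactly as it started, on the source host; with dest_ok=True it moves to the destination.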

Jian Wen (wenjianhn) wrote : Re: [Bug 1396003] Re: Resize failed in finish_resize function, the instance disappeared and can not rollback.

You are right.

Looks like we are able to revert it manually now.
See bug 1296519.

On Thu, Nov 19, 2015 at 3:39 PM, Rong Han ZTE <email address hidden> wrote:

> If the destination host's neutron server is unavailable, we should not
> resize to that destination host, or we should let the instance revert
> to its source host.

--
Best,

Jian

Changed in nova:
status: Invalid → Confirmed
summary: - Resize failed in finish_resize function, the instance disappeared and
- can not rollback.
+ Should rollback the instance if finish_resize fails
Charlotte Han (hanrong) wrote :

I tried the same operation with the Kilo version; the same error occurs.

Charlotte Han (hanrong) wrote :

And when I run reset-state --active and then hard reboot this instance, the instance goes into the error state.
The log is as follows:
http://paste.openstack.org/show/479397/

Charlotte Han (hanrong) wrote :

I tried the same operation with the Mitaka version; the same error occurs.

See https://bugs.launchpad.net/nova/+bug/1586309: the delete operation could not clear the source compute node's resources.

The openvswitch agent service on the destination compute node is active, but the public net is unavailable on the destination node for some reason I don't know.

This is a Devstack OpenStack environment with two networks and two compute nodes.
When I created an instance using the public net, the resize failed with Unexpected vif_type.
When I created an instance using the private net, the resize succeeded.
So I think that when I choose a network on which the instance cannot be resized to another compute node, the instance should be rolled back to the source compute node.
[stack@SBCJSlot5Rack2Centos7 instances]$ nova net-list
+--------------------------------------+---------+------+
| ID                                   | Label   | CIDR |
+--------------------------------------+---------+------+
| 44652d4f-95ee-4785-8c16-de74bbb722c0 | public  | None |
| d40836c3-f036-4151-bbaa-cf92125ce552 | private | None |
+--------------------------------------+---------+------+

Changed in nova:
assignee: nobody → Charlotte Han (hanrong)
status: Confirmed → In Progress
Charlotte Han (hanrong) wrote :

Can we use "revert_resize" to recover instances from the error state after a migrate/resize has failed?

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/334747

Charlotte Han (hanrong)
Changed in nova:
importance: Low → High
Sean Dague (sdague) wrote :

The patch is in merge conflict.

Changed in nova:
importance: High → Low
status: In Progress → Confirmed
assignee: Charlotte Han (hanrong) → nobody
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by Sean Dague (<email address hidden>) on branch: master
Review: https://review.openstack.org/334747
Reason: This review is > 6 weeks without comment, and failed Jenkins the last time it was checked. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and leaving a 'recheck' comment to get fresh test results.

Changed in nova:
assignee: nobody → Nalini Varshney (varshneyg)