nova should allow evacuate for an instance in the Error state

Bug #1298061 reported by Chris Friesen on 2014-03-26
This bug affects 2 people
Affects                    Importance   Assigned to
OpenStack Compute (nova)   Medium       Chris Friesen
Ubuntu Cloud Archive       Medium       Unassigned
Icehouse                   Medium       Unassigned
nova (Ubuntu)              Medium       Unassigned
Trusty                     Medium       Seyeong Kim

Bug Description

[Impact]

 * Instances in error state cannot be evacuated.

[Test Case]

 * nova evacuate <error_state_instance> <another_compute_host>
 * nova refuses to evacuate the instance because of its state

[Regression Potential]

 * Cherry-picked from upstream:
   - removes unnecessary argument passing
   - allows the ERROR state before evacuating
 * In the code, the change adds one parameter and removes an unused one,
   so the regression potential is very low.
 * Tested on juju+maas test env.
 * Passed tempest smoke tests locally.

Note: one simple way to put an instance into error state is to directly change its database record, for example "update instances set vm_state='error' where uuid='XXXXXXXX'"
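
For a throwaway test environment, the same update can be issued from a script with a bound parameter instead of a hand-edited UUID. The sketch below is only an illustration: it uses Python's built-in `sqlite3` module and a two-column stand-in for the `instances` table so that it runs anywhere, whereas a real deployment's database, driver, and schema will differ. The helper names `force_error_state` and `make_demo_db` are hypothetical.

```python
import sqlite3


def force_error_state(conn, instance_uuid):
    """Set vm_state='error' for one instance; return the number of rows changed."""
    cur = conn.execute(
        "UPDATE instances SET vm_state = 'error' WHERE uuid = ?",
        (instance_uuid,),
    )
    conn.commit()
    return cur.rowcount


def make_demo_db():
    """Build an in-memory stand-in table (assumed schema, for illustration only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE instances (uuid TEXT PRIMARY KEY, vm_state TEXT)")
    conn.execute("INSERT INTO instances VALUES ('demo-uuid', 'active')")
    conn.commit()
    return conn
```

Checking the returned row count guards against a mistyped UUID silently changing nothing.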

We currently allow reboot/rebuild/rescue for an instance in the Error state if the instance has successfully booted at least once.

We should allow "evacuate" as well, since it is essentially a "rebuild" on a different compute node.

This would be useful in a number of cases, in particular if an initial evacuation attempt fails (putting the instance into the Error state).
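
The check being relaxed can be pictured with a minimal sketch. It is not nova's actual code: `check_vm_state`, `InstanceInvalidState`, the `Instance` class, and the `evacuate` function below are hypothetical names chosen for illustration. The only behaviour taken from this report is that evacuate accepts ACTIVE and STOPPED today and should also accept ERROR.

```python
import functools
from dataclasses import dataclass

# Hypothetical state names, mirroring the vm_state values mentioned in this report.
ACTIVE = "active"
STOPPED = "stopped"
ERROR = "error"
BUILDING = "building"


class InstanceInvalidState(Exception):
    """Raised when an action is not permitted in the instance's current vm_state."""


@dataclass
class Instance:
    uuid: str
    vm_state: str
    host: str


def check_vm_state(allowed):
    """Reject the decorated action unless instance.vm_state is in `allowed`."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(instance, *args, **kwargs):
            if instance.vm_state not in allowed:
                raise InstanceInvalidState(
                    f"cannot {func.__name__} instance {instance.uuid} "
                    f"while it is in vm_state {instance.vm_state}"
                )
            return func(instance, *args, **kwargs)
        return wrapper
    return decorator


# Before the change the allowed tuple would have been (ACTIVE, STOPPED);
# adding ERROR is the behaviour requested in this bug.
@check_vm_state(allowed=(ACTIVE, STOPPED, ERROR))
def evacuate(instance, target_host):
    """Stand-in for the real operation: just record the new host."""
    instance.host = target_host
    return instance
```

With this shape, widening the set of permitted states is a one-line change to the decorator argument, which is consistent with the low regression risk described above.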

@Chris, we may need to think this through further: what should the state be after we evacuate a VM in the ERROR state to another host?

Currently, evacuate supports only two states: ACTIVE and STOPPED. If the VM is ACTIVE, it is still ACTIVE after the evacuation; if the VM is STOPPED, it is still STOPPED after the evacuation.

For a VM in ERROR, it is not obvious what its state should be after the evacuation. Comments?

Chris Friesen (cbf123) wrote :

I think it would make the most sense to come up ACTIVE when evacuating from the ERROR state. The main reason why we would evacuate an instance at all is because it isn't running and we want it to run--if we didn't want it to be running we probably wouldn't have evacuated it in the first place, we could just wait and see if the compute node comes back up.

That said, I'm not totally happy with how we represent VMs that were on a compute node that died. It seems to me that we should leave the vm_state as-is and have something else that indicates that they're not actually in the desired state. If we had that then if we attempted to evacuate and failed we wouldn't set the vm_state to ERROR, we'd leave it in the previous state and have some other way of indicating a problem.

Tracy Jones (tjones-i) on 2014-03-27
tags: added: compute

@Chris, thanks for the comments. Agreed, moving the VM to ACTIVE seems like a good choice.
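
The outcome of this exchange can be summarised in a small sketch. The function name `state_after_evacuate` and the lowercase state strings are assumptions made for illustration; the mapping itself only restates what is said above: ACTIVE stays ACTIVE, STOPPED stays STOPPED, and ERROR would come up ACTIVE.

```python
ACTIVE = "active"
STOPPED = "stopped"
ERROR = "error"

# vm_state before evacuation -> vm_state after a successful evacuation
_POST_EVACUATE_STATE = {
    ACTIVE: ACTIVE,
    STOPPED: STOPPED,
    ERROR: ACTIVE,  # the choice proposed in this discussion
}


def state_after_evacuate(vm_state):
    """Return the expected vm_state after evacuating from `vm_state`."""
    try:
        return _POST_EVACUATE_STATE[vm_state]
    except KeyError:
        raise ValueError(
            f"evacuate is not allowed from vm_state {vm_state!r}"
        ) from None
```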

melanie witt (melwitt) wrote :

Triaging based on this similar bug "Instance in Error state should allow reboot / rebuild":

https://bugs.launchpad.net/nova/+bug/1183946

Changed in nova:
importance: Undecided → Medium
status: New → Confirmed
tags: added: api
Chris Friesen (cbf123) on 2014-04-07
Changed in nova:
assignee: nobody → Chris Friesen (cbf123)
haruka tanizawa (h-tanizawa) wrote :

What is the current state of progress on this?
Thanks.

Fix proposed to branch: master
Review: https://review.openstack.org/100920

Changed in nova:
status: Confirmed → In Progress
Chris Friesen (cbf123) wrote :

@Haruka, sorry for the delay. I got sidetracked and forgot about this.

haruka tanizawa (h-tanizawa) wrote :

Thank you for your patch :)
I had the same problem.

Chris Friesen (cbf123) wrote :

The change is ready to go in, if anyone feels like reviewing it...

Reviewed: https://review.openstack.org/100920
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=2f8dfc0da2fd7f13185c4638aa74013be617cf11
Submitter: Jenkins
Branch: master

commit 2f8dfc0da2fd7f13185c4638aa74013be617cf11
Author: Chris Friesen <email address hidden>
Date: Fri Mar 14 11:37:55 2014 -0600

    Allow evacuate from vm_state=Error

    We currently allow reboot/rebuild/rescue for an instance in the Error state.
    This commit allows "evacuate" as well, since it is essentially a "rebuild"
    on a different compute node.

    This is useful in a number of cases, in particular if an initial evacuation
    attempt fails.

    Change-Id: I3f513eb738c91fe71767308f57251629639efd6a
    Closes-Bug: 1298061

Changed in nova:
status: In Progress → Fix Committed
Changed in nova:
milestone: none → juno-2
status: Fix Committed → Fix Released
Thierry Carrez (ttx) on 2014-10-16
Changed in nova:
milestone: juno-2 → 2014.2
Liang Chen (cbjchen) on 2016-09-13
description: updated
tags: added: sts
Liang Chen (cbjchen) wrote :
Liang Chen (cbjchen) on 2016-09-13
Changed in nova (Ubuntu Trusty):
assignee: nobody → Liang Chen (cbjchen)
status: New → In Progress
James Page (james-page) on 2016-09-13
Changed in nova (Ubuntu):
status: New → Fix Released
Changed in nova (Ubuntu Trusty):
importance: Undecided → Medium
Changed in nova (Ubuntu):
importance: Undecided → Medium
Liang Chen (cbjchen) on 2016-09-13
description: updated
Liang Chen (cbjchen) on 2016-09-14
description: updated
Seyeong Kim (seyeongkim) on 2016-11-17
tags: added: sts-sru
Corey Bryant (corey.bryant) wrote :

Liang, thanks for the patches. LGTM. I'll upload once a local build passes.

no longer affects: cloud-archive
Changed in cloud-archive:
status: New → Invalid
status: Invalid → Fix Released
importance: Undecided → Medium
Corey Bryant (corey.bryant) wrote :

Uploaded to trusty review queue and awaiting sru team review. https://launchpad.net/ubuntu/trusty/+queue?queue_state=1&queue_text=

Robie Basak (racb) wrote :

"Regression Potential: None" is not acceptable. Please review the process documentation (fairly recently updated to make clearer) and fix: "a discussion of how regressions are most likely to manifest, or may manifest even if it is unlikely, as a result of this change. It is assumed that any SRU candidate patch is well-tested before upload and has a low overall risk of regression, but it's important to make the effort to think about what could happen in the event of a regression."

Robie Basak (racb) on 2017-01-25
Changed in nova (Ubuntu Trusty):
status: In Progress → Incomplete
Seyeong Kim (seyeongkim) on 2017-01-26
description: updated
Seyeong Kim (seyeongkim) wrote :

I updated the regression section; please review it.
Thanks.

Changed in nova (Ubuntu Trusty):
status: Incomplete → In Progress
assignee: Liang Chen (cbjchen) → Seyeong Kim (xtrusia)

Hello Chris, or anyone else affected,

Accepted nova into trusty-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/nova/1:2014.1.5-0ubuntu1.6 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in nova (Ubuntu Trusty):
status: In Progress → Fix Committed
tags: added: verification-needed
Seyeong Kim (seyeongkim) wrote :

ii nova-common 1:2014.1.5-0ubuntu1.6 all OpenStack Compute - common files
ii nova-compute 1:2014.1.5-0ubuntu1.6 all OpenStack Compute - compute node base
ii nova-compute-kvm 1:2014.1.5-0ubuntu1.6 all OpenStack Compute - compute node (KVM)
ii nova-compute-libvirt 1:2014.1.5-0ubuntu1.6 all OpenStack Compute - compute node libvirt support
ii python-nova 1:2014.1.5-0ubuntu1.6 all OpenStack Compute Python libraries

deployed openstack-base with juju on maas

created trusty-test instance

maas-node-02(source) - stopped nova-compute service
maas-node-03(destination)

nova evacuate --password 123qwe trusty-test maas-node-03

then got password as output

Thanks.

tags: added: verification-done
removed: verification-needed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package nova - 1:2014.1.5-0ubuntu1.6

---------------
nova (1:2014.1.5-0ubuntu1.6) trusty; urgency=medium

  * Allow evacuate for an instance in the Error state (LP: #1298061)
    - d/p/remove_useless_state_check.patch remove unnecessary task_state check
    - d/p/evacuate_error_vm.patch Allow evacuate from error state

 -- Liang Chen <email address hidden> Fri, 09 Sep 2016 17:41:48 +0800

Changed in nova (Ubuntu Trusty):
status: Fix Committed → Fix Released

The verification of the Stable Release Update for nova has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Louis Bouchard (louis) on 2017-03-22
tags: added: sts-sru-done
removed: sts-sru
Dan Streetman (ddstreet) on 2019-06-04
tags: removed: sts