OpenStack Compute (nova)

soft deleted instance is deleted when error restoring

Bug #1932268 reported by HYSong on 2021-06-17

Affects: OpenStack Compute (nova)
Status: Confirmed
Importance: Undecided
Assigned to: Unassigned
Milestone: none

Bug Description

A SOFT_DELETED instance is deleted if an attempt to restore it fails.

restore instance:
instance.task_state = task_states.RESTORING
instance.deleted_at = None

If `self.driver.restore(instance)` or `self._power_on(context, instance)` in `nova/compute/manager.py` fails, `instance.task_state` reverts to None because of `@reverts_task_state`, while `instance.deleted_at` remains None.

The instance then matches the filters used by the `_reclaim_queued_deletes` task and is deleted incorrectly.

filters = {'vm_state': vm_states.SOFT_DELETED,
           'task_state': None,
           'host': self.host}
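
The sequence described above can be reproduced with a small stand-alone model. This is only a sketch and does not import nova: `FakeInstance`, `failing_restore` and `matches_reclaim_filters` are hypothetical names, the state strings are stand-ins for the `vm_states` and `task_states` constants, and the `except` branch is an assumed simplification of what `@reverts_task_state` does.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

SOFT_DELETED = "soft-delete"  # stand-in for vm_states.SOFT_DELETED
RESTORING = "restoring"       # stand-in for task_states.RESTORING


@dataclass
class FakeInstance:
    """Hypothetical stand-in holding only the fields the report mentions."""
    host: str
    vm_state: str
    task_state: Optional[str]
    deleted_at: Optional[datetime]


def failing_restore(instance):
    """Mimic the reported order: fields are changed first, then the driver fails."""
    instance.task_state = RESTORING
    instance.deleted_at = None
    try:
        raise RuntimeError("simulated driver.restore failure")
    except RuntimeError:
        # Assumed model of @reverts_task_state: only task_state is rolled back,
        # deleted_at is left as None.
        instance.task_state = None


def matches_reclaim_filters(instance, host):
    """The same three conditions as the filters dict in the report."""
    return (instance.vm_state == SOFT_DELETED
            and instance.task_state is None
            and instance.host == host)


inst = FakeInstance("compute-1", SOFT_DELETED, None, datetime.now(timezone.utc))
failing_restore(inst)
print(matches_reclaim_filters(inst, "compute-1"), inst.deleted_at)
```

In this model the failed restore leaves the instance matching all three filter conditions with `deleted_at` cleared, which is the state the report describes.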

Balazs Gibizer (balazs-gibizer) wrote on 2021-06-17:

#1

What do you suggest? What state should the VM be in after a failed restore? The VM is still soft-deleted on the hypervisor because the restore failed, and no task is ongoing because that restore task failed. To me it is expected behavior that if the restore fails and the reclaim timer then fires, the VM is deleted.

Closing this as Invalid. Please set it back to New if you disagree.

Changed in nova:
status:	New → Invalid

norman shen (jshen28) wrote on 2021-06-18:

#2

Err, for me, restoring a VM that is in the soft_delete state should be possible multiple times as long as the deadline has not passed...

Internally, different problems might cause the restore to fail, but not all of them are fatal. For example, there might be permission issues in the VM folder that could be fixed manually, and the VM could still be restored on the next attempt.

HYSong (songhongyuan) wrote on 2021-06-18:

#3

I suggest executing `instance.deleted_at = None` only after `self.driver.restore(instance)` or `self._power_on(context, instance)` has finished, so that the VM is not deleted when the restore fails.
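
A rough illustration of that ordering, under stated assumptions: this is not the patch under review, `restore_instance` here is a hypothetical free function rather than the manager method, `driver_restore` stands in for either driver call, and the `except` branch models `@reverts_task_state` by hand.

```python
from datetime import datetime, timezone
from types import SimpleNamespace


def restore_instance(instance, driver_restore):
    """Clear deleted_at only once the driver call has succeeded."""
    instance.task_state = "restoring"
    try:
        driver_restore(instance)
    except Exception:
        # Assumed model of @reverts_task_state: roll back task_state, re-raise.
        instance.task_state = None
        raise
    # Reached only on success, so a failed attempt keeps the original timestamp.
    instance.deleted_at = None
    instance.vm_state = "active"
    instance.task_state = None


def broken_driver(instance):
    raise RuntimeError("simulated hypervisor error")


soft_deleted_at = datetime(2021, 6, 17, tzinfo=timezone.utc)
inst = SimpleNamespace(vm_state="soft-delete", task_state=None,
                       deleted_at=soft_deleted_at)

try:
    restore_instance(inst, broken_driver)
except RuntimeError:
    pass
print(inst.deleted_at == soft_deleted_at)  # timestamp survives the failure

restore_instance(inst, lambda instance: None)  # a later retry that succeeds
print(inst.vm_state, inst.deleted_at)
```

With this ordering a failed attempt leaves `deleted_at` untouched, so the reclaim deadline keeps counting from the original soft delete and the restore can be retried until then.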

Changed in nova:
status:	Invalid → New

OpenStack Infra (hudson-openstack) wrote on 2021-06-18: Fix proposed to nova (master)

#4

Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/nova/+/796985

Artom Lifshitz (notartom) wrote on 2022-05-09:

#5

I think I agree with the reporter here. Imagine something like this:

1. Soft-delete instance
2. Try to restore
3. Restore fails
4. Try to restore again
5. Repeat 3 and 4 until restore successful.

In the current code, if we race with the reclaim timer popping between 3 and 4, as step 3.5 of sorts, the _reclaim_queued_deletes() -> _deleted_old_enough() call will end up with an instance.deleted_at = None and delete the instance, which is not great UX.
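
The age check in that race can be sketched as follows. This is a model inferred from the comment above, not a copy of nova's `_deleted_old_enough()`: the function name, its parameters and the rule that a missing timestamp counts as old enough are assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone


def deleted_old_enough(deleted_at, reclaim_interval, now):
    """Return True when a soft-deleted instance may be reclaimed.

    Assumption taken from the comment above: a missing timestamp is
    treated as "old enough", which is what makes the race destructive.
    """
    if deleted_at is None:
        return True
    return now - deleted_at > timedelta(seconds=reclaim_interval)


now = datetime(2022, 5, 9, 12, 0, tzinfo=timezone.utc)
one_minute_ago = now - timedelta(minutes=1)

# Soft-deleted a minute ago with a one hour reclaim interval: not yet eligible.
print(deleted_old_enough(one_minute_ago, 3600, now))

# Same instance after a failed restore cleared deleted_at: eligible at once.
print(deleted_old_enough(None, 3600, now))
```

Under this assumption the failed restore does not merely leave the deadline running, it removes the deadline altogether, so the next reclaim run can delete the instance however recently it was soft-deleted.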

Changed in nova:
status:	New → Confirmed
