soft deleted instance is deleted when error restoring

Bug #1932268 reported by HYSong
This bug affects 1 person
Affects: OpenStack Compute (nova)
Status: Confirmed
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

A SOFT_DELETED instance is deleted if an attempt to restore it fails.

Restoring an instance first sets:
instance.task_state = task_states.RESTORING
instance.deleted_at = None

If `self.driver.restore(instance)` or `self._power_on(context, instance)` in `nova/compute/manager.py` fails, `instance.task_state` reverts to None because of `@reverts_task_state`, while `instance.deleted_at` stays None.

The instance then matches the filters used by the `_reclaim_queued_deletes` periodic task and is deleted incorrectly:

filters = {'vm_state': vm_states.SOFT_DELETED,
           'task_state': None,
           'host': self.host}
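The sequence described above can be reproduced with a small stand-alone simulation. This is only a sketch, not nova code: `FakeInstance`, `restore_instance`, `failing_driver_restore`, and `matches_reclaim_filters` are hypothetical names, the state strings are stand-ins for the nova constants, and the `except` branch merely imitates what `@reverts_task_state` is described as doing.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

SOFT_DELETED = "soft-delete"  # stand-in for vm_states.SOFT_DELETED
RESTORING = "restoring"       # stand-in for task_states.RESTORING


@dataclass
class FakeInstance:
    """Hypothetical, minimal stand-in for a nova instance record."""
    host: str
    vm_state: str = SOFT_DELETED
    task_state: Optional[str] = None
    deleted_at: Optional[datetime] = None


def failing_driver_restore(instance):
    """Simulate a hypervisor-side restore failure."""
    raise RuntimeError("simulated restore failure")


def restore_instance(instance, driver_restore):
    """Follow the order of operations described in the report."""
    instance.task_state = RESTORING
    instance.deleted_at = None  # cleared before the driver call
    try:
        driver_restore(instance)
    except Exception:
        # Imitates the task_state revert attributed to @reverts_task_state.
        instance.task_state = None
        raise
    instance.vm_state = "active"
    instance.task_state = None


def matches_reclaim_filters(instance, host):
    """Apply the three filters quoted in the report."""
    return (instance.vm_state == SOFT_DELETED
            and instance.task_state is None
            and instance.host == host)


inst = FakeInstance(host="compute-1", deleted_at=datetime(2021, 6, 1))
try:
    restore_instance(inst, failing_driver_restore)
except RuntimeError:
    pass

# The instance is still soft-deleted, has no task, and has lost deleted_at.
reclaim_candidate = matches_reclaim_filters(inst, "compute-1")
```

After the failed restore, `reclaim_candidate` is true and `inst.deleted_at` is None, which is the state the report says leads to the unwanted deletion.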

Balazs Gibizer (balazs-gibizer) wrote :

What do you suggest? What state should the VM be in after a failed restore? The VM is still soft-deleted on the hypervisor because the restore failed, and no task is ongoing because that restore task failed. For me it is the expected behavior that if the restore fails and the reclaim timer then fires, the VM is deleted.

Closing this as Invalid. Please set it back to New if you disagree.

Changed in nova:
status: New → Invalid
norman shen (jshen28) wrote :

err, for me, restoring a VM that is in the soft_delete state should be something you can do multiple times as long as the deadline has not passed...

internally, there might be different problems that cause the restore to fail, but not all of them are fatal. For example, there might be permission issues in the VM folder that could be fixed manually, and the VM could then be restored on the next attempt.

HYSong (songhongyuan) wrote :

I suggest executing 'instance.deleted_at = None' only after self.driver.restore(instance) or self._power_on(context, instance) has finished, so that the VM is not deleted when the restore fails.
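A minimal sketch of that ordering, assuming a simplified instance object. `restore_instance_deferred` and `failing_restore` are hypothetical names used only for illustration; this is not the patch proposed in the review.

```python
from datetime import datetime
from types import SimpleNamespace


def restore_instance_deferred(instance, driver_restore):
    """Clear deleted_at only once the driver-level restore has succeeded."""
    instance.task_state = "restoring"
    try:
        driver_restore(instance)
    except Exception:
        # The task state is reverted, but deleted_at is left untouched,
        # so the original soft-delete timestamp survives the failure.
        instance.task_state = None
        raise
    instance.deleted_at = None
    instance.vm_state = "active"
    instance.task_state = None


def failing_restore(instance):
    """Simulate a hypervisor-side restore failure."""
    raise RuntimeError("simulated restore failure")


soft_deleted_at = datetime(2021, 6, 1, 12, 0)

# Failure path: the timestamp is preserved.
failed = SimpleNamespace(vm_state="soft-delete", task_state=None,
                         deleted_at=soft_deleted_at)
try:
    restore_instance_deferred(failed, failing_restore)
except RuntimeError:
    pass

# Success path: the timestamp is cleared as before.
restored = SimpleNamespace(vm_state="soft-delete", task_state=None,
                           deleted_at=soft_deleted_at)
restore_instance_deferred(restored, lambda instance: None)
```

With this ordering a failed attempt leaves `failed.deleted_at` equal to the original timestamp, so an age check based on it can still honour the remaining reclaim window.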

Changed in nova:
status: Invalid → New
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/nova/+/796985

Artom Lifshitz (notartom) wrote :

I think I agree with the reporter here. Imagine something like this:

1. Soft-delete instance
2. Try to restore
3. Restore fails
4. Try to restore again
5. Repeat 3 and 4 until restore successful.

In the current code, if we race with the reclaim timer popping between 3 and 4, as step 3.5 of sorts, the _reclaim_queued_deletes() -> _deleted_old_enough() call will see instance.deleted_at = None and delete the instance, which is not great UX.
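To illustrate why a missing timestamp matters, here is a hedged sketch of an age check that behaves as described in this comment. `deleted_old_enough` is a hypothetical stand-alone function, and treating a missing `deleted_at` as "old enough" is an assumption taken from the comment rather than a copy of nova's implementation.

```python
from datetime import datetime, timedelta
from typing import Optional


def deleted_old_enough(deleted_at: Optional[datetime],
                       timeout_seconds: int,
                       now: datetime) -> bool:
    """Return True if the instance may be reclaimed."""
    # Assumption: no timestamp means the check cannot enforce a grace
    # period, so the instance is treated as eligible for reclaim.
    if deleted_at is None:
        return True
    return now - deleted_at > timedelta(seconds=timeout_seconds)


now = datetime(2021, 6, 1, 12, 0, 0)

# Soft-deleted one minute ago with a one-hour reclaim interval: kept.
within_window = deleted_old_enough(now - timedelta(seconds=60), 3600, now)

# Same instance after a failed restore cleared deleted_at: reclaimed.
after_failed_restore = deleted_old_enough(None, 3600, now)
```

Under this assumption the instance that would have been kept for the rest of the hour becomes eligible for deletion as soon as the failed restore clears its timestamp.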

Changed in nova:
status: New → Confirmed