The _resize_instance function lacks exception handling

Bug #1919401 reported by HYSong
This bug affects 1 person
Affects: OpenStack Compute (nova)
Status: Expired
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

Env info:
openstack version: rocky
storage back-end: ceph
hypervisor: qemu/KVM

Sample traceback:
==================================
[req-69c94c9a-6ee4-4936-8ce5-9a23b7aea89a 00b865b2a29e47f8b57a62ac624bdfa4 9edd1f98bf2f47e885f7077a066c83dd - default default]
[instance: 642ab2df-4dc2-4ca8-9bbd-ab19c72352df]
Setting instance vm_state to ERROR: OSError: [Errno 39] Directory not empty
Traceback (most recent call last):
  File "/var/lib/openstack/lib/python2.7/site-packages/nova/compute/manager.py", line 8333, in _error_out_instance_on_exception
    yield
  File "/var/lib/openstack/lib/python2.7/site-packages/nova/compute/manager.py", line 4693, in _resize_instance
    timeout, retry_interval)
  File "/var/lib/openstack/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 8668, in migrate_disk_and_power_off
    shared_storage)
  File "/var/lib/openstack/lib/python2.7/site-packages/oslo_utils/excutils.py", line 220, in __exit__
    self.force_reraise()
  File "/var/lib/openstack/lib/python2.7/site-packages/oslo_utils/excutils.py", line 196, in force_reraise
    six.reraise(self.type_, self.value, self.tb)
  File "/var/lib/openstack/lib/python2.7/site-packages/nova/virt/libvirt/driver.py", line 8622, in migrate_disk_and_power_off
    os.rename(inst_base, inst_base_resize)
OSError: [Errno 39] Directory not empty

Description:
==================================
1. A VM resize fails after `os.rename(inst_base, inst_base_resize)` in migrate_disk_and_power_off has already created the inst_base_resize directory. The `_error_out_instance_on_exception` context manager in _resize_instance only catches the exception and sets the instance to ERROR; it does not roll back the directory.

2. Run `openstack server set` to reset the VM status to active.

3. Resize the VM again. It fails with the exception shown in the sample traceback: `os.rename(inst_base, inst_base_resize)` fails because the inst_base_resize directory still contains console.log.

4. Should `self._cleanup_remote_migration` be executed before `os.rename(inst_base, inst_base_resize)`? Is there a way to improve exception handling in _resize_instance?
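
The failure in step 3 can be reproduced outside nova with a minimal sketch. It assumes a POSIX filesystem, where renaming a directory onto a non-empty directory fails with ENOTEMPTY (some platforms report EEXIST instead). The helper `rename_with_stale_cleanup` is hypothetical and is not part of nova; it only illustrates the idea of clearing a leftover directory before the rename, which is safe only if nothing in that directory is still needed.

```python
import errno
import os
import shutil
import tempfile


def rename_with_stale_cleanup(inst_base, inst_base_resize):
    """Hypothetical helper (not nova code): clear a leftover target, then rename."""
    if os.path.isdir(inst_base_resize):
        # Assumption: whatever is left here belongs to an earlier failed resize.
        shutil.rmtree(inst_base_resize)
    os.rename(inst_base, inst_base_resize)


def demo():
    root = tempfile.mkdtemp()
    inst_base = os.path.join(root, "instance")
    inst_base_resize = inst_base + "_resize"
    os.mkdir(inst_base)
    os.mkdir(inst_base_resize)
    with open(os.path.join(inst_base, "disk"), "w") as f:
        f.write("data\n")
    # Simulate the console.log left behind by the first failed resize.
    with open(os.path.join(inst_base_resize, "console.log"), "w") as f:
        f.write("stale\n")

    # A plain rename onto the non-empty directory fails, as in the traceback.
    try:
        os.rename(inst_base, inst_base_resize)
        failed_errno = None
    except OSError as exc:
        failed_errno = exc.errno

    # With the stale directory removed first, the rename goes through.
    rename_with_stale_cleanup(inst_base, inst_base_resize)
    result = (failed_errno,
              os.path.isdir(inst_base),
              sorted(os.listdir(inst_base_resize)))
    shutil.rmtree(root)
    return result


if __name__ == "__main__":
    print(demo())
```

Whether a cleanup like this belongs before the rename, or in a rollback path of the caller, is the open question raised in step 4.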

Balazs Gibizer (balazs-gibizer) wrote :

Could you be a bit more specific? Where and how did nova fail during the first resize? migrate_disk_and_power_off(), which performs the original rename, has exception handling that cleans up the change [1][2].

I'm marking this as Incomplete until my questions are answered. Please set it back to New once you have done so.

[1] https://github.com/openstack/nova/blob/stable/rocky/nova/virt/libvirt/driver.py#L8491
[2] https://github.com/openstack/nova/blob/stable/rocky/nova/virt/libvirt/driver.py#L8352
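
For readers unfamiliar with the pattern, rename-then-roll-back can be sketched as follows. This is not nova's code: `migrate_with_rollback` and `do_migration` are hypothetical names, and plain `try`/`except` stands in for whatever the driver actually uses. The sketch covers only failures raised inside the function; anything that fails after it returns is outside its scope.

```python
import os
import shutil
import tempfile


def migrate_with_rollback(inst_base, inst_base_resize, do_migration):
    """Hypothetical sketch: rename the instance dir, undo the rename on failure."""
    os.rename(inst_base, inst_base_resize)
    try:
        do_migration()
    except Exception:
        # Restore the original layout so a later retry can rename again.
        if os.path.exists(inst_base):
            shutil.rmtree(inst_base)
        os.rename(inst_base_resize, inst_base)
        raise


def demo():
    root = tempfile.mkdtemp()
    inst_base = os.path.join(root, "instance")
    inst_base_resize = inst_base + "_resize"
    os.mkdir(inst_base)

    def failing_migration():
        raise RuntimeError("simulated migration failure")

    try:
        migrate_with_rollback(inst_base, inst_base_resize, failing_migration)
        raised = False
    except RuntimeError:
        raised = True

    result = (raised, os.path.isdir(inst_base), os.path.exists(inst_base_resize))
    shutil.rmtree(root)
    return result


if __name__ == "__main__":
    print(demo())
```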

Changed in nova:
status: New → Incomplete
tags: added: resize
tags: added: comute libvirt
HYSong (songhongyuan) wrote :

@Balazs Gibizer (balazs-gibizer)

Hi, I think it could fail intermittently when Nova connects to the Cinder API [1][2].

[1] https://github.com/openstack/nova/blob/stable/rocky/nova/compute/manager.py#L4295
[2] https://github.com/openstack/nova/blob/stable/rocky/nova/compute/manager.py#L4764

Changed in nova:
status: Incomplete → New
HYSong (songhongyuan)
tags: added: compute
removed: comute
Sylvain Bauza (sylvain-bauza) wrote :

Could you please test it against master or an earlier release?
As Rocky is in Extended Maintenance, it could be difficult to provide new bugfixes, and in any case we won't provide new stable release versions for it.

https://releases.openstack.org/#release-series

Changed in nova:
status: New → Incomplete
Launchpad Janitor (janitor) wrote :

[Expired for OpenStack Compute (nova) because there has been no activity for 60 days.]

Changed in nova:
status: Incomplete → Expired