resize fails with FileExistsError if earlier resize attempt failed to cleanup

Bug #1960230 reported by Tobias Urdin
This bug affects 1 person
Affects                   Status        Importance  Assigned to   Milestone
OpenStack Compute (nova)  Fix Released  Medium      Tobias Urdin
Xena                      Fix Released  Undecided   Unassigned

Bug Description

This bug is related to resize with the libvirt driver.

If a resize fails, the _cleanup_remote_migration() function [1] in the libvirt driver tries to clean up the /var/lib/nova/instances/<uuid>_resize directory on the remote side [2]. If that cleanup also fails, the <uuid>_resize directory is left behind and blocks any future resize attempt.

2021-12-14 14:40:12.535 175177 INFO nova.virt.libvirt.driver [req-9d3477d4-3bb2-456f-9be6-dce9893b0e95 23d6aa8884ab44ef9f214ad195d273c0 050c556faa5944a8953126c867313770 - default default] [instance: 99287438-c37b-44b0-834e-55685b6e83eb] Deletion of /var/lib/nova/instances/99287438-c37b-44b0-834e-55685b6e83eb_resize failed

The next resize attempt, a long time later, then fails:

2022-02-04 13:07:31.255 175177 ERROR oslo_messaging.rpc.server File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/driver.py", line 10429, in migrate_disk_and_power_off
2022-02-04 13:07:31.255 175177 ERROR oslo_messaging.rpc.server os.rename(inst_base, inst_base_resize)
2022-02-04 13:07:31.255 175177 ERROR oslo_messaging.rpc.server FileExistsError: [Errno 17] File exists: '/var/lib/nova/instances/99287438-c37b-44b0-834e-55685b6e83eb' -> '/var/lib/nova/instances/99287438-c37b-44b0-834e-55685b6e83eb_resize'

This happens here [3] because os.rename tries to rename the /var/lib/nova/instances/<uuid> directory to <uuid>_resize, which already exists, and fails with FileExistsError.

We should check whether the <uuid>_resize directory exists before the rename and, if it does, delete it first.

[1] https://opendev.org/openstack/nova/src/branch/stable/xena/nova/virt/libvirt/driver.py#L10773
[2] https://opendev.org/openstack/nova/src/branch/stable/xena/nova/virt/libvirt/driver.py#L10965
[3] https://opendev.org/openstack/nova/src/branch/stable/xena/nova/virt/libvirt/driver.py#L10915
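
The check proposed in the description can be sketched roughly as follows. This is a minimal standalone illustration, not the nova driver code: the helper names remove_stale_dir() and move_to_resize_dir() are hypothetical, and the sketch assumes the leftover <uuid>_resize directory holds nothing worth keeping.

```python
import errno
import os
import shutil
import tempfile


def remove_stale_dir(path):
    """Recursively delete path; a path that does not exist is not an error.

    Hypothetical helper, named for this sketch only.
    """
    try:
        shutil.rmtree(path)
    except OSError as exc:
        # Only swallow "no such file or directory"; re-raise anything else.
        if exc.errno != errno.ENOENT:
            raise


def move_to_resize_dir(inst_base):
    """Rename inst_base to inst_base + '_resize', clearing leftovers first."""
    inst_base_resize = inst_base + "_resize"
    # Assumption: a leftover _resize directory comes from an earlier failed
    # resize and can be discarded.
    remove_stale_dir(inst_base_resize)
    os.rename(inst_base, inst_base_resize)
    return inst_base_resize


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        inst_base = os.path.join(root, "instance-uuid")
        os.makedirs(inst_base)
        # Simulate the directory left behind by an earlier failed resize.
        os.makedirs(os.path.join(inst_base + "_resize", "old"))
        print(move_to_resize_dir(inst_base))
```

Deleting unconditionally and ignoring ENOENT avoids a separate existence check, so there is no window between checking and deleting in which the directory could appear or disappear.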

Changed in nova:
status: New → In Progress
Changed in nova:
importance: Undecided → Medium
assignee: nobody → Tobias Urdin (tobias-urdin)
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.opendev.org/c/openstack/nova/+/827865
Committed: https://opendev.org/openstack/nova/commit/9111b99f739d41c092db8d01712a5aa72388b5fb
Submitter: "Zuul (22348)"
Branch: master

commit 9111b99f739d41c092db8d01712a5aa72388b5fb
Author: Tobias Urdin <email address hidden>
Date: Fri Feb 4 15:01:36 2022 +0100

    Cleanup old resize instances dir before resize

    If there is a failed resize that also failed the cleanup
    process performed by _cleanup_remote_migration() the retry
    of the resize will fail because it cannot rename the current
    instances directory to _resize.

    This renames the _cleanup_failed_migration() that does the
    same logic we want to _cleanup_failed_instance_base() and
    uses it for both migration and resize cleanup of directory.

    It then simply calls _cleanup_failed_instances_base() with
    the resize dir path before trying a resize.

    Closes-Bug: 1960230
    Change-Id: I7412b16be310632da59a6139df9f0913281b5d77

Changed in nova:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/xena)

Fix proposed to branch: stable/xena
Review: https://review.opendev.org/c/openstack/nova/+/828407

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 25.0.0.0rc1

This issue was fixed in the openstack/nova 25.0.0.0rc1 release candidate.

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/xena)

Reviewed: https://review.opendev.org/c/openstack/nova/+/828407
Committed: https://opendev.org/openstack/nova/commit/31179f62f1c832e0e894d04e9c9dd59978577cc0
Submitter: "Zuul (22348)"
Branch: stable/xena

commit 31179f62f1c832e0e894d04e9c9dd59978577cc0
Author: Tobias Urdin <email address hidden>
Date: Fri Feb 4 15:01:36 2022 +0100

    Cleanup old resize instances dir before resize

    If there is a failed resize that also failed the cleanup
    process performed by _cleanup_remote_migration() the retry
    of the resize will fail because it cannot rename the current
    instances directory to _resize.

    This renames the _cleanup_failed_migration() that does the
    same logic we want to _cleanup_failed_instance_base() and
    uses it for both migration and resize cleanup of directory.

    It then simply calls _cleanup_failed_instances_base() with
    the resize dir path before trying a resize.

    Closes-Bug: 1960230
    Change-Id: I7412b16be310632da59a6139df9f0913281b5d77
    (cherry picked from commit 9111b99f739d41c092db8d01712a5aa72388b5fb)

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 24.1.1

This issue was fixed in the openstack/nova 24.1.1 release.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/wallaby)

Fix proposed to branch: stable/wallaby
Review: https://review.opendev.org/c/openstack/nova/+/864691

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/victoria)

Fix proposed to branch: stable/victoria
Review: https://review.opendev.org/c/openstack/nova/+/864730

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/ussuri)

Fix proposed to branch: stable/ussuri
Review: https://review.opendev.org/c/openstack/nova/+/864732

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/train)

Fix proposed to branch: stable/train
Review: https://review.opendev.org/c/openstack/nova/+/864692

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/wallaby)

Reviewed: https://review.opendev.org/c/openstack/nova/+/864691
Committed: https://opendev.org/openstack/nova/commit/bdbeb34a17f851fa7e6483e28cde951d26df7951
Submitter: "Zuul (22348)"
Branch: stable/wallaby

commit bdbeb34a17f851fa7e6483e28cde951d26df7951
Author: Tobias Urdin <email address hidden>
Date: Fri Feb 4 15:01:36 2022 +0100

    Cleanup old resize instances dir before resize

    If there is a failed resize that also failed the cleanup
    process performed by _cleanup_remote_migration() the retry
    of the resize will fail because it cannot rename the current
    instances directory to _resize.

    This renames the _cleanup_failed_migration() that does the
    same logic we want to _cleanup_failed_instance_base() and
    uses it for both migration and resize cleanup of directory.

    It then simply calls _cleanup_failed_instances_base() with
    the resize dir path before trying a resize.

    Closes-Bug: 1960230
    Change-Id: I7412b16be310632da59a6139df9f0913281b5d77
    (cherry picked from commit 9111b99f739d41c092db8d01712a5aa72388b5fb)
    (cherry picked from commit 31179f62f1c832e0e894d04e9c9dd59978577cc0)

tags: added: in-stable-wallaby
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (stable/train)

Change abandoned by "Elod Illes <email address hidden>" on branch: stable/train
Review: https://review.opendev.org/c/openstack/nova/+/864692
Reason: stable/train branch of nova projects' have been tagged as End of Life. All open patches have to be abandoned in order to be able to delete the branch.

OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (stable/ussuri)

Change abandoned by "Alexey Stupnikov <email address hidden>" on branch: stable/ussuri
Review: https://review.opendev.org/c/openstack/nova/+/864732

OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (stable/victoria)

Change abandoned by "Alexey Stupnikov <email address hidden>" on branch: stable/victoria
Review: https://review.opendev.org/c/openstack/nova/+/864730

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova wallaby-eom

This issue was fixed in the openstack/nova wallaby-eom release.
