OSError occurs when trying to resize-confirm an instance with status 'VERIFY_RESIZE' using NFS backend (KVM)

Bug #1248019 reported by Chen Xiao
This bug affects 3 people
Affects                   Status         Importance   Assigned to   Milestone
OpenStack Compute (nova)  Fix Released   Undecided    Chen Xiao
Havana                    Fix Released   Undecided    Unassigned

Bug Description

The problem appears when at least two compute nodes run KVM, the instances directory is on NFS shared storage, and an instance is resized as a test.
NFS was configured by following the community documentation on live migration over NFS.

Running "nova resize ae6f9472-3080-4e86-8a52-f8e642081d15" works, and the instance's state changes to "VERIFY_RESIZE". When I then resize-confirm it, nova hits the following error:

{u'message': u"[Errno 39] Directory not empty: '/KVM/stack/data/nova/instances/ae6f9472-3080-4e86-8a52-f8e642081d15_resize'", u'code': 500, u'details': u'
  File "/usr/lib/python2.6/site-packages/nova/compute/manager.py", line 263, in decorated_function
    return function(self, context, *args, **kwargs)
  File "/usr/lib/python2.6/site-packages/nova/compute/manager.py", line 2700, in confirm_resize
    do_confirm_resize(context, instance, migration_id)
  File "/usr/lib/python2.6/site-packages/nova/openstack/common/lockutils.py", line 246, in inner
    return f(*args, **kwargs)
  File "/usr/lib/python2.6/site-packages/nova/compute/manager.py", line 2697, in do_confirm_resize
    migration=migration)
  File "/usr/lib/python2.6/site-packages/nova/compute/manager.py", line 2724, in _confirm_resize
    network_info)
  File "/usr/lib/python2.6/site-packages/nova/virt/libvirt/driver.py", line 4623, in confirm_migration
    self._cleanup_resize(instance, network_info)
  File "/usr/lib/python2.6/site-packages/nova/virt/libvirt/driver.py", line 1018, in _cleanup_resize
    shutil.rmtree(target)
  File "/usr/lib64/python2.6/shutil.py", line 221, in rmtree
    onerror(os.rmdir, path, sys.exc_info())
  File "/usr/lib64/python2.6/shutil.py", line 219, in rmtree
    os.rmdir(path)
', u'created': u'2013-10-22T15:10:50Z'}
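
The last frame of the traceback is os.rmdir() failing on a directory that still holds an entry. A minimal sketch of that failure mode on a local temporary directory (no NFS involved; the file name ".nfs-placeholder" and the helper try_rmdir are made up for illustration, and the sketch targets Python 3 rather than the Python 2.6 shown above):

```python
import errno
import os
import tempfile


def try_rmdir(path):
    """Return None if os.rmdir succeeds, else the errno it reported."""
    try:
        os.rmdir(path)
    except OSError as exc:
        return exc.errno
    return None


# Stand-in for the instance's "_resize" directory with one leftover file.
demo_dir = tempfile.mkdtemp(suffix="_resize")
leftover = os.path.join(demo_dir, ".nfs-placeholder")
open(leftover, "w").close()

# While the leftover file exists, rmdir is refused.
first_result = try_rmdir(demo_dir)
print("rmdir refused with ENOTEMPTY:", first_result == errno.ENOTEMPTY)

# Once the leftover file is gone, the same call succeeds.
os.remove(leftover)
second_result = try_rmdir(demo_dir)
print("second rmdir result:", second_result)
```

On Linux, ENOTEMPTY is errno 39, which matches the "[Errno 39] Directory not empty" message in the report.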

cd /KVM/stack/data/nova/instances/be962096-a539-46c7-ae66-9ea383809e9b_resize
[root@cc be962096-a539-46c7-ae66-9ea383809e9b_resize]# ls -al
total 24340
drwxr-xr-x 2 nobody nobody 4096 Oct 18 2013 .
drwxrwxrwx 14 root root 4096 Oct 18 2013 ..
-rw-r--r-- 1 nobody nobody 25034752 Oct 18 2013 .nfs000000000714002e00000001

Chen Xiao (chenxiao)
description: updated
summary: - IOError occours when resize-confirm an instance with status
+ IOError occours when try to resize-confirm an instance with status
'VERIFY_RESIZE' using NFS bankend
summary: IOError occours when try to resize-confirm an instance with status
- 'VERIFY_RESIZE' using NFS bankend
+ 'VERIFY_RESIZE' using NFS bankend (KVM)
Chen Xiao (chenxiao)
description: updated
description: updated
Chen Xiao (chenxiao)
summary: - IOError occours when try to resize-confirm an instance with status
+ OSError occours when try to resize-confirm an instance with status
'VERIFY_RESIZE' using NFS bankend (KVM)
Chen Xiao (chenxiao)
Changed in nova:
assignee: nobody → Chen Xiao (chenxiao)
status: New → In Progress
Chen Xiao (chenxiao) wrote :

Sorry, I did not explain the details.
Running "nova resize ae6f9472-3080-4e86-8a52-f8e642081d15" works, and the instance's state changes to "VERIFY_RESIZE". When I then resize-confirm it, OpenStack tries to delete the directory that contains the instance files (I debugged down to shutil.rmtree(target)). On NFS, however, that directory is still being accessed by qemu-kvm, so, following the way NFS works, a .nfsxxxx file is created in the directory and the call fails with "[Errno 39] Directory not empty:".

These .nfs files are apparently created when you delete a directory that is accessed by a process.
http://librelist.com/browser//libgit2/2012/4/3/file-locking-in-libgit2/#7a19b8c37c5235ddfd60ff8a137c94ef

I saw a discussion about NFS saying that, according to the kernel nfs-client code, NFS renames not only open files to .nfsxxx but every file it is about to unlink. Files that are not held open then disappear so fast that it normally does not matter and you never see the renamed file. Maybe in our case the files are not actually open, and NFS just takes a while to send back an ack for the RPC call.
So the proposed fix is to retry the delete until the leftover files are gone.
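
The retry idea can be sketched roughly as follows. This is only an illustration of the approach, not the code that was merged into nova: the function name remove_tree_with_retry, its parameters, and the default values are hypothetical, and it is written for Python 3.

```python
import os
import shutil
import tempfile
import time


def remove_tree_with_retry(path, attempts=5, delay=0.5):
    """Delete `path`, retrying if the directory is briefly non-empty.

    Returns the number of attempts used and re-raises the last OSError
    when every attempt fails.
    """
    for attempt in range(1, attempts + 1):
        try:
            shutil.rmtree(path)
            return attempt
        except FileNotFoundError:
            # Nothing left to delete, so the cleanup goal is already met.
            return attempt
        except OSError:
            if attempt == attempts:
                raise
            # Give the NFS client time to drop its temporary .nfsXXXX file.
            time.sleep(delay)


# Usage on a local temporary directory standing in for "<uuid>_resize".
demo = tempfile.mkdtemp(suffix="_resize")
open(os.path.join(demo, "disk"), "w").close()
used = remove_tree_with_retry(demo, attempts=3, delay=0.1)
print("removed after", used, "attempt(s)")
```

On a local filesystem the first attempt normally succeeds. The retries only matter when something like the .nfsxxxx file above is still present at the moment rmdir runs.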

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/55250
Committed: http://github.com/openstack/nova/commit/17d131661740e452e9729d9ac8881e6fede23dd7
Submitter: Jenkins
Branch: master

commit 17d131661740e452e9729d9ac8881e6fede23dd7
Author: chenxiao <email address hidden>
Date: Tue Nov 5 18:04:42 2013 +0800

    Libvirt:Instance resize confirm issue against NFS

    when I test resize-confirm on NFS (KVM) using at least two
    compute nodes, OSError will occur when try to cleanup the
    backup instance directory and will make instance status
    "Error".

    One way to cleanup resize is using delay_on_retry and
    maximum attempts setting. Replace rmtree() because
    execute() have more parameters choices.

    Closes-Bug: #1248019

    Change-Id: Ifb9a500bbba805af0317307c2e7d6903dcd02ad1
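
The commit message describes switching from rmtree() to a command runner that accepts a retry count and a delay between retries. A rough standalone sketch of such a runner is shown here. The name run_with_retries and the delay range are hypothetical, it is not nova's execute(), and the parameter names attempts and delay_on_retry are borrowed from the commit message only for readability.

```python
import random
import subprocess
import sys
import time


def run_with_retries(cmd, attempts=1, delay_on_retry=False):
    """Run `cmd`, retrying on a non-zero exit status up to `attempts` times.

    Returns the attempt number that succeeded, or raises RuntimeError.
    """
    last = None
    for attempt in range(1, attempts + 1):
        last = subprocess.run(cmd, capture_output=True, text=True)
        if last.returncode == 0:
            return attempt
        if delay_on_retry and attempt < attempts:
            # Short random pause so the retry is not sent immediately.
            time.sleep(random.uniform(0.05, 0.2))
    raise RuntimeError("command failed after %d attempts: %s"
                       % (attempts, last.stderr.strip()))


# Usage: a command that succeeds on the first attempt.
succeeded_on = run_with_retries([sys.executable, "-c", "pass"],
                                attempts=3, delay_on_retry=True)
print("succeeded on attempt", succeeded_on)
```

For the cleanup discussed in this bug, the command would be something like ["rm", "-rf", target], with enough attempts for the NFS client to finish removing its temporary file.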

Changed in nova:
status: In Progress → Fix Committed
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/havana)

Fix proposed to branch: stable/havana
Review: https://review.openstack.org/61764

Changed in nova:
milestone: none → icehouse-2
Thierry Carrez (ttx)
Changed in nova:
status: Fix Committed → Fix Released
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: stable/havana
Review: https://review.openstack.org/74245

Thierry Carrez (ttx)
Changed in nova:
milestone: icehouse-2 → 2014.1
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/havana)

Reviewed: https://review.openstack.org/74245
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=384c5487a53aab41110668f5ae4564514248fd26
Submitter: Jenkins
Branch: stable/havana

commit 384c5487a53aab41110668f5ae4564514248fd26
Author: chenxiao <email address hidden>
Date: Tue Nov 5 18:04:42 2013 +0800

    Libvirt:Instance resize confirm issue against NFS

    when I test resize-confirm on NFS (KVM) using at least two
    compute nodes, OSError will occur when try to cleanup the
    backup instance directory and will make instance status
    "Error".

    One way to cleanup resize is using delay_on_retry and
    maximum attempts setting. Replace rmtree() because
    execute() have more parameters choices.

    Closes-Bug: #1248019

    Change-Id: Ifb9a500bbba805af0317307c2e7d6903dcd02ad1
    (cherry picked from commit 17d131661740e452e9729d9ac8881e6fede23dd7)

tags: added: in-stable-havana