resize-revert might be failed when using slow NFS

Bug #1206312 reported by Guangya Liu (Jay Lau)
This bug affects 1 person
Affects: OpenStack Compute (nova)
Status: Fix Released
Importance: Undecided
Assigned to: Guangya Liu (Jay Lau)

Bug Description

I ran the following steps in a loop with a script:
1) resize vm
2) resize-revert vm
3) goto 1)
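
The reporter's script is not included in the report. A minimal sketch of such a loop, assuming the `nova resize` and `nova resize-revert` CLI commands and a hypothetical target flavor, might look like the following. The `run` parameter and the function name are illustrative, and the waiting a real script needs between steps (for the instance to finish resizing before the revert, and to finish reverting before the next round) is deliberately omitted.

```python
import subprocess


def resize_revert_cycle(server, flavor, run=subprocess.check_call, rounds=3):
    """Issue `nova resize` then `nova resize-revert`, `rounds` times.

    `run` is injectable so the loop can be exercised without a cloud.
    NOTE: no waiting for instance state transitions is done here; a real
    reproduction script would have to poll the instance status between steps.
    """
    issued = []
    for _ in range(rounds):
        for cmd in (["nova", "resize", server, flavor],
                    ["nova", "resize-revert", server]):
            run(cmd)
            issued.append(cmd)
    return issued
```

Passing a stand-in such as `run=print` shows the command sequence without contacting any OpenStack endpoint.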

Occasionally the VM ends up in ERROR state with the following fault (the traceback points at finish_revert_resize):
[root@rhel8233 ~]# nova show vm1
+-------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Property | Value |
+-------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------+
| status | ERROR |
| updated | 2013-07-15T03:11:31Z |
| OS-EXT-STS:task_state | None |
| OS-EXT-SRV-ATTR:host | rhel8234 |
| key_name | None |
| image | cirros-0.3.0-x86_64 (9e66c9bb-ed95-42ab-b8f6-70a0f88b876a) |
| hostId | 0ddc9ca22d1deceff5ac4f83526f950fac6bb9b7c185c899985e60e1 |
| OS-EXT-STS:vm_state | error |
| OS-EXT-SRV-ATTR:instance_name | instance-00000068 |
| OS-SRV-USG:launched_at | 2013-07-15T02:59:20.000000 |
| OS-EXT-SRV-ATTR:hypervisor_hostname | rhel8234 |
| flavor | m1.tiny (1) |
| id | fb33fb20-260c-4096-9448-f4bdb5b2e617 |
| security_groups | [{u'name': u'default'}] |
| OS-SRV-USG:terminated_at | None |
| user_id | e18753eee1ec42349ef4d48f6a0cba01 |
| name | vm1 |
| created | 2013-07-15T02:45:22Z |
| tenant_id | 333416515ef44d51a3e6d03e003458ba |
| OS-DCF:diskConfig | MANUAL |
| metadata | {} |
| accessIPv4 | |
| accessIPv6 | |
| fault | {u'message': u'OSError', u'code': 500, u'details': u'[Errno 2] No such file or directory: \'/var/lib/nova/instances/fb33fb20-260c-4096-9448-f4bdb5b2e617\' |
| | File "/usr/lib/python2.6/site-packages/nova/compute/manager.py", line 238, in decorated_function |
| | return function(self, context, *args, **kwargs) |
| | File "/usr/lib/python2.6/site-packages/nova/compute/manager.py", line 2463, in finish_revert_resize |
| | block_device_info, power_on) |
| | File "/usr/lib/python2.6/site-packages/nova/virt/libvirt/driver.py", line 3892, in finish_revert_migration |
| | self._cleanup_failed_migration(inst_base) |
| | File "/usr/lib/python2.6/site-packages/nova/virt/libvirt/driver.py", line 3876, in _cleanup_failed_migration |
| | shutil.rmtree(inst_base) |
| | File "/usr/lib64/python2.6/shutil.py", line 204, in rmtree |
| | onerror(os.listdir, path, sys.exc_info()) |
| | File "/usr/lib64/python2.6/shutil.py", line 202, in rmtree |
| | names = os.listdir(path) |
| | ', u'created': u'2013-07-15T03:11:31Z'} |
| OS-EXT-STS:power_state | 1 |
| OS-EXT-AZ:availability_zone | nova |
| config_drive | |
+-------------------------------------+------------------------------------------------------------------------------------------------------------------------------------------------------------+

After some debugging, I traced the problem to driver.py::finish_revert_migration():

def finish_revert_migration(self, instance, network_info,
                            block_device_info=None, power_on=True):
    LOG.debug(_("Starting finish_revert_migration"),
              instance=instance)

    inst_base = libvirt_utils.get_instance_path(instance)
    inst_base_resize = inst_base + "_resize"

    # NOTE(danms): if we're recovering from a failed migration,
    # make sure we don't have a left-over same-host base directory
    # that would conflict. Also, don't fail on the rename if the
    # failure happened early.
    if os.path.exists(inst_base_resize):
        # <<< This check can sometimes be true even though inst_base has
        # <<< already been destroyed in manager.py::revert_resize
        if os.path.exists(inst_base):
            # <<< Exception raised here, since inst_base cannot be found
            self._cleanup_failed_migration(inst_base)
        utils.execute('mv', inst_base_resize, inst_base)
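
The check-then-delete race described here can be illustrated locally without NFS. The sketch below is not nova code: `stale_exists` is a hypothetical stand-in for an `os.path.exists()` call that still answers "yes" on the source host after another host has removed the directory, and `cleanup_if_present` only mirrors the shape of the check above.

```python
import errno
import os
import shutil
import tempfile


def stale_exists(path):
    # Hypothetical stand-in: pretends the directory is still visible,
    # as the source host apparently does shortly after the remote delete.
    return True


def cleanup_if_present(path, exists=os.path.exists):
    # Same shape as the driver logic: check first, then remove the tree.
    if exists(path):
        shutil.rmtree(path)


parent = tempfile.mkdtemp()
# This directory is never created, simulating "already deleted elsewhere".
inst_base = os.path.join(parent, "instance-dir")

caught_errno = None
try:
    cleanup_if_present(inst_base, exists=stale_exists)
except OSError as exc:
    # shutil.rmtree fails because the path does not actually exist.
    caught_errno = exc.errno
finally:
    os.rmdir(parent)
```

In this simulation `caught_errno` ends up as `errno.ENOENT`, the same "[Errno 2] No such file or directory" seen in the fault above.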

Tags: libvirt
Guangya Liu (Jay Lau) (jay-lau-513) wrote :

From the nova-compute logs:

The resize target host reports that the VM's instance files were deleted at 23:56:18:
2013-07-29 23:56:18.260 8105 INFO nova.virt.libvirt.driver [req-c7d7450a-e665-416f-a392-9b760f123fc9 2785a0d1141c4502b573155988382a27 1394848571b8408188a239da2a1bb351] [instance: 7bccaf62-ea4d-4e27-b4ff-d715fa2f9e9d] Deleting instance files /var/lib/nova/instances/7bccaf62-ea4d-4e27-b4ff-d715fa2f9e9d

But on the source host of the VM, os.path.exists(inst_base) still returns true about a second later, and the cleanup then fails (inst_base: /var/lib/nova/instances/7bccaf62-ea4d-4e27-b4ff-d715fa2f9e9d):
2013-07-29 23:56:19.704 8365 ERROR nova.compute.manager [req-c7d7450a-e665-416f-a392-9b760f123fc9 2785a0d1141c4502b573155988382a27 1394848571b8408188a239da2a1bb351] [instance: 7bccaf62-ea4d-4e27-b4ff-d715fa2f9e9d] [Errno 2] No such file or directory: '/var/lib/nova/instances/7bccaf62-ea4d-4e27-b4ff-d715fa2f9e9d'. Setting instance vm_state to ERROR
2013-07-29 23:56:20.003 8365 ERROR nova.openstack.common.rpc.amqp [req-c7d7450a-e665-416f-a392-9b760f123fc9 2785a0d1141c4502b573155988382a27 1394848571b8408188a239da2a1bb351] Exception during message handling
2013-07-29 23:56:20.003 8365 TRACE nova.openstack.common.rpc.amqp Traceback (most recent call last):
2013-07-29 23:56:20.003 8365 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.6/site-packages/nova/openstack/common/rpc/amqp.py", line 421, in _process_data
2013-07-29 23:56:20.003 8365 TRACE nova.openstack.common.rpc.amqp **args)
2013-07-29 23:56:20.003 8365 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.6/site-packages/nova/openstack/common/rpc/dispatcher.py", line 172, in dispatch
2013-07-29 23:56:20.003 8365 TRACE nova.openstack.common.rpc.amqp result = getattr(proxyobj, method)(ctxt, **kwargs)
2013-07-29 23:56:20.003 8365 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.6/site-packages/nova/exception.py", line 100, in wrapped
2013-07-29 23:56:20.003 8365 TRACE nova.openstack.common.rpc.amqp temp_level, payload)
2013-07-29 23:56:20.003 8365 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.6/site-packages/nova/exception.py", line 77, in wrapped
2013-07-29 23:56:20.003 8365 TRACE nova.openstack.common.rpc.amqp return f(self, context, *args, **kw)
2013-07-29 23:56:20.003 8365 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.6/site-packages/nova/compute/manager.py", line 235, in decorated_function
2013-07-29 23:56:20.003 8365 TRACE nova.openstack.common.rpc.amqp pass
2013-07-29 23:56:20.003 8365 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.6/site-packages/nova/compute/manager.py", line 221, in decorated_function
2013-07-29 23:56:20.003 8365 TRACE nova.openstack.common.rpc.amqp return function(self, context, *args, **kwargs)
2013-07-29 23:56:20.003 8365 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.6/site-packages/nova/compute/manager.py", line 286, in decorated_function
2013-07-29 23:56:20.003 8365 TRACE nova.openstack.common.rpc.amqp function(self, context, *args, **kwargs)
2013-07-29 23:56:20.003 8365 TRACE nova.openstack.common.rpc.amqp File "/usr/lib/python2.6/site-packages/nova/compute/manager.py", line 263, in decor...


Changed in nova:
assignee: nobody → Jay Lau (jay-lau-513)
summary: - resize-revert might be failed when using NFS
+ resize-revert might be failed when using slow NFS
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/39201

Changed in nova:
status: New → In Progress
Matt Riedemann (mriedem)
tags: added: libvirt
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/39201
Committed: http://github.com/openstack/nova/commit/9ba82ba8c66561e00698700f00be1b953a0a5a99
Submitter: Jenkins
Branch: master

commit 9ba82ba8c66561e00698700f00be1b953a0a5a99
Author: Jay Lau <email address hidden>
Date: Tue Jul 30 11:13:14 2013 +0800

    libvirt: ignore false exception due to slow NFS on resize-revert

    When resize revert one VM instance, if the NFS is slow, the NFS
    mounted on source host might not respond quickly. This will cause
    that even if the VM instance configuration files was deleted from
    resize target host, the NFS on source host may still think the
    files were not deleted. Then in driver.py::finish_revert_migration,
    the driver try to delete the file again, this will lead nova compute
    throw exception.

    The fix was ignore exception for such case: if failed to delete the
    instance with error no as 2 (No such file or directory)

    Fix bug 1206312

    Change-Id: I89990c1750a7fd7f421cab03780e7704ee661de1
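
The approach the commit message describes (ignore errno 2 when the delete fails) can be sketched as follows. This is an illustration based only on that description, not the merged patch; the function name is borrowed from the traceback, and it is written as a standalone function rather than a driver method.

```python
import errno
import os
import shutil
import tempfile


def cleanup_failed_migration(inst_base):
    """Remove inst_base, tolerating the case where it is already gone."""
    try:
        shutil.rmtree(inst_base)
    except OSError as exc:
        # ENOENT (errno 2) means the directory was already removed,
        # e.g. by the other host; anything else is still a real error.
        if exc.errno != errno.ENOENT:
            raise


workdir = tempfile.mkdtemp()
inst_base = os.path.join(workdir, "instance-dir")
os.mkdir(inst_base)

cleanup_failed_migration(inst_base)  # directory exists: it is removed
first_pass_removed = not os.path.exists(inst_base)
cleanup_failed_migration(inst_base)  # already gone: no exception raised
os.rmdir(workdir)
```

Re-raising every other errno keeps genuine failures, such as permission errors, visible instead of silently swallowing them.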

Changed in nova:
status: In Progress → Fix Committed
Changed in nova:
milestone: none → havana-rc1
Thierry Carrez (ttx)
Changed in nova:
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in nova:
milestone: havana-rc1 → 2013.2