XenAPI: Race condition in wait for coalesce

Bug #1282822 reported by Bob Ball
This bug affects 1 person
Affects: OpenStack Compute (nova)
Status: Fix Released
Importance: High
Assigned to: Bob Ball

Bug Description

_wait_for_vhd_coalesce scans the SR and then checks whether the garbage collector (GC) has finished.
The GC might finish between the two calls, so the scan reflects the pre-GC state of the system while the GC already reports that it is not running.

This race can cause a spurious error even though the state has, by then, become correct.

2014-02-20 20:26:55.336 TRACE oslo.messaging.rpc.dispatcher File "/opt/stack/nova/nova/compute/manager.py", line 2455, in backup_instance
2014-02-20 20:26:55.336 TRACE oslo.messaging.rpc.dispatcher task_states.IMAGE_BACKUP)
2014-02-20 20:26:55.336 TRACE oslo.messaging.rpc.dispatcher File "/opt/stack/nova/nova/compute/manager.py", line 2521, in _snapshot_instance
2014-02-20 20:26:55.336 TRACE oslo.messaging.rpc.dispatcher update_task_state)
2014-02-20 20:26:55.336 TRACE oslo.messaging.rpc.dispatcher File "/opt/stack/nova/nova/virt/xenapi/driver.py", line 261, in snapshot
2014-02-20 20:26:55.336 TRACE oslo.messaging.rpc.dispatcher self._vmops.snapshot(context, instance, image_id, update_task_state)
2014-02-20 20:26:55.336 TRACE oslo.messaging.rpc.dispatcher File "/opt/stack/nova/nova/virt/xenapi/vmops.py", line 750, in snapshot
2014-02-20 20:26:55.336 TRACE oslo.messaging.rpc.dispatcher post_snapshot_callback=update_task_state) as vdi_uuids:
2014-02-20 20:26:55.336 TRACE oslo.messaging.rpc.dispatcher File "/usr/lib/python2.7/contextlib.py", line 17, in __enter__
2014-02-20 20:26:55.336 TRACE oslo.messaging.rpc.dispatcher return self.gen.next()
2014-02-20 20:26:55.336 TRACE oslo.messaging.rpc.dispatcher File "/opt/stack/nova/nova/virt/xenapi/vm_utils.py", line 790, in _snapshot_attached_here_impl
2014-02-20 20:26:55.336 TRACE oslo.messaging.rpc.dispatcher original_parent_uuid)
2014-02-20 20:26:55.336 TRACE oslo.messaging.rpc.dispatcher File "/opt/stack/nova/nova/virt/xenapi/vm_utils.py", line 2114, in _wait_for_vhd_coalesce
2014-02-20 20:26:55.336 TRACE oslo.messaging.rpc.dispatcher raise exception.NovaException(msg)
2014-02-20 20:26:55.336 TRACE oslo.messaging.rpc.dispatcher NovaException: VHD coalesce: Garbage collection not running, giving up...
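The race described above is a check-then-act problem. The following is a minimal, deterministic Python simulation, not nova code: FakeSR, wait_racy, wait_safe and CoalesceError are hypothetical names. It assumes the GC completes immediately after the first query returns (reproducing the unlucky interleaving) and that a stopped GC does not restart. Scanning first and then asking about the GC gives up even though the chain is coalesced; reading the GC status first and then scanning does not.

```python
class CoalesceError(Exception):
    """Raised when waiting for the coalesce is abandoned."""


class FakeSR:
    """Hypothetical SR whose GC finishes right after the first query."""

    def __init__(self):
        self.coalesced = False  # has the GC merged the VHD chain yet?
        self.gc_active = True   # is the GC still running?
        self._calls = 0

    def _tick(self):
        # Advance simulated time: the GC completes (and coalesces) just
        # after the first query returns, i.e. between the two calls.
        self._calls += 1
        if self._calls == 1:
            self.gc_active = False
            self.coalesced = True

    def scan(self):
        result = self.coalesced
        self._tick()
        return result

    def gc_running(self):
        result = self.gc_active
        self._tick()
        return result


def wait_racy(sr):
    """Scan, then check the GC: the order that exposes the race."""
    if sr.scan():
        return "coalesced"
    if not sr.gc_running():
        # The scan saw pre-GC state, but the GC has finished since.
        raise CoalesceError("Garbage collection not running, giving up...")
    return "retry"


def wait_safe(sr):
    """Check the GC, then scan."""
    gc_was_running = sr.gc_running()
    if sr.scan():
        return "coalesced"
    if not gc_was_running:
        # The GC stopped before the scan, so the scan saw its final result.
        raise CoalesceError("Garbage collection not running, giving up...")
    return "retry"


if __name__ == "__main__":
    try:
        wait_racy(FakeSR())
    except CoalesceError as exc:
        print("racy:", exc)
    print("safe:", wait_safe(FakeSR()))
```

The safe ordering works because, if the GC had already stopped before the scan, the scan must reflect the GC's final result, so giving up at that point is justified.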

Tags: xenserver
Bob Ball (bob-ball) wrote:

It is also not clear that the new behaviour (monitoring the GC while waiting for the coalesce) is correct; consider reverting to the old behaviour.
An alternative might be to walk the VDI path and test whether more coalescing is possible, rather than waiting for one specific coalesce.
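One way to read the suggested alternative, as a hypothetical sketch: model the VDI path as a parent map and, walking from the leaf towards the base, ask whether any ancestor could still be coalesced. It assumes, as a simplification, that an interior VHD with exactly one child can be merged with that child; coalesce_possible and the parent map are illustrative names, not XenAPI calls.

```python
from collections import Counter
from typing import Dict, Optional


def coalesce_possible(leaf: str, parents: Dict[str, Optional[str]]) -> bool:
    """Walk from leaf towards the base; True if any ancestor looks coalescable.

    parents maps every VDI uuid in the SR to its parent uuid (None for a
    base image). Hypothetical rule: an ancestor with exactly one child can
    still be merged with that child.
    """
    # Count how many children each VHD has across the whole SR.
    child_counts = Counter(p for p in parents.values() if p is not None)
    node = parents[leaf]
    while node is not None:
        if child_counts[node] == 1:
            return True
        node = parents[node]
    return False


# "parent" has two children, so under this rule nothing is left to merge.
print(coalesce_possible("leaf", {"parent": None, "leaf": "parent", "snap": "parent"}))
# "parent" has a single child, so a coalesce is still possible.
print(coalesce_possible("leaf", {"parent": None, "leaf": "parent"}))
```

Because this checks the chain itself rather than a specific expected parent, it does not depend on observing the GC at any particular moment.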

description: updated
OpenStack Infra (hudson-openstack) wrote: Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/75366

Changed in nova:
status: New → In Progress
John Garbutt (johngarbutt) wrote:

It's not critical, so downgrading, but leaving it at High due to the CI impact.

Changed in nova:
importance: Critical → High
OpenStack Infra (hudson-openstack) wrote: Fix merged to nova (master)

Reviewed: https://review.openstack.org/75366
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=ee772ea0c3449020b5bff4cf88aae4d88bae49c1
Submitter: Jenkins
Branch: master

commit ee772ea0c3449020b5bff4cf88aae4d88bae49c1
Author: Bob Ball <email address hidden>
Date: Fri Feb 21 10:56:16 2014 +0000

    Partially revert "XenAPI: Monitor the GC when coalescing"

    This partially reverts commit 270d4f1d6b100c802a65b31d35e406528aa7fd27.

    The plugin has been left so the plugin version is not made incompatible
    by removing a call, but the call to it from vm_utils has been removed.

    I expect this call will be re-introduced in future when we have identified
    the cause of the race condition and the appropriate usage.

    The original change was made to fix bug 1258169, where the number of
    attempts made was not sufficient, so this revert does not reduce the
    number of attempts.

    Change-Id: I473b81b9970990b877d1886bb28a96888cc05f98
    Closes-bug: 1282822
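After this partial revert the wait no longer consults the GC; it polls the VHD chain for a bounded number of attempts. A minimal sketch of that shape follows, assuming a caller-supplied is_coalesced predicate; wait_for_coalesce, CoalesceTimeout, max_attempts, poll_interval and their defaults are hypothetical, not nova's actual names or configuration options.

```python
import time
from typing import Callable


class CoalesceTimeout(Exception):
    """Raised when the chain has not coalesced after max_attempts polls."""


def wait_for_coalesce(is_coalesced: Callable[[], bool],
                      max_attempts: int = 20,
                      poll_interval: float = 5.0,
                      sleep: Callable[[float], None] = time.sleep) -> int:
    """Poll is_coalesced() until it is True; return the attempt number.

    The GC status is deliberately not consulted: only the state of the
    VHD chain decides success, and there is no early give-up, so all
    max_attempts polls are used before failing.
    """
    for attempt in range(1, max_attempts + 1):
        if is_coalesced():
            return attempt
        if attempt < max_attempts:
            sleep(poll_interval)  # no point sleeping after the last poll
    raise CoalesceTimeout("VHD coalesce attempts exceeded (%d)" % max_attempts)


# Usage with a fake predicate that succeeds on the third poll.
answers = iter([False, False, True])
print(wait_for_coalesce(lambda: next(answers), poll_interval=0.01))
```

Injecting sleep keeps the loop testable without real delays; in a real driver the predicate would re-scan the SR and inspect the VHD parent.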

Changed in nova:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in nova:
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in nova:
milestone: icehouse-3 → 2014.1