move_vhds_into_sr - invalid cookie
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
OpenStack Compute (nova) | Fix Released | Medium | John Garbutt | 2014.2
Bug Description
When moving VHDs on the filesystem, a coalesce may be in progress. As a result, the VHD file is not valid at the time it is copied, because it is being actively changed, and the copy ends up with an invalid VHD cookie.
Seen in XenServer CI: http://
2014-08-28 12:26:37.538 | Traceback (most recent call last):
2014-08-28 12:26:37.543 | File "tempest/
2014-08-28 12:26:37.550 | self.client.
2014-08-28 12:26:37.556 | File "tempest/
2014-08-28 12:26:37.563 | raise_on_
2014-08-28 12:26:37.570 | File "tempest/
2014-08-28 12:26:37.577 | server_
2014-08-28 12:26:37.583 | BuildErrorExcep
2014-08-28 12:26:37.589 | Details: {u'message': u'[\'XENAPI_
Changed in nova:
milestone: juno-3 → none
Changed in nova:
milestone: none → juno-rc1
status: Fix Committed → Fix Released
Changed in nova:
milestone: juno-rc1 → 2014.2
I think the easiest fix here is to repair the VHDs on import.
My current theory is that because 'wait_for_coalesce' assumes (and has always assumed) that a single coalesce is going to happen (which is not necessarily correct), we might be trying to copy the VHDs while a coalesce is still in progress.
Coalesce of the chain a-b-c (where c is the leaf) happens by:
1) Copying the blocks changed in b into a (to give a'-b-c)
2) Re-parenting c to a' (giving a'-c, with b no longer referenced)
3) Deleting b.
During step 1, the size of the VHD is extended, the new blocks are written, and an updated footer is put at the end of the extended VHD. If a file-level copy of the VHD is made after it has been extended but before the new footer is written, the footer will be invalid. For this reason, Citrix XenServer is looking at moving towards ignoring the footer and using only the 'backup footer', which is actually at the head of the VHD. That change is likely to be too invasive to be considered for a hotfix.
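To make the 'invalid cookie' symptom concrete, here is a small illustrative check (not part of any proposed fix): a VHD's footer is a 512-byte block at the end of the file beginning with the 8-byte cookie 'conectix', and dynamic/differencing VHDs keep a copy of it at offset 0, which is the 'backup footer' mentioned above. A file copied mid-coalesce can pass the head check but fail the tail check:

import os

VHD_COOKIE = b"conectix"
FOOTER_SIZE = 512


def footer_cookies(path):
    """Return (head_ok, tail_ok) for the two footer copies of a VHD file."""
    with open(path, "rb") as f:
        head = f.read(FOOTER_SIZE)
        f.seek(-FOOTER_SIZE, os.SEEK_END)
        tail = f.read(FOOTER_SIZE)
    return head.startswith(VHD_COOKIE), tail.startswith(VHD_COOKIE)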
It seems that this can be repaired with the (very cheap) vhd-util repair option.
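For illustration only, a minimal sketch of what that fix-up could look like from Nova's side, assuming the blktap vhd-util binary is on the PATH and accepts 'repair -n <file>' (my reading of its usage; the real change may invoke it differently):

import subprocess


def repair_vhd(path):
    # As I understand it, 'repair' rewrites the primary footer at the end of
    # the file from the backup copy at the head, which is why it is cheap.
    subprocess.check_call(["vhd-util", "repair", "-n", path])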
This may have been exacerbated by https://review.openstack.org/#/c/93827/, the fix for https://launchpad.net/bugs/1317792.
The behaviour of bug 1317792 was as follows:
1) The chain a-b-c was imported
2) c was snapshotted giving a-b-c-d; we waited for 'd' to coalesce back into 'b'
3) b was coalesced into a, giving a-c-d
4) c was coalesced into a, giving a-d
5) 'wait_for_coalesce' failed with a timeout
The fix for this issue was to wait for 'd' to coalesce back into anything other than 'c' (in this case 'a' or 'b'). As such, we might stop waiting at step 3, meaning the copy can happen while 'c' is still being coalesced.
Even without this fix, the above scenario could have occurred if the GC had decided to coalesce 'c' first, in which case the copy would have happened while 'b' was still being coalesced.
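For context, the race looks roughly like the following. This is a simplified stand-in, not Nova's actual wait_for_coalesce; get_parent is a made-up callable for however the vhd-parent field is read:

import time


def wait_for_reparent(get_parent, vdi_uuid, original_parent_uuid,
                      attempts=20, interval=5):
    """Poll until vdi_uuid's parent is no longer original_parent_uuid."""
    for _ in range(attempts):
        parent = get_parent(vdi_uuid)
        if parent != original_parent_uuid:
            # The leaf has been re-parented, but the SM garbage collector may
            # still be coalescing other links in the chain, so a file-level
            # copy taken now can capture a half-written footer.
            return parent
        time.sleep(interval)
    raise RuntimeError("VHD coalesce timed out")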
Copying the VHDs in this state and fixing them up afterwards is, in my view, preferable to reverting to the previous behaviour.
In terms of moving forward without breaking bug 1317792 again, I think the following are options:
1) Use vhd-util repair to fix up the VHDs after the fact. As described above, the VHDs will still be valid as b is not removed from the chain until c is re-parented to a'. As such, any 'incorrect' data in a' will not be read because it is guaranteed that b contains the correct data.
2) Changing wait_for_coalesce to wait for _all_ coalescing to be complete, based on XenServer's understanding of whether the GC is still running (this would need a XAPI plugin to poll the GC to make sure it's not running at the point we copy; see the sketch after this list).
3) Adding a XAPI plugin to manually lock the SR or VDI (using /opt/xensource/sm/lock.py). We're nervous about this as there have been deadlocks in the past with multiple threads locking (e.g. process 1 locks A, process 2 locks B, process 1 wants the lock for B). If there are other processes trying to lock the same things as SM then we're likely to see more issues with deadlocks or timeouts for valid SR operations.
4) (least preferred) add more logic to Nova to guess when it thinks the GC will be able to coalesce or not. We currently have some logic that looks for siblings, but if we were to follow options 2 or 3 then we can probably delete ...
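To make option 2 a bit more concrete, here is a rough sketch of the kind of helper Nova could call before copying. host.call_plugin is a real XenAPI call, but the plugin name 'gc_status' and the function 'is_gc_running' are hypothetical; an actual implementation would need a new plugin shipped to the host.

import time


def wait_for_gc_idle(session, host_ref, sr_uuid, timeout=300, interval=5):
    """Block until a (hypothetical) plugin reports the SM GC is idle."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        # 'gc_status' / 'is_gc_running' are made-up names for this sketch.
        running = session.xenapi.host.call_plugin(
            host_ref, "gc_status", "is_gc_running", {"sr_uuid": sr_uuid})
        if running == "false":
            return
        time.sleep(interval)
    raise RuntimeError("SR garbage collection did not finish in time")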