XenAPI: Race - import followed by upload / waiting for snapshot coalesce

Bug #1317792 reported by Bob Ball
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
OpenStack Compute (nova)
Fix Released
Medium
Bob Ball

Bug Description

In a nutshell, so everyone doesn't have to read the rest of the bug description:
If a tree is imported with more than two VDIs, then the _snapshot_attached_here method will never succeed as the condition it is waiting for (the parent being equal to the original parent again) will not occur.

The following tree is imported from glance:
 May 8 13:41:10 localhost SMGC: [14336]........ *814ea7ed(40.000M/40.079M)
 May 8 13:41:10 localhost SMGC: [14336]............ *863cc348(40.000M/2.008M)
 May 8 13:41:10 localhost SMGC: [14336]................ eccebf0b(40.000M/2.008M)

If we immediately try to perform an operation that uses a snapshot (e.g. resize), we get the 'original' parent as 863cc348 - which means that later we will expect 863cc348 to be the new parent:
2014-05-08 13:41:23.769 DEBUG nova.virt.xenapi.vm_utils [req-62887aaa-cd8a-43d1-841a-bd1c144a0155 ServerDiskConfigTestJSON-620298595 ServerDiskConfigTestJSON-1956337666] VHD eccebf0b-00b9-4df5-b558-8feaf835e850 has parent 863cc348-7a28-4dd0-a976-142eb887256c from (pid=19364) _get_vhd_parent_uuid /opt/stack/nova/nova/virt/xenapi/vm_utils.py:2026

We then take a snapshot and wait for coalesce so we have as few VHDs in the chain as possible:
May 8 13:41:24 localhost SM: [24744] ['/usr/sbin/td-util', 'snapshot', 'vhd', '/var/run/sr-mount/16e0a27c-007c-d083-7459-ee038c92f8a4/eccebf0b-00b9-4df5-b558-8feaf835e850.vhd', '151c921e-dc28-4a66-b286-65cd43bf9a6f.vhd']
2014-05-08 13:41:26.688 DEBUG nova.virt.xenapi.vm_utils [req-62887aaa-cd8a-43d1-841a-bd1c144a0155 ServerDiskConfigTestJSON-620298595 ServerDiskConfigTestJSON-1956337666] VHD eccebf0b-00b9-4df5-b558-8feaf835e850 has parent 151c921e-dc28-4a66-b286-65cd43bf9a6f from (pid=19364) _get_vhd_parent_uuid /opt/stack/nova/nova/virt/xenapi/vm_utils.py:2026
2014-05-08 13:41:26.688 DEBUG nova.virt.xenapi.vm_utils [req-62887aaa-cd8a-43d1-841a-bd1c144a0155 ServerDiskConfigTestJSON-620298595 ServerDiskConfigTestJSON-1956337666] [instance: 7f8db936-9634-47db-852e-7a43bbc992eb] Parent 151c921e-dc28-4a66-b286-65cd43bf9a6f doesn't match original parent 863cc348-7a28-4dd0-a976-142eb887256c, waiting for coalesce... from (pid=19364) _wait_for_vhd_coalesce /opt/stack/nova/nova/virt/xenapi/vm_utils.py:2119

The snapshot has the effect of creating two children (ecce is the one we're referencing and b554 keeps the snapshot from being deleted):
May 8 13:41:34 localhost SMGC: [14336]........ *814ea7ed(40.000M/40.079M)
May 8 13:41:34 localhost SMGC: [14336]............ *863cc348(40.000M/2.008M)
May 8 13:41:34 localhost SMGC: [14336]................ *151c921e(40.000M/14.032M)
May 8 13:41:34 localhost SMGC: [14336].................... b55436a7(40.000M/5.000K)
May 8 13:41:34 localhost SMGC: [14336].................... eccebf0b(40.000M/4.012M)

Coalescing then happens, destroying first 863cc348 then 151c921, merging the differences up the chain into 814ea7ed:
May 8 13:41:59 localhost SMGC: [14336]........ *814ea7ed(40.000M/40.079M)
May 8 13:41:59 localhost SMGC: [14336]............ *863cc348(40.000M/2.008M)
May 8 13:41:59 localhost SMGC: [14336]............ *151c921e(40.000M/14.032M)
May 8 13:41:59 localhost SMGC: [14336]................ b55436a7(40.000M/5.000K)
May 8 13:41:59 localhost SMGC: [14336]................ eccebf0b(40.000M/16.036M)

May 8 13:42:11 localhost SMGC: [14336]........ *814ea7ed(40.000M/40.079M)
May 8 13:42:11 localhost SMGC: [14336]............ *151c921e(40.000M/14.032M)
May 8 13:42:11 localhost SMGC: [14336]................ b55436a7(40.000M/5.000K)
May 8 13:42:11 localhost SMGC: [14336]................ eccebf0b(40.000M/16.036M)

May 8 13:42:36 localhost SMGC: [5186]........ *814ea7ed(40.000M/40.079M)
May 8 13:42:36 localhost SMGC: [5186]............ b55436a7(40.000M/5.000K)
May 8 13:42:36 localhost SMGC: [5186]............ eccebf0b(40.000M/16.036M)

During this time, we're waiting for coalesce and checking the parent. Eventually we get to the static state where Nova is waiting for the parent of eccebf0b to be the deleted VDI 863cc348 - which can never happen now.
2014-05-08 13:42:41.981 DEBUG nova.virt.xenapi.vm_utils [req-62887aaa-cd8a-43d1-841a-bd1c144a0155 ServerDiskConfigTestJSON-620298595 ServerDiskConfigTestJSON-1956337666] VHD eccebf0b-00b9-4df5-b558-8feaf835e850 has parent 814ea7ed-4b99-44cc-8e89-1a8d2e5509bd from (pid=19364) _get_vhd_parent_uuid /opt/stack/nova/nova/virt/xenapi/vm_utils.py:2026
2014-05-08 13:42:41.982 DEBUG nova.virt.xenapi.vm_utils [req-62887aaa-cd8a-43d1-841a-bd1c144a0155 ServerDiskConfigTestJSON-620298595 ServerDiskConfigTestJSON-1956337666] [instance: 7f8db936-9634-47db-852e-7a43bbc992eb] Parent 814ea7ed-4b99-44cc-8e89-1a8d2e5509bd doesn't match original parent 863cc348-7a28-4dd0-a976-142eb887256c, waiting for coalesce... from (pid=19364) _wait_for_vhd_coalesce /opt/stack/nova/nova/virt/xenapi/vm_utils.py:2119

2014-05-08 13:44:00.678 ERROR nova.virt.xenapi.vmops [req-62887aaa-cd8a-43d1-841a-bd1c144a0155 ServerDiskConfigTestJSON-620298595 ServerDiskConfigTestJSON-1956337666] [instance: 7f8db936-9634-47db-852e-7a43bbc992eb] _migrate_disk_resizing_up failed. Restoring orig vm due_to: VHD coalesce attempts exceeded (20), giving up....

We're lucky in a way that 863cc348 was destroyed before 151c921, because otherwise this race might not have been noticed if Nova picked the right time to claim all coalescing had finished - which could have given more problems when uploading the VHDs.

Tags: xenserver
Bob Ball (bob-ball)
description: updated
Changed in nova:
importance: Undecided → Medium
tags: added: xenserver
Bob Ball (bob-ball)
description: updated
Bob Ball (bob-ball)
description: updated
Bob Ball (bob-ball)
Changed in nova:
status: New → Confirmed
assignee: nobody → Bob Ball (bob-ball)
Changed in nova:
status: Confirmed → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/93827
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=ae2a27ce19f3e24d4a8c713a73e617f4cd71d4b4
Submitter: Jenkins
Branch: master

commit ae2a27ce19f3e24d4a8c713a73e617f4cd71d4b4
Author: Bob Ball <email address hidden>
Date: Fri May 16 00:22:14 2014 +0100

    XenAPI: Tolerate multiple coalesces

    VHD coalescing might coalesce more than one VDI while waiting
    and might coalesce the 'grandparent' before the parent. Wait
    for the parent to be any of the original tree instead of just
    the original parent.

    Closes bug: 1317792

    Change-Id: I82c4db0e8a36c41a86adf4bd32304a4bfdabebbf

Changed in nova:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in nova:
milestone: none → juno-1
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in nova:
milestone: juno-1 → 2014.2
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/icehouse)

Fix proposed to branch: stable/icehouse
Review: https://review.openstack.org/143109

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (stable/icehouse)

Change abandoned by Bob Ball (<email address hidden>) on branch: stable/icehouse
Review: https://review.openstack.org/143109

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.