Live migration: "Disk of instance is too large" when using a volume stored on NFS

Bug #1356552 reported by Cyril Roelandt
This bug affects 2 people
Affects                   Status        Importance  Assigned to     Milestone
OpenStack Compute (nova)  Fix Released  Undecided   Cyril Roelandt
nova (Icehouse)           Fix Released  Undecided   Unassigned
nova (Juno)               Fix Released  Undecided   John Griffith

Bug Description

When live-migrating an instance that has a Cinder volume (stored on NFS) attached, the operation fails if the volume size is bigger than the space left on the destination node. This should not happen, since this volume does not have to be migrated. Here is how to reproduce the bug on a cluster with one control node and two compute nodes, using the NFS backend of Cinder.

$ nova boot --flavor m1.tiny --image 173241e-babb-45c7-a35f-b9b62e8ced78 test_vm
...

$ nova volume-create --display-name test_volume 100
...
| id | 6b9e1d03-3f53-4454-add9-a8c32d82c7e6 |
...

$ nova volume-attach test_vm 6b9e1d03-3f53-4454-add9-a8c32d82c7e6 auto
...

$ nova show test_vm | grep OS-EXT-SRV-ATTR:host
| OS-EXT-SRV-ATTR:host | t1-cpunode0 |

$ nova service-list | grep nova-compute
| nova-compute | t1-cpunode0 | nova | enabled | up | 2014-08-13T19:14:40.000000 | - |
| nova-compute | t1-cpunode1 | nova | enabled | up | 2014-08-13T19:14:41.000000 | - |

Now, let's say I want to live-migrate test_vm to t1-cpunode1:

$ nova live-migration --block-migrate test_vm t1-cpunode1
ERROR: Migration pre-check error: Unable to migrate a0d9c991-7931-4710-8684-282b1df4cca6: Disk of instance is too large(available on destination host:46170898432 < need:108447924224) (HTTP 400) (Request-ID: req-b4f00867-df51-44be-8f97-577be385d536)
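The numbers in the error message confirm that the attached volume is being counted: m1.tiny has a 1 GB root disk, and 1 GiB plus the 100 GB volume is exactly the reported requirement. A quick check:

```python
GiB = 1024 ** 3

# Figures taken from the error message above.
available = 46170898432    # free disk on the destination host
needed = 108447924224      # what nova claims the instance needs

# m1.tiny has a 1 GB root disk; the attached Cinder volume is 100 GB.
root_disk = 1 * GiB
volume = 100 * GiB

# The reported requirement is exactly root disk + volume, so the
# shared-storage volume is (wrongly) included in the space check.
print(needed == root_disk + volume)  # → True
print(available < root_disk)         # → False: the root disk alone would fit
```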

In nova/virt/libvirt/driver.py, _assert_dest_node_has_enough_disk() calls get_instance_disk_info(), which in turn calls _get_instance_disk_info(). In that method, volume devices are supposed to be excluded when computing the amount of space needed to migrate an instance:

...
            if disk_type != 'file':
                LOG.debug('skipping %s since it looks like volume', path)
                continue

            if target in volume_devices:
                LOG.debug('skipping disk %(path)s (%(target)s) as it is a '
                          'volume', {'path': path, 'target': target})
                continue
...

But for some reason, neither of these branches is ever taken.

If we SSH into the compute node where the instance currently runs, we can get more information about it:

$ virsh dumpxml 11
...
    <disk type='file' device='disk'>
      <driver name='qemu' type='raw' cache='none'/>
      <source file='/var/lib/nova/mnt/84751739e625d0ea9609a65dd9c0a6f1/volume-6b9e1d03-3f53-4454-add9-a8c32d82c7e6'/>
      <target dev='vdb' bus='virtio'/>
      <serial>6b9e1d03-3f53-4454-add9-a8c32d82c7e6</serial>
      <alias name='virtio-disk1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
    </disk>
...

The disk type is "file", which might explain why this volume is not skipped in the code snippet shown above. When we use the default Cinder backend, we get something such as:

    <disk type='block' device='disk'>
      <driver name='qemu' type='raw' cache='none'/>
      <source dev='/dev/disk/by-path/ip-192.168.200.250:3260-iscsi-iqn.2010-10.org.openstack:volume-47ecc6a6-8af9-4011-a53f-14a71d14f50b-lun-1'/>
      <target dev='vdb' bus='virtio'/>
      <serial>47ecc6a6-8af9-4011-a53f-14a71d14f50b</serial>
      <alias name='virtio-disk1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
    </disk>

I think the code in LibvirtNFSVolumeDriver.connect_volume() might be wrong: conf.source_type should probably be set to something other than "file" (and some other changes might be needed), but I must admit I'm not a libvirt expert.

Any thoughts?
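A quick way to see why the first guard fails is to parse the domain XML: for the NFS-backed volume, the disk element's type attribute is 'file', exactly as for a local image, so `disk_type != 'file'` cannot tell them apart. A minimal illustration, using the `<disk>` element from the dumpxml output above:

```python
import xml.etree.ElementTree as ET

# The <disk> element libvirt reports for the NFS-backed Cinder volume
# (copied from the dumpxml output above, trimmed for brevity).
nfs_disk_xml = """
<disk type='file' device='disk'>
  <driver name='qemu' type='raw' cache='none'/>
  <source file='/var/lib/nova/mnt/84751739e625d0ea9609a65dd9c0a6f1/volume-6b9e1d03-3f53-4454-add9-a8c32d82c7e6'/>
  <target dev='vdb' bus='virtio'/>
</disk>
"""

disk = ET.fromstring(nfs_disk_xml)
disk_type = disk.get('type')              # 'file', same as a local raw/qcow2 image
target = disk.find('target').get('dev')   # 'vdb'

# The first guard in _get_instance_disk_info() never skips this disk:
print(disk_type != 'file')  # → False, so its 'continue' is unreachable here
# The second guard only helps if volume_devices actually contains 'vdb'.
```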

tags: added: libvirt
Changed in nova:
assignee: nobody → Vladik Romanovsky (vladik-romanovsky)
Revision history for this message
Cyril Roelandt (cyril-roelandt) wrote :

Also, I think it's worth noting that _assert_dest_node_has_enough_disk() calls self.get_instance_disk_info(instance['name']), which means that get_instance_disk_info() is invoked with its block_device_info parameter equal to None, and so is _get_instance_disk_info(). As a result, block_device_info_get_mapping() returns an empty list, volume_devices is an empty set, and we never enter the following condition:

            if target in volume_devices:
                LOG.debug('skipping disk %(path)s (%(target)s) as it is a '
                          'volume', {'path': path, 'target': target})
                continue

So, we either have a problem with the libvirt configuration written by Nova for NFS volumes, or with volume devices not being properly detected when calling _assert_dest_node_has_enough_disk().
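For reference, the way nova derives volume_devices from block_device_info looks roughly like this (a simplified sketch of the logic in _get_instance_disk_info(), not the verbatim code): when block_device_info is None the mapping is empty, the set is empty, and no disk is ever skipped as a volume, matching the behaviour described above.

```python
def get_volume_devices(block_device_info):
    """Simplified sketch: build the set of volume target names (e.g. 'vdb').

    block_device_info_get_mapping() returns [] when block_device_info is
    None, so the resulting set is empty and the 'target in volume_devices'
    guard never fires.
    """
    mapping = (block_device_info or {}).get('block_device_mapping', [])
    volume_devices = set()
    for vol in mapping:
        # 'mount_device' is e.g. '/dev/vdb'; keep only the 'vdb' part.
        volume_devices.add(vol['mount_device'].rpartition('/')[2])
    return volume_devices

print(get_volume_devices(None))                                  # → set()
print(get_volume_devices(
    {'block_device_mapping': [{'mount_device': '/dev/vdb'}]}))   # → {'vdb'}
```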

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/115041

Changed in nova:
assignee: Vladik Romanovsky (vladik-romanovsky) → Cyril Roelandt (cyril-roelandt)
status: New → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/115041
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=671aa9f8b7ca5274696f83bde0d4822ee431b837
Submitter: Jenkins
Branch: master

commit 671aa9f8b7ca5274696f83bde0d4822ee431b837
Author: Cyril Roelandt <email address hidden>
Date: Mon Aug 18 17:45:35 2014 +0000

    libvirt: Make sure volumes are well detected during block migration

    Current implementation of live migration in libvirt incorrectly includes
    block devices on shared storage (e.g., NFS) when computing destination
    storage requirements. Since these volumes are already on shared storage
    they do not need to be migrated. As a result, migration fails if the
    amount of free space on the shared drive is less than the size of the
    volume to be migrated. The problem is addressed by adding a
    block_device_info parameter to check_can_live_migrate_source() to allow
    volumes to be filtered correctly when computing migration space
    requirements.

    This only fixes the issue on libvirt: it is unclear whether other
    implementations suffer from the same issue.

    Thanks to Florent Flament for spotting and fixing an issue while trying out
    this patch.

    Co-Authored-By: Florent Flament <email address hidden>
    Change-Id: Iac7d2cd2a70800fd89864463ca45c030c47411b0
    Closes-Bug: #1356552
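The shape of the fix can be sketched as follows (a hypothetical illustration, not the actual nova code): check_can_live_migrate_source() gains an optional block_device_info argument that is threaded down to the disk-info computation, so attached volume targets are known and excluded from the space requirement.

```python
# Hypothetical sketch of the fix's shape: thread block_device_info from the
# migration pre-check down to the disk-size computation so attached volumes
# can be filtered out of the space requirement.

def get_instance_disk_info(instance, block_device_info=None):
    # Derive the set of volume targets ('vdb', ...) from the mapping.
    volume_targets = {
        bdm['mount_device'].rpartition('/')[2]
        for bdm in (block_device_info or {}).get('block_device_mapping', [])
    }
    # Keep only disks that are not attached volumes.
    return [d for d in instance['disks'] if d['target'] not in volume_targets]

def check_can_live_migrate_source(instance, dest_check_data,
                                  block_device_info=None):
    disk_info = get_instance_disk_info(instance, block_device_info)
    return sum(d['size'] for d in disk_info)

instance = {'disks': [{'target': 'vda', 'size': 1 * 1024**3},
                      {'target': 'vdb', 'size': 100 * 1024**3}]}
bdi = {'block_device_mapping': [{'mount_device': '/dev/vdb'}]}

# Without block_device_info the 100 GB volume is counted (the bug);
# with it, only the 1 GiB root disk remains.
print(check_can_live_migrate_source(instance, {}))       # → 108447924224
print(check_can_live_migrate_source(instance, {}, bdi))  # → 1073741824
```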

Changed in nova:
status: In Progress → Fix Committed
tags: added: juno-backport-potential
Thierry Carrez (ttx)
Changed in nova:
milestone: none → kilo-1
status: Fix Committed → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (stable/icehouse)

Change abandoned by Sean Dague (<email address hidden>) on branch: stable/icehouse
Review: https://review.openstack.org/117631
Reason: This review is > 4 weeks without comment and currently blocked by a core reviewer with a -2. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and contacting the reviewer with the -2 on this review to ensure you address their concerns.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/juno)

Fix proposed to branch: stable/juno
Review: https://review.openstack.org/155863

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/icehouse)

Fix proposed to branch: stable/icehouse
Review: https://review.openstack.org/156937

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/juno)

Reviewed: https://review.openstack.org/155863
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=42cae28241cd0c213201d036bfbe13fb118e4bee
Submitter: Jenkins
Branch: stable/juno

commit 42cae28241cd0c213201d036bfbe13fb118e4bee
Author: Cyril Roelandt <email address hidden>
Date: Mon Aug 18 17:45:35 2014 +0000

    libvirt: Make sure volumes are well detected during block migration

    Current implementation of live migration in libvirt incorrectly includes
    block devices on shared storage (e.g., NFS) when computing destination
    storage requirements. Since these volumes are already on shared storage
    they do not need to be migrated. As a result, migration fails if the
    amount of free space on the shared drive is less than the size of the
    volume to be migrated. The problem is addressed by adding a
    block_device_info parameter to check_can_live_migrate_source() to allow
    volumes to be filtered correctly when computing migration space
    requirements.

    This only fixes the issue on libvirt: it is unclear whether other
    implementations suffer from the same issue.

    Thanks to Florent Flament for spotting and fixing an issue while trying out
    this patch.

    Co-Authored-By: Florent Flament <email address hidden>
    Change-Id: Iac7d2cd2a70800fd89864463ca45c030c47411b0
    Closes-Bug: #1356552
    (cherry picked from commit 671aa9f8b7ca5274696f83bde0d4822ee431b837)

tags: added: in-stable-juno
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/icehouse)

Reviewed: https://review.openstack.org/156937
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=f513a282e125075dc43ee05ffcfe0a62c336c059
Submitter: Jenkins
Branch: stable/icehouse

commit f513a282e125075dc43ee05ffcfe0a62c336c059
Author: Cyril Roelandt <email address hidden>
Date: Mon Aug 18 17:45:35 2014 +0000

    libvirt: Make sure volumes are well detected during block migration

    Current implementation of live migration in libvirt incorrectly includes
    block devices on shared storage (e.g., NFS) when computing destination
    storage requirements. Since these volumes are already on shared storage
    they do not need to be migrated. As a result, migration fails if the
    amount of free space on the shared drive is less than the size of the
    volume to be migrated. The problem is addressed by adding a
    block_device_info parameter to check_can_live_migrate_source() to allow
    volumes to be filtered correctly when computing migration space
    requirements.

    This only fixes the issue on libvirt: it is unclear whether other
    implementations suffer from the same issue.

    Thanks to Florent Flament for spotting and fixing an issue while trying out
    this patch.

    (cherry picked from commit 42cae28241cd0c213201d036bfbe13fb118e4bee)

    Conflicts:
     nova/tests/virt/libvirt/test_libvirt.py
     nova/virt/driver.py
     nova/virt/hyperv/driver.py
     nova/virt/xenapi/driver.py

    Co-Authored-By: Florent Flament <email address hidden>
    Closes-Bug: #1356552
    Change-Id: Iac7d2cd2a70800fd89864463ca45c030c47411b0

tags: added: in-stable-icehouse
Thierry Carrez (ttx)
Changed in nova:
milestone: kilo-1 → 2015.1.0