ceph ephemeral info not updated during live migrate

Bug #1741364 reported by Walt Boring
This bug affects 11 people
Affects: OpenStack Compute (nova)
Status: Confirmed
Importance: Medium
Assigned to: Unassigned

Bug Description

rbd image information isn't updated during a live migration.

I have a user who added a new monitor to their ceph cluster and then updated ceph.conf on all of the openstack nodes to reflect the new monitor IP.

They have running VMs that boot from ceph-backed ephemeral disks and also have attached ceph volumes. They did a live migration to try to get nova to update the libvirt XML, and it seems that only the cinder volume disks are updated, not the ephemeral disks in the vms pool.
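For reference, the ceph.conf change in question is just the monitor address list; judging from the XML below, the new monitor appears to be 192.168.210.15. An illustrative snippet, not the user's actual file:

    [global]
    # new monitor (192.168.210.15) appended to the existing list
    mon_host = 192.168.200.12,192.168.200.14,192.168.200.24,192.168.210.15,192.168.240.17,192.168.240.23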

This is a copy of the domain XML after the live migration.

    <disk type='network' device='disk'>
      <driver name='qemu' type='raw' cache='none'/>
      <auth username='nova'>
        <secret type='ceph' uuid='820ccd0b-b180-4528-93ed-76ae82edf832'/>
      </auth>
      <source protocol='rbd' name='vms/3b97914e-3f9b-410a-b3d9-6c1a83244136_disk'> <-- this one is NOT changed, old ips
        <host name='192.168.200.12' port='6789'/>
        <host name='192.168.200.14' port='6789'/>
        <host name='192.168.200.24' port='6789'/>
        <host name='192.168.240.17' port='6789'/>
        <host name='192.168.240.23' port='6789'/>
      </source>
      <backingStore/>
      <target dev='vda' bus='virtio'/>
      <alias name='virtio-disk0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </disk>
    <disk type='network' device='disk'>
      <driver name='qemu' type='raw' cache='none'/>
      <auth username='nova'>
        <secret type='ceph' uuid='820ccd0b-b180-4528-93ed-76ae82edf832'/>
      </auth>
      <source protocol='rbd' name='volumes/volume-6d04520d-0029-499c-af81-516a7ba37a54'> <-- this one is changed, new ips
        <host name='192.168.200.12' port='6789'/>
        <host name='192.168.200.14' port='6789'/>
        <host name='192.168.200.24' port='6789'/>
        <host name='192.168.210.15' port='6789'/>
        <host name='192.168.240.17' port='6789'/>
        <host name='192.168.240.23' port='6789'/>
      </source>
      <backingStore/>
      <target dev='vdb' bus='virtio'/>
      <serial>6d04520d-0029-499c-af81-516a7ba37a54</serial>
      <alias name='virtio-disk1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </disk>

Lee Yarwood (lyarwood)
tags: added: live-migration volumes
Dirk Mueller (dmllr)
description: updated
Matt Riedemann (mriedem)
tags: added: libvirt
Revision history for this message
Matt Riedemann (mriedem) wrote :

lyarwood has pointed this out before in IRC, but I wanted to mention it again here for clarity: the difference between those disks is that one is an ephemeral disk based on the libvirt rbd imagebackend code in nova, and it gets its values from nova.conf. The other is a volume, and nova gets its config values from the cinder connection_info, which gets refreshed during the live migration here:

https://github.com/openstack/nova/blob/74deea4d8f66a85e66ec79c72c9f257f562d5afd/nova/virt/libvirt/migration.py#L149
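Roughly, that code walks the guest XML and swaps in a freshly rendered <disk> element for each volume, built from the refreshed connection_info. A minimal sketch of the idea (simplified, with hypothetical helper names; not the literal nova code):

    # Sketch of the volume-XML refresh done during live migration.
    # bdm.volume_id and get_volume_config are hypothetical stand-ins for
    # the real nova objects and callbacks.
    from lxml import etree

    def update_volume_xml(xml_str, bdms, get_volume_config):
        doc = etree.fromstring(xml_str)
        for bdm in bdms:
            for disk in doc.findall("devices/disk"):
                # Match the disk to the volume via its <serial> element,
                # which carries the volume id.
                serial = disk.find("serial")
                if serial is not None and serial.text == bdm.volume_id:
                    # get_volume_config renders fresh <disk> XML (with the
                    # new monitor IPs) from the refreshed connection_info.
                    new_disk = etree.fromstring(get_volume_config(bdm).to_xml())
                    disk.getparent().replace(disk, new_disk)
        return etree.tostring(doc, encoding="unicode")

This only ever touches disks that have a matching volume, which is exactly why the ephemeral vms/..._disk above keeps its stale hosts.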

It appears we don't have something similar happening during live migration for ephemeral disk configuration with the rbd imagebackend.

During live migration from the source node, this is where we get the new XML for the guest to send to the destination node:

https://github.com/openstack/nova/blob/74deea4d8f66a85e66ec79c72c9f257f562d5afd/nova/virt/libvirt/driver.py#L6306

This code will get the ephemeral disk configuration:

https://github.com/openstack/nova/blob/74deea4d8f66a85e66ec79c72c9f257f562d5afd/nova/virt/libvirt/driver.py#L3603

And that's called from _get_guest_storage_config:

https://github.com/openstack/nova/blob/74deea4d8f66a85e66ec79c72c9f257f562d5afd/nova/virt/libvirt/driver.py#L3637

And we don't call either of those methods during a live migration, so we don't get an updated ephemeral disk config before starting the migration.
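For context on why the ephemeral hosts go stale: the rbd imagebackend captures the monitor addresses when the disk config is generated, using the ceph.conf pointed at by nova.conf, and nothing in the live migration path regenerates that config. A conceptual sketch of that generation step (not the literal nova code; the real backend asks ceph for the monitor list via the configured ceph.conf, but the effect is the same; read_mon_hosts is a hypothetical simplification, while images_rbd_pool and images_rbd_ceph_conf are real nova.conf options):

    import configparser

    def read_mon_hosts(ceph_conf_path):
        # Hypothetical simplification: pull mon_host straight out of ceph.conf.
        cp = configparser.ConfigParser()
        cp.read(ceph_conf_path)
        return cp.get("global", "mon_host", fallback="").split(",")

    def get_rbd_ephemeral_disk_config(instance_uuid, conf):
        # Monitor addresses are resolved once, at config-generation time...
        mon_hosts = read_mon_hosts(conf.libvirt.images_rbd_ceph_conf)
        return {
            "type": "network",
            "protocol": "rbd",
            "source_name": "%s/%s_disk" % (conf.libvirt.images_rbd_pool,
                                           instance_uuid),
            # ...and then frozen into the domain XML. Since live migration
            # never calls back into this path, old monitor IPs survive.
            "hosts": mon_hosts,
        }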

Changed in nova:
status: New → Confirmed
importance: Undecided → Medium
Revision history for this message
Matt Riedemann (mriedem) wrote :

Note there is a patch for this issue in another bug:

https://bugs.launchpad.net/nova/+bug/1452641/comments/9

Lee Yarwood (lyarwood)
tags: removed: volumes
Revision history for this message
Maximilian Stinsky (mstinsky) wrote :

We hit the same problem with ceph-backed ephemeral disks after we changed the ceph mon IPs in our openstack deployment, which is on Stein at the moment.

The patch mentioned above did not work for us, so we added a more or less hardcoded workaround in nova's migration.py that rewrites the connection string while live migrating the instance.
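For illustration, such a hardcoded workaround boils down to rewriting the <host> entries of every rbd <source> while the migration XML is being prepared. A sketch of the idea (the function name and monitor addresses are invented; this is not the actual patch):

    from lxml import etree

    # Invented example addresses; in a real workaround these would be
    # the deployment's current monitor IPs.
    NEW_MON_HOSTS = ["192.168.200.12", "192.168.200.14", "192.168.210.15"]

    def rewrite_rbd_hosts(xml_str):
        doc = etree.fromstring(xml_str)
        for source in doc.findall("devices/disk/source[@protocol='rbd']"):
            # Drop the stale <host> children and re-add the current monitors.
            for host in source.findall("host"):
                source.remove(host)
            for addr in NEW_MON_HOSTS:
                etree.SubElement(source, "host", name=addr, port="6789")
        return etree.tostring(doc, encoding="unicode")

Unlike the per-volume refresh above, this catches the ephemeral disks too, at the cost of hardcoding cluster knowledge into nova.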
