live-migration --block-migrate fails with default libvirt flags

Bug #1441054 reported by Mathieu Rohon
This bug affects 4 people
Affects: OpenStack Compute (nova)
Status: Invalid
Importance: Medium
Assigned to: Mathieu Rohon

Bug Description

While trying to live-migrate an instance with the --block-migrate option, I got an error on the host that runs the VM:

2015-04-07 11:01:32.554 DEBUG nova.virt.libvirt.driver [-] [instance: 31b63d63-b392-4197-8864-b6d85dae438f] Starting monitoring of live migration from (pid=5202) _live_migration /opt/stack/nova/nova/virt/libvirt/driver.py:5642
2015-04-07 11:01:32.556 DEBUG nova.virt.libvirt.driver [-] [instance: 31b63d63-b392-4197-8864-b6d85dae438f] Operation thread is still running from (pid=5202) _live_migration_monitor /opt/stack/nova/nova/virt/libvirt/driver.py:5494
2015-04-07 11:01:32.557 DEBUG nova.virt.libvirt.driver [-] [instance: 31b63d63-b392-4197-8864-b6d85dae438f] Migration not running yet from (pid=5202) _live_migration_monitor /opt/stack/nova/nova/virt/libvirt/driver.py:5525
2015-04-07 11:01:33.142 INFO nova.virt.libvirt.driver [-] [instance: 31b63d63-b392-4197-8864-b6d85dae438f] Migration running for 0 secs, memory 0% remaining; (bytes processed=0, remaining=0, total=0)
2015-04-07 11:01:33.277 ERROR nova.virt.libvirt.driver [-] [instance: 31b63d63-b392-4197-8864-b6d85dae438f] Live Migration failure: End of file while reading data: Input/output error
2015-04-07 11:01:33.278 DEBUG nova.virt.libvirt.driver [-] [instance: 31b63d63-b392-4197-8864-b6d85dae438f] Migration operation thread notification from (pid=5202) thread_finished /opt/stack/nova/nova/virt/libvirt/driver.py:5633
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/eventlet/hubs/hub.py", line 457, in fire_timers
    timer()
  File "/usr/local/lib/python2.7/dist-packages/eventlet/hubs/timer.py", line 58, in __call__
    cb(*args, **kw)
  File "/usr/local/lib/python2.7/dist-packages/eventlet/event.py", line 168, in _do_send
    waiter.switch(result)
  File "/usr/local/lib/python2.7/dist-packages/eventlet/greenthread.py", line 214, in main
    result = function(*args, **kwargs)
  File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 5428, in _live_migration_operation
    instance=instance)
  File "/usr/local/lib/python2.7/dist-packages/oslo_utils/excutils.py", line 85, in __exit__
    six.reraise(self.type_, self.value, self.tb)
  File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 5397, in _live_migration_operation
    CONF.libvirt.live_migration_bandwidth)
  File "/usr/local/lib/python2.7/dist-packages/eventlet/tpool.py", line 183, in doit
    result = proxy_call(self._autowrap, f, *args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/eventlet/tpool.py", line 141, in proxy_call
    rv = execute(f, *args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/eventlet/tpool.py", line 122, in execute
    six.reraise(c, e, tb)
  File "/usr/local/lib/python2.7/dist-packages/eventlet/tpool.py", line 80, in tworker
    rv = meth(*args, **kwargs)
  File "/usr/lib/python2.7/dist-packages/libvirt.py", line 1734, in migrateToURI2
    if ret == -1: raise libvirtError ('virDomainMigrateToURI2() failed', dom=self)
libvirtError: End of file while reading data: Input/output error
2015-04-07 11:01:33.644 DEBUG nova.virt.libvirt.driver [-] [instance: 31b63d63-b392-4197-8864-b6d85dae438f] VM running on src, migration failed from (pid=5202) _live_migration_monitor /opt/stack/nova/nova/virt/libvirt/driver.py:5500
2015-04-07 11:01:33.645 DEBUG nova.virt.libvirt.driver [-] [instance: 31b63d63-b392-4197-8864-b6d85dae438f] Fixed incorrect job type to be 4 from (pid=5202) _live_migration_monitor /opt/stack/nova/nova/virt/libvirt/driver.py:5520
2015-04-07 11:01:33.645 ERROR nova.virt.libvirt.driver [-] [instance: 31b63d63-b392-4197-8864-b6d85dae438f] Migration operation has aborted
2015-04-07 11:01:33.733 DEBUG nova.virt.libvirt.driver [-] [instance: 31b63d63-b392-4197-8864-b6d85dae438f] Live migration monitoring is all done from (pid=5202) _live_migration /opt/stack/nova/nova/virt/libvirt/driver.py:5653

Live migration with block-migrate works fine when I remove the flag VIR_MIGRATE_TUNNELLED from the option block_migration_flag.

Indeed, live block migration cannot occur in tunnelled mode, as reported here:
https://wiki.openstack.org/wiki/OSSN/OSSN-0007
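The workaround described above amounts to editing a list of flag names. A minimal sketch, assuming the option value is a comma-separated string of libvirt flag names (the helper name `drop_flag` is hypothetical and is not part of Nova):

```python
def drop_flag(flag_string, unwanted="VIR_MIGRATE_TUNNELLED"):
    """Return the comma-separated flag string without `unwanted`."""
    # Split on commas and ignore empty entries and stray whitespace.
    names = [name.strip() for name in flag_string.split(",") if name.strip()]
    return ", ".join(name for name in names if name != unwanted)


# Assumed starting value: the default block migration flags quoted
# later in this report.
default_flags = (
    "VIR_MIGRATE_UNDEFINE_SOURCE, VIR_MIGRATE_PEER2PEER, VIR_MIGRATE_LIVE, "
    "VIR_MIGRATE_TUNNELLED, VIR_MIGRATE_NON_SHARED_INC"
)

print(drop_flag(default_flags))
```

The printed string is what would be set for block_migration_flag in the compute node's configuration to reproduce the workaround.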

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/171098

Changed in nova:
assignee: nobody → Mathieu Rohon (mathieu-rohon)
status: New → In Progress
Kashyap Chamarthy (kashyapc) wrote :

I noted the context in the commit message; repeating it here.

During migration, the VIR_MIGRATE_TUNNELLED flag transports data
over libvirt's RPC layer (taking advantage of its encryption). However,
NBD-based block migration is not supported in that mode yet. From a
discussion with upstream libvirt developers:

    The support for Network Block Device (NBD) is just not implemented
    in libvirt for tunneled migration. It's because we'd have to
    implement some multiplexing of memory stream and all disks streams
    into a single libvirt stream. Or we'd have to provide new APIs for
    migrations that would support more than one stream and since
    tunneled migration is not the best idea anyway, we just decided not
    to implement NBD.

    The right solution is to implement TLS support for QEMU so that we
    can do secure non-tunneled migration.

On a closely related note, Daniel Berrange said on the upstream QEMU
devel list that he has in-progress (mid/long-term) work to support
"Universal encryption on QEMU I/O channels"[1]

[1] https://lists.gnu.org/archive/html/qemu-devel/2015-02/msg00529.html

tags: added: live-migrate
Changed in nova:
importance: Undecided → Medium
Paul Murray (pmurray)
tags: added: live-migration
removed: live-migrate
Mark McLoughlin (markmc) wrote :

Interestingly, in my testing of block migration over tunneled mode, I'm seeing libvirt recognize that it can't do the NBD/drive-mirror type of migration and fall back to the older 'inc = true' approach, which causes qemu to do the data transfer over the same channel as the memory. The logs show:

  qemuMigrationRun:4254 : Destination doesn't support NBD server Falling back to previous implementation.

but it works just fine. The QMP migrate command is:

  "migrate","arguments":{"detach":true,"blk":false,"inc":true,"uri":"fd:migrate"}

This is with libvirt-1.2.17-13.el7_2.2.x86_64 on RHEL 7.2 but it looks like this has been the behavior since the drive-mirror approach was added: http://libvirt.org/git/?p=libvirt.git;a=commitdiff;h=7b7600b3e6
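For illustration, the QMP command quoted above can be rebuilt as a JSON document. This is only a sketch: the helper name `build_migrate_command` is hypothetical, and the reading of "blk" as a full disk copy and "inc" as an incremental disk copy is an assumption based on the argument names in the log, not something confirmed in this report.

```python
import json


def build_migrate_command(full_copy, incremental, uri="fd:migrate"):
    """Build a QMP 'migrate' command shaped like the one in the log above."""
    return {
        "execute": "migrate",
        "arguments": {
            "detach": True,      # value taken as-is from the logged command
            "blk": full_copy,    # assumed: copy the whole disk
            "inc": incremental,  # assumed: copy only the top image layer
            "uri": uri,          # migration data goes over a passed-in fd
        },
    }


command = build_migrate_command(full_copy=False, incremental=True)
print(json.dumps(command, separators=(",", ":")))
```

The point of the sketch is that memory and disk data share the single "fd:migrate" channel in this fallback path, which is why it can work in tunnelled mode.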

Kashyap Chamarthy (kashyapc) wrote :

@markmc: Your observation matches my test, and it is valid behavior.

The current default flags Nova sets for live block migration are[1]:

    VIR_MIGRATE_UNDEFINE_SOURCE, VIR_MIGRATE_PEER2PEER, VIR_MIGRATE_LIVE,
    VIR_MIGRATE_TUNNELLED, VIR_MIGRATE_NON_SHARED_INC
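As an aside, libvirt receives these flags as a single bitmask. The sketch below hard-codes the numeric values from libvirt's virDomainMigrateFlags enum so that it runs without libvirt installed; with the libvirt Python bindings available, the same constants can be read from the `libvirt` module instead. The helper name `to_bitmask` is hypothetical.

```python
# Numeric values copied from libvirt's virDomainMigrateFlags enum
# (hard-coded here so the example runs without the libvirt bindings).
MIGRATE_FLAGS = {
    "VIR_MIGRATE_LIVE": 1,
    "VIR_MIGRATE_PEER2PEER": 2,
    "VIR_MIGRATE_TUNNELLED": 4,
    "VIR_MIGRATE_UNDEFINE_SOURCE": 16,
    "VIR_MIGRATE_NON_SHARED_INC": 128,
}


def to_bitmask(names):
    """OR together the values of the named migration flags."""
    mask = 0
    for name in names:
        mask |= MIGRATE_FLAGS[name]
    return mask


default_names = [
    "VIR_MIGRATE_UNDEFINE_SOURCE",
    "VIR_MIGRATE_PEER2PEER",
    "VIR_MIGRATE_LIVE",
    "VIR_MIGRATE_TUNNELLED",
    "VIR_MIGRATE_NON_SHARED_INC",
]

mask = to_bitmask(default_names)
is_tunnelled = bool(mask & MIGRATE_FLAGS["VIR_MIGRATE_TUNNELLED"])
print(mask, is_tunnelled)
```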

So, since there is currently no support for NBD in TUNNELLED mode, when
you perform the migration, libvirt first logs a warning:

-----------
2016-01-07 12:02:26.886+0000: 13202: warning : qemuMigrationBeginPhase:2654 : NBD in tunnelled migration is currently not supported
-----------

And, as you saw, it falls back to the older implementation:

-----------
2016-01-07 12:02:27.212+0000: 13202: debug : qemuMigrationDriveMirror:1727 : Destination doesn't support NBD server Falling back to previous implementation.
[...]
2016-01-07 12:02:27.226+0000: 13202: debug : qemuMonitorJSONCommandWithFd:290 : Send command '{"execute":"migrate","arguments":{"detach":true,"blk":false,"inc":true,"uri":"fd:migrate"},"id":"libvirt-18"}' for write with FD -1
-----------
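To check whether a given migration took this fallback path, one can scan the libvirt log for the two messages shown above. A minimal sketch, assuming the message texts match those quoted in this report (the function name `fell_back_from_nbd` is hypothetical):

```python
NBD_WARNING = "NBD in tunnelled migration is currently not supported"
FALLBACK = "Falling back to previous implementation"


def fell_back_from_nbd(log_lines):
    """Return True if the NBD warning is later followed by the fallback message."""
    saw_warning = False
    for line in log_lines:
        if NBD_WARNING in line:
            saw_warning = True
        elif saw_warning and FALLBACK in line:
            return True
    return False


# Sample input: the two log lines quoted above.
sample = [
    "2016-01-07 12:02:26.886+0000: 13202: warning : qemuMigrationBeginPhase:2654 : "
    "NBD in tunnelled migration is currently not supported",
    "2016-01-07 12:02:27.212+0000: 13202: debug : qemuMigrationDriveMirror:1727 : "
    "Destination doesn't support NBD server Falling back to previous implementation.",
]

print(fell_back_from_nbd(sample))
```

Note that the second message is logged at debug level, so it only appears when libvirt debug logging is enabled.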

I just ran a test with exactly the above flags, supplying them directly
to the libvirt wrapper `virsh`, to confirm the behavior:

    $ virsh migrate --verbose \
        --undefinesource \
        --copy-storage-inc \
        --p2p \
        --live \
        --tunnelled \
        cvm1 qemu+ssh://root@devstack2/system

[1] https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L165,L168
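The correspondence between the flag names and the `virsh migrate` options used in the command above can be written down as a small lookup table. This is a sketch for illustration only; the mapping is read off the command in this comment, and the helper name `virsh_migrate_argv` is hypothetical.

```python
# Flag name -> virsh option, as used in the virsh command above.
VIRSH_OPTIONS = {
    "VIR_MIGRATE_UNDEFINE_SOURCE": "--undefinesource",
    "VIR_MIGRATE_NON_SHARED_INC": "--copy-storage-inc",
    "VIR_MIGRATE_PEER2PEER": "--p2p",
    "VIR_MIGRATE_LIVE": "--live",
    "VIR_MIGRATE_TUNNELLED": "--tunnelled",
}


def virsh_migrate_argv(flag_names, domain, dest_uri):
    """Build the argument list for a 'virsh migrate' call (nothing is executed)."""
    argv = ["virsh", "migrate", "--verbose"]
    argv += [VIRSH_OPTIONS[name] for name in flag_names]
    argv += [domain, dest_uri]
    return argv


argv = virsh_migrate_argv(
    [
        "VIR_MIGRATE_UNDEFINE_SOURCE",
        "VIR_MIGRATE_NON_SHARED_INC",
        "VIR_MIGRATE_PEER2PEER",
        "VIR_MIGRATE_LIVE",
        "VIR_MIGRATE_TUNNELLED",
    ],
    "cvm1",
    "qemu+ssh://root@devstack2/system",
)
print(" ".join(argv))
```

On a host with `virsh` installed, the resulting list could be handed to `subprocess.run(argv, check=True)` to perform the same test.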

Changed in nova:
importance: Medium → Low
importance: Low → Medium
Kashyap Chamarthy (kashyapc) wrote :

Mathieu, you haven't mentioned the exact libvirt/QEMU/Nova versions you tested with. The report is more than a year old. Can you please re-test (and mention the versions you used)?

From comment #3 and comment #4, you can see that the behavior you state is not reproducible currently.

With the versions below, using the VIR_MIGRATE_TUNNELLED flag (which is the current default) with block-migrate works just fine.

I also tested with Nova, and I see behavior identical to what I noticed with the direct libvirt wrapper (see comment #4):

    $ nova live-migration --block-migrate vm1 devstack2

Live block migration completes successfully.

I tested (see comment #4) with these versions:

    $ rpm -q libvirt qemu-system-x86
    libvirt-1.2.13.1-3.fc22.x86_64
    qemu-system-x86-2.3.1-7.fc22.x86_64

Nova:

    $ git describe
    13.0.0.0b1-555-g6e2c516

Changed in nova:
status: In Progress → Incomplete
Mathieu Rohon (mathieu-rohon) wrote :

I confirm that live-migration works fine by default with Debian Jessie, even without explicitly removing the VIR_MIGRATE_TUNNELLED flag.

Package versions:
libvirt0=1.2.9-9+deb8u1
qemu-kvm=1:2.1+dfsg-12+deb8u4

I also observe that libvirt tries the tunnelled mode, but then falls back to the 'previous implementation'.
I can see these two log messages from libvirt:

warning : qemuMigrationBeginPhase:2332 : NBD in tunnelled migration is currently not supported
debug : qemuMigrationDriveMirror:1442 : Destination doesn't support NBD server Falling back to previous implementation.

OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (master)

Change abandoned by Mathieu Rohon on branch: master
Review: https://review.openstack.org/171098
Reason: I confirm that the newest libvirt falls back to the previous implementation for live-migration. This change is no longer needed.

Changed in nova:
status: Incomplete → Invalid