"nova live-migration" fails silently

Bug #1009974 reported by Florian Haas
This bug affects 4 people
Affects                   Status        Importance  Assigned to
OpenStack Compute (nova)  Invalid       High        Unassigned
openstack-manuals         Fix Released  High        Florian Haas
nova (Ubuntu)             Invalid       Undecided   Unassigned

Bug Description

Originally posted to the openstack list; see https://lists.launchpad.net/openstack/msg12819.html for the original thread.

The symptom is relatively easy to describe: you run "nova live-migration <guest> <host>", and nothing happens.

A few words of background:

- The system is Ubuntu Precise with stock packages and regular updates, and no external PPAs. nova-compute is at version 2012.1-0ubuntu2.1.

- libvirtd is running with the "-l" option and with a working TCP socket as described here:
http://docs.openstack.org/trunk/openstack-compute/admin/content/configuring-live-migrations.html

- /var/lib/nova/instances is on GlusterFS.

Now, if you set any of the various --*vnc* flags in nova.conf, live migration fails even at the libvirt level (a similar issue was reported here recently; see https://lists.launchpad.net/openstack/msg12425.html).

# virsh migrate --live --p2p --domain instance-0000000a \
  --desturi qemu+tcp://skunk-x/system
error: Unable to read from monitor: Connection reset by peer

("skunk-x" is the secondary IP address of the host "skunk", on a dedicated network used for migrations.)

This is in the libvirt.log on the source host:
2012-06-05 20:39:25.838+0000: 12241: error : virNetClientProgramDispatchError:174 : Unable to read from monitor: Connection reset by peer

At the same time, I am seeing this in the libvirtd log on the target host:
2012-06-05 20:39:25.394+0000: 6828: error : qemuMonitorIORead:513 : Unable to read from monitor: Connection reset by peer

Removing all --*vnc* flags from nova.conf resolved that issue for me.

Then, running the same command as above resulted in a connection timeout. Even though I had set "qemu+tcp://skunk-x/system" as the libvirt destination URI, libvirt opened a separate connection to an ephemeral port on skunk's primary interface, and my iptables configuration was blocking that port:

# virsh migrate --live --p2p \
  --domain instance-0000000d --desturi qemu+tcp://skunk-x/system
error: unable to connect to server at 'skunk:49159': Connection timed out

Switching the migration to tunnelled mode solved that issue.

# virsh domstate instance-0000000d
running
# virsh migrate --live --p2p \
  --domain instance-0000000d --desturi qemu+tcp://skunk-x/system \
  --tunnelled
# virsh --connect qemu+tcp://skunk-x/system domstate instance-0000000d
running
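Before switching to tunnelled mode, one way to confirm that a firewall is what blocks the non-tunnelled data connection is to try a plain TCP connection from the source host to the host and port named in the error. A minimal Python 3 sketch using only the standard library; the helper name `probe_tcp` is hypothetical and not part of Nova or libvirt:

```python
import socket


def probe_tcp(host: str, port: int, timeout: float = 5.0) -> str:
    """Try a TCP connection to host:port and classify the outcome.

    Returns "open" if the connection succeeds, "refused" if the peer
    actively rejects it (for example, nothing is listening on that port),
    "timeout" if no answer arrives in time (typical of a firewall that
    silently drops packets), or "error" for anything else, such as a
    name that does not resolve.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError:
        return "error"
```

Run on the source host during a migration attempt, `probe_tcp("skunk", 49159)` returning "timeout" would match the error above. Since libvirt picks that port per migration, a "refused" result outside a migration attempt does not by itself rule out a firewall problem.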

These are therefore the flags that I'm using in my nova.conf:

--live_migration_uri="qemu+tcp://%s-x/system"
--live_migration_flag="VIR_MIGRATE_UNDEFINE_SOURCE,
VIR_MIGRATE_PEER2PEER, VIR_MIGRATE_TUNNELLED"

(Note that "VIR_MIGRATE_UNDEFINE_SOURCE, VIR_MIGRATE_PEER2PEER" is the default for --live_migration_flag; VIR_MIGRATE_TUNNELLED is my addition. I've also tried migrating over the primary interface, without tunnelling. No change: works in libvirt, doesn't work with Nova.)
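libvirt's migration flags are bit values that get OR-ed together, so the comma-separated names in --live_migration_flag end up as a single integer. A minimal sketch of that translation, assuming the numeric values of libvirt's virDomainMigrateFlags enum (VIR_MIGRATE_LIVE = 1, VIR_MIGRATE_PEER2PEER = 2, VIR_MIGRATE_TUNNELLED = 4, VIR_MIGRATE_PERSIST_DEST = 8, VIR_MIGRATE_UNDEFINE_SOURCE = 16). The function name `migration_flags` is hypothetical and this is not Nova's own code; real code would read the constants from the libvirt Python bindings rather than a hard-coded table.

```python
import operator
from functools import reduce

# Bit values from libvirt's virDomainMigrateFlags enum (a subset).
LIBVIRT_MIGRATE_FLAGS = {
    "VIR_MIGRATE_LIVE": 1,
    "VIR_MIGRATE_PEER2PEER": 2,
    "VIR_MIGRATE_TUNNELLED": 4,
    "VIR_MIGRATE_PERSIST_DEST": 8,
    "VIR_MIGRATE_UNDEFINE_SOURCE": 16,
}


def migration_flags(value: str) -> int:
    """OR together the libvirt flags named in a comma-separated string.

    Whitespace around each name is stripped; any name that is not in the
    table (including one with stray characters attached) raises KeyError.
    """
    names = [name.strip() for name in value.split(",")]
    return reduce(operator.or_, (LIBVIRT_MIGRATE_FLAGS[n] for n in names), 0)
```

With this table, the default "VIR_MIGRATE_UNDEFINE_SOURCE, VIR_MIGRATE_PEER2PEER" yields 18, and adding VIR_MIGRATE_TUNNELLED yields 22.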

"nova live-migration <guest> <host>" returns an exit code of 0, and the only trace of the migration that I can find in the logs is the following, which evidently comes from the pre_live_migration method:

2012-06-06 11:05:13 DEBUG nova.rpc.amqp [-] received {u'_context_roles':
[u'KeystoneServiceAdmin', u'admin', u'KeystoneAdmin'], u'_msg_id':
u'069c958b7c03482aa4f0dda00010eb10', u'_context_read_deleted': u'no',
u'_context_request_id': u'req-71c4ffea-4d3d-471c-98bc-8a27aaff8f2c',
u'args': {u'instance_id': 13, u'block_migration': False, u'disk': None},
u'_context_auth_token': '<SANITIZED>', u'_context_is_admin': True,
u'_context_project_id': u'9c929e61e7624fbe895ae0de38bd1471',
u'_context_timestamp': u'2012-06-06T09:05:09.992775',
u'_context_user_id': u'1c8c118c7c244d2d94cc516ab6f24c03', u'method':
u'pre_live_migration', u'_context_remote_address': u'10.43.0.2'} from
(pid=14437) _safe_log
/usr/lib/python2.7/dist-packages/nova/rpc/common.py:160

It looks like it never gets as far as live_migration.

Thierry Carrez (ttx)
Changed in nova:
importance: Undecided → High
status: New → Confirmed
Florian Haas (fghaas) wrote :

For anyone else affected by this issue, please see Bug #987473.

In this case, it was apparently caused by the presence of double quotes around the --live_migration_uri and --live_migration_flag values. Removing the double quotes immediately made the issue go away.
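Why the quotes matter can be illustrated with a parser that treats everything after the first "=" as the value. A minimal sketch, assuming (as this report suggests) that the flagfile parser does no shell-style unquoting; `parse_flag_line` is a hypothetical helper, not Nova's actual parser:

```python
from typing import Tuple


def parse_flag_line(line: str) -> Tuple[str, str]:
    """Split a '--name=value' flagfile line into (name, value).

    Everything after the first '=' is kept verbatim: no shell-style
    unquoting happens, so quote characters become part of the value.
    """
    name, _, value = line.strip().partition("=")
    return name.lstrip("-"), value


_, quoted = parse_flag_line('--live_migration_uri="qemu+tcp://%s-x/system"')
_, unquoted = parse_flag_line("--live_migration_uri=qemu+tcp://%s-x/system")

# The quotes survive parsing and end up inside the libvirt URI.
print(quoted % "skunk")    # "qemu+tcp://skunk-x/system" (quotes included)
print(unquoted % "skunk")  # qemu+tcp://skunk-x/system
```

Under that assumption, the quoted form hands libvirt a string that starts with a quote character and is no longer a well-formed URI.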

Changed in nova:
status: Confirmed → Invalid
Changed in nova (Ubuntu):
status: New → Invalid
Florian Haas (fghaas) wrote :

Closing as invalid, as this is more of a documentation or general config-file parsing issue than one caused by nova-compute directly.

Lorin Hochstein (lorinh) wrote :

Added this as a doc bug. Here's a copy-paste of an email from Florian:

Apologies for the noise. I had mistakenly believed that since the list
of --live_migration_flags contained spaces, it was OK to quote the
--live_migration_* options. Evidently it's not, as removing the double
quotes made the problem go away. My guests now do migrate.

Anne/Thierry: I've thus far been unable to find any documentation that
says "don't use double quotes in nova.conf". In fact,
http://docs.openstack.org/trunk/openstack-compute/admin/content/compute-options-reference.html
uses them in several places, including the --live_migration* flags. That
quotes can break things appears to be a known issue as per
https://bugs.launchpad.net/nova/+bug/987473. Is this something that we
should mention in the documentation, or is this unexpected parser
behavior that needs to get fixed?

Marcelo Dieder (mdieder) wrote :

I have the same problem.

I changed nova.conf to:

--live_migration_uri=qemu+tcp://%s-x/system
--live_migration_flag=VIR_MIGRATE_UNDEFINE_SOURCE,VIR_MIGRATE_PEER2PEER, VIR_MIGRATE_TUNNELLED

and

nova live-migration bd8474bc-2c93-4df8-ae04-372f10471b96 cloud02

But nothing happens. In the logs I can see that it gets through the iptables rule setup (nova-network), but nothing happens after that.

Florian Haas (fghaas) wrote :

@mdieder, you do realize that "qemu+tcp://%s-x/system" will only work in a situation like the one I originally described? I.e. you have to have a second IP address on that box, which is resolvable as <hostname>-x from the other host. Also, that whitespace before your VIR_MIGRATE_TUNNELLED would likely cause that option to be ignored.
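The point about the "-x" suffix can be checked before attempting a migration: substitute the destination hostname into the live_migration_uri template and confirm that the resulting host part resolves from the source host. A minimal sketch using only the Python standard library; the helper names `migration_uri` and `uri_host_resolves` are hypothetical:

```python
import socket
from urllib.parse import urlsplit


def migration_uri(template: str, dest_host: str) -> str:
    """Fill the %s placeholder of a live_migration_uri template."""
    return template % dest_host


def uri_host_resolves(uri: str) -> bool:
    """Return True if the host part of a libvirt URI resolves on this machine.

    URIs without a host part (such as qemu:///system) return False.
    """
    host = urlsplit(uri).hostname
    if not host:
        return False
    try:
        socket.getaddrinfo(host, None)
        return True
    except socket.gaierror:
        return False
```

For the setup in the previous comment, `uri_host_resolves(migration_uri("qemu+tcp://%s-x/system", "cloud02"))` checks whether "cloud02-x" resolves at all; if it does not, the template has to change.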

Florian Haas (fghaas) wrote :

http://review.openstack.org/8383 submitted to the openstack-manuals repo.

Florian Haas (fghaas)
Changed in openstack-manuals:
status: New → In Progress
Lorin Hochstein (lorinh)
Changed in openstack-manuals:
assignee: nobody → Florian Haas (fghaas)
Tom Fifield (fifieldt)
Changed in openstack-manuals:
importance: Undecided → High
Hokuto Hoshi (hok-kanny) wrote :

Hi,

I had the same problem, and I couldn't solve it with the approaches above (removing the double quotes and adding live_migration_flag to nova.conf).

VM (on Host1) -> Host2
Host1's ip address is 192.168.0.1
Host2's ip address is 192.168.0.2

I configured VNC in nova.conf as follows:
# on Host1
vncserver_proxyclient_address=192.168.0.1
vncserver_listen=192.168.0.1

# on Host2
vncserver_proxyclient_address=192.168.0.2
vncserver_listen=192.168.0.2

I captured the libvirt traffic between the servers and found the domain XML, including the VNC settings, that is transferred before migration.
It is probably invalid because it still contains Host1's IP address; I think it should contain Host2's.

    <graphics type='vnc' port='5900' autoport='yes' listen='192.168.0.1' keymap='ja'>
      <listen type='address' address='192.168.0.1'/>
    </graphics>

So, I changed nova.conf on Host1 and Host2:
# on Host1
vncserver_proxyclient_address=192.168.0.1
vncserver_listen=0.0.0.0

# on Host2
vncserver_proxyclient_address=192.168.0.2
vncserver_listen=0.0.0.0

After the change, I restarted nova-compute, relaunched the instances, and tried live migration again. VNC and live migration worked fine.
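The mismatch described above can be checked mechanically: pull the VNC listen addresses out of the domain XML (for example, the output of `virsh dumpxml <domain>` on the source host) and compare them with the destination's addresses. A minimal sketch using xml.etree.ElementTree, assuming the <graphics> layout shown above; `vnc_listen_addresses` and `listen_ok_on` are hypothetical helper names, and a wildcard address such as 0.0.0.0 is treated as valid on any host:

```python
import xml.etree.ElementTree as ET

# Addresses that bind on every interface and therefore suit any host.
WILDCARDS = {"0.0.0.0", "::"}


def vnc_listen_addresses(domain_xml: str) -> set:
    """Collect listen addresses from <graphics type='vnc'> elements."""
    root = ET.fromstring(domain_xml)
    addrs = set()
    for gfx in root.iter("graphics"):
        if gfx.get("type") != "vnc":
            continue
        # Older form: listen='...' attribute on <graphics> itself.
        if gfx.get("listen"):
            addrs.add(gfx.get("listen"))
        # Newer form: <listen type='address' address='...'/> children.
        for listen in gfx.findall("listen"):
            if listen.get("type") == "address" and listen.get("address"):
                addrs.add(listen.get("address"))
    return addrs


def listen_ok_on(domain_xml: str, dest_addrs: set) -> bool:
    """True if every VNC listen address is a wildcard or a destination address."""
    return all(a in WILDCARDS or a in dest_addrs
               for a in vnc_listen_addresses(domain_xml))
```

With the XML above and Host2's address 192.168.0.2, `listen_ok_on` returns False; once vncserver_listen is 0.0.0.0 and the instance has been relaunched, it returns True.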

OpenStack Infra (hudson-openstack) wrote : Fix merged to openstack-manuals (master)

Reviewed: https://review.openstack.org/8383
Committed: http://github.com/openstack/openstack-manuals/commit/e5e6e85ccef04063c275b4fd420ebb4477325bb5
Submitter: Jenkins
Branch: master

commit e5e6e85ccef04063c275b4fd420ebb4477325bb5
Author: Florian Haas <email address hidden>
Date: Sun Jun 10 22:32:43 2012 +0200

    Remove double quotes from nova.conf options reference

    Using double quotes in nova.conf can seriously mess up Nova in
    all sorts of ways. Remove their erroneous use from the config
    documentation, and also emphasize that using them is a bad idea.

    bug 1009974

    Change-Id: I6b8423739ce19e7dd9ab426ea70bc8c330b2455a

Changed in openstack-manuals:
status: In Progress → Fix Released
Jimmy (cmingt) wrote :

Hi Hokuto Hoshi,

I had the same problem too.
Thank you very much; VNC and live migration now work very well.

OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.openstack.org/8923
Committed: http://github.com/openstack/openstack-manuals/commit/306347d87079ae656a91b36c281a03125712e960
Submitter: Jenkins
Branch: master

commit 306347d87079ae656a91b36c281a03125712e960
Author: Lorin Hochstein <email address hidden>
Date: Mon Jun 25 09:16:38 2012 -0400

    Documents vncserver_listen flag when used with live migrations

    Live migration breaks if vncserver_listen is set to a specific host
    IP address.

    Documentation based on this comment from the openstack-operators
    mailing list:
    http://lists.openstack.org/pipermail/openstack-operators/2012-June/000965.html

    See also comments in bug 1009974.

    Change-Id: I21b1d3eb5380effda3e2bfcd4f3ea15ea43f096a

OpenStack Infra (hudson-openstack) wrote : Fix proposed to openstack-manuals (stable/essex)

Fix proposed to branch: stable/essex
Review: https://review.openstack.org/9179

OpenStack Infra (hudson-openstack) wrote : Fix merged to openstack-manuals (stable/essex)

Reviewed: https://review.openstack.org/9179
Committed: http://github.com/openstack/openstack-manuals/commit/da29b30ead43a3b53d8092f63f3b2aaa5761d32a
Submitter: Jenkins
Branch: stable/essex

commit da29b30ead43a3b53d8092f63f3b2aaa5761d32a
Author: Lorin Hochstein <email address hidden>
Date: Mon Jun 25 09:16:38 2012 -0400

    Documents vncserver_listen flag when used with live migrations

    Live migration breaks if vncserver_listen is set to a specific host
    IP address.

    Documentation based on this comment from the openstack-operators
    mailing list:
    http://lists.openstack.org/pipermail/openstack-operators/2012-June/000965.html

    See also comments in bug 1009974.

    Cherry picked from https://review.openstack.org/8923

    Change-Id: I21b1d3eb5380effda3e2bfcd4f3ea15ea43f096a

tags: added: in-stable-essex