OpenStack Compute (Nova)

Live migrations do not update dnsmasq entries or set up networking on the destination node when using multi_host

Reported by rackerjoe on 2012-02-22
This bug affects 1 person
Affects: OpenStack Compute (nova)
Importance: Critical
Assigned to: Vish Ishaya

Bug Description

When performing a live migration (KVM) with multi_host set to True, the following does not happen:

1) The networks (bridge and VLAN) on the destination node are not set up by nova-network.
*) If they are not configured before the migration, the instance will fail to start on the destination node and will roll back to the source node.

2) dnsmasq is not updated on the destination node.
*) The dnsmasq hosts file is not updated on the migration destination, so dnsmasq there will not reply to DHCP requests from the migrated instance.
*) Additionally, the source node will keep answering DHCP requests from the migrated instance until a new instance is created on that compute node. At that point the dnsmasq hosts file is rewritten and dnsmasq is sent SIGHUP, after which it no longer responds to DHCP requests from the migrated instance.
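The dnsmasq behaviour described above can be illustrated with a minimal sketch, assuming each fixed IP record carries the MAC, hostname, address, and the compute host the instance currently runs on. The FixedIP record and the helper names are hypothetical, not nova's code. Each node writes only its own instances into the file passed to dnsmasq's --dhcp-hostsfile option and sends dnsmasq SIGHUP so it re-reads that file. Because the source rebuilds its file from the instances it currently hosts, the migrated instance drops out on the next rewrite, and nothing adds it on the destination until that node regenerates its own file.

```python
import os
import signal
import tempfile
from dataclasses import dataclass


@dataclass
class FixedIP:
    # Hypothetical record: one fixed IP bound to one instance.
    mac: str
    hostname: str
    address: str
    host: str  # compute node the instance currently runs on


def render_dhcp_hosts(fixed_ips, host):
    """Return dhcp-hostsfile contents for the instances on `host` only."""
    lines = [
        f"{ip.mac},{ip.hostname},{ip.address}"
        for ip in fixed_ips
        if ip.host == host
    ]
    return "".join(line + "\n" for line in lines)


def refresh_dhcp_hosts(path, fixed_ips, host, dnsmasq_pid=None):
    """Rewrite the hosts file atomically, then SIGHUP dnsmasq to reload it."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        f.write(render_dhcp_hosts(fixed_ips, host))
    os.replace(tmp, path)  # readers never see a half-written file
    if dnsmasq_pid is not None:
        # dnsmasq re-reads its --dhcp-hostsfile when it receives SIGHUP.
        os.kill(dnsmasq_pid, signal.SIGHUP)
```

Once the record's host is switched to the destination, render_dhcp_hosts for the source no longer emits the instance, while the destination only serves it after its own file is refreshed.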

If both of the above occur, the migrated instance will lose IP connectivity when its DHCP lease expires.

I have included a patch that fixes this in the short term, but a more elegant resolution is required.

Tested and fixed on diablo/stable. This bug is also present in essex.

Vish Ishaya (vishvananda) wrote:

I discussed this a bit offline with the networking team. It seems a little challenging to do the correct implementation for essex, but here is the basic plan:

Notes below from Trey Morris:

We'll need to pull the network_setup functionality out of IP allocation/deallocation and expose a callable trigger for it in the network API. For allocate_ip, the functionality already appears to be split out into the _setup_network() function. We need to do something similar for deallocate_ip (e.g. a _teardown_network() function) and create a setup_networks() function with a corresponding network_api call.

Setup and teardown (setup/unsetup_all) could be combined into one function with a default parameter, something like:

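def setup_networks(self, context, teardown=False, **kwargs):
    # Pick the per-network action once, then apply it to every network
    # the instance is attached to.
    if teardown:
        call_func = self._teardown_network
    else:
        call_func = self._setup_network

    # Instance identifiers come from kwargs (left unspecified in this
    # sketch) and are passed through to the nw_info lookup.
    nw_info = self.get_instance_nw_info(context, **kwargs)
    for vif in nw_info:
        call_func(context, vif['network'])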
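The combined entry point can be exercised with a small self-contained stub, assuming a fake manager whose _setup_network/_teardown_network only record which bridge they were called for and whose get_instance_nw_info returns a hard-coded list of VIF dicts. The stub class, its data, and the "bridge" key are hypothetical; only the method names come from the notes above.

```python
class FakeNetworkManager:
    """Hypothetical stand-in for a nova network manager (illustration only)."""

    def __init__(self, nw_info):
        self._nw_info = nw_info
        self.calls = []  # (action, bridge) pairs in call order

    def get_instance_nw_info(self, context, **kwargs):
        # A real manager would look the instance up; the stub returns fixed data.
        return self._nw_info

    def _setup_network(self, context, network):
        self.calls.append(("setup", network["bridge"]))

    def _teardown_network(self, context, network):
        self.calls.append(("teardown", network["bridge"]))

    def setup_networks(self, context, teardown=False, **kwargs):
        # Choose the bound method once, then apply it to every VIF's network.
        call_func = self._teardown_network if teardown else self._setup_network
        for vif in self.get_instance_nw_info(context, **kwargs):
            call_func(context, vif["network"])


manager = FakeNetworkManager([
    {"network": {"bridge": "br100"}},
    {"network": {"bridge": "br101"}},
])
manager.setup_networks(None, instance_id=1)
manager.setup_networks(None, teardown=True, instance_id=1)
print(manager.calls)
# [('setup', 'br100'), ('setup', 'br101'), ('teardown', 'br100'), ('teardown', 'br101')]
```

Selecting the method up front keeps a single loop for both directions, which is what the teardown=False default parameter buys.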

The flow as I see it (from compute) would be:
def live_migrate(self, context, instance):
    self.network_api.setup_networks(context, teardown=True, instance=instance)
    # ... perform the migration as it was done before ...
    self.network_api.setup_networks(context, instance=instance)

allocate_fixed_ip would still call self._setup_network(context, network), so the single network would be configured just as before. deallocate could do the same, except that it would call self._teardown_network(context, network) instead of performing the teardown inline.
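The refactor described above, where the per-IP entry points delegate to shared helpers instead of doing the work inline, might look like the following sketch. The class, the in-memory allocation table, and the simplified method signatures are hypothetical; only the names allocate_fixed_ip, deallocate_fixed_ip, _setup_network and _teardown_network come from this thread.

```python
class SketchManager:
    """Hypothetical manager: allocate/deallocate delegate to shared helpers."""

    def __init__(self):
        self.allocated = {}  # address -> network name (stand-in for the DB)
        self.log = []

    def _setup_network(self, context, network):
        self.log.append(f"setup {network}")

    def _teardown_network(self, context, network):
        self.log.append(f"teardown {network}")

    def allocate_fixed_ip(self, context, address, network):
        self.allocated[address] = network
        # Network configuration is no longer done inline; reuse the helper.
        self._setup_network(context, network)

    def deallocate_fixed_ip(self, context, address):
        network = self.allocated.pop(address)
        # Teardown lives in one place, so migration code can call it too.
        self._teardown_network(context, network)


m = SketchManager()
m.allocate_fixed_ip(None, "10.0.0.2", "br100")
m.deallocate_fixed_ip(None, "10.0.0.2")
print(m.log)  # ['setup br100', 'teardown br100']
```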

My only addition is that you would probably want to tear down the network on the old host after the migration, which means you may need to pass the host into the call somewhere.
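Taking that suggestion into account (set up networking on the destination first, and tear it down on the old host only after the migration succeeds, passing the host explicitly), the orchestration could be sketched as follows. The host keyword, the RecordingNetworkAPI stub, and the migrate callable are assumptions for illustration, not nova's actual API.

```python
class RecordingNetworkAPI:
    """Hypothetical network API stub that records each call in order."""

    def __init__(self):
        self.calls = []

    def setup_networks(self, context, teardown=False, host=None, **kwargs):
        self.calls.append(("teardown" if teardown else "setup", host))


def live_migrate(network_api, context, instance, source, dest, migrate):
    # 1. Bridges/VLANs must exist on the destination before the guest starts
    #    there; otherwise the migration fails and rolls back to the source.
    network_api.setup_networks(context, host=dest, instance=instance)
    # 2. Perform the hypervisor-level migration as before.
    migrate(instance, dest)
    # 3. Only after the migration succeeds, tear down on the old host.
    network_api.setup_networks(context, teardown=True, host=source,
                               instance=instance)


api = RecordingNetworkAPI()
live_migrate(api, None, "instance-0001", "compute1", "compute2",
             migrate=lambda inst, dest: None)
print(api.calls)  # [('setup', 'compute2'), ('teardown', 'compute1')]
```

If migrate raises, step 3 never runs, so the source keeps its networking for the rollback case described in the bug.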

In the meantime, the patch above will at least make things work.

Changed in nova:
importance: Undecided → Medium
status: New → Triaged
assignee: nobody → Trey Morris (tr3buchet)
Vish Ishaya (vishvananda) wrote:

Apparently this affects resize as well. tr3buchet also discovered that deallocate is not working properly.

Changed in nova:
importance: Medium → Critical

Fix proposed to branch: master
Review: https://review.openstack.org/4635

Changed in nova:
status: Triaged → In Progress

Fix proposed to branch: master
Review: https://review.openstack.org/4646

Changed in nova:
milestone: none → essex-4
Thierry Carrez (ttx) on 2012-03-01
Changed in nova:
milestone: essex-4 → essex-rc1
Changed in nova:
assignee: Trey Morris (tr3buchet) → Vish Ishaya (vishvananda)
Changed in nova:
assignee: Vish Ishaya (vishvananda) → Trey Morris (tr3buchet)

Reviewed: https://review.openstack.org/4646
Committed: http://github.com/openstack/nova/commit/0c7a54b3b44f849bf397bb4068ab16c576c3559c
Submitter: Jenkins
Branch: master

commit 0c7a54b3b44f849bf397bb4068ab16c576c3559c
Author: Trey Morris <email address hidden>
Date: Mon Feb 27 19:07:31 2012 -0600

    Setup and teardown networks during migration

    * fixes lp939060
    * live migration and resize nova appropriately setup
      and teardown networking related to network hosts
    * deallocate_fixed_ip is now run on the correct host
      resulting in the network structures being torn down
      correctly

    Change-Id: I2c86989ab7c6593bf346611cde8c043116d55bc5

Changed in nova:
status: In Progress → Fix Committed

Fix proposed to branch: master
Review: https://review.openstack.org/5038

Changed in nova:
assignee: Trey Morris (tr3buchet) → Vish Ishaya (vishvananda)
status: Fix Committed → In Progress

Fix proposed to branch: master
Review: https://review.openstack.org/5097

Changed in nova:
assignee: Vish Ishaya (vishvananda) → Brian Waldon (bcwaldon)
Brian Waldon (bcwaldon) on 2012-03-10
Changed in nova:
assignee: Brian Waldon (bcwaldon) → Vish Ishaya (vishvananda)

Reviewed: https://review.openstack.org/5097
Committed: http://github.com/openstack/nova/commit/81c1d70754543360e11e3aaba2ed403872b21302
Submitter: Jenkins
Branch: master

commit 81c1d70754543360e11e3aaba2ed403872b21302
Author: Vishvananda Ishaya <email address hidden>
Date: Thu Mar 8 12:51:36 2012 -0800

    Clean up setup and teardown for dhcp managers

     * use update_dhcp on teardown, not release_dhcp
     * clean up setup / teardown to not require vif and address
     * make dnsmasq only configure allocated ips
     * prepares the fix for bug 939060

    Change-Id: Ie85860c5549339befee74c951ccb0d72a92f6d6c

Changed in nova:
status: In Progress → Fix Committed
Changed in nova:
status: Fix Committed → In Progress
Thierry Carrez (ttx) wrote:

Hrm, is this really still in progress, or did the bot drink too much?

Reviewed: https://review.openstack.org/5038
Committed: http://github.com/openstack/nova/commit/33def9e714fbd13a6dc4b755ade4841c971f7ae5
Submitter: Jenkins
Branch: master

commit 33def9e714fbd13a6dc4b755ade4841c971f7ae5
Author: Vishvananda Ishaya <email address hidden>
Date: Thu Mar 8 12:53:44 2012 -0800

    Fix live-migration in multi_host network

     * call teardown after live migration
     * call update a second time after migration for dhcp
     * moves the instance state update into post_live_migrate
     * completes the fix for bug 939060
     * fixes bug 947326

    Change-Id: I042567573b9bb46381c5447aa08e83cd1916b225

Changed in nova:
status: In Progress → Fix Committed
Thierry Carrez (ttx) on 2012-03-20
Changed in nova:
status: Fix Committed → Fix Released
Thierry Carrez (ttx) on 2012-04-05
Changed in nova:
milestone: essex-rc1 → 2012.1