Instance doesn't get an address via DHCP (nova-network) because of issue with live migration

Bug #1444497 reported by Timofey Durakov
This bug affects 1 person
Affects                   Status        Importance  Assigned to      Milestone
OpenStack Compute (nova)  Fix Released  Medium      Timofey Durakov
Kilo                      Fix Released  Undecided   Unassigned

Bug Description

When an instance is migrated to another compute node, its DHCP lease is not removed from the first compute node, even after the instance is terminated.
If a new instance is later created on the first compute node and gets the same IP as the previous instance, the stale DHCP lease for that IP is still present there, so dnsmasq refuses the DHCP request from the new instance because its MAC address is different.

Steps to reproduce:
        Scenario:
            1. Create cluster (CentOS, nova-network with Flat-DHCP, Ceph for images and volumes)
            2. Add 1 node with controller and ceph OSD roles
            3. Add 2 nodes with compute and ceph OSD roles
            4. Deploy the cluster

            5. Create a VM
            6. Wait until the VM gets an IP address via DHCP (check the VM console log)
            7. Migrate the VM to another compute node.
            8. Terminate the VM.

            9. Repeat steps 5 to 8 several times (in my case, 4 to 6 times was enough) until a new instance stops receiving an IP address via DHCP.
            10. Check dnsmasq-dhcp.log (/var/log/daemon.log on the compute node) for messages like:
=============================================
2014-11-09T20:28:29.671344+00:00 warning: not using configured address 10.0.0.2 because it is leased to fa:16:3e:65:70:be

This means that:
   I. An instance was created on compute node-1 and got a DHCP lease:
==== nova-dhcpbridge.log
2014-11-09 20:12:03.811 27360 DEBUG nova.dhcpbridge [-] Called 'add' for mac 'fa:16:3e:65:70:be' with ip '10.0.0.2' main /usr/lib/python2.6/site-packages/nova/cmd/dhcpbridge.py:135

  II. When the instance was migrated from compute node-1 to node-3, 'dhcp_release' was not performed on compute node-1; see the time range 2014-11-09 20:14:36-37 in the logs:
==== Running.log (node-1)
2014-11-09T20:14:36.647588+00:00 debug: cmd (subprocess): sudo nova-rootwrap /etc/nova/rootwrap.conf conntrack -D -r 10.0.0.2
### But a command like the following is missing: sudo nova-rootwrap /etc/nova/rootwrap.conf dhcp_release br100 10.0.0.2 fa:16:3e:65:70:be

  III. On compute node-3, the DHCP lease was added and then successfully removed when the instance was terminated:
==== Running.log (node-3)
2014-11-09T20:15:17.250243+00:00 debug: cmd (subprocess): sudo nova-rootwrap /etc/nova/rootwrap.conf dhcp_release br100 10.0.0.2 fa:16:3e:65:70:be

  IV. When another instance got the same address '10.0.0.2' and was created on node-1, it didn't get an IP address via DHCP:
==== Running.log (node-1)
2014-11-09T20:28:29.671344+00:00 warning: not using configured address 10.0.0.2 because it is leased to fa:16:3e:65:70:be
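
To make step 10 easier to check, here is a minimal Python sketch that scans log lines for the dnsmasq warning quoted above and extracts the IP and the MAC that still holds the lease. The function name `find_stale_leases` and the regular expression are illustrative assumptions, not part of nova or dnsmasq. The printed `dhcp_release` line only mirrors the command format shown in the node-3 log, with the bridge name `br100` taken from this report.

```python
import re

# Matches the dnsmasq warning quoted in this report (pattern is an assumption
# based on that single sample line).
STALE_LEASE_RE = re.compile(
    r"not using configured address (?P<ip>\d+\.\d+\.\d+\.\d+) "
    r"because it is leased to (?P<mac>(?:[0-9a-f]{2}:){5}[0-9a-f]{2})",
    re.IGNORECASE,
)


def find_stale_leases(lines):
    """Return (ip, mac) pairs for every stale-lease warning found in lines."""
    hits = []
    for line in lines:
        match = STALE_LEASE_RE.search(line)
        if match:
            hits.append((match.group("ip"), match.group("mac").lower()))
    return hits


if __name__ == "__main__":
    sample = [
        "2014-11-09T20:28:29.671344+00:00 warning: not using configured "
        "address 10.0.0.2 because it is leased to fa:16:3e:65:70:be",
    ]
    for ip, mac in find_stale_leases(sample):
        # Same argument order as the dhcp_release call in the node-3 log.
        print(f"dhcp_release br100 {ip} {mac}")
```

On an affected compute node the same function could be fed the lines of /var/log/daemon.log instead of the sample list.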

Changed in nova:
status: New → In Progress
assignee: nobody → Timofey Durakov (tdurakov)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/173913

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/173913
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=d15c57a499fd7e87b3752ab007f344e055714f05
Submitter: Jenkins
Branch: master

commit d15c57a499fd7e87b3752ab007f344e055714f05
Author: Timofey Durakov <email address hidden>
Date: Wed Apr 15 17:29:00 2015 +0300

    Fixed nova-network dhcp-hostsfile update during live-migration

    During live migration _post_live_migration and
    post_live_migration_at_destination_method are executed
    simultaneously, because second one is called over rpc.cast
    In _post_live_migration method there was setup_network_on_host
    call with teardown=True, which expects new host in instances
    table db field. This update could be happened later, as it
    executes on destination node in second method. To guarantee
    execution order setup_network_on_host call, which cleans
    dhcp-hostfile is moved to destination node.
    Closes-Bug: #1444497

    Change-Id: I55f0c0148c937601e78f0beecc21b30a1164a690
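
The ordering problem described in the commit message can be illustrated with a toy Python model. This is only a sketch under stated assumptions: `dhcp_hosts_for` and `migrate` are hypothetical helpers, not nova functions, and the model assumes that the teardown step builds the source host's DHCP entries from the instances whose DB `host` field still points at that host.

```python
def dhcp_hosts_for(host, instances):
    """Toy stand-in: 'MAC,IP' entries for instances whose DB host field is host."""
    return [f"{inst['mac']},{inst['ip']}" for inst in instances if inst["host"] == host]


def migrate(instances, source, dest, teardown_after_db_update):
    """Return the source host's entries as seen by the teardown step."""
    inst = instances[0]
    if teardown_after_db_update:
        # Fixed order: the destination records the new host first,
        # then the teardown sees no entry left for the source.
        inst["host"] = dest
        return dhcp_hosts_for(source, instances)
    # Racy order: teardown runs before the DB update lands,
    # so the old entry is still attributed to the source host.
    stale = dhcp_hosts_for(source, instances)
    inst["host"] = dest
    return stale


def fresh():
    """One instance on node-1, using the MAC and IP from this report."""
    return [{"mac": "fa:16:3e:65:70:be", "ip": "10.0.0.2", "host": "node-1"}]


if __name__ == "__main__":
    print(migrate(fresh(), "node-1", "node-3", teardown_after_db_update=False))
    print(migrate(fresh(), "node-1", "node-3", teardown_after_db_update=True))
```

In this model the racy order leaves the entry for 10.0.0.2 on node-1, while the fixed order leaves none, which matches the commit's reasoning for moving the call to the destination node.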

Changed in nova:
status: In Progress → Fix Committed
Thierry Carrez (ttx)
Changed in nova:
milestone: none → liberty-1
status: Fix Committed → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/kilo)

Fix proposed to branch: stable/kilo
Review: https://review.openstack.org/198340

Jay Pipes (jaypipes)
Changed in nova:
importance: Undecided → Medium
tags: added: kilo-backport-potential
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (stable/kilo)

Reviewed: https://review.openstack.org/198340
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=0cf44ff628cc9ed1e3092ac1c7746b696ac3b6aa
Submitter: Jenkins
Branch: stable/kilo

commit 0cf44ff628cc9ed1e3092ac1c7746b696ac3b6aa
Author: Timofey Durakov <email address hidden>
Date: Wed Apr 15 17:29:00 2015 +0300

    Fixed nova-network dhcp-hostsfile update during live-migration

    During live migration _post_live_migration and
    post_live_migration_at_destination_method are executed
    simultaneously, because second one is called over rpc.cast
    In _post_live_migration method there was setup_network_on_host
    call with teardown=True, which expects new host in instances
    table db field. This update could be happened later, as it
    executes on destination node in second method. To guarantee
    execution order setup_network_on_host call, which cleans
    dhcp-hostfile is moved to destination node.
    Closes-Bug: #1444497

    (cherry picked from commit d15c57a499fd7e87b3752ab007f344e055714f05)

    Change-Id: I55f0c0148c937601e78f0beecc21b30a1164a690

tags: added: in-stable-kilo
Thierry Carrez (ttx)
Changed in nova:
milestone: liberty-1 → 12.0.0