Instance doesn't get an address via DHCP (nova-network) because of an issue with live migration

Bug #1391010 reported by Dennis Dmitriev
This bug affects 2 people
Affects             Status        Importance  Assigned to      Milestone
Mirantis OpenStack  Fix Released  High        Timofey Durakov
6.0.x               Won't Fix     Medium      MOS Nova
6.1.x               Fix Released  High        Timofey Durakov

Bug Description

When an instance is migrated to another compute node, its DHCP lease is not removed from the first compute node, even after the instance is terminated.
If a new instance is then created on the first compute node and is assigned the same IP, the stale lease for that IP is still present there, so dnsmasq refuses the DHCP request for that address because the new instance has a different MAC.
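
The refusal described above can be illustrated with a toy lease table. This is only a sketch of the observed behaviour, not dnsmasq or nova code: the LeaseTable class and the MAC of the new instance are hypothetical, while the IP and the old MAC are taken from the logs in this report.

```python
# Toy model of a per-host DHCP lease table; this is NOT dnsmasq code.
class LeaseTable:
    def __init__(self):
        self._leases = {}  # ip -> MAC currently holding the lease

    def request(self, ip, mac):
        """Return True if `mac` may be given `ip`, recording the lease."""
        holder = self._leases.get(ip)
        if holder is not None and holder != mac:
            return False  # refused: ip is still leased to another MAC
        self._leases[ip] = mac
        return True

    def release(self, ip, mac):
        """Drop the lease if `mac` holds it (the role dhcp_release plays)."""
        if self._leases.get(ip) == mac:
            del self._leases[ip]


node1 = LeaseTable()
old_mac = "fa:16:3e:65:70:be"   # MAC from the report
new_mac = "fa:16:3e:00:00:01"   # hypothetical MAC of the new instance

first_boot = node1.request("10.0.0.2", old_mac)
# The instance migrates away and is terminated, but node-1 never releases.
stale_refused = node1.request("10.0.0.2", new_mac)
# After an explicit release the same request succeeds.
node1.release("10.0.0.2", old_mac)
after_release = node1.request("10.0.0.2", new_mac)
print(first_boot, stale_refused, after_release)
```

In this model the second request fails only because the release step was skipped on the source host, which matches the symptom seen on node-1.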

Steps to reproduce:
        Scenario:
            1. Create a cluster (CentOS, nova-network with Flat-DHCP, Ceph for images and volumes)
            2. Add 1 node with controller and Ceph OSD roles
            3. Add 2 nodes with compute and Ceph OSD roles
            4. Deploy the cluster

            5. Create a VM
            6. Wait until the VM gets an IP address via DHCP (check the VM console log)
            7. Migrate the VM to another compute node.
            8. Terminate the VM.

            9. Repeat steps 5 to 8 several times (in my case, 4 to 6 times was enough) until a new instance stops receiving an IP address via DHCP.
            10. Check dnsmasq-dhcp.log (/var/log/daemon.log on the compute node) for messages like:
=============================================
2014-11-09T20:28:29.671344+00:00 warning: not using configured address 10.0.0.2 because it is leased to fa:16:3e:65:70:be

This means that:
   I. An instance was created on compute node-1 and got a DHCP lease:
==== nova-dhcpbridge.log
2014-11-09 20:12:03.811 27360 DEBUG nova.dhcpbridge [-] Called 'add' for mac 'fa:16:3e:65:70:be' with ip '10.0.0.2' main /usr/lib/python2.6/site-packages/nova/cmd/dhcpbridge.py:135

  II. When the instance was migrated from compute node-1 to node-3, 'dhcp_release' was not performed on compute node-1; see the time range 2014-11-09 20:14:36-37 in the logs:
==== Running.log (node-1)
2014-11-09T20:14:36.647588+00:00 debug: cmd (subprocess): sudo nova-rootwrap /etc/nova/rootwrap.conf conntrack -D -r 10.0.0.2
### But a command like the following is missing: sudo nova-rootwrap /etc/nova/rootwrap.conf dhcp_release br100 10.0.0.2 fa:16:3e:65:70:be

  III. On compute node-3, the DHCP lease was added and was successfully removed when the instance was terminated:
==== Running.log (node-3)
2014-11-09T20:15:17.250243+00:00 debug: cmd (subprocess): sudo nova-rootwrap /etc/nova/rootwrap.conf dhcp_release br100 10.0.0.2 fa:16:3e:65:70:be

  IV. When another instance got the same address '10.0.0.2' and was created on node-1, it didn't get an IP address via DHCP:
==== Running.log (node-1)
2014-11-09T20:28:29.671344+00:00 warning: not using configured address 10.0.0.2 because it is leased to fa:16:3e:65:70:be
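
To check a compute node for the symptom in step 10, the log can be scanned for the warning shown above. The following is a minimal sketch, assuming the warning text has exactly the form quoted in this report. The function names are hypothetical, and the bridge name br100 is taken from the logs above and may differ in other deployments. The script only prints candidate dhcp_release commands; it does not run them.

```python
import re

# Matches the dnsmasq warning quoted in this report.
STALE_LEASE_RE = re.compile(
    r"not using configured address (?P<ip>\d{1,3}(?:\.\d{1,3}){3}) "
    r"because it is leased to (?P<mac>[0-9a-fA-F]{2}(?::[0-9a-fA-F]{2}){5})"
)


def find_stale_leases(lines):
    """Return a sorted list of unique (ip, mac) pairs found in the warnings."""
    found = set()
    for line in lines:
        match = STALE_LEASE_RE.search(line)
        if match:
            found.add((match.group("ip"), match.group("mac").lower()))
    return sorted(found)


def release_commands(pairs, bridge="br100"):
    """Build (but do not run) dhcp_release command lines for each pair."""
    return [f"dhcp_release {bridge} {ip} {mac}" for ip, mac in pairs]


# Two lines copied from the logs in this report; only the first is a warning.
sample = [
    "2014-11-09T20:28:29.671344+00:00 warning: not using configured address "
    "10.0.0.2 because it is leased to fa:16:3e:65:70:be",
    "2014-11-09T20:14:36.647588+00:00 debug: cmd (subprocess): sudo "
    "nova-rootwrap /etc/nova/rootwrap.conf conntrack -D -r 10.0.0.2",
]
stale = find_stale_leases(sample)
commands = release_commands(stale)
print(commands)
```

On a real node the same functions could be fed the lines of /var/log/daemon.log instead of the sample list.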

api: '1.0'
astute_sha: 3c374c9f7bfbdbcd7ce2f716cd704e3044e6fb41
auth_required: true
build_id: 2014-11-07_21-36-13
build_number: '84'
feature_groups:
- mirantis
fuellib_sha: c7b71bd1ee939b5a634715ac7b13c2936ad93a5e
fuelmain_sha: 77f6a3a4d398c62d89b2831cef2c4b47c2b2085e
nailgun_sha: 8330f6221e190db87fc5baa735fa719c85a2e02d
ostf_sha: 9c6fadca272427bb933bc459e14bb1bad7f614aa
production: docker
release: '6.0'
release_versions:
  2014.2-6.0:
    VERSION:
      api: '1.0'
      astute_sha: 3c374c9f7bfbdbcd7ce2f716cd704e3044e6fb41
      build_id: 2014-11-07_21-36-13
      build_number: '84'
      feature_groups:
      - mirantis
      fuellib_sha: c7b71bd1ee939b5a634715ac7b13c2936ad93a5e
      fuelmain_sha: 77f6a3a4d398c62d89b2831cef2c4b47c2b2085e
      nailgun_sha: 8330f6221e190db87fc5baa735fa719c85a2e02d
      ostf_sha: 9c6fadca272427bb933bc459e14bb1bad7f614aa
      production: docker
      release: '6.0'

Tags: nova upgrades
Dennis Dmitriev (ddmitriev) wrote :
description: updated
Dennis Dmitriev (ddmitriev) wrote :

Ubuntu is also affected.

affects: fuel → mos
Changed in mos:
milestone: 6.0 → none
milestone: none → 6.0
status: New → Confirmed
importance: Undecided → Medium
Dennis Dmitriev (ddmitriev) wrote :

A corresponding check for this issue was added to the system test 'migrate_vm_backed_with_ceph':
https://review.openstack.org/135584

Changed in mos:
milestone: 6.0 → 6.0.1
Changed in mos:
status: Confirmed → Won't Fix
Andrey Sledzinskiy (asledzinskiy) wrote :

This issue is still reproduced on our CI.
The instance has been deleted, but its DHCP lease for IP 10.0.0.2 with MAC fa:16:3e:24:fe:33 still remains on the compute node node-3.test.domain.local.

Do we have plans to fix it in this release?

Andrey Sledzinskiy (asledzinskiy) wrote :
Timofey Durakov (tdurakov) wrote :
Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix merged to openstack/nova (openstack-ci/fuel-6.1/2014.2)

Reviewed: https://review.fuel-infra.org/5797
Submitter: Roman Podoliaka <email address hidden>
Branch: openstack-ci/fuel-6.1/2014.2

Commit: 193735ca11d8d55bb9b581b3ba4bde71f74a2ff9
Author: Timofey Durakov <email address hidden>
Date: Thu Apr 23 21:20:50 2015

Fixed nova-network dhcp-hostsfile update during live-migration

During live migration, _post_live_migration and
post_live_migration_at_destination are executed
simultaneously, because the second one is called over rpc.cast.
The _post_live_migration method contained a setup_network_on_host
call with teardown=True, which expects the new host in the instances
table db field. That update could happen later, as it
executes on the destination node in the second method. To guarantee
the execution order, the setup_network_on_host call, which cleans
the dhcp-hostsfile, is moved to the destination node.

Change-Id: I52bea4db7608abcf73fa781d2c0aaf3eaeb2f468
Closes-Bug: #1391010
Upstream-Bug: #1444497
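
The ordering problem described in the commit message can be sketched with a toy model. This is not nova code: the function names, the dictionary standing in for the instances table, and the rule that the source-side teardown only cleans up once the DB already shows the new host are all assumptions made for illustration.

```python
# Toy model of the ordering problem; the bodies are hypothetical and are
# NOT nova code.

def teardown_on_source(db, leases, source):
    """Source-side cleanup that only acts once the DB shows the new host."""
    if db["host"] != source:
        leases[source].discard(db["ip"])


def finish_on_destination(db, leases, dest):
    """Destination-side step that records the new host in the DB."""
    db["host"] = dest
    leases[dest].add(db["ip"])


def run(order):
    """Run the two steps in the given order; return node-1's leftover leases."""
    db = {"host": "node-1", "ip": "10.0.0.2"}
    leases = {"node-1": {"10.0.0.2"}, "node-3": set()}
    steps = {
        "source": lambda: teardown_on_source(db, leases, "node-1"),
        "dest": lambda: finish_on_destination(db, leases, "node-3"),
    }
    for name in order:
        steps[name]()
    return leases["node-1"]


# With a cast, nothing prevents the source cleanup from running first.
racy = run(["source", "dest"])
# With a guaranteed order, cleanup runs only after the DB update.
ordered = run(["dest", "source"])
print(racy, ordered)
```

In the racy order the stale entry for 10.0.0.2 stays on node-1, while in the guaranteed order it is removed, which is the effect the fix aims for by moving the cleanup call to the destination node.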

OSCI Robot (oscirobot) wrote :

Reviewed: https://review.fuel-infra.org/5797
Committed: https://review.fuel-infra.org/gitweb?p=openstack/nova.git;a=commitdiff;h=193735ca11d8d55bb9b581b3ba4bde71f74a2ff9
Submitter: Roman Podoliaka
Branch: openstack-ci/fuel-6.1/2014.2

tags: added: upgrades
Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix proposed to openstack/nova (openstack-ci/fuel-7.0/2015.1.0)

Fix proposed to branch: openstack-ci/fuel-7.0/2015.1.0
Change author: Timofey Durakov <email address hidden>
Review: https://review.fuel-infra.org/8260

Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix merged to openstack/nova (openstack-ci/fuel-7.0/2015.1.0)

Reviewed: https://review.fuel-infra.org/8260
Submitter: mos-infra-ci <>
Branch: openstack-ci/fuel-7.0/2015.1.0

Commit: 057e4171761445f5295538c947d0ab10b431ec49
Author: Timofey Durakov <email address hidden>
Date: Thu Jul 16 08:22:39 2015

Fixed nova-network dhcp-hostsfile update during live-migration

During live migration, _post_live_migration and
post_live_migration_at_destination are executed
simultaneously, because the second one is called over rpc.cast.
The _post_live_migration method contained a setup_network_on_host
call with teardown=True, which expects the new host in the instances
table db field. That update could happen later, as it
executes on the destination node in the second method. To guarantee
the execution order, the setup_network_on_host call, which cleans
the dhcp-hostsfile, is moved to the destination node.

Change-Id: I52bea4db7608abcf73fa781d2c0aaf3eaeb2f468
Closes-Bug: #1391010
Upstream-Bug: #1444497

Fuel Devops McRobotson (fuel-devops-robot) wrote : Fix proposed to openstack/nova (openstack-ci/fuel-8.0/liberty)

Fix proposed to branch: openstack-ci/fuel-8.0/liberty
Change author: Timofey Durakov <email address hidden>
Review: https://review.fuel-infra.org/13278

Fuel Devops McRobotson (fuel-devops-robot) wrote : Change abandoned on openstack/nova (openstack-ci/fuel-8.0/liberty)

Change abandoned by Roman Podoliaka <email address hidden> on branch: openstack-ci/fuel-8.0/liberty
Review: https://review.fuel-infra.org/13278
Reason: Made it to upstream with another change-id - I55f0c0148c937601e78f0beecc21b30a1164a690
