dnsmasq dhcp lease is not cleaned up after instance termination

Bug #1655794 reported by Daniel Messer
This bug affects 5 people
Affects   Status   Importance   Assigned to   Milestone
kolla     New      Undecided    Unassigned

Bug Description

On a kolla 3.0.1 AiO deployment inside a Fedora 25 VM (packages updated), spawning new instances fails because tenant network DHCP leases are not cleaned up in dnsmasq.

How to reproduce:

- Install Kolla AiO according to the Quick Start Guide, using the most recent binary container images for CentOS or Ubuntu
- Deploy a minimal OpenStack admin infrastructure (provider network, image, flavor)
- Spawn a tenant VM instance in a tenant network
- Watch the DHCP lease for the instance get created in /var/lib/neutron/dhcp/<UUID>/leases
- Delete the instance
- Watch the dhcp_release call being placed correctly in the neutron-dhcp-agent log with debug enabled:

2017-01-12 01:26:33.769 7 DEBUG neutron.agent.linux.utils [req-e466e6ba-bedb-463f-bb9a-cbbc9985ef9d d5928079114a48f8ad247fbe12701311 c7b85840e87f4095bcc170d9c1a8da81 - - -] Running command: ['sudo', 'neutron-rootwrap', '/etc/neutron/rootwrap.conf', 'ip', 'netns', 'exec', 'qdhcp-9640edab-3695-4ebb-90e0-d2762ee8e98a', 'dhcp_release', 'tape2e74a87-93', '172.16.0.3', 'fa:16:3e:12:50:3c']
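
To check the lease file from the steps above without reading it by eye, a minimal Python sketch could parse it and look up a MAC or IP. It assumes the usual dnsmasq IPv4 lease line layout (expiry epoch, MAC, IP, hostname, client ID). The names Lease, parse_leases and find_lease, as well as the sample expiry and hostname, are hypothetical and not taken from this report.

```python
from typing import List, NamedTuple, Optional


class Lease(NamedTuple):
    expiry: int      # lease expiry as a Unix timestamp
    mac: str
    ip: str
    hostname: str
    client_id: str


def parse_leases(text: str) -> List[Lease]:
    """Parse the contents of a dnsmasq leases file (IPv4 entries only)."""
    leases = []
    for line in text.splitlines():
        fields = line.split()
        # Keep only five-field entries whose second field looks like a MAC.
        # This skips blank lines, the "duid ..." line and DHCPv6 entries.
        if len(fields) != 5 or not fields[0].isdigit():
            continue
        if fields[1].count(":") != 5:
            continue
        leases.append(Lease(int(fields[0]), fields[1].lower(), *fields[2:5]))
    return leases


def find_lease(leases: List[Lease], mac: Optional[str] = None,
               ip: Optional[str] = None) -> Optional[Lease]:
    """Return the first lease matching the given MAC or IP, if any."""
    for lease in leases:
        if mac is not None and lease.mac == mac.lower():
            return lease
        if ip is not None and lease.ip == ip:
            return lease
    return None


# Hypothetical sample line; the expiry and hostname are made up.
sample = "1484271993 fa:16:3e:12:50:3c 172.16.0.3 host-172-16-0-3 *\n"
stale = find_lease(parse_leases(sample), ip="172.16.0.3")
print(stale.mac if stale else "no lease")
```

Running this against the real leases file after deleting the instance would show whether the entry for the released address is still present.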

Expected behavior:

- lease is removed from dnsmasq lease file upon instance deletion

Actual behavior:

- lease is not removed from dnsmasq lease file
- further instances may not get IP addresses from dnsmasq because the address is still held by the orphaned lease; such instances do not spawn correctly because without an IP they cannot fetch metadata
- observe log messages in dnsmasq hinting the root cause:

Jan 12 01:40:58 dnsmasq-dhcp[701]: not using configured address 172.16.0.3 because it is leased to fa:16:3e:12:50:3c

- with tcpdump on the dnsmasq tap interface, no DHCP_RELEASE packet can be seen during the dhcp_release call in neutron
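
The dnsmasq message quoted above can be picked out of a log automatically. The following is a rough Python sketch that extracts the blocked IP and the MAC holding the lease. The regular expression is derived only from the single log line in this report, and the function name find_lease_conflicts is hypothetical.

```python
import re
from typing import List, Tuple

# Pattern built from the one dnsmasq line quoted in this report.
CONFLICT_RE = re.compile(
    r"not using configured address (?P<ip>\d{1,3}(?:\.\d{1,3}){3}) "
    r"because it is leased to (?P<mac>(?:[0-9a-f]{2}:){5}[0-9a-f]{2})",
    re.IGNORECASE,
)


def find_lease_conflicts(log_text: str) -> List[Tuple[str, str]]:
    """Return (ip, mac) pairs for every lease conflict found in the log."""
    return [(m.group("ip"), m.group("mac").lower())
            for m in CONFLICT_RE.finditer(log_text)]


log_line = ("Jan 12 01:40:58 dnsmasq-dhcp[701]: not using configured address "
            "172.16.0.3 because it is leased to fa:16:3e:12:50:3c")
for ip, mac in find_lease_conflicts(log_line):
    print(ip, "is still leased to", mac)
```

Each pair names an address that a new instance was denied and the MAC of the deleted instance that still holds it.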

Daniel Messer (dmesser) wrote :

Addendum: the behavior cannot be reproduced with either type of binary image when deployed on a VM running CentOS 7.3.1611

Eduardo Gonzalez (egonzalez90) wrote :

Under CentOS, with centos-source images from master, I cannot reproduce the issue.
DHCP leases are correctly removed in neutron_dhcp_agent from /var/lib/neutron/dhcp/<ID>/leases.

http://paste.openstack.org/show/594714/

It may be an issue with Fedora.

Daniel Messer (dmesser) wrote :

@Eduardo: correct, see my previous message. I cannot reproduce it on CentOS either, nor on Ubuntu. However, I have no idea what could cause it. Network traffic is flowing. What does this dhcp_release call depend on that differs between CentOS and Fedora?

Mathieu Rohon (mathieu-rohon) wrote :

We're having the same issue when deploying kolla 3.0.1 with ubuntu:latest as the base container.

Mathieu Rohon (mathieu-rohon) wrote :

I found this old bug in neutron:

https://bugs.launchpad.net/neutron/+bug/1271344

It was fixed by upgrading dnsmasq. Could there be a regression in the latest dnsmasq?

Mathieu Rohon (mathieu-rohon) wrote :

I have the same issue when using kolla 3.0.3 and centos as base image.

Daniel Messer (dmesser) wrote :

@Mathieu: which OS are you using to run kolla?

Erwan Le Bonniec (noisynoise) wrote :

@Daniel: I work with Mathieu. We are experiencing this issue when running Kolla on Debian Jessie.

Fairbanks. (fairbanks) wrote :

Hello,
I'm not using Kolla, but I'm experiencing the same problem,
with both Newton and Ocata on Ubuntu 16.04.

Has anyone fixed this or found a workaround?

Erwan Le Bonniec (noisynoise) wrote :

@Fairbanks: we used 'kolla-ansible reconfigure' to change the dhcp_lease_duration option to its minimum value, in '/etc/kolla/config/neutron/dhcp_agent.ini':
[DEFAULT]
dhcp_lease_duration = 30

So the problem is still there, but its impact is minimized...
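
To confirm which lease duration an override file actually sets, a small Python sketch can read the option with configparser. The function name read_lease_duration is hypothetical, and the fallback of 86400 seconds is an assumption about neutron's default that should be verified against your release.

```python
import configparser


def read_lease_duration(ini_text: str, fallback: int = 86400) -> int:
    """Return dhcp_lease_duration from the [DEFAULT] section of an ini text.

    The fallback value is an assumed default, not taken from this report.
    """
    # strict=False tolerates repeated keys; interpolation=None keeps '%' literal.
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.read_string(ini_text)
    return parser.getint("DEFAULT", "dhcp_lease_duration", fallback=fallback)


override = "[DEFAULT]\ndhcp_lease_duration = 30\n"
seconds = read_lease_duration(override)
print(f"an orphaned lease should expire after at most about {seconds} s")
```

With this workaround an address held by a deleted instance should become available again once the lease expires, rather than being released immediately.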

Fairbanks. (fairbanks) wrote :

Hello,

Well, I figured it out.
In our case it was a kernel problem.
I was running kernel v4.8 (HWE) on 16.04.
When I switched to v4.4 (GA), it all worked like a charm.
I think this has something to do with the hardware, since in other environments I can use the HWE kernel without any problems.

So, I don't know if this could be the same case here, but maybe try another kernel version :).

Jorge Niedbalski (niedbalski) wrote :

I marked this bug as a duplicate of bug #1683982; in that bug we are tracking the kernel bug affecting some of the Ubuntu series.
