[mos] dnsmasq (for neutron-dhcp-agent) is sometimes configured with duplicate leases

Bug #1295715 reported by Brad Durrow
This bug affects 1 person
Affects: Mirantis OpenStack
Status: Invalid
Importance: Medium
Assigned to: MOS Neutron

Bug Description

Today I had an instance that didn't get an IP on boot. I looked in the dnsmasq logs and saw this:
2014-03-21T14:52:34.197649+00:00 err: duplicate dhcp-host IP address 10.29.8.6 at line 16 of /var/lib/neutron/dhcp/50644057-b518-4e85-843a-3321c9a4073f/host

I confirmed in horizon that there was no instance with a duplicate IP.
I went to the node that the log came from, removed the first entry for 10.29.8.6 from the host file,
then sent dnsmasq a HUP signal (kill -HUP) and confirmed it was still up with the same PID.
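
The manual check described above can be sketched as a small script. This is only an illustration: it assumes each line of the host file is a comma-separated dhcp-host entry with exactly one IP address field, and the sample MAC addresses and hostnames below are made up, not taken from the affected deployment.

```python
import ipaddress
from collections import defaultdict


def find_duplicate_ips(lines):
    """Map each IP that appears more than once to its 1-based line numbers."""
    seen = defaultdict(list)
    for lineno, line in enumerate(lines, start=1):
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        for field in line.split(","):
            try:
                ip = ipaddress.ip_address(field.strip().strip("[]"))
            except ValueError:
                continue  # MAC address, hostname, or some other field
            seen[str(ip)].append(lineno)
            break  # assumption: one IP per entry
    return {ip: nums for ip, nums in seen.items() if len(nums) > 1}


# Hypothetical host file content (invented MACs and hostnames).
sample = [
    "fa:16:3e:00:00:01,host-10-29-8-6.openstacklocal,10.29.8.6",
    "fa:16:3e:00:00:02,host-10-29-8-7.openstacklocal,10.29.8.7",
    "fa:16:3e:00:00:03,host-10-29-8-6.openstacklocal,10.29.8.6",
]
print(find_duplicate_ips(sample))  # {'10.29.8.6': [1, 3]}
```

To check a real file, the lines could be read from the path in the dnsmasq error message and passed to find_duplicate_ips.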

I rebooted the instance and it got an IP this time.

{"build_id": "2013-12-27_00-24-14", "ostf_sha": "83ada35fec2664089e07fdc0d34861ae2a4d948a", "build_number": "214", "nailgun_sha": "af1598bcc9faf468d4d9265cc5c51fa8cea53136", "fuelmain_sha": "17eed776b30886851ae0042fa7a30184f5cd8eb6", "astute_sha": "6ce36837882399e0d3bb1ffdb2c3b2d8dcb84b54", "release": "4.0", "fuellib_sha": "eebe07913ee09311c8e7c9231f6785081327dc0e"}

Revision history for this message
Vladimir Kuklin (vkuklin) wrote :

Brad, we will try to reproduce the issue, but it is really hard to understand which flow led to this problem. Would you please attach a diagnostic snapshot as usual?

Changed in fuel:
milestone: none → 5.0
tags: added: backports-4.1.1
Changed in fuel:
importance: Undecided → Medium
status: New → Incomplete
Revision history for this message
Brad Durrow (l-brad) wrote :

This problem is quite a bit more serious now. I have several IPs that are duplicated in the lease file every time I add or remove an instance. I can reliably cause the problem by trying to launch an instance without a large enough root disk for the image.

Revision history for this message
Brad Durrow (l-brad) wrote :

I couldn't find the mac addresses of the duplicate (old) leases in any of the recent logs, so I thought it might have to do with the dhcp agent cache. `crm resource restart p_neutron-dhcp-agent` (at least temporarily) resolved the problem.

Revision history for this message
Mike Scherbakov (mihgen) wrote :

Thanks for posting and debugging this issue.
I have to move this to 5.1: we have entered the soft code freeze phase, so all bugs with Medium and Low importance are moved to the next release version. Please provide more details if you find any, and feel free to raise questions or concerns about this issue on the mailing list.

Changed in fuel:
milestone: 5.0 → 5.1
Revision history for this message
tdsparrow (sqallowlee) wrote :

I hit the same issue on neutron 2013.2.1. The difference is that our VMs are destroyed after a short time, so I can find the MAC address for the old record in dnsmasq. There are seven hosts running DHCP agents, and each of them reports different duplicates.

I suspect the cause is that the DHCP agent cannot update its heartbeat timestamp in time, so agents_db.py treats it as down and no release notification is sent to that agent. My system uses the default configuration for agent_down_time (5s) and report_interval (4s). After changing report_interval to 3 on one host, the issue did not recur on that host for 12 hours.

The logic in the code at HEAD is almost the same.
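
The heartbeat hypothesis in this comment can be illustrated with a simplified model. This is a sketch under assumptions, not neutron's actual code: the function names are hypothetical, and the check simply treats an agent as down when its last heartbeat is older than agent_down_time seconds.

```python
from datetime import datetime, timedelta


def is_agent_down(last_heartbeat, now, agent_down_time):
    """True if the last heartbeat is older than agent_down_time seconds."""
    return (now - last_heartbeat) > timedelta(seconds=agent_down_time)


def tolerated_delay(report_interval, agent_down_time):
    """Seconds a single report may arrive late before the agent looks down."""
    return agent_down_time - report_interval


start = datetime(2014, 1, 1, 0, 0, 0)
# A report that was due after 4 s arrives 2 s late: 6 s without a heartbeat.
late = start + timedelta(seconds=6)
print(is_agent_down(start, late, agent_down_time=5))  # True
print(is_agent_down(start, late, agent_down_time=9))  # False
print(tolerated_delay(4, 5), tolerated_delay(3, 5))   # 1 2
```

In this model, lowering report_interval or raising agent_down_time widens the margin for a late report, which is consistent with the improvement observed after changing report_interval to 3.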

Revision history for this message
tdsparrow (sqallowlee) wrote :

I got it wrong: it's 2013.2, and the default value for agent_down_time was changed to 9 in 2013.2.

Revision history for this message
Bogdan Dobrelya (bogdando) wrote :

This bug could be fixed in the 4.1.1 release. Please try to reproduce it and provide feedback.

Changed in fuel:
assignee: nobody → Fuel QA Team (fuel-qa)
Ilya Shakhat (shakhat)
Changed in mos:
assignee: nobody → MOS Neutron (mos-neutron)
Ilya Shakhat (shakhat)
Changed in mos:
status: New → Incomplete
Revision history for this message
Eugene Nikanorov (enikanorov) wrote :

I think this is the upstream version of this issue
https://bugs.launchpad.net/neutron/+bug/1288493

Revision history for this message
Eugene Nikanorov (enikanorov) wrote :

Looks like this issue applies to Havana, where the same IP address could be reused right after an allocate-deallocate operation.
Under load this could lead to duplicate entries.

In Icehouse the IP generation logic was changed so that the same IP address is not reused immediately after deallocation. The issue may therefore appear much less often, and it would be much harder to reproduce with Icehouse or upstream.
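
A toy model of the reuse scenario described above, for illustration only. It is not neutron's allocator: the lowest-free-first policy, the function name, and the addresses and MAC labels are all assumptions chosen to show how immediate reuse plus a missed release notification yields a duplicate host entry.

```python
def allocate_lowest_free(pool, in_use):
    """Return the first address in pool order that is not in use."""
    for ip in pool:
        if ip not in in_use:
            return ip
    raise RuntimeError("pool exhausted")


pool = ["10.0.0.%d" % i for i in range(2, 6)]
in_use = {"10.0.0.2", "10.0.0.3"}
# Entries the DHCP agent has written to the dnsmasq host file so far.
host_entries = [("mac-a", "10.0.0.2"), ("mac-b", "10.0.0.3")]

# The port for "mac-b" is deleted on the server, but the agent misses the
# notification, so the stale host entry stays in place.
in_use.discard("10.0.0.3")

# The next port immediately receives the address that was just freed.
new_ip = allocate_lowest_free(pool, in_use)
in_use.add(new_ip)
host_entries.append(("mac-c", new_ip))

ips = [ip for _, ip in host_entries]
duplicates = sorted({ip for ip in ips if ips.count(ip) > 1})
print(duplicates)  # ['10.0.0.3']
```

If the allocator instead handed out an address that had not been used recently, the stale entry would still exist but would not collide with the new one.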

tags: added: neutron
Dmitry Ilyin (idv1985)
summary: - dnsmasq (for neutron-dhcp-agent) is sometimes configured with duplicate
- leases
+ [mos] dnsmasq (for neutron-dhcp-agent) is sometimes configured with
+ duplicate leases
Changed in mos:
importance: Undecided → Medium
milestone: none → 6.0
Changed in fuel:
milestone: 5.1 → 6.0
Revision history for this message
Alexander Ignatov (aignatov) wrote :

Moved to Confirmed because it's not clear whether this was fixed upstream before Juno.

no longer affects: fuel
Changed in mos:
status: Incomplete → Confirmed
Revision history for this message
Ilya Shakhat (shakhat) wrote :

Unreproducible on 6.0; the issue is suspected to have been fixed in Icehouse.

Changed in mos:
status: Confirmed → Won't Fix
Revision history for this message
Dmitry Mescheryakov (dmitrymex) wrote :

Invalid is a more appropriate state for unreproducible issues.

Changed in mos:
status: Won't Fix → Invalid