DHCPNAK after neutron-dhcp-agent restart

Bug #1345947 reported by Han Zhou
This bug affects 6 people
Affects   Status        Importance  Assigned to        Milestone
grenade   Invalid       Undecided   Han Zhou
neutron   Fix Released  High        Alexey I. Froloff
Icehouse  Fix Released  High        Kevin Bringard
Juno      Fix Released  High        Ihar Hrachyshka

Bug Description

After rolling out a configuration change, we restarted the neutron-dhcp-agent service, and the dnsmasq logs started flooding with: DHCPNAK ... lease not found.
dnsmasq replies with DHCPNAK to every DHCPREQUEST renewal from every VM, even though the MAC and IP pairs exist in the host files.
The log flooding increases as more and more VMs start renewing; they keep retrying until their leases expire, then send DHCPDISCOVER and reinitialize their IP.
The log flooding gradually disappears as the VMs' leases expire and they send DHCPDISCOVER, to which dnsmasq responds properly with DHCPOFFER.

Analysis:
I noticed that the option --leasefile-ro is used in the dnsmasq command when it is started by the neutron dhcp-agent. According to the dnsmasq manual, this option should be used together with --dhcp-script to customize the lease database. However, the --dhcp-script option was removed when fixing bug 1202392.
Because of this, dnsmasq does not save lease information in persistent storage, and when it is restarted, the lease information is lost.

Solution:
Simply replacing --leasefile-ro with --dhcp-leasefile=<path to dhcp runtime files>/lease would solve the problem. (patch attached)
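
For illustration only, the proposed swap could look roughly like the sketch below. This is not the attached patch and not neutron code: the helper name, the short argument list, and the /tmp/dhcp-example directory are all hypothetical placeholders. Only the dnsmasq option names come from the description above.

```python
import posixpath


def use_persistent_leasefile(cmd, runtime_dir):
    """Return a copy of cmd with --leasefile-ro swapped for --dhcp-leasefile.

    Hypothetical helper for illustration; not a function in neutron.
    """
    leasefile = posixpath.join(runtime_dir, "lease")
    return [
        "--dhcp-leasefile=" + leasefile if arg == "--leasefile-ro" else arg
        for arg in cmd
    ]


# Placeholder arguments and directory; a real agent passes many more options.
example_cmd = ["dnsmasq", "--no-hosts", "--no-resolv", "--leasefile-ro"]
patched_cmd = use_persistent_leasefile(example_cmd, "/tmp/dhcp-example")
print(" ".join(patched_cmd))
```

With a writable lease file, dnsmasq can reload its lease database after a restart instead of starting with no lease state.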

Revision history for this message
Han Zhou (zhouhan) wrote :
Changed in neutron:
assignee: nobody → Han Zhou (zhouhan)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/108272

Changed in neutron:
status: New → In Progress
Han Zhou (zhouhan)
tags: added: low-hanging-fruit
tags: removed: low-hanging-fruit
Changed in neutron:
importance: Undecided → High
Changed in neutron:
importance: High → Low
Revision history for this message
Brian Haley (brian-haley) wrote :

This seems to be higher than a low priority. We recently did an upgrade and started seeing this today. One of the side effects is that a VM could de-configure its IP address for a short time (right after the NAK), leading to unreachability. It always seems to get the same IP on the next DISCOVER, but it's still noticeable to customers.

Changed in neutron:
importance: Low → High
Revision history for this message
Armando Migliaccio (armando-migliaccio) wrote :

Grenade is affected because we need an upgrade patch to reflect the change in dhcp filters.

Changed in grenade:
status: New → Confirmed
assignee: nobody → Han Zhou (zhouhan)
Changed in grenade:
assignee: Han Zhou (zhouhan) → Armando Migliaccio (armando-migliaccio)
status: Confirmed → In Progress
Changed in grenade:
assignee: Armando Migliaccio (armando-migliaccio) → Han Zhou (zhouhan)
Changed in neutron:
assignee: Han Zhou (zhouhan) → Carl Baldwin (carl-baldwin)
Revision history for this message
Ihar Hrachyshka (ihar-hrachyshka) wrote :

I guess we should kill the release note at: https://wiki.openstack.org/wiki/ReleaseNotes/Kilo#Known_Issues_6 Correct?

Revision history for this message
Han Zhou (zhouhan) wrote :

We should not kill the release note but update it so that dhcp.filter is applied for the dead code cleanup: https://review.openstack.org/#/c/152398/

Changed in neutron:
assignee: Carl Baldwin (carl-baldwin) → Alexey I. Froloff (raorn)
Revision history for this message
Ihar Hrachyshka (ihar-hrachyshka) wrote :

@Han, it seems that rootwrap filter update is not required though? Meaning, Kilo agent will still work with old filter?

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.openstack.org/152080
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=74a16fde1c9972dc3c5d07215ca9d5e8f2e23d70
Submitter: Jenkins
Branch: master

commit 74a16fde1c9972dc3c5d07215ca9d5e8f2e23d70
Author: Alexey I. Froloff <email address hidden>
Date: Mon Feb 2 13:44:14 2015 +0300

    Pass '--dhcp-authoritative' option to dnsmasq

    When dnsmasq is restarted, it forgets about all leases (since it runs
    with leasefile-ro option). When client tries to renew its lease, dnsmasq
    sends DHCPNAK reply with message "lease not found". Then client shuts
    down the network and re-request lease from DHCP server (and gets exactly
    same IP address). There's a small network downtime which affects
    services, like zookeeper, running in VMs.

    Change-Id: Ieff0236670c1403b5d79ad8e50d7574c1b694e34
    Closes-Bug: #1345947
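
As a rough sketch of what this commit describes (an assumption-laden illustration, not the merged neutron change), adding the option amounts to appending one flag to the dnsmasq argument list. The helper name and the short argument list are hypothetical; --dhcp-authoritative and --leasefile-ro are the dnsmasq options named in the commit message.

```python
def with_dhcp_authoritative(cmd):
    """Return a copy of cmd that includes --dhcp-authoritative exactly once.

    Hypothetical helper for illustration; not a function in neutron.
    """
    flag = "--dhcp-authoritative"
    new_cmd = list(cmd)
    if flag not in new_cmd:
        new_cmd.append(flag)
    return new_cmd


# Placeholder arguments; a real agent passes many more options.
base_cmd = ["dnsmasq", "--no-hosts", "--no-resolv", "--leasefile-ro"]
authoritative_cmd = with_dhcp_authoritative(base_cmd)
print(" ".join(authoritative_cmd))
```

Unlike the lease-file approach proposed in the bug description, this keeps --leasefile-ro in place and changes how dnsmasq answers renewals for leases it no longer remembers.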

Changed in neutron:
status: In Progress → Fix Committed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/juno)

Fix proposed to branch: stable/juno
Review: https://review.openstack.org/153182

Thierry Carrez (ttx)
Changed in neutron:
milestone: none → kilo-2
status: Fix Committed → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/icehouse)

Fix proposed to branch: stable/icehouse
Review: https://review.openstack.org/154052

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/juno)

Reviewed: https://review.openstack.org/153182
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=ed799e38fe740621776aab51c95ec1db248af997
Submitter: Jenkins
Branch: stable/juno

commit ed799e38fe740621776aab51c95ec1db248af997
Author: Alexey I. Froloff <email address hidden>
Date: Mon Feb 2 13:44:14 2015 +0300

    Pass '--dhcp-authoritative' option to dnsmasq

    When dnsmasq is restarted, it forgets about all leases (since it runs
    with leasefile-ro option). When client tries to renew its lease, dnsmasq
    sends DHCPNAK reply with message "lease not found". Then client shuts
    down the network and re-request lease from DHCP server (and gets exactly
    same IP address). There's a small network downtime which affects
    services, like zookeeper, running in VMs.

    Change-Id: Ieff0236670c1403b5d79ad8e50d7574c1b694e34
    Closes-Bug: #1345947
    Co-Authored-By: Kevin Bringard <email address hidden>
    (cherry picked from commit 74a16fde1c9972dc3c5d07215ca9d5e8f2e23d70)

tags: added: in-stable-juno
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/icehouse)

Reviewed: https://review.openstack.org/154052
Committed: https://git.openstack.org/cgit/openstack/neutron/commit/?id=c3a88a578f3e484b5dc2a227a46e8546e581f8a3
Submitter: Jenkins
Branch: stable/icehouse

commit c3a88a578f3e484b5dc2a227a46e8546e581f8a3
Author: Alexey I. Froloff <email address hidden>
Date: Mon Feb 2 13:44:14 2015 +0300

    Pass '--dhcp-authoritative' option to dnsmasq

    When dnsmasq is restarted, it forgets about all leases (since it runs
    with leasefile-ro option). When client tries to renew its lease, dnsmasq
    sends DHCPNAK reply with message "lease not found". Then client shuts
    down the network and re-request lease from DHCP server (and gets exactly
    same IP address). There's a small network downtime which affects
    services, like zookeeper, running in VMs.

    Change-Id: Ieff0236670c1403b5d79ad8e50d7574c1b694e34
    Closes-Bug: #1345947
    (cherry picked from commit 74a16fde1c9972dc3c5d07215ca9d5e8f2e23d70)

tags: added: in-stable-icehouse
Alan Pevec (apevec)
tags: removed: in-stable-icehouse in-stable-juno
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (master)

Change abandoned by Carl Baldwin (<email address hidden>) on branch: master
Review: https://review.openstack.org/108272
Reason: Han, your work here has been greatly appreciated. I hope to see more contributions from you. Sometimes we do find a different way to go.

Thierry Carrez (ttx)
Changed in neutron:
milestone: kilo-2 → 2015.1.0
Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Change abandoned by Ihar Hrachyshka (<email address hidden>) on branch: master
Review: https://review.openstack.org/186003
Reason: Kevin's patch is merged, so we don't need this one anymore.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (stable/juno)

Change abandoned by Ihar Hrachyshka (<email address hidden>) on branch: stable/juno
Review: https://review.openstack.org/185996
Reason: Kevin's patch is merged, so the revert is no longer needed.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (master)

Change abandoned by Ihar Hrachyshka (<email address hidden>) on branch: master
Review: https://review.openstack.org/108272
Reason: https://review.openstack.org/#/c/185486/ is merged. It handles both the scenario that this patch tried to solve and a regression introduced by an alternative patch that ended up merged in the tree, so this patch is no longer needed.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (stable/icehouse)

Change abandoned by Ihar Hrachyshka (<email address hidden>) on branch: stable/icehouse
Review: https://review.openstack.org/185971
Reason: Not needed anymore, Kevin's patch is merged.

Revision history for this message
Alexander Bozhenko (alexbozhenko) wrote :

The fix introduces a new issue, which is fixed here:
https://bugs.launchpad.net/neutron/+bug/1457900

Roman Rufanov (rrufanov)
tags: added: customer-found support
Revision history for this message
Sean Dague (sdague) wrote :

This grenade bug was last updated over 180 days ago. As grenade
is a fast-moving project and we'd like to get the tracker down to
currently actionable bugs, this is getting marked as Invalid. If the
issue still exists, please feel free to reopen it.

Changed in grenade:
status: In Progress → Invalid