dhcp-agent should send a gratuitous ARP after assigning IP address in dhcp namespace

Bug #1672433 reported by George Shuklin
This bug affects 2 people
Affects    Status    Importance    Assigned to    Milestone
neutron    New       Undecided     Unassigned

Bug Description

Normally DHCP agents should not provide routable services. There is one exception: monitoring. Checking DHCP agent availability by sending ping requests is very easy and fits well with existing monitoring frameworks. Besides checking the availability of the DHCP agent itself, that check also tests network connectivity between the DHCP agent and the network equipment.

There is a specific scenario in which that check gives false reports for a DHCP agent.

Scenario:
1. Boot an instance with a given IP and make sure the instance is UP (replies to pings).
2. Delete the instance.
3. Add a DHCP agent to the network where the IP from step 1 was allocated, in such a way that the agent takes that IP.

Expected behavior: the DHCP agent should answer pings.
Actual behavior: the DHCP agent does not reply to pings for up to 4 hours, then spontaneously starts to reply.

Reason: the instance from step 1 populated the ARP table on the router. When the instance was removed and the DHCP agent started listening on that IP, the agent did not send a gratuitous ARP. The normal DHCP workflow does not require the agent to send any traffic through the router, so the router has no reason to update its ARP entry. As long as the router keeps the old (invalid) entry pointing to the instance from step 1, the DHCP agent cannot reply to pings, because every incoming request arrives with the wrong destination MAC address.
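
To make the failure mode concrete, here is a toy model of the stale entry, written as a minimal sketch. The Router class, the ping_is_answered helper, and the MAC and IP values are all hypothetical and exist only for illustration; a real router also ages entries out, which is presumably why the problem eventually clears by itself.

```python
from typing import Dict, Optional


class Router:
    """Toy router: remembers the last MAC it learned for each IP."""

    def __init__(self) -> None:
        self.arp_cache: Dict[str, str] = {}

    def learn(self, ip: str, mac: str) -> None:
        # Called when the router sees ARP traffic (or any packet) from a host.
        self.arp_cache[ip] = mac

    def next_hop_mac(self, ip: str) -> Optional[str]:
        # Destination MAC the router would put on a frame sent to this IP.
        return self.arp_cache.get(ip)


def ping_is_answered(router: Router, ip: str, current_owner_mac: str) -> bool:
    # The current owner only receives the frame if the cached MAC is its own.
    return router.next_hop_mac(ip) == current_owner_mac


# Hypothetical addresses, for illustration only.
IP = "192.0.2.10"
INSTANCE_MAC = "fa:16:3e:00:00:01"
DHCP_PORT_MAC = "fa:16:3e:00:00:02"

router = Router()
router.learn(IP, INSTANCE_MAC)  # step 1: the instance talks through the router
stale = ping_is_answered(router, IP, DHCP_PORT_MAC)  # step 3: agent is silent
router.learn(IP, DHCP_PORT_MAC)  # what a gratuitous ARP would trigger
fresh = ping_is_answered(router, IP, DHCP_PORT_MAC)
```

In this model the agent becomes reachable only once something makes the router relearn the mapping, which is what both proposals in this report aim at.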

Proposal: dhcp agent should either:

1. Send some kind of network packet to the network gateway (e.g. a ping request).
2. Set arp_notify for the network interface it uses (e.g. net.ipv4.conf.tap22dad33f-d7.arp_notify=1), and configure the network address _BEFORE_ bringing the interface up. If the address is configured after the interface has been brought up, no gratuitous ARP is sent.
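
The two proposals can be sketched as ordered command lists. This is only an illustration, not neutron code: the helper names, the namespace name, and the addresses are assumptions, and the functions only build argv lists (they do not execute anything). The commands themselves (ip netns exec, sysctl -w, ip addr add, ip link set, ping -c) are the standard iproute2, procps and iputils tools.

```python
from typing import List


def netns_cmd(namespace: str, *args: str) -> List[str]:
    """Wrap a command so that it runs inside the given network namespace."""
    return ["ip", "netns", "exec", namespace, *args]


def arp_notify_plan(namespace: str, device: str, cidr: str) -> List[List[str]]:
    """Proposal 2: enable arp_notify, add the address, then bring the link up.

    The order matters: the address must already be configured when the
    interface comes up, otherwise no gratuitous ARP is sent.
    """
    return [
        netns_cmd(namespace, "sysctl", "-w",
                  f"net.ipv4.conf.{device}.arp_notify=1"),
        netns_cmd(namespace, "ip", "addr", "add", cidr, "dev", device),
        netns_cmd(namespace, "ip", "link", "set", device, "up"),
    ]


def gateway_ping_plan(namespace: str, gateway_ip: str) -> List[List[str]]:
    """Proposal 1: send one packet to the gateway so it relearns the MAC."""
    return [netns_cmd(namespace, "ping", "-c", "1", gateway_ip)]


# Hypothetical namespace and addresses; the device name is the one quoted above.
plan = arp_notify_plan("qdhcp-example", "tap22dad33f-d7", "192.0.2.10/24")
for argv in plan:
    print(" ".join(argv))
```

Each list could be handed to subprocess.run(argv, check=True) by a caller with sufficient privileges; keeping the plan as data makes the required ordering easy to check in a unit test.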

tags: added: l3-ipam-dhcp
description: updated
George Shuklin (george-shuklin) wrote :

Just in case: I think the Linux behavior here is not correct, so I filed a bug in the kernel Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=194879

Changed in neutron:
status: New → Confirmed
assignee: nobody → Ihar Hrachyshka (ihar-hrachyshka)
importance: Undecided → Medium
Changed in neutron:
assignee: Ihar Hrachyshka (ihar-hrachyshka) → nobody
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.openstack.org/529807

Changed in neutron:
assignee: nobody → yong sheng gong (gongysh)
status: Confirmed → In Progress
Slawek Kaplonski (slaweq) wrote : auto-abandon-script

This bug has had a related patch abandoned and has been automatically un-assigned due to inactivity. Please re-assign yourself if you are continuing work or adjust the state as appropriate if it is no longer valid.

Changed in neutron:
assignee: yong sheng gong (gongysh) → nobody
status: In Progress → New
tags: added: timeout-abandon
OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (master)

Change abandoned by Slawek Kaplonski on branch: master
Review: https://review.openstack.org/529807
Reason: This review is > 4 weeks without comment, and failed Jenkins the last time it was checked. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and leaving a 'recheck' comment to get fresh test results.

Rodolfo Alonso (rodolfo-alonso-hernandez) wrote :

Bug closed due to lack of activity, please feel free to reopen if needed.

Changed in neutron:
status: New → Won't Fix
George Shuklin (george-shuklin) wrote :

It's really sad that the bug has had no activity, but as far as I understand the problem is still fully present. Linux upstream rejected the idea of sending an ARP request after a link state change, so the only place to solve it is the neutron code.

Rodolfo Alonso (rodolfo-alonso-hernandez) wrote :

Hi George:

This is a very specific scenario, but let me check if I'm not wrong. What you are describing is a situation where a network initially has no DHCP service and a VM port is created, assigning an IP address. The VM is deleted (and the port) and then the network (the subnet in this case) enables the DHCP and the agent port has the same IP address. Is that correct?

Regards.

Changed in neutron:
status: Won't Fix → New
importance: Medium → Undecided
George Shuklin (george-shuklin) wrote :

Thank you for your attention.

Yes, that is generally right, but you can boot with a normal DHCP agent available (in fact, the instance needs a DHCP agent at boot time to get its IP address). The bug starts at the moment when the instance is deleted and another DHCP agent is created with the same address the instance had.

If this happens, ip link set up and ip address add are run in an order for which Linux does not send a gratuitous ARP, and the top-of-rack switch does not update its tables (both ARP and FDB). As a result, the DHCP agent is not pingable from the outside network for an arbitrarily long time.
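
For reference, the frame that is missing in this scenario is small. Below is a minimal sketch of a gratuitous ARP request as it appears on the wire (Ethernet broadcast, with sender IP equal to target IP). The function names and the example MAC and IP are hypothetical; the code only builds the bytes and does not send them.

```python
import socket
import struct


def mac_to_bytes(mac: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' into 6 raw bytes."""
    return bytes.fromhex(mac.replace(":", ""))


def build_gratuitous_arp(mac: str, ip: str) -> bytes:
    """Build an Ethernet frame carrying a gratuitous ARP request.

    Sender and target protocol addresses are both the announced IP, so every
    device on the segment that caches this IP can refresh its entry.
    """
    src = mac_to_bytes(mac)
    addr = socket.inet_aton(ip)
    # Ethernet header: broadcast destination, our MAC, EtherType 0x0806 (ARP).
    eth = b"\xff" * 6 + src + struct.pack("!H", 0x0806)
    # ARP header: Ethernet (1), IPv4 (0x0800), hlen 6, plen 4, opcode 1 (request).
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, 1)
    # Sender MAC/IP, then target MAC (zeros) and target IP (same as sender IP).
    arp += src + addr + b"\x00" * 6 + addr
    return eth + arp


# Hypothetical MAC and IP of the DHCP port, for illustration only.
frame = build_gratuitous_arp("fa:16:3e:00:00:02", "192.0.2.10")
print(len(frame), frame.hex())
```

Actually transmitting such a frame needs a raw socket and root privileges inside the DHCP namespace; a tool such as arping in unsolicited mode does the equivalent from the command line.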
