find_ip_via_arp() does not ensure that ARP cache is primed

Bug #1279460 reported by Gavin Panella
This bug affects 4 people
Affects  Status     Importance  Assigned to  Milestone
MAAS     Won't Fix  High        Unassigned

Bug Description

find_ip_via_arp() is the way we find BMCs on a network without having to
track IP addresses. It reads the kernel's ARP cache (via `arp -n`).
However, the kernel populates its ARP cache only as needed; if the kernel
has not recently needed to resolve the hardware address for an IP
address, the ARP cache may not contain a matching record.

If a match is not found, we can prime the cache with a broadcast ping:

  ping -nbc 3 $broadcast_address

e.g.

  ping -nbc 3 192.168.1.255

This would send 3 broadcast pings 1 second apart. Fwiw, I chose 3 out of
the air; 1 ping might be enough, or perhaps we should do 10; we can
experiment.

This appears to help populate the cache so that we can then try
looking up an IP address again.
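
A rough sketch of that "miss, prime, retry" flow, in Python. The names `prime_arp_cache` and `find_ip_with_priming` are made up for illustration, and `lookup` stands in for whatever reads the ARP cache (it is assumed to return None on a miss). The ping flags are the ones from the command above.

```python
import subprocess


def prime_arp_cache(broadcast_address, count=3):
    """Send `count` broadcast pings to `broadcast_address`."""
    # -n: numeric output, -b: allow pinging a broadcast address,
    # -c: number of packets (same flags as the command above).
    command = ["ping", "-nbc", str(count), broadcast_address]
    # The exit status is deliberately ignored; we only care about the
    # side effect on the kernel's ARP cache, if there is one.
    subprocess.call(
        command, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)


def find_ip_with_priming(lookup, mac, broadcast_address,
                         prime=prime_arp_cache):
    """Try `lookup(mac)`; on a miss, prime the cache once and retry."""
    ip = lookup(mac)
    if ip is None:
        prime(broadcast_address)
        ip = lookup(mac)
    return ip
```

Passing `prime` in as a parameter is only there so the retry logic can be exercised without touching the network.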

Additionally, we should probably use `ip neigh show` as a more modern
replacement for `arp -n`, but maybe it doesn't matter. The output seems
more machine-readable at least.
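
For what it's worth, a minimal sketch of parsing `ip neigh show` might look like the following. It assumes lines of the form `192.168.1.1 dev eth0 lladdr 00:11:22:33:44:55 REACHABLE`, and the function names are hypothetical, not existing MAAS code. If one MAC appears against several IP addresses, the last line wins.

```python
import subprocess


def parse_ip_neigh(output):
    """Map lower-cased MAC address -> IP address from `ip neigh show`."""
    mapping = {}
    for line in output.splitlines():
        fields = line.split()
        # Unresolved entries (e.g. FAILED) carry no "lladdr" field.
        if "lladdr" not in fields:
            continue
        index = fields.index("lladdr")
        if index + 1 < len(fields):
            # The first field on each line is the neighbour's IP address.
            mapping[fields[index + 1].lower()] = fields[0]
    return mapping


def find_ip_via_ip_neigh(mac):
    """Return the IP address cached for `mac`, or None if absent."""
    output = subprocess.check_output(
        ["ip", "neigh", "show"], universal_newlines=True)
    return parse_ip_neigh(output).get(mac.lower())
```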

There is also a user-space ARP daemon that might be worth investigating,
and there may be other options.

Julian Edwards (julian-edwards) wrote : Re: [Bug 1279460] [NEW] find_ip_via_arp() does not ensure that ARP cache is primed

On Wednesday 12 Feb 2014 18:23:23 you wrote:
> ping -nbc 3 192.168.1.255

This doesn't work on my network:

$ ping -nbc 3 192.168.1.255
WARNING: pinging broadcast address
PING 192.168.1.255 (192.168.1.255) 56(84) bytes of data.

--- 192.168.1.255 ping statistics ---
3 packets transmitted, 0 received, 100% packet loss, time 2000ms

I'm not sure if you can rely on all devices responding to broadcast pings.
(Also I'm not sure if it populates the arp cache behind the scenes.)

What we might want to do instead is use a bastardisation of my BMC detection
script which will poke around the network a bit looking for BMCs on the IPMI
port. It uses nmap, but we could emulate its behaviour with a blind scan of
an IP range.

Gavin Panella (allenap) wrote :

> This doesn't work on my network:
>
> $ ping -nbc 3 192.168.1.255
> WARNING: pinging broadcast address
> PING 192.168.1.255 (192.168.1.255) 56(84) bytes of data.
>
> --- 192.168.1.255 ping statistics ---
> 3 packets transmitted, 0 received, 100% packet loss, time 2000ms

I'd like to know why this doesn't work. Can you double-check that your
network is 192.168.1.0/24?

As has been mentioned elsewhere, we could do a gentle ping scan with
nmap every so often if this doesn't work.

Julian Edwards (julian-edwards) wrote : Re: [Bug 1279460] Re: find_ip_via_arp() does not ensure that ARP cache is primed

On Thursday 13 Feb 2014 11:22:03 you wrote:
> > This doesn't work on my network:
> >
> > $ ping -nbc 3 192.168.1.255
> > WARNING: pinging broadcast address
> > PING 192.168.1.255 (192.168.1.255) 56(84) bytes of data.
> >
> > --- 192.168.1.255 ping statistics ---
> > 3 packets transmitted, 0 received, 100% packet loss, time 2000ms
>
> I'd like to know why this doesn't work. Can you double-check that your
> network is 192.168.1.0/24?

It definitely is.

    inet addr:192.168.1.105 Bcast:192.168.1.255 Mask:255.255.255.0

Not everything listens to ICMP broadcasts, I expect.

> As has been mentioned elsewhere, we could do a gentle ping scan with
> nmap every so often if this doesn't work.

The other thing we can do is try to connect to a port on every IP in the range
we're interested in. It's probably 3 lines of code if we just do enough to force
a SYN and then close the socket. That way, we save a dependency on nmap
(which puts the willies up some network admins).
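
Something along these lines, as a sketch only. `poke` and `poke_network` are made-up names, and port 80 is an arbitrary default. The assumption is that the hosts are on a directly attached subnet, so the kernel has to resolve each address via ARP before it can send the SYN; whether anything answers on the port does not matter.

```python
import ipaddress
import socket


def poke(ip, port, timeout=0.2):
    """Attempt a TCP connection to `ip`, then close the socket."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        # connect_ex() returns an error code rather than raising when the
        # connection is refused; the result is not used here.
        sock.connect_ex((ip, port))
    except OSError:
        # Timeouts and unreachable hosts are expected and ignored.
        pass
    finally:
        sock.close()


def poke_network(cidr, port=80, probe=poke):
    """Call `probe` on every usable host address in `cidr`."""
    for host in ipaddress.ip_network(cidr).hosts():
        probe(str(host), port)
```

On a /24 with a 0.2 second timeout this can take close to a minute in the worst case, since the hosts are tried one at a time.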

Gavin Panella (allenap) wrote :

Fwiw, `nmap -sn -n 192.168.1.0/24 -oX -` should be a fairly gentle probe, outputting XML that we can do something with. However, it's better when run as root (or perhaps there's a way to grant the necessary privileges to a mortal), because it's then able to do an ARP scan; no need for ping/syn-ack/connect()/... scans. See the "reason" attributes in the respective XML dumps, or run nmap with --reason.

Julian Edwards (julian-edwards) wrote :

Check out python-nmap. It's in universe so if you want it in main, hurry up :)

Gavin Panella (allenap) wrote :

> Check out python-nmap. It's in universe so if you want it in main, hurry up :)

Neat. The code looks good too, though I've not checked for tests.

However, I think we have a fairly narrow use-case, plus we need to run nmap as root (or we need to figure out how to get CAP_NET_ADMIN, I think). I'm not sure the cost of getting this into main will pay off. We only need to run nmap then run an XPath expression or two on the output.
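
To illustrate how little is needed, here is a sketch that pulls MAC/IP pairs out of nmap's XML with the standard library. It assumes the usual layout of `<host>` elements containing `<address addr="..." addrtype="ipv4"/>` and `<address addr="..." addrtype="mac"/>`; the MAC entry is only expected when nmap could do an ARP scan. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def mac_to_ip_from_nmap_xml(xml_text):
    """Map lower-cased MAC address -> IPv4 address from nmap XML output."""
    root = ET.fromstring(xml_text)
    mapping = {}
    for host in root.findall("host"):
        ip = host.find("address[@addrtype='ipv4']")
        mac = host.find("address[@addrtype='mac']")
        # Hosts without a MAC entry (e.g. the scanning machine itself)
        # are skipped.
        if ip is not None and mac is not None:
            mapping[mac.get("addr").lower()] = ip.get("addr")
    return mapping
```

The XML text would come from running something like `nmap -sn -n 192.168.1.0/24 -oX -` and capturing its standard output.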

Julian Edwards (julian-edwards) wrote :

Another option is to look for the MAC in the leases that we scanned, and get the IP that way.

Raphaël Badin (rvb) wrote :

Instead of `nmap`, we could look in the DHCP lease file as a backstop.

Jeroen T. Vermeulen (jtv) wrote :

The good news is that provisioningserver.dhcp.leases already caches the leases. In lp:~jtv/maas/resolve-mac-from-leases I sketch out a change that uses the leases instead of the ARP cache for lookup.
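
As a standalone illustration of the idea (this is not the code in provisioningserver.dhcp.leases or in the branch above), a MAC-to-IP lookup over an ISC dhcpd style leases file could be sketched like this. It assumes `lease <ip> { ... hardware ethernet <mac>; ... }` blocks with no nested braces, ignores lease expiry, and lets later entries override earlier ones. The function name is hypothetical.

```python
import re

# One lease block: "lease <ip> { ... }" (assumes no nested braces).
LEASE_RE = re.compile(r"lease\s+(\S+)\s*\{(.*?)\}", re.DOTALL)
MAC_RE = re.compile(r"hardware\s+ethernet\s+([0-9A-Fa-f:]+)\s*;")


def mac_to_ip_from_leases(text):
    """Map lower-cased MAC address -> IP address from leases file text."""
    mapping = {}
    for ip, body in LEASE_RE.findall(text):
        match = MAC_RE.search(body)
        if match:
            # Later blocks in the file overwrite earlier ones.
            mapping[match.group(1).lower()] = ip
    return mapping
```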

Gavin Panella (allenap)
Changed in maas:
importance: High → Critical

Julian Edwards (julian-edwards) wrote :

The leases approach won't work after the static IP work lands, because IPs are not assigned until acquisition time.

In fact, the change that is about to land makes the AMT power driver useless as it stands, unless the AMT box itself is configured to use a static IP internally.

Changed in maas:
importance: Critical → High
Changed in maas:
status: Triaged → Won't Fix