Broadcast address detection is unreliable on Linux
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
EPICS Base | Fix Released | Low | Michael Ritzert |
Bug Description
I noticed that on one of my Linux systems, broadcasts are not sent to all configured broadcast addresses. The cause is easy to identify: EPICS Base still uses the SIOCGIFBRDADDR ioctl to look up broadcast addresses in osiSockDiscover. The interface in question has two IPv4 addresses:
```
2: eno1: <BROADCAST,
    link/ether 44:39:d4:9d:25:b6 brd ff:ff:ff:ff:ff:ff
    inet 10.49.126.61/24 brd 10.49.126.255 scope global dynamic noprefixroute eno1
       valid_lft 834648sec preferred_lft 834648sec
    inet 10.49.125.2/24 brd 10.49.125.255 scope global eno1
       valid_lft forever preferred_lft forever
```
In this case, only 10.49.126.255 is detected by EPICS (twice):
```
ifreqNext() pifreq 0x55ccbc0cc5f0, size 0x28, ifr 0x0x55ccbc0cc618
osiSockDiscover
osiSockDiscover
osiSockDiscover
ifreqNext() pifreq 0x55ccbc0cc618, size 0x28, ifr 0x0x55ccbc0cc640
osiSockDiscover
osiSockDiscover
found broadcast addr = a317eff
osiSockDiscover
ifreqNext() pifreq 0x55ccbc0cc640, size 0x28, ifr 0x0x55ccbc0cc668
osiSockDiscover
osiSockDiscover
found broadcast addr = a317eff
osiSockDiscover
```
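The log prints broadcast addresses as bare hexadecimal. Assuming the value is an IPv4 address in host byte order (the digits are consistent with that), a short decoder confirms which address was found. `host_order_to_dotted` is a hypothetical helper written for this illustration, not part of EPICS.

```c
#include <stdint.h>
#include <stdio.h>

/* Render a 32-bit IPv4 address held in host byte order (as the debug
 * output appears to print it) in dotted-quad form. */
static void host_order_to_dotted(uint32_t addr, char *out, size_t len)
{
    snprintf(out, len, "%u.%u.%u.%u",
             (unsigned)(addr >> 24) & 0xffu,
             (unsigned)(addr >> 16) & 0xffu,
             (unsigned)(addr >> 8) & 0xffu,
             (unsigned)addr & 0xffu);
}
```

Decoding 0x0a317eff gives 10.49.126.255, while 0x0a317dff, which would correspond to 10.49.125.255, never appears in the log.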
The workaround would be to give the second address its own alias label, e.g. eno1:1, but I don't think the software should dictate that.
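The failure mode, and why an alias label works around it, can be seen in a minimal sketch of the ioctl pattern. This is an illustration written for this report, not the actual osiSockDiscover code, and `legacy_list_broadcasts` is a hypothetical name. It relies on Linux behavior: SIOCGIFCONF returns one entry per IPv4 address, named after that address's label, and SIOCGIFBRDADDR (with the request's address field left zeroed, as here) finds the address by that name alone.

```c
#define _DEFAULT_SOURCE
#include <arpa/inet.h>
#include <net/if.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Old-style enumeration: SIOCGIFCONF, then SIOCGIFBRDADDR per entry.
 * Prints one line per entry; returns the entry count or -1 on error. */
static int legacy_list_broadcasts(void)
{
    struct ifreq reqs[32];
    struct ifconf ifc;
    int n, i;
    int sock = socket(AF_INET, SOCK_DGRAM, 0);

    if (sock < 0)
        return -1;

    memset(reqs, 0, sizeof(reqs));
    ifc.ifc_len = (int)sizeof(reqs);
    ifc.ifc_req = reqs;
    if (ioctl(sock, SIOCGIFCONF, &ifc) < 0) {
        close(sock);
        return -1;
    }

    /* One entry per IPv4 address, but an unlabelled secondary address
     * carries the same name as the primary one ("eno1"). */
    n = ifc.ifc_len / (int)sizeof(struct ifreq);
    for (i = 0; i < n; i++) {
        struct ifreq brd;
        struct sockaddr_in sin;
        char text[INET_ADDRSTRLEN];

        /* Only the name is copied, so the kernel returns the broadcast
         * of the first address with that label (normally the primary). */
        memset(&brd, 0, sizeof(brd));
        memcpy(brd.ifr_name, reqs[i].ifr_name, IFNAMSIZ);
        if (ioctl(sock, SIOCGIFBRDADDR, &brd) < 0)
            continue; /* skip entries the kernel rejects */
        memcpy(&sin, &brd.ifr_broadaddr, sizeof(sin));
        inet_ntop(AF_INET, &sin.sin_addr, text, sizeof(text));
        printf("%s: broadcast %s\n", reqs[i].ifr_name, text);
    }
    close(sock);
    return n;
}
```

On the system above, one would expect both `eno1` entries to report 10.49.126.255. With the second address labelled `eno1:1`, its name becomes unique and the lookup finds 10.49.125.255.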
The more modern way to enumerate all broadcast addresses is via AF_NETLINK, and the code is not overly complex. I already have a first patch ready (see the attachment). Obviously this code does not belong in the generic directory, and it should probably only be compiled when AF_NETLINK is defined (#ifdef AF_NETLINK) so that older Linux versions are still handled. I believe kernels ≥ 2.2 (from 1999) should work, but the oldest kernel I have actually tested the netlink code with is 3.2.102. So this is just an RFC for now. Please excuse the poor formatting: I wasn't sure whether a clang-format style file (or similar) is available, and I didn't bother formatting by hand for this first demo.
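For illustration, a minimal RTM_GETADDR dump that collects each address's IFA_BROADCAST attribute might look like the following. It is a sketch written against the Linux rtnetlink headers, not the attached patch; `netlink_list_broadcasts` is a hypothetical name, and the sketch handles IPv4 only and does no interface-state filtering.

```c
#define _DEFAULT_SOURCE
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/* Dump every IPv4 address with RTM_GETADDR and collect each address's
 * IFA_BROADCAST attribute into out[].
 * Returns the number of broadcast addresses stored, or -1 on error. */
static int netlink_list_broadcasts(struct in_addr *out, int max)
{
    struct {
        struct nlmsghdr nlh;
        struct ifaddrmsg ifa;
    } req;
    char buf[16384];
    int count = 0, done = 0;
    int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);

    if (fd < 0)
        return -1;

    memset(&req, 0, sizeof(req));
    req.nlh.nlmsg_len = NLMSG_LENGTH(sizeof(struct ifaddrmsg));
    req.nlh.nlmsg_type = RTM_GETADDR;
    req.nlh.nlmsg_flags = NLM_F_REQUEST | NLM_F_DUMP;
    req.nlh.nlmsg_seq = 1;
    req.ifa.ifa_family = AF_INET; /* IPv4 addresses only */

    if (send(fd, &req, req.nlh.nlmsg_len, 0) < 0) {
        close(fd);
        return -1;
    }

    /* A dump may span several datagrams; read until NLMSG_DONE. */
    while (!done) {
        int len = (int)recv(fd, buf, sizeof(buf), 0);
        struct nlmsghdr *nh;

        if (len <= 0) {
            close(fd);
            return -1;
        }
        for (nh = (struct nlmsghdr *)buf; NLMSG_OK(nh, len); nh = NLMSG_NEXT(nh, len)) {
            struct ifaddrmsg *ifa;
            struct rtattr *rta;
            int alen;

            if (nh->nlmsg_type == NLMSG_DONE) {
                done = 1;
                break;
            }
            if (nh->nlmsg_type == NLMSG_ERROR) {
                close(fd);
                return -1;
            }
            if (nh->nlmsg_type != RTM_NEWADDR)
                continue;

            /* One RTM_NEWADDR message per address: a secondary address on
             * eno1 arrives as its own message with its own broadcast. */
            ifa = (struct ifaddrmsg *)NLMSG_DATA(nh);
            alen = (int)IFA_PAYLOAD(nh);
            for (rta = IFA_RTA(ifa); RTA_OK(rta, alen); rta = RTA_NEXT(rta, alen)) {
                /* Addresses without a broadcast (e.g. loopback) carry no
                 * IFA_BROADCAST attribute and are simply skipped. */
                if (rta->rta_type == IFA_BROADCAST && count < max)
                    memcpy(&out[count++], RTA_DATA(rta), sizeof(struct in_addr));
            }
        }
    }
    close(fd);
    return count;
}
```

On the system described above, one would expect this to return both 10.49.126.255 and 10.49.125.255, because the kernel reports each address in its own message instead of resolving it by interface name.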
With the patch, the behavior changes in one subtle way: interfaces that are down are no longer ignored, because that information is provided by RTM_GETLINK rather than RTM_GETADDR. To preserve the old behavior, I would have to keep track of which interfaces are down across these two requests. For the other corner cases (loopback, point-to-point), the behavior should be identical, as explained in the code.
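Restoring the old "skip interfaces that are down" behavior needs a second request. A minimal sketch, assuming the address dump is filtered afterwards: an RTM_GETLINK dump yields a `struct ifinfomsg` with `ifi_index` and `ifi_flags`, and each RTM_NEWADDR message carries `ifa_index` in its `struct ifaddrmsg`, so addresses can be matched to links by index. The helper names `netlink_up_indices` and `index_is_up` are hypothetical and not taken from the attached patch.

```c
#define _DEFAULT_SOURCE
#include <net/if.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/* Dump all links with RTM_GETLINK and record the ifindex of every link
 * whose flags include IFF_UP. Returns the count stored, or -1 on error. */
static int netlink_up_indices(int *out, int max)
{
    struct {
        struct nlmsghdr nlh;
        struct ifinfomsg ifi;
    } req;
    char buf[16384];
    int count = 0, done = 0;
    int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);

    if (fd < 0)
        return -1;

    memset(&req, 0, sizeof(req));
    req.nlh.nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg));
    req.nlh.nlmsg_type = RTM_GETLINK;
    req.nlh.nlmsg_flags = NLM_F_REQUEST | NLM_F_DUMP;
    req.nlh.nlmsg_seq = 1;
    req.ifi.ifi_family = AF_UNSPEC;

    if (send(fd, &req, req.nlh.nlmsg_len, 0) < 0) {
        close(fd);
        return -1;
    }

    while (!done) {
        int len = (int)recv(fd, buf, sizeof(buf), 0);
        struct nlmsghdr *nh;

        if (len <= 0) {
            close(fd);
            return -1;
        }
        for (nh = (struct nlmsghdr *)buf; NLMSG_OK(nh, len); nh = NLMSG_NEXT(nh, len)) {
            struct ifinfomsg *ifi;

            if (nh->nlmsg_type == NLMSG_DONE) {
                done = 1;
                break;
            }
            if (nh->nlmsg_type == NLMSG_ERROR) {
                close(fd);
                return -1;
            }
            if (nh->nlmsg_type != RTM_NEWLINK)
                continue;
            ifi = (struct ifinfomsg *)NLMSG_DATA(nh);
            if ((ifi->ifi_flags & IFF_UP) && count < max)
                out[count++] = ifi->ifi_index;
        }
    }
    close(fd);
    return count;
}

/* Filter step: keep an RTM_NEWADDR result only if its ifa_index is in
 * the set of up links collected above. */
static int index_is_up(const int *up, int n, int ifindex)
{
    int i;

    for (i = 0; i < n; i++)
        if (up[i] == ifindex)
            return 1;
    return 0;
}
```

Because the two dumps are separate requests, a link can still change state between them; the result is only a snapshot.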
osiLocalAddrOnce also still uses the old interface, but since it only needs to find a single address, that should be OK.
Changed in epics-base:
status: In Progress → Fix Released
@mdavidsaver, please comment. Is there any overlap here with RTEMS-5?
I suspect we'd want anything like this to get fixed on the 3.15 branch as well, in which case we should target this bug to both the 3.15 and 7.0 series.