Comment 37 for bug 1586528

Revision history for this message
Graham Bosworth (graham-bozikins) wrote :

Due to a lack of failures it has been a while since I last wrote, but I think I may have found a possible cause of the problem. Confirmation from other sources would be necessary, of course.

For the previous participants, and anyone else who lands here, I think that the failure may go like this:-

- a network problem prevents reception of a D.H.C.P. lease offer from the server
- NetworkManager starts a new D.H.C.P. client thread, but the old thread is still active and the lease has not expired
- these two steps may be repeated several times, starting several D.H.C.P. threads, with identical or different addresses, possibly receiving and transmitting traffic
- some time (possibly hours) later, another instance of the client has leased the address, the original client's lease expires
- avahi surrenders the address; the host has lost its network connection

In my case, some months earlier, it became apparent that the D.H.C.P. server was connected to a network switch port that was occasionally dropping packets - hence errors were reported for the server's and client's network adaptors. The packet retry count and network load increased. I changed the physical network arrangement and there were no more failures - until I switched off the server for maintenance.

Here is a test which might support this theory. Note that the server was not available between 05:38 and 07:17 on 2018-01-24. The client's network connection was not lost until 2018-01-30.

[LinuxMint] graham@yoyo $ ps -ef | grep -v "grep" | grep -e "PID" -e "dhc"
UID PID PPID C STIME TTY TIME CMD
root 2444 1 0 Jan24 ? 00:00:00 /sbin/dhclient -v eth0
root 3327 1 0 Jan24 ? 00:00:00 /sbin/dhclient -v eth0
root 4550 1 0 Jan24 ? 00:00:00 /sbin/dhclient -v eth0
root 5070 1 0 Jan24 ? 00:00:00 /sbin/dhclient -v eth0
root 6094 1 0 Jan24 ? 00:00:00 /sbin/dhclient -v eth0
root 6616 1 0 Jan24 ? 00:00:00 /sbin/dhclient -v eth0
root 7605 1 0 Jan24 ? 00:00:00 /sbin/dhclient -v eth0
root 8179 1 0 Jan24 ? 00:00:00 /sbin/dhclient -v eth0
root 9152 1 0 Jan24 ? 00:00:00 /sbin/dhclient -v eth0
root 9442 1 0 Jan24 ? 00:00:00 /sbin/dhclient -v eth0
root 21279 11291 0 03:26 ? 00:00:00 /sbin/dhclient -d -q -sf /usr/lib/NetworkManager/nm-dhcp-helper -pf /var/run/dhclient-eth0.pid -lf /var/lib/NetworkManager/dhclient-3339a4b3-ef0a-3db1-acf1-9b810d040dfe-eth0.lease -cf /var/lib/NetworkManager/dhclient-eth0.conf eth0
Tue Jan 30 08:38:33 ~

The start times of those dhclient instances were:- 05:50 05:55 06:05 06:11 06:21 06:27 06:37 06:43 06:53 06:59.

The log server had many instances of exchanges between kevin the server and yoyo the client like this:-
2018-01-30T07:25:11+00:00 kevin dhcpd[3612]: DHCPREQUEST for 192.168.42.187 from 00:25:64:72:72:14 via enp0s19
2018-01-30T07:25:11+00:00 kevin dhcpd[3612]: DHCPACK on 192.168.42.187 to 00:25:64:72:72:14 via enp0s19
2018-01-30T07:25:17+00:00 yoyo dhclient[2444]: DHCPREQUEST of 192.168.42.161 on eth0 to 192.168.42.129 port 67 (xid=0x182b3059)
2018-01-30T07:25:17+00:00 kevin dhcpd[3612]: reuse_lease: lease age 306 (secs) under 25% threshold, reply with unaltered, existing lease for 192.168.42.161
2018-01-30T07:25:17+00:00 kevin dhcpd[3612]: DHCPREQUEST for 192.168.42.161 from 00:25:64:72:72:14 via enp0s19
2018-01-30T07:25:17+00:00 kevin dhcpd[3612]: DHCPACK on 192.168.42.161 to 00:25:64:72:72:14 via enp0s19
2018-01-30T07:25:18+00:00 yoyo dhclient[8179]: DHCPREQUEST of 192.168.42.162 on eth0 to 255.255.255.255 port 67 (xid=0x537943a)
2018-01-30T07:25:18+00:00 kevin dhcpd[3612]: reuse_lease: lease age 4397 (secs) under 25% threshold, reply with unaltered, existing lease for 192.168.42.162

I have tried to be brief in this report, but I can provide more information if anyone wants it.