NetworkManager does not update IPv4 address lifetime even though DHCP lease was successfully renewed
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
NetworkManager | Confirmed | Medium | | |
network-manager (Ubuntu) | Fix Released | Low | Unassigned | |
Xenial | Fix Released | Low | Unassigned | |
Bug Description
SRU REQUEST:
Debdiff (nm-dhcp-
Fixed in current Ubuntu zesty and newer: Bionic uses NM 1.8.x. This bug was fixed upstream in 1.4.
[Impact]
* nm-dhcp-helper sometimes fails to notify NetworkManager of a DHCP
lease renewal due to a DBus race condition.
* Upstream NetworkManager 1.4 fixes the race condition by changing
nm-dhcp-helper's DBus notification from signal "Event" to method
"Notify".
* The original bug submitter backported NM 1.4's nm-dhcp-helper
notification fix to NM 1.2. This SRU request applies that backported
patch to Xenial's NM 1.2.x.
[Test Case]
* Not reliably reproducible. Out of hundreds of machines, only a
dozen or so fail to notify NetworkManager of a DHCP lease
renewal, and they do so about 30-50% of the time (i.e., it is always
the same handful of machines that fail).
* All such machines with the patched packages have been fine for weeks,
over many dozens of lease renewals.
[Regression Potential]
* The patch changes both nm-dhcp-helper and NetworkManager itself. As soon as the new packages are unpacked, the new nm-dhcp-helper will be used on DHCP lease renewals, with the new Notify mechanism. Since the running, old NetworkManager is still expecting Event notifications, the patched nm-dhcp-helper has fallback capability to Event.
* Once NetworkManager is restarted and is running the patched version, it will have the new Notify support.
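The fallback behaviour described above can be pictured with a small sketch. This is not the actual patch (nm-dhcp-helper is written in C and talks D-Bus); `call_notify`, `emit_event`, and `MethodNotSupported` are hypothetical stand-ins used only to illustrate the "try Notify, fall back to Event" ordering.

```python
class MethodNotSupported(Exception):
    """Raised by the hypothetical transport when the peer lacks the method."""


def notify_lease_change(call_notify, emit_event, options):
    """Report a lease change, preferring the acknowledged Notify call.

    call_notify and emit_event are hypothetical callables standing in for
    the two D-Bus mechanisms; neither is a real NetworkManager API.
    Returns the name of the mechanism that was used.
    """
    try:
        # A method call gets a reply, so the helper knows it was delivered.
        call_notify(options)
        return "Notify"
    except MethodNotSupported:
        # An old, still-running daemon only understands the signal.
        emit_event(options)
        return "Event"


def old_daemon_notify(options):
    # Simulates an unpatched daemon that does not implement Notify.
    raise MethodNotSupported("Notify")


received = []
print(notify_lease_change(old_daemon_notify, received.append, {"reason": "RENEW"}))
print(notify_lease_change(received.append, received.append, {"reason": "RENEW"}))
```

The first call models the window between unpacking the packages and restarting NetworkManager; the second models the state after the restart.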
[Other Info]
* Upstream bug w/ patch: https:/
* RHEL bug with links to the 1.4 commits from which the patch was
derived: https:/
* NOTE: The final comment on the upstream GNOME bug claims that the fix is incomplete. However, it is possible that the running NetworkManager was not restarted (see Regression Potential notes above), which is why nm-dhcp-helper is falling back to Event. The remainder of the log messages in that final comment are from a custom wrapper the submitter was running around nm-dhcp-helper. I have deployed the exact same patch (without said wrapper) to real-world systems and tested extensively, and see nothing but successful DHCP lease renewal notifications using D-Bus Notify, not D-Bus Event.
----
I've found an issue on some of our Xenial office machines that causes NetworkManager to drop its IP address lease in some cases when it shouldn't. I'm not sure whether the actual bug is in NetworkManager or perhaps in dbus or dhclient, but I'll do my best to help figure out where it is.
What appears to happen:
* NetworkManager is informed of a new IPv4 lease.
* During the lease, dhclient keeps it fresh by renewing it using DHCPREQUESTs regularly.
* In spite of this, NetworkManager drops the IP address from the interface when the last reported lease time expires.
This happens on various machines, once every few days. We are using a failover DHCP configuration using two machines (192.168.0.3 'bonaire' and 192.168.0.4 'curacao').
The machine where I've done the debugging is called 'pampus' (192.168.0.166). As you can see in the logs, at 01:21:06 NetworkManager reports a new lease with lease time 7200.
jun 07 01:21:06 pampus dhclient[1532]: DHCPREQUEST of 192.168.0.166 on eth0 to 192.168.0.4 port 67 (xid=0x3295b440)
jun 07 01:21:06 pampus dhclient[1532]: DHCPACK of 192.168.0.166 from 192.168.0.4
jun 07 01:21:06 pampus NetworkManager[
jun 07 01:21:06 pampus NetworkManager[
jun 07 01:21:06 pampus NetworkManager[
jun 07 01:21:06 pampus NetworkManager[
jun 07 01:21:06 pampus NetworkManager[
jun 07 01:21:06 pampus NetworkManager[
jun 07 01:21:06 pampus NetworkManager[
jun 07 01:21:06 pampus NetworkManager[
jun 07 01:21:06 pampus NetworkManager[
After this, dhclient is supposed to keep the lease fresh, which it does. E.g. at 03:13:19 you can see a DHCPREQUEST and DHCPACK; I've seen this DHCPACK in a tcpdump and it contains a new lease time of 7200 seconds.
jun 07 03:13:19 pampus dhclient[1532]: DHCPREQUEST of 192.168.0.166 on eth0 to 192.168.0.4 port 67 (xid=0x3295b440)
jun 07 03:13:19 pampus dhclient[1532]: DHCPACK of 192.168.0.166 from 192.168.0.4
jun 07 03:13:19 pampus dhclient[1532]: bound to 192.168.0.166 -- renewal in 2708 seconds.
However, at 03:21:07 (exactly 2 hours and 1 second after the last lease reported by NetworkManager) Avahi and NTP report that the IP address is gone:
jun 07 03:21:07 pampus avahi-daemon[1167]: Withdrawing address record for 192.168.0.166 on eth0.
jun 07 03:21:07 pampus avahi-daemon[1167]: Leaving mDNS multicast group on interface eth0.IPv4 with address 192.168.0.166.
jun 07 03:21:07 pampus avahi-daemon[1167]: Interface eth0.IPv4 no longer relevant for mDNS.
jun 07 03:21:08 pampus ntpd[18832]: Deleting interface #3 eth0, 192.168.0.166#123, interface stats: received=2512, sent=2549, dropped=0, active_time=111819 secs
So I suspect NetworkManager dropped the IP address from the interface, because it wasn't informed by dhclient that the lease was renewed. The logs don't explicitly say this, so I may have to turn on more verbose debugging logs in NetworkManager or dhclient to verify this.
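As a quick check of the timing argument, the timestamps quoted above can be compared directly. This is only a sketch: the year 2017 is taken from the report date below, and the variable names are mine.

```python
from datetime import datetime, timedelta

# Timestamps copied from the journal excerpts in this report.
lease_reported = datetime(2017, 6, 7, 1, 21, 6)    # last lease NetworkManager logged
renewal_seen = datetime(2017, 6, 7, 3, 13, 19)     # DHCPACK logged by dhclient
address_withdrawn = datetime(2017, 6, 7, 3, 21, 7)  # avahi-daemon withdraws the address

lease_seconds = 7200
expected_expiry = lease_reported + timedelta(seconds=lease_seconds)

print(expected_expiry.time())                                 # 03:21:06
print((address_withdrawn - expected_expiry).total_seconds())  # 1.0
print(renewal_seen < expected_expiry)                         # True
```

The renewal at 03:13:19 happened before the old lease ran out, yet the address disappeared one second after the old expiry time, which is consistent with the renewal never reaching NetworkManager.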
ProblemType: Bug
DistroRelease: Ubuntu 16.04
Package: network-manager 1.2.6-0ubuntu0.
ProcVersionSign
Uname: Linux 4.4.0-66-generic x86_64
NonfreeKernelMo
ApportVersion: 2.20.1-0ubuntu2.6
Architecture: amd64
Date: Wed Jun 7 14:48:59 2017
IfupdownConfig:
# interfaces(5) file used by ifup(8) and ifdown(8)
auto lo
iface lo inet loopback
InstallationDate: Installed on 2016-11-04 (214 days ago)
InstallationMedia: Ubuntu 14.04.5 LTS "Trusty Tahr" - Release amd64 (20160803)
IpRoute:
default via 192.168.0.5 dev eth0 proto static metric 100
192.168.0.0/24 dev eth0 proto kernel scope link src 192.168.0.166
192.168.0.0/24 dev eth0 proto kernel scope link src 192.168.0.166 metric 100
IwConfig:
lo no wireless extensions.
eth1 no wireless extensions.
eth0 no wireless extensions.
NetworkManager.
[main]
NetworkingEnab
WirelessEnable
WWANEnabled=true
WimaxEnabled=true
RfKill:
SourcePackage: network-manager
UpgradeStatus: No upgrade log present (probably fresh install)
nmcli-con:
NAME UUID TYPE TIMESTAMP TIMESTAMP-REAL AUTOCONNECT AUTOCONNECT-
Wired connection 1 37da1802-
Wired connection 2 a040d7fe-
nmcli-dev:
DEVICE TYPE STATE DBUS-PATH CONNECTION CON-UUID CON-PATH
eth0 ethernet connected /org/freedeskto
eth1 ethernet unavailable /org/freedeskto
lo loopback unmanaged /org/freedeskto
nmcli-nm:
RUNNING VERSION STATE STARTUP CONNECTIVITY NETWORKING WIFI-HW WIFI WWAN-HW WWAN
running 1.2.6 connected started full enabled enabled enabled enabled enabled
summary: |
- NetworkManager seems to drop IPv4 DHCP lease even though it was successfully renewed
+ NetworkManager does not update IPv4 address lifetime even though DHCP lease was successfully renewed
Changed in network-manager: | |
importance: | Unknown → Medium |
status: | Unknown → Confirmed |
Changed in network-manager (Ubuntu): | |
status: | Confirmed → Fix Released |
Changed in network-manager (Ubuntu Xenial): | |
status: | New → Triaged |
importance: | Undecided → Low |
description: | updated |
description: | updated |
Changed in network-manager (Ubuntu Xenial): | |
assignee: | nobody → Luke Faraone (lfaraone) |
assignee: | Luke Faraone (lfaraone) → nobody |
By setting log_level to DEBUG, I could confirm from the logs that there is a miscommunication between dhclient and NetworkManager causing this issue.
It looks like it is not NetworkManager that removes the IPv4 address from the interface; the address is removed from the interface automatically by the kernel because its lifetime expired:
jun 13 19:26:21 cuba NetworkManager[28642]: <debug> [1497374781.0763] platform: address: adding or updating IPv4 address: 192.168.0.55/24 lft 7200sec pref 7200sec lifetime 99735-0[7200,7200] dev 3 src unknown
jun 13 19:26:21 cuba NetworkManager[28642]: <debug> [1497374781.0763] platform: signal: address 4 changed: 192.168.0.55/24 lft 7200sec pref 7200sec lifetime 99735-99735[7200,7200] dev 3 src kernel
jun 13 21:26:21 cuba NetworkManager[28642]: <debug> [1497381981.3191] platform: signal: address 4 removed: 192.168.0.55/24 lft 0sec pref 0sec lifetime 106935-99735[7200,7200] dev 3 src kernel
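To double-check that the removal lines up with the kernel lifetime rather than with an action by NetworkManager, the epoch timestamps and the `lft` value can be pulled out of the debug lines. The sketch below is my own helper, not part of NetworkManager, and the regular expression only targets the `platform:` messages quoted in this comment (shortened here to the relevant fields).

```python
import re

# Matches the epoch timestamp, the address, and the "lft" value of a
# NetworkManager "platform:" debug message as quoted in this report.
PATTERN = re.compile(
    r"\[(\d+)\.\d+\] platform: .*?: (\d+\.\d+\.\d+\.\d+/\d+) lft (\d+)sec"
)


def parse(line):
    """Return (epoch_seconds, address, lifetime_seconds) or None."""
    match = PATTERN.search(line)
    if match is None:
        return None
    ts, addr, lft = match.groups()
    return int(ts), addr, int(lft)


added = ("<debug> [1497374781.0763] platform: address: adding or updating "
         "IPv4 address: 192.168.0.55/24 lft 7200sec pref 7200sec")
removed = ("<debug> [1497381981.3191] platform: signal: address 4 removed: "
           "192.168.0.55/24 lft 0sec pref 0sec")

added_ts, addr, lft = parse(added)
removed_ts, _, _ = parse(removed)

# The address vanishes exactly one lifetime after it was last updated.
print(added_ts + lft == removed_ts)  # True
```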
The "address: adding or updating IPv4 address" message comes right after receiving a message from dhclient, because it sent a DHCPREQUEST and received a DHCPACK:
jun 13 19:26:21 cuba dhclient[13154]: DHCPREQUEST of 192.168.0.55 on eth1 to 192.168.0.3 port 67 (xid=0xfd7483b)
jun 13 19:26:21 cuba dhclient[13154]: DHCPACK of 192.168.0.55 from 192.168.0.3
jun 13 19:26:21 cuba NetworkManager[28642]: <debug> [1497374781.0748] bus-manager: (dhcp) accepted connection 0x7f9ae000fc60 on private socket
jun 13 19:26:21 cuba NetworkManager[28642]: <debug> [1497374781.0759] dhcp4 (eth1): DHCP reason 'RENEW' -> state 'bound'
Within the 2 hours lifetime of the IP address, another DHCPREQUEST & DHCPACK occurs, and some communication is attempted between dhclient and NetworkManager, but this doesn't result in the state change as seen above:
jun 13 20:20:11 cuba dhclient[13154]: DHCPREQUEST of 192.168.0.55 on eth1 to 192.168.0.3 port 67 (xid=0xfd7483b)
jun 13 20:20:11 cuba dhclient[13154]: DHCPACK of 192.168.0.55 from 192.168.0.3
jun 13 20:20:11 cuba NetworkManager[28642]: <debug> [1497378011.6527] bus-manager: (dhcp) accepted connection 0x7f9ae0019060 on private socket
jun 13 20:20:11 cuba NetworkManager[28642]: <debug> [1497378011.6527] bus-manager: (dhcp) closed connection 0x7f9ae0019060 on private socket
jun 13 20:20:11 cuba dhclient[13154]: bound to 192.168.0.55 -- renewal in 2731 seconds.
So, crucially, there is an attempted communication between dhclient and NetworkManager, but this doesn't result in an update to the lifetime of the IPv4 address. I'll focus my investigation on this. I would appreciate a reply from maintainers or upstream that they are aware of this issue.
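For anyone who wants to look for the same pattern on their own machines, here is a rough sketch of how the journal could be scanned for renewals that never reached NetworkManager. It assumes debug logging is enabled and that the dhclient DHCPACK and the NetworkManager "DHCP reason" message share the same syslog timestamp, as they do in the excerpts above; the function name is mine.

```python
def missed_renewals(lines):
    """Return timestamps of DHCPACKs with no matching dhcp4 state change.

    Assumes syslog-style lines whose first 15 characters are the timestamp
    ("jun 13 19:26:21") and that both messages are logged in the same second.
    """
    acked = []
    notified = set()
    for line in lines:
        stamp = line[:15]
        if "DHCPACK of" in line:
            acked.append(stamp)
        elif "dhcp4" in line and "DHCP reason" in line:
            notified.add(stamp)
    return [stamp for stamp in acked if stamp not in notified]


journal = [
    "jun 13 19:26:21 cuba dhclient[13154]: DHCPACK of 192.168.0.55 from 192.168.0.3",
    "jun 13 19:26:21 cuba NetworkManager[28642]: <debug> [1497374781.0759] "
    "dhcp4 (eth1): DHCP reason 'RENEW' -> state 'bound'",
    "jun 13 20:20:11 cuba dhclient[13154]: DHCPACK of 192.168.0.55 from 192.168.0.3",
    "jun 13 20:20:11 cuba NetworkManager[28642]: <debug> [1497378011.6527] "
    "bus-manager: (dhcp) closed connection 0x7f9ae0019060 on private socket",
]

print(missed_renewals(journal))  # ['jun 13 20:20:11']
```

Matching on the timestamp is crude, so a busy log or a delayed message could produce false positives; it is only meant as a first filter.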