dhclient does not renew leases

Bug #1551351 reported by Stefan Bader on 2016-02-29
152
This bug affects 27 people
Affects Status Importance Assigned to Milestone
bind9 (Ubuntu)
High
Unassigned
Xenial
High
Unassigned
isc-dhcp (Ubuntu)
High
Unassigned
Xenial
High
Unassigned

Bug Description

Release: Xenial

I think this only recently started after some bind9 updates triggered a rebuild. When booting dhclient gets started and acquires an IP address, but it does seem to lock up somewhere as it does not renew the lease.
In my environment I set the lease time to 5 minutes, so I notice such things rather soon. I looked on the dhcp server side but saw any further dhcp messages come in from the client side.

Related bugs:
 * bug 1551415: systemctl stop networking hang / timeout

ProblemType: Bug
DistroRelease: Ubuntu 16.04
Package: isc-dhcp-client 4.3.3-5ubuntu8
ProcVersionSignature: Ubuntu 4.4.0-9.24-generic 4.4.3
Uname: Linux 4.4.0-9-generic x86_64
ApportVersion: 2.20-0ubuntu3
Architecture: amd64
Date: Mon Feb 29 12:32:52 2016
DhclientLeases:

SourcePackage: isc-dhcp
UpgradeStatus: No upgrade log present (probably fresh install)

Stefan Bader (smb) wrote :
Stefan Bader (smb) wrote :

As far as I can see, the dhclient process seems to be in some futex wait...

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in isc-dhcp (Ubuntu):
status: New → Confirmed
Doug Smythies (dsmythies) wrote :

This occurs also on my server, and as of two days ago.

My system is not renewing the lease of its IP address from my ISP. (while I have static IP adresses, my ISP still requires using dhcp). The expire time is 24 hours, and exactly 24 hours since my last re-boot, internet stopped working. Confirmed again after another 24 hours, but this time I was watching for it.

Rob Whyte (fudge) wrote :

I too have struggled with this on my server, leaving us without internet until sudo dhclient eth0 or the interace is taken down and brought back up.

Scott Moser (smoser) on 2016-03-01
description: updated
Changed in isc-dhcp (Ubuntu):
importance: Undecided → High
Jamie Strandboge (jdstrand) wrote :

FYI, I just filed 1551855 for dhcpd not stopping when sent the TERM signal. If Stephan's observation that dhclient is in a futex wait, I wonder if the dhcpd server not stopping is suffering from a similar wait.

Doug Smythies (dsmythies) wrote :

Well, that one got set as a duplicate of this one. Please see also bug #1551415.

I did manage to avoid having to re-boot, via the Rob Whyte method "sudo dhclient eth0", however my leases file is now out of date, but I did observe a new lease actually being granted via capturing the packets on tcpdump.

Stefan Bader (smb) wrote :

Doug, was that granted when running dhclient manually? Which would be the initial lease case. When I looked at this last one would get a new lease whenever dhclient got started (once). Same would happen if one killed the daemon (-11) and then did an ifdown && ifup. If there is a means of having a session without network (like vnc/spice screen of a VM or serial over lan).
The dhcp server did not log any requests after the initial one.

Jamie, maybe. A quick check could be cat /proc/<pid of daemon>/stack (as root).

Doug Smythies (dsmythies) wrote :

Stefan, I think the answer is yes. Just a few minutes before my lease was going to expire, I did: "sudo dhclient enp4s0"
However, I didn't notice any change in the leases file, so I expected internet it to stop working after the expire time, but it didn't stop working. So I looked at my tcpdump captures and realized that the lease was granted, it just didn't update the leases file.

I never killed anything first, and dhclient is still running... Oh, crap, I see I now have a bunch of dhclients running.

doug@DOUG-64:~$ ps aux | grep dhcl
root 1049 0.0 0.0 181364 2808 ? Ss Feb29 0:00 /sbin/dhclient -1 -v -pf /run/dhclient.enp4s0.pid -lf /var/lib/dhcp/dhclient.enp4s0.leases -I -df /var/lib/dhcp/dhclient6.enp4s0.leases enp4s0
root 7008 0.0 0.0 181364 2796 ? Ss 07:34 0:00 dhclient enp4s0
root 7368 0.0 0.0 181364 2744 ? Ss 08:01 0:00 dhclient enp4s0
root 7616 0.0 0.0 181364 2688 ? Ss 08:36 0:00 dhclient enp4s0
root 7810 0.0 0.0 181364 2816 ? Ss 08:36 0:00 dhclient enp4s0

I was just trying to avoid having a gap in internet access. I don't really want to do the ifdown && ifup method if I can avoid it.

Stefan Bader (smb) wrote :

@Dough, yes, so basically did not renew but obtained a new lease (which sensibly is the same IP as before). And while the dhcp server still wonders why a client ask for some IP _again_, it also silently marks the lease time as extended for another period. And since the old clients do not die (but play dead), you accumulate a new copy every time you run it (including the initial one).

@Seth, thanks for the link. Reading through it I think it is not only potentially but very much the problem we got. The upload of isc-dhcp that started the breakage was very likely the one "no change rebuild" that was done after upgrading to bind9(.19).

Sounds like one needs to manually twiddle the build back so isc-dhcp gets a bind9 lib that has threads and epoll disabled (as a work-around). Not sure how quickly that can be done. This may still have problems on terminating the process, so maybe needs adaptation of ifdown hooks as well. But that needs to be seen. First goal is to get dhclient/-server back working.

Stefan Bader (smb) wrote :

Looking at the changelog history of bind9, the change to 9.10 looks to be older (around December last year) but then only a recent upload to isc-dhcp mentions "Link with the non-export bind9 libraries". Which I think we can assume is what causes the problem. That went to the archive around noon on Feb-28th which would match my time-line of noticing it.

Stefan Bader (smb) on 2016-03-02
Changed in bind9 (Ubuntu Xenial):
status: New → Confirmed
importance: Undecided → High
Stefan Bader (smb) wrote :

For more data points on this, someone mentioned today that for him the leases get renewed (with dhclient started by network-manager). The command line is slightly different there:

/sbin/dhclient -d -q -sf <lease-script> -pf <pidfile> -lf <leasefile> -cf <conffile> <interface>

while on an ifupdown system where I see the stall it is:

/sbin/dhclient -1 -v -pf <pidfile> -lf <ipv4 leasefile> -I -df <ipv6 leasefile> <interface>

though it is the same command line used for ifupdown before. I got another Xenial/ifupdown based VM which was not recently updated (so leases get renewed still). I was able to attach to the daemon with "strace -p <pid> -ff" and could see that the main thread is in "select(7, [5 6], [], NULL, {276, 557923}". But there is other activity as well from time to time.
On the broken system the dhclient main thread is in "futex(0x7fb54906b0a4, FUTEX_WAIT_PRIVATE, 5, NULL" and there is no other activity. As a result the lease gets dropped which I can see on the server side as that is configured to do DNS updates when DHCP leases become active or inactive.

Stefan Bader (smb) wrote :

Hm, of course, if one starts dhclient in debug mode (foreground) by adding a "-d", everything does work. :/ Interestingly when I attach to it with strace, I see "strace: Process <pid> attached with 4 threads". The foreground process does not act on ctrl-c which at least is consistent with the broken behaviour.
When not working (in the background) strace will not find/report any threads running. The other thing to note (it was mentioned before) is that looking at /proc/<pid>/fd the process has a file handle for the lease file, but looking at the time stamps it does not get updated. Likely because that is something that one of the threads would do.
This also explains why dhclient under network-manager control works. Network-manager starts it in foreground/debug mode (with -d).

Stefan Bader (smb) wrote :

So maybe we should un-duplicate bug 1551415 as not acting on SIGHUP likely is something that could be different and also affect the network-manager case. I think I saw some comment there about shutdown/reboot times being increased. The problem of not keeping leases active is limited to systems using ifup/-down (or anything that starts dhclient in daemon mode).

Stefan Bader (smb) wrote :

I believe I got a vague feeling about what is going wrong. So the isc library which now is shared for bind9 and dhcp is compiled (because bind9 is rather wanted that way) with threads enabled. Before that the export version of the library was not. So an isc_app which dhclient creates and runs was more or less a loop handling events. Now this creates a series of threads.

The problem looks to be that dhclient (and probably dhcpd, too) set up the isc_app context and then do a fork() (in daemon mode). The man page of fork() clearly states that the child will be single threaded. So when dhclient calls dispatch() to run the isc_app this probably only runs a single control loop more or less waiting for the threads (which do not run, maybe not even exist anymore) to do the work.

The signal handling might be related to that as well. With threads only one of them should handle signals and somehow synchronize with the rest. Maybe the new library is not handling that correctly. Since this does not work regardless of running as daemon or not. But I have not understood all the details there.

Robie Basak (racb) on 2016-03-03
Changed in bind9 (Ubuntu Xenial):
milestone: none → ubuntu-16.04
Changed in isc-dhcp (Ubuntu Xenial):
milestone: none → ubuntu-16.04
Harry (harry33) wrote :

I have tested this bug this way.

1) With the newest setup with all upgrades installed (Ubuntu Gnome) there is 6-8 sec slow down when shutting down or rebooting. No delay at start up.

2) I then downgraded the package isc-dhcp-client (4.3.3-5ubuntu9) to isc-dhcp-client (4.3.3-5ubuunutu5). Also the packages libdns-export100 and libisc-export95 (both 9.9.5.dfsg-12.1ubuntu1) must be installed.

3) After one reboot (RAM) there is no more delays when shutting down or rebooting.

Note, that the bug appeared right after isc-dhcp-client (4.3.3-5ubuntu7) was installed on 19th February.

Doug Smythies (dsmythies) wrote :

From Stephan comment 14:

"Hm, of course, if one starts dhclient in debug mode (foreground) by adding a "-d", everything does work."

Yes, that does seem to work, but only for about another day, at least in my case. Thereafter, my ISP stopped returning my domain name, host name and some other stuff, and would only renew the lease for 1 hour instead of the normal 1 day. I suppose it doesn't matter because, I use supersede directives for that stuff anyhow. From my tcpdump captures, I have not been able to determine what, if anything, is different in the request, including the option parameter list request.

In my opinion, this bug report should have an importance level of critical.

Doug Smythies (dsmythies) wrote :

Correction 1: Stefan not Stephan, sorry.
Correction 2: Things broke down about another 1/2 day after the lease times changes to 1 hour. The client didn't seem to release this and resorted to broadcast type discover stuff. Then it was getting leases for 5 minutes, and the cycle continued until I re-booted.

Rob Whyte (fudge) wrote :

Since I am running dhclient -d eth0 I have not been disconnected. I do not intend to run dhclient this way, it is a part of the minimal system and should work in the background with no intervention. Who is actively working on this bug from Cannonical?

LaMont Jones (lamont) wrote :

The plan here is to restore the -export libraries for the time being, until upstream finishes the dhcp changes that are needed to have the common libs work. This is in-progress.

Stefan Bader (smb) wrote :

We have some testing packages that can be obtained by adding Lamont's PPA (sudo apt-add-ppa ppa:lamont/ppa). From local testing this looks to be solving at least the lease renewals. At least for me a normal kill still seems not enough to end the dhclient daemon.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package isc-dhcp - 4.3.3-5ubuntu10

---------------
isc-dhcp (4.3.3-5ubuntu10) xenial; urgency=low

  [ Stefan Bader ]
  * Revert back to bind against bind9 export libraries
    (LP: #1556175, LP: #1551351)

  [ LaMont Jones ]
  * libirs-export requires libisccfg-export now

 -- Stefan Bader <email address hidden> Thu, 17 Mar 2016 17:45:53 +0100

Changed in isc-dhcp (Ubuntu Xenial):
status: Confirmed → Fix Released
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package bind9 - 1:9.10.3.dfsg.P4-3

---------------
bind9 (1:9.10.3.dfsg.P4-3) experimental; urgency=medium

  * Fix vcs links
  * build in debian/tmp, use bind9.install

 -- LaMont Jones <email address hidden> Fri, 18 Mar 2016 14:46:30 -0600

Changed in bind9 (Ubuntu Xenial):
status: Confirmed → Fix Released
Changed in isc-dhcp (Ubuntu):
milestone: ubuntu-16.04 → none
Changed in bind9 (Ubuntu):
milestone: ubuntu-16.04 → none

There's no package "isc-dhcp" - shouldn't this be fixed in the package isc-dhcp-client? I am still affected by this bug.

Jim Salter (jrssnet) wrote :

I am also still affected by this bug. I build small Ubuntu-based routers, and the ones running Xenial are very likely to fail to get new leases after the connection drops from the ISP, or if they're booted while the connection to the ISP / the ISP's edge device is down.

I'm having to resort to monkey-patching and cron-jobs to reliably get IP addresses on systems without NetworkManager, and it really stinks. =\

Florin Gherendi (florin-gf) wrote :
Download full text (6.9 KiB)

It happens to me after recent updates in 16.04 LTS. On boot I get an IP from the DHCP server, all is well, but the IP is dropped after a while, without complaint from network manager applet which still shows like the network is connected. Unplugging and plugging the ethernet cable solves the problem for the whole day. Looking in the syslog it appears that NetworkManager is started at boot time and then systemd tries to start it again, causing an IP conflict:

Nov 8 09:45:14 floring-desktop NetworkManager[1611]: <info> [1478591114.2788] device (eth0): link disconnected (deferring action for 4 seconds)
Nov 8 09:45:14 floring-desktop kernel: [ 1906.680617] tg3 0000:03:00.0 eth0: Link is down
Nov 8 09:45:16 floring-desktop kernel: [ 1908.580879] tg3 0000:03:00.0 eth0: Link is up at 100 Mbps, full duplex
Nov 8 09:45:16 floring-desktop kernel: [ 1908.580885] tg3 0000:03:00.0 eth0: Flow control is on for TX and on for RX
Nov 8 09:45:16 floring-desktop kernel: [ 1908.580888] tg3 0000:03:00.0 eth0: EEE is disabled
Nov 8 09:45:16 floring-desktop NetworkManager[1611]: <info> [1478591116.1791] device (eth0): link connected
Nov 8 09:45:16 floring-desktop NetworkManager[1611]: <info> [1478591116.1792] device (eth0): DHCPv4 lease renewal requested
Nov 8 09:45:16 floring-desktop NetworkManager[1611]: <info> [1478591116.1954] dhcp4 (eth0): canceled DHCP transaction, DHCP client pid 1915
Nov 8 09:45:16 floring-desktop NetworkManager[1611]: <info> [1478591116.1954] dhcp4 (eth0): state changed bound -> done
Nov 8 09:45:16 floring-desktop NetworkManager[1611]: <info> [1478591116.1960] dhcp4 (eth0): activation: beginning transaction (timeout in 45 seconds)
Nov 8 09:45:16 floring-desktop NetworkManager[1611]: <info> [1478591116.2110] dhcp4 (eth0): dhclient started with pid 6862
Nov 8 09:45:16 floring-desktop dhclient[6862]: DHCPREQUEST of 10.4.0.227 on eth0 to 255.255.255.255 port 67 (xid=0x2ec87280)
Nov 8 09:45:16 floring-desktop dhclient[6862]: DHCPACK of 10.4.0.227 from 10.4.0.1
Nov 8 09:45:16 floring-desktop NetworkManager[1611]: <info> [1478591116.2735] address 10.4.0.227
Nov 8 09:45:16 floring-desktop NetworkManager[1611]: <info> [1478591116.2736] plen 24 (255.255.255.0)
Nov 8 09:45:16 floring-desktop NetworkManager[1611]: <info> [1478591116.2736] gateway 10.4.0.1
Nov 8 09:45:16 floring-desktop NetworkManager[1611]: <info> [1478591116.2736] server identifier 10.4.0.1
Nov 8 09:45:16 floring-desktop avahi-daemon[1280]: Joining mDNS multicast group on interface eth0.IPv4 with address 10.4.0.227.
Nov 8 09:45:16 floring-desktop NetworkManager[1611]: <info> [1478591116.2736] lease time 600
[...]
Nov 8 09:45:16 floring-desktop NetworkManager[1611]: <info> [1478591116.2737] dhcp4 (eth0): state changed unknown -> bound
Nov 8 09:45:16 floring-desktop avahi-daemon[1280]: New relevant interface eth0.IPv4 for mDNS.
Nov 8 09:45:16 floring-desktop avahi-daemon[1280]: Registering new address record for 10.4.0.227 on eth0.IPv4.
Nov 8 09:45:16 floring-desktop NetworkManager[1611]: <info> [1478591116.2765] policy: set 'Wired connection 1' (eth0) as default for IPv4 routing and DNS
Nov 8 09:45:16 floring-desktop dbus[1301]: [system...

Read more...

Pierre Clérissi (pci06) wrote :

Hi,
I've compiled and setup the isc dhclient v4.3.5 downloaded from the isc site and everything works fine.

I have compiled dhclient v4.3.5 from the isc site, installed and it does not work.
I have to create a new directory /var/db and dhclient 4.3.5 put dhclient6.leases in this directory
No trace in syslog, ifconfig eno1 seems OK but ping6 fails

Alon Albert (alonalbert) wrote :

I believe I am still affected by this bug.

$ dhclient --version
isc-dhclient-4.3.3

My lease is for 24 hours and every day, I get disconnected. If I disconnect and reconnect the Ethernet cable, connection is restored.

kern.log shows this pattern:

Jan 24 03:17:12 kubuntu NetworkManager[1132]: <info> [1485220632.3687] address 10.0.0.6
Jan 24 03:17:12 kubuntu NetworkManager[1132]: <info> [1485220632.3957] plen 24 (255.255.255.0)
Jan 24 03:17:12 kubuntu NetworkManager[1132]: <info> [1485220632.3957] gateway 10.0.0.200
Jan 24 03:17:12 kubuntu NetworkManager[1132]: <info> [1485220632.3957] server identifier 10.0.0.200
Jan 24 03:17:12 kubuntu NetworkManager[1132]: <info> [1485220632.3958] lease time 86400
Jan 24 03:17:12 kubuntu NetworkManager[1132]: <info> [1485220632.3958] hostname 'kubuntu'
Jan 24 03:17:12 kubuntu NetworkManager[1132]: <info> [1485220632.3958] nameserver '8.8.8.8'
Jan 24 03:17:12 kubuntu NetworkManager[1132]: <info> [1485220632.3958] nameserver '10.0.0.200'
Jan 24 03:17:12 kubuntu NetworkManager[1132]: <info> [1485220632.4108] dhcp4 (enp1s0): state changed bound -> bound
Jan 25 03:18:24 kubuntu kernel: [377372.984065] CIFS VFS: Server 10.0.0.2 has not responded in 120 seconds. Reconnecting...

You can see that on Jan 24 at 3:17 I got a 24 hour lease. On Jan 25 at 3:18, almost exactly 24 hours later, I get a network error. There is nothing in between.

This pattern repeats itself every day at around the same time.

Mike Carifio (carifio) wrote :

I did a fresh install of artful on a shuttle SZ170 with the following ethernet adapters according to `lshw`:

          *-network
                description: Ethernet interface
                product: I211 Gigabit Network Connection
                vendor: Intel Corporation
                physical id: 0
                bus info: pci@0000:03:00.0
                logical name: enp3s0
                version: 03
                serial: 80:ee:73:c8:ec:1a
                size: 1Gbit/s
                capacity: 1Gbit/s
                width: 32 bits
                clock: 33MHz
                capabilities: pm msi msix pciexpress bus_master cap_list ethernet physical tp 10bt 10bt-fd 100bt 100bt-fd 1000bt-fd autonegotiation
                configuration: autonegotiation=on broadcast=yes driver=igb driverversion=5.4.0-k duplex=full firmware=0. 6-1 ip=192.168.0.14 latency=0 link=yes multicast=yes port=twisted

and

        *-network
             description: Ethernet interface
             product: Ethernet Connection (2) I219-LM
             vendor: Intel Corporation
             physical id: 1f.6
             bus info: pci@0000:00:1f.6
             logical name: enp0s31f6
             version: 31
             serial: 80:ee:73:c8:ec:1b
             size: 1Gbit/s
             capacity: 1Gbit/s
             width: 32 bits
             clock: 33MHz
             capabilities: pm msi bus_master cap_list ethernet physical tp 10bt 10bt-fd 100bt 100bt-fd 1000bt-fd autonegotiation
             configuration: autonegotiation=on broadcast=yes driver=e1000e driverversion=3.2.6-k duplex=full firmware=0.8-4 latency=0 link=yes multicast=yes port=twisted pair speed=1Gbit/s
             resources: irq:126 memory:df200000-df21ffff

I have dhclient and bind9 installed:

root@atlantis:~# dhclient --version
isc-dhclient-4.3.5
root@atlantis:~# dpkg --list|grep -i dhc
ii dnsmasq-base 2.78-1 amd64 Small caching DNS proxy and DHCP/TFTP server
ii isc-dhcp-client 4.3.5-3ubuntu2 amd64 DHCP client for automatically obtaining an IP address
ii isc-dhcp-common 4.3.5-3ubuntu2 amd64 common manpages relevant to all of the isc-dhcp packages
root@atlantis:~# dpkg --list|grep -i bind9
ii bind9-host 1:9.10.3.dfsg.P4-12.6ubuntu1 amd64 Version of 'host' bundled with BIND 9.X
ii libbind9-140:amd64 1:9.10.3.dfsg.P4-12.6ubuntu1 amd64 BIND9 Shared Library used by BIND

I've disabled the second ethernet adapter. Periodically `enp3s0` looses its IP address which it got via dhcp from my cable router. I think its related to this bug. I understand the basics of dhcp (I can run some commands), so I don't know the next steps to continue the diagnosis.

Petr Suchý (xsuchy09) wrote :

I am affected with this bug too. First renew of IP is failing, after reconnect everything is ok until next reboot. Shutdown is often suspended for some time (always when shutdown is soon after start or reboot).
Ubuntu 16.04.3 LTS
isc-dhclient-4.3.3

Petr Suchý (xsuchy09) wrote :

Some details:
===
dpkg --list|grep -u dhc
ii isc-dhcp-client 4.3.3-5ubuntu12.7 amd64 DHCP client for automatically obtaining an IP address
ii isc-dhcp-common 4.3.3-5ubuntu12.7 amd64 common files used by all of the isc-dhcp packages
===
dpkg --list|grep -i bind9
ii bind9-host 1:9.10.3.dfsg.P4-8ubuntu1.10 amd64 Version of 'host' bundled with BIND 9.X
ii libbind9-140:amd64 1:9.10.3.dfsg.P4-8ubuntu1.10 amd64 BIND9 Shared Library used by BIND
rc libbind9-90 1:9.9.5.dfsg-3ubuntu0.8 amd64 BIND9 Shared Library used by BIND

Petr Suchý (xsuchy09) wrote :

This is happening only if "Ethernet Lan Option Rom" is checked (active) in BIOS (ThinkPad X230). So there is the reason. After disabled this option IP is not lost after first renew.

Andreas Mohr (andi) wrote :

Major categorization aspects:
- this very important core functionality (properly reliable resource management of DHCP leases, also in suspend-to-RAM etc.) **NEVER EVER WORKS** (exclusively suspend-to-RAM use case here, ending up with stale IP address):
  - this did not work in 12.04LTS
  - this still did not work after upgrade to 14.04LTS
  - this STILL did not work after upgrade to 16.04LTS (recently upgraded, i.e. after activity of #12 which says that it DID work there, yet perhaps not suspend-to-RAM)
- importance of this bug should be "a lot higher than High"
  Justification:
  - messes with/breaks (CORRUPTS!!) environment even *beyond* local system scope (--> inacceptable!!), due to continuing communication via a stale IP address which has long since been re-assigned to other hosts (as can be verified via DNS query)
  - has been observed in *all* recent LTS releases (which are expected to work fine in large enterpriseish environments??)
--> Ubuntu release process is sorely missing reliable verification (persistently executed test case) for this core functionality, to ensure that this is (and keeps!) not causing damage, at least in production (stable) distributions

Example to achieve fixing stale IP (renew DHCP lease):
nmcli c
nmcli c down uuid <UUID>
nmcli c up uuid <UUID>

To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers