dhclient killed when DHCPv6 lease is out-of date

Bug #1533631 reported by Alexey Zagarin on 2016-01-13
42
This bug affects 8 people
Affects Status Importance Assigned to Milestone
NetworkManager
Fix Released
Medium
isc-dhcp (Ubuntu)
High
TJ
network-manager (Ubuntu)
High
TJ

Bug Description

Network Manager summarily kills the IPv6 dhclient process when the DHCPv6 lease contained in the dhclient6-${UUID}.lease file is out-of date, leaving the IPv6 interface without a stateful IPv6 address, or, if the "require IPv6 address on this interface" option is enabled, causes NM to cycle continuously deactivating and reactivating the interface (including the IPv4 addresses).

This is effectively a Denial Of Service. It can be trivially induced if, for example, the dhclient6-$(UUID}.lease file contains a lease that was issued before the user went away on vacation or the PC wasn't connected to the same network for a few days (depending on the lease renew/rebind/expiry times). Calculation on the old lease of

start + preferred_lifetime < NOW

triggers dhclient to 'DEPREFER6' the lease (withdraw the address record) and ask the DHCPv6 server for a new lease, but Network Manager will kill the dhclient because it only sees an 'EXPIRE6' state change.

In summary, when the DHCPv6 state transitions from "bound" to "unknown" then "expire" to "done" Network Manager kills the 'dhclient' process before it has chance to request and bind a fresh lease, If 'dhclient' is run manually with the same command-line options and allowed to continue running it correctly gains a new lease.

Network Manager doesn't know how to handle "DEPREF6", which is sent from isc-dhcp dhclient to the helper script (set by "-sf" option).

So it seems that to correctly solve this issue Network Manager must be taught how to handle DEPREF6.

/var/log/syslog will show a message from dhclient of the form:

dhclient: PRC: Address 2a02:8011:2007::2 depreferred.

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in network-manager (Ubuntu):
status: New → Confirmed
TJ (tj) wrote :

Yes, seeing this too on 15.10 since around the first week of March, as best I can tell, but irrespective of Suspend.

It looks like dhclient thinks/is told the new lease is 'depreferred' and therefore quits, and the dhclient process is terminated, but the address remains on the interface until the valid-lifetime expires.

Mar 22 16:06:02 hephaestion NetworkManager[1468]: <info> Activation (wlp3s0) Beginning DHCPv6 transaction (timeout in 45 seconds)
Mar 22 16:06:02 hephaestion NetworkManager[1468]: <info> dhclient started with pid 19221
Mar 22 16:06:03 hephaestion dhclient: XMT: Confirm on wlp3s0, interval 950ms.
Mar 22 16:06:03 hephaestion dhclient: RCV: Reply message on wlp3s0 from fe80::ee43:f6ff:fe46:c080.
Mar 22 16:06:03 hephaestion dhclient: message status code Success.
Mar 22 16:06:03 hephaestion NetworkManager[1468]: <info> valid_lft 86400
Mar 22 16:06:03 hephaestion NetworkManager[1468]: <info> preferred_lft 86400
Mar 22 16:06:03 hephaestion dhclient: PRC: Rebinding lease on wlp3s0.
Mar 22 16:06:03 hephaestion NetworkManager[1468]: <info> address 2a02:8011:2007::2
Mar 22 16:06:03 hephaestion NetworkManager[1468]: <info> nameserver '2a02:8010:1:0:212:23:3:100'
Mar 22 16:06:03 hephaestion NetworkManager[1468]: <info> (wlp3s0): DHCPv6 state changed unknown -> bound
Mar 22 16:06:03 hephaestion NetworkManager[1468]: <info> Policy set 'wireless.lan.iam.tj' (wlp3s0) as default for IPv6 routing and DNS.
Mar 22 16:06:03 hephaestion NetworkManager[1468]: <info> Writing DNS information to /sbin/resolvconf
Mar 22 16:06:03 hephaestion dnsmasq[2433]: setting upstream servers from DBus
Mar 22 16:06:03 hephaestion dnsmasq[2433]: using nameserver 2a02:8010:1:0:212:23:3:100#53
Mar 22 16:06:03 hephaestion dnsmasq[2433]: using nameserver 10.254.1.254#53
Mar 22 16:06:03 hephaestion nm-dispatcher: Dispatching action 'dhcp6-change' for wlp3s0
Mar 22 16:06:03 hephaestion dhclient: PRC: Address 2a02:8011:2007::2 depreferred.
Mar 22 16:06:03 hephaestion NetworkManager[1468]: <info> (wlp3s0): DHCPv6 state changed bound -> unknown
Mar 22 16:06:03 hephaestion NetworkManager[1468]: <info> (wlp3s0): DHCPv6 state changed unknown -> expire
Mar 22 16:06:03 hephaestion NetworkManager[1468]: <info> (wlp3s0): canceled DHCP transaction, DHCP client pid 19221
Mar 22 16:06:03 hephaestion NetworkManager[1468]: <info> (wlp3s0): DHCPv6 state changed expire -> done

$ ip -6 addr show dev wlp3s0
3: wlp3s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 state UP qlen 1000
    inet6 2a02:8011:2007::2/128 scope global dynamic
       valid_lft 82942sec preferred_lft 82942sec
    inet6 fe80::21f:3bff:febd:7485/64 scope link
       valid_lft forever preferred_lft forever

TJ (tj) wrote :

Looking at the files related to dhclient managed by Network Manager under /var/lib/NetworkManager/ I noticed that although the new lease file seems to contain the correct (in the future) timestamps the "timestamps" file entry for that same lease UUID is still set to the time the original lease expired.

$ ll dhclient6-0775a2a6-a4a1-40bf-bd85-9c3068582dbc-wlp3s0.lease
 -rw-r--r-- 1 root root 1056 Mar 22 16:06 dhclient6-0775a2a6-a4a1-40bf-bd85-9c3068582dbc-wlp3s0.lease

$ cat dhclient6-0775a2a6-a4a1-40bf-bd85-9c3068582dbc-wlp3s0.lease
default-duid "\000\004\301\253\223\321\035\340@\022e\036\025M97\347\361";
lease6 {
  interface "wlp3s0";
  ia-na 3b:bd:74:85 {
    starts 1456324435;
    renew 43200;
    rebind 69120;
    iaaddr 2a02:8011:2007::2 {
      starts 1456324435;
      preferred-life 86400;
      max-life 86400;
    }
  }
  option dhcp6.client-id 0:4:c1:ab:93:d1:1d:e0:40:12:65:1e:15:4d:39:37:e7:f1;
  option dhcp6.server-id 0:3:0:1:ec:43:f6:46:c0:80;
  option dhcp6.name-servers 2a02:8010:1:0:212:23:3:100;
  option dhcp6.unknown-16 0:0:d:e9:0:c:64:73:6c:66:6f:72:75:6d:2e:6f:72:67;
}
$ date +%F-%T --date='@1456324435'
2016-02-24-14:33:55

$ date +%F-%T
2016-03-22-17:28:20

So the lease "starts" time is a month ago even though the lease file update time is the time the address was depreferred.

The "timestamps" file entry for that UUID is:

$ grep 0775a2a6-a4a1-40bf-bd85-9c3068582dbc timestamps
0775a2a6-a4a1-40bf-bd85-9c3068582dbc=1458662760

$ date +%F-%T --date='@1458662760'
2016-03-22-16:06:00

This is the time of the deprefer; presumably set when the lease was originally issued.

TJ (tj) wrote :

After deleting the dhcp6${UUID}-${IF}.lease file and reconnecting the interface the "starts" timestamp is now up to date. Not sure if this is relevant to the internals of dhclient though.

$ cat dhclient6-0775a2a6-a4a1-40bf-bd85-9c3068582dbc-wlp3s0.lease
default-duid "\000\001\000\001\036z\001X\000\037;\275t\205";
lease6 {
  interface "wlp3s0";
  ia-na 3b:bd:74:85 {
    starts 1458668291;
    renew 43200;
    rebind 69120;
    iaaddr 2a02:8011:2007::3 {
      starts 1458668291;
      preferred-life 86400;
      max-life 86400;
    }
  }
  option dhcp6.client-id 0:1:0:1:1e:7a:1:58:0:1f:3b:bd:74:85;
  option dhcp6.server-id 0:3:0:1:ec:43:f6:46:c0:80;
  option dhcp6.name-servers 2a02:8010:1:0:212:23:3:100;
  option dhcp6.unknown-16 0:0:d:e9:0:c:64:73:6c:66:6f:72:75:6d:2e:6f:72:67;
}
$ date +%F-%T --date='@1458668291'
2016-03-22-17:38:11

$ grep 0775a2a6-a4a1-40bf-bd85-9c3068582dbc timestamps
0775a2a6-a4a1-40bf-bd85-9c3068582dbc=1458668288

$ date +%F-%T --date='@1458668288'
2016-03-22-17:38:08

TJ (tj) on 2016-03-30
Changed in network-manager (Ubuntu):
importance: Undecided → High
Changed in isc-dhcp (Ubuntu):
importance: Undecided → High
status: New → Triaged
TJ (tj) wrote :
Download full text (6.1 KiB)

Further debugging indicates that when the IPv6 lease file contains an expired lease as its last entry dhclient 'depreferences' it (gives it up) but doesn't go on to attempt to gain a replacement lease because NM cancels (kills) the dhclient process as a result of the 'expires' message.

NM starts "dhclient ... -6 ... -cf dhclient6-${IF}.conf ${IF}" but kills it and deletes the config file immediately although it leaves the lease file in place.

I managed to capture the .conf file by waiting for NM to close it and then making a copy before it was deleted:

sudo touch /var/lib/NetworkManager/dhclient6-${IF}.conf
inotifywait -e CLOSE_NOWRITE,CLOSE dhclient6-${IF}.conf; cp dhclient6-${IF}.conf /tmp/

I then executed dhclient in debug mode (after moving the .conf file into /etc/dhcp/ to avoid apparmor denying access to the file in /tmp/) and found that after the 'depreference' there is a delay of several minutes before dhclient requests a new lease:

$ sudo cp /tmp/dhclient6-${IF}.conf /etc/dhcp/

$ sudo /sbin/dhclient -d -6 -N -lf /var/lib/NetworkManager/dhclient6-0775a2a6-a4a1-40bf-bd85-9c3068582dbc-wlp3s0.lease -cf /etc/dhcp/dhclient6-wlp3s0.conf wlp3s0 |& tee /tmp/dhclient6.test.log

Internet Systems Consortium DHCP Client 4.3.1
Copyright 2004-2014 Internet Systems Consortium.
All rights reserved.
For info, please visit https://www.isc.org/software/dhcp/

Listening on Socket/wlp3s0
Sending on Socket/wlp3s0
PRC: Confirming active lease (INIT-REBOOT).
XMT: Forming Confirm, 0 ms elapsed.
XMT: X-- IA_NA 3b:bd:74:85
XMT: | X-- Confirm Address 2a02:8011:2007::2
XMT: V IA_NA appended.
XMT: Confirm on wlp3s0, interval 1020ms.
RCV: Reply message on wlp3s0 from fe80::ee43:f6ff:fe46:c080.
RCV: X-- Server ID: 00:03:00:01:ec:43:f6:46:c0:80
message status code Success.
PRC: Bound to lease 00:03:00:01:ec:43:f6:46:c0:80.
PRC: Rebind event scheduled in -294538 seconds, to run for 17280 seconds.
PRC: Depreference scheduled in -277258 seconds.
PRC: Expiration scheduled in -277258 seconds.
PRC: Rebinding lease on wlp3s0.
PRC: Depreference scheduled in -277258 seconds.
PRC: Expiration scheduled in -277258 seconds.
PRC: Address 2a02:8011:2007::2 depreferred.
PRC: Expiration scheduled in -277258 seconds.
PRC: Address 2a02:8011:2007::2 expired.
PRC: Bound lease is devoid of active addresses. Re-initializing.
PRC: Soliciting for leases (INIT)

... very very long delay (several minutes) ...

XMT: Forming Solicit, 0 ms elapsed.
XMT: X-- IA_NA 3b:bd:74:85
XMT: | X-- Request renew in +3600
XMT: | X-- Request rebind in +5400
XMT: Solicit on wlp3s0, interval 1080ms.
RCV: Advertise message on wlp3s0 from fe80::ee43:f6ff:fe46:c080.
RCV: X-- IA_NA 3b:bd:74:85
RCV: | X-- starts 1459366132
RCV: | X-- t1 - renew +43200
RCV: | X-- t2 - rebind +69120
RCV: | X-- [Options]
RCV: | | X-- IAADDR 2a02:8011:2007::2
RCV: | | | X-- Preferred lifetime 86400.
RCV: | | | X-- Max lifetime 86400.
RCV: X-- Server ID: 00:03:00:01:ec:43:f6:46:c0:80
RCV: Advertisement recorded.
PRC: Selecting best advertised lease.
PRC: Considering best lease.
PRC: X-- Initial candidate 00:03:00:01:ec:43:f6:46:c0:80 (s: 156, p: 0).
XMT: Forming Request, 0 ms elapsed.
XMT: X-- IA_NA 3b:bd:74:...

Read more...

TJ (tj) wrote :
Download full text (6.9 KiB)

Here's some notes from my digging into the code paths being followed. Using the messages logged by NM to syslog I've been able to reconstruct the call heirarchy. In summary, when the state transitions from "expire" to "done":

nm_dhcp_client_set_state () fires a NM_DHCP_STATE_EXPIRE signal

dhcp6_state_changed () receives NM_DHCP_STATE_EXPIRE signal and calls
   dhcp6_fail (self, FALSE)
      dhcp6_cleanup(self, CLEANUP_TYPE_DECONFIGURE, FALSE)
         nm_dhcp_client_stop ()
            NM_DHCP_CLIENT_GET_CLASS (self)->stop (self, release, priv->duid);

I've not been able to identify why the behaviour appears to have changed recently. Inspecting the changelogs and patches in both isc-dhcp and network-manager hasn't revealed anything.

It definitely seems to be Network Manager's issue since it is killing the dhclient process... however... if dhclient tried to gain a new lease immediately it 'deprefered' the expired lease (thus sending NM additional state change signals) then maybe NM wouldn't kill the process.

Something else I noticed - which misled me for a while - is that the dhclient lease files written by Network Manager for IP4v and IPv6 use different syntax for the timestamps. From 'man 5 dhclient.conf' "LEASE DECLARATIONS" the "renew" "rebind" "expire" dates can have two formats: "default" or "local". This is set by "db-time-format" in the configuration.

default: <weekday> <year>/<month>/<day> <hour>:<minute>:<second>
local: epoch <seconds-since-epoch>; # <day-name> <month-name> <day-number> <hours>:<minutes>:<seconds> <year>

The NM IPv4.lease file uses the "default" format (" renew 4 2016/03/31 06:32:42;") but the IPv6.lease file seems to use "local" ("renew 43200;") but without the "epoch" keyword. The code seems to indicate the manual is confusing the *server* settings with client settings, or else IPv6 is using a different scheme entirely via its 'clientenv_add()' mechanism.

NM generates this config file - there is no 'db-time-format':
---
# Created by NetworkManager

send fqdn.fqdn "hephaestion.lan.iam.tj"; # added by NetworkManager
send fqdn.encoded on;
send fqdn.server-update on;

also request dhcp6.name-servers;
also request dhcp6.domain-search;
also request dhcp6.client-id;
---

The dhcp6 lease file extract (showing the last entry is an expired one:-
$ date --date @$(echo 1459002474+86400 | bc)
Sun Mar 27 15:27:54 BST 2016
---
lease6 {
  interface "wlp3s0";
  ia-na 3b:bd:74:85 {
    starts 1459002474;
    renew 43200;
    rebind 69120;
    iaaddr 2a02:8011:2007::2 {
      starts 1459002474;
      preferred-life 86400;
      max-life 86400;
    }
  }
---

---
Notes from the source-code analysis.

      static const GEnumValue values[] = {
        { NM_DHCP_STATE_UNKNOWN, "NM_DHCP_STATE_UNKNOWN", "nm-dhcp-state-unknown" },
        { NM_DHCP_STATE_BOUND, "NM_DHCP_STATE_BOUND", "nm-dhcp-state-bound" },
        { NM_DHCP_STATE_TIMEOUT, "NM_DHCP_STATE_TIMEOUT", "nm-dhcp-state-timeout" },
        { NM_DHCP_STATE_DONE, "NM_DHCP_STATE_DONE", "nm-dhcp-state-done" },
        { NM_DHCP_STATE_EXPIRE, "NM_DHCP_STATE_EXPIRE", "nm-dhcp-state-expire" },
        { NM_DHCP_STATE_FAIL, "NM_DHCP_STATE_FAIL", "nm-dhcp-state-fail" },
  ...

Read more...

Changed in network-manager (Ubuntu):
status: Confirmed → Triaged
TJ (tj) wrote :

Attaching a Network Manager TRACE log_level from syslog generated using:

/etc/NetworkManager/conf.d/logging.conf:
---
[logging]
level=TRACE
---

I've removed any extraneous messages from processes other than NM and dhclient. I've also removed IPv4 related messages to make it easier to follow.

TJ (tj) wrote :

The debug log contains:

Mar 31 01:34:59 hephaestion NetworkManager[9095]: <debug> [1459384499.752259] [dhcp-manager/nm-dhcp-client.c:222] reason_to_state(): (wlp3s0): unmapped DHCP state 'DEPREF6'
Mar 31 01:34:59 hephaestion NetworkManager[9095]: <debug> [1459384499.758516] [dhcp-manager/nm-dhcp-client.c:757] nm_dhcp_client_handle_event(): (wlp3s0): DHCP reason 'DEPREF6' -> state 'unknown'

Network Manager doesn't know how to handle "DEPREF6", which is sent from isc-dhcp dhclient to the helper script (set by "-sf" option):

client/dhc6.c::do_depref()
{
...
      if (addr->starts + addr->preferred_life <= cur_time) {
        script_init(client, "DEPREF6", NULL);
        dhc6_marshall_values("cur_", client, lease,
                 ia, addr);
        script_write_requested6(client);
        script_go(client);
...
}

So it seems that to correctly solve this issue Network Manager must be taught how to handle DEPREF6.

TJ (tj) on 2016-03-31
Changed in isc-dhcp (Ubuntu):
importance: High → Low
status: Triaged → Invalid
TJ (tj) on 2016-03-31
description: updated
Changed in network-manager:
importance: Unknown → Medium
status: Unknown → Confirmed
TJ (tj) on 2016-03-31
summary: - Failed to renew DHCPv6 lease after suspend
+ dhclient killed when DHCPv6 lease is out-of date
TJ (tj) on 2016-03-31
Changed in network-manager (Ubuntu):
assignee: nobody → TJ (tj)
status: Triaged → In Progress
TJ (tj) on 2016-04-01
Changed in isc-dhcp (Ubuntu):
status: Invalid → In Progress
importance: Low → High
assignee: nobody → TJ (tj)
TJ (tj) wrote :

It seems there's a deficiency in isc-dhcp client that leads to this. Fedora have been carrying a patch since 2013 that solves it by the simple expedient of scanning the .lease file first for unexpired leases before allocate an address.

See:

http://pkgs.fedoraproject.org/cgit/rpms/dhcp.git/commit/dhcp-honor-expired.patch?h=f24&id=e83fb19c51765442d77fa60596bfdb2b3b9fbe2e

There's no sign that the Fedora devs creating this patch made any attempt to upstream it. I'm going to follow-up with the ISC DHCP devs to find out what their view is.

In the meantime I'm working with the Network Manager devs to develop an acceptable band-aid solution to solve the immediate problem.

I'm getting DOS'd about 20 times a day as a result of this issue.

Could a "band-aid" shell script be whipped up by anyone in the know, to physically purge the expired lease files periodically, and posted here?

I would gladly set such a script to run as a "cleanup service" or even run it manually, while the proper fix is implemented.

Thanks,
Jason

VF (fviktor) wrote :

This problem has been happening to me recently with up to date Ubuntu 16.04 amd64 desktop.

Workaround is to clean up the broken state by removing all leases:

1. Disconnect from the network in Network Manager
2. Execute: sudo rm /var/lib/NetworkManager/dhclient*.lease
3. Connect to the network in Network Manager

After this dhclient -6 keeps running and IPv6 leases are renewed correctly.

I had this problem on a second very similar installation. There it has been verified that dhclient -6 process is killed/crashed right after acquiring the IPv6 address, so I had a second reproduction.

I saved the contents of /var/lib/NetworkManager and can send it to the developer fixing it on request. There must be something there crashing dhclient in IPv6 mode.

Klaus Kudielka (kkudielka) wrote :

Same problem on xubuntu 16.10 with:

network-manager 1.2.4-0ubuntu1
isc-dhcp-client 4.3.3-5ubuntu15

Klaus Kudielka (kkudielka) wrote :

As a temporary work-around on my ubuntu machine, I added the following line to /lib/systemd/system/NetworkManager.service in the [Service] section:

ExecStartPre=-/bin/sh -c 'rm -f /var/lib/NetworkManager/dhclient6-*.lease'

Now I get DHCPv6 leases again, even if my machine was powered off for some time.
This probably doesn't work for suspend/resume, but is better than nothing.

Roman (krey81) wrote :

I have no Network Manager in Ubuntu server 16.04 LTS, but this bug present.
After booting where is no dhclient in the processes if accept_ra set to 2 and syslog contains

Jun 1 13:55:33 gw dhclient[3663]: PRC: Prefix 2a02:2168:176b:4800::/56 depreferred.
Jun 1 14:05:44 gw dhclient[2762]: PRC: Address 2a02:2168:82d:949::4 depreferred.

if i remove setting accept_ra from sysctl on same leases all works fine except of course DHCP-PD.

Roman (krey81) wrote :

root@gw:~# cat /var/lib/dhcp/dhclient6.br1.leases
default-duid "\000\001\000\001 \302\271\313\000\002D\032>\251";
lease6 {
  interface "br1";
  ia-na 44:1a:3e:a9 {
    starts 1496317261;
    renew 900;
    rebind 1500;
    iaaddr 2a02:2168:82d:949::1 {
      starts 1496317261;
      preferred-life 1800;
      max-life 1800;
    }
  }
  option dhcp6.client-id 0:1:0:1:20:c2:b9:cb:0:2:44:1a:3e:a9;
  option dhcp6.server-id 0:3:0:1:0:16:4d:7e:df:5c;
  option dhcp6.name-servers 2a02:2168:208:1::1,2a02:2168:208:2::1;
}

Changed in network-manager:
status: Confirmed → Fix Released
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.