NetworkManager wrongly releases DHCP IP

Bug #105251 reported by Oliver Gerlich on 2007-04-10
32
Affects Status Importance Assigned to Milestone
dhcp (Ubuntu)
Undecided
Unassigned
dhcp3 (Ubuntu)
Undecided
Unassigned
dnsmasq (Ubuntu)
Undecided
Unassigned
network-manager (Ubuntu)
Undecided
Unassigned
whereami (Ubuntu)
Undecided
Unassigned

Bug Description

Binary package hint: network-manager

After resuming my laptop from Suspend-to-disk, NetworkManager failed to give me an IP. Usually it gets an IP from DHCP server without problems; but this one time it failed. First dhclient correctly got an IP:
Apr 10 19:32:43 veridian dhclient: bound to 192.168.0.4 -- renewal in 17299 seconds.

A short time later it releases the IP again:
Apr 10 19:34:14 veridian NetworkManager: <information>^IDevice 'eth0' DHCP transaction took too long (>99s), stopping it.
Apr 10 19:34:14 veridian dhclient: There is already a pid file /var/run/dhclient.eth0.pid with pid 1320
Apr 10 19:34:14 veridian dhclient: killed old client process, removed PID file
Apr 10 19:34:14 veridian dhclient: DHCPRELEASE on eth0 to 192.168.0.1 port 67

ifconfig showed this:
eth0 Protokoll:Ethernet Hardware Adresse 00:40:F4:A7:A2:06
          inet6 Adresse: fe80::240:f4ff:fea7:a206/64 Gültigkeitsbereich:Verbindung
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:42 errors:0 dropped:0 overruns:0 frame:0
          TX packets:249 errors:0 dropped:0 overruns:0 carrier:0
          Kollisionen:0 Sendewarteschlangenlänge:1000
          RX bytes:4652 (4.5 KiB) TX bytes:18375 (17.9 KiB)
          Interrupt:10

eth0:avah Protokoll:Ethernet Hardware Adresse 00:40:F4:A7:A2:06
          inet Adresse:169.254.6.196 Bcast:169.254.255.255 Maske:255.255.0.0
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          Interrupt:10

lo Protokoll:Lokale Schleife
          inet Adresse:127.0.0.1 Maske:255.0.0.0
          inet6 Adresse: ::1/128 Gültigkeitsbereich:Maschine
          UP LOOPBACK RUNNING MTU:16436 Metric:1
          RX packets:4215 errors:0 dropped:0 overruns:0 frame:0
          TX packets:4215 errors:0 dropped:0 overruns:0 carrier:0
          Kollisionen:0 Sendewarteschlangenlänge:0
          RX bytes:224601 (219.3 KiB) TX bytes:224601 (219.3 KiB)

After left-clicking on NM icon and selecting Cable Network, it got a correct IP.

Oliver Gerlich (ogerlich) wrote :

Syslog output is attached.

Pedro Silva (psilva-sfsu) wrote :

I don't think this is a duplicate of bug #45696. I'm having the same problem without any suspend to RAM or hibernation actions.

NetworkManager started showing this behaviour right after I ticked off my Atheros wifi card from the Restricted Driver Manager, just to see what would happen. Since then, even after adding it back to restricted drivers manager, *any* wifi card I try eventually gets the same DHCP release, after which I'm forced to reconnect (~every 1/2 minutes).

After association to *any* AP, there is a communication failure between the driver and NetworkManager; I can tell, because although ifconfig returns a proper IP address, the "Connection Information" in nm-applet shows 0s all around.

I suppose that, because it figures the DHCP didn't provide an IP address, NetworkManager eventually releases the request. (note that it remain associated to the AP for a period of time, and only after a few times does it disconnect from the AP)

I'm running the Gutsy beta on a Toshiba Portege R200.

uname -a:
Linux r200-ubuntu 2.6.22-12-386 #1 Sun Sep 23 17:37:35 GMT 2007 i686 GNU/Linux

lspci | grep Ethernet
01:00.0 Ethernet controller: Marvell Technology Group Ltd. 88E8053 PCI-E Gigabit Ethernet Controller (rev 15)
02:05.0 Ethernet controller: Atheros Communications, Inc. AR5212/AR5213 Multiprotocol MAC/baseband processor (rev 01)

This is what the relevant DHCP release information (see syslog attachment):

Oct 3 13:57:05 r200-ubuntu dhclient: bound to 192.168.1.102 -- renewal in 956038622 seconds.
Oct 3 13:58:38 r200-ubuntu NetworkManager: <info> Device 'ath0' DHCP transaction took too long (>99s), stopping it.
Oct 3 13:58:38 r200-ubuntu dhclient: There is already a pid file /var/run/dhclient.ath0.pid with pid 5675
Oct 3 13:58:38 r200-ubuntu dhclient: killed old client process, removed PID file
Oct 3 13:58:38 r200-ubuntu dhclient: wifi0: unknown hardware address type 801
Oct 3 13:58:38 r200-ubuntu dhclient: wifi0: unknown hardware address type 801
Oct 3 13:58:38 r200-ubuntu dhclient: DHCPRELEASE on ath0 to 192.168.1.1 port 67
Oct 3 13:58:38 r200-ubuntu avahi-daemon[5134]: Withdrawing address record for 192.168.1.102 on ath0.

Motin (motin) wrote :

Same here, this is not a duplicate of bug #45696 !

This bug is _not_ present in a clean install of Hardy. It seems to be related to the installation of dhcp3-server and/or dnsmasq. A forum thread describing this very issue in detail is available here: http://ubuntuforums.org/showthread.php?t=674178

Bug #155885 is a duplicate of this bug, but cannot be marked as such before this bug's erreneous duplication status has been removed.

Motin (motin) wrote :

It is now correctly marked as a duplicate of Bug #155885 instead.

Changed in network-manager:
status: New → Confirmed
Motin (motin) wrote :

Typo, of course I meant the other way around:
Bug #155885 is now correctly marked as a duplicate of this bug.

Also, a better summary for this bug could be:
"NetworkManager wrongly disconnects wireless after 99s, ignores working DHCP lease. Bug evident after installing/configuring dhcp3-server and/or dnsmasq, even after purging and reinstalling network manager and these packages"

Motin (motin) wrote :

Note once again, however, that the bug is not evident on a freshly installed system.

We ought to be able to inspect the differences in /etc between working and non-working systems to able to work out a solution for affected systems without needing to reinstall Ubuntu...

Motin (motin) wrote :

The bug has now returned, after having installed ppa version of network-manager 0.7 as well as tried (but reverted from) a customly compiled 2.6.25.8-rt7 kernel. This time however, I have not touched dhcp settings files nor installed dnsmasq...

As stated in the very interesting blog post on this matter: http://debian-user-news.blogspot.com/2008/06/networkmanager-and-dhcp-issue-work.html , the problem seem to be related to failing dbus communication between network manager and dhclient.

What is needed now for a developer to investigate this further? I'll happily test proposed solutions/work-arounds.

Cheers!

Motin (motin) wrote :
Download full text (6.2 KiB)

@Guillaume (Blog post author): I tried your proposed workaround (adding "|new_server_name|old_server_name" to the dhcdbd exit hook) but unfortunately it didn't change the situation at all for me. I am still getting...:

Jul 16 03:18:44 motin-laptop NetworkManager: <info> dhclient started with pid 7017
Jul 16 03:18:44 motin-laptop NetworkManager: <info> Activation (wlan0) Stage 3 of 5 (IP Configure Start) complete.
Jul 16 03:18:44 motin-laptop dhclient: Internet Systems Consortium DHCP Client V3.0.6
Jul 16 03:18:44 motin-laptop dhclient: Copyright 2004-2007 Internet Systems Consortium.
Jul 16 03:18:44 motin-laptop dhclient: All rights reserved.
Jul 16 03:18:44 motin-laptop dhclient: For info, please visit http://www.isc.org/sw/dhcp/
Jul 16 03:18:44 motin-laptop dhclient:
Jul 16 03:18:44 motin-laptop dhclient: wmaster0: unknown hardware address type 801
Jul 16 03:18:44 motin-laptop dhclient: wmaster0: unknown hardware address type 801
Jul 16 03:18:44 motin-laptop dhclient: Listening on LPF/wlan0/00:1f:3b:bb:dd:bf
Jul 16 03:18:44 motin-laptop dhclient: Sending on LPF/wlan0/00:1f:3b:bb:dd:bf
Jul 16 03:18:44 motin-laptop dhclient: Sending on Socket/fallback
Jul 16 03:18:48 motin-laptop dhclient: DHCPDISCOVER on wlan0 to 255.255.255.255 port 67 interval 6
Jul 16 03:18:48 motin-laptop dhclient: DHCPOFFER of 192.168.1.127 from 192.168.1.1
Jul 16 03:18:48 motin-laptop dhclient: DHCPREQUEST of 192.168.1.127 on wlan0 to 255.255.255.255 port 67
Jul 16 03:18:48 motin-laptop dhclient: DHCPACK of 192.168.1.127 from 192.168.1.1
Jul 16 03:18:48 motin-laptop dhclient: bound to 192.168.1.127 -- renewal in 41230 seconds.
Jul 16 03:19:29 motin-laptop NetworkManager: <info> Device 'wlan0' DHCP transaction took too long (>45s), stopping it.
Jul 16 03:19:29 motin-laptop NetworkManager: <info> wlan0: canceled DHCP transaction, dhclient pid 7017

Do you, or any other around here, know if there is a way to manually send the correct dbus message to network manager? This way we can at least inspect the syslog ourselves and send the message when we are connected to the network, instead of having to wait for an inevitable disconnect...

Also, you write "It would be interesting to compare dhcp response messages between dnsmasq and dhcp-server in order to see if dhclient returns readable values in the server_name fields when using the latter.". How do we identify and compare these response messages? Logging? Some sort of debug mode? I'll happily debug further if you can put me on the right track... Thanks!

PS All I managed to do was an strace of dhclient shortly after I saw what pid that network manager was using and before and during the actual bound. I have no idea if it helps and or dbus messages can be spotted but here it is:

$ sudo strace -p 7646
Process 7646 attached - interrupt to quit
gettimeofday({1216171610, 812560}, NULL) = 0
select(8, [7], [], [], {0, 0}) = 0 (Timeout)
select(7, [6], [], [], {0, 0}) = 0 (Timeout)
gettimeofday({1216171610, 812983}, NULL) = 0
select(8, [6 7], [], [], {0, 187017}) = 0 (Timeout)
gettimeofday({1216171611, 1634}, NULL) = 0
time(NULL) = 1216171611
stat64("/etc/localtime", ...

Read more...

Marius Gedminas (mgedmin) wrote :

Adding

  exec >> /tmp/dhcp.debug 2>&1
  set -x

to the end of /etc/dhcp3/dhclient-exit-hooks.d/debug showed that the culprit on one system was a whereami script that did 'exit 1' prematurely, stopping /etc/dhcp3/dhclient-exit-hooks.d/zzzz_dhcdbd from being run. Here's the relevant line of code:

  test ! -e /usr/sbin/whereami && exit 1

'exit 1' is a bad idea when your script is being sourced rather than executed.

dpkg --purge whereami fixed the problem on this system. A proper fix would be to make the whereami hook not call 'exit'.

Obviously this happens only when you've had whereami installed and then removed without purging its configuration files.

It might be a good idea to check other packages supplying dhclient hooks for similar problems.

Marius Gedminas (mgedmin) wrote :

The whereami bug was already reported as bug 239740

Motin (motin) wrote :

Interesting about the whereami thing, but I have never installed whereami and still experiencing this bug. May be a solution for others though...

Marius Gedminas (mgedmin) wrote :

Motin: check the other dhclient hooks:

  grep exit /etc/dhcp3/dhclient-exit-hooks.d/*

Alexander Sack (asac) wrote :

The timeout bugs you see in syslog attached are fixed in latest WPA supplicant in combination with improved driver runtime behaviour. Thus, I assume this is fixed in intrepid now. if you can still reproduce there, please reopen.

Changed in network-manager:
status: Confirmed → Fix Released
Changed in dhcp3:
status: New → Invalid
Changed in dhcp:
status: New → Invalid
Changed in dnsmasq:
status: New → Invalid
Changed in whereami:
status: New → Invalid

I can reproduce this in Intreprid since about one month. Note that I came the upgrade path 7.10->8.04->8.10

DHCP via NetworkManager fails on wireless and wired interfaces.

package 'whereami' is not installed, the dpkg --purge workaround does not work even after reinstalling whereami
and removing the non-empty folder /var/lib/whereami

I have no option to reopen this report.

# grep exit /etc/dhcp3/dhclient-exit-hooks.d/*
/etc/dhcp3/dhclient-exit-hooks.d/debug:# $Id: debug-exit,v 1.2 2003/06/18 18:11:21 peloy Exp $
/etc/dhcp3/dhclient-exit-hooks.d/debug: echo `date`: entering dhclient-exit-hooks.d, dumping variables. \

From daemon.log:

Jan 4 18:51:50 dragonscale dhclient: DHCPACK of 10.0.0.19 from 10.0.0.254
Jan 4 18:51:50 dragonscale dhclient: bound to 10.0.0.19 -- renewal in 65444231 seconds.
Jan 4 18:52:35 dragonscale NetworkManager: <info> Device 'eth0' DHCP transaction took too long (>45s), stopping it.
Jan 4 18:52:35 dragonscale NetworkManager: <info> eth0: canceled DHCP transaction, dhcp client pid 7167
Jan 4 18:52:35 dragonscale NetworkManager: <info> Activation (eth0) Stage 4 of 5 (IP Configure Timeout) scheduled...
Jan 4 18:52:35 dragonscale NetworkManager: <info> Activation (eth0) Stage 4 of 5 (IP Configure Timeout) started...
Jan 4 18:52:35 dragonscale NetworkManager: <info> (eth0): device state change: 7 -> 9
Jan 4 18:52:35 dragonscale NetworkManager: <info> Marking connection 'Auto eth0' invalid.

Reopening as requested

Marius Gedminas (mgedmin) wrote :

Just a comment: 'exit' is not the only way to terminate a shell script prematurely; if any of the scripts call 'set -e' and then any of the commands executed after that fails (i.e. returns a non-zero exit code) the script will terminate.

I would suggest the same debugging technique: adding

  exec >> /tmp/dhcp.debug 2>&1
  set -x

somewhere near the top of the first dhclient exit hook, and then looking at the execution log in /tmp/dhcp.debug to see whether the dbus-send command from /etc/dhcp3/dhclient-exit-hooks.d/zzzz_dhcdbd got executed or not.

stevieh (stevieh) wrote :

When I understand this right, following stuff appeared from Hardy to Intrepid:

a) (minor issue) dhclient-script is not started per default. So the path to dhclient-script should be set right (to /sbin/dhclient-script) in /etc/dhclient.conf

b) up to hardy, at the end in the dhclient-exit-hooks.d there is a should be some script telling the NetworkManager (via dbus?) that dhclient has done it's job. In hardy, this was done by zzzz_dhcdbd but dhcbd is not there any more in intrepid.

And this seems to be the issue: there is nothing in dhclients exit hooks which tells the nm, that the job is done. If I'd understood, what is now the "regular" way to tell nm that, I could write a hook in exit...

Or am I completely wrong?

Marius Gedminas (mgedmin) wrote :

Hm, interesting. I didn't know things changed in Intrepid. I changed /etc/dhcp3/dhclient-exit-hooks.d/debug and asked it to print debugging output to /tmp/dhclient-script.debug, yet when I reconnected to the network, that file was not created. So it seems that network-manager talks to the dhcp client via some other mechanism in Intrepid, and all this business with exit hooks is now irrelevant.

Can anyone confirm?

stevieh (stevieh) wrote :

Marius: no, the reason for that is, that dhclient-script is not started. Just uncomment the dhclient-scprit entry and set it to the right path does fix that issue and you get the debugging output. But then it does not return to nm. wich is the real issue.

For some reason I still had the old hardy packages of network-manager and libnm installed. Note that I came the upgrade path 7.10->8.04->8.10 . After forcing the versions from intreprid-updates everything is fine again.

fschmitt (florian-schmitt) wrote :

I have the same problem and still the hardy versions of network manager, libnm and libdus (?), but updating them to intrepid didn't help form me. I've attached the content of dhcp.debug and the apt-cache information regarding network-manager and libnm-glib0.

Austriaco (lanieves) wrote :

Network-manager calls dhclient with -sf option set to /usr/lib/NetworkManager/nm-dhcp-client.action which means dhclient doesn't run the traditional dhclient-script upon acquiring the Lease.

$ sudo /sbin/dhclient -d -sf /usr/lib/NetworkManager/nm-dhcp-client.action -pf /var/run/dhclient-eth0.pid -lf /var/run/dhclient-eth0.lease -cf /var/run/nm-dhclient-eth0.conf eth0
Internet Systems Consortium DHCP Client V3.1.1
Copyright 2004-2008 Internet Systems Consortium.
All rights reserved.
For info, please visit http://www.isc.org/sw/dhcp/

wifi0: unknown hardware address type 801
Error: Could not acquire the NM DHCP client service. Message: (org.freedesktop.DBus.Error.AccessDenied) Connection ":1.36" is not allowed to own the service "org.freedesktop.nm_dhcp_client" due to security policies in the configuration file

It seems that NM's own program fails to notify via DBus the acquisition of the Lease. Now, if you add NM's PPA source to your apt list:

deb http://ppa.launchpad.net/network-manager/ubuntu intrepid main

an upgrade from there to version 0.7-0ubuntu1~nm1~intrepid1 then this problem is solved.

To recap the solution, install NetworkManager from the PPA repository and it will solve the problem

fschmitt (florian-schmitt) wrote :
Download full text (7.8 KiB)

sorry, still not working. i've installed the version 0.7-0ubuntu1~nm1~intrepid1, but no change. The following versions are installed:

florian@qacha:~$ sudo apt-cache showpkg libnm-glib0
[sudo] password for florian:
Package: libnm-glib0
Versions:
0.7-0ubuntu1~nm1~intrepid1 (/var/lib/apt/lists/ppa.launchpad.net_network-manager_ubuntu_dists_intrepid_main_binary-i386_Packages) (/var/lib/dpkg/status)
 Description Language:
                 File: /var/lib/apt/lists/ppa.launchpad.net_network-manager_ubuntu_dists_intrepid_main_binary-i386_Packages
                  MD5: ec3ac72d26a5bdcabc1ea341c1a9cd6c

0.7~~svn20081018t105859-0ubuntu1.8.10.1 (/var/lib/apt/lists/de.archive.ubuntu.com_ubuntu_dists_intrepid-updates_main_binary-i386_Packages)
 Description Language: de
                 File: /var/lib/apt/lists/de.archive.ubuntu.com_ubuntu_dists_intrepid-updates_main_i18n_Translation-de
                  MD5: e9142dd85b1ee9c156ade5efd7d0469c
 Description Language:
                 File: /var/lib/apt/lists/de.archive.ubuntu.com_ubuntu_dists_intrepid-updates_main_binary-i386_Packages
                  MD5: e9142dd85b1ee9c156ade5efd7d0469c

0.7~~svn20081018t105859-0ubuntu1 (/var/lib/apt/lists/de.archive.ubuntu.com_ubuntu_dists_intrepid_main_binary-i386_Packages)
 Description Language: de
                 File: /var/lib/apt/lists/de.archive.ubuntu.com_ubuntu_dists_intrepid-updates_main_i18n_Translation-de
                  MD5: e9142dd85b1ee9c156ade5efd7d0469c
 Description Language: de
                 File: /var/lib/apt/lists/de.archive.ubuntu.com_ubuntu_dists_intrepid_main_i18n_Translation-de
                  MD5: e9142dd85b1ee9c156ade5efd7d0469c
 Description Language:
                 File: /var/lib/apt/lists/de.archive.ubuntu.com_ubuntu_dists_intrepid_main_binary-i386_Packages
                  MD5: e9142dd85b1ee9c156ade5efd7d0469c

Reverse Depends:
  network-manager,libnm-glib0 0.7~~svn20080908
  libnm-glib-dev,libnm-glib0 0.7-0ubuntu1~nm1~intrepid1
  network-manager-gnome,libnm-glib0 0.7~~svn20080908
  gnome-main-menu,libnm-glib0 0.7~~svn20080908
  network-manager-vpnc,libnm-glib0 0.7~~svn20080908
  network-manager-openvpn,libnm-glib0 0.7~~svn20080908
  network-manager-pptp,libnm-glib0 0.7~~svn20080908
  network-manager-pptp,libnm-glib0 0.7~~svn20080908
  network-manager-gnome,libnm-glib0 0.7~~svn20080908
  network-manager,libnm-glib0 0.7~~svn20080908
  libnm-glib-dev,libnm-glib0 0.7~~svn20081018t105859-0ubuntu1.8.10.1
  packagekit,libnm-glib0 0.7~~svn20080908
  network-manager-vpnc,libnm-glib0 0.7~~svn20080908
  network-manager-pptp,libnm-glib0 0.7~~svn20080908
  network-manager-openvpn,libnm-glib0 0.7~~svn20080908
  krb5-auth-dialog,libnm-glib0
  gnome-main-menu,libnm-glib0 0.7~~svn20080908
  network-manager-gnome,libnm-glib0 0.7~~svn20080908
  network-manager,libnm-glib0 0.7~~svn20080908
  libnm-glib-dev,libnm-glib0 0.7~~svn20081018t105859-0ubuntu1
Dependencies:
0.7-0ubuntu1~nm1~intrepid1 - libc6 (2 2.3.4) libdbus-1-3 (2 1.0.2) libdbus-glib-1-2 (2 0.74) libglib2.0-0 (2 2.16.0) libnm-util1 (2 0.7) libnspr4-0d (2 1.8.0.10) libnss3-1d (2 3.12.0~1.9b1) libuuid1 (2 1.05)
0.7~~svn20081018t105859-0ubuntu1.8.10.1 - libc6 (2 2.3.4) libdbus-1-3 (2 1...

Read more...

fschmitt (florian-schmitt) wrote :
Changed in network-manager:
status: Fix Released → Confirmed

I've the same problem on a just installed ubuntu. I took notice that this issue arose only when my wifi router configured in a protected mode (WPA-PSK/WPA2-PSK, TKIP+AES). In an open mode wifi is OK. Hope this notice will helpful.

fschmitt (florian-schmitt) wrote :

can't confirm dependency of protected vs open mode - for me, problem occurs in any case - connecting via ethernet cable, via open Wifi and via WPA-PSK-protected wifi. tested with two different routers.

Can somebody please fix this problem in the PPA port for Hardy, please. We run our servers in Hardy and we would not want to upgrade system to Intrepid. Hardy is a LTTS.

We are experiencing exactly same problem described here.

Please if you can help.

fschmitt (florian-schmitt) wrote :

After upgrading to Jaunty, DHCP worked for Ethernet Connections (before Jaunty, dhcp worked neither with eth nor with wlan). So i replaced the ath5k driver with the atheros madwifi one, and now everything workes without problems. Hmm - would be interesting if the problem was caused by the wlan device driver?

Thomas Hood (jdthood) wrote :

Is this bug still present in Precise?

Thomas Hood (jdthood) on 2012-07-07
Changed in network-manager (Ubuntu):
status: Confirmed → Incomplete
Launchpad Janitor (janitor) wrote :

[Expired for network-manager (Ubuntu) because there has been no activity for 60 days.]

Changed in network-manager (Ubuntu):
status: Incomplete → Expired
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Duplicates of this bug

Other bug subscribers