race condition in launching DHCP client and restarting wired interface

Bug #419162 reported by Adam Piątyszek
This bug affects 1 person
Affects: wicd
Status: Fix Released
Importance: Medium
Assigned to: Dan O'Reilly
Milestone: (not listed)

Bug Description

I use wicd-1.6.2 to manage both the wireless and wired interfaces. My hardware is a Dell Latitude E6400 with Intel WiFi and Ethernet cards (iwlagn and e1000e Linux drivers).

The problem is that when the DHCP client tries to obtain an address for the wired interface (eth0), it times out. I tried three different clients: dhcpcd, pump and dhclient. All behave similarly.
I guess the problem is that the wired interface is disabled and reenabled just before the DHCP client is started. This is the log sequence from /var/log/messages:

Aug 26 12:37:16 tataj Registered led device: iwl-phy0::radio
Aug 26 12:37:16 tataj Registered led device: iwl-phy0::assoc
Aug 26 12:37:16 tataj Registered led device: iwl-phy0::RX
Aug 26 12:37:16 tataj Registered led device: iwl-phy0::TX
Aug 26 12:37:16 tataj e1000e 0000:00:19.0: irq 26 for MSI/MSI-X
Aug 26 12:37:16 tataj e1000e 0000:00:19.0: irq 26 for MSI/MSI-X
Aug 26 12:37:16 tataj pumpd[11517]: PUMP: sending discover
Aug 26 12:37:19 tataj e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
Aug 26 12:37:47 tataj e1000e 0000:00:19.0: irq 26 for MSI/MSI-X
Aug 26 12:37:47 tataj e1000e 0000:00:19.0: irq 26 for MSI/MSI-X
Aug 26 12:37:50 tataj e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None

As you can see, the DHCP client (pump in this example) starts and sends a discover packet 3 seconds before the eth0 interface is actually initialised, so this packet is probably lost.
If I execute "pump eth0" from a terminal, it immediately gets an IP address from the server. The other two clients behave the same way.
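
The 3-second gap can be read off the log mechanically. The following is only an illustrative sketch: the helper names `syslog_time` and `seconds_between` are made up for this example, and it assumes both lines carry the usual 15-character syslog timestamp and fall on the same day.

```python
from datetime import datetime


def syslog_time(line):
    """Parse the leading 'Mon DD HH:MM:SS' stamp of a syslog line.

    Syslog does not record the year, so the parsed value is only
    meaningful for differences between nearby lines.
    """
    return datetime.strptime(line[:15], "%b %d %H:%M:%S")


def seconds_between(earlier_line, later_line):
    """Return the number of seconds from earlier_line to later_line."""
    delta = syslog_time(later_line) - syslog_time(earlier_line)
    return delta.total_seconds()


# Two lines copied from the log above.
discover = "Aug 26 12:37:16 tataj pumpd[11517]: PUMP: sending discover"
link_up = ("Aug 26 12:37:19 tataj e1000e: eth0 NIC Link is Up "
           "1000 Mbps Full Duplex, Flow Control: None")

print(seconds_between(discover, link_up))  # 3.0
```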

Is it possible that wicd does not check whether an interface is "up" before starting the DHCP client?

Thanks in advance for your help!

Dan O'Reilly (oreilldf) wrote :

Thanks for the report. Right now wicd doesn't wait for the interface to actually report being up after it issues the "ifconfig up" command. We should be though, and it's an easy addition to make. Expect it for the next version.
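
As a rough illustration of the kind of wait described here (not the actual wicd change), the sketch below polls the kernel's carrier flag in sysfs until it reads "1" or a timeout expires. The function name `wait_for_link`, its parameters and the default timeout are assumptions made for this example, and it is written for Python 3 rather than the Python version wicd 1.6 targets.

```python
import os
import time


def wait_for_link(iface, timeout=15.0, interval=0.5,
                  sysfs_root="/sys/class/net"):
    """Poll the carrier flag until the link is up or the timeout expires.

    Returns True if the flag read "1" before the deadline, else False.
    sysfs_root is a parameter only so the function can be tested
    against a temporary directory.
    """
    carrier_path = os.path.join(sysfs_root, iface, "carrier")
    deadline = time.monotonic() + timeout
    while True:
        try:
            with open(carrier_path) as f:
                if f.read().strip() == "1":
                    return True
        except OSError:
            # Reading carrier fails while the interface is administratively
            # down or missing; treat that as "not up" and keep polling.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A caller would bring the interface up, call `wait_for_link("eth0")`, and start the DHCP client only if it returns True. The check covers the carrier flag only, so it cannot tell whether the device at the other end of the cable is already passing traffic.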

Changed in wicd:
assignee: nobody → Dan O'Reilly (oreilldf)
importance: Undecided → Low
milestone: none → 1.6.3
status: New → In Progress
importance: Low → Medium
Dan O'Reilly (oreilldf)
Changed in wicd:
status: In Progress → Fix Committed
Adam Piątyszek (ediap) wrote :

Hi Dan,

I tried your fix, but it does not seem to fix my problem. I still cannot connect over the wired interface with wicd. Here is the log for the lp:wicd branch:

Aug 31 09:24:21 tataj pumpd[21612]: starting at (uptime 0 days, 1:03:53) Mon Aug 31 09:24:21 2009
Aug 31 09:24:21 tataj e1000e 0000:00:19.0: irq 26 for MSI/MSI-X
Aug 31 09:24:21 tataj e1000e 0000:00:19.0: irq 26 for MSI/MSI-X
Aug 31 09:24:23 tataj pumpd[21612]: PUMP: sending discover
Aug 31 09:24:24 tataj e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: None
Aug 31 09:24:53 tataj e1000e 0000:00:19.0: irq 26 for MSI/MSI-X
Aug 31 09:24:53 tataj e1000e 0000:00:19.0: irq 26 for MSI/MSI-X

As you can see, pump was started before the eth0 NIC was "up", and the discover packet was sent 1 second before the link came up.

Adam Piątyszek (ediap) wrote :

Hi Dan,

I think now I understand my problems. This is specific to the Cisco 1Gbit switch we use in our company. It seems that it takes about 10-15 seconds to fully negotiate the link parameters between my network card and this switch. In the meantime the link status light is orange, which means the interface is not ready for full operation.
However, mii-tool reports the link as fully functional:
  eth0: negotiated 1000baseT-FD flow-control, link ok
Therefore, the patch you applied does not change much in my case.

A workaround (or solution) for this problem might be to not disable and reenable the wired interface before executing the connection function. Do we really need to "ifconfig eth0 down" the interface in the connect/disconnect flow?

Dan O'Reilly (oreilldf) wrote :

For wireless networks, we definitely need the ifconfig down call. I'm not sure about wired networks. I'm always hesitant to make changes like that because something that works on my system or your system may very well break things on someone else's. Have you tried using dhclient instead of pump? That sends out more than one discover packet while trying to connect, which may give you better results. You could also try adding in time.sleep() calls to the connection code directly and see if that helps.
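
For anyone who wants to experiment with the time.sleep() idea, a minimal sketch of a retry loop with a pause between attempts follows. The helper `retry_with_delay` is hypothetical and not part of wicd; `attempt` stands for any callable that runs a DHCP client once and returns True on success.

```python
import time


def retry_with_delay(attempt, retries=5, delay=5.0, sleep=time.sleep):
    """Call attempt() until it returns True, sleeping between failed tries.

    Returns the 1-based number of the successful try, or 0 if every
    try failed. The sleep function is a parameter so tests can replace it.
    """
    for n in range(1, retries + 1):
        if attempt():
            return n
        if n < retries:
            # Give the link more time to settle before the next attempt.
            sleep(delay)
    return 0
```

The retry count and delay are placeholders; how long a link needs before it passes traffic depends on the hardware on both ends.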

Adam Piątyszek (ediap) wrote :

I have tried pump, dhclient and dhcpcd so far. The first two do not work at all without hacking the connection code and introducing a very long sleep period (over 25 s in my case), as you suggested. The dhcpcd client is eventually able to connect, but it takes about 60-90 seconds, which is very annoying.

I also modified the code and commented out the iface.Down() call in the _connect() method, so the wired interface is not disabled just before setting up the connection. As I expected, this solves my problem, and I am at least able to connect immediately with dhcpcd and pump. Unfortunately, dhclient has some problems obtaining a DHCP address on my Gentoo system.
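
The change amounts to something like the sketch below. This is not the wicd source: `FakeInterface` and `reset_interface` are invented for illustration, and the `Up()` method is assumed by analogy with the `iface.Down()` call mentioned above.

```python
class FakeInterface:
    """Stand-in for an interface object; it only records the calls made."""

    def __init__(self):
        self.calls = []

    def Down(self):
        self.calls.append("down")

    def Up(self):
        self.calls.append("up")


def reset_interface(iface, wireless):
    """Take the interface down first only for wireless connections."""
    if wireless:
        iface.Down()
    # Wired interfaces skip Down() so the link does not have to be
    # renegotiated with the switch before the DHCP client starts.
    iface.Up()


wired = FakeInterface()
reset_interface(wired, wireless=False)
print(wired.calls)  # ['up']
```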

Changed in wicd:
status: Fix Committed → Fix Released