ipconfig binary (from klibc-utils) destroys the network stack with Broadcom 5721 using the tg3 module

Bug #305180 reported by Ryan Steele
Affects          Status  Importance  Assigned to  Milestone
klibc (Ubuntu)   New     Undecided   Unassigned

Bug Description

Binary package hint: klibc-utils

I've got a server with a Broadcom network card (a 5721, to be precise) using the tg3 driver, and that box absolutely refuses to get through the initrd. The installation hangs in the initramfs, in the do_netmount() function in /scripts/live, and I'm pretty sure it's because the 'ipconfig' binary included in the initrd is killing networking. I've spent a few days now hacking the initrd, the init script, and its functions to determine the path it takes to get there, which appears to be:

  1. init is invoked
  2. init sources /scripts/live (since boot=live in the pxelinux.cfg), and then calls the function mountroot(), which is
     defined in /scripts/live
  3. mountroot calls several other functions within the /scripts/live script, and eventually gets to do_netmount()
  4. inside do_netmount, it encounters a line in which the binary 'ipconfig' (yes, ipconfig, not ifconfig) is called, and this is
     where it hangs. I've added some debugging code for clarity:

  do_netmount()
  {
      rc=1

      modprobe -q af_packet # For DHCP

      udevtrigger
      udevsettle

      # Echo the device it wants to use
      echo -e "\nThis is right before we 'ipconfig ${DEVICE}'\n" > /dev/console 2>&1
      ipconfig ${DEVICE} | tee /netboot.config

      # Echo to let us know that we got past the ipconfig line
      echo -e "\nRight before sourcing ipconfig output\n" > /dev/console 2>&1
      # source relevant ipconfig output
      OLDHOSTNAME=${HOSTNAME}
      . /tmp/net-${DEVICE}.conf

      <snip>
  }

With that debugging output in place, the last output to the console is:

  This is right before we 'ipconfig eth0'

  [100.068705] tg3: eth0: Link is up at 1000 Mbps, full duplex.
  [100.068767] tg3: eth0: Flow control is off for TX and off for RX.
  [393.930374] Machine check events logged
  [699.732829] Machine check events logged

... and from there it just hangs indefinitely. I know for a fact that the kernel module, tg3.ko, is being loaded by load_modules, so that's not the problem. In fact, I'm almost 100% positive that 'ipconfig' is killing network connectivity. The only way I can get a machine to be operable over the network again is to get a console and issue an '/etc/init.d/networking restart'.

I have several other servers that use different network drivers (igb, e1000, etc.) that all seem to work just fine, which reinforces my feeling that this Broadcom card is just poorly supported on Linux. I've tried both the tg3.ko that ships with Ubuntu and a driver I compiled myself, with the same results in both cases.

I really think that destroying the network stack and doing nothing about it is a horrible implementation. Yeah, the Broadcom card sucks, but ipconfig should be a bit more robust. Have it time out, try another interface. Anything other than crashing networking and hanging indefinitely, a process that renders the box and PXE absolutely useless.
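
As a rough illustration of the "time out, try another interface" behavior asked for above, a wrapper along these lines could bound each attempt and fall through to the next interface. It is only a sketch: run_with_timeout, try_interfaces, and the CONFIGURE_CMD variable are hypothetical names, not part of klibc or initramfs-tools, and it assumes a shell that supports background jobs plus working kill and sleep commands.

```shell
# Hypothetical helper: run a command, killing it if it exceeds $1 seconds.
# Returns the command's exit status (non-zero if it had to be killed).
run_with_timeout()
{
    timeout_secs="$1"
    shift

    "$@" &
    cmd_pid=$!

    # Watchdog: kill the command if it is still running after the timeout.
    (
        sleep "${timeout_secs}"
        kill "${cmd_pid}" 2>/dev/null
    ) >/dev/null 2>&1 &
    watchdog_pid=$!

    wait "${cmd_pid}" 2>/dev/null
    rc=$?

    # The command has finished or been killed; stop the watchdog.
    kill "${watchdog_pid}" 2>/dev/null
    wait "${watchdog_pid}" 2>/dev/null

    return "${rc}"
}

# Hypothetical helper: try each interface in turn and print the first one
# that is configured within the timeout. CONFIGURE_CMD is an assumption of
# this sketch; it defaults to ipconfig.
try_interfaces()
{
    timeout_secs="$1"
    shift

    for iface in "$@"; do
        if run_with_timeout "${timeout_secs}" ${CONFIGURE_CMD:-ipconfig} "${iface}"; then
            echo "${iface}"
            return 0
        fi
    done

    return 1
}
```

Called as, say, `try_interfaces 60 eth0 eth1`, this would give up on eth0 after 60 seconds instead of hanging, and return non-zero if no interface could be configured so the caller can drop to a shell or report the failure.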

I was also able to get this same behavior on a VM with a virtual interface - the only way to restore network connectivity to the VM was to pop into the VMware console and issue an /etc/init.d/networking restart. If you don't have console access and the box isn't local, you're taking a drive to wherever the box is physically located, which is unacceptable.

Ryan Steele (ryans-aweber) wrote :

Some more info about the environment:

Version: 8.04 LTS (Hardy)
Kernel: 2.6.24-21-generic

Ryan Steele (ryans-aweber) wrote :

I fixed this by commenting out the 'ipconfig' line, and calling the configure_networking function. I also had to make that function look for net-*.conf, instead of net-${DEVICE}.conf, since by default it looks at eth0 (because it sources the initramfs.conf, which sets DEVICE=eth0).
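
The net-*.conf lookup described above could look roughly like the following. This is a minimal sketch, not the actual change made to the script: source_net_conf and NET_CONF_DIR are hypothetical names used for illustration, and it assumes the interface configuration ends up in a file matching /tmp/net-*.conf.

```shell
# Hypothetical helper: source whichever net-<iface>.conf exists, rather than
# assuming net-${DEVICE}.conf. NET_CONF_DIR is an assumption of this sketch
# and defaults to /tmp.
source_net_conf()
{
    conf_dir="${NET_CONF_DIR:-/tmp}"

    for conf in "${conf_dir}"/net-*.conf; do
        # If the glob matched nothing, the pattern is left as-is; skip it.
        [ -e "${conf}" ] || continue

        # Source the first configuration file found and stop.
        . "${conf}"
        return 0
    done

    return 1
}
```

Because it stops at the first match, a box with several configured interfaces would pick whichever file sorts first, so this only suits the single-interface case described here.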

maximilian attems (maks-debian) wrote :

Can you still reproduce that on Maverick?

Ubuntu 10.10 saw several improvements for netbooting thanks to newer initramfs-tools and klibc.
Thanks for the feedback.

Ryan Steele (ryans-aweber) wrote :

Hi Max,

Unfortunately, I doubt we'll be deploying Maverick any time in the near future, so it may be a while before I can get back to you on this. When I do, I'll let ya know.
