netcfg/choose_interface=auto fails to find the right interface

Bug #713385 reported by Steve Atwell
This bug affects 17 people
Affects: netcfg (Ubuntu)
Status: Confirmed
Importance: Medium
Assigned to: Unassigned

Bug Description

Binary package hint: netcfg

Under some circumstances, netcfg may not be able to find the right interface to run dhclient on when netcfg/choose_interface is set to auto. It looks like the way choose_interface=auto works is that ethtool finds the lowest numbered interface that reports a link, and runs dhclient on that interface. If no interface with a link is found, it tries only eth0.

I'm hitting a problem on a number of servers that have one or two Broadcom BCM5708 interfaces *and* two Intel gigabit interfaces. If the network connection is plugged in to the BCM5708, the install will often fail to find a network with netcfg/choose_interface=auto.

The problem is that the BCM5708 doesn't report link up until you try to send traffic over it. So none of the interfaces on the server report having a link, and netcfg tries dhcp on just eth0. Depending on the order the network modules have been loaded, eth0 may be the BCM5708 or it may be the Intel. If eth0 is the Intel, d-i attempts to run dhclient on the wrong interface, and it fails.

I think a reasonable solution to this problem would be for netcfg to attempt dhclient on all interfaces until one succeeds. Or perhaps it should do this only when no interfaces report a link. Either way, I don't think we can rely entirely on link status, because not all NICs report it correctly.
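The fallback proposed above can be sketched roughly as follows. This is only an illustration of the idea, not netcfg's actual code: the function names and the `try_dhcp` callback are hypothetical, and the real DHCP attempt (for example, running dhclient with a timeout) is left to the caller.

```python
import os
from typing import Callable, Iterable, List, Optional


def list_interfaces(sysfs_net: str = "/sys/class/net") -> List[str]:
    """Return non-loopback interface names found under sysfs, sorted by name."""
    return sorted(
        name
        for name in os.listdir(sysfs_net)
        # Skip loopback and plain files such as bonding_masters.
        if name != "lo" and os.path.isdir(os.path.join(sysfs_net, name))
    )


def first_dhcp_success(
    interfaces: Iterable[str],
    try_dhcp: Callable[[str], bool],
) -> Optional[str]:
    """Try DHCP on each interface in turn and return the first that succeeds.

    try_dhcp is a hypothetical callback supplied by the caller; it should
    return True when a lease was obtained on the given interface.
    """
    for iface in interfaces:
        if try_dhcp(iface):
            return iface
    return None  # no interface answered
```

With this shape, link status is never consulted, so a NIC that only reports link after traffic is sent would still be tried. The cost is one DHCP timeout per interface that has no server behind it.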

Tags: sts
Revision history for this message
Colin Watson (cjwatson) wrote : Re: [Bug 713385] [NEW] netcfg/choose_interface=auto fails to find the right interface

Would you be able to retry this with a current natty installation image
(netboot links at http://cdimage.ubuntu.com/netboot/natty/)? netcfg's
link-detection behaviour was substantially changed upstream in version
1.60, and I would be interested to know if this addresses your problem.

Revision history for this message
Steve Atwell (satwell) wrote :

Natty doesn't seem to work any better, unfortunately.

This is on a Dell PowerEdge 2950 with two BCM5708 onboard NICs and a dual-port Intel 82571EB expansion card. PCI IDs of the network controllers:

~ # lspci -n | grep ' 0200:'
05:00.0 0200: 14e4:164c (rev 11)
09:00.0 0200: 14e4:164c (rev 11)
0c:00.0 0200: 8086:105e (rev 06)
0c:00.1 0200: 8086:105e (rev 06)

~ # uname -rvm
2.6.38-3-generic #30-Ubuntu SMP Thu Feb 10 00:33:26 UTC 2011 x86_64

And the relevant bits from the installer syslog:

Feb 15 01:33:07 netcfg[1590]: INFO: Starting netcfg v.1.60ubuntu2 (built 20110208-1933)
Feb 15 01:33:07 kernel: [ 17.470248] e1000e 0000:0c:00.0: irq 105 for MSI/MSI-X
Feb 15 01:33:07 kernel: [ 17.530073] e1000e 0000:0c:00.0: irq 105 for MSI/MSI-X
Feb 15 01:33:07 kernel: [ 17.530634] ADDRCONF(NETDEV_UP): eth0: link is not ready
Feb 15 01:33:08 netcfg[1590]: INFO: ethtool-lite: eth0 is disconnected.
Feb 15 01:33:08 netcfg[1590]: INFO: ethtool-lite: eth0 is disconnected.
Feb 15 01:33:08 netcfg[1590]: INFO: ethtool-lite: eth0 is disconnected.
Feb 15 01:33:08 netcfg[1590]: INFO: ethtool-lite: eth0 is disconnected.
Feb 15 01:33:09 netcfg[1590]: INFO: ethtool-lite: eth0 is disconnected.
Feb 15 01:33:09 netcfg[1590]: INFO: ethtool-lite: eth0 is disconnected.
Feb 15 01:33:09 netcfg[1590]: INFO: ethtool-lite: eth0 is disconnected.
Feb 15 01:33:09 netcfg[1590]: INFO: ethtool-lite: eth0 is disconnected.
Feb 15 01:33:10 netcfg[1590]: INFO: ethtool-lite: eth0 is disconnected.
Feb 15 01:33:10 netcfg[1590]: INFO: ethtool-lite: eth0 is disconnected.
Feb 15 01:33:10 netcfg[1590]: INFO: ethtool-lite: eth0 is disconnected.
Feb 15 01:33:10 netcfg[1590]: INFO: ethtool-lite: eth0 is disconnected.
Feb 15 01:33:10 netcfg[1590]: INFO: found no link on interface eth0.
Feb 15 01:33:10 netcfg[1590]: INFO: eth0 is not a wireless interface. Continuing.
Feb 15 01:33:11 kernel: [ 21.060277] e1000e 0000:0c:00.1: irq 106 for MSI/MSI-X
Feb 15 01:33:11 kernel: [ 21.120075] e1000e 0000:0c:00.1: irq 106 for MSI/MSI-X
Feb 15 01:33:11 kernel: [ 21.120612] ADDRCONF(NETDEV_UP): eth1: link is not ready
Feb 15 01:33:11 netcfg[1590]: INFO: ethtool-lite: eth1 is disconnected.
Feb 15 01:33:11 netcfg[1590]: INFO: ethtool-lite: eth1 is disconnected.
Feb 15 01:33:12 netcfg[1590]: INFO: ethtool-lite: eth1 is disconnected.
Feb 15 01:33:12 netcfg[1590]: INFO: ethtool-lite: eth1 is disconnected.
Feb 15 01:33:12 netcfg[1590]: INFO: ethtool-lite: eth1 is disconnected.
Feb 15 01:33:12 netcfg[1590]: INFO: ethtool-lite: eth1 is disconnected.
Feb 15 01:33:13 netcfg[1590]: INFO: ethtool-lite: eth1 is disconnected.
Feb 15 01:33:13 netcfg[1590]: INFO: ethtool-lite: eth1 is disconnected.
Feb 15 01:33:13 netcfg[1590]: INFO: ethtool-lite: eth1 is disconnected.
Feb 15 01:33:13 netcfg[1590]: INFO: ethtool-lite: eth1 is disconnected.
Feb 15 01:33:14 netcfg[1590]: INFO: ethtool-lite: eth1 is disconnected.
Feb 15 01:33:14 netcfg[1590]: INFO: ethtool-lite: eth1 is disconnected.
Feb 15 01:33:14 netcfg[1590]: INFO: found no link on interface eth1.
Feb 15 01:33:14 netcfg[1590]: INFO: eth1 is not a wireless interface. Continuing.
Feb 15 01:33:14 kernel: [ 24.491652] bnx2 0000:09:00.0: irq 107 for MSI/MSI-X
Feb 15 ...


Changed in netcfg (Ubuntu):
assignee: nobody → Canonical Foundations Team (canonical-foundations)
Revision history for this message
Robbie Williamson (robbiew) wrote :

I'm thinking if netcfg properly supported mac address designation, then we wouldn't need to hack the "=auto" approach to handle nics that don't report link status. Would the workaround in https://bugs.launchpad.net/ubuntu/+source/netcfg/+bug/56679 help until we resolve this?

Revision history for this message
Robbie Williamson (robbiew) wrote :

BTW, the debian bug to address the solution proposed in the bug above (with a patch provided) is http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=615600.

Revision history for this message
Colin Watson (cjwatson) wrote :

I'm told out of band that the fix for bug 56679 should be sufficient to resolve this in practice. The link detection bug hasn't gone away, so I'm going to leave this bug open, but I'm unassigning it since it doesn't sound as though it needs to be treated with any particular priority. Leave a comment if I'm wrong ...

Changed in netcfg (Ubuntu):
status: New → Confirmed
importance: Undecided → Medium
assignee: Canonical Foundations Team (canonical-foundations) → nobody
Revision history for this message
Adam Koczur (sbv) wrote :

Colin, with all due respect, I think this should be treated with higher priority. I am currently trying to deploy a batch of new servers, and because of this issue the process cannot be fully automated: it keeps asking me to choose a network interface, no matter what value is assigned to 'd-i netcfg/choose_interface select'. It might not be a problem for someone who deploys one box a year. Not to mention that the Red Hat guy sitting next to me keeps laughing and saying how professional and enterprise-grade Ubuntu is. I think the Debian installer should be fixed and finished properly at some point, as the disk partitioner is broken, too. I know that is a separate issue, but try to preseed a more complex partition scheme. This is actually bordering on depressing...

Revision history for this message
Elson Rodriguez (elson-rodriguez) wrote :

Confirming that this is still a bug.

Also chiming in to say that specifying the interface isn't a viable solution for everyone, especially for massive automated installs.

Revision history for this message
Elson Rodriguez (elson-rodriguez) wrote :

For anyone else stuck on this, the workaround is to pass "netcfg/choose_interface=auto" to your kernel.

If you're using cobbler, just `cobbler distro edit --name=<distro> --kopts="netcfg/choose_interface=auto"`

This bug should still be fixed though.

Revision history for this message
cloudbuilders (operations-8) wrote :

I'm having the same problem. We have a farm of 2000+ boxes, and every time I reinstall a box we need to check whether it will boot from eth0 or eth1 and manually add that to txt.cfg.

The option "netcfg/choose_interface=auto" is not working. It will try dhclient on only one interface.

Revision history for this message
petski (petski) wrote :

@operations-8, We had the same issue with Ubuntu 14.04.0, now that I'm using Ubuntu 14.04.5 I don't have these issues anymore. HTH.

Revision history for this message
Sergey Puzirev (spuzirev) wrote :

This bug caused me a lot of pain, but I've found a workaround.
As cloudbuilders said, the option "netcfg/choose_interface=auto" is still not working, and unfortunately BOOTIF=01-${mac-with-hyphens} is not working either.

There is an undocumented feature in the netcfg package: the option "netcfg/choose_interface" accepts the raw MAC address of the NIC to set up. So I passed "netcfg/choose_interface=${netX/mac}" on the kernel command line (${netX/mac} is a variable in iPXE) and everything works fine.

I don't know whether this is possible in PXELINUX, but it is easy in iPXE.
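For setups where only a BOOTIF value is available, the same MAC-based option could be derived from it. The sketch below is a hypothetical helper and not part of netcfg or PXELINUX. It assumes BOOTIF has the form 01-xx-xx-xx-xx-xx-xx (hardware type 01, then the MAC with hyphens), and that netcfg accepts the colon-separated, lowercase form that iPXE produces for ${netX/mac}.

```python
import re
from typing import Optional

# Assumed BOOTIF layout: hardware type "01" (Ethernet), then six hex octets.
_BOOTIF_RE = re.compile(r"^01-((?:[0-9a-fA-F]{2}-){5}[0-9a-fA-F]{2})$")


def bootif_to_mac(bootif: str) -> str:
    """Convert '01-aa-bb-cc-dd-ee-ff' to 'aa:bb:cc:dd:ee:ff'."""
    match = _BOOTIF_RE.match(bootif)
    if match is None:
        raise ValueError(f"unexpected BOOTIF value: {bootif!r}")
    return match.group(1).replace("-", ":").lower()


def choose_interface_arg(cmdline: str) -> Optional[str]:
    """Build a netcfg/choose_interface=<mac> option from a kernel command line.

    Returns None when the command line carries no BOOTIF= token.
    """
    for token in cmdline.split():
        if token.startswith("BOOTIF="):
            mac = bootif_to_mac(token[len("BOOTIF="):])
            return "netcfg/choose_interface=" + mac
    return None
```

For example, `choose_interface_arg(open("/proc/cmdline").read())` would produce the option for the machine it runs on, provided the boot loader passed BOOTIF.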

Revision history for this message
garett (garettmd) wrote :

How has this issue been going on since 2011 with no fix to make `interface=auto` work? Does anyone know where the code is that implements this?

Revision history for this message
skaiser (skaiser) wrote :

I can confirm that this Bug is again present in Debian 9.

Motherboard is Asus AB350 Gaming K4 P2.60
Internal NIC is detected as enp10s0 (nr. 2 in list by `ip address show`) (no wire plugged)
External NIC is detected as enp7s0 (nr. 3 in list by `ip address show`) (wire plugged)

Debian 9 tries to get its DHCP configuration via the internal NIC, fails, and aborts the installation process.

Debian 8 works and chooses the right interface.

Even BOOTIF does not have any effect here.

Revision history for this message
skaiser (skaiser) wrote :

I forgot to mention: this also affects Ubuntu 16.04 LTS in the same way as Debian 9. No preseeding is possible with multiple NICs.

Revision history for this message
skaiser (skaiser) wrote :

The following script is a working workaround for Ubuntu 16 and Debian 9: https://bugs.launchpad.net/ubuntu/+source/netcfg/+bug/56679/comments/5

Revision history for this message
spencerwjensen (spencerwjensen) wrote :

I can also confirm this is a bug. We are using Cobbler to netboot systems (RedHat and Ubuntu). The RedHat systems have no problem using PXE or iPXE with multiple NICs. Ubuntu 16.04 *seems* to work with PXE, but fails during DHCP with iPXE chainloading.

I believe the issue is that the standard PXE process actually passes some network information to the bootloader so the DHCP process chooses the right interface to bring up, but this same information is getting dropped along the way during iPXE chainloading.

This wouldn't be an issue for us because we are actually assigning static IPs in the preseed file, however, the preseed file isn't available until AFTER the NIC is up to pull down the file (chick v. egg). It might be possible to pre-load all of the static network information in the kernel options and disable DHCP altogether, however, this seems a bit obtuse for a process which should be simple and lightweight in the first place.

I think the correct behavior for "interface=auto" is that it actually loops through the interfaces and tries to DHCP on each one until it gets a valid response.

In the meantime, since this bug is SUPER old, I can +1 verify that @spuzirev's workaround with "interface=${netX/mac}" in the kernel options works like a charm and avoids the "BOOTIF=<mac>" hardcoding.

information type: Public → Public Security
information type: Public Security → Public
Revision history for this message
Nicholas Digati (ndigati) wrote :

I can also confirm this bug is still happening with the Ubuntu 18.04.2 installer. Using BOOTIF and "netcfg/choose_interface=auto", the installer defaults to using eno1 (the first interface in the list). These machines all have an external network card that they use to PXE boot.

I see the following messages in the installer syslog which makes it seem like it knows which interface to use but ignores it afterwards and uses a different interface:

Mar 6 03:46:42 netcfg[1195]: INFO: Starting netcfg v.1.142ubuntu7
Mar 6 03:46:42 netcfg[1195]: WARNING **: Couldn't read Wpasupplicant pid file, not trying to kill.
Mar 6 03:46:42 netcfg[1195]: DEBUG: Flusing addresses and routes on interface: eno1
Mar 6 03:46:42 netcfg[1195]: DEBUG: Flusing addresses and routes on interface: eno2
Mar 6 03:46:42 netcfg[1195]: DEBUG: Flusing addresses and routes on interface: eno3
Mar 6 03:46:42 netcfg[1195]: DEBUG: Flusing addresses and routes on interface: eno4
Mar 6 03:46:42 netcfg[1195]: DEBUG: Flusing addresses and routes on interface: enp4s0
Mar 6 03:46:42 netcfg[1195]: INFO: Found interface enp4s0 with link-layer address
Mar 6 03:46:42 netcfg[1195]: INFO: Taking down interface eno1
Mar 6 03:46:42 netcfg[1195]: INFO: Taking down interface eno2
Mar 6 03:46:42 netcfg[1195]: INFO: Taking down interface eno3
Mar 6 03:46:42 netcfg[1195]: INFO: Taking down interface eno4
Mar 6 03:46:42 netcfg[1195]: INFO: Taking down interface enp4s0
Mar 6 03:46:42 netcfg[1195]: INFO: Taking down interface lo
Mar 6 03:46:42 netcfg[1195]: INFO: Activating interface eno1
Mar 6 03:46:42 netcfg[1195]: INFO: Waiting time set to 3
Mar 6 03:46:42 netcfg[1195]: INFO: ethtool-lite: eno1: carrier down
... (repeats message for 3 seconds)
Mar 6 03:46:45 netcfg[1195]: INFO: Reached timeout for link detection on eno1
Mar 6 03:46:45 netcfg[1195]: DEBUG: Commencing network autoconfiguration on eno1

On some of the machines I've tested it appears to work, and on others it doesn't work at all (like the one with the logs above). On the machines where it appears to work, the internal interfaces get different names (enp6s0f0, enp6s0f1), so the external card comes first in the list (with the name enp5s0). So it only seems to work there because it is still just using the first interface in the list.

We worked around this for now by disabling the internal network card/interfaces in the BIOS so that only the external card is detected.

(P.S. Sorry for the information type updates above; I clicked too many times.)

Revision history for this message
mieba (xxadministratorxx) wrote :

On Ubuntu 16.04.x, two settings are needed together in the PXE configuration.

netcfg/choose_interface=auto in the APPEND line does not work on its own; IPAPPEND 2 must be added as well.

Working tftp/default:

DEFAULT ubuntu16.04
LABEL ubuntu16.04
KERNEL http://xxx.com/linux
APPEND initrd=http://xxx.com/initrd.gz auto=true priority=critical netcfg/choose_interface=auto --
IPAPPEND 2

Revision history for this message
Bryan Hill (bryandhill) wrote :

I can confirm that adding "IPAPPEND 2" in my pxe config has fixed the issue for me on Ubuntu 18.04.2

Revision history for this message
Kieran Kunhya (kierank) wrote :

The IPAPPEND solution is only useful if you are netbooting. What about when you are not netbooting and netcfg picks enp10s0 instead of enp7s0?

Revision history for this message
libiao (jonly) wrote :

How do I add IPAPPEND 2 to /var/lib/tftpboot/grub/grub.cfg? What is the equivalent option for http://archive.ubuntu.com/ubuntu/dists/bionic-updates/main/uefi/grub2-amd64/current/grubnetx64.efi.signed ?

Revision history for this message
Bzzz (da-bzzz) wrote :

netcfg/choose_interface=auto does work in 22.04 installers. When setting to manual, any WiFi configuration is ignored (even after selecting the appropriate entry), but auto and a valid config does indeed connect to the predefined network.

Revision history for this message
Chuan Li (lccn) wrote :

Is there any plan or timetable for fixing this issue?

Revision history for this message
Chuan Li (lccn) wrote :

One UA customer ran into this issue on 20.04.
The customer is not using PXE boot to install the server but a customized ISO, so comment #18 is not applicable.

For example, the preseed looks like this:

d-i netcfg/choose_interface select auto
d-i netcfg/disable_autoconfig boolean true
d-i netcfg/get_ipaddress string 10.x.x.x
d-i netcfg/get_netmask string 255.255.255.224
d-i netcfg/get_gateway string 10.x.x.x
d-i netcfg/get_nameservers string 10.x.x.x
d-i netcfg/confirm_static boolean true

When the server has multiple interfaces but only one is connected, 'auto' cannot recognize the right one.
The workaround is to select the right one manually, e.g. 'd-i netcfg/choose_interface select eno5'.

But if different machines have different interfaces connected, the customized ISO has to be modified for each of them.
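One way to avoid hardcoding the name per machine might be to detect the connected interface at install time and generate the preseed line from it. The sketch below is an assumption-laden illustration, not an existing netcfg feature: the function names are hypothetical, and it relies on the kernel's /sys/class/net/<iface>/carrier files, so it shares the limitation from the original report that some NICs do not report link until they are brought up or used.

```python
import os
from typing import List


def linked_interfaces(sysfs_net: str = "/sys/class/net") -> List[str]:
    """Return non-loopback interfaces whose sysfs carrier file reads '1'."""
    linked = []
    for name in sorted(os.listdir(sysfs_net)):
        if name == "lo":
            continue
        try:
            with open(os.path.join(sysfs_net, name, "carrier")) as fh:
                if fh.read().strip() == "1":
                    linked.append(name)
        except OSError:
            # No carrier file, or the kernel refused the read
            # (for example because the interface is administratively down).
            continue
    return linked


def preseed_line(iface: str) -> str:
    """Format the preseed line that pins netcfg to one interface."""
    return f"d-i netcfg/choose_interface select {iface}"
```

If exactly one name comes back, `preseed_line(linked_interfaces()[0])` gives the line to use; zero or several results would still need a fallback such as trying each candidate in turn.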

tags: added: sts