PXE Installation Failure - Ubuntu 14.04.5

Bug #1667790 reported by Ameen Rahman
This bug affects 2 people
Affects: debian-installer (Ubuntu)
Status: Confirmed
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

Steps to Reproduce:
1. Boot into the BIOS setup menu and switch to UEFI mode.
2. Boot from PXE on the NIC.
3. Select Ubuntu 14.04.5 for installation.
4. An error message then appears, as shown in the attachment (/Error_Messages).

When the symptom occurs (network configuration failure during installation), our NIC’s network function still works, i.e., it can still ping the PXE server.

Logs and traces are in the attachment.
          /BB_100G: failed case
                    Install: system installation logs. “syslog” contains more detailed debug messages.
                    bb.pcapng: network trace.
                    dmesg.txt: output of #dmesg
                    ip_address.txt: output of #ip address show
          /Intel_1G: pass case
                    dmesg.txt
                    ip_address.txt
                    intel.zip: zipped network trace.

Quick summary of the test setup:
 Any general PXE server can be used. I followed this page to set up the Ubuntu 14.04 PXE installation (http://www.tecmint.com/add-ubuntu-to-pxe-network-boot/).
 Add the entry below to the PXE installation menu “default” file (/var/lib/tftpboot/pxelinux.cfg/default):

     label linux
                menu label ^Install Ubuntu 14.04 x64
                kernel ubuntu-14.04/ubuntu-installer/amd64/linux
                append ks=http://192.168.10.100/linux/ks.cfg vga=normal initrd=ubuntu-14.04/ubuntu-installer/amd64/initrd.gz ramdisk_size=16432 root=/dev/rd/0 rw netcfg/choose_interface=p4p1 --

 The text highlighted in red must be changed to match your server.
 Change p4p1 to the interface name shown on your DUT during installation; in my case it is p4p1.
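
 As a minimal sketch of which parts of the menu entry vary per setup, the entry can be rendered from two parameters. The function name `pxe_menu_entry` and its parameters are hypothetical, and it is an assumption here that the server address and the interface name are the values that need changing; everything else is copied from the entry above.

```python
def pxe_menu_entry(server_ip: str, interface: str,
                   base: str = "ubuntu-14.04/ubuntu-installer/amd64") -> str:
    """Render the pxelinux.cfg/default entry for one install target.

    server_ip and interface are the site-specific values (assumption);
    base is the TFTP-relative directory holding linux and initrd.gz.
    """
    append = " ".join([
        f"ks=http://{server_ip}/linux/ks.cfg",
        "vga=normal",
        f"initrd={base}/initrd.gz",
        "ramdisk_size=16432",
        "root=/dev/rd/0",
        "rw",
        f"netcfg/choose_interface={interface}",
        "--",  # terminator: two ASCII hyphens, not a dash character
    ])
    return "\n".join([
        "label linux",
        "    menu label ^Install Ubuntu 14.04 x64",
        f"    kernel {base}/linux",
        f"    append {append}",
    ])


if __name__ == "__main__":
    # Values from this report; replace them with your own server and interface.
    print(pxe_menu_entry("192.168.10.100", "p4p1"))
```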

 Copy all files from the Ubuntu 14.04.5 CD to /var/www/html/Ubuntu1404_x64 for HTTP installation.

 Put the kickstart configuration “ks.cfg” in /var/www/html/linux.

 Put the initrd.gz in /var/lib/tftpboot/ubuntu-14.04/ubuntu-installer/amd64.
 Please note that our qed.ko and qede.ko have been injected into this initrd.gz.
 The Ubuntu 14.04 netboot image I am using was downloaded from http://archive.ubuntu.com/ubuntu/dists/trusty-updates/main/installer-amd64/current/images/xenial-netboot/ , so my setup uses xenial’s 4.4 HWE kernel for amd64.
 I referred to the pages below to inject our drivers.
http://tomoconnor.eu/blogish/hacking-initrdgz-ubuntu-netboot-installer/#.WLAHUctPo5s
https://ubuntuforums.org/showthread.php?t=1843448

We see a number of failures in the syslog. Could they be causing this? [We have seen such failures cause installation issues in other Ubuntu bugs.]
There are no failures in the driver logs; the link also shows as connected for the driver interface.

We looked at the Wireshark traces for BB_100G and saw many TFTP (Trivial File Transfer Protocol) packets, which look much the same as in the Intel traces.
Is this issue always reproducible, or has the installation ever passed?

Feb 24 06:31:39 netcfg[3724]: WARNING **: Couldn't read Wpasupplicant pid file, not trying to kill.
Feb 24 06:31:39 netcfg[3724]: DEBUG: Flushing addresses and routes on interface: p4p1
Feb 24 06:31:39 netcfg[3724]: INFO: Could not find valid BOOTIF= entry in /proc/cmdline

Feb 24 06:31:48 netcfg[3724]: DEBUG: rdisc6 line: Soliciting ff02::2 (ff02::2) on p4p1...
Feb 24 06:31:48 netcfg[3724]: DEBUG: rdisc6 line: Timed out.
Feb 24 06:31:48 netcfg[3724]: DEBUG: rdisc6 line: No response.
Feb 24 06:31:48 netcfg[3724]: DEBUG: rdisc6 parsing finished
Feb 24 06:31:48 netcfg[3724]: DEBUG: Stopping rdnssd, PID 3747
Feb 24 06:31:48 netcfg[3724]: DEBUG: No RA received; attempting IPv4 autoconfig
Feb 24 06:31:48 netcfg[3724]: WARNING **: Started DHCP client; PID is 3787

Feb 24 06:31:51 netcfg[3724]: DEBUG: Network config complete
Feb 24 06:31:51 netcfg[3724]: DEBUG: No interface given; clearing /etc/network/interfaces
Feb 24 06:31:51 netcfg[3724]: DEBUG: Writing informative header
Feb 24 06:31:51 netcfg[3724]: DEBUG: Success!
Feb 24 06:31:51 netcfg[3724]: DEBUG: Writing loopback interface
Feb 24 06:31:51 netcfg[3724]: DEBUG: Success!
Feb 24 06:31:51 netcfg[3724]: DEBUG: Writing DHCP stanza for p4p1
Feb 24 06:31:51 netcfg[3724]: DEBUG: Writing wireless options for p4p1
Feb 24 06:31:51 netcfg[3724]: DEBUG: Success!
Feb 24 06:31:52 main-menu[2545]: WARNING **: Configuring 'netcfg' failed with error code 139
Feb 24 06:31:52 main-menu[2545]: WARNING **: Menu item 'netcfg' failed.
Feb 24 06:31:52 kernel: [ 62.915859] netcfg[3724]: segfault at 0 ip 00007f1ab154e34f sp 00007ffc781aec78 error 4 in libc.so.6[7f1ab14b1000+1ba000]

It looks like a network configuration failure.
However, we tried other vendors’ NICs (Intel and Broadcom), and they do not show this failure in the same PXE environment.
The major difference is that those NICs have inbox drivers in the Ubuntu netboot image (initrd.gz).
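
The "error code 139" that main-menu reports for netcfg in the log above is consistent with the common shell convention that a status of 128 + N means the process was killed by signal N. The sketch below decodes such a status; the helper name `describe_exit_status` is hypothetical, and it is an assumption that main-menu reports the status in this shell-style form (the kernel segfault line for the same PID supports that reading).

```python
import signal


def describe_exit_status(status: int) -> str:
    """Interpret a shell-style exit status (128 + N means killed by signal N)."""
    if status > 128:
        signum = status - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Signal number not known on this platform.
            name = f"signal {signum}"
        return f"terminated by {name}"
    return f"exited with code {status}"


if __name__ == "__main__":
    print(describe_exit_status(139))
```

Here 139 - 128 = 11, which is SIGSEGV on Linux, matching the segfault the kernel logged for netcfg.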

Ameen Rahman (arahman) wrote :
Ubuntu Foundations Team Bug Bot (crichton) wrote :

Thank you for taking the time to report this bug and helping to make Ubuntu better. It seems that your bug report is not filed about a specific source package though, rather it is just filed against Ubuntu in general. It is important that bug reports be filed about source packages so that people interested in the package can find the bugs about it. You can find some hints about determining what package your bug might be about at https://wiki.ubuntu.com/Bugs/FindRightPackage. You might also ask for help in the #ubuntu-bugs irc channel on Freenode.

To change the source package that this bug is filed about visit https://bugs.launchpad.net/ubuntu/+bug/1667790/+editstatus and add the package name in the text box next to the word Package.

[This is an automated message. I apologize if it reached you inappropriately; please just reply to this message indicating so.]

tags: added: bot-comment
affects: ubuntu → debian-installer (Ubuntu)
tags: added: trusty
Manish Chopra (mchopra1988) wrote :

Hi Brian,

On the non-working interface we observe the following logs, which relate to the netcfg segfault.
This is the main difference we see between the working interface (where these failure logs do not appear) and the non-working one.

Feb 24 06:31:49 dhclient: DHCPREQUEST of 192.168.10.91 on p4p1 to 255.255.255.255 port 67 (xid=0x419a4dac)
Feb 24 06:31:49 dhclient: DHCPACK of 192.168.10.91 from 192.168.10.100
Feb 24 06:31:49 dhclient: bound to 192.168.10.91 -- renewal in 2494 seconds.
Feb 24 06:31:51 netcfg[3724]: DEBUG: Reading domain name returned via DHCP
Feb 24 06:31:51 netcfg[3724]: DEBUG: DHCP domain name is 'redhatguides.local'
Feb 24 06:31:51 netcfg[3724]: DEBUG: Reading nameservers from /etc/resolv.conf
Feb 24 06:31:51 netcfg[3724]: DEBUG: Read nameserver 192.168.10.1
Feb 24 06:31:51 netcfg[3724]: DEBUG: State is now 1
Feb 24 06:31:51 netcfg[3724]: DEBUG: State is now 2
Feb 24 06:31:51 netcfg[3724]: DEBUG: State is now 5
Feb 24 06:31:51 netcfg[3724]: INFO: DHCP hostname: "kickseed"
Feb 24 06:31:51 netcfg[3724]: DEBUG: kickseed is a valid FQDN
Feb 24 06:31:51 netcfg[3724]: DEBUG: Preseeding domain from global: redhatguides.local
Feb 24 06:31:51 netcfg[3724]: DEBUG: State is now 6
Feb 24 06:31:51 netcfg[3724]: DEBUG: Network config complete
Feb 24 06:31:51 netcfg[3724]: DEBUG: No interface given; clearing /etc/network/interfaces
Feb 24 06:31:51 netcfg[3724]: DEBUG: Writing informative header
Feb 24 06:31:51 netcfg[3724]: DEBUG: Success!
Feb 24 06:31:51 netcfg[3724]: DEBUG: Writing loopback interface
Feb 24 06:31:51 netcfg[3724]: DEBUG: Success!
Feb 24 06:31:51 netcfg[3724]: DEBUG: Writing DHCP stanza for p4p1
Feb 24 06:31:51 netcfg[3724]: DEBUG: Writing wireless options for p4p1
Feb 24 06:31:51 netcfg[3724]: DEBUG: Success!
Feb 24 06:31:52 main-menu[2545]: WARNING **: Configuring 'netcfg' failed with error code 139
Feb 24 06:31:52 main-menu[2545]: WARNING **: Menu item 'netcfg' failed.
Feb 24 06:31:52 kernel: [ 62.915859] netcfg[3724]: segfault at 0 ip 00007f1ab154e34f sp 00007ffc781aec78 error 4 in libc.so.6[7f1ab14b1000+1ba000]

However, we do not know what causes this.
Any idea how to debug the cause of the netcfg segfault further?
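
One possible starting point is to decode the kernel's segfault line itself. The sketch below (the regex and the `decode_segfault` helper are hypothetical, not part of any installer tooling) extracts the faulting address, computes the instruction offset inside the mapped object, and interprets the x86 page-fault error bits, assuming the value in brackets is the base of the libc mapping and that the error field is printed in hex.

```python
import re

SEGFAULT_RE = re.compile(
    r"(?P<prog>\S+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+) "
    r"in (?P<obj>[^\[]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def decode_segfault(line: str) -> dict:
    """Decode a kernel 'segfault at ...' log line into its components."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        raise ValueError("not a segfault line")
    err = int(m["err"], 16)
    return {
        "program": m["prog"],
        "pid": int(m["pid"]),
        "fault_address": int(m["addr"], 16),
        "object": m["obj"],
        # Instruction pointer relative to the start of the mapping.
        "offset_in_object": int(m["ip"], 16) - int(m["base"], 16),
        # x86 page-fault error code bits.
        "page_present": bool(err & 1),
        "access": "write" if err & 2 else "read",
        "user_mode": bool(err & 4),
    }


if __name__ == "__main__":
    line = ("Feb 24 06:31:52 kernel: [ 62.915859] netcfg[3724]: segfault at 0 "
            "ip 00007f1ab154e34f sp 00007ffc781aec78 error 4 "
            "in libc.so.6[7f1ab14b1000+1ba000]")
    info = decode_segfault(line)
    print(hex(info["offset_in_object"]), info["access"], info["fault_address"])
```

For the line in this report, that gives a user-mode read of address 0 (a NULL pointer dereference) at offset 0x9d34f into libc.so.6. With the matching libc build and its debug symbols, that offset could be resolved to a function name using a tool such as addr2line or gdb.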

We appreciate your help with this.

Thanks,
Manish

Ameen Rahman (arahman) wrote :

Hi,

This failure was the result of the netcfg segfault seen in the logs.

------------------8<--------------->8----------------------------
Feb 24 06:31:51 netcfg[3724]: DEBUG: Writing wireless options for p4p1
Feb 24 06:31:51 netcfg[3724]: DEBUG: Success!
Feb 24 06:31:52 main-menu[2545]: WARNING **: Configuring 'netcfg' failed with error code 139
Feb 24 06:31:52 main-menu[2545]: WARNING **: Menu item 'netcfg' failed.
Feb 24 06:31:52 kernel: [ 62.915859] netcfg[3724]: segfault at 0 ip 00007f1ab154e34f sp 00007ffc781aec78 error 4 in libc.so.6[7f1ab14b1000+1ba000]
------------------8<--------------->8----------------------------

We have root-caused a bug in the qede driver that was triggering this.

The qede IOCTL implementation returns '0' instead of -EOPNOTSUPP for all IOCTL requests made by the user.
The wireless-lib used by netcfg during the PXE boot process checks whether our interface is wireless by sending an IOCTL. Because we return '0', it concludes that we are a wireless interface and applies some incorrect configuration. Later, netcfg segfaults because of the broken information.
We fixed the qede IOCTL callback to return -EOPNOTSUPP for everything other than the timesync requests.
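
The mechanism described above can be illustrated with a toy userspace model; this is not the driver code. The handler and probe names are hypothetical. It is an assumption that the wireless probe uses the wireless-extensions SIOCGIWNAME request and treats any non-negative return as "wireless", and SIOCSHWTSTAMP stands in for the timesync requests.

```python
import errno

SIOCGIWNAME = 0x8B01    # wireless-extensions "get protocol name" (assumed probe)
SIOCSHWTSTAMP = 0x89B0  # hardware timestamping, standing in for "timesync"


def buggy_ioctl(cmd: int) -> int:
    """Model of the bug: every request is reported as successful."""
    return 0


def fixed_ioctl(cmd: int) -> int:
    """Model of the fix: only the supported request succeeds."""
    if cmd == SIOCSHWTSTAMP:
        return 0
    return -errno.EOPNOTSUPP


def looks_wireless(driver_ioctl) -> bool:
    """Model of the probe: success on SIOCGIWNAME is read as 'wireless'."""
    return driver_ioctl(SIOCGIWNAME) >= 0


if __name__ == "__main__":
    print("buggy handler looks wireless:", looks_wireless(buggy_ioctl))
    print("fixed handler looks wireless:", looks_wireless(fixed_ioctl))
```

With the buggy handler, the wired interface is classified as wireless, so the caller goes on to use wireless data that was never filled in. With the fixed handler, the probe fails cleanly and the interface is treated as wired.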

That said, does netcfg also need a fix so that it fails gracefully in this case with a meaningful error message rather than segfaulting, which would make such issues easier to debug?

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in debian-installer (Ubuntu):
status: New → Confirmed