LTSP clients kernel panic on boot when active nic isn't eth0

Bug #365380 reported by Kenneth Finnegan on 2009-04-23
This bug affects 5 people
Affects: ltsp (Ubuntu)
Importance: Undecided
Assigned to: Unassigned

Bug Description

Upgraded my LTSP server from Hardy to Jaunty with a clean install. I can boot my laptop as a client no problem, but one of my other desktops fails. It worked fine with Hardy.

It gets an IP address, downloads a kernel image, and then sits at the boot splash screen as the bar bounces back and forth, then the screen goes blank and it displays:
[ 63.564633] Kernel panic - not syncing: Attempted to kill init!

ProblemType: Bug
Architecture: i386
DistroRelease: Ubuntu 9.04
NonfreeKernelModules: nvidia
Package: ltsp-server 5.1.65-0ubuntu2
PackageArchitecture: all
ProcEnviron:
 SHELL=/bin/bash
 LANG=en_US.UTF-8
SourcePackage: ltsp
Uname: Linux 2.6.28-11-generic i686

Can you try booting the thin client without the splash and quiet options
(you can change that in /var/lib/tftpboot/ltsp/i386/pxelinux.cfg/default
on the server), that should give you a bit more information on what went
wrong.
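
For reference, here is a minimal sketch of stripping those two options, assuming the kernel options sit on a line that begins with `append` (the sample line in the comment is an assumption, not copied from a real Jaunty install). The helper name `strip_splash_quiet` is made up for illustration; it filters stdin to stdout, so nothing is edited in place:

```shell
# Hypothetical helper: drop "quiet" and "splash" from pxelinux APPEND lines.
# Assumed sample input line:  append ro initrd=initrd.img quiet splash
strip_splash_quiet() {
    awk 'tolower($1) == "append" {
             out = $1
             # Keep every option except the two that hide boot messages.
             for (i = 2; i <= NF; i++)
                 if ($i != "quiet" && $i != "splash")
                     out = out " " $i
             print out
             next
         }
         { print }'
}
```

One possible use is `strip_splash_quiet < default > default.new`, then reviewing `default.new` before copying it over the original.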

I'm especially looking at an error from either ipconfig (network
configuration) or nbd (connection to the network harddisk).
Is the problem happening with similar network cards? Do these thin
clients have only one network card, or two?

Thanks

Alright. Got it: It has a second NIC, which it happens to set as eth0, tries to mount the nbd on it, fails since it isn't plugged in, and dies. The workaround was either connecting the second port to the network, or uninstalling it completely. I'm surprised it isn't smart enough to retry on eth1 once it fails on eth0.

Kenneth Finnegan wrote:
> Alright. Got it: It has a second NIC, which it happens to set as eth0,
> tries to mount the nbd on it, fails since it isn't plugged in, and dies.
> The workaround was either connecting the second port to the network, or
> uninstall it completely. I'm surprised it isn't smart enough to retry
> on eth1 once it fails on eth0.

We don't have any test for that in the initrd and IIRC the code comes
directly from Debian so not that easy to change. There may be a way to
manually specify the network interface as a boot parameter, but for that
you'll need to create a custom pxelinux config for each of these clients.

This problem exists on the Asus EEE PC-900 (output of `lspci -vvv` from a locally installed Jaunty attached).

Even disabling the embedded WLAN device in the BIOS doesn't resolve this issue.

The output is (after about 4 seconds initializing usb):

ipconfig: eth0: SIOCGIFINDEX: No such device
ipconfig: no devices to configure
/init: .: line 1: can't open /tmp/net-eth0.conf
kernel panic - not syncing: Attempted to kill init!

Other references to this problem:
 * https://lists.ubuntu.com/archives/edubuntu-users/2009-April/005315.html - LTSP boot problem
 * http://ubuntuforums.org/archive/index.php/t-975417.html - Diskless broken on Intrepid

summary: - Some LTSP clients kernel panic on boot
+ LTSP clients kernel panic on boot when active nic isn't eth0
Nick (xepecine) wrote :

I have the same problem: after upgrading to Jaunty, one of the clients refuses to boot and ends in a kernel panic. But I have only one NIC on the client side.

That's what I've got:

Begin: Loading essential drivers
...
8139too: Fast Ethernet driver 0.9.28
8139cp: 10/100 PCI Ethernet Driver v 1.3 (Mar 22,2004)
...
...............
ipconfig: eth0: SIOCGIFINDEX: No such device
ipconfig: no devices to configure
/init: .: line 1: can't open /tmp/net-eth0.conf
kernel panic - not syncing: Attempted to kill init!

So it has found the Ethernet adapter and has all the drivers, but can't see it as eth0.

When I boot from the 7.10 live CD, lshw gives the following:

Description: Ethernet interface
...
Logical name: eth0
...
driver=8139too module=8139too..

So 7.10 finds only one Ethernet interface and sees it as eth0.

Oliver Grawert (ogra) wrote :

As a workaround, you can put ip=:::::eth0 (changing eth0 to whatever your device is) after splash in /var/lib/tftpboot/ltsp/i386/pxelinux.cfg/default
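
A small sketch of that edit, assuming the kernel options live on a line starting with `append`. The helper name `add_boot_param` and the `eth1` value are illustrative only. It reads the config on stdin and writes the modified copy to stdout:

```shell
# Hypothetical helper: append one kernel parameter to every APPEND line.
# $1 is the parameter, e.g. ip=:::::eth1 (five empty fields, then the device).
add_boot_param() {
    awk -v param="$1" '
        tolower($1) == "append" { print $0 " " param; next }
        { print }'
}
```

For example, `add_boot_param 'ip=:::::eth1' < default > default.new` leaves the original file untouched so the result can be checked first.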

Nick (xepecine) wrote :

This doesn't help.

In fact, when I booted from the 9.04 live CD, lshw showed the following difference from the 7.10 live CD:

9.04:
/0/100/5 eth0 network RTL-8139/8139C/8139C+
/1 pan0 network Ethernet interface

7.10:
/0/100/5 eth0 network RTL-8139/8139C/8139C+

When I put ip=:::::eth0 in /var/lib/tftpboot/ltsp/i386/pxelinux.cfg/default I get exactly the same result:

ipconfig: eth0: SIOCGIFINDEX: No such device
ipconfig: no devices to configure
/init: .: line 1: can't open /tmp/net-eth0.conf
kernel panic - not syncing: Attempted to kill init!

How else can I see what devices are there on the client side?

Guevara (eguevara2012) wrote :

I am using Ubuntu 9.04 Desktop with server kernel 2.6.28.13. After upgrading the chroot, this message appears on the clients:

/init: .: line 1: can't open /tmp/net-eth0.conf
kernel panic - not syncing: Attempted to kill init!

NetworkManager shows eth0 as "Auto eth0". Does that have something to do with this bug?

Before posting, I searched the forum for an answer and tried a few things without success.

Does anyone know the solution to this problem?

Guevara (eguevara2012) wrote :

I tried this and it did not work:

"The good people at IRC #ltsp helped me solve the problem

here's what I did to get PXE boot to work with the marvell/sky2 driver

sudo nano /opt/ltsp/i386/etc/initramfs-tools/modules and add "sky2"
sudo nano /opt/ltsp/i386/usr/share/initramfs-tools/hook-functions and
add "sky2" at the end of this line:
r8169 s2io sis900 skge slhc smc911x starfire sky2 \
sudo chroot /opt/ltsp/i386 update-initramfs -u
sudo ltsp-update-kernels
and problem solved! "

My server uses an r8169 and an RTL8139, and the clients use RTL8139. I put the 8139too and r8169 modules in the modules file and the problem persists.

http://ubuntuforums.org/archive/index.php/t-874568.html

I had the same error as #1. The problem was DHCP: there were no more free IPs in the pool.

For some reason the client would actually get an IP from DHCP at PXE boot. It used it to download the kernel, but then it tried again before the final LTSP boot, failing this time.

Making the DHCP IP pool bigger solved the problem.
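
A quick way to sanity-check the pool size against the number of clients, sketched under the assumption that the pool is a single contiguous IPv4 range (as in a dhcpd.conf `range` statement). The helper names and the addresses in the comment are examples only:

```shell
# Hypothetical helpers: count the addresses in an inclusive IPv4 range.
ip_to_int() {
    # Split the dotted quad on "." and fold the four octets into one integer.
    _old_ifs=$IFS
    IFS=.
    set -- $1
    IFS=$_old_ifs
    echo $(( (($1 * 256 + $2) * 256 + $3) * 256 + $4 ))
}

pool_size() {
    # $1: first address in the range, $2: last address in the range
    echo $(( $(ip_to_int "$2") - $(ip_to_int "$1") + 1 ))
}

# Example: pool_size 192.168.0.20 192.168.0.250
```

Because the client described above requests a lease once at PXE boot and again later in the boot, it seems prudent to leave some headroom beyond one address per machine.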

Sameer Verma (sverma-sfsu) wrote :

We have Dell Optiplex 745 machines as thin clients for a Jaunty LTSP setup. Everything works well on 8.10 LTSP, but on Jaunty, on the client side I see (I am typing what I see):

tg3: eth0: Link is up at 100 Mbps, full duplex.
tg3: eth0: Flow control is off for TX and off for RX.
ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
IP-Config: no response after 60 secs - giving up
/init: .: line 1: can't open /tmp/net-eth0.conf
Kernel panic - not syncing: Attempted to kill init!
Dumping ftrace buffer:
(ftrace buffer empty)

This is preventing me from switching an entire lab from 8.10 to 9.04.

Oliver Grawert (ogra) wrote :

Doesn't the workaround mentioned above work for you?

Sameer Verma (sverma-sfsu) wrote :

Hi Oli,

I tried ip=:::::eth0 after the splash in /var/lib/tftpboot/ltsp/i386/pxelinux.cfg/default and ltsp-update-image but that doesn't help. The boot hangs for a bit longer, but I still get the same message.

I've verified the driver to be tg3, and that seems to be loaded. The addition of pan0 as an interface seems suspicious.

I have not tried increasing the DHCP pool. Will try that as well in a bit.

Sameer Verma (sverma-sfsu) wrote :

Increasing the pool didn't help. It's currently at 192.168.0.20 to 192.168.0.250.

When this specific thin client boots up, it gets 192.168.0.210. So, if I edit /var/lib/tftpboot/i386/pxelinux.cfg/default and add ip=192.168.0.210:::::eth0, the machine still hangs at the same point, but doesn't kernel panic. I get dropped to a command prompt (initramfs, I think? I'm not at the machine right now).

Any pointers?

Alkis Georgopoulos (alkisg) wrote :

First, try to put a *complete* ip=xxx parameter there:
<client-ip>:<server-ip>:<gw-ip>:<netmask>:<hostname>:<device>:<autoconf>
Fill *all* of the above, with autoconf=none.

If that works, then you're certain it's a DHCP problem.
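
To make the field order concrete, here is a sketch that assembles the parameter from its seven parts. The helper name `build_ip_param`, the server and gateway addresses, and the hostname are placeholders; only the client address and the netmask-style layout come from this thread:

```shell
# Hypothetical helper: build a complete ip= kernel parameter.
# Arguments, in order: client-ip server-ip gw-ip netmask hostname device autoconf
build_ip_param() {
    if [ "$#" -ne 7 ]; then
        echo "usage: build_ip_param CLIENT SERVER GW NETMASK HOSTNAME DEVICE AUTOCONF" >&2
        return 1
    fi
    printf 'ip=%s:%s:%s:%s:%s:%s:%s\n' "$1" "$2" "$3" "$4" "$5" "$6" "$7"
}

# Example with placeholder addresses:
# build_ip_param 192.168.0.210 192.168.0.1 192.168.0.1 255.255.255.0 client1 eth0 none
```

The resulting string goes on the client's APPEND line in place of any shorter `ip=` value.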

Karmic contains a new script (/opt/ltsp/i386/usr/share/initramfs-tools/scripts/init-premount/udhcp) that can handle multiple NICs. So one way around the problem would be to copy the Karmic script into your chroot, and either install udhcpc in the chroot or use IPAPPEND 3 in the pxelinux.cfg/default.
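
To illustrate the IPAPPEND route: with IPAPPEND 3, PXELINUX adds both an `ip=` option and a `BOOTIF=01-<mac>` option naming the NIC that PXE booted from. The sketch below is not the Karmic script. It only shows one way an initramfs script could map BOOTIF to an interface name by comparing against the `address` files under /sys/class/net. The function names and the sample MAC are hypothetical:

```shell
# Hypothetical helpers: find the interface that PXE booted from.
bootif_to_mac() {
    # BOOTIF=01-00-1a-2b-3c-4d-5e  ->  00:1a:2b:3c:4d:5e
    # Drop the leading hardware-type byte, turn dashes into colons, lowercase.
    echo "${1#BOOTIF=}" |
        sed -e 's/^[0-9A-Fa-f][0-9A-Fa-f]-//' -e 's/-/:/g' |
        tr 'A-F' 'a-f'
}

find_nic_by_mac() {
    # $1: MAC address, $2: sysfs network directory (default /sys/class/net)
    _dir=${2:-/sys/class/net}
    for _addr in "$_dir"/*/address; do
        [ -r "$_addr" ] || continue
        if [ "$(cat "$_addr")" = "$1" ]; then
            # The interface name is the directory holding the address file.
            basename "$(dirname "$_addr")"
            return 0
        fi
    done
    return 1
}
```

Matching on the MAC avoids guessing whether the cabled NIC came up as eth0 or eth1, which is the failure described at the top of this report.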

Instead of copying the Karmic script to your chroot you could also add stgraber's PPA (https://launchpad.net/~stgraber/+archive/ppa) to your (both server & chroot) sources. This would get you the latest LTSP along with the script, but you would still need to either install udhcpc or use IPAPPEND 3.

And of course, after any initramfs modification, update-initramfs -u needs to be run in the chroot. And after any chroot modification (such as installing udhcpc), ltsp-update-image needs to be run.
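
A sketch of that rebuild sequence, using only commands already mentioned in this report. The chroot path is the default seen earlier in the thread, the ordering is an assumption, and the `run` wrapper and function name are made up so the steps can be previewed with DRY_RUN=1 before running them as root:

```shell
# Assumed default chroot location; override by exporting CHROOT.
CHROOT=${CHROOT:-/opt/ltsp/i386}

run() {
    # Print the command when DRY_RUN is set, otherwise execute it.
    if [ -n "${DRY_RUN:-}" ]; then
        echo "$*"
    else
        "$@"
    fi
}

refresh_ltsp_chroot() {
    # Regenerate the initramfs inside the chroot, then publish the new
    # kernel/initrd and rebuild the client image. Stops at the first failure.
    run chroot "$CHROOT" update-initramfs -u &&
    run ltsp-update-kernels &&
    run ltsp-update-image
}

# Preview only:  DRY_RUN=1 refresh_ltsp_chroot
```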

Nick (xepecine) wrote :

I've solved this problem by adding acpi=off to my /var/lib/tftpboot/ltsp/i386/pxelinux.cfg/default.

Finally I have normal boot for this damn client!!!

Nick (xepecine) wrote :

It's better to put pci=noacpi instead of acpi=off so the client is still able to shut down.

Changed in ltsp (Ubuntu):
status: New → Incomplete
Sameer Verma (sverma-sfsu) wrote :

After mucking around with Jaunty LTSP with Karmic LTSP PPA, initramfs, etc. I decided to jump into Karmic Beta itself. Our LTSP server has two drives, so I can afford to experiment with an alternative setup while keeping the working one (LTSP 8.10) on the other drive.

Karmic LTSP based on the alternate beta release from a few days ago had problems detecting my network cards. It detected only one (eth0), so I had to hand-edit eth1 after the install. It also failed to get an IP via DHCP, so I had to manually enter the IP info for eth0. Even after this install, the thin client would load, but not authenticate (cannot connect to server). Also, udhcpc would cycle five times and fail four times until it got an IP on the fifth try. In none of the cases did I see the new Karmic swooshing KnightRider-like bar. Instead I see all the udhcpc failure messages.

After updating the install today (6pm Pacific) and rebuilding the thin client, it finally worked. It still boots up and shows udhcpc failures four or five times until it eventually gets the IP, after which it loads the thin client pretty fast and shows the default login screen. This works for all 33 machines in our classroom/lab. Lots of LTSP screens tickle your fancy? See http://twitgoo.com/40p4m

I'll test with a full class (33 stations) on Monday and report back if anything breaks. We'll be brave and run the rest of the semester on Karmic!

Dan Shechter (dans) wrote :

While I'm not using LTSP myself, I am using NFS Boot with PXE, and I'm experiencing the same DHCP woes that have been mentioned here.

I've followed the advice by Alkis Georgopoulos, and it really did help.
After testing out how specifying a complete ip=xxx works (Like Alkis specified: <client-ip>:<server-ip>:<gw-ip>:<netmask>:<hostname>:<device>:<autoconf>)
and seeing the client boot, I proceeded to replace / upgrade the initramfs-tools as Alkis suggested.
For those who are not necessarily experienced with this: what I did was download the ltsp source distribution and copy the following files into /etc/initramfs-tools:
./scripts/init-premount/udhcp
./hooks/udhcp

and regenerated the initrd file by issuing a:
mkinitramfs -o /tmp/initrd-pxeboot

This is the initrd file I use to boot from NFS successfully.
As a side note, I did add an "IPAPPEND 3" line to the pxelinux.cfg/default file, as the udhcp script in
"./scripts/init-premount/udhcp" suggests doing...
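
The copy step above can be sketched as follows. The function name is made up, and the source directory argument is an assumption: it stands for whichever directory in the unpacked ltsp source contains the two relative paths listed above.

```shell
# Hypothetical helper: copy the two udhcp scripts into an initramfs-tools tree.
# $1: directory containing scripts/init-premount/udhcp and hooks/udhcp
# $2: destination tree (default /etc/initramfs-tools)
install_udhcp_scripts() {
    _src=$1
    _dest=${2:-/etc/initramfs-tools}
    for _rel in scripts/init-premount/udhcp hooks/udhcp; do
        mkdir -p "$_dest/$(dirname "$_rel")" || return 1
        cp "$_src/$_rel" "$_dest/$_rel" || return 1
        # initramfs-tools only runs hooks and scripts that are executable.
        chmod +x "$_dest/$_rel" || return 1
    done
}

# Afterwards, regenerate the initrd as described above:
#   mkinitramfs -o /tmp/initrd-pxeboot
```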

Hope this helps...

Alkis Georgopoulos (alkisg) wrote :

We are closing this bug report because it lacks the information we need to investigate the problem, as described in the previous comments. Please reopen it if you can give us the missing information, and don't hesitate to submit bug reports in the future. To reopen the bug report you can click on the current status, under the Status column, and change the Status back to "New". Thanks again!

Changed in ltsp (Ubuntu):
status: Incomplete → Invalid