/etc/init.d/ltsp-client-setup has bashisms

Bug #281498 reported by Jordan Erickson on 2008-10-10
Affects: ltsp (Ubuntu)
Importance: Undecided
Assigned to: Scott Balneaves

Bug Description

Please see the following #ltsp conversation regarding ltsp-client-setup errors. Ogra should know about this bug already. There might be repercussions to the specific block of code in question, but I know that the bashisms shouldn't be there on line 27. Also see https://lists.ubuntu.com/archives/edubuntu-users/2008-October/004652.html for a list discussion.

Cheers,
Jordan/Lns
----
<Lns_onsite> Anyone know anything about this regarding LTSP? I'm on-site at one school and I'm seeing, during TC bootup, something similar to the following right after "Setting up LTSP Client" - ..."Negotiation: Invalid operator" - and sometimes it hangs, up to 3-5 minutes, then boots to LDM. I *just* turned on NBD_SWAP this week and this is happening at 2 sites...also upgraded the chroot with hardy-updates - and that's it. All I've found is this. https://bugs.launchpad.net/ubuntu/+source/linux-source-2.6.20/+bug/136410/
 It usually only happens when booting a bunch of clients at once
 I get nothing useful in chroot logs or on the server
<-- Q-FUNK has quit ("Leaving.")
 mccann has quit (Read error: 110 (Connection timed out))
<Lns_onsite> [ 27: ?: unexpected operator
 ^^ That's exactly what it says when it hangs
<-- staffencasa has quit ("Leaving")
<Lns_onsite> So weird, googling shows a lot of kernel-level stuff related to that msg
--> johnny (<email address hidden>) has joined #ltsp
<Lns_onsite> wow someone is alive here! =) johnny, have you ever seen " [ 27: ?: unexpected operator" while booting a TC?
<Ryan52> Lns_onsite: maybe an evil bashism while your using a non bash shell as /bin/sh?
<johnny> yes
 sounds like it to me
<Lns_onsite> Ryan52: possibly, yes
<Ryan52> Lns_onsite: so change your /bin/sh :)
<Lns_onsite> althoguh I'm not sure where it could be, it references no files
<johnny> that's cuz Ryan52 is smarter than me
<Lns_onsite> ;)
 Either something in nbdswapd or in one of the updates in hardy-updates is causing this it seems
 it comes right after "Setting up LTSP client"
 is that ltsp-client-setup script?
 (yep it is)
<monteslu> Lns_onsite, nbd has caused me so much grief over the last couple of week that I'm switching to fedora tonight
<Lns_onsite> monteslu: heh, are you having the same issues? What does fedora use, nfs?
<monteslu> it can use nbd too, but it doesn't by default
 yeah nfs
<Lns_onsite> I haven't had any issues with NBD until i enabled nbdswap
<monteslu> I've posted questions to the edubuntu-users list
 I didn't enable anything, but I've been getting tons of nbd error in /var/log/messages
 so the default settings are somehow hosed
<Lns_onsite> monteslu: have you updated your chroot?
<monteslu> yup
<Lns_onsite> was it happening before you did that?
<monteslu> even followed Jordan's advice on the list
 before and after
<Lns_onsite> heh
* Lns_onsite is Jordan
<monteslu> hehe
 hey, I updated your ticket
<Lns_onsite> which one is that? to use *-updates ?
<monteslu> yeah
 well, i added my 2cents anyway
<Lns_onsite> ty =) I've been updating the ubuntu ltsp wiki with some of this stuff too
<monteslu> it seemed to be working for a day
<Lns_onsite> I'm gonna go turn off swap real quick and reboot about 15 stations
 brb
<monteslu> I've got 70 thin clients and most of them seem to be creating the I/O errors about nbd in the log
<Lns_onsite> hmm, ok turning off nbd_swap in lts.conf and rebooting about 15 clients didn't give me that msg or hang at ALL
 It's got to be something to do with that
 and if it's just some typo in ltsp-client-setup script...should be trivial to fix
 as Ryan52 said maybe i'll just change /bin/sh to /bin/bash... that'll probably *$#@ the whole chroot up though ;)
<monteslu> Lns_onsite, you mean hang on the client startup?
<Lns_onsite> monteslu: yes
<monteslu> This thing is grinding my server down.
 At least I think it is
 I can't even do a normal boot
<Lns_onsite> monteslu: no, my server seems to be fine... althoguh it's a 2x dualcore xeon 1.6GHz, 8GB RAM...
<monteslu> Mine has 8 cores and 16 gigs :)
<Lns_onsite> hehe
* Lns_onsite got beat
<monteslu> ooh, and I have two of them
 my ldap server died though, so I'm just using one right now. Which means I can test fedora9 on the other tonight
<Lns_onsite> Ryan52: would the 27 in that "unexpected operator" be the line number of the script?
* Lns_onsite doesn't like this cold server room
<monteslu> beats the hell out of a loud server room with all the alarms going off :)
 i mean a warm one
<Lns_onsite> monteslu: ...yes, yes it does. :p
 brb
<Ryan52> Lns_onsite: I'm pretty sure it is.
 Lns_onsite: changing /bin/sh to bash won't break anything.
 Lns_onsite: bash supports like everything.
 Lns_onsite: if your sure you know which script it is, then just change the shebang to bash. that's probably safer.
<-- japerry has quit ()
<Lns_onsite> Ryan52: Just did that, and it seems to still hang, with an additional msg.. "67: Too many arguments"
 Line 67 in ltsp-client-setup is: for i in 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14; do
             if [ -n $(ps ax|grep nbd$i) ]; then
                 NBD_SWAP_DEVICE="/dev/nbd$(($i+1))";
             fi;
 (the if line)
<Ryan52> ugh. switch to nfs and report the bug, is what I'd do. I can't believe that this problem was introduced in a stable update.
<Lns_onsite> Ryan52: do you see anything wrong with those lines, or are you just talking in general?
<Ryan52> oh, wait.
 [ -n $(ps ax|grep nbd$i) ]
 change to: [ -n "$(ps ax|grep nbd$i)" ]
 I didn't see that at first, but ya, that should fix it.
<Lns_onsite> woa
 cool :) ok will try that now
 that'd be so funny if this was the whole issue
--- rangerpb is now known as rangerhomezzz
<Lns_onsite> yikes...now they're *all* hanging
 "Setting up LTSP Client..."
 "Negotiation: "
 (hangs there)
 bunch of 'dd' proceses running currently
 now some are booted up
 I'm guessing dd is used to create swap files or some crazy thing
----
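
Note on the quoting fix suggested in the log above: a minimal sketch, under stated assumptions, of why an unquoted command substitution inside [ -n ... ] misbehaves. The helper names test_unquoted and test_quoted are hypothetical and exist only for this illustration; they are not part of ltsp-client-setup. An empty expansion leaves [ -n ], which is true, and a multi-word expansion hands [ too many arguments, which is an error. The "?" in the reported message is plausibly the TTY column of the ps ax output, though the log does not confirm that.

```shell
#!/bin/sh
# Illustration only: [ -n ... ] with an unquoted vs. a quoted value.
# test_unquoted and test_quoted are hypothetical helper names.

test_unquoted() {
    # $1 is deliberately left unquoted, so the shell word-splits it
    # before [ ever sees it. Errors from [ are hidden on purpose.
    if [ -n $1 ] 2>/dev/null; then
        echo "true"
    else
        echo "false"
    fi
}

test_quoted() {
    # Quoted: [ always receives exactly one operand after -n.
    if [ -n "$1" ]; then
        echo "true"
    else
        echo "false"
    fi
}

test_unquoted ""        # prints "true":  [ -n ] only checks that "-n" is non-empty
test_quoted ""          # prints "false": the string really is empty
test_unquoted "a b c"   # prints "false": [ -n a b c ] is an error, not a match
test_quoted "a b c"     # prints "true":  one non-empty operand
```

With the unquoted form the result is wrong in both directions: no matching process reads as "found", and a matching multi-word line reads as "not found" after printing an error.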


Jordan Erickson (lns) wrote :

UPDATE: At 2 school sites, I have fixed the quoting issue to reflect the following:

---
if [ -n "$(ps ax|grep nbd$i)" ]; then
---

and went through the motions - rebuilt the image, rebooted thin-clients, etc. etc...

This seems to have helped with the "Invalid operator" message at TC bootup, but it has definitely NOT fixed the issue of clients sitting at "Negotiation: " during nbd_swap file creation on the server. I see 'dd' processes running on the server with little CPU time, yet I/O seems to be starved when several of them run simultaneously (for example when more than a few thin clients are booted at once, which is common practice in a lab full of thin clients at the beginning of the school day). The clients sit for 2-4 minutes, depending on how many are booting at the same time. Because of the apparent I/O saturation, other server functions such as DHCP cannot respond in time, so some thin clients with flash memory fall back to booting from flash after failing to obtain an IP address via PXE/network boot.

I have tried what was suggested to me in #ltsp: the "elevator=deadline" I/O scheduler setting, which is supposedly better suited to LTSP environments. On the first test this was "better" than the previous bootups, but many machines still booted to flash because they did not obtain an IP address.
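
For anyone repeating the scheduler test, here is a rough sketch of how to check which I/O scheduler a disk is actually using. It assumes the kernel exposes a queue/scheduler file under /sys/block and marks the active scheduler with square brackets. The helper name active_scheduler and the device name sda are assumptions made for illustration only.

```shell
#!/bin/sh
# active_scheduler is a hypothetical helper. It prints the scheduler that
# is enclosed in square brackets in a queue/scheduler style file,
# e.g. "noop anticipatory [deadline] cfq" gives "deadline".
active_scheduler() {
    sed -n 's/.*\[\([^]]*\)].*/\1/p' "$1"
}

# Example usage (sda is an assumed device name):
#   active_scheduler /sys/block/sda/queue/scheduler
# Switching at runtime, as root, and only until the next reboot:
#   echo deadline > /sys/block/sda/queue/scheduler
```

Checking the active scheduler after a reboot is a cheap way to confirm that the elevator= boot parameter was really picked up before comparing boot times.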

It would be greatly appreciated to have someone help with this issue ASAP. It is critical to so many LTSP sites that I'm wondering how anyone can reliably use nbd_swap functionality in Ubuntu. Thank you, and as always, I will help test any proposed solution/patch!

Cheers,
Jordan/Lns

Changed in ltsp:
assignee: nobody → sbalneav
status: New → Confirmed
Jordan Erickson (lns) wrote :

Bump? Is there anything I can do at all to help move this bug along? This is a pretty big issue for booting up thin-clients. Doubles (triples?) the time to boot up when it hangs here. :(

- Jordan/Lns

Jordan Erickson (lns) wrote :

I should also mention that I was at a school site a few days ago and noticed this bug in action (again) - it seems to cause other booting LTSP clients to not get a DHCP lease if dhcpd is on the same server (maybe due to heavy server I/O when building other swapfiles?). This causes about half the clients to fail to boot, so the person in charge has to boot/reboot until they get a lease. Pretty frustrating for me just that once being on site; I can't imagine what it's like for the techs that have to deal with it every day. :(

Cheers,
Jordan/Lns

Oliver Grawert (ogra) wrote :

that code is horrible overall ... it iterates over nbd0-15, if it finds nbd0 being used it automatically assumes nbd1 to be free and assigns it as swapdevice without checking if nbd1 isn't taken as well ... you are indeed right about the quoting issue (though it's not a bashism, it's just buggy code), but essentially the whole function needs a rewrite in a more robust manner ...
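
One possible shape for such a rewrite is sketched below. This is not the code that shipped in ltsp. It assumes that the kernel exposes /sys/block/nbdN/pid only while nbdN is connected, so a missing pid file is taken to mean the device is free. The function name first_free_nbd and its optional sysfs-root argument are hypothetical, the latter added so the logic can be tried against a scratch directory.

```shell
#!/bin/sh
# first_free_nbd is a hypothetical helper, not the upstream fix.
# Assumption: /sys/block/nbdN/pid exists only while nbdN is in use.
# It checks each device itself instead of guessing that "the next one"
# after a busy device must be free.
first_free_nbd() {
    sysfs_root="${1:-/sys/block}"
    i=0
    # Walk nbd0, nbd1, ... for as long as the kernel lists the device.
    while [ -d "$sysfs_root/nbd$i" ]; do
        if [ ! -e "$sysfs_root/nbd$i/pid" ]; then
            echo "/dev/nbd$i"
            return 0
        fi
        i=$((i + 1))
    done
    # Every listed device is busy (or none exist).
    return 1
}

# Example usage:
#   NBD_SWAP_DEVICE=$(first_free_nbd) || echo "no free nbd device" >&2
```

Reading sysfs also avoids two pitfalls of matching ps output with grep: the grep process can match its own command line, and a pattern such as nbd1 also matches nbd10 through nbd14.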

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package ltsp - 5.1.64-0ubuntu5

---------------
ltsp (5.1.64-0ubuntu5) jaunty; urgency=low

  * Make the Geode fix jaunty-specific and make it fallback to upstream's
    fix for the other releases instead of no fix as in ubuntu4.
  * Fix bashism in ltsp-client-setup. (LP: #281498)

 -- Stephane Graber <email address hidden> Sat, 14 Mar 2009 21:23:40 -0400

Changed in ltsp:
status: Confirmed → Fix Released