/etc/init.d/ltsp-client-setup has bashisms
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
ltsp (Ubuntu) | Fix Released | Undecided | Scott Balneaves | | |
Bug Description
Please see the following #ltsp conversation regarding ltsp-client-setup errors. Ogra should know about this bug already. There might be repercussions to the specific block of code in question, but I know that the bashisms shouldn't be there on line 27. Also see https:/
Cheers,
Jordan/Lns
----
<Lns_onsite> Anyone know anything about this regarding LTSP? I'm on-site at one school and I'm seeing, during TC bootup, something similar to the following right after "Setting up LTSP Client" - ..."Negotiation: Invalid operator" - and sometimes it hangs, up to 3-5 minutes, then boots to LDM. I *just* turned on NBD_SWAP this week and this is happening at 2 sites...also upgraded the chroot with hardy-updates - and that's it. All I've found is this. https:/
d.net/
It usually only happens when booting a bunch of clients at once
I get nothing useful in chroot logs or on the server
<-- Q-FUNK has quit ("Leaving.")
mccann has quit (Read error: 110 (Connection timed out))
<Lns_onsite> [ 27: ?: unexpected operator
^^ That's exactly what it says when it hangs
<-- staffencasa has quit ("Leaving")
<Lns_onsite> So weird, googling shows a lot of kernel-level stuff related to that msg
--> johnny (<email address hidden>) has joined #ltsp
<Lns_onsite> wow someone is alive here! =) johnny, have you ever seen " [ 27: ?: unexpected operator" while booting a TC?
<Ryan52> Lns_onsite: maybe an evil bashism while you're using a non bash shell as /bin/sh?
<johnny> yes
sounds like it to me
<Lns_onsite> Ryan52: possibly, yes
<Ryan52> Lns_onsite: so change your /bin/sh :)
<Lns_onsite> although I'm not sure where it could be, it references no files
<johnny> that's cuz Ryan52 is smarter than me
<Lns_onsite> ;)
Either something in nbdswapd or in one of the updates in hardy-updates is causing this it seems
it comes right after "Setting up LTSP client"
is that ltsp-client-setup script?
(yep it is)
<monteslu> Lns_onsite, nbd has caused me so much grief over the last couple of week that I'm switching to fedora tonight
<Lns_onsite> monteslu: heh, are you having the same issues? What does fedora use, nfs?
<monteslu> it can use nbd too, but it doesn't by default
yeah nfs
<Lns_onsite> I haven't had any issues with NBD until i enabled nbdswap
<monteslu> I've posted questions to the edubuntu-users list
I didn't enable anything, but I've been getting tons of nbd error in /var/log/messages
so the default settings are somehow hosed
<Lns_onsite> monteslu: have you updated your chroot?
<monteslu> yup
<Lns_onsite> was it happening before you did that?
<monteslu> even followed Jordan's advice on the list
before and after
<Lns_onsite> heh
* Lns_onsite is Jordan
<monteslu> hehe
hey, I updated your ticket
<Lns_onsite> which one is that? to use *-updates ?
<monteslu> yeah
well, i added my 2cents anyway
<Lns_onsite> ty =) I've been updating the ubuntu ltsp wiki with some of this stuff too
<monteslu> it seemed to be working for a day
<Lns_onsite> I'm gonna go turn off swap real quick and reboot about 15 stations
brb
<monteslu> I've got 70 thin clients and most of them seem to be creating the I/O errors about nbd in the log
<Lns_onsite> hmm, ok turning off nbd_swap in lts.conf and rebooting about 15 clients didn't give me that msg or hang at ALL
It's got to be something to do with that
and if it's just some typo in ltsp-client-setup script...should be trivial to fix
as Ryan52 said maybe i'll just change /bin/sh to /bin/bash... that'll probably *$#@ the whole chroot up though ;)
<monteslu> Lns_onsite, you mean hang on the client startup?
<Lns_onsite> monteslu: yes
<monteslu> This thing is grinding my server down.
At least I think it is
I can't even do a normal boot
<Lns_onsite> monteslu: no, my server seems to be fine... although it's a 2x dualcore xeon 1.6GHz, 8GB RAM...
<monteslu> Mine has 8 cores and 16 gigs :)
<Lns_onsite> hehe
* Lns_onsite got beat
<monteslu> ooh, and I have two of them
my ldap server died though, so I'm just using one right now. Which means I can test fedora9 on the other tonight
<Lns_onsite> Ryan52: would the 27 in that "unexpected operator" be the line number of the script?
* Lns_onsite doesn't like this cold server room
<monteslu> beats the hell out of a loud server room with all the alarms going off :)
i mean a warm one
<Lns_onsite> monteslu: ...yes, yes it does. :p
brb
<Ryan52> Lns_onsite: I'm pretty sure it is.
Lns_onsite: changing /bin/sh to bash won't break anything.
Lns_onsite: bash supports like everything.
Lns_onsite: if you're sure you know which script it is, then just change the shebang to bash. that's probably safer.
<-- japerry has quit ()
<Lns_onsite> Ryan52: Just did that, and it seems to still hang, with an additional msg.. "67: Too many arguments"
Line 67 in ltsp-client-setup is: for i in 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14; do
if [ -n $(ps ax|grep nbd$i) ]; then
fi;
(the if line)
<Ryan52> ugh. switch to nfs and report the bug, is what I'd do. I can't believe that this problem was introduced in a stable update.
<Lns_onsite> Ryan52: do you see anything wrong with those lines, or are you just talking in general?
<Ryan52> oh, wait.
[ -n $(ps ax|grep nbd$i) ]
change to: [ -n "$(ps ax|grep nbd$i)" ]
I didn't see that at first, but ya, that should fix it.
<Lns_onsite> woa
cool :) ok will try that now
that'd be so funny if this was the whole issue
--- rangerpb is now known as rangerhomezzz
<Lns_onsite> yikes...now they're *all* hanging
"Setting up LTSP Client..."
"Negotiation: "
(hangs there)
bunch of 'dd' processes running currently
now some are booted up
I'm guessing dd is used to create swap files or some crazy thing
----
Changed in ltsp:
assignee: nobody → sbalneav
status: New → Confirmed
UPDATE: At 2 school sites, I have fixed the quoting issue to reflect the following:
---
if [ -n "$(ps ax|grep nbd$i)" ]; then
---
and went through the motions - rebuilt the image, rebooted thin-clients, etc. etc...
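To illustrate why the quoting matters, here is a minimal sketch of how `[ -n ... ]` behaves with and without quotes. The helper names (`check_unquoted`, `check_quoted`) and the sample `ps` line are made up for illustration and are not taken from ltsp-client-setup. The assumption is that an unquoted `$(ps ax|grep nbd$i)` is split into separate words, so `[` sees several arguments and reports an error such as "unexpected operator" or "too many arguments", depending on the shell.

```shell
#!/bin/sh
# Sketch only: check_unquoted and check_quoted are hypothetical helpers,
# not functions from ltsp-client-setup.

# Mimics [ -n $(ps ax|grep nbd$i) ]: $1 is left unquoted on purpose,
# so the shell splits it into separate words before [ sees it.
check_unquoted() {
    if [ -n $1 ] 2>/dev/null; then
        echo "true"
    else
        echo "false or error"
    fi
}

# Mimics [ -n "$(ps ax|grep nbd$i)" ]: [ always sees exactly one string.
check_quoted() {
    if [ -n "$1" ]; then
        echo "true"
    else
        echo "false"
    fi
}

multi="1234 pts/0 S 0:00 nbd0"   # stand-in for one line of ps output

echo "unquoted, empty output:      $(check_unquoted "")"
echo "unquoted, multi-word output: $(check_unquoted "$multi")"
echo "quoted, empty output:        $(check_quoted "")"
echo "quoted, multi-word output:   $(check_quoted "$multi")"
```

The unquoted form gets both cases wrong: with empty output it collapses to `[ -n ]`, which is true, and with multi-word output it fails with an error, which the `if` treats as false. The quoted form gives false for empty output and true for any non-empty output.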
This seems to have helped with the "Invalid operator" message during TC bootup, but it has definitely NOT fixed the clients sitting at "Negotiation: " while the nbd_swap file is created on the server. I see 'dd' processes running on the server with little CPU time, yet I/O seems to be starved when several of them run simultaneously. That happens whenever more than a few thin clients boot at the same time, which is common practice in a lab full of thin clients that are all powered on at the beginning of the school day. The clients sit for 2-4 minutes, depending on how many are booting at once. Because of the apparent I/O saturation, other server functions such as DHCP cannot respond in time, so some thin clients with flash memory fail to obtain an IP address via PXE/network boot and fall back to booting from flash.
I have tried what was suggested to me in #ltsp: the "elevator=deadline" I/O scheduler, which is supposedly better suited to LTSP environments. On the first test, bootup went "better" than before, but many machines still booted to flash because they could not obtain an IP address.
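For anyone wanting to confirm which scheduler is actually in effect after booting with "elevator=deadline", a rough sketch follows. It assumes the kernel exposes the scheduler list under /sys/block/<device>/queue/scheduler with the active one in square brackets. The device name `sda` and the helper name `active_scheduler` are assumptions for illustration, so substitute the disk that holds the swap files.

```shell
#!/bin/sh
# Sketch: report the active I/O scheduler for a block device.
# The default device (sda) is an assumption, not taken from this report.

# Reads a scheduler line on stdin, e.g. "noop anticipatory [deadline] cfq",
# and prints the bracketed (active) entry.
active_scheduler() {
    sed -n 's/.*\[\([^]]*\)].*/\1/p'
}

dev=${1:-sda}
sched_file="/sys/block/$dev/queue/scheduler"

if [ -r "$sched_file" ]; then
    echo "$dev: $(active_scheduler < "$sched_file")"
else
    echo "$dev: no scheduler file at $sched_file" >&2
fi
```

On kernels that allow switching at runtime, writing the scheduler name to the same file as root (for example `echo deadline > /sys/block/sda/queue/scheduler`) should change it without a reboot, which makes it easier to compare bootups back to back.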
It would be greatly appreciated if someone could help with this issue ASAP. It is critical to so many LTSP sites that I'm wondering how anyone can reliably use nbd_swap functionality in Ubuntu. Thank you, and as always, I will help test any proposed solution/patch!
Cheers,
Jordan/Lns