nbd-client disconnects if the size of the image file at nbd-server is larger than 200MB

Bug #504420 reported by müzso
This bug affects 5 people
Affects: nbd (Ubuntu)
Status: Invalid
Importance: Undecided
Assigned to: Unassigned

Bug Description

Unfortunately, the problem occurs only while an LTSP client is booting, so I'm not 100% convinced that this is a bug in nbd-client or nbd-server; it might be related to LTSP (the kernel, the scripts, or something else). Still, nbd is my best bet.

My parameters:
- An i386 LTSP server running Ubuntu 9.04 (Jaunty) with all updates installed
- An i386 LTSP client running the same Ubuntu version (the chroot image was created with ltsp-build-client on the LTSP server)
- The nbd-server package on the LTSP server and the nbd-client package on the LTSP client are both 1:2.9.11-2ubuntu1.
- The server is running: Linux server33 2.6.28-17-server #58-Ubuntu SMP Tue Dec 1 19:58:28 UTC 2009 i686 GNU/Linux
- The client is running: Linux tc-004 2.6.27-11-generic #1 SMP Wed Apr 1 20:57:48 UTC 2009 i686 GNU/Linux
- Both the server and the client have 1 GB of RAM installed, and during my manual tests (of nbd-server + nbd-client connections) there was plenty of free (unused) memory, so it's probably not an out-of-memory issue.

If the chroot image file (/opt/ltsp/images/i386.img) is smaller than 200MB, the LTSP client boots just fine. If the image is larger than 200MB, the client stops with the following output (after I removed both the "splash" and "quiet" options from the kernel parameters in the pxe config file):

-------------
 * Loading kernel modules...
 * Loading manual drivers...
[ 10.410731] lp: driver loaded but no devices found
                                                                                                          [OK]
 * Setting up LTSP client...
Disconnecting: que, disconnect, [ 10.568785] nbd0: NBD_DISCONNECT
sock, done
[ 10.569504] nbd0: Receive control failed (result -32)
[ 10.569928] nbd0: queue cleared
Kernel call returned: Broken pipe Reconnecting
Negotiation: ..size = 209220KB
Error: Ioctl/1.1a failed: Bad file descriptor
-------------

An image file of 194MB works just fine, but if I install a few more packages into the chroot and build a new image with a size of 205MB, the client won't connect successfully to nbd-server.

On the server I see the following in syslog:
Jan 7 19:37:34 server33 nbdrootd[14823]: connect from 192.168.0.53 (192.168.0.53)
Jan 7 19:37:34 server33 nbd_server[14824]: connect from 192.168.0.53, assigned file is /opt/ltsp/images/i386.img
Jan 7 19:37:34 server33 nbd_server[14824]: Size of exported file/device is 214241280
Jan 7 19:37:39 server33 nbd_server[14824]: Disconnect request received.
Jan 7 19:37:39 server33 nbdrootd[14826]: connect from 192.168.0.53 (192.168.0.53)
Jan 7 19:37:39 server33 nbd_server[14827]: connect from 192.168.0.53, assigned file is /opt/ltsp/images/i386.img
Jan 7 19:37:39 server33 nbd_server[14827]: Size of exported file/device is 214241280
Jan 7 19:37:39 server33 nbd_server[14827]: Read failed: Inappropriate ioctl for device

It seems that the problem is on the server side. For some reason the second nbd-client connection results in a "Read failed: Inappropriate ioctl for device" message.
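As a quick cross-check of the figures quoted above, the size printed by the client during negotiation (209220KB) can be compared with the byte count logged by the server (214241280). The sketch below only redoes that arithmetic; the constant names are illustrative and the two numbers are copied from the logs in this report.

```python
# Figures copied from the logs in this report (names are illustrative).
SERVER_BYTES = 214241280  # server syslog: "Size of exported file/device is 214241280"
CLIENT_KB = 209220        # client console: "Negotiation: ..size = 209220KB"

# The client figure matches the server figure if "KB" means 1024 bytes.
server_kib = SERVER_BYTES // 1024
sizes_agree = (SERVER_BYTES % 1024 == 0) and (server_kib == CLIENT_KB)

# The same size in binary and decimal megabytes.
size_mib = SERVER_BYTES / 1024**2
size_mb = SERVER_BYTES / 1000**2

print(sizes_agree)          # True
print(round(size_mib, 2))   # 204.32
print(round(size_mb, 2))    # 214.24
```

Both sides therefore agree on the export size (about 204.3 MiB), which suggests the size itself is negotiated correctly and the failure happens after that step.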

However, if I ...
1. create a smaller image file on the server
2. boot up the client successfully with this smaller image
3. start an nbd-server manually using the larger image file on the server
4. start nbd-client manually on the client and connect to the previously started nbd-server instance
5. mount the nbd device on the client to a directory
Everything works. No problem.
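The negotiation step can also be checked from user space, without the kernel nbd driver. The following is a minimal sketch, assuming the server speaks the old-style NBD handshake (an 8-byte "NBDMAGIC" string, an 8-byte magic number, a 64-bit export size, 32-bit flags, and 124 reserved bytes); whether the nbdrootd setup used by LTSP behaves the same way is not confirmed by this report. The function names, host, and port are hypothetical placeholders.

```python
import socket
import struct

INIT_PASSWD = b"NBDMAGIC"          # first 8 bytes sent by the server
CLISERV_MAGIC = 0x00420281861253   # old-style magic number that follows
HEADER = struct.Struct(">8sQQI")   # passwd, magic, export size, flags (big-endian)
HANDSHAKE_LEN = HEADER.size + 124  # header plus 124 reserved bytes


def parse_oldstyle_handshake(data):
    """Return (export_size_in_bytes, flags) from an old-style NBD greeting."""
    if len(data) < HANDSHAKE_LEN:
        raise ValueError("short handshake: got %d of %d bytes"
                         % (len(data), HANDSHAKE_LEN))
    passwd, magic, size, flags = HEADER.unpack_from(data)
    if passwd != INIT_PASSWD or magic != CLISERV_MAGIC:
        raise ValueError("not an old-style NBD handshake")
    return size, flags


def read_handshake(host, port, timeout=5.0):
    """Connect to host:port (placeholders) and read one server greeting."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        buf = b""
        while len(buf) < HANDSHAKE_LEN:
            chunk = sock.recv(HANDSHAKE_LEN - len(buf))
            if not chunk:
                break  # server closed the connection early
            buf += chunk
    return parse_oldstyle_handshake(buf)
```

A call such as `read_handshake("server33", 1234)` (port 1234 is only an example) would be expected to return a size of 214241280 for the image in this report if the server-side negotiation is healthy.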

I've also set up the client to log to the syslog of the LTSP server, but unfortunately nothing gets to the syslog server (the client does not reach the part of the boot process where it'd set up syslog logging).

Any thoughts that help unravel this mystery are welcome. I could not find any reference to a hard-coded 200MB limit in the code.

müzso (bit2) wrote :

I'm attaching a photo of the client's screen taken after it failed to connect to the nbd-server.

Tim Wallace (twallace-computer) wrote :

I'm seeing this same error with 9.10 on my client desktop, where the image is over 200 MB. My 9.04 image is below that and works fine. The strange thing is that this is on my desktop client--my laptop client works fine!

Wouter Verhelst (wouter-debian) wrote :

Could either of you please attach a network dump created with 'wireshark'? (Please don't use tcpdump, since that drops part of every packet, making the dump less useful.) This will help greatly in debugging this.

Note that with its default settings, wireshark will listen for *all* network traffic. Sharing such a file would have obvious privacy issues, which is not what I'm asking for, and the file could potentially be extremely large, depending on the amount of network traffic. To avoid this, you need to set up a 'capture filter' containing something like 'port 1234', assuming the port used by nbd-server is, indeed, 1234. This will limit the capture to only nbd traffic. However, be aware that it may still contain sensitive data from the LTSP image.
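For illustration, a capture filter of this kind is just a short expression in pcap filter syntax. The helper below is a hypothetical sketch that builds one; the port 1234 is the example value from the comment above (not a confirmed default), and the client address is the one shown in the server syslog.

```python
def nbd_capture_filter(port, client_ip=None):
    """Build a pcap-style capture filter limited to NBD traffic.

    port      -- TCP port nbd-server listens on (must be checked locally)
    client_ip -- optional client address to narrow the capture further
    """
    if not 0 < port < 65536:
        raise ValueError("port out of range: %r" % (port,))
    expr = "tcp port %d" % port
    if client_ip:
        expr = "host %s and %s" % (client_ip, expr)
    return expr


print(nbd_capture_filter(1234))                  # tcp port 1234
print(nbd_capture_filter(1234, "192.168.0.53"))  # host 192.168.0.53 and tcp port 1234
```

The resulting string is what would be typed into the capture filter field before starting the capture; restricting it to one client keeps the dump smaller still.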

müzso (bit2) wrote :

Unfortunately I cannot reproduce the bug anymore. I've upgraded all our Ubuntu based LTSP systems (both servers and client chroots) to Karmic and the problem did not occur since then.

Tim Wallace (twallace-computer) wrote :

I can possibly do this, but I need a lot more information to do it successfully. I couldn't figure out what port Ubuntu 9.10 LTSP uses for nbd-server; I didn't find any nbd-server/config file on my system (?). I assume that 1234 is just an example? If you give me the exact filter line for the version of Wireshark delivered with 9.10, with the default port, I can type it in and probably give you a dump. In the syslog I see a lot of different ports and I'm not sure which is the initial one, maybe 2468. Without a filter I got a 50 MB dump. If that's too big, please be more specific. Wireshark is a lot lower-level than I ever go. I'm using eth1 on IP 192.168.13 for my LTSP network, so I just captured that interface. I find a lot of Internet tutorial material for Wireshark is obsolete...

Serge Hallyn (serge-hallyn) wrote :

Marking invalid as it can no longer be reproduced.

Changed in nbd (Ubuntu):
status: New → Invalid
Nicolas LEBRUN (nico-leb) wrote :

It still happens today on a Precise server... but restarting nbd-server did the job.

sudo service nbd-server restart

Hope it'll help
