Comment 31 for bug 661294

Stefan Bader (smb) wrote : Re: System lock-up when receiving large files (big data amount) from NFS server

@David: Thanks for testing. In that case it is not the problem I was reminded of (that one only happens with certain VIA controllers and, apparently, WD disks). The observation about the switch LEDs is interesting. As I said, I did my tests with a Lucid server and a Maverick client connected through a gigabit switch. I ran the test again this morning and stopped after copying 15 GB for the 20th time. In my case I definitely saw both LEDs flash in sync. I did not pay much attention to the HDD LEDs, but those do not need to flash as fast: writes hit the page cache first and are then written out to disk in batches.
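
If you want to see that batching on a client, a minimal sketch (assuming a Linux client, where /proc/meminfo exposes the Dirty and Writeback counters) is to sample those counters while a big copy runs. The function name watch_writeback is made up for this example and is not part of any tool.

```shell
# Hypothetical helper: sample the kernel's page-cache counters during a copy.
# Dirty:     data already written by applications but not yet on disk.
# Writeback: data currently being flushed out to disk.
watch_writeback() {
    samples=${1:-10}   # how many readings to take
    interval=${2:-1}   # seconds between readings
    i=0
    while [ "$i" -lt "$samples" ]; do
        # Squeeze both counters onto a single line per reading.
        grep -E '^(Dirty|Writeback):' /proc/meminfo | tr -s ' \n' ' '
        echo
        i=$((i + 1))
        # Sleep between readings, but not after the last one.
        if [ "$i" -lt "$samples" ]; then
            sleep "$interval"
        fi
    done
}
```

For example, `watch_writeback 30 1` prints one reading per second for 30 seconds. During a long NFS copy one would expect Dirty to grow and then drop in steps as the kernel flushes, which is why the disk LED does not have to track the network LED.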

So either I am still doing something wrong or I got lucky and have the "right" hardware. I am still trying to figure out what all of you with the problem have in common (it feels a bit like CSI, just without the nice fancy tools ;)). David saw errors that point to the disk subsystem, yet he has no problems when using the disks alone. That could also mean that whatever is happening causes severe memory corruption. Or maybe interrupts are being missed (the error message mentions a timeout).

At the moment I am not sure which direction to go. A good first step would be more information about the affected systems. It would help if I could get the output of the following commands from at least two affected clients:

sudo lspci -vvnnn >lspci.txt
cat /proc/interrupts >interrupts.txt

Also, just to confirm: the server is Lucid-based and the client runs Maverick? At least that was the case in previous comments. And which was the last known good client release? One test that comes to mind: can you scp that big file from the server to the client? That would hint whether the problem lies with the network in general or specifically with NFS.
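
A minimal sketch of that comparison: copy the same file once with scp and once through the NFS mount, and note whether each copy finishes. If the client locks up with scp as well, the network path in general is suspect; if only the NFS copy hangs, NFS itself is. The helper timed_copy, the host name and the paths below are placeholders for illustration only.

```shell
# Hypothetical helper: run a copy command and report whether it succeeded
# and roughly how long it took (date +%s has one-second resolution).
# Arguments: a label, then the command to run.
timed_copy() {
    label=$1
    shift
    start=$(date +%s)
    if "$@"; then
        status=ok
    else
        status=failed
    fi
    end=$(date +%s)
    echo "$label: $status after $((end - start)) s"
}

# Example (host name and paths are placeholders, adjust to your setup):
#   timed_copy scp scp user@server:/srv/export/bigfile /tmp/bigfile.scp
#   timed_copy nfs cp /mnt/nfs/bigfile /tmp/bigfile.nfs
```

If the machine hangs, the line for that copy is simply never printed, which is itself the answer to the question above.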