Comment 57 for bug 661294

Revision history for this message
David (g-launchpad-strategyplayer-net) wrote : Re: System lock-up when receiving large files (big data amount) from NFS server

Okay, I tried the one with NFS now.

At first I tought that the debugging output slowed down the process enough to avoid the problem, as Andrew experienced.

It copied quite a lot at a transfer spped between 20 an 45, which semmed to be slow enough to avoid the crash. But then the transfer became faster and between 50 an 60 (MB/s) I had the crash.

Now I've got a 90MB syslog file (I didn't the stuff via SSH because I haven't already set up ssh at this machine at all) which ends quite unsuspicious..

The last entries before the logging stopped because of the freeze are like:

NFS: read done (0:16/21807132 4096@1859477504)

There are thousands and thousands of those of course, regularly intermitted by

NFS: nfs_readpage_result: 16772, (status 262144)
NFS: nfs_update_inode(0:16/21807132 ct=1 info=0x7e7f)

Sometimes they are also inermitted by a bunch of those:

NFS: read(path/bigfile.xyz, 32768@1844510720)

or those

 NFS: 0 initiated read call (req 0:16/21807132, 262144 bytes @ offset 1859579904)

or those

NFS: nfs3_forget_cached_acls(0:16/954383)
NFS: clear cookie (0xffff880113c51830/0x(null))

Probably all quite normal?

Here some lines that seemed special to me (mostly because there are only very few of those and they seem kind of 'crippled' compared to the other ones of which there are so much):

this line is a bit special because it only appears very few times in this way:

NFS: read done (0:16/21807132 4096@1806016 read done (0:16/21807132 4096@1806135296)

that one has a weird timestamp:

[2569325693.585308] NFS: read done (0:16/21807132 4096@1805770752)

(it's surrounded by timestamps like '[25693.578505] ')

that one is special too:

[25693.5132 4096@18047132 4096@180132 4096@1804734464)

or that:

NFS: read done (0@1797181440)

or that:

[25693.492344] N1789460480)

this one is also quite uniqe:

NFS: nfs3_forget_cache(null))

here we have some weird braces in front of the timestamp:

Dec 24 22:59:54 unimatrixzero kernel: <>[25691.765538] NFS: read done (0:16/21807132 4096@1699704832)

somewhere in between we have

NFS: dentry_delete(galerie/Allgemein, 0)
NFS: dentry_delete(converted-images/galerie, 0)
...

they don't seem to be related to the actual copy process because they name some totally different path which is part of a backup folder that gets transfered using a rsync script every day via nfs to the server. But the daily transfer was two hours earlier.

a long and special one:

NFS: read(path/bigfile.xyz, 32768@168: read(path/biS: read(path/FS: nfs_readpage_r NFS: nfs_update_inode628369] NFS: read do628371] NFS:628373] NFS628375] NFS: re628376] NFS: r628378] NFS: re628380] NFS:628381] NFS:628383] NFS: read done (0:16/21807132 4096@1694461952)

(this was all one line in the log)

or this one:

NFS: read dd done (0:16/21807132 4096@1694404608)

at one point we have a bunch of lines in which everything is quite crippled:

...
[2566@1666424832)
[2@1666428928)
[2 4096@16664096@1666437120)
132 4096@1666432 4096@1666445312)
1807132 4807132 4096@16666/21807132 4096@/21807132 4096@1666(0:16/2180:16/21807132 4096@1666469888)
...
(there was some mor of it)

Don't know about those 'special' lines. Partially looks to me like overchallenged login... but maybe it has something to say?

Since it costs like an hour only to check a few seconds of the logfile (which contains about 1 million lines) I haven't looked up the whole record.