Comment 4 for bug 1423472

Sergio Gelato (sergio-gelato) wrote :

So far, this particular symptom has been seen exactly once. The host it was observed on was reinstalled from scratch with trusty a few weeks ago following a hard disk failure, so no dist-upgrade was involved. I did some NFS-client-related tuning on this and many other machines this week, so it's conceivable that this caused new code paths to be exercised. The changes were rather benign, though: a longer credential timeout in rpc.gssd, an explicit port number for nfs.nfs_callback_tcpport, and a smaller value for auth_rpcgss.key_expire_timeo. Of these, only the rpc.gssd change had actually taken effect on that host at the time of the incident.

I've looked at the source code for kthread_run (or rather the function behind that macro). The error is the result of a memory allocation failure. What caused the kernel to run out of memory last night (this machine has 32GB of RAM, by the way) is probably unknowable at this point, and need not have had anything to do with NFS. *This* bug report is only about the fact that that particular error message (issued from fs/nfs/nfs4state.c:nfs4_schedule_state_manager()) is not rate-limited, neither in 3.13 nor in the linux-stable tree at git.kernel.org, which put an undesirable load on my syslog infrastructure. That should be easy to fix: it's what pr_warn_ratelimited() is for.
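To illustrate what rate limiting would buy here, below is a minimal userspace sketch of the burst-per-interval idea behind pr_warn_ratelimited(). It is not the kernel's implementation: struct rl_state, rl_allow() and rl_simulate() are hypothetical names made up for this example, and time is modelled as plain integer ticks. As far as I know the kernel's defaults are a burst of 10 messages per 5-second interval, which is what the usage note below assumes.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of a rate-limit state. The kernel's own
 * struct ratelimit_state differs in detail (locking, jiffies, etc.). */
struct rl_state {
    long interval;  /* window length, in ticks */
    int  burst;     /* messages allowed per window */
    long begin;     /* tick at which the current window started */
    int  printed;   /* messages let through in the current window */
    int  missed;    /* messages suppressed in the current window */
    bool started;   /* false until the first call opens a window */
};

/* Return true if the caller may emit a message at time `now`. */
static bool rl_allow(struct rl_state *rs, long now)
{
    assert(rs && rs->burst > 0 && rs->interval > 0);

    /* Open a fresh window on first use or once the old one has expired. */
    if (!rs->started || now - rs->begin >= rs->interval) {
        rs->begin = now;
        rs->printed = 0;
        rs->missed = 0;
        rs->started = true;
    }

    if (rs->printed < rs->burst) {
        rs->printed++;
        return true;
    }

    rs->missed++;   /* over the burst limit: suppress and count */
    return false;
}

/* Model a tight failure loop that tries to log once per tick, and
 * return how many messages actually get through. */
static int rl_simulate(struct rl_state *rs, long ticks)
{
    int emitted = 0;

    for (long t = 0; t < ticks; t++)
        if (rl_allow(rs, t))
            emitted++;
    return emitted;
}
```

With interval = 5000 ticks (think milliseconds) and burst = 10, a loop that tries to log once per tick for 12000 ticks gets 30 messages through instead of 12000. That is the kind of reduction I'd want between the kernel and my syslog servers.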

I cannot reproduce the symptom at will, so I won't actually test the kernel from vivid: any negative result would be inconclusive. Since I know from reading the source code that the message is still not rate-limited upstream, I assume that kernel-bug-exists-upstream is the right choice.