Logs flooded with "nfs4_schedule_state_manager: kthread_run: -12"

Bug #1423472 reported by Sergio Gelato
This bug affects 1 person
Affects: linux (Ubuntu)
Status: Confirmed
Importance: Medium
Assigned to: Unassigned

Bug Description

An NFSv4 client running kernel 3.13.0-44-generic #73-Ubuntu (amd64) suddenly started spewing
   nfs4_schedule_state_manager: kthread_run: -12
log messages at an average rate of 2.65 kHz. It did not stop until I rebooted it.

At the very least that message needs to be rate-limited. (Doesn't seem to be fixed upstream yet.)

As for the underlying problem, -12 is -ENOMEM. I'm afraid I have no idea why the kernel ran out of memory at that point. I will follow up if the problem ever recurs. This bug report is mainly about the lack of rate limiting.

Brad Figg (brad-figg) wrote: Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 1423472

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
tags: added: trusty
Sergio Gelato (sergio-gelato) wrote:

Logs too big for inclusion (the problem was log flooding). Also, they would be missed by apport-collect because /var had been filled by an earlier, not necessarily related, problem; the only full copy of the logs is on a remote syslog server which does not run Ubuntu.

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Joseph Salisbury (jsalisbury) wrote:

Did this issue start happening after an update/upgrade? Was there a prior kernel version where you were not having this particular problem?

Would it be possible for you to test the latest upstream kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v3.19 kernel[0].

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

If you are unable to test the mainline kernel (for example, because it will not boot), please add the tag: 'kernel-unable-to-test-upstream'.
Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".

Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.19-vivid/

Changed in linux (Ubuntu):
importance: Undecided → Medium
status: Confirmed → Incomplete
Sergio Gelato (sergio-gelato) wrote:

So far, this particular symptom has been seen exactly once. The host it was observed on was reinstalled from scratch with trusty a few weeks ago following a hard disk failure, so no dist-upgrade was involved. I did some NFS-client-related tuning on this and many other machines this week, so it's conceivable that this caused new code paths to be exercised. The changes were rather benign, though: a longer credential timeout in rpc.gssd, an explicit port number for nfs.nfs_callback_tcpport, and a smaller value for auth_rpcgss.key_expire_timeo. Only the rpc.gssd change had actually taken effect on that host at the time of the incident.

I've looked at the source code for "kthread_run" (or rather the function behind that macro). The error is the result of a memory allocation failure. What caused the kernel to run out of memory last night (this machine has 32GB of RAM, by the way) is probably unknowable at this point, and need not have had anything to do with NFS. *This* bug report is only about the lack of rate limiting on that particular error message (issued from fs/nfs/nfs4state.c:nfs4_schedule_state_manager()), both in 3.13 and in the linux-stable tree at git.kernel.org; the resulting flood put an undesirable load on my syslog infrastructure. That should be easy to fix: it's what pr_warn_ratelimited() is for.

I cannot reproduce the symptom at will, so I won't actually test the kernel from vivid: any negative result would be inconclusive. Since I know from reading the source code that the message is still not rate-limited upstream, I assume that kernel-bug-exists-upstream is the right choice.

tags: added: kernel-bug-exists-upstream
Changed in linux (Ubuntu):
status: Incomplete → Confirmed