kernel and lockd xprt_adjust_timeout rq_timeout

Bug #1358226 reported by arbuntu on 2014-08-18
This bug affects 7 people
Affects          Importance   Assigned to
linux (Ubuntu)   Medium       Unassigned
Trusty           Medium       Unassigned

Bug Description

I am seeing the following messages in syslog. There are no other messages around that time to suggest why it is happening.

Aug 18 11:01:16 hostname kernel: [943559.414398] xprt_adjust_timeout: rq_timeout = 0!
Aug 18 11:01:16 hostname kernel: [943559.414401] lockd: server nfs_server not responding, still trying
Aug 18 11:01:16 hostname kernel: [943559.415347] lockd: server nfs_server OK

System is:
Linux hostname 3.13.0-32-generic #57-Ubuntu SMP Tue Jul 15 03:51:08 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
Ubuntu 14.04.1 LTS
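
A minimal sketch for measuring how long each of these lockd stalls lasts, using the bracketed kernel timestamps in the messages above. The function name and regular expression are illustrative assumptions, not part of any existing tool, and lines without a bracketed kernel timestamp are skipped.

```python
import re

# Matches kernel log lines such as:
#   "... kernel: [943559.414401] lockd: server nfs_server not responding, still trying"
# The bracketed number is the kernel's seconds-since-boot timestamp.
LOCKD_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+lockd: server (?P<server>\S+) "
    r"(?P<state>not responding, still trying|OK)\s*$"
)


def lockd_stall_durations(lines):
    """Yield (server, seconds) for each 'not responding' -> 'OK' pair."""
    pending = {}  # server -> timestamp of the unmatched 'not responding' line
    for line in lines:
        match = LOCKD_RE.search(line)
        if match is None:
            continue  # unrelated line, or no kernel timestamp present
        ts = float(match.group("ts"))
        server = match.group("server")
        if match.group("state") == "OK":
            start = pending.pop(server, None)
            if start is not None:
                yield server, ts - start
        else:
            pending[server] = ts


# The three lines quoted in this report.
sample = [
    "Aug 18 11:01:16 hostname kernel: [943559.414398] xprt_adjust_timeout: rq_timeout = 0!",
    "Aug 18 11:01:16 hostname kernel: [943559.414401] lockd: server nfs_server not responding, still trying",
    "Aug 18 11:01:16 hostname kernel: [943559.415347] lockd: server nfs_server OK",
]

for server, seconds in lockd_stall_durations(sample):
    print(f"{server}: stalled for {seconds * 1000:.3f} ms")
```

To scan a whole log, pass an open file object (for example the syslog file) instead of the sample list.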

Thank you for taking the time to report this bug and helping to make Ubuntu better. It seems that your bug report is not filed about a specific source package; rather, it is filed against Ubuntu in general. It is important that bug reports be filed about source packages so that people interested in the package can find the bugs about it. You can find some hints about determining what package your bug might be about at https://wiki.ubuntu.com/Bugs/FindRightPackage. You might also ask for help in the #ubuntu-bugs IRC channel on Freenode.

To change the source package that this bug is filed about visit https://bugs.launchpad.net/ubuntu/+bug/1358226/+editstatus and add the package name in the text box next to the word Package.

[This is an automated message. I apologize if it reached you inappropriately; please just reply to this message indicating so.]

tags: added: bot-comment
Trent Lloyd (lathiat) on 2014-08-21
affects: ubuntu → linux-meta (Ubuntu)
affects: linux-meta (Ubuntu) → linux (Ubuntu)
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in linux (Ubuntu):
status: New → Confirmed
Trent Lloyd (lathiat) wrote :

I keep seeing this as well; my use case is a mail server (dovecot). It always seems to last for less than a second. The times don't seem predictable, and two 14.04 servers accessing the same share report the issue at different times, so I'm not sure it's really related to the remote end.

I am also getting locking issues with dovecot that seem to always appear about 90-120 seconds after this appears in syslog:

Aug 21 06:48:55 mail-01 kernel: [2909938.530590] lockd: server 10.10.10.232 not responding, still trying
Aug 21 06:48:55 mail-01 kernel: [2909938.531290] lockd: server 10.10.10.232 OK
... [snip] ....
Aug 21 08:17:11 mail-01 kernel: [2915231.997469] lockd: server 10.10.10.232 not responding, still trying
Aug 21 08:17:11 mail-01 kernel: [2915231.998316] lockd: server 10.10.10.232 OK
Aug 21 08:31:04 mail-01 kernel: [2916064.970993] lockd: server 10.10.10.232 not responding, still trying
Aug 21 08:31:04 mail-01 kernel: [2916064.980768] lockd: server 10.10.10.232 OK
Aug 21 08:41:31 mail-01 kernel: [2916691.595302] lockd: server 10.10.10.232 not responding, still trying
Aug 21 08:41:31 mail-01 kernel: [2916691.596268] lockd: server 10.10.10.232 OK
Aug 21 08:50:25 mail-01 kernel: [2917224.586795] lockd: server 10.10.10.232 not responding, still trying
Aug 21 08:50:25 mail-01 kernel: [2917224.587609] lockd: server 10.10.10.232 OK

Aug 21 06:50:52 mail-01 dovecot: lmtp(18603, <email address hidden>): Error: Timeout (29s) while waiting for lock for transaction log file /srv/mail/domains/w/webinabox.net.au/lvstest/Maildir/dovecot.index.log
Aug 21 08:42:11 mail-01 dovecot: lmtp(14261, <email address hidden>): Error: Timeout (29s) while waiting for lock for transaction log file /srv/mail/domains/w/webinabox.net.au/lvstest/Maildir/dovecot.index.log

Joseph Salisbury (jsalisbury) wrote :

Did this issue occur in a previous version of Ubuntu, or is this a new issue?

Would it be possible for you to test the latest upstream kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v3.17 kernel[0].

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

If you are unable to test the mainline kernel, for example it will not boot, please add the tag: 'kernel-unable-to-test-upstream'.
Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".

Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.17-rc1-utopic/

tags: added: kernel-da-key
Changed in linux (Ubuntu):
importance: Undecided → Medium
status: Confirmed → Incomplete
tags: removed: lockd nfs
Marcus Furlong (furlongm) wrote :

I am also experiencing this bug.

This issue did not occur on precise, however it does occur on trusty. I have not tried non-LTS releases.

I can confirm that the same issue occurs on the current upstream mainline kernel.

I have reproduced the issue with 3.17.1-031701-generic

tags: added: kernel-bug-exists-upstream
Joseph Salisbury (jsalisbury) wrote :

The v3.18-rc1 kernel is now available. Can you see if it also exhibits the bug?

 http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.18-rc1-utopic/

andrew bezella (abezella) wrote :

hello -

per the above comments, i installed the 3.18.0-031800rc1-generic mainline kernel on the Ubuntu 14.04.1 LTS vm that we have been seeing log these errors. the error persists:
Oct 29 22:30:37 vm1 kernel: [ 214.434104] xprt_adjust_timeout: rq_timeout = 0!
Oct 29 22:30:37 vm1 kernel: [ 214.434110] lockd: server nfs-home not responding, still trying
Oct 29 22:30:37 vm1 kernel: [ 214.728194] lockd: server nfs-home OK

i tried to install the 3.18-rc2-utopic version but the system was unable to boot properly. it looks like the kernel modules were gzip'ed and trusty wasn't expecting this(?)

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Dave Lachapelle (davm) wrote :

I'm also seeing this issue - currently running:

3.15.4-x86_64-linode45 #1 SMP Mon Jul 7 08:42:36 EDT 2014 x86_64 x86_64 x86_64 GNU/Linux

Nov 14 00:23:05 localhost kernel: xprt_adjust_timeout: rq_timeout = 0!
Nov 14 00:23:05 localhost kernel: lockd: server 192.168.132.120 not responding, still trying
Nov 14 00:23:06 localhost kernel: lockd: server 192.168.132.120 OK

I couldn't find any errors on the NFS server, only clients appear to be throwing this at random.

Joseph Salisbury (jsalisbury) wrote :

@Andrew, the issue with the upstream kernels not booting should be resolved now.

The v3.18-rc5 kernel is now available. Can you see if it also exhibits the bug?

http://kernel.ubuntu.com/~kernel-ppa/mainline/v3.18-rc5-vivid/

andrew bezella (abezella) wrote :

@Joseph - rebooted into 3.18.0-031800rc5-generic #201411162035
and the msgs are still being logged:
[ 231.437941] xprt_adjust_timeout: rq_timeout = 0!
[ 231.437945] lockd: server nfs-home not responding, still trying
[ 231.441744] lockd: server nfs-home OK

Joseph Salisbury (jsalisbury) wrote :

This issue appears to be an upstream bug, since it still occurs with the latest upstream kernel. Would it be possible for you to open an upstream bug report[0]? That will allow the upstream developers to examine the issue, and may provide a quicker resolution to the bug.

Please follow the instructions on the wiki page[0]. The first step is to email the appropriate mailing list. If no response is received, then a bug may be opened on bugzilla.kernel.org.

Once this bug is reported upstream, please add the tag: 'kernel-bug-reported-upstream'.

[0] https://wiki.ubuntu.com/Bugs/Upstream/kernel

Changed in linux (Ubuntu):
status: Confirmed → Triaged
andrew bezella (abezella) wrote :

verified that messages persist w/3.18.0-031800rc6-generic #201411231935 and sent email to <email address hidden>

tags: added: kernel-bug-reported-upstream
Kyle O'Donnell (kyleo-t) wrote :
andrew bezella (abezella) wrote :

to apply to the 3.13 trusty kernel source i had to make a trivial edit to the patch (result attached). the power user/developer whose workflow typically triggers these messages has been using it and i have not yet seen the errors recur.

tags: added: patch
andrew bezella (abezella) wrote :

still no errors seen. i believe this patch addresses the issue.

Joseph Salisbury (jsalisbury) wrote :

Can you provide some information on the status of the patch with regards to getting it merged upstream? Has it been sent upstream, what sort of feedback has it received, is it getting applied to a subsystem maintainer's tree, etc?

andrew bezella (abezella) wrote :

doesn't appear to have made it into 3.19-rc3 but beyond that i don't have any information other than what is in http://www.spinics.net/lists/linux-nfs/msg48569.html (patch appears to be from and signed off by the NFS client maintainer)

Joseph Salisbury (jsalisbury) wrote :

Sorry, right. It looks like the patch was just submitted upstream. It was also cc'd to stable, so it should make its way into all the stable kernels.

Kyle O'Donnell (kyleo-t) wrote :

Does anyone know if/when this will make it into the trusty kernel?

I'm affected by this bug across 10 production servers. Is there a fix/workaround for trusty that doesn't involve a manual kernel roll?

Joseph Salisbury (jsalisbury) wrote :

It looks like the fix for this bug is in mainline as of 3.19-rc5:

06bed7d LOCKD: Fix a race when initialising nlmsvc_timeout

git describe --contains 06bed7d
v3.19-rc5~13^2~5

The fix was also cc'd to stable, and it is working its way through the stable releases. It has made its way into the linux-3.13.y-queue branch of the upstream 3.13 kernel:
1185361 LOCKD: Fix a race when initialising nlmsvc_timeout

Trusty will pick up this fix when it gets the 3.13 upstream updates.

Changed in linux (Ubuntu Trusty):
status: New → Triaged
importance: Undecided → Medium
Kyle O'Donnell (kyleo-t) wrote :

looks like this made it into 3.13.0-48.80

https://launchpad.net/ubuntu/trusty/+source/linux/+changelog

  * LOCKD: Fix a race when initialising nlmsvc_timeout

https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1427438

Upgraded a few days ago, haven't seen the errors
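
A rough sketch for checking whether a trusty 3.13 kernel is at or past the 3.13.0-48.80 package mentioned above. The function name and the version-parsing rule are assumptions for illustration. It only understands stock trusty release strings such as 3.13.0-32-generic, returns None for anything else, and assumes earlier ABI numbers do not carry the fix.

```python
import platform
import re
from typing import Optional

# ABI number of the first trusty package reported in this thread to
# include the fix (3.13.0-48.80).
FIXED_ABI = 48

# Accepts e.g. "3.13.0-32-generic" (uname -r) or "3.13.0-48.80" (package version).
RELEASE_RE = re.compile(r"^3\.13\.0-(?P<abi>\d+)(?:\.\d+)?(?:-\w+)?$")


def trusty_kernel_has_lockd_fix(release: str) -> Optional[bool]:
    """Return True/False for trusty 3.13.0 kernels, None if unrecognised."""
    match = RELEASE_RE.match(release)
    if match is None:
        return None  # not a trusty 3.13.0 release string; no claim made
    return int(match.group("abi")) >= FIXED_ABI


# Check the kernel this script is running on.
print(platform.release(), trusty_kernel_has_lockd_fix(platform.release()))
```

For the kernel in the original report, `trusty_kernel_has_lockd_fix("3.13.0-32-generic")` is False, while the mainline test kernels in this thread are not recognised and give None.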
