NFS kernel server creates a kworker with 100% CPU usage, then hangs randomly

Bug #1322407 reported by Mark Haidekker on 2014-05-23
This bug affects 19 people
Affects          Status        Importance  Assigned to   Milestone
linux (Ubuntu)   Fix Released  High        Unassigned
Trusty           Fix Released  High        Stefan Bader

Bug Description

This concerns the server edition of 14.04. I have set up an NFS server. Once I attach at least one client, one kworker process starts to use 100% CPU. The runaway kworker returns to idle when I reset the NFS server daemon with "service nfs-kernel-server stop" followed by "service nfs-kernel-server start". With the NFS kernel server stopped, the CPU remains idle. With the NFS kernel server running, as soon as one client requests a connection, the kworker process jumps to 100% CPU again.

After some random time, the nfs kernel server no longer accepts requests from clients. Restarting the service allows clients to reconnect. The syslog shows no relevant information.

This problem has never appeared on a very similar server setup with 12.04.

Configuration: The svcgssd is not running. I played with various configurations (enabling/disabling NFSv3 and NFSv4), but it makes no difference.

I tried to enable event debugging:

#> echo workqueue:workqueue_queue_work > /sys/kernel/debug/tracing/set_event
#> cat /sys/kernel/debug/tracing/trace_pipe > /var/tmp/kerntrace.txt

and found the following kernel trace in a tight loop:

[...]
     kworker/2:1-86 [002] d... 161940.910668: workqueue_queue_work: work struct=ffff8804140675e0 function=xs_tcp_setup_socket [sunrpc] workqueue=ffff8800cfbb6a00 req_cpu=256 cpu=2
     kworker/2:1-86 [002] d.s. 161940.910674: workqueue_queue_work: work struct=ffff8800cef9f488 function=rpc_async_schedule [sunrpc] workqueue=ffff8800cfbb6a00 req_cpu=256 cpu=2
     kworker/2:1-86 [002] d... 161940.910675: workqueue_queue_work: work struct=ffff8804140675e0 function=xs_tcp_setup_socket [sunrpc] workqueue=ffff8800cfbb6a00 req_cpu=256 cpu=2
     kworker/2:1-86 [002] d.s. 161940.910681: workqueue_queue_work: work struct=ffff8800cef9f488 function=rpc_async_schedule [sunrpc] workqueue=ffff8800cfbb6a00 req_cpu=256 cpu=2
     kworker/2:1-86 [002] d... 161940.910682: workqueue_queue_work: work struct=ffff8804140675e0 function=xs_tcp_setup_socket [sunrpc] workqueue=ffff8800cfbb6a00 req_cpu=256 cpu=2
     kworker/2:1-86 [002] d.s. 161940.910688: workqueue_queue_work: work struct=ffff8800cef9f488 function=rpc_async_schedule [sunrpc] workqueue=ffff8800cfbb6a00 req_cpu=256 cpu=2
     kworker/2:1-86 [002] d... 161940.910689: workqueue_queue_work: work struct=ffff8804140675e0 function=xs_tcp_setup_socket [sunrpc] workqueue=ffff8800cfbb6a00 req_cpu=256 cpu=2
     kworker/2:1-86 [002] d.s. 161940.910695: workqueue_queue_work: work struct=ffff8800cef9f488 function=rpc_async_schedule [sunrpc] workqueue=ffff8800cfbb6a00 req_cpu=256 cpu=2
     kworker/2:1-86 [002] d... 161940.910696: workqueue_queue_work: work struct=ffff8804140675e0 function=xs_tcp_setup_socket [sunrpc] workqueue=ffff8800cfbb6a00 req_cpu=256 cpu=2
     kworker/2:1-86 [002] d.s. 161940.910702: workqueue_queue_work: work struct=ffff8800cef9f488 function=rpc_async_schedule [sunrpc] workqueue=ffff8800cfbb6a00 req_cpu=256 cpu=2
[...]
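
To put numbers on the tight loop in a capture like the one above, the trace lines can be tallied per task and queued function. The following is a minimal sketch, assuming the line layout shown in the excerpt; the regular expression and the helper name summarize_trace are illustrative and not part of any kernel tooling.

```python
import re
from collections import Counter

# Pattern for the workqueue_queue_work lines in the trace excerpt above.
# Assumption: other kernels may format these lines slightly differently.
TRACE_RE = re.compile(
    r"^\s*(?P<task>\S+)\s+\[\d+\]\s+\S+\s+(?P<ts>\d+\.\d+):\s+"
    r"workqueue_queue_work:.*?\bfunction=(?P<func>\S+)"
)

def summarize_trace(lines):
    """Return (Counter keyed by (task, function), seconds from first to last event)."""
    counts = Counter()
    first = last = None
    for line in lines:
        m = TRACE_RE.match(line)
        if m is None:
            continue  # "[...]" markers, blank lines, unrelated events
        counts[(m.group("task"), m.group("func"))] += 1
        ts = float(m.group("ts"))
        if first is None:
            first = ts
        last = ts
    span = 0.0 if first is None else last - first
    return counts, span

# Four lines copied from the excerpt above.
SAMPLE = """\
     kworker/2:1-86 [002] d... 161940.910668: workqueue_queue_work: work struct=ffff8804140675e0 function=xs_tcp_setup_socket [sunrpc] workqueue=ffff8800cfbb6a00 req_cpu=256 cpu=2
     kworker/2:1-86 [002] d.s. 161940.910674: workqueue_queue_work: work struct=ffff8800cef9f488 function=rpc_async_schedule [sunrpc] workqueue=ffff8800cfbb6a00 req_cpu=256 cpu=2
     kworker/2:1-86 [002] d... 161940.910675: workqueue_queue_work: work struct=ffff8804140675e0 function=xs_tcp_setup_socket [sunrpc] workqueue=ffff8800cfbb6a00 req_cpu=256 cpu=2
     kworker/2:1-86 [002] d.s. 161940.910681: workqueue_queue_work: work struct=ffff8800cef9f488 function=rpc_async_schedule [sunrpc] workqueue=ffff8800cfbb6a00 req_cpu=256 cpu=2
""".splitlines()

counts, span = summarize_trace(SAMPLE)
for (task, func), n in counts.most_common():
    print(f"{task} queued {func} {n} times")
print(f"time span: {span * 1e6:.0f} microseconds")
```

On a full capture, the same function can be fed an open file handle for /var/tmp/kerntrace.txt; a single kworker queuing the same two functions many times within a few microseconds of each other is the pattern described here.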

At present, I have to consider NFS fubar.

Thanks,
Mark

ProblemType: Bug
DistroRelease: Ubuntu 14.04
Package: linux-image-3.13.0-24-generic 3.13.0-24.47
ProcVersionSignature: Ubuntu 3.13.0-24.47-generic 3.13.9
Uname: Linux 3.13.0-24-generic x86_64
AlsaVersion: Advanced Linux Sound Architecture Driver Version k3.13.0-24-generic.
AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
ApportVersion: 2.14.1-0ubuntu3.1
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/by-path', '/dev/snd/controlC0', '/dev/snd/hwC0D0', '/dev/snd/pcmC0D0c', '/dev/snd/pcmC0D0p', '/dev/snd/pcmC0D1p', '/dev/snd/pcmC0D2c', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
CRDA: Error: [Errno 2] No such file or directory: 'iw'
Card0.Amixer.info: Error: [Errno 2] No such file or directory: 'amixer'
Card0.Amixer.values: Error: [Errno 2] No such file or directory: 'amixer'
Date: Thu May 22 22:48:48 2014
HibernationDevice: RESUME=UUID=adcdeef5-9b46-4ea4-b9f4-b6e642ea91e8
InstallationDate: Installed on 2014-05-12 (10 days ago)
InstallationMedia: Ubuntu-Server 14.04 LTS "Trusty Tahr" - Release amd64 (20140416.2)
IwConfig:
 eth0 no wireless extensions.

 lo no wireless extensions.
MachineType: Gigabyte Technology Co., Ltd. GA-990XA-UD3
ProcEnviron:
 TERM=xterm
 PATH=(custom, no user)
 XDG_RUNTIME_DIR=<set>
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcFB: 0 nouveaufb
ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-3.13.0-24-generic root=UUID=6cf411b3-3e8e-41e5-8670-db14ad259f58 ro IOMMU=soft nomdmonddf nomdmonisw
RelatedPackageVersions:
 linux-restricted-modules-3.13.0-24-generic N/A
 linux-backports-modules-3.13.0-24-generic N/A
 linux-firmware 1.127.2
RfKill: Error: [Errno 2] No such file or directory: 'rfkill'
SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
WifiSyslog:

dmi.bios.date: 10/13/2011
dmi.bios.vendor: Award Software International, Inc.
dmi.bios.version: F9
dmi.board.name: GA-990XA-UD3
dmi.board.vendor: Gigabyte Technology Co., Ltd.
dmi.board.version: x.x
dmi.chassis.type: 3
dmi.chassis.vendor: Gigabyte Technology Co., Ltd.
dmi.modalias: dmi:bvnAwardSoftwareInternational,Inc.:bvrF9:bd10/13/2011:svnGigabyteTechnologyCo.,Ltd.:pnGA-990XA-UD3:pvr:rvnGigabyteTechnologyCo.,Ltd.:rnGA-990XA-UD3:rvrx.x:cvnGigabyteTechnologyCo.,Ltd.:ct3:cvr:
dmi.product.name: GA-990XA-UD3
dmi.sys.vendor: Gigabyte Technology Co., Ltd.

Mark Haidekker (mhaidekk) wrote :
description: updated

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
Stefan Bader (smb) wrote :

Can you be more specific about the configuration of the server (/etc/exports, /etc/default/nfs-*) and what kind of client you use? I just tried this on two VMs and saw no issues (at least with a basic NFSv4 setup).

Mark Haidekker (mhaidekk) wrote :

/etc/exports has one line that is not a comment:

/home 192.168.1.0/255.255.255.0(rw,sync,root_squash,no_subtree_check)

/etc/default/nfs-common:

NEED_STATD=no
# Options for rpc.statd.
STATDOPTS=
# Do you want to start the gssd daemon? It is required for Kerberos mounts.
NEED_GSSD=no

/etc/default/nfs-kernel-server:

# Number of servers to start up
RPCNFSDCOUNT=8
# Runtime priority of server (see nice(1))
RPCNFSDPRIORITY=0
# Options for rpc.mountd.
RPCMOUNTDOPTS="--manage-gids"
# Do you want to start the svcgssd daemon? It is only required for Kerberos
# exports. Valid alternatives are "yes" and "no"; the default is "no".
NEED_SVCGSSD="no"
# Options for rpc.svcgssd.
RPCSVCGSSDOPTS=""
# Options for rpc.nfsd.
RPCNFSDOPTS="--debug --syslog"

################
Note that I tried several alternatives, such as

RPCMOUNTDOPTS="--manage-gids --no-nfs-version 4"

and other combinations, to no effect. I am mounting Ubuntu 12.04 clients. The relevant entry in /etc/mtab for one of these clients looks like this:

192.168.1.2:/home /mnt/nfshome nfs rw,vers=4,addr=192.168.1.2,clientaddr=192.168.1.9 0 0

The client's nfs-common version is 1:1.2.5-3ubuntu3.1. Also, it makes no difference whether I use mount -t nfs or mount -t nfs4 (in the first case the /etc/mtab entry has vers=4 in the options; in the other case it does not, but the filesystem type is nfs4).
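
Since the protocol version can show up either as a vers= (or nfsvers=) option or only through the nfs4 filesystem type, a small helper can read it from an mtab-style line. This is a minimal sketch; the function name nfs_version_from_mtab is hypothetical, and it only inspects the text of the line rather than asking the kernel what was actually negotiated.

```python
def nfs_version_from_mtab(line):
    """Return the NFS version recorded in an mtab-style line, or None if unknown.

    Expected fields: device, mount point, fstype, options, dump, pass.
    """
    fields = line.split()
    if len(fields) < 4:
        raise ValueError("not an mtab-style line: %r" % line)
    fstype, options = fields[2], fields[3].split(",")
    if fstype not in ("nfs", "nfs4"):
        return None  # not an NFS mount at all
    for opt in options:
        key, _, value = opt.partition("=")
        if key in ("vers", "nfsvers") and value:
            return value  # explicit version option wins
    # No explicit option: the nfs4 fstype implies version 4; plain nfs is unknown.
    return "4" if fstype == "nfs4" else None

entry = ("192.168.1.2:/home /mnt/nfshome nfs "
         "rw,vers=4,addr=192.168.1.2,clientaddr=192.168.1.9 0 0")
print(nfs_version_from_mtab(entry))
```

For the entry quoted above this reports version 4, which matches the observation that both mount variants end up using NFSv4.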

Mark Haidekker (mhaidekk) wrote :

Oh, and maybe of less relevance, but to keep it complete, here is the output of "top":

top - 13:46:47 up 5 days, 56 min, 2 users, load average: 1.00, 0.97, 1.01
Tasks: 172 total, 2 running, 170 sleeping, 0 stopped, 0 zombie
%Cpu(s): 0.0 us, 12.0 sy, 0.0 ni, 83.3 id, 0.0 wa, 4.8 hi, 0.0 si, 0.0 st
KiB Mem: 16415556 total, 2105640 used, 14309916 free, 390756 buffers
KiB Swap: 4194296 total, 0 used, 4194296 free. 865744 cached Mem

  PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
   86 root 20 0 0 0 0 R 100.0 0.0 6867:11 kworker/2:1
15637 root 20 0 24956 1760 1172 R 0.3 0.0 0:00.03 top
...

Changed in linux (Ubuntu):
importance: Undecided → High
tags: added: kernel-key
Stefan Bader (smb) wrote :

What I probably missed initially is that this involves connections that work for a while and then fail at some point, and it is unclear how many connections were active at that time.
The function tracing you did shows that some kind of loop is going on, but it does not reveal any real details. Could you try enabling some of the NFS debugging via /proc/sys/sunrpc/*_debug? These files allow various pieces of internal debugging to be enabled:

#define RPCDBG_XPRT 0x0001
#define RPCDBG_CALL 0x0002
#define RPCDBG_DEBUG 0x0004
#define RPCDBG_NFS 0x0008
#define RPCDBG_AUTH 0x0010
#define RPCDBG_BIND 0x0020
#define RPCDBG_SCHED 0x0040
#define RPCDBG_TRANS 0x0080
#define RPCDBG_SVCXPRT 0x0100
#define RPCDBG_SVCDSP 0x0200
#define RPCDBG_MISC 0x0400
#define RPCDBG_CACHE 0x0800
#define RPCDBG_ALL 0x7fff

So echoing 524287 into the various /proc interfaces should enable all debugging. I am not sure which of them to use; maybe start with nfsd_debug and/or rpc_debug. Maybe this allows us to narrow down what goes wrong.

Mark Haidekker (mhaidekk) wrote :

A few bits for clarification before I get started: (1) Your suggested value, 524287, clearly is 0x7ffff. This is (RPCDBG_ALL | 0x78000), i.e., with four extra bits, correct?

(2) I guess I can harvest the debug messages from /var/log/syslog, correct? Meaning, once the kworker runs amok, I simply copy syslog and attach it to one of my responses?

I'll start doing this now.

Stefan Bader (smb) wrote :

Oh, sorry, that was just me hitting too many Fs. So 32767 (0x7fff) should be enough. I guess the other value is OK too; it just sets more bits in the mask than are defined.

Yes, the messages should end up in syslog. With all debugging turned on there will be quite a bit of logging going on. Hope this does not change timing in a way that the problem does not show up anymore.
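
The arithmetic behind the two mask values can be checked with a short sketch. The flag values are the RPCDBG_* defines quoted earlier in this thread; the helper name decode_mask is made up for illustration.

```python
# RPCDBG_* values as quoted earlier in this thread.
RPCDBG_FLAGS = {
    "XPRT": 0x0001, "CALL": 0x0002, "DEBUG": 0x0004, "NFS": 0x0008,
    "AUTH": 0x0010, "BIND": 0x0020, "SCHED": 0x0040, "TRANS": 0x0080,
    "SVCXPRT": 0x0100, "SVCDSP": 0x0200, "MISC": 0x0400, "CACHE": 0x0800,
}
RPCDBG_ALL = 0x7fff

def decode_mask(value):
    """Return (names of the known flags set in value, bits set beyond RPCDBG_ALL)."""
    names = [name for name, bit in RPCDBG_FLAGS.items() if value & bit]
    extra = value & ~RPCDBG_ALL
    return names, extra

for value in (524287, 32767):
    names, extra = decode_mask(value)
    print(f"{value} = {value:#x}: {len(names)} known flags, extra bits {extra:#x}")
```

Both values switch on every listed flag; 524287 (0x7ffff) additionally sets the four bits 0x78000, which lie outside RPCDBG_ALL.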

Mark Haidekker (mhaidekk) wrote :

Funny you should mention timing... right now, the server does not show the runaway kworker. nfsd faithfully spits out a few debug messages per second, but acts normally otherwise. I have put the server back in production to see if I can catch the runaway kworker when I have more users -- I'll get some flak if nfsd hangs again, but that's life. I am also setting up another, similar NFS server for testing purposes. Will report once I have more information.

Mark Haidekker (mhaidekk) wrote :

OK -- bug can be reproduced. Here is an interesting observation: When a *second* NFS server is connected to the network, the bug seems to be suppressed, for what it's worth. This one really puzzles me. The runaway kworker appeared only after I shut down the old server.

As for debugging, the component that seems to create an insane amount of debug messages is rpcd. I am pasting snippets from nfsd's debugging-enabled and rpcd's debugging-enabled here, please let me know if you need more of the syslog file.

######### NFSD ##############
(...)
May 28 13:06:41 marcato kernel: [184158.621702] nfsd: fh_compose(exp 09:04/2 (filename redacted), ino=10619455)
May 28 13:06:41 marcato kernel: [184158.621714] nfsd: fh_compose(exp 09:04/2 (filename redacted), ino=10618649)
May 28 13:06:41 marcato kernel: [184158.621731] nfsd: fh_compose(exp 09:04/2 (filename redacted), ino=10619242)
May 28 13:06:41 marcato kernel: [184158.621742] nfsd: fh_compose(exp 09:04/2 (filename redacted), ino=10619422)
May 28 13:06:41 marcato kernel: [184158.621754] nfsd: fh_compose(exp 09:04/2 (filename redacted), ino=10619296)
May 28 13:06:41 marcato kernel: [184158.621761] nfsv4 compound op ffff880417a01080 opcnt 2 #2: 26: status 0
May 28 13:06:41 marcato kernel: [184158.621764] nfsv4 compound returned 0
May 28 13:06:41 marcato kernel: [184158.622472] nfsd_dispatch: vers 4 proc 1
May 28 13:06:41 marcato kernel: [184158.622484] nfsv4 compound op #1/2: 22 (OP_PUTFH)
May 28 13:06:41 marcato kernel: [184158.622491] nfsd: fh_verify(28: 01060001 57c17980 8845d969 d9acb495 eab15095 00a205da)
May 28 13:06:41 marcato kernel: [184158.622507] nfsv4 compound op ffff880417a01080 opcnt 2 #1: 22: status 0
May 28 13:06:41 marcato kernel: [184158.622511] nfsv4 compound op #2/2: 26 (OP_READDIR)
May 28 13:06:41 marcato kernel: [184158.622549] nfsd: fh_verify(28: 01060001 57c17980 8845d969 d9acb495 eab15095 00a205da)

######### RPCD ##############
(...)
May 28 13:07:29 marcato kernel: [184207.248774] RPC: 22256 call_connect_status (status -11)
May 28 13:07:29 marcato kernel: [184207.248777] svc: socket ffff880417284d00 TCP data ready (svsk ffff8803e3ff2000)
May 28 13:07:29 marcato kernel: [184207.248781] svc: transport ffff8803e3ff2000 served by daemon ffff8800ce6da000
May 28 13:07:29 marcato kernel: [184207.248791] RPC: 22256 call_bind (status 0)
May 28 13:07:29 marcato kernel: [184207.248793] svc: server ffff8800ce6da000, pool 0, transport ffff8803e3ff2000, inuse=2
May 28 13:07:29 marcato kernel: [184207.248795] svc: tcp_recv ffff8803e3ff2000 data 1 conn 0 close 0
May 28 13:07:29 marcato kernel: [184207.248800] svc: socket ffff8803e3ff2000 recvfrom(ffff8803e3ff22bc, 0) = 4
May 28 13:07:29 marcato kernel: [184207.248802] svc: TCP record, 188 bytes
May 28 13:07:29 marcato kernel: [184207.248806] svc: socket ffff8803e3ff2000 recvfrom(ffff880415a7c0bc, 3908) = 188
May 28 13:07:29 marcato kernel: [184207.248809] svc: TCP final record (188 bytes)
May 28 13:07:29 marcato kernel: [184207.248812] svc: got len=188
May 28 13:07:29 marcato kernel: [184207.248815] svc: svc_authenticate (1)
May 28 13:07:29 marcato kernel: [184207.248819] svc: calling dispatcher
May 28 13:07:29 marcato kernel: [184207.248826]...
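
With this much debug output, it helps to count how often each RPC task passes through each call_* step and with which status. The following is a minimal sketch, assuming the "RPC: <task id> call_xxx (status N)" layout seen in the excerpt above; the helper name rpc_step_counts is illustrative.

```python
import re
from collections import Counter

# Matches lines such as "RPC: 22256 call_connect_status (status -11)".
RPC_RE = re.compile(
    r"RPC:\s+(?P<task>\d+)\s+(?P<step>call_\w+)\s+\(status\s+(?P<status>-?\d+)\)"
)

def rpc_step_counts(lines):
    """Count occurrences of each (task id, step, status) triple in syslog lines."""
    counts = Counter()
    for line in lines:
        m = RPC_RE.search(line)
        if m:
            key = (int(m.group("task")), m.group("step"), int(m.group("status")))
            counts[key] += 1
    return counts

# Three lines copied from the RPCD excerpt above.
SAMPLE_LOG = [
    "May 28 13:07:29 marcato kernel: [184207.248774] RPC: 22256 call_connect_status (status -11)",
    "May 28 13:07:29 marcato kernel: [184207.248777] svc: socket ffff880417284d00 TCP data ready (svsk ffff8803e3ff2000)",
    "May 28 13:07:29 marcato kernel: [184207.248791] RPC: 22256 call_bind (status 0)",
]

for (task, step, status), n in rpc_step_counts(SAMPLE_LOG).most_common():
    print(f"task {task}: {step} status {status} seen {n} time(s)")
```

Run over a complete syslog, a single task id that shows call_connect_status with status -11 (EAGAIN on Linux) thousands of times would point at the task that keeps retrying.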

Mark Haidekker (mhaidekk) wrote :

One more observation that I'll post here as *TENTATIVE*. All clients are, as I wrote earlier, 12.04 clients. The bug, as described above, occurs whenever one of the clients' fstab entries uses the nfs filesystem:

192.168.1.2:/home /marcato/home nfs defaults,user,exec,hard,auto,nolock 0 0

However, the NFS server appears to act normally when all clients use nfs4:

192.168.1.2:/home /marcato/home nfs4 defaults,user,exec,hard,auto,nolock 0 0

Interesting, because the /etc/mtab of the corresponding client has vers=4 in the options, which those clients that use nfs4 don't have. So it appears that, after all, all clients use version 4, right?

Mark Haidekker (mhaidekk) wrote :

Strike #11 above. Observation is not correct. The runaway kworker appears even when all clients use "nfs4".

Stefan Bader (smb) wrote :

If /proc/mounts shows nfsvers=4, I would assume as well that NFSv4 is used. Formerly I had the feeling, from the examples, that really using NFSv4 required one entry in /etc/exports declaring fsid=0 (in other words, the root), with clients then asking for paths relative to that root. I *think* I remember someone asking on IRC and having issues when not following that, which was another thing that seemed to have worked before.

For the debugging, I think it does give me a rough idea, just need to match that up against some code. Which will probably take a bit. But somehow it appears to try to connect but then to think the socket needs to be released for some reason. And possibly it either is failing to release it with some unexpected condition or incorrectly assumes it is still connected and repeats this forever. Cannot really think of a reason why this is related (or not) to another server running (at least yet).

Mark Haidekker (mhaidekk) wrote :

I saw the thread with the /etc/exports entry that has fsid=0 (export filesystem root), and, of course, I tried that. Somehow, I did not get any valid exports when, for example, using

/home 192.168.1.0/255.255.255.0(rw,fsid=0,root_squash,no_subtree_check)
/home/samba 192.168.1.0/255.255.255.0(rw,root_squash,no_subtree_check)

Also, I had the same problem when I disabled NFSv4 and limited the kernel server to NFSv3 (with options in /etc/default).

So I moved away from using the fsid=0 option.

Stefan, if you need me to do anything, let me know. I have done my own share of printk-debugging, although I am not sure if I am out of my depth with NFS.

Mark Haidekker (mhaidekk) wrote :

I made one more observation, and it is fully reproducible. Frankly, it would appear that the bug may not be as impactful as originally thought (if this is a bug to begin with). Here it is:

On one of the Linux clients, Windows runs in Virtualbox. The NFS share is mounted within the Linux host, and the mount point is available in the virtual environment as a vboxdrv shared folder. Within Windows, the user uses a translation automation tool called "Trados". If -- and only if -- this program is running, the apparent runaway kworker uses 100% CPU. When Trados is closed, the CPU core returns to idle within ~20 to 30 seconds.

This, finally, explains many observations I made before, which are mere coincidences. It also explains why a second NFS server that I brought online today does not exhibit the runaway behavior. Only the combination of all three: NFS, Virtualbox (Win 7 guest), and Trados, puts a heavy load on the rpc daemon. No other Windows program in the virtual box does that.

Why and how? I used tcpdump to see if anything insane is going on, but it is not -- neither on the client's physical eth0 interface nor on the virtual vboxnet0 interface. I did not find anything suspicious. Also, I cannot explain why the old 10.04 NFS server didn't show this behavior. NFSv3 perhaps? And why does it affect the rpc daemon?

In conclusion, I believe that I found the cause (albeit not the reason) for the runaway behavior. I also believe that the combination of software packages is very unusual, and this bug is not likely to affect many people. I propose to reduce the importance level of this bug to "low".

Lastly, it makes me shudder to think about what Trados might do to the hard disks of people who use it locally, with natively installed Windows.

Stefan Bader (smb) wrote :

Now this also explains why I had a hard time reproducing this here (in other words, I was not able to). Since this indeed is a rather unusual corner case, I can put it a little lower on the list. Still, I would try to understand the debug output enough to have a better idea of how that Windows app is getting NFS into such a mess. Maybe there is some way to at least weaken the effects.

About the fsid=0 exports: yes, it seems like a bit of a twisted way of setting things up. The fact that I have to mount "<server>:/" for the export defined as fsid=0 I can grasp. But how other mounts are then matched to be below this and to be NFSv4 is not really that obvious.

Mark Haidekker (mhaidekk) wrote :

Well, I am game.

I'll safely assume that Trados is rather one of the less-frequently used programs in this community, so it is obviously up to me to run the actual tests. If you could help me by suggesting debugging functions or actions, I'll gladly provide more information.

tags: removed: kernel-key
Stefan Bader (smb) wrote :

A deeper inspection of the logs suggests that the problem is a connection attempt made while the xprt is not connected. Part of that procedure is to re-use the connection, which forces the xprt to disconnect (so the socket can be re-used). This triggers a state change (TCP_CLOSE) and wakes up the task waiting for the connection. But the connection state is then INPROGRESS, which somehow gets translated into EAGAIN, and that triggers call_bind, which repeats the socket re-use process.

With that lead, I found two commits upstream referring to this commit that introduces that behaviour:

* 561ec1603171 (SUNRPC: call_connect_status should recheck bind..)

The two fixes related to that are:

* 1fa3e2e SUNRPC: Ensure call_connect_status() deals correctly with SOFTCONN tasks
* 485f225 SUNRPC: Ensure that call_connect times out correctly

The latter would at least cause timeouts to be re-adjusted before looping back into call_bind, so it might be worth trying those. I built a Trusty kernel with those two patches added. The debs are at http://people.canonical.com/~smb/lp1322407/
Could you install those on the server side and see whether this helps with the problem?
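
The control flow described in this comment can be pictured with a toy model. It is emphatically not kernel code: the state names are borrowed from the debug log, the transport is assumed to report -EAGAIN on every attempt, and the retry_budget parameter is a hypothetical stand-in for the timeout handling, not a description of how the two upstream patches are implemented.

```python
EAGAIN = 11  # Linux errno value; the debug log shows "status -11"

def run_task(max_steps, retry_budget=None):
    """Toy model of the bind/connect loop. Returns (outcome, steps taken).

    retry_budget is a made-up knob: None means nothing ever ends the loop.
    """
    state = "call_bind"
    retries = 0
    for step in range(1, max_steps + 1):
        if state == "call_bind":
            state = "call_connect"
        elif state == "call_connect":
            state = "call_connect_status"
        else:
            # call_connect_status: assume the transport always answers -EAGAIN.
            status = -EAGAIN
            if status == -EAGAIN:
                retries += 1
                if retry_budget is not None and retries >= retry_budget:
                    return "timed_out", step
                state = "call_bind"  # loop back and try again
    return "still_looping", max_steps

print(run_task(30))                  # no bound: the loop never ends by itself
print(run_task(30, retry_budget=3))  # bounded: gives up after three retries
```

The point of the sketch is only that a retry path with no exit condition keeps a worker busy indefinitely, whereas any bound on the retries lets the task fail and release the CPU.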

Mark Haidekker (mhaidekk) wrote :

Preliminary results: This fixed it.

I say "preliminary", because we tested it only for a short moment. After some heavy Trados use yesterday, the NFS kernel daemon "hung" again (rpc timeout, probably choked on its own connect/reconnect state queue), so I installed the kernel packages you prepared. We briefly ran Trados and the runaway kworker no longer appeared.

I would like to observe this a bit more and report in a few days.

Mark Haidekker (mhaidekk) wrote :

FIX CONFIRMED. We have been operating with the new NFS server for several days, and the runaway kworker has not reappeared. Not even once. The NFS daemon has never "hung" during these days.

I'd say this is it. Thanks, Stefan.

Stefan Bader (smb) wrote :

Thanks Mark. I will propose those for 14.04 (Trusty) then.

Changed in linux (Ubuntu Trusty):
assignee: nobody → Stefan Bader (smb)
importance: Undecided → High
status: New → In Progress
Changed in linux (Ubuntu):
status: Confirmed → Fix Released
Stefan Bader (smb) wrote :

The fix-released for the development kernel is based on both patches being in the 3.14.4 upstream stable tree.

Gordon Dracup (gordon-dracup) wrote :

I appear to still be having this issue, despite the fix being released, on 3.13.0-29-generic.

Happy to provide more details.

Gordon Dracup (gordon-dracup) wrote :

Problem exists on 3.13.0-29-generic. Clean 14.04 server install. Export of root sub folder. NFSv4. Client 12.04 mount. Kworker 100% every time the folder is accessed. Testing on a single server and single client.

Dilshod (dilshod-z) wrote :

Linux 3.13.0-29-generic #53-Ubuntu SMP Wed Jun 4 21:00:20 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

I have this problem. Any suggestion how to fix it? Does this require upgrading to a newer version of the kernel?

davidwca (davidwca) wrote :

Any confirmed release date for the fixed kernel? I'm using Stefan's custom build which fixes the problem.

Gordon Dracup (gordon-dracup) wrote :

FIX CONFIRMED. I can confirm that Stefan's fix in #18 resolves my issue, and is unrelated to Trados.

Many thanks Stefan.

Stefan Bader (smb) wrote :

It might be a bit unclear but the main task is "fix released" because Utopic (14.10 and current trunk) is based on 3.15 right now (will move to 3.16 before release). So Utopic is ok. For Trusty this is still in the mill. The patches have references to this report. So you should see an automatic post here when a kernel with the fixes hits proposed.
Since I found both patches in the git repo/branch used to prepare the next update, I mark the Trusty task as "fix committed". That still does not mean there is a kernel to test officially. Just that it is in the repository.

Changed in linux (Ubuntu Trusty):
status: In Progress → Fix Committed
Luis Henriques (henrix) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-trusty' to 'verification-done-trusty'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-trusty

I have the same issue on Ubuntu 12.04 LTS using Trusty kernel:

Linux version 3.13.0-29-generic (buildd@roseapple) (gcc version 4.6.3 (Ubuntu/Linaro 4.6.3-1ubuntu5) ) #53~precise1-Ubuntu SMP Wed Jun 4 22:06:25 UTC 2014

Distributor ID: Ubuntu
Description: Ubuntu 12.04.4 LTS
Release: 12.04
Codename: precise

Andrew G Meyer (andrewgmeyer) wrote :

Using henrix's -proposed kernel appears to resolve the issue for me.

Under 3.13.0-32-generic #56-Ubuntu, Kworker is no longer running away, whereas before under *.30-generic #55 it was.

My setup is a server with a nfs client on a KVM VM coming in over openvswitch. Restarting the NFS client VM seemed to hang the NFS daemon on the server, and Kworker was running away.

I replicated that same scenario, and everything appears to be working correctly now. The only difference was that there was a significant amount of NFS activity that I didn't recreate, but as I said, everything appears normal now.

Andrew G Meyer (andrewgmeyer) wrote :

tags:added: verification-done-trusty

tags: added: verification-done-trusty
removed: verification-needed-trusty
Mark Haidekker (mhaidekk) wrote :

Thanks, Andrew, for testing and for updating the tags. I have just tested the -proposed kernel and can confirm the fix, too. You beat me by an hour or so in adding verification-done-trusty.

Mark

Gordon Dracup (gordon-dracup) wrote :

I confirm also that the proposed kernel changes #56 fixes this issue. Problem still occurs with #55. See attached package versions which work. Sorry, took me ages to work out how to install proposed kernel changes to test this. :-)
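
For anyone checking a machine against the reports in this thread, the ABI number in the kernel release string can be compared with the first build reported as fixed. This is a minimal sketch that relies only on what is stated here (3.13.0-32 with upload #56 works, #55 and 3.13.0-29 do not); the threshold constant and the function name are assumptions for illustration, and the check says nothing about non-Trusty kernels.

```python
import re

# Assumption taken from this thread: Trusty builds from ABI 32 onward carry the fix.
FIXED_ABI = 32

def trusty_kernel_has_fix(release):
    """Return True/False for a 3.13.0-<abi>-<flavour> release string, else None."""
    m = re.match(r"^3\.13\.0-(\d+)-", release)
    if m is None:
        return None  # not a Trusty 3.13.0 kernel; no statement possible
    return int(m.group(1)) >= FIXED_ABI

for release in ("3.13.0-24-generic", "3.13.0-29-generic", "3.13.0-32-generic"):
    print(release, trusty_kernel_has_fix(release))
```

On a live system, the string to pass in is the output of "uname -r" (or platform.release() in Python).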

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 3.13.0-32.57

---------------
linux (3.13.0-32.57) trusty; urgency=low

  [ Upstream Kernel Changes ]

  * l2tp: Privilege escalation in ppp over l2tp sockets
    - LP: #1341472
    - CVE-2014-4943

linux (3.13.0-32.56) trusty; urgency=low

  [ Luis Henriques ]

  * Merged back Ubuntu-3.13.0-30.55 security release
  * Revert "x86_64,ptrace: Enforce RIP <= TASK_SIZE_MAX (CVE-2014-4699)"
    - LP: #1337339
  * Release Tracking Bug
    - LP: #1338524

  [ Upstream Kernel Changes ]

  * ptrace,x86: force IRET path after a ptrace_stop()
    - LP: #1337339
    - CVE-2014-4699
  * hpsa: add new Smart Array PCI IDs (May 2014)
    - LP: #1337516

linux (3.13.0-31.55) trusty; urgency=low

  [ Luis Henriques ]

  * Release Tracking Bug
    - LP: #1336278

  [ Andy Whitcroft ]

  * [Config] switch hyper-keyboard to virtual
    - LP: #1325306
  * [Packaging] linux-udeb-flavour -- standardise on linux prefix

  [ dann frazier ]

  * [Config] CONFIG_GPIO_DWAPB=m
    - LP: #1334823

  [ Feng Kan ]

  * SAUCE: (no-up) arm64: dts: Add Designware GPIO dts binding to APM
    X-Gene platform
    - LP: #1334823

  [ John Johansen ]

  * SAUCE: (no-up) apparmor: fix apparmor spams log with warning message
    - LP: #1308761

  [ Kamal Mostafa ]

  * [Config] updateconfigs ACPI_PROCFS_POWER=y after v3.13.11.4 rebase

  [ Loc Ho ]

  * SAUCE: (no-up) phy-xgene: Use correct tuning for Mustang
    - LP: #1335636

  [ Michael Ellerman ]

  * SAUCE: (no-up) powerpc/perf: Ensure all EBB register state is cleared
    on fork()
    - LP: #1328914

  [ Ming Lei ]

  * Revert "SAUCE: (no-up) rtc: Add X-Gene SoC Real Time Clock Driver"
    - LP: #1274305

  [ Suman Tripathi ]

  * SAUCE: (no-up) libahci: Implement the function ahci_restart_engine to
    restart the port dma engine.
    - LP: #1335645
  * SAUCE: (no-up) ata: Fix the dma state machine lockup for the IDENTIFY
    DEVICE PIO mode command.
    - LP: #1335645

  [ Tim Gardner ]

  * [Config] CONFIG_POWERNV_CPUFREQ=y for powerpc, ppc64el
    - LP: #1324571
  * [Debian] Add UTS_UBUNTU_RELEASE_ABI to utsrelease.h
    - LP: #1327619
  * [Config] CONFIG_HAVE_MEMORYLESS_NODES=y
    - LP: #1332063
  * [Config] CONFIG_HID_RMI=m
    - LP: #1305522

  [ Upstream Kernel Changes ]

  * Revert "offb: Add palette hack for little endian"
    - LP: #1333430
  * Revert "net: mvneta: fix usage as a module on RGMII configurations"
    - LP: #1333837
  * Revert "USB: serial: add usbid for dell wwan card to sierra.c"
    - LP: #1333837
  * Revert "macvlan : fix checksums error when we are in bridge mode"
    - LP: #1333838
  * serial: uart: add hw flow control support configuration
    - LP: #1328295
  * mm/numa: Remove BUG_ON() in __handle_mm_fault()
    - LP: #1323165
  * Tools: hv: Handle the case when the target file exists correctly
    - LP: #1306215
  * Documentation/devicetree/bindings: add documentation for the APM X-Gene
    SoC RTC DTS binding
    - LP: #1274305
  * drivers/rtc: add APM X-Gene SoC RTC driver
    - LP: #1274305
  * arm64: add APM X-Gene SoC RTC DTS entry
    - LP: #1274305
  * powerpc/perf: Add Power8 cache & TLB events
    - LP: #1328914
  * powerpc/perf: Configure BH...

Changed in linux (Ubuntu Trusty):
status: Fix Committed → Fix Released
Yuri Sa (9-yuri-1) wrote :

Sorry to resurrect an old thread, but I did all of this and the problem still persisted.

I have some 25+ clients connected to the NFS server.

When I dug into the clients, I found 2 of them with connection failures (one had a faulty eth card, the other a damaged cable)...

When I corrected those problems, my kworker related to NFS went back to the usual 0.5%....
