NFS server hangs

Bug #607724 reported by Torkil Svensgaard
This bug affects 3 people
Affects: nfs-utils (Ubuntu)
Status: Confirmed
Importance: Undecided
Assigned to: Unassigned

Bug Description

Binary package hint: nfs-kernel-server

1)
Description: Ubuntu 10.04 LTS
Release: 10.04

2)
nfs-kernel-server:
  Installed: 1:1.2.0-4ubuntu4
  Candidate: 1:1.2.0-4ubuntu4
  Version table:
 *** 1:1.2.0-4ubuntu4 0
        500 http://dk.archive.ubuntu.com/ubuntu/ lucid/main Packages
        100 /var/lib/dpkg/status

3)
What I expected: stable NFS mounts and good performance.

4)
What happened instead: 2-3 times a week the NFS server hangs and the machine has to be restarted. The hangs usually occur when there is a lot going on, such as multiple people copying large amounts of data to or from the machine. NFS performance is also poor at all times.
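
Since the hangs coincide with heavy load and performance is poor even between hangs, one thing that may be worth checking first (a rough sketch, assuming the stock Ubuntu nfs-kernel-server packaging; the thread count of 16 below is only an example) is whether the default number of nfsd threads is saturated:

# The "th" line of the nfsd statistics shows the thread count and how
# often all threads were busy at once.
grep ^th /proc/net/rpc/nfsd

# If the threads are constantly all busy, raising RPCNFSDCOUNT in
# /etc/default/nfs-kernel-server and restarting the service may help.
sudo sed -i 's/^RPCNFSDCOUNT=.*/RPCNFSDCOUNT=16/' /etc/default/nfs-kernel-server
sudo service nfs-kernel-server restart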

The following came in the syslog just before the last hang:

Jul 20 12:31:41 storage2 kernel: [435601.329902] INFO: task nfsd:3226 blocked for more than 120 seconds.
Jul 20 12:31:41 storage2 kernel: [435601.329907] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 20 12:31:41 storage2 kernel: [435601.329911] nfsd D 00000000ffffffff 0 3226 2 0x00000000
Jul 20 12:31:41 storage2 kernel: [435601.329917] ffff880360945200 0000000000000046 0000000000015bc0 0000000000015bc0
Jul 20 12:31:41 storage2 kernel: [435601.329922] ffff880366eb1ab0 ffff880360945fd8 0000000000015bc0 ffff880366eb16f0
Jul 20 12:31:41 storage2 kernel: [435601.329926] 0000000000015bc0 ffff880360945fd8 0000000000015bc0 ffff880366eb1ab0
Jul 20 12:31:41 storage2 kernel: [435601.329930] Call Trace:
Jul 20 12:31:41 storage2 kernel: [435601.329956] [<ffffffffa0390280>] ? nfs_wait_bit_uninterruptible+0x0/0x20 [nfs]
Jul 20 12:31:41 storage2 kernel: [435601.329966] [<ffffffff81555627>] io_schedule+0x47/0x70
Jul 20 12:31:41 storage2 kernel: [435601.329977] [<ffffffffa039028e>] nfs_wait_bit_uninterruptible+0xe/0x20 [nfs]
Jul 20 12:31:41 storage2 kernel: [435601.329982] [<ffffffff81555c4f>] __wait_on_bit+0x5f/0x90
Jul 20 12:31:41 storage2 kernel: [435601.329994] [<ffffffffa0390280>] ? nfs_wait_bit_uninterruptible+0x0/0x20 [nfs]
Jul 20 12:31:41 storage2 kernel: [435601.329998] [<ffffffff81555cf8>] out_of_line_wait_on_bit+0x78/0x90
Jul 20 12:31:41 storage2 kernel: [435601.330005] [<ffffffff81084fe0>] ? wake_bit_function+0x0/0x40
Jul 20 12:31:41 storage2 kernel: [435601.330016] [<ffffffffa039026f>] nfs_wait_on_request+0x2f/0x40 [nfs]
Jul 20 12:31:41 storage2 kernel: [435601.330029] [<ffffffffa039466f>] nfs_wait_on_requests_locked+0x7f/0xd0 [nfs]
Jul 20 12:31:41 storage2 kernel: [435601.330035] [<ffffffff81130bb8>] ? add_partial+0x58/0x90
Jul 20 12:31:41 storage2 kernel: [435601.330049] [<ffffffffa0395aae>] nfs_sync_mapping_wait+0x9e/0x1a0 [nfs]
Jul 20 12:31:41 storage2 kernel: [435601.330062] [<ffffffffa0395c31>] nfs_wb_page+0x81/0xe0 [nfs]
Jul 20 12:31:41 storage2 kernel: [435601.330071] [<ffffffffa0384b0f>] nfs_release_page+0x5f/0x80 [nfs]
Jul 20 12:31:41 storage2 kernel: [435601.330076] [<ffffffff810f2672>] try_to_release_page+0x32/0x50
Jul 20 12:31:41 storage2 kernel: [435601.330081] [<ffffffff811014b3>] shrink_page_list+0x453/0x5f0
Jul 20 12:31:41 storage2 kernel: [435601.330085] [<ffffffff8110195d>] shrink_inactive_list+0x30d/0x7e0
Jul 20 12:31:41 storage2 kernel: [435601.330090] [<ffffffff81101ec1>] shrink_list+0x91/0xf0
Jul 20 12:31:41 storage2 kernel: [435601.330093] [<ffffffff811020b7>] shrink_zone+0x197/0x240
Jul 20 12:31:41 storage2 kernel: [435601.330097] [<ffffffff81102886>] __zone_reclaim+0x146/0x260
Jul 20 12:31:41 storage2 kernel: [435601.330101] [<ffffffff811001d0>] ? isolate_pages_global+0x0/0x50
Jul 20 12:31:41 storage2 kernel: [435601.330104] [<ffffffff81102ab7>] zone_reclaim+0x117/0x150
Jul 20 12:31:41 storage2 kernel: [435601.330109] [<ffffffff810f8fd4>] get_page_from_freelist+0x544/0x6c0
Jul 20 12:31:41 storage2 kernel: [435601.330116] [<ffffffff810116b0>] ? __switch_to+0xd0/0x320
Jul 20 12:31:41 storage2 kernel: [435601.330120] [<ffffffff810f98c9>] __alloc_pages_nodemask+0xd9/0x180
Jul 20 12:31:41 storage2 kernel: [435601.330127] [<ffffffff8112c597>] alloc_pages_current+0x87/0xd0
Jul 20 12:31:41 storage2 kernel: [435601.330131] [<ffffffff81132498>] new_slab+0x248/0x310
Jul 20 12:31:41 storage2 kernel: [435601.330136] [<ffffffff81134d29>] __slab_alloc+0x169/0x2d0
Jul 20 12:31:41 storage2 kernel: [435601.330159] [<ffffffffa0302cba>] ? kmem_zone_alloc+0x9a/0xe0 [xfs]
Jul 20 12:31:41 storage2 kernel: [435601.330163] [<ffffffff8113524b>] kmem_cache_alloc+0xfb/0x130
Jul 20 12:31:41 storage2 kernel: [435601.330178] [<ffffffffa0302cba>] kmem_zone_alloc+0x9a/0xe0 [xfs]
Jul 20 12:31:41 storage2 kernel: [435601.330194] [<ffffffffa0302d1e>] kmem_zone_zalloc+0x1e/0x50 [xfs]
Jul 20 12:31:41 storage2 kernel: [435601.330211] [<ffffffffa02fae24>] _xfs_trans_alloc+0x34/0x80 [xfs]
Jul 20 12:31:41 storage2 kernel: [435601.330227] [<ffffffffa02fafea>] xfs_trans_alloc+0x9a/0xb0 [xfs]
Jul 20 12:31:41 storage2 kernel: [435601.330243] [<ffffffffa0300899>] xfs_fsync+0x69/0x1a0 [xfs]
Jul 20 12:31:41 storage2 kernel: [435601.330259] [<ffffffffa03081ca>] xfs_file_fsync+0x4a/0x60 [xfs]
Jul 20 12:31:41 storage2 kernel: [435601.330269] [<ffffffffa03e408b>] nfsd_sync+0x7b/0xc0 [nfsd]
Jul 20 12:31:41 storage2 kernel: [435601.330277] [<ffffffffa03e4f95>] nfsd_commit+0x65/0x90 [nfsd]
Jul 20 12:31:41 storage2 kernel: [435601.330287] [<ffffffffa03ec2dd>] nfsd3_proc_commit+0x9d/0xf0 [nfsd]
Jul 20 12:31:41 storage2 kernel: [435601.330294] [<ffffffffa03de44e>] nfsd_dispatch+0xfe/0x250 [nfsd]
Jul 20 12:31:41 storage2 kernel: [435601.330312] [<ffffffffa01c55f4>] svc_process_common+0x344/0x610 [sunrpc]
Jul 20 12:31:41 storage2 kernel: [435601.330321] [<ffffffff8105b280>] ? default_wake_function+0x0/0x20
Jul 20 12:31:41 storage2 kernel: [435601.330334] [<ffffffffa01c59d0>] svc_process+0x110/0x150 [sunrpc]
Jul 20 12:31:41 storage2 kernel: [435601.330341] [<ffffffffa03deaf5>] nfsd+0xc5/0x170 [nfsd]
Jul 20 12:31:41 storage2 kernel: [435601.330348] [<ffffffffa03dea30>] ? nfsd+0x0/0x170 [nfsd]
Jul 20 12:31:41 storage2 kernel: [435601.330351] [<ffffffff81084c26>] kthread+0x96/0xa0
Jul 20 12:31:41 storage2 kernel: [435601.330356] [<ffffffff810141ea>] child_rip+0xa/0x20
Jul 20 12:31:41 storage2 kernel: [435601.330360] [<ffffffff81084b90>] ? kthread+0x0/0xa0
Jul 20 12:31:41 storage2 kernel: [435601.330363] [<ffffffff810141e0>] ? child_rip+0x0/0x20
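
Reading the trace from the bottom up: nfsd was handling an NFSv3 COMMIT on an XFS export, needed memory for an XFS transaction, fell into direct reclaim via zone_reclaim(), and there blocked in nfs_wait_bit_uninterruptible waiting for writeback of a page belonging to an NFS client mount on the same machine. If this server is a NUMA box with vm.zone_reclaim_mode enabled and also mounts NFS shares itself, one mitigation that may be worth trying (a sketch only, not a confirmed fix) is turning zone reclaim off so allocations fall back to other nodes instead of reclaiming locally:

# Check whether the kernel enabled zone reclaim (often 1 on NUMA machines).
cat /proc/sys/vm/zone_reclaim_mode

# Disable it at runtime.
sudo sysctl -w vm.zone_reclaim_mode=0

# Persist the setting across reboots.
echo 'vm.zone_reclaim_mode = 0' | sudo tee -a /etc/sysctl.conf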

Torkil Svensgaard (spam-svensgaard)
Changed in nfs-utils (Ubuntu):
status: New → Confirmed
Roman Yepishev (rye) wrote:

I can reproduce this pretty easily with 10.04 nfs-kernel-server 1.2.0-4ubuntu4.1. All the clients connected to the server hang afterwards. This time the server hung while I was browsing a folder full of pictures and Nautilus was generating the thumbnails.

What's interesting is that I have two NFSv4 mount points, and whenever one of them hangs so does the other, which points to a server-side issue.
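
A quick way to confirm the problem is on the server side when both mounts stall at once (rough commands; "storage2" is just the server name from the log above, substitute your own):

# From a client, ask the server's RPC layer whether the NFS service still answers.
rpcinfo -t storage2 nfs

# On the server, look for nfsd threads stuck in uninterruptible sleep ("D"),
# the same state shown in the hung-task trace.
ps axo pid,stat,wchan:30,comm | awk '$2 ~ /D/ && $4 == "nfsd"'

# Dump stacks of all blocked tasks to the kernel log for more detail.
echo w | sudo tee /proc/sysrq-trigger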

Roman Yepishev (rye) wrote:

This looks like LP: #561210.

Vasily Kolosov (vasily-kolosov) wrote:

This happens around twice a week on our NFS server: Ubuntu Server 12.04 on a render farm, where the nodes boot NFS-rooted and mount their root partitions from this NFS server.

As you can see in the attached log file, nfsd suddenly hangs on Oct 24 at 17:50:23. The same appears in syslog.

Vasily Kolosov (vasily-kolosov) wrote:

Also attaching dmesg from the affected machine.
