nfs4 client hangs on LUCID

Bug #684318 reported by Klaus Steinberger
This bug affects 1 person
Affects                  Status  Importance  Assigned to  Milestone
nfs4-acl-tools (Ubuntu)  New     Undecided   Unassigned

Bug Description

We observe NFSv4 client hangs in the following situations:

- a network outage
- a server outage
- an expiration of a Kerberos ticket

The server is an SL 5.5 machine with the latest kernel.

We often see the following errors in /var/log/messages on the client:

Dec 2 14:36:19 th-ws-i706 kernel: [707908.181453] chromium-brow D 0000000000000000 0 31737 31716 0x00000000
Dec 2 14:36:19 th-ws-i706 kernel: [707908.181457] ffff88015b5bb8e8 0000000000000086 0000000000015bc0 0000000000015bc0
Dec 2 14:36:19 th-ws-i706 kernel: [707908.181460] ffff880329cb5f38 ffff88015b5bbfd8 0000000000015bc0 ffff880329cb5b80
Dec 2 14:36:19 th-ws-i706 kernel: [707908.181463] 0000000000015bc0 ffff88015b5bbfd8 0000000000015bc0 ffff880329cb5f38
Dec 2 14:36:19 th-ws-i706 kernel: [707908.181466] Call Trace:
Dec 2 14:36:19 th-ws-i706 kernel: [707908.181485] [<ffffffffa0d014c0>] ? nfs_wait_bit_uninterruptible+0x0/0x20 [nfs]
Dec 2 14:36:19 th-ws-i706 kernel: [707908.181491] [<ffffffff81541bc7>] io_schedule+0x47/0x70
Dec 2 14:36:19 th-ws-i706 kernel: [707908.181499] [<ffffffffa0d014ce>] nfs_wait_bit_uninterruptible+0xe/0x20 [nfs]
Dec 2 14:36:19 th-ws-i706 kernel: [707908.181502] [<ffffffff8154242f>] __wait_on_bit+0x5f/0x90
Dec 2 14:36:19 th-ws-i706 kernel: [707908.181510] [<ffffffffa0d014c0>] ? nfs_wait_bit_uninterruptible+0x0/0x20 [nfs]
Dec 2 14:36:19 th-ws-i706 kernel: [707908.181513] [<ffffffff815424d8>] out_of_line_wait_on_bit+0x78/0x90
Dec 2 14:36:19 th-ws-i706 kernel: [707908.181517] [<ffffffff810845b0>] ? wake_bit_function+0x0/0x40
Dec 2 14:36:19 th-ws-i706 kernel: [707908.181524] [<ffffffffa0d014af>] nfs_wait_on_request+0x2f/0x40 [nfs]
Dec 2 14:36:19 th-ws-i706 kernel: [707908.181534] [<ffffffffa0d07226>] nfs_try_to_update_request+0xb6/0x160 [nfs]
Dec 2 14:36:19 th-ws-i706 kernel: [707908.181543] [<ffffffffa0d0730d>] nfs_writepage_setup+0x3d/0x1e0 [nfs]
Dec 2 14:36:19 th-ws-i706 kernel: [707908.181551] [<ffffffffa0d07544>] nfs_updatepage+0x94/0x1a0 [nfs]
Dec 2 14:36:19 th-ws-i706 kernel: [707908.181556] [<ffffffff8113b9f1>] ? mem_cgroup_add_lru_list+0x21/0xa0
Dec 2 14:36:19 th-ws-i706 kernel: [707908.181563] [<ffffffffa0cf6eaa>] nfs_write_end+0x5a/0x2c0 [nfs]
Dec 2 14:36:19 th-ws-i706 kernel: [707908.181567] [<ffffffff810f3422>] generic_perform_write+0x122/0x1d0
Dec 2 14:36:19 th-ws-i706 kernel: [707908.181570] [<ffffffff810f4553>] generic_file_buffered_write+0x73/0xd0
Dec 2 14:36:19 th-ws-i706 kernel: [707908.181573] [<ffffffff810f5b00>] __generic_file_aio_write+0x240/0x470
Dec 2 14:36:19 th-ws-i706 kernel: [707908.181576] [<ffffffff810f5d9f>] generic_file_aio_write+0x6f/0xe0
Dec 2 14:36:19 th-ws-i706 kernel: [707908.181582] [<ffffffffa0cf6a0a>] nfs_file_write+0xda/0x1e0 [nfs]
Dec 2 14:36:19 th-ws-i706 kernel: [707908.181585] [<ffffffff81142f7a>] do_sync_write+0xfa/0x140
Dec 2 14:36:19 th-ws-i706 kernel: [707908.181589] [<ffffffff8117c893>] ? ep_scan_ready_list+0x183/0x190
Dec 2 14:36:19 th-ws-i706 kernel: [707908.181592] [<ffffffff81084570>] ? autoremove_wake_function+0x0/0x40
Dec 2 14:36:19 th-ws-i706 kernel: [707908.181595] [<ffffffff8117c972>] ? ep_poll+0xb2/0x270
Dec 2 14:36:19 th-ws-i706 kernel: [707908.181600] [<ffffffff81252446>] ? security_file_permission+0x16/0x20
Dec 2 14:36:19 th-ws-i706 kernel: [707908.181602] [<ffffffff81143278>] vfs_write+0xb8/0x1a0
Dec 2 14:36:19 th-ws-i706 kernel: [707908.181605] [<ffffffff81143b11>] sys_write+0x51/0x80
Dec 2 14:36:19 th-ws-i706 kernel: [707908.181609] [<ffffffff810121b2>] system_call_fastpath+0x16/0x1b
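
To make traces like the one above easier to compare between hangs, the frames can be pulled out of the log mechanically. The following is only a sketch: the regular expression is written against the line format shown in this report, and the names `FRAME_RE` and `parse_frames` are hypothetical helpers, not part of any existing tool. Frames the kernel marks with `?` are reported as unreliable.

```python
import re

# One call-trace frame as it appears in the log above, for example:
#   [<ffffffffa0d014ce>] nfs_wait_bit_uninterruptible+0xe/0x20 [nfs]
# The pattern is an assumption derived from this report's log format only.
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+"
    r"(?P<unreliable>\?\s+)?"
    r"(?P<symbol>[A-Za-z0-9_.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?:\s+\[(?P<module>[^\]]+)\])?"
)


def parse_frames(log_text):
    """Return a list of (symbol, module, reliable) tuples, in log order.

    module is None for symbols that are not attributed to a loadable module;
    reliable is False for frames the kernel printed with a leading '?'.
    """
    frames = []
    for line in log_text.splitlines():
        match = FRAME_RE.search(line)
        if match:
            frames.append(
                (
                    match.group("symbol"),
                    match.group("module"),
                    match.group("unreliable") is None,
                )
            )
    return frames


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file name): python parse_frames.py < messages.txt
    for symbol, module, reliable in parse_frames(sys.stdin.read()):
        if reliable:
            print(symbol, "[%s]" % module if module else "")
```

Filtering on the reliable frames of the trace above leaves the path from `sys_write` through `nfs_file_write` down to `nfs_wait_on_request`, which is where the task is waiting.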

After that message appears, the affected processes cannot be killed and the client sends a packet storm to the server. apport hangs as well, so I had to file this report manually.
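
Since apport cannot collect data in this state, the list of stuck tasks has to be gathered by hand. A minimal sketch, assuming the standard Linux `/proc/<pid>/stat` layout (PID, command name in parentheses, then a one-letter state, where `D` means uninterruptible sleep); `parse_stat` and `uninterruptible_tasks` are hypothetical helper names used only for illustration:

```python
import os


def parse_stat(stat_text):
    """Split the start of a /proc/<pid>/stat line into (pid, comm, state).

    The command name may itself contain spaces or parentheses, so the
    closing parenthesis is located from the right.
    """
    lparen = stat_text.index("(")
    rparen = stat_text.rindex(")")
    pid = int(stat_text[:lparen])
    comm = stat_text[lparen + 1:rparen]
    state = stat_text[rparen + 1:].split()[0]
    return pid, comm, state


def uninterruptible_tasks(proc_root="/proc"):
    """Return (pid, comm) for every process currently in state D."""
    found = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as handle:
                pid, comm, state = parse_stat(handle.read())
        except (OSError, ValueError):
            # The process exited while we were reading, or the line was odd.
            continue
        if state == "D":
            found.append((pid, comm))
    return found


if __name__ == "__main__":
    for pid, comm in sorted(uninterruptible_tasks()):
        print(pid, comm)
```

The script only reads `/proc`, so it should not itself block on the hung NFS mount. Processes briefly waiting on ordinary disk I/O also show up in state `D`, so only entries that persist across repeated runs are relevant here.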

If you need any further information, please ask. The problem is really urgent for us.

affects: ubuntu → nfs4-acl-tools (Ubuntu)
Klaus Steinberger (klaus-steinberger) wrote:

Hi, why do you think that this affects nfs4-acl-tools? I think the problem lies deep inside the kernel.
