Comment 3 for bug 769927

Jens Maus (jens.maus) wrote :

I switched all remote NFS share mounts from NFSv4 to NFSv3, as I thought NFSv4 might have been the reason for the crash. However, now that two of our servers have hit the exact same problem again, I can report that this happens even with NFSv3 shares. The traceback then looks like this:

-- cut here --
[537189.380178] BUG: Dentry ffff88084c5c5b00{i=babc05,n=/} still in use (1) [unmount of autofs autofs]
[537189.380256] ------------[ cut here ]------------
[537189.380281] kernel BUG at /build/buildd/linux-2.6.38/fs/dcache.c:947!
[537189.380309] invalid opcode: 0000 [#1] SMP
[537189.380335] last sysfs file: /sys/devices/system/cpu/cpu0/cpufreq/stats/time_in_state
[537189.380378] CPU 8
[537189.380382] Modules linked in: binfmt_misc ocfs2_dlmfs ocfs2_stack_o2cb ocfs2_dlm ocfs2_nodemanager ocfs2_stackglue configfs parport_pc ppdev autofs4 mptctl k8temp lm75 lm87 hwmon_vid ipmi_watchdog ipmi_poweroff nfsd exportfs nfs lockd fscache nfs_acl auth_rpcgss sunrpc radeon psmouse serio_raw ghes hed ttm amd64_edac_mod drm_kms_helper edac_core k10temp edac_mce_amd joydev drm nv_tco i2c_algo_bit i2c_nforce2 ipmi_si ipmi_devintf ipmi_msghandler lp parport usbhid hid usb_storage mptsas mptscsih mptbase qla2xxx e1000e scsi_transport_fc scsi_transport_sas scsi_tgt
[537189.380665]
[537189.380685] Pid: 12686, comm: automount Not tainted 2.6.38-8-server #42-Ubuntu SUN MICROSYSTEMS SUN BLADE X8440 SERVER MODULE/Sun Blade X8440 Server Module
[537189.380747] RIP: 0010:[<ffffffff81179be5>] [<ffffffff81179be5>] shrink_dcache_for_umount_subtree+0x285/0x290
[537189.380800] RSP: 0018:ffff880676e63de8 EFLAGS: 00010296
[537189.380825] RAX: 000000000000006d RBX: ffff88084c5c5b5c RCX: 00000000ffffffff
[537189.380867] RDX: 0000000000000000 RSI: 0000000000000086 RDI: 0000000000000246
[537189.380909] RBP: ffff880676e63e28 R08: 0000000000000000 R09: ffffffff816423e0
[537189.380951] R10: 0000000000000000 R11: 0000000000000000 R12: ffff88084c5c5b5c
[537189.380992] R13: ffff88084c5c5b00 R14: ffff88084c5c5ba0 R15: 00007f739c43eb60
[537189.381035] FS: 00007f7397e6c700(0000) GS:ffff88068fc00000(0000) knlGS:00000000f68ff700
[537189.381078] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[537189.381104] CR2: 00007f2e33b827c0 CR3: 0000000676637000 CR4: 00000000000006e0
[537189.381146] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[537189.381188] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[537189.381230] Process automount (pid: 12686, threadinfo ffff880676e62000, task ffff88047af944a0)
[537189.381275] Stack:
[537189.381294] ffff88087c758658 ffff88067cee6000 ffff880676e63e08 ffff88087c758400
[537189.381340] ffff88084c5c5b5c ffff88084c5c5b00 00007f739c43eea0 00007f739c43eb60
[537189.381386] ffff880676e63e58 ffffffff8117c521 ffff880676e63e48 ffff88087c758400
[537189.381432] Call Trace:
[537189.381456] [<ffffffff8117c521>] shrink_dcache_for_umount+0x51/0x90
[537189.381487] [<ffffffff811670ac>] generic_shutdown_super+0x2c/0x100
[537189.381515] [<ffffffff81167216>] kill_anon_super+0x16/0x60
[537189.381542] [<ffffffff81167287>] kill_litter_super+0x27/0x30
[537189.381572] [<ffffffffa046f4d8>] autofs4_kill_sb+0x48/0x60 [autofs4]
[537189.381600] [<ffffffff81167685>] deactivate_locked_super+0x45/0x70
[537189.381629] [<ffffffff8116830a>] deactivate_super+0x4a/0x70
[537189.381658] [<ffffffff811834a4>] mntput_no_expire+0xa4/0xf0
[537189.381685] [<ffffffff81184530>] sys_umount+0x60/0xd0
[537189.381712] [<ffffffff8100bfc2>] system_call_fastpath+0x16/0x1b
[537189.381738] Code: 8b 40 28 4c 8b 08 49 8b 45 30 48 85 c0 74 07 48 8b 90 a8 00 00 00 48 89 34 24 48 c7 c7 c0 a6 7e 81 4c 89 ee 31 c0 e8 30 af 45 00 <0f> 0b 0f 0b 0f 1f 80 00 00 00 00 55 48 89 e5 48 83 ec 20 48 89
[537189.381893] RIP [<ffffffff81179be5>] shrink_dcache_for_umount_subtree+0x285/0x290
[537189.381936] RSP <ffff880676e63de8>
[537189.382252] ---[ end trace 550a20827993cdc3 ]---
-- cut here --

And I can confirm that this only happens under heavy load on the NFS shares.

System is:
Linux venus 2.6.38-8-server #42-Ubuntu SMP Mon Apr 11 03:49:04 UTC 2011 x86_64 x86_64 x86_64 GNU/Linux

Please note that in my case the NFS server is a Solaris 10 server.

Any quick fix or potential workaround is welcome, as we will otherwise have to downgrade our servers to kernel 2.6.35.