nfs4 daemon crashes, eventually PC locks up
Affects | Status | Importance | Assigned to | Milestone | |
---|---|---|---|---|---|
linux (Ubuntu) | Expired | Medium | Unassigned | | |
Bug Description
Aug 7 01:05:46 mike kernel: [60329.252525] PGD 5b4ac067 PUD 5b4ad067 PMD 0
Aug 7 01:05:46 mike kernel: [60329.252532] Oops: 0000 [#1] SMP
Aug 7 01:05:46 mike kernel: [60329.252538] Modules linked in: rfcomm bnep bluetooth nfsd auth_rpcgss nfs_acl nfs lockd sunrpc fscache ir_lirc_codec lirc_dev ir_mce_kbd_decoder ir_sanyo_decoder ir_sony_decoder ir_jvc_decoder ir_rc6_decoder ir_nec_decoder ir_rc5_decoder fc0012 rtl2832 dvb_usb_rtl28xxu rtl2830 dvb_usb_v2 dvb_core rc_core nvidia(POF) snd_intel8x0 snd_ac97_codec ac97_bus snd_pcm snd_page_alloc snd_seq_midi snd_seq_midi_event serio_raw snd_rawmidi k8temp edac_core edac_mce_amd snd_seq snd_seq_device usblp snd_timer snd soundcore i2c_nforce2 mac_hid parport_pc ppdev lp parport zfs(POF) zunicode(POF) zavl(POF) zcommon(POF) znvpair(POF) spl(OF) hid_generic usbhid pata_acpi hid firewire_ohci firewire_core psmouse e100 forcedeth mii crc_itu_t sata_nv pata_amd
Aug 7 01:05:46 mike kernel: [60329.252626] CPU: 0 PID: 2099 Comm: nfsd Tainted: PF O 3.13.0-32-generic #57-Ubuntu
Aug 7 01:05:46 mike kernel: [60329.252632] Hardware name: /GA-K8N Pro-SLI, BIOS F4 07/16/2005
Aug 7 01:05:46 mike kernel: [60329.252638] task: ffff880060f5c7d0 ti: ffff88005b42e000 task.ti: ffff88005b42e000
Aug 7 01:05:46 mike kernel: [60329.252642] RIP: 0010:[<
Aug 7 01:05:46 mike kernel: [60329.252660] RSP: 0018:ffff88005b
Aug 7 01:05:46 mike kernel: [60329.252664] RAX: 0000000000000000 RBX: ffff88002e9f3d80 RCX: 00000000008036d0
Aug 7 01:05:46 mike kernel: [60329.252668] RDX: ffffffffa0fe86e7 RSI: 0000000000000000 RDI: ffff88002e9f3d80
Aug 7 01:05:46 mike kernel: [60329.252672] RBP: ffff88005b42fd20 R08: 00000000000171e0 R09: ffff88007fc171e0
Aug 7 01:05:46 mike kernel: [60329.252676] R10: ffffea0000282500 R11: ffffffffa0fbb0c7 R12: 0000000000000000
Aug 7 01:05:46 mike kernel: [60329.252680] R13: ffff88002e9f3d80 R14: ffffffffa0fe86e7 R15: 0000000000000660
Aug 7 01:05:46 mike kernel: [60329.252685] FS: 00007f62c5fb578
Aug 7 01:05:46 mike kernel: [60329.252689] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Aug 7 01:05:46 mike kernel: [60329.252693] CR2: 0000000000000010 CR3: 000000005b4ab000 CR4: 00000000000007f0
Aug 7 01:05:46 mike kernel: [60329.252697] Stack:
Aug 7 01:05:46 mike kernel: [60329.252700] ffff88002e9f3d80 ffff88000a094940 ffff88005c2540b0 ffff880060fe0000
Aug 7 01:05:46 mike kernel: [60329.252707] 0000000000000660 ffff88005b42fd60 ffffffffa0fbb492 ffff880060fe0000
Aug 7 01:05:46 mike kernel: [60329.252714] ffff88000a094900 0000000000000000 ffff880060fe8040 ffff8800667ee240
Aug 7 01:05:46 mike kernel: [60329.252720] Call Trace:
Aug 7 01:05:46 mike kernel: [60329.252740] [<ffffffffa0fbb
Aug 7 01:05:46 mike kernel: [60329.252759] [<ffffffffa0fc8
Aug 7 01:05:46 mike kernel: [60329.252777] [<ffffffffa0fca
Aug 7 01:05:46 mike kernel: [60329.252792] [<ffffffffa0fb6
Aug 7 01:05:46 mike kernel: [60329.252838] [<ffffffffa0f11
Aug 7 01:05:46 mike kernel: [60329.252864] [<ffffffffa0f11
Aug 7 01:05:46 mike kernel: [60329.252879] [<ffffffffa0fb6
Aug 7 01:05:46 mike kernel: [60329.252894] [<ffffffffa0fb6
Aug 7 01:05:46 mike kernel: [60329.252904] [<ffffffff8108b
Aug 7 01:05:46 mike kernel: [60329.252911] [<ffffffff8108b
Aug 7 01:05:46 mike kernel: [60329.252922] [<ffffffff8172c
Aug 7 01:05:46 mike kernel: [60329.252928] [<ffffffff8108b
Aug 7 01:05:46 mike kernel: [60329.252932] Code: eb e6 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55 48 89 e5 41 57 41 56 49 89 d6 41 55 49 89 fd 41 54 49 89 f4 53 <48> 63 46 10 be d0 00 00 00 4c 8d 3c c5 04 00 00 00 4c 89 ff e8
Aug 7 01:05:46 mike kernel: [60329.253003] RSP <ffff88005b42fcf8>
Aug 7 01:05:46 mike kernel: [60329.253006] CR2: 0000000000000010
Aug 7 01:05:46 mike kernel: [60329.253012] ---[ end trace 737bd44e2270bead ]---
and
Aug 7 03:00:10 mike kernel: [67193.680580] PGD 36391067 PUD 36392067 PMD 0
Aug 7 03:00:10 mike kernel: [67193.680588] Oops: 0000 [#2] SMP
Aug 7 03:00:10 mike kernel: [67193.680594] Modules linked in: rfcomm bnep bluetooth nfsd auth_rpcgss nfs_acl nfs lockd sunrpc fscache ir_lirc_codec lirc_dev ir_mce_kbd_decoder ir_sanyo_decoder ir_sony_decoder ir_jvc_decoder ir_rc6_decoder ir_nec_decoder ir_rc5_decoder fc0012 rtl2832 dvb_usb_rtl28xxu rtl2830 dvb_usb_v2 dvb_core rc_core nvidia(POF) snd_intel8x0 snd_ac97_codec ac97_bus snd_pcm snd_page_alloc snd_seq_midi snd_seq_midi_event serio_raw snd_rawmidi k8temp edac_core edac_mce_amd snd_seq snd_seq_device usblp snd_timer snd soundcore i2c_nforce2 mac_hid parport_pc ppdev lp parport zfs(POF) zunicode(POF) zavl(POF) zcommon(POF) znvpair(POF) spl(OF) hid_generic usbhid pata_acpi hid firewire_ohci firewire_core psmouse e100 forcedeth mii crc_itu_t sata_nv pata_amd
Aug 7 03:00:10 mike kernel: [67193.680681] CPU: 0 PID: 2098 Comm: nfsd Tainted: PF D O 3.13.0-32-generic #57-Ubuntu
Aug 7 03:00:10 mike kernel: [67193.680688] Hardware name: /GA-K8N Pro-SLI, BIOS F4 07/16/2005
Aug 7 03:00:10 mike kernel: [67193.680693] task: ffff880060f5afe0 ti: ffff880060fde000 task.ti: ffff880060fde000
Aug 7 03:00:10 mike kernel: [67193.680697] RIP: 0010:[<
Aug 7 03:00:10 mike kernel: [67193.680715] RSP: 0018:ffff880060
Aug 7 03:00:10 mike kernel: [67193.680719] RAX: 0000000000000000 RBX: ffff88004a9e0840 RCX: 00000000008c923e
Aug 7 03:00:10 mike kernel: [67193.680723] RDX: ffffffffa0fe86e7 RSI: 0000000000000000 RDI: ffff88004a9e0840
Aug 7 03:00:10 mike kernel: [67193.680727] RBP: ffff880060fdfd20 R08: 00000000000171e0 R09: ffff88007fc171e0
Aug 7 03:00:10 mike kernel: [67193.680731] R10: ffffea0001eb8040 R11: ffffffffa0fbb0c7 R12: 0000000000000000
Aug 7 03:00:10 mike kernel: [67193.680735] R13: ffff88004a9e0840 R14: ffffffffa0fe86e7 R15: 0000000000000660
Aug 7 03:00:10 mike kernel: [67193.680740] FS: 00007f08bffff70
Aug 7 03:00:10 mike kernel: [67193.680744] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Aug 7 03:00:10 mike kernel: [67193.680748] CR2: 0000000000000010 CR3: 0000000036390000 CR4: 00000000000007f0
Aug 7 03:00:10 mike kernel: [67193.680752] Stack:
Aug 7 03:00:10 mike kernel: [67193.680755] ffff88004a9e0840 ffff88007ae019c0 ffff880006bf72a8 ffff880060eb6000
Aug 7 03:00:10 mike kernel: [67193.680762] 0000000000000660 ffff880060fdfd60 ffffffffa0fbb492 ffff880060eb6000
Aug 7 03:00:10 mike kernel: [67193.680768] ffff88007ae01980 0000000000000000 ffff8800667e9040 ffff8800667eb240
Aug 7 03:00:10 mike kernel: [67193.680775] Call Trace:
Aug 7 03:00:10 mike kernel: [67193.680795] [<ffffffffa0fbb
Aug 7 03:00:10 mike kernel: [67193.680814] [<ffffffffa0fc8
Aug 7 03:00:10 mike kernel: [67193.680832] [<ffffffffa0fca
Aug 7 03:00:10 mike kernel: [67193.680847] [<ffffffffa0fb6
Aug 7 03:00:10 mike kernel: [67193.680894] [<ffffffffa0f11
Aug 7 03:00:10 mike kernel: [67193.680920] [<ffffffffa0f11
Aug 7 03:00:10 mike kernel: [67193.680934] [<ffffffffa0fb6
Aug 7 03:00:10 mike kernel: [67193.680949] [<ffffffffa0fb6
Aug 7 03:00:10 mike kernel: [67193.680959] [<ffffffff8108b
Aug 7 03:00:10 mike kernel: [67193.680966] [<ffffffff8108b
Aug 7 03:00:10 mike kernel: [67193.680977] [<ffffffff8172c
Aug 7 03:00:10 mike kernel: [67193.680983] [<ffffffff8108b
Aug 7 03:00:10 mike kernel: [67193.680987] Code: eb e6 66 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55 48 89 e5 41 57 41 56 49 89 d6 41 55 49 89 fd 41 54 49 89 f4 53 <48> 63 46 10 be d0 00 00 00 4c 8d 3c c5 04 00 00 00 4c 89 ff e8
Aug 7 03:00:10 mike kernel: [67193.681058] RSP <ffff880060fdfcf8>
Aug 7 03:00:10 mike kernel: [67193.681061] CR2: 0000000000000010
Aug 7 03:00:10 mike kernel: [67193.681067] ---[ end trace 737bd44e2270beae ]---
After this sequence, the server locks up solid. Even REISUB will not reset it; it requires either a hardware reset or a power cycle.
It happens about three times a week (the machine acts as a file server for other Linux machines), and it only started happening after the upgrade from saucy to trusty.
The hardware has been tested, with no obvious errors.
---
ApportVersion: 2.14.1-0ubuntu3.5
Architecture: amd64
AudioDevicesInUse:
USER PID ACCESS COMMAND
/dev/snd/
/dev/snd/
DistroRelease: Ubuntu 14.04
HibernationDevice: RESUME=
InstallationDate: Installed on 2014-10-01 (23 days ago)
InstallationMedia: Ubuntu 14.04 LTS "Trusty Tahr" - Release amd64 (20140417)
MachineType: CLEVO D900K
NonfreeKernelMo
Package: linux (not installed)
PccardctlIdent:
Socket 0:
no product info available
PccardctlStatus:
Socket 0:
no card
ProcEnviron:
LANGUAGE=en_AU:en
TERM=xterm
PATH=(custom, no user)
LANG=en_AU.UTF-8
SHELL=/bin/bash
ProcFB:
ProcKernelCmdLine: BOOT_IMAGE=
ProcVersionSign
PulseList:
Error: command ['pacmd', 'list'] failed with exit code 1: Home directory not accessible: Permission denied
No PulseAudio daemon running, or not running as session daemon.
RelatedPackageV
linux-
linux-
linux-firmware 1.127.7
Tags: trusty
Uname: Linux 3.13.0-37-generic x86_64
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups: fuse
_MarkForUpload: True
dmi.bios.date: 12/06/2005
dmi.bios.vendor: Phoenix
dmi.bios.version: 4.06CJ15
dmi.board.name: D900K
dmi.board.vendor: CLEVO
dmi.board.version: VT8341B
dmi.chassis.
dmi.chassis.type: 1
dmi.chassis.vendor: No Enclosure
dmi.chassis.
dmi.modalias: dmi:bvnPhoenix:
dmi.product.name: D900K
dmi.product.
dmi.sys.vendor: CLEVO
no longer affects: | linux-meta-lts-backport-oneiric (Ubuntu) |
Changed in linux (Ubuntu): | |
status: | New → Confirmed |
affects: | linux-meta-lts-backport-quantal (Ubuntu) → linux (Ubuntu) |
Changed in linux (Ubuntu): | |
importance: | Undecided → Medium |
status: | New → Incomplete |
Further investigation:
I have four ext4 partitions on two disks mounted via fstab; these four partitions are then mounted with "bind" under /exports for eventual nfs4 export.
After a reboot, some of these partitions are mounted (according to mount) but appear empty. If the /exports bind directories are unmounted, then remounted with mount -a, and nfs-kernel-server is then restarted, there is no problem. The server has now been running without problems for some days, whereas before it showed a crash in the logs every day and froze about once a day.
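For anyone who wants to script the manual workaround above, here is a minimal sketch. It only builds and prints the command sequence (umount each bind directory, mount -a, restart nfs-kernel-server). The function name and the /exports/disk1 and /exports/disk2 paths are hypothetical placeholders, not my real layout, and the `service` invocation assumes the stock Ubuntu 14.04 service wrapper.

```python
def remount_commands(export_dirs):
    """Build the workaround sequence: unmount each bind directory,
    remount everything listed in fstab, then restart the NFS server."""
    commands = [["umount", d] for d in export_dirs]
    commands.append(["mount", "-a"])
    commands.append(["service", "nfs-kernel-server", "restart"])
    return commands


if __name__ == "__main__":
    # Hypothetical export directories; substitute the real bind targets.
    for argv in remount_commands(["/exports/disk1", "/exports/disk2"]):
        # Dry run: only print the commands. To actually run them (as root),
        # each argv could be passed to subprocess.check_call instead.
        print(" ".join(argv))
```

This is a dry run by design, so it is safe to try on a live server before deciding whether to execute the commands.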
It is a pain to have to log into the file server and fiddle with the mounts and exportfs manually after every reboot.
Could it be that some of the directories are exported before the "bind" mount is done, confusing the exportfs command?
Is there some delay or change of dependencies in the boot sequence that could hold back exportfs until all directories are correctly mounted?
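One way to test the ordering theory would be to check, before exporting, that each bind target really shows the same directory as its source. The sketch below assumes that a correct bind mount makes the source and target report the same device and inode, while a bind made before the partition was mounted shows the empty underlying directory instead. The function name and the paths are hypothetical; this is not something nfs-kernel-server does itself.

```python
import os


def stale_bind_mounts(pairs):
    """Return the (source, target) pairs whose target does not resolve
    to the same directory (same device and inode) as its source."""
    stale = []
    for source, target in pairs:
        try:
            same = os.path.samestat(os.stat(source), os.stat(target))
        except OSError:
            # A missing or unreadable path counts as not correctly bound.
            same = False
        if not same:
            stale.append((source, target))
    return stale


if __name__ == "__main__":
    # Hypothetical fstab mount points and their bind targets under /exports.
    pairs = [("/mnt/disk1", "/exports/disk1"), ("/mnt/disk2", "/exports/disk2")]
    for source, target in stale_bind_mounts(pairs):
        print("bind target %s does not match %s" % (target, source))
```

If this reports anything right after boot, the exports were probably set up before the binds were in place, and the export step should wait until the list comes back empty.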
Is there some problem with nfs4 facls, as noted in other Launchpad bugs?