General protection fault in btrfs crashes NFS server

Bug #1499272 reported by Stephane Mutz
This bug affects 1 person
Affects: linux (Ubuntu)
Status: Expired
Importance: Medium
Assigned to: Unassigned
Milestone:

Bug Description

Under unknown circumstances, the NFS server crashes, and the only way to recover is to reboot the server. The symptom is the following trace in the kernel log:
Sep 22 21:43:35 lgin0001 kernel: [1584766.014733] general protection fault: 0000 [#1] SMP
Sep 22 21:43:35 lgin0001 kernel: [1584766.014739] Modules linked in: unix_diag tcp_diag inet_diag drbd lru_cache quota_v2 quota_tree bridge stp llc intel_powerclamp coretemp bnep rfcomm gpio_ich ipmi_devintf kvm_intel kvm bluetooth crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper joydev acpi_power_meter dcdbas ablk_helper cryptd serio_raw mac_hid lpc_ich wmi ipmi_si i7core_edac edac_core ioatdma shpchp nfsd auth_rpcgss nfs_acl nfs lockd sunrpc fscache parport_pc ppdev bonding lp parport btrfs xor raid6_pq libcrc32c ses enclosure hid_generic ixgbe psmouse usbhid dca hid ptp pps_core bnx2 megaraid_sas mdio
Sep 22 21:43:35 lgin0001 kernel: [1584766.014793] CPU: 6 PID: 3286 Comm: nfsd Tainted: G I 3.13.0-63-generic #103-Ubuntu
Sep 22 21:43:35 lgin0001 kernel: [1584766.014796] Hardware name: Dell Inc. PowerEdge R510/0DPRKF, BIOS 1.11.0 07/23/2012
Sep 22 21:43:35 lgin0001 kernel: [1584766.014799] task: ffff8805fdb1b000 ti: ffff8805e9016000 task.ti: ffff8805e9016000
Sep 22 21:43:35 lgin0001 kernel: [1584766.014801] RIP: 0010:[<ffffffff81372dd2>] [<ffffffff81372dd2>] memcpy+0x12/0x110
Sep 22 21:43:35 lgin0001 kernel: [1584766.014810] RSP: 0018:ffff8805e90174d0 EFLAGS: 00010202
Sep 22 21:43:35 lgin0001 kernel: [1584766.014812] RAX: ffff8801410d4026 RBX: 0000000000000001 RCX: 0000000000000001
Sep 22 21:43:35 lgin0001 kernel: [1584766.014814] RDX: 0000000000000001 RSI: 0005080000000000 RDI: ffff8801410d4026
Sep 22 21:43:35 lgin0001 kernel: [1584766.014816] RBP: ffff8805e9017508 R08: 0000000000001000 R09: ffff8805e90174d8
Sep 22 21:43:35 lgin0001 kernel: [1584766.014818] R10: 0000000000000000 R11: 0000000000000000 R12: ffff880041980850
Sep 22 21:43:35 lgin0001 kernel: [1584766.014820] R13: 0000160000000000 R14: ffff8801410d4027 R15: 0000000000000001
Sep 22 21:43:35 lgin0001 kernel: [1584766.014823] FS: 0000000000000000(0000) GS:ffff8806172c0000(0000) knlGS:0000000000000000
Sep 22 21:43:35 lgin0001 kernel: [1584766.014825] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Sep 22 21:43:35 lgin0001 kernel: [1584766.014827] CR2: 0000000000cca000 CR3: 0000000001c0e000 CR4: 00000000000007e0
Sep 22 21:43:35 lgin0001 kernel: [1584766.014829] Stack:
Sep 22 21:43:35 lgin0001 kernel: [1584766.014831] ffffffffa016d63c 0000000000001000 ffff880608e3e800 ffff88060c94fea0
Sep 22 21:43:35 lgin0001 kernel: [1584766.014835] 0000000000000000 ffff8800a142bc10 ffff880093063db0 ffff8805e90175c8
Sep 22 21:43:35 lgin0001 kernel: [1584766.014839] ffffffffa015241c 0000000000000f60 0000000000000000 0000000000001000
Sep 22 21:43:35 lgin0001 kernel: [1584766.014843] Call Trace:
Sep 22 21:43:35 lgin0001 kernel: [1584766.014870] [<ffffffffa016d63c>] ? read_extent_buffer+0xbc/0x110 [btrfs]
Sep 22 21:43:35 lgin0001 kernel: [1584766.014890] [<ffffffffa015241c>] btrfs_get_extent+0x91c/0x970 [btrfs]
Sep 22 21:43:35 lgin0001 kernel: [1584766.014911] [<ffffffffa0169837>] __do_readpage+0x357/0x730 [btrfs]
Sep 22 21:43:35 lgin0001 kernel: [1584766.014929] [<ffffffffa0151b00>] ? btrfs_real_readdir+0x5b0/0x5b0 [btrfs]
Sep 22 21:43:35 lgin0001 kernel: [1584766.014950] [<ffffffffa0169f92>] __extent_readpages.constprop.41+0x2a2/0x2c0 [btrfs]
Sep 22 21:43:35 lgin0001 kernel: [1584766.014968] [<ffffffffa0151b00>] ? btrfs_real_readdir+0x5b0/0x5b0 [btrfs]
Sep 22 21:43:35 lgin0001 kernel: [1584766.014988] [<ffffffffa016be16>] extent_readpages+0x1b6/0x1c0 [btrfs]
Sep 22 21:43:35 lgin0001 kernel: [1584766.015006] [<ffffffffa0151b00>] ? btrfs_real_readdir+0x5b0/0x5b0 [btrfs]
Sep 22 21:43:35 lgin0001 kernel: [1584766.015016] [<ffffffffa00edb1d>] ? ixgbe_xmit_frame+0x3d/0x80 [ixgbe]
Sep 22 21:43:35 lgin0001 kernel: [1584766.015022] [<ffffffff81197f83>] ? alloc_pages_current+0xa3/0x160
Sep 22 21:43:35 lgin0001 kernel: [1584766.015040] [<ffffffffa014fc5f>] btrfs_readpages+0x1f/0x30 [btrfs]
Sep 22 21:43:35 lgin0001 kernel: [1584766.015046] [<ffffffff8115bdb9>] __do_page_cache_readahead+0x1b9/0x260
Sep 22 21:43:35 lgin0001 kernel: [1584766.015050] [<ffffffff8115c292>] ondemand_readahead+0x152/0x2a0
Sep 22 21:43:35 lgin0001 kernel: [1584766.015054] [<ffffffff8115c411>] page_cache_sync_readahead+0x31/0x50
Sep 22 21:43:35 lgin0001 kernel: [1584766.015059] [<ffffffff811ece1b>] __generic_file_splice_read+0x53b/0x5a0
Sep 22 21:43:35 lgin0001 kernel: [1584766.015064] [<ffffffff811eb6e0>] ? page_cache_pipe_buf_release+0x20/0x20
Sep 22 21:43:35 lgin0001 kernel: [1584766.015068] [<ffffffff81365580>] ? cpumask_next_and+0x30/0x50
Sep 22 21:43:35 lgin0001 kernel: [1584766.015073] [<ffffffff810a6cf3>] ? find_busiest_group+0x133/0x890
Sep 22 21:43:35 lgin0001 kernel: [1584766.015078] [<ffffffff811d88f3>] ? find_inode+0xa3/0xb0
Sep 22 21:43:35 lgin0001 kernel: [1584766.015096] [<ffffffffa014ef80>] ? btrfs_clean_one_deleted_snapshot+0x110/0x110 [btrfs]
Sep 22 21:43:35 lgin0001 kernel: [1584766.015100] [<ffffffff811d9c64>] ? iget5_locked+0x94/0x1e0
Sep 22 21:43:35 lgin0001 kernel: [1584766.015117] [<ffffffffa014f320>] ? create_pinned_em+0x140/0x140 [btrfs]
Sep 22 21:43:35 lgin0001 kernel: [1584766.015121] [<ffffffff811d9f8b>] ? iput+0x3b/0x180
Sep 22 21:43:35 lgin0001 kernel: [1584766.015126] [<ffffffff811ecebe>] generic_file_splice_read+0x3e/0x80
Sep 22 21:43:35 lgin0001 kernel: [1584766.015129] [<ffffffff811ebd66>] do_splice_to+0x66/0x80
Sep 22 21:43:35 lgin0001 kernel: [1584766.015133] [<ffffffff811ebe27>] splice_direct_to_actor+0xa7/0x1e0
Sep 22 21:43:35 lgin0001 kernel: [1584766.015143] [<ffffffffa02ef940>] ? fsid_source+0x60/0x60 [nfsd]
Sep 22 21:43:35 lgin0001 kernel: [1584766.015152] [<ffffffffa02efb91>] nfsd_vfs_read.isra.12+0x151/0x160 [nfsd]
Sep 22 21:43:35 lgin0001 kernel: [1584766.015162] [<ffffffffa02f36e8>] nfsd_read_file+0x68/0xa0 [nfsd]
Sep 22 21:43:35 lgin0001 kernel: [1584766.015173] [<ffffffffa030149e>] nfsd4_encode_read+0x16e/0x260 [nfsd]
Sep 22 21:43:35 lgin0001 kernel: [1584766.015185] [<ffffffffa030982f>] nfsd4_encode_operation+0x5f/0xc0 [nfsd]
Sep 22 21:43:35 lgin0001 kernel: [1584766.015196] [<ffffffffa02ff20f>] nfsd4_proc_compound+0x21f/0x7d0 [nfsd]
Sep 22 21:43:35 lgin0001 kernel: [1584766.015204] [<ffffffffa02ebd3b>] nfsd_dispatch+0xbb/0x200 [nfsd]
Sep 22 21:43:35 lgin0001 kernel: [1584766.015223] [<ffffffffa024563d>] svc_process_common+0x46d/0x6d0 [sunrpc]
Sep 22 21:43:35 lgin0001 kernel: [1584766.015238] [<ffffffffa02459a7>] svc_process+0x107/0x170 [sunrpc]
Sep 22 21:43:35 lgin0001 kernel: [1584766.015246] [<ffffffffa02eb71f>] nfsd+0xbf/0x130 [nfsd]
Sep 22 21:43:35 lgin0001 kernel: [1584766.015254] [<ffffffffa02eb660>] ? nfsd_destroy+0x80/0x80 [nfsd]
Sep 22 21:43:35 lgin0001 kernel: [1584766.015259] [<ffffffff8108b7d2>] kthread+0xd2/0xf0
Sep 22 21:43:35 lgin0001 kernel: [1584766.015263] [<ffffffff8108b700>] ? kthread_create_on_node+0x1c0/0x1c0
Sep 22 21:43:35 lgin0001 kernel: [1584766.015268] [<ffffffff817347e8>] ret_from_fork+0x58/0x90
Sep 22 21:43:35 lgin0001 kernel: [1584766.015271] [<ffffffff8108b700>] ? kthread_create_on_node+0x1c0/0x1c0
Sep 22 21:43:35 lgin0001 kernel: [1584766.015273] Code: 66 0f 1f 84 00 00 00 00 00 e8 fb fb ff ff eb e2 90 90 90 90 90 90 90 90 90 48 89 f8 48 89 d1 48 c1 e9 03 83 e2 07 f3 48 a5 89 d1 <f3> a4 c3 20 4c 8b 06 4c 8b 4e 08 4c 8b 56 10 4c 8b 5e 18 48 8d
Sep 22 21:43:35 lgin0001 kernel: [1584766.015306] RIP [<ffffffff81372dd2>] memcpy+0x12/0x110
Sep 22 21:43:35 lgin0001 kernel: [1584766.015310] RSP <ffff8805e90174d0>
Sep 22 21:43:35 lgin0001 kernel: [1584766.015313] ---[ end trace 1f6f30e209f8104b ]---
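
One possible reading of the trace, offered as a sketch rather than a diagnosis: the faulting bytes `<f3> a4` decode to `rep movsb`, which reads from the address in RSI, and RSI holds 0005080000000000. Assuming 48-bit virtual addressing, that value is not a canonical x86-64 address, which would explain why the CPU raised a general protection fault instead of a page fault. It suggests memcpy was handed a bad source pointer somewhere under btrfs_get_extent, but does not show where the pointer was corrupted. The short Python script below checks the registers for non-canonical values and lists the call-trace frames not marked with `?` (frames marked `?` are addresses found on the stack that the unwinder could not confirm). The helper names and the embedded sample are illustrative only; the sample lines are copied from the trace with the syslog prefix trimmed.

```python
import re

# A few lines copied from the trace above, syslog prefix trimmed.
SAMPLE = """\
RAX: ffff8801410d4026 RBX: 0000000000000001 RCX: 0000000000000001
RDX: 0000000000000001 RSI: 0005080000000000 RDI: ffff8801410d4026
[<ffffffffa016d63c>] ? read_extent_buffer+0xbc/0x110 [btrfs]
[<ffffffffa015241c>] btrfs_get_extent+0x91c/0x970 [btrfs]
[<ffffffffa0169837>] __do_readpage+0x357/0x730 [btrfs]
[<ffffffffa0151b00>] ? btrfs_real_readdir+0x5b0/0x5b0 [btrfs]
"""

# General-purpose register dumps look like "RSI: 0005080000000000".
REG_RE = re.compile(r"\b(R[A-Z0-9]{1,2}): ([0-9a-f]{16})\b")
# Call-trace frames look like "[<addr>] ? symbol+0xoff/0xsize".
FRAME_RE = re.compile(r"\[<[0-9a-f]{16}>\] (\? )?(\S+)\+0x[0-9a-f]+/0x[0-9a-f]+")


def is_canonical(addr, va_bits=48):
    """Return True if bits 63..(va_bits-1) are all zeros or all ones.

    va_bits=48 is an assumption (4-level paging on x86-64).
    """
    top = addr >> (va_bits - 1)
    all_ones = (1 << (64 - va_bits + 1)) - 1
    return top == 0 or top == all_ones


def non_canonical_registers(text):
    """List (register, value) pairs whose value is not a canonical address."""
    return [
        (name, int(value, 16))
        for name, value in REG_RE.findall(text)
        if not is_canonical(int(value, 16))
    ]


def reliable_frames(text):
    """List symbols of call-trace frames that are not marked with '?'."""
    return [m.group(2) for m in FRAME_RE.finditer(text) if m.group(1) is None]


if __name__ == "__main__":
    for name, value in non_canonical_registers(SAMPLE):
        print(f"{name} = {value:#018x} is not a canonical address")
    print("confirmed frames:", " <- ".join(reliable_frames(SAMPLE)))
```

Small integers such as 0000000000000001 pass the check because they fall in the lower canonical half, so the script only flags values that could never be dereferenced. To run it against the full log, pass the text of the kernel log in place of `SAMPLE`.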

ProblemType: Bug
DistroRelease: Ubuntu 14.04
Package: linux-image-3.13.0-63-generic 3.13.0-63.103
ProcVersionSignature: Ubuntu 3.13.0-63.103-generic 3.13.11-ckt25
Uname: Linux 3.13.0-63-generic x86_64
ApportVersion: 2.14.1-0ubuntu3.12
Architecture: amd64
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
CRDA: Error: [Errno 2] No such file or directory: 'iw'
Date: Thu Sep 24 11:40:11 2015
HibernationDevice: RESUME=UUID=f2d0fc4a-75fe-4b10-816c-5378020c5578
InstallationDate: Installed on 2015-01-23 (243 days ago)
InstallationMedia: Ubuntu-Server 14.04.1 LTS "Trusty Tahr" - Release amd64 (20140722.3)
MachineType: Dell Inc. PowerEdge R510
ProcFB: 0 VESA VGA
ProcKernelCmdLine: BOOT_IMAGE=/@/boot/vmlinuz-3.13.0-63-generic root=UUID=f7cf4d5a-fff2-45c1-affe-5b61a7eacce3 ro rootflags=subvol=@ acpi=force
PulseList: Error: command ['pacmd', 'list'] failed with exit code 1: No PulseAudio daemon running, or not running as session daemon.
RelatedPackageVersions:
 linux-restricted-modules-3.13.0-63-generic N/A
 linux-backports-modules-3.13.0-63-generic N/A
 linux-firmware 1.127.15
RfKill:

SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
WifiSyslog:

dmi.bios.date: 07/23/2012
dmi.bios.vendor: Dell Inc.
dmi.bios.version: 1.11.0
dmi.board.name: 0DPRKF
dmi.board.vendor: Dell Inc.
dmi.board.version: A03
dmi.chassis.type: 23
dmi.chassis.vendor: Dell Inc.
dmi.modalias: dmi:bvnDellInc.:bvr1.11.0:bd07/23/2012:svnDellInc.:pnPowerEdgeR510:pvr:rvnDellInc.:rn0DPRKF:rvrA03:cvnDellInc.:ct23:cvr:
dmi.product.name: PowerEdge R510
dmi.sys.vendor: Dell Inc.

Stephane Mutz (stephane-mutz-7) wrote :
Brad Figg (brad-figg) wrote : Status changed to Confirmed

This change was made by a bot.

Changed in linux (Ubuntu):
status: New → Confirmed
Joseph Salisbury (jsalisbury) wrote :

Did this issue start happening after an update/upgrade? Was there a prior kernel version where you were not having this particular problem?

Would it be possible for you to test the latest upstream kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v4.3 kernel[0].

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".

Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.3-rc3-unstable/

Changed in linux (Ubuntu):
importance: Undecided → Medium
status: Confirmed → Incomplete
Stephane Mutz (stephane-mutz-7) wrote :

This is a production machine (our main file server), so it is not possible to test the latest kernel on it.

The problem first appeared during the summer, but the bug is elusive enough that it is hard to tell exactly which upgrade it followed.

When I reboot the server, the NFS server crashes very quickly if the clients are still up
and waiting for transactions to complete.

Launchpad Janitor (janitor) wrote :

[Expired for linux (Ubuntu) because there has been no activity for 60 days.]

Changed in linux (Ubuntu):
status: Incomplete → Expired