NULL pointer dereference in kernel in response to NFS traffic

Bug #1508657 reported by Jason
This bug affects 2 people
Affects          Status     Importance  Assigned to  Milestone
linux (Ubuntu)   Confirmed  Undecided   Unassigned

Bug Description

I have a badly behaving NFS client device (an embedded system mounting its root filesystem off my Ubuntu development machine) which is causing a NULL pointer dereference in the kernel. After this occurs, the NFS server becomes unresponsive. Sending SIGKILL to the various NFS daemons does not kill the processes, and '/etc/init.d/nfs-kernel-server restart' does not restore NFS server functionality.

Here is the output of dmesg:

[63517.096117] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
[63517.096127] IP: [<ffffffff8161d84d>] skb_copy_and_csum_datagram_iovec+0x2d/0x110
[63517.096136] PGD 0
[63517.096140] Oops: 0000 [#1] SMP
[63517.096144] Modules linked in: nfsv3 rpcsec_gss_krb5 nfsv4 vmnet(OX) vmw_vsock_vmci_transport vsock vmw_vmci vmmon(OX) autofs4 rfcomm bnep bluetooth pl2303 joydev usbserial hid_microsoft nfsd auth_rpcgss nfs_acl nfs snd_hda_codec_hdmi lockd sunrpc fscache nls_iso8859_1 hid_generic snd_hda_codec_realtek snd_hda_intel snd_hda_codec snd_hwdep usbhid x86_pkg_temp_thermal hid intel_powerclamp coretemp mxm_wmi snd_pcm eeepc_wmi asus_wmi sparse_keymap kvm_intel video snd_page_alloc kvm uvcvideo videobuf2_vmalloc videobuf2_memops snd_seq_midi videobuf2_core videodev snd_seq_midi_event crct10dif_pclmul crc32_pclmul snd_rawmidi ghash_clmulni_intel aesni_intel aes_x86_64 lrw snd_seq gf128mul glue_helper ablk_helper cryptd serio_raw snd_seq_device sb_edac snd_timer edac_core mei_me nvidia(POX) mei snd lpc_ich soundcore drm shpchp mac_hid wmi parport_pc ppdev lp parport psmouse r8169 ahci mii libahci
[63517.096222] CPU: 0 PID: 1498 Comm: nfsd Tainted: P OX 3.13.0-66-generic #108-Ubuntu
[63517.096226] Hardware name: System manufacturer System Product Name/P9X79 LE, BIOS 4608 12/24/2013
[63517.096229] task: ffff8807ff194800 ti: ffff88003d996000 task.ti: ffff88003d996000
[63517.096231] RIP: 0010:[<ffffffff8161d84d>] [<ffffffff8161d84d>] skb_copy_and_csum_datagram_iovec+0x2d/0x110
[63517.096237] RSP: 0018:ffff88003d997bc0 EFLAGS: 00010216
[63517.096239] RAX: 0000000000000000 RBX: ffff8807e6540000 RCX: 00000000000004f0
[63517.096241] RDX: 0000000000000000 RSI: 0000000000001080 RDI: ffff8807deab4400
[63517.096243] RBP: ffff88003d997bf8 R08: 0000000000000000 R09: 000000000d03f2fc
[63517.096246] R10: 00000000000004c0 R11: 0000000000000004 R12: 0000000000000008
[63517.096248] R13: ffff8807deab4400 R14: 0000000000001078 R15: ffff8807deab4400
[63517.096251] FS: 0000000000000000(0000) GS:ffff88082fc00000(0000) knlGS:0000000000000000
[63517.096254] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[63517.096256] CR2: 0000000000000008 CR3: 0000000002c0e000 CR4: 00000000001407f0
[63517.096258] Stack:
[63517.096260] ffffffff81616f66 ffffffff81616fb0 ffff8807e6540000 ffff88003d997df8
[63517.096266] 0000000000000000 0000000000001078 ffff8807deab4400 ffff88003d997c60
[63517.096271] ffffffff8168b2ec ffff88003d9ca028 ffff8807e6540070 0000000200000000
[63517.096276] Call Trace:
[63517.096284] [<ffffffff81616f66>] ? skb_checksum+0x26/0x30
[63517.096289] [<ffffffff81616fb0>] ? skb_push+0x40/0x40
[63517.096296] [<ffffffff8168b2ec>] udp_recvmsg+0x1dc/0x380
[63517.096303] [<ffffffff8169650c>] inet_recvmsg+0x6c/0x80
[63517.096308] [<ffffffff8160f0aa>] sock_recvmsg+0x9a/0xd0
[63517.096314] [<ffffffff8107576a>] ? del_timer_sync+0x4a/0x60
[63517.096319] [<ffffffff8172762d>] ? schedule_timeout+0x17d/0x2d0
[63517.096324] [<ffffffff8160f11a>] kernel_recvmsg+0x3a/0x50
[63517.096347] [<ffffffffa0de1d29>] svc_udp_recvfrom+0x89/0x440 [sunrpc]
[63517.096353] [<ffffffff8172c01b>] ? _raw_spin_unlock_bh+0x1b/0x40
[63517.096375] [<ffffffffa0deecc8>] ? svc_get_next_xprt+0xd8/0x310 [sunrpc]
[63517.096393] [<ffffffffa0def450>] svc_recv+0x4a0/0x5c0 [sunrpc]
[63517.096404] [<ffffffffa0e8570d>] nfsd+0xad/0x130 [nfsd]
[63517.096413] [<ffffffffa0e85660>] ? nfsd_destroy+0x80/0x80 [nfsd]
[63517.096418] [<ffffffff8108b7d2>] kthread+0xd2/0xf0
[63517.096423] [<ffffffff8108b700>] ? kthread_create_on_node+0x1c0/0x1c0
[63517.096428] [<ffffffff81734ba8>] ret_from_fork+0x58/0x90
[63517.096433] [<ffffffff8108b700>] ? kthread_create_on_node+0x1c0/0x1c0
[63517.096435] Code: 44 00 00 55 31 c0 48 89 e5 41 57 41 56 41 55 49 89 fd 41 54 41 89 f4 53 48 83 ec 10 8b 77 68 41 89 f6 45 29 e6 0f 84 89 00 00 00 <48> 8b 42 08 48 89 d3 48 85 c0 75 14 0f 1f 80 00 00 00 00 48 83
[63517.096477] RIP [<ffffffff8161d84d>] skb_copy_and_csum_datagram_iovec+0x2d/0x110
[63517.096481] RSP <ffff88003d997bc0>
[63517.096483] CR2: 0000000000000008
[63517.096487] ---[ end trace 15884e761cd443a7 ]---
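The Call Trace is easier to read once the entries marked `?` are set aside: on x86 those are addresses the unwinder found on the stack but could not confirm as part of the current call chain, so they are often stale leftovers. Below is a minimal Python sketch that parses trace lines in the format shown above (`function+offset/size`, optionally followed by `[module]`) and keeps only the confirmed frames. The regular expression is an assumption fitted to this particular log, not a complete grammar for every kernel version's oops format.

```python
import re
from typing import List, NamedTuple, Optional


class Frame(NamedTuple):
    func: str
    offset: int
    size: int
    module: Optional[str]  # None for code built into the kernel image
    reliable: bool         # False when the entry carries the "?" marker


# Matches e.g. "[<ffffffffa0de1d29>] svc_udp_recvfrom+0x89/0x440 [sunrpc]".
# The leading "[63517.096347]" timestamp is skipped because it lacks "[<".
TRACE_RE = re.compile(
    r"\[<[0-9a-f]+>\]\s+(?P<unsure>\?\s+)?"
    r"(?P<func>[\w.]+)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


def parse_call_trace(lines: List[str]) -> List[Frame]:
    """Turn oops Call Trace lines into Frame records, in log order."""
    frames = []
    for line in lines:
        m = TRACE_RE.search(line)
        if m is None:
            continue  # not a trace entry (header, register dump, ...)
        frames.append(Frame(
            func=m["func"],
            offset=int(m["off"], 16),
            size=int(m["size"], 16),
            module=m["module"],
            reliable=m["unsure"] is None,
        ))
    return frames


# The Call Trace section of the oops above, copied verbatim.
TRACE = """\
[63517.096284] [<ffffffff81616f66>] ? skb_checksum+0x26/0x30
[63517.096289] [<ffffffff81616fb0>] ? skb_push+0x40/0x40
[63517.096296] [<ffffffff8168b2ec>] udp_recvmsg+0x1dc/0x380
[63517.096303] [<ffffffff8169650c>] inet_recvmsg+0x6c/0x80
[63517.096308] [<ffffffff8160f0aa>] sock_recvmsg+0x9a/0xd0
[63517.096314] [<ffffffff8107576a>] ? del_timer_sync+0x4a/0x60
[63517.096319] [<ffffffff8172762d>] ? schedule_timeout+0x17d/0x2d0
[63517.096324] [<ffffffff8160f11a>] kernel_recvmsg+0x3a/0x50
[63517.096347] [<ffffffffa0de1d29>] svc_udp_recvfrom+0x89/0x440 [sunrpc]
[63517.096353] [<ffffffff8172c01b>] ? _raw_spin_unlock_bh+0x1b/0x40
[63517.096375] [<ffffffffa0deecc8>] ? svc_get_next_xprt+0xd8/0x310 [sunrpc]
[63517.096393] [<ffffffffa0def450>] svc_recv+0x4a0/0x5c0 [sunrpc]
[63517.096404] [<ffffffffa0e8570d>] nfsd+0xad/0x130 [nfsd]
[63517.096413] [<ffffffffa0e85660>] ? nfsd_destroy+0x80/0x80 [nfsd]
[63517.096418] [<ffffffff8108b7d2>] kthread+0xd2/0xf0
[63517.096423] [<ffffffff8108b700>] ? kthread_create_on_node+0x1c0/0x1c0
[63517.096428] [<ffffffff81734ba8>] ret_from_fork+0x58/0x90
[63517.096433] [<ffffffff8108b700>] ? kthread_create_on_node+0x1c0/0x1c0
"""

if __name__ == "__main__":
    for f in parse_call_trace(TRACE.splitlines()):
        if f.reliable:
            print(f.func if f.module is None else f"{f.func} [{f.module}]")
```

On this trace the confirmed frames read (innermost first) udp_recvmsg, inet_recvmsg, sock_recvmsg, kernel_recvmsg, svc_udp_recvfrom [sunrpc], svc_recv [sunrpc], nfsd [nfsd], kthread, ret_from_fork. That is an nfsd thread receiving on its UDP socket when the fault happened.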

I understand that my NFS client is probably sending malformed data to the NFS server, but this should *never* *ever* result in a NULL pointer dereference in the kernel.
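The register dump is consistent with a plain load through a NULL pointer. In the Code: line the faulting instruction (marked `<48>`) starts with the bytes `48 8b 42 08`, which x86-64 decodes as `mov rax, [rdx+0x8]`. RDX is 0 in the dump, so the CPU tried to read address 0x8, which is exactly the value in CR2. The Python sketch below repeats that arithmetic using the logged values. Its decoder handles only this one instruction form (REX.W `8B /r` with an 8-bit displacement) and is not a general disassembler.

```python
import re

# Copied from the oops above: the Code: bytes (<48> marks the faulting
# instruction), two lines of the register dump, and the CR2 fault address.
CODE = ("44 00 00 55 31 c0 48 89 e5 41 57 41 56 41 55 49 89 fd 41 54 41 89 f4 "
        "53 48 83 ec 10 8b 77 68 41 89 f6 45 29 e6 0f 84 89 00 00 00 <48> 8b 42 "
        "08 48 89 d3 48 85 c0 75 14 0f 1f 80 00 00 00 00 48 83")
REGS = ("RAX: 0000000000000000 RBX: ffff8807e6540000 RCX: 00000000000004f0\n"
        "RDX: 0000000000000000 RSI: 0000000000001080 RDI: ffff8807deab4400")
CR2 = 0x0000000000000008

# 64-bit register names indexed by the 3-bit ModRM reg/rm fields (no REX.R/REX.B).
REG64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"]


def faulting_bytes(code_line):
    """Return the bytes from the <xx> marker onward as a list of ints."""
    tail = code_line.split("<", 1)[1]
    return [int(tok.strip("<>"), 16) for tok in tail.split()]


def decode_mov_load(insn):
    """Decode only 'mov r64, [base + disp8]' (REX.W 8B /r, mod=01); else None."""
    rex, opcode, modrm, disp = insn[:4]
    mod, reg, rm = modrm >> 6, (modrm >> 3) & 7, modrm & 7
    if rex != 0x48 or opcode != 0x8B or mod != 0b01 or rm == 0b100:
        return None  # other forms (REX.R/B, SIB byte, disp32, ...) are out of scope
    if disp >= 0x80:
        disp -= 0x100  # disp8 is sign-extended
    return REG64[reg], REG64[rm], disp


def parse_regs(dump):
    """Map lower-case register names to their values, e.g. {'rdx': 0, ...}."""
    return {name.lower(): int(value, 16)
            for name, value in re.findall(r"\b(R\w+): ([0-9a-f]{16})\b", dump)}


dest, base, disp = decode_mov_load(faulting_bytes(CODE))
regs = parse_regs(REGS)
addr = regs[base] + disp

print(f"mov {dest}, [{base}{disp:+#x}] -> reads address {addr:#x}")
print("matches CR2:", addr == CR2)
```

Assuming the 3.13 prototype is `skb_copy_and_csum_datagram_iovec(struct sk_buff *skb, int hlen, struct iovec *iov)`, RDX carries the third argument (`iov`) under the x86-64 calling convention, and none of the instructions before the fault overwrite it. Offset 8 in `struct iovec` is `iov_len`, so the oops suggests the function was handed a NULL iovec. This is an interpretation of the log, not something the log states directly.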

I do not have a capture of the network traffic leading up to the crash. Without an Ethernet hub or a VM setup, I have no easy way to capture it. I can try Wireshark or tcpdump, but I'm concerned that the packet which triggers the NULL pointer dereference will not make it up the stack, so capturing the stream independently would be the most reliable approach.

1)
# lsb_release -rd
Description: Ubuntu 14.04.3 LTS
Release: 14.04

2)
# apt-cache policy nfs-kernel-server
nfs-kernel-server:
  Installed: 1:1.2.8-6ubuntu1.1
  Candidate: 1:1.2.8-6ubuntu1.1
  Version table:
 *** 1:1.2.8-6ubuntu1.1 0
        500 http://us.archive.ubuntu.com/ubuntu/ trusty-updates/main amd64 Packages
        100 /var/lib/dpkg/status
     1:1.2.8-6ubuntu1 0
        500 http://us.archive.ubuntu.com/ubuntu/ trusty/main amd64 Packages

# apt-cache policy linux-generic
linux-generic:
  Installed: (none)
  Candidate: 3.13.0.66.72
  Version table:
     3.13.0.66.72 0
        500 http://us.archive.ubuntu.com/ubuntu/ trusty-updates/main amd64 Packages
        500 http://security.ubuntu.com/ubuntu/ trusty-security/main amd64 Packages
     3.13.0.24.28 0
        500 http://us.archive.ubuntu.com/ubuntu/ trusty/main amd64 Packages

3) NFS should not die. If it does, it should be possible to restart it.
4) NFS died. Kernel dereferenced a null pointer. My dog ate my homework.

Jason (jgaiser) wrote :

Also, this issue is 100% reproducible with my setup, so if you'd like more data, let me know and I will try to accommodate your request.

Andreas Bouché (a-bouche) wrote :

This is most likely a kernel problem. Going back to kernel 3.13.0-65 (or 3.16.0-50) solves it.
Also, this seems to happen only when using NFS v3 over UDP. I had no problem using NFS v4 or NFS v3 over TCP.

See also: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1508510
and: https://github.com/mitchellh/vagrant/issues/6423
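To check whether a machine is running a kernel newer than the ones this comment reports as working, here is a minimal Python sketch. It compares the Ubuntu ABI number of the running kernel with the known-good version in the same series. The `KNOWN_GOOD` table holds only the two versions named above (3.13.0-65 and 3.16.0-50). A `True` result means only that the kernel is newer than the one vouched for here, not that it is definitely affected, since later updates may include a fix.

```python
import platform
import re

# Kernels the comment above reports as NOT showing the crash, keyed by
# upstream version; the value is the Ubuntu ABI number (3.13.0-65, 3.16.0-50).
KNOWN_GOOD = {(3, 13, 0): 65, (3, 16, 0): 50}


def ubuntu_abi(release):
    """Split e.g. '3.13.0-66-generic' into ((3, 13, 0), 66)."""
    m = re.match(r"(\d+)\.(\d+)\.(\d+)-(\d+)", release)
    if m is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    major, minor, patch, abi = (int(g) for g in m.groups())
    return (major, minor, patch), abi


def newer_than_known_good(release):
    """True if newer than the known-good ABI in the same series, False if not,
    None if that kernel series is not mentioned in this report."""
    series, abi = ubuntu_abi(release)
    good = KNOWN_GOOD.get(series)
    return None if good is None else abi > good


if __name__ == "__main__":
    release = platform.release()  # e.g. '3.13.0-66-generic' on the reporter's machine
    try:
        print(release, "->", newer_than_known_good(release))
    except ValueError as err:
        print(err)
```

For the kernel in the original report, `newer_than_known_good("3.13.0-66-generic")` is `True`, while `"3.13.0-65-generic"` gives `False`.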

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in nfs-utils (Ubuntu):
status: New → Confirmed
Steve Langasek (vorlon)
affects: nfs-utils (Ubuntu) → linux (Ubuntu)