kernel oops: NULL pointer dereference in nfs_inode_attach_open_context+0x37/0x70 [nfs]

Bug #1566471 reported by Dan Schatzberg
This bug affects 3 people
Affects | Status | Importance | Assigned to | Milestone
linux (Ubuntu) | Fix Released | Medium | Unassigned |
linux-lts-xenial (Ubuntu) | Fix Released | Undecided | Unassigned |

Bug Description

I'm attempting to boot a Xenial server install (created with debootstrap) over NFS with overlayroot, so that the initial rootfs is read-only (via NFS) and all modifications are written to a tmpfs. This lets me boot many such machines from one image. The kernel oops occurs during run-init, after the initramfs has successfully mounted the NFS rootfs, created the tmpfs, and set up the overlayfs using both. If I do not use overlayfs and just boot into the NFS root (read-write), everything works. Note that the following oops was gathered from a qemu virtual machine that I netbooted, though the apport output was from real hardware. The issue occurs in both cases. Please let me know if I can provide more information.

+ exec run-init /root /sbin/init
[ 9.003288] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
[ 9.005772] IP: [<ffffffffc01d14d7>] nfs_inode_attach_open_context+0x37/0x70 [nfs]
[ 9.007227] PGD 0
[ 9.007227] Oops: 0002 [#1] SMP
[ 9.007227] Modules linked in: overlay nfsv3 nfs_acl nfs lockd grace sunrpc fscache raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear aesni_intel aes_x86_64 glue_helper lrw gf128mul ablk_helper cryptd psmouse floppy pata_acpi
[ 9.007227] CPU: 0 PID: 1 Comm: init Not tainted 4.4.0-16-generic #32-Ubuntu
[ 9.007227] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.2-0-g33fbe13 by qemu-project.org 04/01/2014
[ 9.007227] task: ffff88013ab80000 ti: ffff88013ab88000 task.ti: ffff88013ab88000
[ 9.007227] RIP: 0010:[<ffffffffc01d14d7>] [<ffffffffc01d14d7>] nfs_inode_attach_open_context+0x37/0x70 [nfs]
[ 9.007227] RSP: 0018:ffff88013ab8bc30 EFLAGS: 00010246
[ 9.007227] RAX: ffff88007fa86d30 RBX: ffff8800bba16000 RCX: 0000000200000000
[ 9.007227] RDX: 0000000000000000 RSI: ffff88007fa86cc0 RDI: ffff8800bba16088
[ 9.007227] RBP: ffff88013ab8bc48 R08: ffff88007f09e09c R09: ffff88013b001800
[ 9.007227] R10: ffff88007fa86cc0 R11: 0000000000000000 R12: ffff88007fa86cc0
[ 9.007227] R13: ffff8800bba16088 R14: ffff8800bb9f7d88 R15: ffff88013a52f010
[ 9.007227] FS: 0000000000000000(0000) GS:ffff88013fc00000(0000) knlGS:0000000000000000
[ 9.007227] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 9.007227] CR2: 0000000000000008 CR3: 000000013a530000 CR4: 00000000001406f0
[ 9.007227] Stack:
[ 9.007227] ffff88007fa86cc0 ffff88013a52f000 ffff8800bb9f7d88 ffff88013ab8bc58
[ 9.007227] ffffffffc01d153b ffff88013ab8bc80 ffffffffc01d3d37 ffff88013a52f000
[ 9.007227] ffff8800bb9f7d88 0000000000000000 ffff88013ab8bca0 ffffffffc01d010d
[ 9.007227] Call Trace:
[ 9.007227] [<ffffffffc01d153b>] nfs_file_set_open_context+0x2b/0x30 [nfs]
[ 9.007227] [<ffffffffc01d3d37>] nfs_open+0x37/0x60 [nfs]
[ 9.007227] [<ffffffffc01d010d>] nfs_file_open+0x4d/0x70 [nfs]
[ 9.007227] [<ffffffff812098cf>] do_dentry_open+0x1ff/0x310
[ 9.007227] [<ffffffffc01d00c0>] ? nfs_file_fsync+0x130/0x130 [nfs]
[ 9.007227] [<ffffffff8120aa76>] vfs_open+0x56/0x60
[ 9.007227] [<ffffffff8121a107>] path_openat+0x1b7/0x1360
[ 9.007227] [<ffffffff8121c4a1>] do_filp_open+0x91/0x100
[ 9.007227] [<ffffffff81229da8>] ? __alloc_fd+0xc8/0x190
[ 9.007227] [<ffffffff8120ae3e>] do_sys_open+0x13e/0x2a0
[ 9.007227] [<ffffffff810a112d>] ? __put_cred+0x3d/0x50
[ 9.007227] [<ffffffff8120a1f8>] ? SyS_access+0x1e8/0x230
[ 9.007227] [<ffffffff8120afbe>] SyS_open+0x1e/0x20
[ 9.007227] [<ffffffff81824ef2>] entry_SYSCALL_64_fastpath+0x16/0x71
[ 9.007227] Code: 54 53 48 8b 47 40 49 89 fc 48 8b 58 30 4c 8d ab 88 00 00 00 4c 89 ef e8 98 37 65 c1 48 8b 93 60 ff ff ff 49 8d 44 24 70 4c 89 ef <48> 89 42 08 49 89 54 24 70 48 8d 93 60 ff ff ff 49 89 54 24 78
[ 9.007227] RIP [<ffffffffc01d14d7>] nfs_inode_attach_open_context+0x37/0x70 [nfs]
[ 9.007227] RSP <ffff88013ab8bc30>
[ 9.007227] CR2: 0000000000000008
[ 9.056135] ---[ end trace 4bf38e0df912649a ]---
[ 9.057055] BUG: unable to handle kernel NULL pointer dereference at 0000000000000158
[ 9.058345] IP: [<ffffffffc01d1c70>] __put_nfs_open_context+0xa0/0x100 [nfs]
[ 9.059479] PGD 0
[ 9.059823] Oops: 0000 [#2] SMP
[ 9.060117] Modules linked in: overlay nfsv3 nfs_acl nfs lockd grace sunrpc fscache raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear aesni_intel aes_x86_64 glue_helper lrw gf128mul ablk_helper cryptd psmouse floppy pata_acpi
[ 9.060117] CPU: 0 PID: 1 Comm: init Tainted: G D 4.4.0-16-generic #32-Ubuntu
[ 9.060117] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.8.2-0-g33fbe13 by qemu-project.org 04/01/2014
[ 9.060117] task: ffff88013ab80000 ti: ffff88013ab88000 task.ti: ffff88013ab88000
[ 9.060117] RIP: 0010:[<ffffffffc01d1c70>] [<ffffffffc01d1c70>] __put_nfs_open_context+0xa0/0x100 [nfs]
[ 9.060117] RSP: 0018:ffff88013ab8b878 EFLAGS: 00010282
[ 9.060117] RAX: 0000000000000000 RBX: ffff880138e3e3c0 RCX: 0000000000000001
[ 9.060117] RDX: ffff88007fd3b358 RSI: 0000000000000001 RDI: ffff880138e3e3c0
[ 9.060117] RBP: ffff88013ab8b8a0 R08: 0000000000000000 R09: 0000000000000000
[ 9.060117] R10: ffff88007fd43598 R11: ffff8800bb71b610 R12: ffff88007fd3b3f8
[ 9.060117] R13: ffff88007fd3b480 R14: 0000000000000001 R15: ffff88007f09e000
[ 9.060117] FS: 0000000000000000(0000) GS:ffff88013fc00000(0000) knlGS:0000000000000000
[ 9.060117] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 9.060117] CR2: 0000000000000158 CR3: 0000000001e0a000 CR4: 00000000001406f0
[ 9.060117] Stack:
[ 9.060117] ffff880138e3e3c0 ffff88007fd3b358 ffff88007fd3b480 ffff880138426620
[ 9.060117] ffff88007fd38600 ffff88013ab8b8c8 ffffffffc01d3cf3 ffff8800bb71b600
[ 9.060117] ffff88007fd43598 ffff88007fd43598 ffff88013ab8b8e8 ffffffffc01cfa8b
[ 9.060117] Call Trace:
[ 9.060117] [<ffffffffc01d3cf3>] nfs_file_clear_open_context+0x83/0x90 [nfs]
[ 9.060117] [<ffffffffc01cfa8b>] nfs_file_release+0x3b/0x50 [nfs]
[ 9.060117] [<ffffffff8120db84>] __fput+0xe4/0x220
[ 9.060117] [<ffffffff8120dcfe>] ____fput+0xe/0x10
[ 9.060117] [<ffffffff8109d9e8>] task_work_run+0x78/0xa0
[ 9.060117] [<ffffffff81082b64>] do_exit+0x2e4/0xae0
[ 9.060117] [<ffffffff8101abf1>] oops_end+0xa1/0xd0
[ 9.060117] [<ffffffff81069db5>] no_context+0x135/0x380
[ 9.060117] [<ffffffff8106a080>] __bad_area_nosemaphore+0x80/0x1f0
[ 9.060117] [<ffffffff8106a253>] bad_area+0x43/0x50
[ 9.060117] [<ffffffff8106a76b>] __do_page_fault+0x35b/0x400
[ 9.060117] [<ffffffff8106a877>] trace_do_page_fault+0x37/0xe0
[ 9.060117] [<ffffffff81062f29>] do_async_page_fault+0x19/0x70
[ 9.060117] [<ffffffff818270a8>] async_page_fault+0x28/0x30
[ 9.060117] [<ffffffffc01d14d7>] ? nfs_inode_attach_open_context+0x37/0x70 [nfs]
[ 9.060117] [<ffffffffc01d153b>] nfs_file_set_open_context+0x2b/0x30 [nfs]
[ 9.060117] [<ffffffffc01d3d37>] nfs_open+0x37/0x60 [nfs]
[ 9.060117] [<ffffffffc01d010d>] nfs_file_open+0x4d/0x70 [nfs]
[ 9.060117] [<ffffffff812098cf>] do_dentry_open+0x1ff/0x310
[ 9.060117] [<ffffffffc01d00c0>] ? nfs_file_fsync+0x130/0x130 [nfs]
[ 9.060117] [<ffffffff8120aa76>] vfs_open+0x56/0x60
[ 9.060117] [<ffffffff8121a107>] path_openat+0x1b7/0x1360
[ 9.060117] [<ffffffff8121c4a1>] do_filp_open+0x91/0x100
[ 9.060117] [<ffffffff81229da8>] ? __alloc_fd+0xc8/0x190
[ 9.060117] [<ffffffff8120ae3e>] do_sys_open+0x13e/0x2a0
[ 9.060117] [<ffffffff810a112d>] ? __put_cred+0x3d/0x50
[ 9.060117] [<ffffffff8120a1f8>] ? SyS_access+0x1e8/0x230
[ 9.060117] [<ffffffff8120afbe>] SyS_open+0x1e/0x20
[ 9.060117] [<ffffffff81824ef2>] entry_SYSCALL_64_fastpath+0x16/0x71
[ 9.060117] Code: 89 43 78 ff 14 25 08 bf e2 81 4d 85 e4 74 22 49 8b 44 24 28 44 89 f6 48 89 df 48 8b 80 58 04 00 00 48 8b 00 48 8b 80 e0 00 00 00 <ff> 90 58 01 00 00 48 8b 7b 48 48 85 ff 74 05 e8 bc e5 f7 ff 48
[ 9.060117] RIP [<ffffffffc01d1c70>] __put_nfs_open_context+0xa0/0x100 [nfs]
[ 9.060117] RSP <ffff88013ab8b878>
[ 9.060117] CR2: 0000000000000158
[ 9.060117] ---[ end trace 4bf38e0df912649b ]---
[ 9.060117] Fixing recursive fault but reboot is needed!
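
For readers triaging the two oopses above, the number after "Oops:" is the x86 page-fault error code, and CR2 holds the faulting address. The following is a minimal sketch of how those fields can be read; the helper names `decode_oops_error_code` and `null_deref_offset` are hypothetical and exist only for this illustration.

```python
from typing import Optional

# Decode the low bits of the x86 page-fault error code printed after "Oops:".
def decode_oops_error_code(code: int) -> dict:
    return {
        "page_present": bool(code & 0x1),  # 0 = page not present
        "write": bool(code & 0x2),         # 0 = read access, 1 = write access
        "user_mode": bool(code & 0x4),     # 0 = fault taken in kernel mode
    }

# If CR2 lies in the first page, treat it as "NULL base pointer + member offset".
def null_deref_offset(cr2: int, page_size: int = 0x1000) -> Optional[int]:
    return cr2 if cr2 < page_size else None

# Values copied from the two oopses in this report.
first = decode_oops_error_code(0x0002)   # "Oops: 0002", CR2 = 0x8
second = decode_oops_error_code(0x0000)  # "Oops: 0000", CR2 = 0x158

print("oops #1:", first, "offset", null_deref_offset(0x8))
print("oops #2:", second, "offset", null_deref_offset(0x158))
```

Read this way, the first oops is a kernel-mode write through a NULL pointer at offset 0x8, which matches the marked instruction bytes `<48> 89 42 08` (a store of RAX to 0x8(%rdx)) with RDX equal to zero. The second is a kernel-mode read at offset 0x158, matching `<ff> 90 58 01 00 00` (an indirect call through 0x158(%rax)) with RAX equal to zero.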

Dan Schatzberg (schatzberg-dan) wrote :
Brad Figg (brad-figg) wrote : Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 1566471

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
Dan Schatzberg (schatzberg-dan) wrote :

I cannot run the command because the machine won't fully boot. I have added the requested log, collected from a boot where I did not use overlayfs.

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Joseph Salisbury (jsalisbury) wrote :

Would it be possible for you to test the latest upstream kernel? Refer to https://wiki.ubuntu.com/KernelMainlineBuilds . Please test the latest v4.6 kernel[0].

If this bug is fixed in the mainline kernel, please add the following tag 'kernel-fixed-upstream'.

If the mainline kernel does not fix this bug, please add the tag: 'kernel-bug-exists-upstream'.

Once testing of the upstream kernel is complete, please mark this bug as "Confirmed".

Thanks in advance.

[0] http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.6-rc2-wily/

Changed in linux (Ubuntu):
importance: Undecided → Medium
status: Confirmed → Incomplete
Dan Schatzberg (schatzberg-dan) wrote :

When I use the upstream kernel you linked me to, my initramfs doesn't have the overlayfs kernel module included. If you have any pointers on how to fix that, let me know.

Dan Schatzberg (schatzberg-dan) wrote :

I managed to edit the initramfs to make it work, and I hit the same oops on 4.6-rc2.

tags: added: kernel-bug-exists-upstream
Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Philipp Wendler (philw85) wrote :

I also experience this problem using the Xenial kernel 4.4.0-18.34~14.04.1 on Ubuntu 14.04.
I can even reproduce it as a non-root user by creating an overlay mount inside a user namespace.

After mounting an overlay over an NFS mount, I can successfully traverse existing directories and create, write, read, and remove new files. As soon as I try to read an existing file (from the lower layer NFS mount), the application that attempts the read dies and the syslog shows the kernel bug. The system continues running afterwards.

Furthermore, a similar crash occurs for NFS 4 mounts:

Apr 13 09:49:20 tortuga kernel: [ 4611.794037] BUG: unable to handle kernel NULL pointer dereference at 0000000000000160
Apr 13 09:49:20 tortuga kernel: [ 4611.794144] IP: [<ffffffffc088cd5d>] nfs4_file_open+0xcd/0x1d0 [nfsv4]
Apr 13 09:49:20 tortuga kernel: [ 4611.794202] PGD 414777067 PUD 302045067 PMD 0
Apr 13 09:49:20 tortuga kernel: [ 4611.794233] Oops: 0000 [#1] SMP
Apr 13 09:49:20 tortuga kernel: [ 4611.794255] Modules linked in: overlay rpcsec_gss_krb5 nfsv4 ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 xt_CHECKSUM iptable_mangle xt_tcpudp ip6table_filter ip6_tables iptable_filter ip_tables ebtable_nat ebtables x_tables autofs4 bridge stp llc bnep rfcomm bluetooth nfsd auth_rpcgss nfs_acl nfs binfmt_misc lockd grace sunrpc fscache dm_crypt input_leds joydev snd_hda_codec_realtek snd_hda_codec_generic snd_hda_codec_hdmi snd_hda_intel snd_hda_codec hid_generic snd_hda_core snd_hwdep intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp dcdbas snd_pcm kvm_intel snd_seq_midi snd_seq_midi_event kvm snd_rawmidi usbhid dm_multipath hid snd_seq snd_seq_device irqbypass crct10dif_pclmul snd_timer crc32_pclmul serio_raw snd aesni_intel mei_me aes_x86_64 soundcore lrw gf128mul mei glue_helper ablk_helper shpchp cryptd ppdev msr lpc_ich cpuid parport_pc 8250_fintek mac_hid lp parport amdkfd amd_iommu_v2 radeon i2c_algo_bit ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops e1000e drm ahci psmouse ptp libahci pps_core fjes video [last unloaded: ipmi_msghandler]
Apr 13 09:49:20 tortuga kernel: [ 4611.794983] CPU: 4 PID: 14306 Comm: cat Not tainted 4.4.0-18-generic #34~14.04.1-Ubuntu
Apr 13 09:49:20 tortuga kernel: [ 4611.795027] Hardware name: Dell Inc. OptiPlex 790/0HY9JP, BIOS A07 09/10/2011
Apr 13 09:49:20 tortuga kernel: [ 4611.795067] task: ffff8800a9822940 ti: ffff8803e9d30000 task.ti: ffff8803e9d30000
Apr 13 09:49:20 tortuga kernel: [ 4611.795108] RIP: 0010:[<ffffffffc088cd5d>] [<ffffffffc088cd5d>] nfs4_file_open+0xcd/0x1d0 [nfsv4]
Apr 13 09:49:20 tortuga kernel: [ 4611.795171] RSP: 0018:ffff8803e9d33c18 EFLAGS: 00010246
Apr 13 09:49:20 tortuga kernel: [ 4611.795200] RAX: 0000000000000000 RBX: ffff8803e7d78700 RCX: ffff8803e9d33c38
Apr 13 09:49:20 tortuga kernel: [ 4611.795239] RDX: 0000000000008000 RSI: ffff8803f09a8540 RDI: ffff88041873a148
Apr 13 09:49:20 tortuga kernel: [ 4611.795278] RBP: ffff8803e9d33cb0 R08: 0000000000000000 R09: ffff88041cc03800
Apr 13 09:49:20 tortuga kernel: [ 4611.795317] R10: ffffffffc06c9230 R11: ffffea000f9f5e...


oleg (overlayfs)
Changed in linux-lts-xenial (Ubuntu):
status: New → Confirmed
oleg (overlayfs) wrote :

On 2016-04-20 a commit was made to the upstream 4.4 stable tree which may fix this bug (not yet tested).

nfs: use file_dentry()

NFS may be used as lower layer of overlayfs and accessing f_path.dentry can lead to a crash. Fix by replacing direct access of file->f_path.dentry with the file_dentry() accessor, which will always return a native object.

Fixes: 4bacc9c9234c ("overlayfs: Make f_path always point to the overlay and f_inode to the underlay")

https://git.kernel.org/cgit/linux/kernel/git/stable/linux-stable.git/commit/?h=linux-4.4.y&id=fda9797a6aaad1a8044614fbbdb265dda4328c41
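
The commit description above can be illustrated with a toy model. This is only a sketch in Python, not kernel code: the classes `Inode`, `Dentry`, and `File`, the `real` field, and the two `nfs_open_*` functions are hypothetical stand-ins, and the modelled `file_dentry()` simply returns the underlying dentry when one exists.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Inode:
    fs: str  # name of the filesystem that owns this inode

@dataclass
class Dentry:
    inode: Inode
    real: Optional["Dentry"] = None  # underlying dentry; set only for overlay dentries

@dataclass
class File:
    f_path_dentry: Dentry  # per 4bacc9c9234c: always points to the overlay
    f_inode: Inode         # per 4bacc9c9234c: points to the underlay

def file_dentry(file: File) -> Dentry:
    """Modelled accessor: return the dentry native to the underlying filesystem."""
    d = file.f_path_dentry
    return d.real if d.real is not None else d

def nfs_open_buggy(file: File) -> str:
    # Pre-fix pattern: assumes f_path.dentry belongs to NFS.
    return file.f_path_dentry.inode.fs

def nfs_open_fixed(file: File) -> str:
    # Post-fix pattern: always goes through the accessor.
    return file_dentry(file).inode.fs

# A file opened through an overlay whose lower layer is NFS.
nfs_inode = Inode(fs="nfs")
nfs_dentry = Dentry(inode=nfs_inode)
ovl_dentry = Dentry(inode=Inode(fs="overlay"), real=nfs_dentry)
f = File(f_path_dentry=ovl_dentry, f_inode=nfs_inode)

print("buggy path sees inode owned by:", nfs_open_buggy(f))
print("fixed path sees inode owned by:", nfs_open_fixed(f))
```

In this model the pre-fix path ends up holding an overlay inode where it expects an NFS one. In the real kernel, NFS would then treat that foreign inode as its own private structure, which is plausibly how the NULL list pointer in the first oops arises.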

Seth Forshee (sforshee) wrote :
Changed in linux (Ubuntu):
status: Confirmed → Incomplete
Philipp Wendler (philw85) wrote :

I tested 4.4.0-22.38_amd64 on Ubuntu 14.04 with an overlay over an NFS4 mount (same situation as in comment #7) and the crash when reading existing files from the lower layer is gone.

I did not test overlay over NFS3.

I still cannot successfully write to files that exist in the lower layer ("Operation not supported"), only to new files, but I guess this is not in the scope of this bug report.

Seth Forshee (sforshee) wrote :

Marking fix released based on the feedback in comment #10.

Philipp: Thanks for testing. You're correct, the problem writing is outside the scope and would require a new bug report.

Changed in linux (Ubuntu):
status: Incomplete → Fix Released
Changed in linux-lts-xenial (Ubuntu):
status: Confirmed → Fix Released