Kernel error/traffic stops during large NFS transfers from VM

Bug #1366857 reported by Jeya ganesh babu J
This bug affects 1 person
Affects                    Status          Importance  Assigned to  Milestone
Juniper Openstack          Status tracked in Trunk
  R1.1                     Fix Committed   Critical    Unassigned
  Trunk                    Invalid         Critical    Unassigned

Bug Description

The issue occurs when a VM runs as an NFS server and large data transfers take place between the NFS server VM and other VMs or the compute nodes. The kernel on the compute node hosting the NFS server VM hits a NULL pointer dereference (a kernel oops), and data transfer to and from the VM stops. The following error is displayed:

[135588.871016] BUG: unable to handle kernel NULL pointer dereference at (null)
[135588.895016] IP: [<ffffffff81141f25>] put_page+0x5/0x40
[135588.903465] PGD 0
[135588.911785] Oops: 0000 [#1] SMP
[135588.919796] Modules linked in: vhost_net(F) macvtap(F) macvlan(F) ip6table_filter(F) ip6_tables(F) ebtable_nat(F) ebtables(F) ipt_MASQUERADE(F) iptable_nat(F) nf_nat_ipv4(F) nf_nat(F) nf_conntrack_ipv4(F) nf_defrag_ipv4(F) xt_state(F) nf_conntrack(F) ipt_REJECT(F) xt_CHECKSUM(F) iptable_mangle(F) xt_tcpudp(F) iptable_filter(F) ip_tables(F) x_tables(F) bridge(F) stp(F) llc(F) nbd(F) vrouter(OF) vesafb(F) xfs(F) ib_iser(F) rdma_cm(F) ib_cm(F) iw_cm(F) ib_sa(F) ib_mad(F) ib_core(F) ib_addr(F) iscsi_tcp(F) libiscsi_tcp(F) libiscsi(F) scsi_transport_iscsi(F) nfsd(F) nfsv4(F) nfs_acl(F) auth_rpcgss(F) nfs(F) coretemp(F) kvm_intel(F) kvm(F) ghash_clmulni_intel(F) aesni_intel(F) fscache(F) ablk_helper(F) cryptd(F) lrw(F) aes_x86_64(F) lockd(F) xts(F) gf128mul(F) dm_multipath(F) scsi_dh(F) sunrpc(F) sb_edac(F) edac_core(F) mei(F) ioatdma(F) gpio_ich(F) joydev(F) microcode(F) wmi(F) mac_hid(F) lpc_ich(F) lp(F) parport(F) ses(F) enclosure(F) hid_generic(F) usbhid(F) hid(F) ahci(F) libahci(F) igb(F) ixgbe(F) mpt2sas(F) dca(F) ptp(F) scsi_transport_sas(F) pps_core(F) raid_class(F) mdio(F) btrfs(F) zlib_deflate(F) libcrc32c(F)
[135589.059484] CPU 26
[135589.059610] Pid: 9475, comm: vhost-9473 Tainted: GF O 3.8.0-29-generic #42~precise1-Ubuntu Supermicro SSG-6027R-E1R12L/X9DRD-7LN4F
[135589.089948] RIP: 0010:[<ffffffff81141f25>] [<ffffffff81141f25>] put_page+0x5/0x40
[135589.110255] RSP: 0018:ffff8806f2e75bf0 EFLAGS: 00010202
[135589.120535] RAX: 0000000000000140 RBX: ffff88200412d400 RCX: ffff882007260ec
[135589.141816] RDX: 0000000000000000 RSI: ffff882007260e00 RDI: 0000000000000000
[135589.163365] RBP: ffff8806f2e75c08 R08: 0000000000000001 R09: 0000000000001000
[135589.185232] R10: ffff882003d04518 R11: 0000000000000001 R12: 0000000000000012
[135589.207945] R13: 000000000000f362 R14: ffffffff814f354b R15: 0000000000000042
[135589.230686] FS: 0000000000000000(0000) GS:ffff88207fd40000(0000) knlGS:0000000000000000
[135589.253617] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[135589.265253] CR2: 0000000000000000 CR3: 00000020258f1000 CR4: 00000000000427e0
[135589.288051] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[135589.311538] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[135589.333796] Process vhost-9473 (pid: 9475, threadinfo ffff8806f2e74000, task ffff8806f2c3dd00)
[135589.356366] Stack:
[135589.367390] ffffffff815d57a8 ffff88200412d400 ffff88200412d400 ffff8806f2e75c18
[135589.389623] ffffffff815d58c5 ffff8806f2e75c38 ffffffff815d58ee ffffea006610d140
[135589.412151] ffff88200839c800 ffff8806f2e75c78 ffffffff815d5945 ffff882003d08498
[135589.434931] Call Trace:
[135589.446162] [<ffffffff815d57a8>] ? skb_release_data.part.43+0x48/0x110
[135589.457550] [<ffffffff815d58c5>] skb_release_data+0x55/0x60
[135589.468733] [<ffffffff815d58ee>] __kfree_skb+0x1e/0x30
[135589.479607] [<ffffffff815d5945>] kfree_skb+0x45/0xc0
[135589.490165] [<ffffffff814f354b>] tun_get_user+0x61b/0x640
[135589.500473] [<ffffffff814f35c4>] tun_sendmsg+0x54/0x80
[135589.511390] [<ffffffffa055fca7>] handle_tx+0x307/0x5e0 [vhost_net]
[135589.521563] [<ffffffffa055ffb5>] handle_tx_kick+0x15/0x20 [vhost_net]
[135589.531300] [<ffffffffa055ce9d>] vhost_worker+0xfd/0x1a0 [vhost_net]
[135589.541095] [<ffffffffa055cda0>] ? vhost_set_memory+0x130/0x130 [vhost_net]
[135589.550581] [<ffffffff8107f1b0>] kthread+0xc0/0xd0
[135589.559892] [<ffffffff8107f0f0>] ? flush_kthread_worker+0xb0/0xb0
[135589.569654] [<ffffffff816fc82c>] ret_from_fork+0x7c/0xb0
[135589.578669] [<ffffffff8107f0f0>] ? flush_kthread_worker+0xb0/0xb0
[135589.587694] Code: fc 00 00 00 00 e8 ac fe ff ff 48 63 45 fc 65 48 01 04 25 58 08 01 00 c9 c3 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 66 66 66 66 90 <48> f7 07 00 c0 00 00 55 48 89 e5 75 15 f0 ff 4f 1c 0f 94 c0 84
[135589.614638] RIP [<ffffffff81141f25>] put_page+0x5/0x40
[135589.623362] RSP <ffff8806f2e75bf0>
[135589.631844] CR2: 0000000000000000
[135589.652617] ---[ end trace c53738dbfbdc0bdf ]---

The issue is caused by an experimental zero-copy feature introduced in the Ubuntu 3.8.0-29 kernel, and it was fixed in the Ubuntu 3.8.0-31 kernel. The kernel.org commits for the fix are:

https://github.com/torvalds/linux/commit/885291761dba2bfe04df4c0f7bb75e4c920ab82e
https://github.com/torvalds/linux/commit/3dd5c3308e8b671e8e8882ba972f51cefbe9fd0d
https://github.com/torvalds/linux/commit/61d46bf979d5cd7c164709a80ad5676a35494aae
https://github.com/torvalds/linux/commit/ece793fcfc417b3925844be88a6a6dc82ae8f7c6
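To check whether a given compute node falls in the affected range described above, a small helper can compare the running kernel's release string (as printed by `uname -r`) against the versions named in this report. This is a minimal sketch and not part of the original report: it assumes Ubuntu release strings of the form `3.8.0-29-generic`, and it assumes every 3.8.0 build from ABI 29 up to, but not including, ABI 31 is affected. The names `parse_release` and `is_affected` are hypothetical.

```python
import re

# Assumed affected range, taken from this report: the problem appeared in
# 3.8.0-29 and was fixed in 3.8.0-31. Treating 3.8.0-30 as affected is an
# assumption; the report does not mention it explicitly.
FIRST_AFFECTED = (3, 8, 0, 29)
FIRST_FIXED = (3, 8, 0, 31)

# Ubuntu release strings look like "3.8.0-29-generic":
# upstream version (3.8.0), ABI number (29), flavour (generic).
_RELEASE_RE = re.compile(r"(\d+)\.(\d+)\.(\d+)-(\d+)")


def parse_release(release: str) -> tuple:
    """Return (major, minor, patch, abi) parsed from an Ubuntu kernel release string."""
    m = _RELEASE_RE.match(release)
    if m is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return tuple(int(part) for part in m.groups())


def is_affected(release: str) -> bool:
    """True if the release lies in the assumed range [3.8.0-29, 3.8.0-31)."""
    # Tuples compare element by element, so this is a plain version-range check.
    return FIRST_AFFECTED <= parse_release(release) < FIRST_FIXED
```

On a compute node, `is_affected(platform.release())` returns True for the crashing 3.8.0-29-generic kernel and False for 3.13.0-34-generic, the kernel chosen as the fix. Non-Ubuntu release strings raise `ValueError` rather than being silently classified.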

Jeya ganesh babu J (jjeya) wrote:

The issue is resolved by moving to a kernel version that includes the fix. The Ubuntu kernel version chosen is 3.13.0-34-generic.

Changed in juniperopenstack:
status: New → Fix Released
status: Fix Released → New
information type: Proprietary → Public