Activity log for bug #1497048

Date Who What changed Old value New value Message
2015-09-17 23:39:11 Dave Chiluk bug added bug
2015-09-18 00:00:07 Brad Figg linux (Ubuntu): status New Incomplete
2015-09-18 00:00:09 Brad Figg tags sts sts trusty
2015-09-18 01:56:05 Dave Chiluk description A user has reported to us the following crash stack trace. _________________________________________________________________________________________ [415165.417433] BUG: unable to handle kernel NULL pointer dereference at 00000000000000a3 [415165.417759] IP: [<ffffffffa015e24f>] queue_userspace_packet+0x1f/0x2d0 [openvswitch] [415165.418073] PGD 0 [415165.418161] Oops: 0000 [#1] SMP [415165.418299] Modules linked in: l2tp_eth l2tp_netlink l2tp_core vhost_net vhost macvtap macvlan xt_conntrack ipt_REJECT dccp_diag dccp tcp_diag udp_diag inet_diag unix_diag veth xt_CHECKSUM iptable_mangle ipt_MASQUERADE iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack xt_tcpudp ip6table_filter ip6_tables iptable_filter ip_tables x_tables nbd ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi openvswitch gre vxlan ip_tunnel dm_crypt gpio_ich dm_multipath bridge scsi_dh stp llc intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel joydev kvm shpchp sb_edac ipmi_si edac_core acpi_power_meter lpc_ich mac_hid xfs btrfs xor raid6_pq libcrc32c ses enclosure hid_generic crct10dif_pclmul crc32_pclmul ghash_clmulni_intel [415165.421570] aesni_intel ixgbe igb aes_x86_64 lrw dca gf128mul glue_helper ptp ablk_helper usbhid cryptd megaraid_sas pps_core hid mdio i2c_algo_bit wmi [415165.427942] CPU: 7 PID: 0 Comm: swapper/7 Not tainted 3.13.0-53-generic #89-Ubuntu [415165.440183] Hardware name: Cisco Systems Inc UCSC-C240-M3S/UCSC-C240-M3S, BIOS C240M3.2.0.1a.0.042820140036 04/28/2014 [415165.452693] task: ffff882012d01800 ti: ffff882012cfc000 task.ti: ffff882012cfc000 [415165.465847] RIP: 0010:[<ffffffffa015e24f>] [<ffffffffa015e24f>] queue_userspace_packet+0x1f/0x2d0 [openvswitch] [415165.480003] RSP: 0018:ffff88203fce3b88 EFLAGS: 00010296 [415165.487411] RAX: 0000000000000000 RBX: ffff88203fce3ce8 RCX: ffff88203fce3ce8 [415165.502430] RDX: 
0000000000000000 RSI: 000000000000000e RDI: ffffffff81cdab00 [415165.517448] RBP: ffff88203fce3bc8 R08: 0000000000000001 R09: 0000000000000000 [415165.532701] R10: 0000000000410000 R11: 000000000f9365e3 R12: ffff88203fce3ce8 [415165.548698] R13: 0000000000000000 R14: 0000000000000000 R15: 000000000000000e [415165.564653] FS: 0000000000000000(0000) GS:ffff88203fce0000(0000) knlGS:0000000000000000 [415165.580681] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [415165.588725] CR2: 00000000000000a3 CR3: 0000000001c0e000 CR4: 00000000000427e0 [415165.604495] Stack: [415165.612127] ffffffff81d1ca68 ffff881fbd6c6c00 0000000000000009 0000000000000000 [415165.627360] ffff88203fce3ce8 0000000000000000 000000000000000e 0000000000000000 [415165.642642] ffff88203fce3cb8 ffffffffa015e5a1 0000000000000010 ffffffff81cdab00 [415165.657955] Call Trace: [415165.665405] <IRQ> [415165.665500] [415165.672684] [<ffffffffa015e5a1>] queue_gso_packets+0xa1/0x1f0 [openvswitch] [415165.680015] [<ffffffffa015de7b>] ? ovs_execute_actions+0x2b/0x30 [openvswitch] [415165.694425] [<ffffffffa01607f5>] ovs_dp_upcall+0xe5/0xf0 [openvswitch] [415165.701807] [<ffffffffa016090f>] ovs_dp_process_received_packet+0x10f/0x120 [openvswitch] [415165.716228] [<ffffffffa0166aca>] ovs_vport_receive+0x2a/0x30 [openvswitch] [415165.723591] [<ffffffffa0167391>] netdev_frame_hook+0xc1/0x120 [openvswitch] [415165.730799] [<ffffffff81626892>] __netif_receive_skb_core+0x262/0x840 [415165.737909] [<ffffffff81626e88>] __netif_receive_skb+0x18/0x60 [415165.744824] [<ffffffff81627a1e>] process_backlog+0xae/0x1a0 [415165.751644] [<ffffffff81627272>] net_rx_action+0x152/0x250 [415165.758248] [<ffffffff8106cc6c>] __do_softirq+0xec/0x2c0 [415165.764694] [<ffffffff8106d1b5>] irq_exit+0x105/0x110 [415165.770968] [<ffffffff81735c26>] do_IRQ+0x56/0xc0 [415165.777058] [<ffffffff8172b32d>] common_interrupt+0x6d/0x6d [415165.783041] <EOI> [415165.783127] [415165.788840] [<ffffffff815d523f>] ? 
cpuidle_enter_state+0x4f/0xc0 [415165.794659] [<ffffffff815d5369>] cpuidle_idle_call+0xb9/0x1f0 [415165.800468] [<ffffffff8101d34e>] arch_cpu_idle+0xe/0x30 [415165.806126] [<ffffffff810bf0a5>] cpu_startup_entry+0xc5/0x290 [415165.811862] [<ffffffff810414dd>] start_secondary+0x21d/0x2d0 [415165.817479] Code: 32 74 04 48 89 71 08 5b 5d c3 66 90 66 66 66 66 90 55 48 89 e5 41 57 41 89 f7 41 56 49 89 d6 41 55 41 54 53 48 89 cb 48 83 ec 18 <f6> 82 a3 00 00 00 10 48 89 7d c8 48 c7 45 d0 00 00 00 00 0f 85 [415165.834611] RIP [<ffffffffa015e24f>] queue_userspace_packet+0x1f/0x2d0 [openvswitch] [415165.845643] RSP <ffff88203fce3b88> [415165.851171] CR2: 00000000000000a3 _________________________________________________________________________________________ After analysis we provided a 3.13 kernel patched with commits 1e16aa3ddf863c6b9f37eddf52503230a62dedb3 and 330966e501ffe282d7184fde4518d5e0c24bc7f8. As a result the fairly consistent crash is no longer occurring. We attempted to push the patches through the stable process here http://marc.info/?l=linux-netdev&m=143631594021618&w=2 and again http://marc.info/?l=linux-netdev&m=143951671004053&w=2 Unfortunately, upstream stable has yet to accept these patches. [Impact] * With certain complicated network configurations, as occur in OpenStack clouds, the kernel crashes with the stack trace below. * We have observed kernel panics when an openvswitch bridge is populated with virtual devices (veth, for example) that have expansive feature sets that include NETIF_F_GSO_GRE. The failure occurs when foreign GRE-encapsulated traffic (explicitly not including the initial packets of a connection) arrives at the system (likely via a switch flood event). The packets are GRO-accumulated and passed to the OVS receive processing. 
As the connection is not in the OVS kernel datapath table, the call path is: ovs_dp_upcall -> queue_gso_packets -> __skb_gso_segment(skb, NETIF_F_SG, false) Without 1e16aa3ddf863c6b9f37eddf52503230a62dedb3, __skb_gso_segment returns NULL, as the features from the device (including _GSO_GRE) are used in place of the _SG feature supplied to the call. The kernel panics on a subsequent dereference of the NULL pointer in queue_userspace_packet(). [Test Case] * We have no easy procedure to reproduce the crash. [Regression Potential] * Both patches are pulled from upstream, but have been neither accepted nor rejected as stable patches. Stable threads http://marc.info/?l=linux-netdev&m=143631594021618&w=2 http://marc.info/?l=linux-netdev&m=143951671004053&w=2 * This patch has been in place for 50 days, without related incident, in a large cloud where the issue used to occur frequently. [Other Info] * 330966e501ffe282d7184fde4518d5e0c24bc7f8 is included as well, as it avoids possible NULL dereferences in similar areas of code. As such we'd like to see both patches included. 
________________________________________________________________________[415165.417433] BUG: unable to handle kernel NULL pointer dereference at 00000000000000a3 [415165.417759] IP: [<ffffffffa015e24f>] queue_userspace_packet+0x1f/0x2d0 [openvswitch] [415165.418073] PGD 0 [415165.418161] Oops: 0000 [#1] SMP [415165.418299] Modules linked in: l2tp_eth l2tp_netlink l2tp_core vhost_net vhost macvtap macvlan xt_conntrack ipt_REJECT dccp_diag dccp tcp_diag udp_diag inet_diag unix_diag veth xt_CHECKSUM iptable_mangle ipt_MASQUERADE iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack xt_tcpudp ip6table_filter ip6_tables iptable_filter ip_tables x_tables nbd ib_iser rdma_cm iw_cm ib_cm ib_sa ib_mad ib_core ib_addr iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi openvswitch gre vxlan ip_tunnel dm_crypt gpio_ich dm_multipath bridge scsi_dh stp llc intel_rapl x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel joydev kvm shpchp sb_edac ipmi_si edac_core acpi_power_meter lpc_ich mac_hid xfs btrfs xor raid6_pq libcrc32c ses enclosure hid_generic crct10dif_pclmul crc32_pclmul ghash_clmulni_intel [415165.421570] aesni_intel ixgbe igb aes_x86_64 lrw dca gf128mul glue_helper ptp ablk_helper usbhid cryptd megaraid_sas pps_core hid mdio i2c_algo_bit wmi [415165.427942] CPU: 7 PID: 0 Comm: swapper/7 Not tainted 3.13.0-53-generic #89-Ubuntu [415165.440183] Hardware name: Cisco Systems Inc UCSC-C240-M3S/UCSC-C240-M3S, BIOS C240M3.2.0.1a.0.042820140036 04/28/2014 [415165.452693] task: ffff882012d01800 ti: ffff882012cfc000 task.ti: ffff882012cfc000 [415165.465847] RIP: 0010:[<ffffffffa015e24f>] [<ffffffffa015e24f>] queue_userspace_packet+0x1f/0x2d0 [openvswitch] [415165.480003] RSP: 0018:ffff88203fce3b88 EFLAGS: 00010296 [415165.487411] RAX: 0000000000000000 RBX: ffff88203fce3ce8 RCX: ffff88203fce3ce8 [415165.502430] RDX: 0000000000000000 RSI: 000000000000000e RDI: ffffffff81cdab00 [415165.517448] RBP: ffff88203fce3bc8 R08: 0000000000000001 R09: 
0000000000000000 [415165.532701] R10: 0000000000410000 R11: 000000000f9365e3 R12: ffff88203fce3ce8 [415165.548698] R13: 0000000000000000 R14: 0000000000000000 R15: 000000000000000e [415165.564653] FS: 0000000000000000(0000) GS:ffff88203fce0000(0000) knlGS:0000000000000000 [415165.580681] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [415165.588725] CR2: 00000000000000a3 CR3: 0000000001c0e000 CR4: 00000000000427e0 [415165.604495] Stack: [415165.612127] ffffffff81d1ca68 ffff881fbd6c6c00 0000000000000009 0000000000000000 [415165.627360] ffff88203fce3ce8 0000000000000000 000000000000000e 0000000000000000 [415165.642642] ffff88203fce3cb8 ffffffffa015e5a1 0000000000000010 ffffffff81cdab00 [415165.657955] Call Trace: [415165.665405] <IRQ> [415165.665500] [415165.672684] [<ffffffffa015e5a1>] queue_gso_packets+0xa1/0x1f0 [openvswitch] [415165.680015] [<ffffffffa015de7b>] ? ovs_execute_actions+0x2b/0x30 [openvswitch] [415165.694425] [<ffffffffa01607f5>] ovs_dp_upcall+0xe5/0xf0 [openvswitch] [415165.701807] [<ffffffffa016090f>] ovs_dp_process_received_packet+0x10f/0x120 [openvswitch] [415165.716228] [<ffffffffa0166aca>] ovs_vport_receive+0x2a/0x30 [openvswitch] [415165.723591] [<ffffffffa0167391>] netdev_frame_hook+0xc1/0x120 [openvswitch] [415165.730799] [<ffffffff81626892>] __netif_receive_skb_core+0x262/0x840 [415165.737909] [<ffffffff81626e88>] __netif_receive_skb+0x18/0x60 [415165.744824] [<ffffffff81627a1e>] process_backlog+0xae/0x1a0 [415165.751644] [<ffffffff81627272>] net_rx_action+0x152/0x250 [415165.758248] [<ffffffff8106cc6c>] __do_softirq+0xec/0x2c0 [415165.764694] [<ffffffff8106d1b5>] irq_exit+0x105/0x110 [415165.770968] [<ffffffff81735c26>] do_IRQ+0x56/0xc0 [415165.777058] [<ffffffff8172b32d>] common_interrupt+0x6d/0x6d [415165.783041] <EOI> [415165.783127] [415165.788840] [<ffffffff815d523f>] ? 
cpuidle_enter_state+0x4f/0xc0 [415165.794659] [<ffffffff815d5369>] cpuidle_idle_call+0xb9/0x1f0 [415165.800468] [<ffffffff8101d34e>] arch_cpu_idle+0xe/0x30 [415165.806126] [<ffffffff810bf0a5>] cpu_startup_entry+0xc5/0x290 [415165.811862] [<ffffffff810414dd>] start_secondary+0x21d/0x2d0 [415165.817479] Code: 32 74 04 48 89 71 08 5b 5d c3 66 90 66 66 66 66 90 55 48 89 e5 41 57 41 89 f7 41 56 49 89 d6 41 55 41 54 53 48 89 cb 48 83 ec 18 <f6> 82 a3 00 00 00 10 48 89 7d c8 48 c7 45 d0 00 00 00 00 0f 85 [415165.834611] RIP [<ffffffffa015e24f>] queue_userspace_packet+0x1f/0x2d0 [openvswitch] [415165.845643] RSP <ffff88203fce3b88> [415165.851171] CR2: 00000000000000a3 _________________________________________________________________________________________ After analysis we provided a 3.13 kernel patched with commits 1e16aa3ddf863c6b9f37eddf52503230a62dedb3 and 330966e501ffe282d7184fde4518d5e0c24bc7f8. As a result the fairly consistent crash is no longer occurring. We attempted to push the patches through the stable process here http://marc.info/?l=linux-netdev&m=143631594021618&w=2 and again http://marc.info/?l=linux-netdev&m=143951671004053&w=2 Unfortunately, upstream stable has yet to accept these patches.
2015-09-22 16:33:59 Dave Chiluk linux (Ubuntu): status Incomplete In Progress
2015-09-22 16:34:12 Dave Chiluk nominated for series Ubuntu Trusty
2015-09-22 16:34:21 Dave Chiluk linux (Ubuntu): importance Undecided Medium
2015-09-23 14:52:18 Luis Henriques bug task added linux (Ubuntu Trusty)
2015-09-23 14:52:26 Luis Henriques linux (Ubuntu): status In Progress Invalid
2015-09-23 15:05:34 Luis Henriques linux (Ubuntu Trusty): status New Fix Committed
2015-09-23 15:06:10 Dave Chiluk linux (Ubuntu Trusty): assignee Dave Chiluk (chiluk)
2015-10-08 14:39:47 Luis Henriques tags sts trusty sts trusty verification-needed-trusty
2015-10-13 15:15:18 Dave Chiluk tags sts trusty verification-needed-trusty sts trusty verification-done-trusty
2015-10-15 09:32:51 Mathew Hodson linux (Ubuntu Trusty): milestone trusty-updates
2015-10-15 09:32:58 Mathew Hodson linux (Ubuntu): milestone trusty-updates
2015-10-15 09:33:06 Mathew Hodson linux (Ubuntu Trusty): importance Undecided Medium
2015-10-19 16:03:09 Launchpad Janitor linux (Ubuntu Trusty): status Fix Committed Fix Released
2015-10-19 16:03:09 Launchpad Janitor cve linked 2015-7312
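
Illustrative note on the failure described in the 2015-09-18 description change: the sketch below is a toy Python model, not kernel code, of why segmentation can come back empty when the device's feature set is consulted instead of the feature mask passed by the caller. All names, the flag values, and the `use_dev_features` switch are hypothetical and exist only for illustration. The explicit None check in `queue_gso_packets` is an assumed stand-in for the kind of defensive handling the description attributes to the second patch.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical flag values; the real kernel constants differ.
NETIF_F_SG = 1 << 0
NETIF_F_GSO_GRE = 1 << 1


@dataclass
class Packet:
    payload: bytes
    gso_type: int      # offload feature required to pass the packet unsegmented
    dev_features: int  # features advertised by the receiving device
    mss: int           # segment size used when splitting


def gso_segment(pkt: Packet, features: int,
                use_dev_features: bool) -> Optional[List[bytes]]:
    """Return segments, or None when the feature mask says none are needed."""
    # Pre-fix behaviour (modelled): the device's features override the argument.
    effective = pkt.dev_features if use_dev_features else features
    if pkt.gso_type & effective:
        return None
    return [pkt.payload[i:i + pkt.mss]
            for i in range(0, len(pkt.payload), pkt.mss)]


def queue_gso_packets(pkt: Packet, use_dev_features: bool) -> List[bytes]:
    """Caller that asks for software segmentation by passing only NETIF_F_SG."""
    segs = gso_segment(pkt, NETIF_F_SG, use_dev_features)
    if segs is None:
        # Without a check like this, the caller would go on to use a
        # missing segment list, the analogue of the NULL dereference.
        raise ValueError("segmentation returned no segments")
    return segs


pkt = Packet(b"x" * 10, NETIF_F_GSO_GRE, NETIF_F_SG | NETIF_F_GSO_GRE, 4)
```

With `use_dev_features=True` the model returns None because the device advertises the GRE offload flag; with `use_dev_features=False` the caller's mask is honoured and the payload is split into segments.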