kernel BUG at skbuff.h:1486 Insufficient linear data in skb __skb_pull.part.7+0x4/0x6 [openvswitch]

Bug #1655683 reported by Andrew Crawford
This bug affects 2 people
Affects             Status         Importance   Assigned to       Milestone
linux (Ubuntu)      Fix Released   High         Unassigned
  Trusty            Fix Released   High         Andrew Crawford

Bug Description

Since 2016-12-30 EST we have been experiencing repeated crashes of our OpenStack Icehouse / Trusty Neutron node with a kernel BUG at skbuff.h line 1486:

1471 /**
1472  * skb_peek - peek at the head of an &sk_buff_head
1473  * @list_: list to peek at
1474  *
1475  * Peek an &sk_buff. Unlike most other operations you _MUST_
1476  * be careful with this one. A peek leaves the buffer on the
1477  * list and someone else may run off with it. You must hold
1478  * the appropriate locks or have a private queue to do this.
1479  *
1480  * Returns %NULL for an empty list or a pointer to the head element.
1481  * The reference count is not incremented and the reference is therefore
1482  * volatile. Use with caution.
1483  */
1484 static inline struct sk_buff *skb_peek(const struct sk_buff_head *list_)
1485 {
1486     struct sk_buff *skb = list_->next;
1487 
1488     if (skb == (struct sk_buff *)list_)
1489         skb = NULL;
1490     return skb;
1491 }
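Note that skb_peek contains no BUG_ON, while the frame that actually faults in the traces below is __skb_pull (as __skb_pull.part.7, which appears to be a compiler-generated out-of-line fragment holding its BUG path). The excerpt above was therefore likely taken from a different revision of skbuff.h than the one the 3.13.0 kernel was built from. In 3.13-era headers, __skb_pull subtracts the pull length from skb->len and then does BUG_ON(skb->len < skb->data_len), so it BUGs when a caller pulls more bytes than the linear (non-paged) part of the skb holds, which matches the "Insufficient linear data in skb" summary. The following is a minimal userspace model of that check, not kernel code. The names model_skb, model_headlen and model_skb_pull are hypothetical, and only the len/data_len bookkeeping is modeled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct sk_buff. In the kernel,
 * len counts all bytes and data_len counts the bytes held in paged
 * fragments, so the linear (header) part is len - data_len. */
struct model_skb {
	unsigned int len;
	unsigned int data_len;
	size_t data_off;	/* offset of skb->data, instead of a pointer */
};

/* Bytes available in the linear area, like skb_headlen(). */
static unsigned int model_headlen(const struct model_skb *skb)
{
	return skb->len - skb->data_len;
}

/* Mirrors the 3.13-era __skb_pull() logic:
 *     skb->len -= len;
 *     BUG_ON(skb->len < skb->data_len);
 *     return skb->data += len;
 * Returns -1 (leaving the model unchanged) where the kernel would hit
 * BUG(), which x86 reports as "invalid opcode". */
static int model_skb_pull(struct model_skb *skb, unsigned int pull)
{
	/* This model only covers pulls no longer than the whole packet. */
	assert(pull <= skb->len);

	unsigned int new_len = skb->len - pull;

	if (new_len < skb->data_len)
		return -1;	/* kernel: BUG(), i.e. a ud2 instruction */
	skb->len = new_len;
	skb->data_off += pull;
	return 0;
}
```

For example, with len = 100 and data_len = 60 only 40 bytes are linear: pulling a 14-byte Ethernet header succeeds and leaves 26 linear bytes, but a following pull of 30 bytes would trip the BUG_ON.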

This generally results in a full kernel panic on the Neutron node, which breaks connectivity for VMs within the cloud. However, since we started using crash dump tools to collect information on the crashes over the past three days, in about 2 out of 3 crashes the kernel loaded by kexec for the crash dump appears to keep running, and we see a flap of the Neutron services instead of a full panic that takes the Neutron server down and requires a hard reboot.

I believe that this is a manifestation of the openvswitch issue described on 2017-01-08 as:

"OVS can only process L2 packets. But OVS GRE receive handler
can accept IP-GRE packets. When such packet is processed by
OVS datapath it can trigger following assert failure due
to insufficient linear data in skb."

https://patchwork.ozlabs.org/patch/712373/

I have not tested the patch provided above yet.
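The quoted description hinges on the GRE protocol type field. In a GRE header (RFC 2784) the first two bytes hold the flags and version, and the next two hold the protocol type of the payload: 0x6558 (Transparent Ethernet Bridging) means the payload is an Ethernet frame that an L2 switch such as OVS can process, while 0x0800 means a bare IPv4 packet with no Ethernet header. The sketch below is a userspace illustration of that filtering idea only, not the code of the patch linked above; gre_proto_type and l2_switch_accepts are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>

#define GRE_PROTO_TEB  0x6558	/* Transparent Ethernet Bridging: L2 frame */
#define GRE_PROTO_IPV4 0x0800	/* plain IPv4 payload, no Ethernet header */

/* Returns the 16-bit protocol type from the start of a GRE header
 * (bytes 2-3, network byte order), or -1 if the buffer is too short. */
static int gre_proto_type(const uint8_t *gre, size_t len)
{
	if (len < 4)
		return -1;
	return (gre[2] << 8) | gre[3];
}

/* Hypothetical filter in the spirit of the fix: only hand the packet to
 * an L2 datapath when GRE says the payload is an Ethernet frame. */
static int l2_switch_accepts(const uint8_t *gre, size_t len)
{
	return gre_proto_type(gre, len) == GRE_PROTO_TEB;
}
```

Dropping non-TEB packets at this point means an IP-GRE packet never reaches flow extraction, which would otherwise try to parse its IP header as an Ethernet header.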

Other information, and a few sample dmesg outputs from the crashes (multiple dumps are available):

# lsb_release -rd
Description: Ubuntu 14.04.5 LTS
Release: 14.04

# apt-cache policy openvswitch
N: Unable to locate package openvswitch
# apt-cache policy openvswitch-common
openvswitch-common:
  Installed: 2.0.2-0ubuntu0.14.04.3
  Candidate: 2.0.2-0ubuntu0.14.04.3
  Version table:
 *** 2.0.2-0ubuntu0.14.04.3 0
        500 http://us.archive.ubuntu.com/ubuntu/ trusty-updates/main amd64 Packages
        100 /var/lib/dpkg/status
     2.0.1+git20140120-0ubuntu2 0
        500 http://us.archive.ubuntu.com/ubuntu/ trusty/main amd64 Packages

# apt-cache policy openvswitch-switch
openvswitch-switch:
  Installed: 2.0.2-0ubuntu0.14.04.3
  Candidate: 2.0.2-0ubuntu0.14.04.3
  Version table:
 *** 2.0.2-0ubuntu0.14.04.3 0
        500 http://us.archive.ubuntu.com/ubuntu/ trusty-updates/main amd64 Packages
        100 /var/lib/dpkg/status
     2.0.1+git20140120-0ubuntu2 0
        500 http://us.archive.ubuntu.com/ubuntu/ trusty/main amd64 Packages

# apt-cache policy neutron-plugin-openvswitch-agent
neutron-plugin-openvswitch-agent:
  Installed: 1:2014.1.5-0ubuntu7
  Candidate: 1:2014.1.5-0ubuntu7
  Version table:
 *** 1:2014.1.5-0ubuntu7 0
        500 http://us.archive.ubuntu.com/ubuntu/ trusty-updates/main amd64 Packages
        100 /var/lib/dpkg/status
     1:2014.1.3-0ubuntu1.1 0
        500 http://security.ubuntu.com/ubuntu/ trusty-security/main amd64 Packages
     1:2014.1-0ubuntu1 0
        500 http://us.archive.ubuntu.com/ubuntu/ trusty/main amd64 Packages

Example dmesg output:

############## dmesg.201701060019

[33100.131019] ------------[ cut here ]------------
[33100.131176] kernel BUG at /build/linux-mi9H1O/linux-3.13.0/include/linux/skbuff.h:1486!
[33100.131424] invalid opcode: 0000 [#1] SMP
[33100.131560] Modules linked in: xt_nat xt_conntrack ip6table_filter ip6_tables iptable_filter xt_REDIRECT xt_tcpudp iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ip_tables x_tables veth openvswitch gre vxlan ip_tunnel libcrc32c ipmi_devintf gpio_ich cdc_ether x86_pkg_temp_thermal intel_powerclamp coretemp usbnet kvm_intel mii kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd sb_edac edac_core lpc_ich wmi ipmi_si bonding shpchp ioatdma lp mac_hid parport ahci libahci sfc igb e1000e mtd dca i2c_algo_bit ptp pps_core megaraid_sas mdio
[33100.133560] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 3.13.0-106-generic #153-Ubuntu
[33100.133800] Hardware name: IBM System x3650 M4 : -[7915AC1]-/00Y8473, BIOS -[VVE136AUS-1.60]- 12/12/2013
[33100.134096] task: ffff880469da4800 ti: ffff880469dae000 task.ti: ffff880469dae000
[33100.134325] RIP: 0010:[<ffffffffa02321c9>] [<ffffffffa02321c9>] __skb_pull.part.7+0x4/0x6 [openvswitch]
[33100.134628] RSP: 0018:ffff88046fd03bb0 EFLAGS: 00010297
[33100.134792] RAX: ffff880035d73866 RBX: ffff880461efb600 RCX: ffff880035d73800
[33100.135011] RDX: 0000000000000210 RSI: 0000000000000214 RDI: ffff88046fd03c98
[33100.135231] RBP: ffff88046fd03bb0 R08: 0000000000000000 R09: ffff880035d73800
[33100.135451] R10: ffff880461efb600 R11: 0000000000000000 R12: ffff88046fd03c18
[33100.135671] R13: ffff880866a88a80 R14: ffff88046fd03c18 R15: ffff880461e49480
[33100.141118] FS: 0000000000000000(0000) GS:ffff88046fd00000(0000) knlGS:0000000000000000
[33100.152198] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[33100.157796] CR2: 00007fc30157d090 CR3: 0000000001c0e000 CR4: 00000000001407e0
[33100.163382] Stack:
[33100.168800] ffff88046fd03be0 ffffffffa022bbc5 ffffffff81cdaf00 ffff880461efb600
[33100.179942] ffffe8fbefd04890 ffff880866a88a80 ffff88046fd03cc8 ffffffffa022a8c5
[33100.191068] ffffffff81cdaf00 0000000000000001 ffff880866cb70c4 ffff8804541b6180
[33100.202184] Call Trace:
[33100.207553] <IRQ>
[33100.207617]
[33100.212849] [<ffffffffa022bbc5>] ovs_flow_extract+0x935/0xb30 [openvswitch]
[33100.218139] [<ffffffffa022a8c5>] ovs_dp_process_received_packet+0x55/0x120 [openvswitch]
[33100.228464] [<ffffffffa0230b5a>] ovs_vport_receive+0x2a/0x30 [openvswitch]
[33100.233727] [<ffffffffa0231ba3>] gre_rcv+0xa3/0xc0 [openvswitch]
[33100.238898] [<ffffffffa0222745>] gre_cisco_rcv+0x65/0xba [gre]
[33100.243974] [<ffffffffa02222cd>] gre_rcv+0x5d/0x80 [gre]
[33100.248938] [<ffffffff81666358>] ip_local_deliver_finish+0xa8/0x210
[33100.253823] [<ffffffff81666658>] ip_local_deliver+0x48/0x80
[33100.258547] [<ffffffff81665fdd>] ip_rcv_finish+0x7d/0x350
[33100.263138] [<ffffffff81666928>] ip_rcv+0x298/0x3d0
[33100.267636] [<ffffffff8162f566>] __netif_receive_skb_core+0x696/0x870
[33100.272134] [<ffffffff8162f758>] __netif_receive_skb+0x18/0x60
[33100.276544] [<ffffffff8163030e>] process_backlog+0xae/0x1a0
[33100.280999] [<ffffffff8162fb3a>] net_rx_action+0x14a/0x270
[33100.285447] [<ffffffff8106fd8c>] __do_softirq+0xfc/0x310
[33100.289886] [<ffffffff81070315>] irq_exit+0x105/0x110
[33100.294224] [<ffffffff81740066>] do_IRQ+0x56/0xc0
[33100.298433] [<ffffffff817356ed>] common_interrupt+0x6d/0x6d
[33100.302613] <EOI>
[33100.302676]
[33100.306717] [<ffffffff815dc982>] ? cpuidle_enter_state+0x52/0xc0
[33100.310816] [<ffffffff815dc978>] ? cpuidle_enter_state+0x48/0xc0
[33100.314828] [<ffffffff815dcacc>] cpuidle_idle_call+0xdc/0x220
[33100.318732] [<ffffffff8101e44e>] arch_cpu_idle+0xe/0x30
[33100.322479] [<ffffffff810c2b31>] cpu_startup_entry+0xc1/0x2b0
[33100.326138] [<ffffffff810427cd>] start_secondary+0x21d/0x2d0
[33100.329686] Code: a0 e8 8c 86 e3 e0 c6 05 5d 31 00 00 01 eb 11 48 89 d0 8b 16 31 f6 48 8b 38 e8 a4 70 42 e1 eb 05 b8 ea ff ff ff 5d c3 55 48 89 e5 <0f> 0b 0f 1f 44 00 00 55 48 89 e5 0f 0b 55 48 89 e5 0f 0b 00 00
[33100.340962] RIP [<ffffffffa02321c9>] __skb_pull.part.7+0x4/0x6 [openvswitch]
[33100.344857] RSP <ffff88046fd03bb0>

############## dmesg.201701080127

[ 911.714512] ------------[ cut here ]------------
[ 911.714670] kernel BUG at /build/linux-mi9H1O/linux-3.13.0/include/linux/skbuff.h:1486!
[ 911.714917] invalid opcode: 0000 [#1] SMP
[ 911.715053] Modules linked in: xt_nat xt_conntrack xt_REDIRECT xt_tcpudp ip6table_filter ip6_tables iptable_filter iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ip_tables x_tables veth openvswitch gre vxlan ip_tunnel libcrc32c x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel ipmi_devintf gpio_ich kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel cdc_ether aesni_intel aes_x86_64 lrw gf128mul glue_helper usbnet ablk_helper cryptd sb_edac mii edac_core lpc_ich bonding mac_hid ipmi_si shpchp wmi lp ioatdma parport ahci sfc libahci igb e1000e mtd dca i2c_algo_bit ptp pps_core megaraid_sas mdio
[ 911.717060] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.13.0-106-generic #153-Ubuntu
[ 911.717301] Hardware name: IBM System x3650 M4 : -[7915AC1]-/00Y8473, BIOS -[VVE136AUS-1.60]- 12/12/2013
[ 911.717597] task: ffffffff81c15480 ti: ffffffff81c00000 task.ti: ffffffff81c00000
[ 911.717827] RIP: 0010:[<ffffffffa01c61c9>] [<ffffffffa01c61c9>] __skb_pull.part.7+0x4/0x6 [openvswitch]
[ 911.718128] RSP: 0018:ffff88046fc03bb0 EFLAGS: 00010297
[ 911.718291] RAX: ffff880079de52e6 RBX: ffff880463335000 RCX: ffff880079de5280
[ 911.718511] RDX: 0000000000000210 RSI: 0000000000000214 RDI: ffff88046fc03c98
[ 911.718731] RBP: ffff88046fc03bb0 R08: 0000000000000000 R09: ffff880079de5280
[ 911.718951] R10: ffff880463335000 R11: 0000000000000000 R12: ffff88046fc03c18
[ 911.719171] R13: ffff880468b60c00 R14: ffff88046fc03c18 R15: ffff8804631a0b40
[ 911.724614] FS: 0000000000000000(0000) GS:ffff88046fc00000(0000) knlGS:0000000000000000
[ 911.735614] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 911.741214] CR2: 00007f1898042d70 CR3: 0000000001c0e000 CR4: 00000000001407f0
[ 911.746800] Stack:
[ 911.752201] ffff88046fc03be0 ffffffffa01bfbc5 ffffffff81cdaf00 ffff880463335000
[ 911.763305] ffffe8fbefc04890 ffff880468b60c00 ffff88046fc03cc8 ffffffffa01be8c5
[ 911.774433] ffffffff81cdaf00 0000000000000001 ffff8804675cf9c4 ffff88045941d380
[ 911.785550] Call Trace:
[ 911.790915] <IRQ>
[ 911.790979]
[ 911.796163] [<ffffffffa01bfbc5>] ovs_flow_extract+0x935/0xb30 [openvswitch]
[ 911.801437] [<ffffffffa01be8c5>] ovs_dp_process_received_packet+0x55/0x120 [openvswitch]
[ 911.811769] [<ffffffffa01c4b5a>] ovs_vport_receive+0x2a/0x30 [openvswitch]
[ 911.817038] [<ffffffffa01c5ba3>] gre_rcv+0xa3/0xc0 [openvswitch]
[ 911.822211] [<ffffffffa01b6745>] gre_cisco_rcv+0x65/0xba [gre]
[ 911.827280] [<ffffffffa01b62cd>] gre_rcv+0x5d/0x80 [gre]
[ 911.832213] [<ffffffff81666358>] ip_local_deliver_finish+0xa8/0x210
[ 911.837094] [<ffffffff81666658>] ip_local_deliver+0x48/0x80
[ 911.841810] [<ffffffff81665fdd>] ip_rcv_finish+0x7d/0x350
[ 911.846397] [<ffffffff81666928>] ip_rcv+0x298/0x3d0
[ 911.850889] [<ffffffff8162f566>] __netif_receive_skb_core+0x696/0x870
[ 911.855384] [<ffffffff8162f758>] __netif_receive_skb+0x18/0x60
[ 911.859796] [<ffffffff8163030e>] process_backlog+0xae/0x1a0
[ 911.864208] [<ffffffff8162fb3a>] net_rx_action+0x14a/0x270
[ 911.868654] [<ffffffff8106fd8c>] __do_softirq+0xfc/0x310
[ 911.873093] [<ffffffff81070315>] irq_exit+0x105/0x110
[ 911.877442] [<ffffffff81740066>] do_IRQ+0x56/0xc0
[ 911.881654] [<ffffffff817356ed>] common_interrupt+0x6d/0x6d
[ 911.885832] <EOI>
[ 911.885896]
[ 911.889937] [<ffffffff815dc982>] ? cpuidle_enter_state+0x52/0xc0
[ 911.894036] [<ffffffff815dc978>] ? cpuidle_enter_state+0x48/0xc0
[ 911.898017] [<ffffffff815dcacc>] cpuidle_idle_call+0xdc/0x220
[ 911.901888] [<ffffffff8101e44e>] arch_cpu_idle+0xe/0x30
[ 911.905643] [<ffffffff810c2b31>] cpu_startup_entry+0xc1/0x2b0
[ 911.909308] [<ffffffff8171b2e7>] rest_init+0x77/0x80
[ 911.912842] [<ffffffff81d34f6a>] start_kernel+0x432/0x43d
[ 911.916281] [<ffffffff81d34941>] ? repair_env_string+0x5c/0x5c
[ 911.919767] [<ffffffff81d34120>] ? early_idt_handler_array+0x120/0x120
[ 911.923347] [<ffffffff81d345ee>] x86_64_start_reservations+0x2a/0x2c
[ 911.926859] [<ffffffff81d34733>] x86_64_start_kernel+0x143/0x152
[ 911.930305] Code: a0 e8 8c 46 ea e0 c6 05 5d 31 00 00 01 eb 11 48 89 d0 8b 16 31 f6 48 8b 38 e8 a4 30 49 e1 eb 05 b8 ea ff ff ff 5d c3 55 48 89 e5 <0f> 0b 0f 1f 44 00 00 55 48 89 e5 0f 0b 55 48 89 e5 0f 0b 00 00
[ 911.940880] RIP [<ffffffffa01c61c9>] __skb_pull.part.7+0x4/0x6 [openvswitch]
[ 911.944483] RSP <ffff88046fc03bb0>

############## dmesg.201701071542

[23738.192626] ------------[ cut here ]------------
[23738.192782] kernel BUG at /build/linux-mi9H1O/linux-3.13.0/include/linux/skbuff.h:1486!
[23738.193031] invalid opcode: 0000 [#1] SMP
[23738.193167] Modules linked in: xt_nat xt_conntrack ip6table_filter ip6_tables iptable_filter xt_REDIRECT xt_tcpudp iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ip_tables x_tables veth openvswitch gre vxlan ip_tunnel libcrc32c ipmi_devintf gpio_ich x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul cdc_ether crc32_pclmul usbnet mii ghash_clmulni_intel aesni_intel aes_x86_64 lrw lpc_ich sb_edac gf128mul glue_helper ablk_helper cryptd edac_core bonding wmi ipmi_si mac_hid shpchp lp ioatdma parport ahci libahci igb dca sfc e1000e mtd i2c_algo_bit ptp pps_core megaraid_sas mdio
[23738.195169] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 3.13.0-106-generic #153-Ubuntu
[23738.195410] Hardware name: IBM System x3650 M4 : -[7915AC1]-/00Y8473, BIOS -[VVE136AUS-1.60]- 12/12/2013
[23738.195706] task: ffff880869959800 ti: ffff880469da4000 task.ti: ffff880469da4000
[23738.195936] RIP: 0010:[<ffffffffa02441c9>] [<ffffffffa02441c9>] __skb_pull.part.7+0x4/0x6 [openvswitch]
[23738.196238] RSP: 0018:ffff88046fd03bb0 EFLAGS: 00010297
[23738.196402] RAX: ffff880453cad7e6 RBX: ffff88045d1e7200 RCX: ffff880453cad780
[23738.196622] RDX: 0000000000000210 RSI: 0000000000000214 RDI: ffff88046fd03c98
[23738.196842] RBP: ffff88046fd03bb0 R08: 0000000000000000 R09: ffff880453cad780
[23738.197062] R10: ffff88045d1e7200 R11: 0000000000000000 R12: ffff88046fd03c18
[23738.197283] R13: ffff880466dbc0c0 R14: ffff88046fd03c18 R15: ffff880462a32f00
[23738.202738] FS: 0000000000000000(0000) GS:ffff88046fd00000(0000) knlGS:0000000000000000
[23738.213771] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[23738.219381] CR2: 00007efcd7eee090 CR3: 0000000001c0e000 CR4: 00000000001407e0
[23738.224978] Stack:
[23738.230390] ffff88046fd03be0 ffffffffa023dbc5 ffffffff81cdaf00 ffff88045d1e7200
[23738.241516] ffffe8fbefd04770 ffff880466dbc0c0 ffff88046fd03cc8 ffffffffa023c8c5
[23738.252668] ffffffff81cdaf00 0000000000000001 ffff880462a54244 ffff88045d1c4100
[23738.263818] Call Trace:
[23738.269200] <IRQ>
[23738.269264]
[23738.274454] [<ffffffffa023dbc5>] ovs_flow_extract+0x935/0xb30 [openvswitch]
[23738.279737] [<ffffffffa023c8c5>] ovs_dp_process_received_packet+0x55/0x120 [openvswitch]
[23738.290071] [<ffffffffa0242b5a>] ovs_vport_receive+0x2a/0x30 [openvswitch]
[23738.295339] [<ffffffffa0243ba3>] gre_rcv+0xa3/0xc0 [openvswitch]
[23738.300513] [<ffffffffa0206745>] gre_cisco_rcv+0x65/0xba [gre]
[23738.305587] [<ffffffffa02062cd>] gre_rcv+0x5d/0x80 [gre]
[23738.310531] [<ffffffff81666358>] ip_local_deliver_finish+0xa8/0x210
[23738.315420] [<ffffffff81666658>] ip_local_deliver+0x48/0x80
[23738.320146] [<ffffffff81665fdd>] ip_rcv_finish+0x7d/0x350
[23738.324743] [<ffffffff81666928>] ip_rcv+0x298/0x3d0
[23738.329244] [<ffffffff8162f566>] __netif_receive_skb_core+0x696/0x870
[23738.333744] [<ffffffff8162f758>] __netif_receive_skb+0x18/0x60
[23738.338158] [<ffffffff8163030e>] process_backlog+0xae/0x1a0
[23738.342576] [<ffffffff8162fb3a>] net_rx_action+0x14a/0x270
[23738.347025] [<ffffffff8106fd8c>] __do_softirq+0xfc/0x310
[23738.351463] [<ffffffff81070315>] irq_exit+0x105/0x110
[23738.355804] [<ffffffff81740066>] do_IRQ+0x56/0xc0
[23738.360010] [<ffffffff817356ed>] common_interrupt+0x6d/0x6d
[23738.364183] <EOI>
[23738.364246]
[23738.368280] [<ffffffff815dc982>] ? cpuidle_enter_state+0x52/0xc0
[23738.372372] [<ffffffff815dc978>] ? cpuidle_enter_state+0x48/0xc0
[23738.376347] [<ffffffff815dcacc>] cpuidle_idle_call+0xdc/0x220
[23738.380212] [<ffffffff8101e44e>] arch_cpu_idle+0xe/0x30
[23738.383958] [<ffffffff810c2b31>] cpu_startup_entry+0xc1/0x2b0
[23738.387612] [<ffffffff810427cd>] start_secondary+0x21d/0x2d0
[23738.391156] Code: a0 e8 8c 66 e2 e0 c6 05 5d 31 00 00 01 eb 11 48 89 d0 8b 16 31 f6 48 8b 38 e8 a4 50 41 e1 eb 05 b8 ea ff ff ff 5d c3 55 48 89 e5 <0f> 0b 0f 1f 44 00 00 55 48 89 e5 0f 0b 55 48 89 e5 0f 0b 00 00
[23738.402433] RIP [<ffffffffa02441c9>] __skb_pull.part.7+0x4/0x6 [openvswitch]
[23738.406297] RSP <ffff88046fd03bb0>

###########################
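All three dumps show the same faulting instruction. In an x86 oops, the Code: line prints the bytes around RIP with the byte at RIP in angle brackets. Here __skb_pull.part.7 consists of 55 (push %rbp), 48 89 e5 (mov %rsp,%rbp) and <0f> 0b (ud2), the instruction BUG() emits on x86, hence "invalid opcode: 0000". The offsets agree: push and mov take 1 + 3 = 4 bytes, so ud2 sits at +0x4, and with its 2 bytes the fragment is 0x6 long, matching __skb_pull.part.7+0x4/0x6. The helper below is a hypothetical sketch (oops_code_is_ud2 is not an existing tool) for checking this across many saved dmesg files; it assumes the single-line Code: format shown above and that the marked byte starts the faulting instruction.

```c
#include <stdlib.h>
#include <string.h>

/* Returns 1 if the byte marked <..> in an x86 oops "Code:" line, together
 * with the byte after it, is 0f 0b (ud2, which BUG() emits on x86);
 * returns 0 otherwise, including when the line has no marker. */
static int oops_code_is_ud2(const char *line)
{
	const char *open = strchr(line, '<');
	char *end;

	if (open == NULL)
		return 0;

	/* Byte inside the angle brackets, e.g. "<0f>". */
	unsigned long first = strtoul(open + 1, &end, 16);
	if (end == open + 1 || *end != '>')
		return 0;

	/* Next byte after the closing bracket, e.g. " 0b". */
	const char *next = end + 1;
	unsigned long second = strtoul(next, &end, 16);
	if (end == next)
		return 0;

	return first == 0x0f && second == 0x0b;
}
```

Running it over the Code: line of each saved dump is a quick way to confirm that every crash is the same BUG() rather than a different fault.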


Andrew Crawford (acrawford) wrote:

Brad Figg (brad-figg) wrote: Missing required logs.

This bug is missing log files that will aid in diagnosing the problem. From a terminal window please run:

apport-collect 1655683

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
tags: added: trusty
Andrew Crawford (acrawford) wrote:

I built a new kernel from the Ubuntu kernel source package for linux-image-3.13.0-106-generic, using the Ubuntu-shipped .config for that same kernel, and I am testing the patch linked below in production.

https://patchwork.ozlabs.org/patch/712373/

Andrew Crawford (acrawford) wrote:

Andrew Crawford (acrawford) wrote:

Andrew Crawford (acrawford) wrote:

Andrew Crawford (acrawford) wrote:

Sorry, apport-collect tries to open a browser, and none is installed on this server. I generated the report with apport-cli and have attached it here instead.

Andrew Crawford (acrawford) wrote:

FYI, the kernel running in the apport report above is the patched kernel.

Changed in linux (Ubuntu):
status: Incomplete → Confirmed
Changed in linux (Ubuntu):
importance: Undecided → High
Changed in linux (Ubuntu Trusty):
status: New → Incomplete
importance: Undecided → High
status: Incomplete → Confirmed
Tim Gardner (timg-tpi) wrote:
Changed in linux (Ubuntu Trusty):
assignee: nobody → Andrew Crawford (acrawford)
status: Confirmed → In Progress
Changed in linux (Ubuntu):
status: Confirmed → Fix Released
Andrew Crawford (acrawford) wrote:

Between 2017-01-05 and about 2017-01-10 EST we had 17 recorded crashes on the Neutron node.

After patching as described above, we have had no crashes.

Not being particularly familiar with kernel internals or the details of the skb data structure, I was not able to explicitly confirm the cause by sending packets that trigger the crash. I am trusting that dropping these packets, as the patch does, has no significant side effects for the openvswitch module.

A clearer explanation, along with earlier versions of the patch, can be found here:

https://patchwork.ozlabs.org/patch/559944/
https://patchwork.ozlabs.org/patch/712373/

It may also be worth noting for others that GSO and GRO offloading are turned off on all of our interfaces.

I am not sure how soon the corresponding patch will be applied to the upstream stable kernel, so here is the patch I used.

Attached is the patch for my test build, built from the sources described in comment #3.

Joseph Salisbury (jsalisbury) wrote:

I built a Trusty test kernel with the patch. The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1655683/

Can you test this kernel and see if it resolves this bug?

Joseph Salisbury (jsalisbury) wrote:

I see the patch was already submitted for SRU per comment #9, so no need to test the kernel posted in comment #11.

Andrew Crawford (acrawford) wrote:

Hello Joseph, do you need me to test just the kernel image and headers, or all of the .debs in that directory? Thanks for clarifying.

-Andrew

Andrew Crawford (acrawford) wrote:

OK, scratch comment #13; I must have been half asleep when I read comment #12. Is there a status change I need to make for Trusty to move this along? Thanks.

Andrew Crawford (acrawford) wrote:

I have moved this to "Fix Committed". Can someone verify that this patch will be backported to the Trusty 3.13 series kernel, or do I need to take any additional steps? I am not sure what my responsibility is in moving this forward from here.

Changed in linux (Ubuntu Trusty):
status: In Progress → Fix Committed
Tim Gardner (timg-tpi) wrote:

This patch has not been committed to the Trusty repository. It has been reviewed on the kernel team mail list and should be committed soon.

Changed in linux (Ubuntu Trusty):
status: Fix Committed → In Progress
Andrew Crawford (acrawford) wrote:

Thanks Tim for the clarification. I will not make any more status changes.

Tim Gardner (timg-tpi) wrote:

This patch should be released with 3.13.0-109.156
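For operators who want to check whether a Trusty host already runs a kernel that should contain the fix, a rough test is to compare the ABI number in the uname release string (the 109 in 3.13.0-109-generic) against 109. This is a sketch under stated assumptions: it only applies to Ubuntu 3.13.0 kernels from the Trusty series, it assumes ABI numbers only increase within that series, and the upload number (.156) is not visible in uname -r. The names has_lp1655683_fix and running_kernel_has_fix are hypothetical.

```c
#include <stdio.h>
#include <sys/utsname.h>

/* Returns 1 if release looks like an Ubuntu 3.13.0 kernel with ABI >= 109,
 * 0 if it is an older 3.13.0 ABI, and -1 if the string is not in the
 * "3.13.0-ABI-flavour" form this sketch assumes. */
static int has_lp1655683_fix(const char *release)
{
	unsigned int major, minor, patch, abi;

	if (sscanf(release, "%u.%u.%u-%u", &major, &minor, &patch, &abi) != 4)
		return -1;
	if (major != 3 || minor != 13 || patch != 0)
		return -1;
	return abi >= 109 ? 1 : 0;
}

/* Convenience wrapper that applies the check to the running kernel. */
static int running_kernel_has_fix(void)
{
	struct utsname u;

	if (uname(&u) != 0)
		return -1;
	return has_lp1655683_fix(u.release);
}
```

For the kernels in this report, "3.13.0-106-generic" yields 0 and "3.13.0-109-generic" yields 1.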

Changed in linux (Ubuntu Trusty):
status: In Progress → Fix Committed
Thadeu Lima de Souza Cascardo (cascardo) wrote:

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-trusty' to 'verification-done-trusty'. If the problem still exists, change the tag 'verification-needed-trusty' to 'verification-failed-trusty'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-trusty
Po-Hsu Lin (cypressyew) wrote:

Hello Andrew,

Can you help us to verify this?

Thank you!

Launchpad Janitor (janitor) wrote:

This bug was fixed in the package linux - 3.13.0-109.156

---------------
linux (3.13.0-109.156) trusty; urgency=low

  [ Thadeu Lima de Souza Cascardo ]

  * Release Tracking Bug
    - LP: #1662186

  [ Luis Henriques ]
  * Backport Dirty COW patch to prevent wineserver freeze (LP: #1658270)
    - ARM: 7985/1: mm: implement pte_accessible for faulting mappings
    - ARM: 8108/1: mm: Introduce {pte,pmd}_isset and {pte,pmd}_isclear
    - ARM: 8037/1: mm: support big-endian page tables
    - ARM: 8109/1: mm: Modify pte_write and pmd_write logic for LPAE
    - arm64: mm: Route pmd thp functions through pte equivalents
    - mm: fix huge zero page accounting in smaps report
    - SAUCE: mm: Respect FOLL_FORCE/FOLL_COW for thp

  * kernel BUG at skbuff.h:1486 Insufficient linear data in skb
    __skb_pull.part.7+0x4/0x6 [openvswitch] (LP: #1655683)
    - SAUCE: openvswitch: gre: filter gre packets

  * CVE-2016-7911
    - block: fix use-after-free in sys_ioprio_get()

  * CVE-2016-7910
    - block: fix use-after-free in seq file

  * Xen MSI setup code incorrectly re-uses cached pirq (LP: #1656381)
    - SAUCE: xen: do not re-use pirq number cached in pci device msi msg data

 -- Thadeu Lima de Souza Cascardo <email address hidden> Tue, 07 Feb 2017 09:26:42 -0200

Changed in linux (Ubuntu Trusty):
status: Fix Committed → Fix Released