kernel BUG at skbuff.h:1486 Insufficient linear data in skb __skb_pull.part.7+0x4/0x6 [openvswitch]

Bug #1655117 reported by Andrew Crawford
This bug affects 1 person
Affects: openvswitch (Ubuntu)
Status: New
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

Since 2016-12-30 EST we have been experiencing repeated crashes of our OpenStack Icehouse / Trusty Neutron node with a kernel BUG at skbuff.h line 1486:

1471 /**
1472  * skb_peek - peek at the head of an &sk_buff_head
1473  * @list_: list to peek at
1474  *
1475  * Peek an &sk_buff. Unlike most other operations you _MUST_
1476  * be careful with this one. A peek leaves the buffer on the
1477  * list and someone else may run off with it. You must hold
1478  * the appropriate locks or have a private queue to do this.
1479  *
1480  * Returns %NULL for an empty list or a pointer to the head element.
1481  * The reference count is not incremented and the reference is therefore
1482  * volatile. Use with caution.
1483  */
1484 static inline struct sk_buff *skb_peek(const struct sk_buff_head *list_)
1485 {
1486         struct sk_buff *skb = list_->next;
1487
1488         if (skb == (struct sk_buff *)list_)
1489                 skb = NULL;
1490         return skb;
1491 }
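
Note that the excerpt above is skb_peek(), which contains no BUG(). The traces in this report name __skb_pull (RIP: __skb_pull.part.7), so the excerpt was presumably taken from a different revision of skbuff.h than the 3.13.0 tree this kernel was built from. In 3.13-era kernels __skb_pull subtracts the pulled length from skb->len and then does BUG_ON(skb->len < skb->data_len), which is the "insufficient linear data" condition. Below is a minimal userspace model of that check. It is only a sketch: struct fake_skb and pull_would_bug are hypothetical names for illustration, not kernel API.

```c
#include <stdbool.h>

/* Illustrative stand-in for the two sk_buff length fields involved.
 * Assumption: len >= data_len, as in a well-formed skb. */
struct fake_skb {
    unsigned int len;      /* total bytes: linear area plus paged fragments */
    unsigned int data_len; /* bytes held in paged fragments only */
};

/* Returns true when pulling n bytes would leave len < data_len,
 * i.e. when more bytes are pulled than the linear area holds.
 * This is the condition the kernel's BUG_ON in __skb_pull fires on
 * (for n <= len). */
static bool pull_would_bug(const struct fake_skb *skb, unsigned int n)
{
    unsigned int linear = skb->len - skb->data_len;

    return n > linear;
}
```

For example, with len = 100 and data_len = 90 only 10 bytes are linear, so pulling a 14-byte Ethernet header would trip the check, while pulling 10 bytes would not.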

This generally results in a full kernel panic on the Neutron node, which breaks connectivity for VMs within the cloud. However, since we started using crash-dumptools to collect information on the crashes over the past three days, the kernel loaded by kexec during the crashdump appears to keep running in about 2 out of 3 crash instances. In those cases we see a flap of the Neutron services instead of a full panic that brings the Neutron server down and necessitates a hard reboot.

I believe that this is a manifestation of the openvswitch issue described on 2017-01-08 as:

"OVS can only process L2 packets. But OVS GRE receive handler
can accept IP-GRE packets. When such packet is processed by
OVS datapath it can trigger following assert failure due
to insufficient linear data in skb."

https://patchwork.ozlabs.org/patch/712373/

I have not tested the patch provided above yet.
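
To illustrate the kind of filtering the quoted description implies, here is a minimal userspace sketch that reads the protocol type field of a GRE header (bytes 2 and 3, network byte order) and accepts only Transparent Ethernet Bridging (0x6558), the type whose payload is an L2 Ethernet frame. This is not the patch itself and has not been checked against it. The function name gre_payload_is_l2 and the macro names are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* GRE protocol type values (same numbering as EtherTypes). */
#define GRE_PROTO_TEB  0x6558u /* Transparent Ethernet Bridging: payload is an Ethernet frame */
#define GRE_PROTO_IPV4 0x0800u /* plain IP-in-GRE: payload starts with an IP header, no L2 header */

/* Hypothetical helper: returns true only if the GRE header at 'gre'
 * announces an Ethernet (L2) payload. The base GRE header is 4 bytes:
 * 2 bytes of flags/version followed by 2 bytes of protocol type. */
static bool gre_payload_is_l2(const uint8_t *gre, size_t len)
{
    if (gre == NULL || len < 4)
        return false;

    uint16_t proto = (uint16_t)((gre[2] << 8) | gre[3]);

    return proto == GRE_PROTO_TEB;
}
```

A receive path that only understands L2 frames could use a check of this shape to reject IP-GRE packets (protocol type 0x0800) before attempting to pull an Ethernet header from them.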

Other information and a few sample dmesg outputs from the crashes (multiple dumps available):

# lsb_release -rd
Description: Ubuntu 14.04.5 LTS
Release: 14.04

# apt-cache policy openvswitch
N: Unable to locate package openvswitch
root@neutron01:/var/crash# apt-cache policy openvswitch-common
openvswitch-common:
  Installed: 2.0.2-0ubuntu0.14.04.3
  Candidate: 2.0.2-0ubuntu0.14.04.3
  Version table:
 *** 2.0.2-0ubuntu0.14.04.3 0
        500 http://us.archive.ubuntu.com/ubuntu/ trusty-updates/main amd64 Packages
        100 /var/lib/dpkg/status
     2.0.1+git20140120-0ubuntu2 0
        500 http://us.archive.ubuntu.com/ubuntu/ trusty/main amd64 Packages

# apt-cache policy openvswitch-switch
openvswitch-switch:
  Installed: 2.0.2-0ubuntu0.14.04.3
  Candidate: 2.0.2-0ubuntu0.14.04.3
  Version table:
 *** 2.0.2-0ubuntu0.14.04.3 0
        500 http://us.archive.ubuntu.com/ubuntu/ trusty-updates/main amd64 Packages
        100 /var/lib/dpkg/status
     2.0.1+git20140120-0ubuntu2 0
        500 http://us.archive.ubuntu.com/ubuntu/ trusty/main amd64 Packages

# apt-cache policy neutron-plugin-openvswitch-agent
neutron-plugin-openvswitch-agent:
  Installed: 1:2014.1.5-0ubuntu7
  Candidate: 1:2014.1.5-0ubuntu7
  Version table:
 *** 1:2014.1.5-0ubuntu7 0
        500 http://us.archive.ubuntu.com/ubuntu/ trusty-updates/main amd64 Packages
        100 /var/lib/dpkg/status
     1:2014.1.3-0ubuntu1.1 0
        500 http://security.ubuntu.com/ubuntu/ trusty-security/main amd64 Packages
     1:2014.1-0ubuntu1 0
        500 http://us.archive.ubuntu.com/ubuntu/ trusty/main amd64 Packages

example dmesg:

############## dmesg.201701060019

> [33100.131019] ------------[ cut here ]------------
> [33100.131176] kernel BUG at /build/linux-mi9H1O/linux-3.13.0/include/linux/skbuff.h:1486!
> [33100.131424] invalid opcode: 0000 [#1] SMP
> [33100.131560] Modules linked in: xt_nat xt_conntrack ip6table_filter ip6_tables iptable_filter xt_REDIRECT xt_tcpudp iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ip_tables x_tables veth openvswitch gre vxlan ip_tunnel libcrc32c ipmi_devintf gpio_ich cdc_ether x86_pkg_temp_thermal intel_powerclamp coretemp usbnet kvm_intel mii kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd sb_edac edac_core lpc_ich wmi ipmi_si bonding shpchp ioatdma lp mac_hid parport ahci libahci sfc igb e1000e mtd dca i2c_algo_bit ptp pps_core megaraid_sas mdio
> [33100.133560] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 3.13.0-106-generic #153-Ubuntu
> [33100.133800] Hardware name: IBM System x3650 M4 : -[7915AC1]-/00Y8473, BIOS -[VVE136AUS-1.60]- 12/12/2013
> [33100.134096] task: ffff880469da4800 ti: ffff880469dae000 task.ti: ffff880469dae000
> [33100.134325] RIP: 0010:[<ffffffffa02321c9>] [<ffffffffa02321c9>] __skb_pull.part.7+0x4/0x6 [openvswitch]
> [33100.134628] RSP: 0018:ffff88046fd03bb0 EFLAGS: 00010297
> [33100.134792] RAX: ffff880035d73866 RBX: ffff880461efb600 RCX: ffff880035d73800
> [33100.135011] RDX: 0000000000000210 RSI: 0000000000000214 RDI: ffff88046fd03c98
> [33100.135231] RBP: ffff88046fd03bb0 R08: 0000000000000000 R09: ffff880035d73800
> [33100.135451] R10: ffff880461efb600 R11: 0000000000000000 R12: ffff88046fd03c18
> [33100.135671] R13: ffff880866a88a80 R14: ffff88046fd03c18 R15: ffff880461e49480
> [33100.141118] FS: 0000000000000000(0000) GS:ffff88046fd00000(0000) knlGS:0000000000000000
> [33100.152198] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [33100.157796] CR2: 00007fc30157d090 CR3: 0000000001c0e000 CR4: 00000000001407e0
> [33100.163382] Stack:
> [33100.168800] ffff88046fd03be0 ffffffffa022bbc5 ffffffff81cdaf00 ffff880461efb600
> [33100.179942] ffffe8fbefd04890 ffff880866a88a80 ffff88046fd03cc8 ffffffffa022a8c5
> [33100.191068] ffffffff81cdaf00 0000000000000001 ffff880866cb70c4 ffff8804541b6180
> [33100.202184] Call Trace:
> [33100.207553] <IRQ>
> [33100.207617]
> [33100.212849] [<ffffffffa022bbc5>] ovs_flow_extract+0x935/0xb30 [openvswitch]
> [33100.218139] [<ffffffffa022a8c5>] ovs_dp_process_received_packet+0x55/0x120 [openvswitch]
> [33100.228464] [<ffffffffa0230b5a>] ovs_vport_receive+0x2a/0x30 [openvswitch]
> [33100.233727] [<ffffffffa0231ba3>] gre_rcv+0xa3/0xc0 [openvswitch]
> [33100.238898] [<ffffffffa0222745>] gre_cisco_rcv+0x65/0xba [gre]
> [33100.243974] [<ffffffffa02222cd>] gre_rcv+0x5d/0x80 [gre]
> [33100.248938] [<ffffffff81666358>] ip_local_deliver_finish+0xa8/0x210
> [33100.253823] [<ffffffff81666658>] ip_local_deliver+0x48/0x80
> [33100.258547] [<ffffffff81665fdd>] ip_rcv_finish+0x7d/0x350
> [33100.263138] [<ffffffff81666928>] ip_rcv+0x298/0x3d0
> [33100.267636] [<ffffffff8162f566>] __netif_receive_skb_core+0x696/0x870
> [33100.272134] [<ffffffff8162f758>] __netif_receive_skb+0x18/0x60
> [33100.276544] [<ffffffff8163030e>] process_backlog+0xae/0x1a0
> [33100.280999] [<ffffffff8162fb3a>] net_rx_action+0x14a/0x270
> [33100.285447] [<ffffffff8106fd8c>] __do_softirq+0xfc/0x310
> [33100.289886] [<ffffffff81070315>] irq_exit+0x105/0x110
> [33100.294224] [<ffffffff81740066>] do_IRQ+0x56/0xc0
> [33100.298433] [<ffffffff817356ed>] common_interrupt+0x6d/0x6d
> [33100.302613] <EOI>
> [33100.302676]
> [33100.306717] [<ffffffff815dc982>] ? cpuidle_enter_state+0x52/0xc0
> [33100.310816] [<ffffffff815dc978>] ? cpuidle_enter_state+0x48/0xc0
> [33100.314828] [<ffffffff815dcacc>] cpuidle_idle_call+0xdc/0x220
> [33100.318732] [<ffffffff8101e44e>] arch_cpu_idle+0xe/0x30
> [33100.322479] [<ffffffff810c2b31>] cpu_startup_entry+0xc1/0x2b0
> [33100.326138] [<ffffffff810427cd>] start_secondary+0x21d/0x2d0
> [33100.329686] Code: a0 e8 8c 86 e3 e0 c6 05 5d 31 00 00 01 eb 11 48 89 d0 8b 16 31 f6 48 8b 38 e8 a4 70 42 e1 eb 05 b8 ea ff ff ff 5d c3 55 48 89 e5 <0f> 0b 0f 1f 44 00 00 55 48 89 e5 0f 0b 55 48 89 e5 0f 0b 00 00
> [33100.340962] RIP [<ffffffffa02321c9>] __skb_pull.part.7+0x4/0x6 [openvswitch]
> [33100.344857] RSP <ffff88046fd03bb0>

############## dmesg.201701080127

[ 911.714512] ------------[ cut here ]------------
[ 911.714670] kernel BUG at /build/linux-mi9H1O/linux-3.13.0/include/linux/skbuff.h:1486!
[ 911.714917] invalid opcode: 0000 [#1] SMP
[ 911.715053] Modules linked in: xt_nat xt_conntrack xt_REDIRECT xt_tcpudp ip6table_filter ip6_tables iptable_filter iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ip_tables x_tables veth openvswitch gre vxlan ip_tunnel libcrc32c x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel ipmi_devintf gpio_ich kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel cdc_ether aesni_intel aes_x86_64 lrw gf128mul glue_helper usbnet ablk_helper cryptd sb_edac mii edac_core lpc_ich bonding mac_hid ipmi_si shpchp wmi lp ioatdma parport ahci sfc libahci igb e1000e mtd dca i2c_algo_bit ptp pps_core megaraid_sas mdio
[ 911.717060] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.13.0-106-generic #153-Ubuntu
[ 911.717301] Hardware name: IBM System x3650 M4 : -[7915AC1]-/00Y8473, BIOS -[VVE136AUS-1.60]- 12/12/2013
[ 911.717597] task: ffffffff81c15480 ti: ffffffff81c00000 task.ti: ffffffff81c00000
[ 911.717827] RIP: 0010:[<ffffffffa01c61c9>] [<ffffffffa01c61c9>] __skb_pull.part.7+0x4/0x6 [openvswitch]
[ 911.718128] RSP: 0018:ffff88046fc03bb0 EFLAGS: 00010297
[ 911.718291] RAX: ffff880079de52e6 RBX: ffff880463335000 RCX: ffff880079de5280
[ 911.718511] RDX: 0000000000000210 RSI: 0000000000000214 RDI: ffff88046fc03c98
[ 911.718731] RBP: ffff88046fc03bb0 R08: 0000000000000000 R09: ffff880079de5280
[ 911.718951] R10: ffff880463335000 R11: 0000000000000000 R12: ffff88046fc03c18
[ 911.719171] R13: ffff880468b60c00 R14: ffff88046fc03c18 R15: ffff8804631a0b40
[ 911.724614] FS: 0000000000000000(0000) GS:ffff88046fc00000(0000) knlGS:0000000000000000
[ 911.735614] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 911.741214] CR2: 00007f1898042d70 CR3: 0000000001c0e000 CR4: 00000000001407f0
[ 911.746800] Stack:
[ 911.752201] ffff88046fc03be0 ffffffffa01bfbc5 ffffffff81cdaf00 ffff880463335000
[ 911.763305] ffffe8fbefc04890 ffff880468b60c00 ffff88046fc03cc8 ffffffffa01be8c5
[ 911.774433] ffffffff81cdaf00 0000000000000001 ffff8804675cf9c4 ffff88045941d380
[ 911.785550] Call Trace:
[ 911.790915] <IRQ>
[ 911.790979]
[ 911.796163] [<ffffffffa01bfbc5>] ovs_flow_extract+0x935/0xb30 [openvswitch]
[ 911.801437] [<ffffffffa01be8c5>] ovs_dp_process_received_packet+0x55/0x120 [openvswitch]
[ 911.811769] [<ffffffffa01c4b5a>] ovs_vport_receive+0x2a/0x30 [openvswitch]
[ 911.817038] [<ffffffffa01c5ba3>] gre_rcv+0xa3/0xc0 [openvswitch]
[ 911.822211] [<ffffffffa01b6745>] gre_cisco_rcv+0x65/0xba [gre]
[ 911.827280] [<ffffffffa01b62cd>] gre_rcv+0x5d/0x80 [gre]
[ 911.832213] [<ffffffff81666358>] ip_local_deliver_finish+0xa8/0x210
[ 911.837094] [<ffffffff81666658>] ip_local_deliver+0x48/0x80
[ 911.841810] [<ffffffff81665fdd>] ip_rcv_finish+0x7d/0x350
[ 911.846397] [<ffffffff81666928>] ip_rcv+0x298/0x3d0
[ 911.850889] [<ffffffff8162f566>] __netif_receive_skb_core+0x696/0x870
[ 911.855384] [<ffffffff8162f758>] __netif_receive_skb+0x18/0x60
[ 911.859796] [<ffffffff8163030e>] process_backlog+0xae/0x1a0
[ 911.864208] [<ffffffff8162fb3a>] net_rx_action+0x14a/0x270
[ 911.868654] [<ffffffff8106fd8c>] __do_softirq+0xfc/0x310
[ 911.873093] [<ffffffff81070315>] irq_exit+0x105/0x110
[ 911.877442] [<ffffffff81740066>] do_IRQ+0x56/0xc0
[ 911.881654] [<ffffffff817356ed>] common_interrupt+0x6d/0x6d
[ 911.885832] <EOI>
[ 911.885896]
[ 911.889937] [<ffffffff815dc982>] ? cpuidle_enter_state+0x52/0xc0
[ 911.894036] [<ffffffff815dc978>] ? cpuidle_enter_state+0x48/0xc0
[ 911.898017] [<ffffffff815dcacc>] cpuidle_idle_call+0xdc/0x220
[ 911.901888] [<ffffffff8101e44e>] arch_cpu_idle+0xe/0x30
[ 911.905643] [<ffffffff810c2b31>] cpu_startup_entry+0xc1/0x2b0
[ 911.909308] [<ffffffff8171b2e7>] rest_init+0x77/0x80
[ 911.912842] [<ffffffff81d34f6a>] start_kernel+0x432/0x43d
[ 911.916281] [<ffffffff81d34941>] ? repair_env_string+0x5c/0x5c
[ 911.919767] [<ffffffff81d34120>] ? early_idt_handler_array+0x120/0x120
[ 911.923347] [<ffffffff81d345ee>] x86_64_start_reservations+0x2a/0x2c
[ 911.926859] [<ffffffff81d34733>] x86_64_start_kernel+0x143/0x152
[ 911.930305] Code: a0 e8 8c 46 ea e0 c6 05 5d 31 00 00 01 eb 11 48 89 d0 8b 16 31 f6 48 8b 38 e8 a4 30 49 e1 eb 05 b8 ea ff ff ff 5d c3 55 48 89 e5 <0f> 0b 0f 1f 44 00 00 55 48 89 e5 0f 0b 55 48 89 e5 0f 0b 00 00
[ 911.940880] RIP [<ffffffffa01c61c9>] __skb_pull.part.7+0x4/0x6 [openvswitch]
[ 911.944483] RSP <ffff88046fc03bb0>

############## dmesg.201701071542

[23738.192626] ------------[ cut here ]------------
[23738.192782] kernel BUG at /build/linux-mi9H1O/linux-3.13.0/include/linux/skbuff.h:1486!
[23738.193031] invalid opcode: 0000 [#1] SMP
[23738.193167] Modules linked in: xt_nat xt_conntrack ip6table_filter ip6_tables iptable_filter xt_REDIRECT xt_tcpudp iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ip_tables x_tables veth openvswitch gre vxlan ip_tunnel libcrc32c ipmi_devintf gpio_ich x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul cdc_ether crc32_pclmul usbnet mii ghash_clmulni_intel aesni_intel aes_x86_64 lrw lpc_ich sb_edac gf128mul glue_helper ablk_helper cryptd edac_core bonding wmi ipmi_si mac_hid shpchp lp ioatdma parport ahci libahci igb dca sfc e1000e mtd i2c_algo_bit ptp pps_core megaraid_sas mdio
[23738.195169] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 3.13.0-106-generic #153-Ubuntu
[23738.195410] Hardware name: IBM System x3650 M4 : -[7915AC1]-/00Y8473, BIOS -[VVE136AUS-1.60]- 12/12/2013
[23738.195706] task: ffff880869959800 ti: ffff880469da4000 task.ti: ffff880469da4000
[23738.195936] RIP: 0010:[<ffffffffa02441c9>] [<ffffffffa02441c9>] __skb_pull.part.7+0x4/0x6 [openvswitch]
[23738.196238] RSP: 0018:ffff88046fd03bb0 EFLAGS: 00010297
[23738.196402] RAX: ffff880453cad7e6 RBX: ffff88045d1e7200 RCX: ffff880453cad780
[23738.196622] RDX: 0000000000000210 RSI: 0000000000000214 RDI: ffff88046fd03c98
[23738.196842] RBP: ffff88046fd03bb0 R08: 0000000000000000 R09: ffff880453cad780
[23738.197062] R10: ffff88045d1e7200 R11: 0000000000000000 R12: ffff88046fd03c18
[23738.197283] R13: ffff880466dbc0c0 R14: ffff88046fd03c18 R15: ffff880462a32f00
[23738.202738] FS: 0000000000000000(0000) GS:ffff88046fd00000(0000) knlGS:0000000000000000
[23738.213771] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[23738.219381] CR2: 00007efcd7eee090 CR3: 0000000001c0e000 CR4: 00000000001407e0
[23738.224978] Stack:
[23738.230390] ffff88046fd03be0 ffffffffa023dbc5 ffffffff81cdaf00 ffff88045d1e7200
[23738.241516] ffffe8fbefd04770 ffff880466dbc0c0 ffff88046fd03cc8 ffffffffa023c8c5
[23738.252668] ffffffff81cdaf00 0000000000000001 ffff880462a54244 ffff88045d1c4100
[23738.263818] Call Trace:
[23738.269200] <IRQ>
[23738.269264]
[23738.274454] [<ffffffffa023dbc5>] ovs_flow_extract+0x935/0xb30 [openvswitch]
[23738.279737] [<ffffffffa023c8c5>] ovs_dp_process_received_packet+0x55/0x120 [openvswitch]
[23738.290071] [<ffffffffa0242b5a>] ovs_vport_receive+0x2a/0x30 [openvswitch]
[23738.295339] [<ffffffffa0243ba3>] gre_rcv+0xa3/0xc0 [openvswitch]
[23738.300513] [<ffffffffa0206745>] gre_cisco_rcv+0x65/0xba [gre]
[23738.305587] [<ffffffffa02062cd>] gre_rcv+0x5d/0x80 [gre]
[23738.310531] [<ffffffff81666358>] ip_local_deliver_finish+0xa8/0x210
[23738.315420] [<ffffffff81666658>] ip_local_deliver+0x48/0x80
[23738.320146] [<ffffffff81665fdd>] ip_rcv_finish+0x7d/0x350
[23738.324743] [<ffffffff81666928>] ip_rcv+0x298/0x3d0
[23738.329244] [<ffffffff8162f566>] __netif_receive_skb_core+0x696/0x870
[23738.333744] [<ffffffff8162f758>] __netif_receive_skb+0x18/0x60
[23738.338158] [<ffffffff8163030e>] process_backlog+0xae/0x1a0
[23738.342576] [<ffffffff8162fb3a>] net_rx_action+0x14a/0x270
[23738.347025] [<ffffffff8106fd8c>] __do_softirq+0xfc/0x310
[23738.351463] [<ffffffff81070315>] irq_exit+0x105/0x110
[23738.355804] [<ffffffff81740066>] do_IRQ+0x56/0xc0
[23738.360010] [<ffffffff817356ed>] common_interrupt+0x6d/0x6d
[23738.364183] <EOI>
[23738.364246]
[23738.368280] [<ffffffff815dc982>] ? cpuidle_enter_state+0x52/0xc0
[23738.372372] [<ffffffff815dc978>] ? cpuidle_enter_state+0x48/0xc0
[23738.376347] [<ffffffff815dcacc>] cpuidle_idle_call+0xdc/0x220
[23738.380212] [<ffffffff8101e44e>] arch_cpu_idle+0xe/0x30
[23738.383958] [<ffffffff810c2b31>] cpu_startup_entry+0xc1/0x2b0
[23738.387612] [<ffffffff810427cd>] start_secondary+0x21d/0x2d0
[23738.391156] Code: a0 e8 8c 66 e2 e0 c6 05 5d 31 00 00 01 eb 11 48 89 d0 8b 16 31 f6 48 8b 38 e8 a4 50 41 e1 eb 05 b8 ea ff ff ff 5d c3 55 48 89 e5 <0f> 0b 0f 1f 44 00 00 55 48 89 e5 0f 0b 55 48 89 e5 0f 0b 00 00
[23738.402433] RIP [<ffffffffa02441c9>] __skb_pull.part.7+0x4/0x6 [openvswitch]
[23738.406297] RSP <ffff88046fd03bb0>

###########################

Andrew Crawford (acrawford) wrote :

Hi all, it looks like the patch I referenced above is indeed aimed at the openvswitch kernel module; I should have looked more closely at the outset. So this bug really belongs with ubuntu-kernel and, I believe, specifically with the pre-DKMS openvswitch kernel module.

I looked at the Ubuntu kernel source for /net/openvswitch/vport-gre.c:
https://github.com/Canonical-kernel/Ubuntu-kernel/blob/master/net/openvswitch/vport-gre.c

The patch mentioned above at patchwork is not present.

I am not familiar with the upstream kernel process, but I am looking into it.
