kernel BUG at skbuff.h:1486 Insufficient linear data in skb __skb_pull.part.7+0x4/0x6 [openvswitch]
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
openvswitch (Ubuntu) | New | Undecided | Unassigned |
Bug Description
Since 2016-12-30 EST we have been experiencing repeated crashes of our OpenStack Icehouse / Trusty Neutron node with a kernel BUG at skbuff.h line 1486:
1471 /**
1472 * skb_peek - peek at the head of an &sk_buff_head
1473 * @list_: list to peek at
1474 *
1475 * Peek an &sk_buff. Unlike most other operations you _MUST_
1476 * be careful with this one. A peek leaves the buffer on the
1477 * list and someone else may run off with it. You must hold
1478 * the appropriate locks or have a private queue to do this.
1479 *
1480 * Returns %NULL for an empty list or a pointer to the head element.
1481 * The reference count is not incremented and the reference is therefore
1482 * volatile. Use with caution.
1483 */
1484 static inline struct sk_buff *skb_peek(const struct sk_buff_head *list_)
1485 {
1486 struct sk_buff *skb = list_->next;
1487
1488 if (skb == (struct sk_buff *)list_)
1489 skb = NULL;
1490 return skb;
1491 }
This generally results in a full panic of the Neutron node and loss of connectivity for VMs within the cloud. However, since we started using crash-dumptools to collect information on the crashes over the past three days, in about 2 out of 3 crash instances the kernel loaded by kexec during the crashdump appears to continue running. In those cases we see a flap of the Neutron services instead of a full panic that brings the Neutron server down and necessitates a hard reboot.
I believe that this is a manifestation of the openvswitch issue described on 2017-01-08 as:
"OVS can only process L2 packets. But OVS GRE receive handler
can accept IP-GRE packets. When such packet is processed by
OVS datapath it can trigger following assert failure due
to insufficient linear data in skb."
https:/
I have not tested the patch provided above yet.
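To make the quoted failure mode concrete, here is a minimal user-space sketch in C. It is not the kernel code and not the patch referenced above: `struct toy_skb` and every `toy_*` function are hypothetical names invented for illustration. It models the length check that, as I understand it, `__skb_pull()` enforces with `BUG_ON(skb->len < skb->data_len)`, and a guard that accepts only GRE payloads carrying Ethernet frames (protocol `ETH_P_TEB`, 0x6558). Whether the upstream fix takes exactly this form is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ETH_P_IP  0x0800 /* GRE payload is a bare IPv4 packet (L3) */
#define ETH_P_TEB 0x6558 /* GRE payload is an Ethernet frame (L2) */
#define ETH_HLEN  14     /* length of an Ethernet header in bytes */

/* Hypothetical stand-in for the two sk_buff length fields involved. */
struct toy_skb {
    unsigned int len;      /* total bytes: linear area plus paged fragments */
    unsigned int data_len; /* bytes held in paged fragments only */
};

/*
 * Returns true if pulling n bytes would leave len < data_len, i.e. the
 * pull would consume more than the linear area holds. This mirrors the
 * unsigned arithmetic of the kernel check without modifying the buffer.
 */
static bool toy_pull_would_bug(const struct toy_skb *skb, unsigned int n)
{
    unsigned int new_len = skb->len - n;

    return new_len < skb->data_len;
}

/* Hypothetical receive-side guard: only L2-in-GRE is acceptable to OVS. */
static bool toy_gre_payload_is_l2(uint16_t gre_proto_host_order)
{
    return gre_proto_host_order == ETH_P_TEB;
}

/* Walks through the scenario from the quoted description; returns 0. */
static int toy_demo(void)
{
    /* 100-byte packet with only 10 bytes in the linear area. */
    struct toy_skb skb = { .len = 100, .data_len = 90 };

    /* Treating the payload as Ethernet pulls 14 bytes: too many. */
    assert(toy_pull_would_bug(&skb, ETH_HLEN));

    /* Filtering on the GRE protocol rejects the IP-GRE packet up front. */
    assert(!toy_gre_payload_is_l2(ETH_P_IP));
    assert(toy_gre_payload_is_l2(ETH_P_TEB));
    return 0;
}
```

Calling `toy_demo()` from a `main()` exercises both halves: an IP-GRE packet with a short linear area would trip the length check if it were parsed as Ethernet, and the protocol guard is the kind of early rejection that would avoid reaching that check.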
Other information and a few sample dmesg outputs from the crash: (multiple dumps available)
# lsb_release -rd
Description: Ubuntu 14.04.5 LTS
Release: 14.04
# apt-cache policy openvswitch
N: Unable to locate package openvswitch
root@neutron01:
openvswitch-common:
Installed: 2.0.2-0ubuntu0.
Candidate: 2.0.2-0ubuntu0.
Version table:
*** 2.0.2-0ubuntu0.
500 http://
100 /var/lib/
2.
500 http://
# apt-cache policy openvswitch-switch
openvswitch-switch:
Installed: 2.0.2-0ubuntu0.
Candidate: 2.0.2-0ubuntu0.
Version table:
*** 2.0.2-0ubuntu0.
500 http://
100 /var/lib/
2.
500 http://
# apt-cache policy neutron-
neutron-
Installed: 1:2014.1.5-0ubuntu7
Candidate: 1:2014.1.5-0ubuntu7
Version table:
*** 1:2014.1.5-0ubuntu7 0
500 http://
100 /var/lib/
1:
500 http://
1:
500 http://
example dmesg:
############## dmesg.201701060019
> [33100.131019] ------------[ cut here ]------------
> [33100.131176] kernel BUG at /build/
> [33100.131424] invalid opcode: 0000 [#1] SMP
> [33100.131560] Modules linked in: xt_nat xt_conntrack ip6table_filter ip6_tables iptable_filter xt_REDIRECT xt_tcpudp iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ip_tables x_tables veth openvswitch gre vxlan ip_tunnel libcrc32c ipmi_devintf gpio_ich cdc_ether x86_pkg_
> [33100.133560] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 3.13.0-106-generic #153-Ubuntu
> [33100.133800] Hardware name: IBM System x3650 M4 : -[7915AC1]
> [33100.134096] task: ffff880469da4800 ti: ffff880469dae000 task.ti: ffff880469dae000
> [33100.134325] RIP: 0010:[<
> [33100.134628] RSP: 0018:ffff88046f
> [33100.134792] RAX: ffff880035d73866 RBX: ffff880461efb600 RCX: ffff880035d73800
> [33100.135011] RDX: 0000000000000210 RSI: 0000000000000214 RDI: ffff88046fd03c98
> [33100.135231] RBP: ffff88046fd03bb0 R08: 0000000000000000 R09: ffff880035d73800
> [33100.135451] R10: ffff880461efb600 R11: 0000000000000000 R12: ffff88046fd03c18
> [33100.135671] R13: ffff880866a88a80 R14: ffff88046fd03c18 R15: ffff880461e49480
> [33100.141118] FS: 000000000000000
> [33100.152198] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [33100.157796] CR2: 00007fc30157d090 CR3: 0000000001c0e000 CR4: 00000000001407e0
> [33100.163382] Stack:
> [33100.168800] ffff88046fd03be0 ffffffffa022bbc5 ffffffff81cdaf00 ffff880461efb600
> [33100.179942] ffffe8fbefd04890 ffff880866a88a80 ffff88046fd03cc8 ffffffffa022a8c5
> [33100.191068] ffffffff81cdaf00 0000000000000001 ffff880866cb70c4 ffff8804541b6180
> [33100.202184] Call Trace:
> [33100.207553] <IRQ>
> [33100.207617]
> [33100.212849] [<ffffffffa022b
> [33100.218139] [<ffffffffa022a
> [33100.228464] [<ffffffffa0230
> [33100.233727] [<ffffffffa0231
> [33100.238898] [<ffffffffa0222
> [33100.243974] [<ffffffffa0222
> [33100.248938] [<ffffffff81666
> [33100.253823] [<ffffffff81666
> [33100.258547] [<ffffffff81665
> [33100.263138] [<ffffffff81666
> [33100.267636] [<ffffffff8162f
> [33100.272134] [<ffffffff8162f
> [33100.276544] [<ffffffff81630
> [33100.280999] [<ffffffff8162f
> [33100.285447] [<ffffffff8106f
> [33100.289886] [<ffffffff81070
> [33100.294224] [<ffffffff81740
> [33100.298433] [<ffffffff81735
> [33100.302613] <EOI>
> [33100.302676]
> [33100.306717] [<ffffffff815dc
> [33100.310816] [<ffffffff815dc
> [33100.314828] [<ffffffff815dc
> [33100.318732] [<ffffffff8101e
> [33100.322479] [<ffffffff810c2
> [33100.326138] [<ffffffff81042
> [33100.329686] Code: a0 e8 8c 86 e3 e0 c6 05 5d 31 00 00 01 eb 11 48 89 d0 8b 16 31 f6 48 8b 38 e8 a4 70 42 e1 eb 05 b8 ea ff ff ff 5d c3 55 48 89 e5 <0f> 0b 0f 1f 44 00 00 55 48 89 e5 0f 0b 55 48 89 e5 0f 0b 00 00
> [33100.340962] RIP [<ffffffffa0232
> [33100.344857] RSP <ffff88046fd03bb0>
############## dmesg.201701080127
[ 911.714512] ------------[ cut here ]------------
[ 911.714670] kernel BUG at /build/
[ 911.714917] invalid opcode: 0000 [#1] SMP
[ 911.715053] Modules linked in: xt_nat xt_conntrack xt_REDIRECT xt_tcpudp ip6table_filter ip6_tables iptable_filter iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ip_tables x_tables veth openvswitch gre vxlan ip_tunnel libcrc32c x86_pkg_
[ 911.717060] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.13.0-106-generic #153-Ubuntu
[ 911.717301] Hardware name: IBM System x3650 M4 : -[7915AC1]
[ 911.717597] task: ffffffff81c15480 ti: ffffffff81c00000 task.ti: ffffffff81c00000
[ 911.717827] RIP: 0010:[<
[ 911.718128] RSP: 0018:ffff88046f
[ 911.718291] RAX: ffff880079de52e6 RBX: ffff880463335000 RCX: ffff880079de5280
[ 911.718511] RDX: 0000000000000210 RSI: 0000000000000214 RDI: ffff88046fc03c98
[ 911.718731] RBP: ffff88046fc03bb0 R08: 0000000000000000 R09: ffff880079de5280
[ 911.718951] R10: ffff880463335000 R11: 0000000000000000 R12: ffff88046fc03c18
[ 911.719171] R13: ffff880468b60c00 R14: ffff88046fc03c18 R15: ffff8804631a0b40
[ 911.724614] FS: 000000000000000
[ 911.735614] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 911.741214] CR2: 00007f1898042d70 CR3: 0000000001c0e000 CR4: 00000000001407f0
[ 911.746800] Stack:
[ 911.752201] ffff88046fc03be0 ffffffffa01bfbc5 ffffffff81cdaf00 ffff880463335000
[ 911.763305] ffffe8fbefc04890 ffff880468b60c00 ffff88046fc03cc8 ffffffffa01be8c5
[ 911.774433] ffffffff81cdaf00 0000000000000001 ffff8804675cf9c4 ffff88045941d380
[ 911.785550] Call Trace:
[ 911.790915] <IRQ>
[ 911.790979]
[ 911.796163] [<ffffffffa01bf
[ 911.801437] [<ffffffffa01be
[ 911.811769] [<ffffffffa01c4
[ 911.817038] [<ffffffffa01c5
[ 911.822211] [<ffffffffa01b6
[ 911.827280] [<ffffffffa01b6
[ 911.832213] [<ffffffff81666
[ 911.837094] [<ffffffff81666
[ 911.841810] [<ffffffff81665
[ 911.846397] [<ffffffff81666
[ 911.850889] [<ffffffff8162f
[ 911.855384] [<ffffffff8162f
[ 911.859796] [<ffffffff81630
[ 911.864208] [<ffffffff8162f
[ 911.868654] [<ffffffff8106f
[ 911.873093] [<ffffffff81070
[ 911.877442] [<ffffffff81740
[ 911.881654] [<ffffffff81735
[ 911.885832] <EOI>
[ 911.885896]
[ 911.889937] [<ffffffff815dc
[ 911.894036] [<ffffffff815dc
[ 911.898017] [<ffffffff815dc
[ 911.901888] [<ffffffff8101e
[ 911.905643] [<ffffffff810c2
[ 911.909308] [<ffffffff8171b
[ 911.912842] [<ffffffff81d34
[ 911.916281] [<ffffffff81d34
[ 911.919767] [<ffffffff81d34
[ 911.923347] [<ffffffff81d34
[ 911.926859] [<ffffffff81d34
[ 911.930305] Code: a0 e8 8c 46 ea e0 c6 05 5d 31 00 00 01 eb 11 48 89 d0 8b 16 31 f6 48 8b 38 e8 a4 30 49 e1 eb 05 b8 ea ff ff ff 5d c3 55 48 89 e5 <0f> 0b 0f 1f 44 00 00 55 48 89 e5 0f 0b 55 48 89 e5 0f 0b 00 00
[ 911.940880] RIP [<ffffffffa01c6
[ 911.944483] RSP <ffff88046fc03bb0>
############## dmesg.201701071542
[23738.192626] ------------[ cut here ]------------
[23738.192782] kernel BUG at /build/
[23738.193031] invalid opcode: 0000 [#1] SMP
[23738.193167] Modules linked in: xt_nat xt_conntrack ip6table_filter ip6_tables iptable_filter xt_REDIRECT xt_tcpudp iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack ip_tables x_tables veth openvswitch gre vxlan ip_tunnel libcrc32c ipmi_devintf gpio_ich x86_pkg_
[23738.195169] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 3.13.0-106-generic #153-Ubuntu
[23738.195410] Hardware name: IBM System x3650 M4 : -[7915AC1]
[23738.195706] task: ffff880869959800 ti: ffff880469da4000 task.ti: ffff880469da4000
[23738.195936] RIP: 0010:[<
[23738.196238] RSP: 0018:ffff88046f
[23738.196402] RAX: ffff880453cad7e6 RBX: ffff88045d1e7200 RCX: ffff880453cad780
[23738.196622] RDX: 0000000000000210 RSI: 0000000000000214 RDI: ffff88046fd03c98
[23738.196842] RBP: ffff88046fd03bb0 R08: 0000000000000000 R09: ffff880453cad780
[23738.197062] R10: ffff88045d1e7200 R11: 0000000000000000 R12: ffff88046fd03c18
[23738.197283] R13: ffff880466dbc0c0 R14: ffff88046fd03c18 R15: ffff880462a32f00
[23738.202738] FS: 000000000000000
[23738.213771] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[23738.219381] CR2: 00007efcd7eee090 CR3: 0000000001c0e000 CR4: 00000000001407e0
[23738.224978] Stack:
[23738.230390] ffff88046fd03be0 ffffffffa023dbc5 ffffffff81cdaf00 ffff88045d1e7200
[23738.241516] ffffe8fbefd04770 ffff880466dbc0c0 ffff88046fd03cc8 ffffffffa023c8c5
[23738.252668] ffffffff81cdaf00 0000000000000001 ffff880462a54244 ffff88045d1c4100
[23738.263818] Call Trace:
[23738.269200] <IRQ>
[23738.269264]
[23738.274454] [<ffffffffa023d
[23738.279737] [<ffffffffa023c
[23738.290071] [<ffffffffa0242
[23738.295339] [<ffffffffa0243
[23738.300513] [<ffffffffa0206
[23738.305587] [<ffffffffa0206
[23738.310531] [<ffffffff81666
[23738.315420] [<ffffffff81666
[23738.320146] [<ffffffff81665
[23738.324743] [<ffffffff81666
[23738.329244] [<ffffffff8162f
[23738.333744] [<ffffffff8162f
[23738.338158] [<ffffffff81630
[23738.342576] [<ffffffff8162f
[23738.347025] [<ffffffff8106f
[23738.351463] [<ffffffff81070
[23738.355804] [<ffffffff81740
[23738.360010] [<ffffffff81735
[23738.364183] <EOI>
[23738.364246]
[23738.368280] [<ffffffff815dc
[23738.372372] [<ffffffff815dc
[23738.376347] [<ffffffff815dc
[23738.380212] [<ffffffff8101e
[23738.383958] [<ffffffff810c2
[23738.387612] [<ffffffff81042
[23738.391156] Code: a0 e8 8c 66 e2 e0 c6 05 5d 31 00 00 01 eb 11 48 89 d0 8b 16 31 f6 48 8b 38 e8 a4 50 41 e1 eb 05 b8 ea ff ff ff 5d c3 55 48 89 e5 <0f> 0b 0f 1f 44 00 00 55 48 89 e5 0f 0b 55 48 89 e5 0f 0b 00 00
[23738.402433] RIP [<ffffffffa0244
[23738.406297] RSP <ffff88046fd03bb0>
#######
Hi all, it looks like the patch I referenced above is indeed aimed at the openvswitch kernel module; I should have looked more closely at the outset. This bug therefore really belongs with ubuntu-kernel and, I believe, specifically with the pre-DKMS openvswitch kernel module.
Looking into the Ubuntu kernel source for net/openvswitch/vport-gre.c:
https://github.com/Canonical-kernel/Ubuntu-kernel/blob/master/net/openvswitch/vport-gre.c
The patch mentioned above at patchwork is not present there.
I am not familiar with the upstream kernel process, but I am looking into it.