vmxnet3 driver could cause kernel panic with v4.4 if LRO enabled.
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
linux (Ubuntu) | Fix Released | Medium | Unassigned | |
Xenial | Fix Released | Medium | Eric Desrochers | |
Bug Description
[Impact]
It has been brought to my attention that a Trusty VMware virtual machine running kernel v4.4.0-36 crashed with the following stack trace:
PANIC: "kernel BUG at /build/
...
#0 [ffff88042d683aa0] machine_kexec at ffffffff8105987c
#1 [ffff88042d683af8] crash_kexec at ffffffff81105d23
#2 [ffff88042d683bc0] oops_end at ffffffff81030a79
#3 [ffff88042d683be8] die at ffffffff81030f7b
#4 [ffff88042d683c18] do_trap at ffffffff8102e04d
#5 [ffff88042d683c68] do_error_trap at ffffffff8102e5a7
#6 [ffff88042d683d20] do_invalid_op at ffffffff8102e840
#7 [ffff88042d683d30] invalid_op at ffffffff817f900e
[exception RIP: vmxnet3_
RIP: ffffffffc004e448 RSP: ffff88042d683de8 RFLAGS: 00010246
RAX: 0000000000000001 RBX: ffff880424099668 RCX: 0000000000000000
RDX: 00000000000005f2 RSI: 00000000000005f2 RDI: ffff88042a61f400
RBP: ffff88042d683e50 R8: 0000000000000000 R9: 0000000000000000
R10: ffff88042902b470 R11: ffff8804293406a8 R12: ffff880424098840
R13: ffff880424099580 R14: ffff88042a61ec00 R15: ffff88042933ae00
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0000
#8 [ffff88042d683de0] vmxnet3_
#9 [ffff88042d683e58] vmxnet3_
#10 [ffff88042d683e90] net_rx_action at ffffffff816f3544
#11 [ffff88042d683f00] __do_softirq at ffffffff81081e7d
#12 [ffff88042d683f68] irq_exit at ffffffff81082255
#13 [ffff88042d683f78] do_IRQ at ffffffff817f9ee6
--- <IRQ stack> ---
#14 [ffff880426c73f30] ret_from_intr at ffffffff817f7fc2
[exception RIP: unknown or invalid address]
RIP: fffffffffffffffb RSP: 00007fe17e59bf48 RFLAGS: 00000001
RAX: 00007fe18564ed58 RBX: 00007fe2064ce848 RCX: 00007fe185612d60
RDX: 00007fe20b47eb30 RSI: 00007fe185640d38 RDI: 00007fe18564ed50
RBP: ffffffff817f7fe5 R8: 00007fe185100068 R9: 0000000000037ce0
R10: 0000000000134ad8 R11: 00007fe17e4b7028 R12: 00007fe185100068
R13: 00007fe185632380 R14: 0000000000000000 R15: ffffffff81003a64
ORIG_RAX: 0000000000000001 CS: 7fe185640d38 SS: ffffffffffffff91
bt: WARNING: possibly bogus exception frame
RIP: 00000000004e92bb RSP: 00007fe20b47ea40 RFLAGS: 00000283
RAX: 0000000000000001 RBX: 00007fe18564ed58 RCX: fffffffffffffffb
RDX: 00007fe185640d38 RSI: 0000000000000001 RDI: 00007fe17e59bf48
RBP: 00007fe185100068 R8: 00007fe18564ed50 R9: 00007fe185640d38
R10: 00007fe20b47eb30 R11: 00007fe185612d60 R12: 0000000000037ce0
R13: 0000000000134ad8 R14: 00007fe17e4b7028 R15: 00007fe2064ce848
ORIG_RAX: ffffffffffffff91 CS: 0033 SS: 002b
[Test Case]
* There is no reliable reproducer. The crash occurred intermittently, whenever segCnt == 1, on a Trusty VMware virtual machine running the Xenial kernel with LRO enabled in the VMware environment.
[Regression Potential]
* none expected
* The commit can be found in upstream linux stable
* The Yakkety and Zesty kernels already have the patch
[Other Info]
* Upstream commit :
5021953 vmxnet3: segCnt can be 1 for LRO packets
Changed in linux (Ubuntu):
status: Confirmed → In Progress
summary:
- vmxnet3 driver causes kernel panic w/ kernel v4.4
+ vmxnet3 driver could cause kernel panic with v4.4 if LRO enabled.
Changed in linux (Ubuntu Xenial):
status: New → In Progress
importance: Undecided → Medium
Changed in linux (Ubuntu Xenial):
assignee: nobody → Eric Desrochers (slashd)
Changed in linux (Ubuntu):
assignee: Eric Desrochers (slashd) → nobody
status: In Progress → Fix Released
description: updated
description: updated
Changed in linux (Ubuntu Xenial):
status: In Progress → Fix Committed
tags: added: verification-done-xenial removed: verification-needed-xenial
Note that the affected system has LRO turned on.
The system crashed on :
#7 [ffff88042d683d30] invalid_op at ffffffff817f900e
[exception RIP: vmxnet3_rq_rx_complete+3016]
which is referring to line 1353 in "drivers/net/vmxnet3/vmxnet3_drv.c" :
0xffffffffc004e448 is in vmxnet3_rq_rx_complete (drivers/net/vmxnet3/vmxnet3_drv.c:1353).
1348 rcd->type == VMXNET3_CDTYPE_RXCOMP_LRO) {
1349 struct Vmxnet3_RxCompDescExt *rcdlro;
1350 rcdlro = (struct Vmxnet3_RxCompDescExt *)rcd;
1351
1352 segCnt = rcdlro->segCnt;
==> 1353 BUG_ON(segCnt <= 1);
1354 mss = rcdlro->mss;
1355 if (unlikely(segCnt <= 1))
1356 segCnt = 0;
1357 } else {
BUG_ON(condition) is used as a debugging aid: if the condition is true, the kernel treats it as a fatal bug.
Here BUG_ON fires if segCnt is less than or equal to (<=) 1.
segCnt is the "Number of aggregated packets" :
# drivers/net/vmxnet3/vmxnet3_defs.h
u8 segCnt; /* Number of aggregated packets */
Looking at the crash dump, I can confirm that at the moment of the crash segCnt was set to 1 :
crash> * Vmxnet3_RxCompDescExt.segCnt ffff88042933ae00
segCnt = 1 '\001'
According to commit "50219538ffc0493a2b451a3aa0191138ef8bfe9d", segCnt can be 1 for LRO packets, and the commit introduces the following change :
- BUG_ON(segCnt <= 1);
+ WARN_ON_ONCE(segCnt == 0);
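The effect of this one-line change can be illustrated with a small userspace model. This is only a sketch, not driver code. The enum and the function names are hypothetical, and the kernel macros are replaced by return values so that the "fatal" case can be observed without crashing anything.

```c
#include <stdint.h>

/* Hypothetical outcome of the sanity check on segCnt. */
enum verdict {
    VERDICT_OK,    /* check passes silently */
    VERDICT_WARN,  /* models WARN_ON_ONCE: log a warning, keep running */
    VERDICT_BUG    /* models BUG_ON: fatal, the kernel panics */
};

/* Models the old check: BUG_ON(segCnt <= 1). */
enum verdict check_before_fix(uint8_t seg_cnt)
{
    return seg_cnt <= 1 ? VERDICT_BUG : VERDICT_OK;
}

/* Models the new check: WARN_ON_ONCE(segCnt == 0). */
enum verdict check_after_fix(uint8_t seg_cnt)
{
    return seg_cnt == 0 ? VERDICT_WARN : VERDICT_OK;
}

/*
 * Models lines 1355-1356 of the listing above:
 *     if (unlikely(segCnt <= 1))
 *         segCnt = 0;
 * With the old check in place this branch could never be reached
 * with segCnt <= 1, because BUG_ON had already fired.
 */
uint8_t effective_seg_cnt(uint8_t seg_cnt)
{
    return seg_cnt <= 1 ? 0 : seg_cnt;
}
```

With the value seen in the crash dump (segCnt = 1), check_before_fix reports the fatal case, while check_after_fix passes and effective_seg_cnt resets the count to 0, so the existing fallback on lines 1355-1356 is taken instead of a panic.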
[2] - commit 50219538ffc0493a2b451a3aa0191138ef8bfe9d
--
Author: Shrikrishna Khare <email address hidden>
Date: Wed Jun 8 07:40:53 2016 -0700
vmxnet3: segCnt can be 1 for LRO packets
The device emulation may send segCnt of 1 for LRO packets.
Signed-off-by: Shrikrishna Khare <email address hidden>
---