kernel crash with stress CT offload traffic
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
linux-bluefield (Ubuntu) | Invalid | Undecided | Unassigned | |
Focal | Fix Released | High | Roi Dayan | |
Bug Description
[SRU Justification]
= Impact =
On busy systems, cancelling the timeouts of offloaded traffic can race with those timeouts firing, which can crash the system.
= Fix =
Pick the patch that sets timeout values large enough that they cannot fire accidentally, together with its prerequisite, which only moves code from a local source file into a header.
= Testcase =
See original description below.
= Regression Potential =
If these large timeouts (set to days, according to the code description) never expire and the offload functions do not stop them, traffic could get stuck and the system could run out of buffers or memory.
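One way to watch for the stuck-entry scenario described above is to count offloaded conntrack entries before and after the traffic stops. The following is a minimal sketch, not part of the original report. It assumes the kernel exposes /proc/net/nf_conntrack and marks offloaded entries with `[OFFLOAD]` or `[HW_OFFLOAD]` (the exact flag depends on the kernel version); the helper name `count_offloaded` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: count conntrack entries flagged as offloaded.
# $1: path to a conntrack dump; defaults to /proc/net/nf_conntrack.
# Assumption: offloaded entries carry an [OFFLOAD] or [HW_OFFLOAD] flag.
count_offloaded() {
    # grep -c prints 0 and exits non-zero when nothing matches,
    # so "|| true" keeps the helper usable under "set -e".
    grep -Ec '\[(HW_)?OFFLOAD\]' "${1:-/proc/net/nf_conntrack}" || true
}

# Example (run as root on the system under test):
#   count_offloaded
```

If the count stays high long after all connections have been closed, that would point at entries that are neither expiring nor being torn down by the offload code.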
--- original description ---
Configure CT offload with OVS and run stress HTTP traffic that opens connections, sends a short amount of data, and closes the connections. There is a race that can crash the system.
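The report does not say which tool generated the traffic. As a rough sketch of the pattern described (many short-lived HTTP connections), a loop around curl could look like the following. The function name `stress_http`, the URL, and the counts are illustrative assumptions, not the reporter's actual setup.

```shell
#!/bin/sh
# Hypothetical traffic generator: open many short-lived HTTP connections.
# Usage: stress_http URL TOTAL [PARALLEL]
stress_http() {
    url=$1
    total=$2
    parallel=${3:-8}
    i=0
    while [ "$i" -lt "$total" ]; do
        # Each curl process opens a new TCP connection, fetches the
        # page, and closes the connection again.
        curl -s -o /dev/null --max-time 2 "$url" &
        i=$((i + 1))
        # After every batch of $parallel requests, wait for them to finish.
        if [ $((i % parallel)) -eq 0 ]; then
            wait
        fi
    done
    wait
}

# Example (server address is a placeholder):
#   stress_http http://192.0.2.10/ 100000 64
```

Running each request as a separate process keeps every connection short and independent, which is the behaviour the description calls for.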
X86 side:
/etc/init.d/openibd restart
ifconfig $1 up
ifconfig $2 up
tc qdisc del dev $1 ingress
tc qdisc del dev $2 ingress
sleep 5
tc qdisc add dev $1 ingress
tc qdisc add dev $2 ingress
tc filter add dev $1 protocol all parent ffff: flower action mirred egress redirect dev $2
tc filter add dev $2 protocol all parent ffff: flower action mirred egress redirect dev $1
ip l set dev $1 promisc on
ip l set dev $2 promisc on
ARM side:
ovs-vsctl set Open_vSwitch . other_config:
service openvswitch restart
for br in `ovs-vsctl list-br`;
do
ovs-vsctl del-br $br
done
ovs-vsctl add-br ovsbr1
ovs-vsctl add-port ovsbr1 p0
ovs-vsctl add-port ovsbr1 pf0hpf
ovs-vsctl add-br ovsbr2
ovs-vsctl add-port ovsbr2 p1
ovs-vsctl add-port ovsbr2 pf1hpf
ovs-ofctl del-flows ovsbr1
ovs-ofctl add-flow ovsbr1 arp,actions=normal
ovs-ofctl add-flow ovsbr1 "table=0, ip,ct_state=-trk actions=
ovs-ofctl add-flow ovsbr1 "table=1, ip,ct_state=
ovs-ofctl add-flow ovsbr1 "table=1, ip,ct_state=
# ovs-vsctl show
9b68adbd-
Bridge ovsbr2
Port ovsbr2
Port pf1hpf
Port p1
Bridge ovsbr1
Port p0
Port ovsbr1
Port pf0hpf
ovs_version: "2.14.1"
dmesg:
[ 1285.179728] Failed to associated timeout policy `ovs_test_tp'
[ 1587.421221] Unable to handle kernel NULL pointer dereference at virtual address 000000000000004c
[ 1587.430043] Mem abort info:
[ 1587.432929] ESR = 0x96000004
[ 1587.436025] EC = 0x25: DABT (current EL), IL = 32 bits
[ 1587.441377] SET = 0, FnV = 0
[ 1587.447279] EA = 0, S1PTW = 0
[ 1587.453188] Data abort info:
[ 1587.458924] ISV = 0, ISS = 0x00000004
[ 1587.462977] CM = 0, WnR = 0
[ 1587.465945] user pgtable: 4k pages, 48-bit VAs, pgdp=00000003cc
[ 1587.472420] [000000000000004c] pgd=00000000000
[ 1587.477324] Internal error: Oops: 96000004 [#1] PREEMPT SMP
[ 1587.485641] Modules linked in: act_mirred act_skbedit xt_conntrack xt_MASQUERADE nf_conntrack_
[ 1587.566060] CPU: 2 PID: 2212 Comm: kworker/2:3 Tainted: G OE 5.4.0-1007-
[ 1587.578523] Hardware name: https:/
[ 1587.579483] Unable to handle kernel NULL pointer dereference at virtual address 000000000000006c
[ 1587.589851] Workqueue: events rht_deferred_worker
[ 1587.589856] pstate: 80000005 (Nzcv daif -PAN -UAO)
[ 1587.589859] pc : rhashtable_
[ 1587.589861] lr : rhashtable_
[ 1587.589862] sp : ffff800013ebbcf0
[ 1587.589864] x29: ffff800013ebbcf0 x28: ffff0002cd8138b0
[ 1587.589866] x27: 000000000000004c x26: ffff0002d3c00000
[ 1587.589873] x25: ffff0002cd8138b1 x24: ffff0002cd800000
[ 1587.598798] Mem abort info:
[ 1587.603483] x23: ffff0003ebbf5700 x22: 000000000000270e
[ 1587.603486] x21: 000000000000004c x20: ffff00030cc8f400
[ 1587.603487] x19: ffff0002fabf5a28 x18: ffff800008fe8000
[ 1587.603489] x17: 000000002610869a x16: 00000000e3e01d27
[ 1587.603491] x15: 0000000000000000 x14: 06000000a46b5000
[ 1587.603492] x13: 4301001003000030 x12: 0000040000000000
[ 1587.603494] x11: 0000000000000000 x10: 0000000000000001
[ 1587.603496] x9 : 0000000020000000 x8 : 0000000000000000
[ 1587.603497] x7 : 0000000000000001 x6 : ffff0002d3c6c301
[ 1587.603499] x5 : ffff0002d3c00040 x4 : ffff00030cc8f400
[ 1587.603501] x3 : 00000000000138b0 x2 : ffff00030cc8f401
[ 1587.603503] x1 : ffff00030cc8f400 x0 : 0000000000000000
[ 1587.603505] Call trace:
[ 1587.603515] rhashtable_
[ 1587.603517] rht_deferred_
[ 1587.603523] process_
[ 1587.603531] worker_
[ 1587.603533] kthread+0x138/0x150
[ 1587.603537] ret_from_
[ 1587.603542] Code: d2800014 14000003 aa1b03f4 aa1503fb (f9400375)
[ 1587.603554] ---[ end trace 8b876994a5c4b259 ]---
[ 1587.603558] Kernel panic - not syncing: Fatal exception in interrupt
[ 1587.611162] ESR = 0x96000004
[ 1587.911485] SMP: stopping secondary CPUs
[ 1587.911541] Kernel Offset: disabled
[ 1587.911545] CPU features: 0x0002,20006008
[ 1587.911547] Memory Limit: none
[ 1588.062206] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---
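To check whether a saved kernel log shows the same failure, it can be searched for the two markers visible in the trace above: the NULL pointer dereference message and the `rht_deferred_worker` workqueue. This is a small sketch; the function name `check_ct_oops` and the log path in the example are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: report whether a saved kernel log contains the
# crash signature from this report.
# Usage: check_ct_oops LOGFILE
check_ct_oops() {
    log=$1
    # Both markers must be present: the oops message and the workqueue
    # that was running when the crash happened.
    if grep -q 'NULL pointer dereference' "$log" &&
       grep -q 'rht_deferred_worker' "$log"; then
        echo "signature found"
        return 0
    fi
    echo "signature not found"
    return 1
}

# Example:
#   dmesg > /tmp/dmesg.txt
#   check_ct_oops /tmp/dmesg.txt
```

Matching on two strings is only a heuristic; an unrelated oops in the same workqueue would also match.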
Changed in linux-bluefield (Ubuntu Focal):
importance: Undecided → High
status: New → Triaged
Changed in linux-bluefield (Ubuntu):
status: New → Invalid
description: updated
Changed in linux-bluefield (Ubuntu Focal):
assignee: nobody → Roi Dayan (roidayan)
status: Triaged → In Progress
Changed in linux-bluefield (Ubuntu Focal):
status: In Progress → Fix Committed
tags: added: verification-done-focal; removed: verification-needed-focal
There is already a patch in the upstream kernel that solves this. It was tested and will be submitted.
07f8edbfd279 netfilter: flowtable: Set offload timeout when adding flow