kernel crash with stress CT offload traffic

Bug #1922672 reported by Roi Dayan on 2021-04-06
This bug affects 1 person
Affects                   Status    Importance  Assigned to  Milestone
linux-bluefield (Ubuntu)  Invalid   Undecided   Unassigned
  Focal                   Triaged   High        Unassigned

Bug Description

Configuring CT offload with OVS and running stress HTTP traffic that opens connections, sends a small amount of data, and closes them again exposes a race that can crash the kernel.
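
The traffic generator used for the report is not named, so the following is only a minimal sketch of that kind of load, assuming `curl`, `seq`, and an `xargs` that supports `-P` are available on the sending host. The function name `stress_http` and its arguments are hypothetical, and the URL must point at an HTTP server reachable through the offloaded path.

```shell
#!/bin/sh
# Hypothetical helper, not the reporter's tool.
# Usage: stress_http URL TOTAL PARALLEL
# Issues TOTAL short HTTP requests, at most PARALLEL at a time.
# Every curl process opens its own TCP connection and closes it on exit,
# which gives the open / short transfer / close pattern described above.
stress_http() {
    url=$1
    total=$2
    parallel=$3
    # seq prints one line per request; xargs starts one curl per line.
    # Individual request failures are ignored so the run keeps going.
    seq "$total" | xargs -P "$parallel" -I{} \
        curl -s -o /dev/null --max-time 2 "$url" || true
}
```

For example, `stress_http http://192.0.2.10/ 100000 64` would attempt 100000 requests with 64 in flight; the address and the numbers are placeholders, not values from this report.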

X86 side:

/etc/init.d/openibd restart

ifconfig $1 up
ifconfig $2 up

tc qdisc del dev $1 ingress
tc qdisc del dev $2 ingress

sleep 5

tc qdisc add dev $1 ingress
tc qdisc add dev $2 ingress

tc filter add dev $1 protocol all parent ffff: flower action mirred egress redirect dev $2
tc filter add dev $2 protocol all parent ffff: flower action mirred egress redirect dev $1

ip l set dev $1 promisc on
ip l set dev $2 promisc on

Arm side:

ovs-vsctl set Open_vSwitch . other_config:hw-offload=true

service openvswitch restart

for br in $(ovs-vsctl list-br); do
        ovs-vsctl del-br "$br"
done

ovs-vsctl add-br ovsbr1
ovs-vsctl add-port ovsbr1 p0
ovs-vsctl add-port ovsbr1 pf0hpf

ovs-vsctl add-br ovsbr2
ovs-vsctl add-port ovsbr2 p1
ovs-vsctl add-port ovsbr2 pf1hpf

ovs-ofctl del-flows ovsbr1
ovs-ofctl add-flow ovsbr1 arp,actions=normal
ovs-ofctl add-flow ovsbr1 "table=0, ip,ct_state=-trk actions=ct(table=1)"
ovs-ofctl add-flow ovsbr1 "table=1, ip,ct_state=+trk+new actions=ct(, commit),normal"
ovs-ofctl add-flow ovsbr1 "table=1, ip,ct_state=+trk+est actions=normal"

# ovs-vsctl show
9b68adbd-406b-4f72-8b4c-312d9379b8b9
    Bridge ovsbr2
        Port ovsbr2
            Interface ovsbr2
                type: internal
        Port pf1hpf
            Interface pf1hpf
        Port p1
            Interface p1
    Bridge ovsbr1
        Port p0
            Interface p0
        Port ovsbr1
            Interface ovsbr1
                type: internal
        Port pf0hpf
            Interface pf0hpf
    ovs_version: "2.14.1"
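
Before starting the stress traffic it can help to confirm that connections are really being offloaded. The sketch below counts conntrack entries flagged as offloaded; it assumes the kernel exposes `/proc/net/nf_conntrack` and marks such entries with `[OFFLOAD]` or `[HW_OFFLOAD]`. The function name `count_offloaded` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: count conntrack entries flagged as offloaded.
# $1: conntrack table to read (defaults to /proc/net/nf_conntrack).
count_offloaded() {
    # -F matches the literal string, so it covers both [OFFLOAD] and
    # [HW_OFFLOAD]; grep -c prints the number of matching lines.
    # "|| true" keeps the exit status at 0 when the count is 0.
    grep -c -F 'OFFLOAD]' "${1:-/proc/net/nf_conntrack}" || true
}
```

Running `count_offloaded` on the Arm side while traffic is flowing should print a non-zero number if CT offload is active. `ovs-appctl dpctl/dump-flows type=offloaded` gives a complementary view of the offloaded datapath flows.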

dmesg:

[ 1285.179728] Failed to associated timeout policy `ovs_test_tp'
[ 1587.421221] Unable to handle kernel NULL pointer dereference at virtual address 000000000000004c
[ 1587.430043] Mem abort info:
[ 1587.432929] ESR = 0x96000004
[ 1587.436025] EC = 0x25: DABT (current EL), IL = 32 bits
[ 1587.441377] SET = 0, FnV = 0
[ 1587.447279] EA = 0, S1PTW = 0
[ 1587.453188] Data abort info:
[ 1587.458924] ISV = 0, ISS = 0x00000004
[ 1587.462977] CM = 0, WnR = 0
[ 1587.465945] user pgtable: 4k pages, 48-bit VAs, pgdp=00000003cc276000
[ 1587.472420] [000000000000004c] pgd=0000000000000000
[ 1587.477324] Internal error: Oops: 96000004 [#1] PREEMPT SMP
[ 1587.485641] Modules linked in: act_mirred act_skbedit xt_conntrack xt_MASQUERADE nf_conntrack_netlink xt_addrtype iptable_filter iptable_nat bpfilter br_netfilter bridge xfrm_user xfrm_algo target_core_mod 8021q garp stp mrp llc ovk
[ 1587.566060] CPU: 2 PID: 2212 Comm: kworker/2:3 Tainted: G OE 5.4.0-1007-bluefield #10-Ubuntu
[ 1587.578523] Hardware name: https://www.mellanox.com BlueField SoC/BlueField SoC, BIOS BlueField:3.6.0-17-gd17a51a Mar 5 2021
[ 1587.579483] Unable to handle kernel NULL pointer dereference at virtual address 000000000000006c
[ 1587.589851] Workqueue: events rht_deferred_worker
[ 1587.589856] pstate: 80000005 (Nzcv daif -PAN -UAO)
[ 1587.589859] pc : rhashtable_rehash_table+0xfc/0x410
[ 1587.589861] lr : rhashtable_rehash_table+0x2e0/0x410
[ 1587.589862] sp : ffff800013ebbcf0
[ 1587.589864] x29: ffff800013ebbcf0 x28: ffff0002cd8138b0
[ 1587.589866] x27: 000000000000004c x26: ffff0002d3c00000
[ 1587.589873] x25: ffff0002cd8138b1 x24: ffff0002cd800000
[ 1587.598798] Mem abort info:
[ 1587.603483] x23: ffff0003ebbf5700 x22: 000000000000270e
[ 1587.603486] x21: 000000000000004c x20: ffff00030cc8f400
[ 1587.603487] x19: ffff0002fabf5a28 x18: ffff800008fe8000
[ 1587.603489] x17: 000000002610869a x16: 00000000e3e01d27
[ 1587.603491] x15: 0000000000000000 x14: 06000000a46b5000
[ 1587.603492] x13: 4301001003000030 x12: 0000040000000000
[ 1587.603494] x11: 0000000000000000 x10: 0000000000000001
[ 1587.603496] x9 : 0000000020000000 x8 : 0000000000000000
[ 1587.603497] x7 : 0000000000000001 x6 : ffff0002d3c6c301
[ 1587.603499] x5 : ffff0002d3c00040 x4 : ffff00030cc8f400
[ 1587.603501] x3 : 00000000000138b0 x2 : ffff00030cc8f401
[ 1587.603503] x1 : ffff00030cc8f400 x0 : 0000000000000000
[ 1587.603505] Call trace:
[ 1587.603515] rhashtable_rehash_table+0xfc/0x410
[ 1587.603517] rht_deferred_worker+0x18c/0x298
[ 1587.603523] process_one_work+0x1c4/0x480
[ 1587.603531] worker_thread+0x54/0x430
[ 1587.603533] kthread+0x138/0x150
[ 1587.603537] ret_from_fork+0x10/0x1c
[ 1587.603542] Code: d2800014 14000003 aa1b03f4 aa1503fb (f9400375)
[ 1587.603554] ---[ end trace 8b876994a5c4b259 ]---
[ 1587.603558] Kernel panic - not syncing: Fatal exception in interrupt
[ 1587.611162] ESR = 0x96000004
[ 1587.911485] SMP: stopping secondary CPUs
[ 1587.911541] Kernel Offset: disabled
[ 1587.911545] CPU features: 0x0002,20006008
[ 1587.911547] Memory Limit: none
[ 1588.062206] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---

Roi Dayan (roidayan) wrote :

There is already a patch in the upstream kernel that fixes this. It has been tested and will be submitted.
07f8edbfd279 netfilter: flowtable: Set offload timeout when adding flow

Stefan Bader (smb) on 2021-04-09
Changed in linux-bluefield (Ubuntu Focal):
importance: Undecided → High
status: New → Triaged
Changed in linux-bluefield (Ubuntu):
status: New → Invalid