Activity log for bug #1922672

Date | Who | What changed | Old value | New value | Message
2021-04-06 08:50:15 | Roi Dayan | bug | | | added bug
2021-04-09 13:25:15 | Stefan Bader | nominated for series | | Ubuntu Focal |
2021-04-09 13:25:15 | Stefan Bader | bug task added | | linux-bluefield (Ubuntu Focal) |
2021-04-09 13:25:46 | Stefan Bader | linux-bluefield (Ubuntu Focal): importance | Undecided | High |
2021-04-09 13:25:46 | Stefan Bader | linux-bluefield (Ubuntu Focal): status | New | Triaged |
2021-04-09 13:25:56 | Stefan Bader | linux-bluefield (Ubuntu): status | New | Invalid |
2021-04-20 14:01:43 | Stefan Bader | description
  Old value:
    Configuring CT offload with OVS and running stress HTTP traffic that opens
    connections, sends a small amount of data and closes them exposes a race
    that can crash the system.

    x86 side:

      # $1, $2: the two interfaces; ingress traffic on each is redirected out of the other
      /etc/init.d/openibd restart
      ifconfig $1 up
      ifconfig $2 up
      tc qdisc del dev $1 ingress
      tc qdisc del dev $2 ingress
      sleep 5
      tc qdisc add dev $1 ingress
      tc qdisc add dev $2 ingress
      tc filter add dev $1 protocol all parent ffff: flower action mirred egress redirect dev $2
      tc filter add dev $2 protocol all parent ffff: flower action mirred egress redirect dev $1
      ip l set dev $1 promisc on
      ip l set dev $2 promisc on

    Arm side:

      # Enable OVS hardware offload and rebuild both bridges from scratch
      ovs-vsctl set Open_vSwitch . other_config:hw-offload=true
      service openvswitch restart
      for br in `ovs-vsctl list-br`; do
          ovs-vsctl del-br $br
      done
      ovs-vsctl add-br ovsbr1
      ovs-vsctl add-port ovsbr1 p0
      ovs-vsctl add-port ovsbr1 pf0hpf
      ovs-vsctl add-br ovsbr2
      ovs-vsctl add-port ovsbr2 p1
      ovs-vsctl add-port ovsbr2 pf1hpf
      # ovsbr1: send untracked IP through conntrack, commit new connections, forward established ones
      ovs-ofctl del-flows ovsbr1
      ovs-ofctl add-flow ovsbr1 arp,actions=normal
      ovs-ofctl add-flow ovsbr1 "table=0, ip,ct_state=-trk actions=ct(table=1)"
      ovs-ofctl add-flow ovsbr1 "table=1, ip,ct_state=+trk+new actions=ct(, commit),normal"
      ovs-ofctl add-flow ovsbr1 "table=1, ip,ct_state=+trk+est actions=normal"

      # ovs-vsctl show
      9b68adbd-406b-4f72-8b4c-312d9379b8b9
          Bridge ovsbr2
              Port ovsbr2
                  Interface ovsbr2
                      type: internal
              Port pf1hpf
                  Interface pf1hpf
              Port p1
                  Interface p1
          Bridge ovsbr1
              Port p0
                  Interface p0
              Port ovsbr1
                  Interface ovsbr1
                      type: internal
              Port pf0hpf
                  Interface pf0hpf
          ovs_version: "2.14.1"

    dmesg:

      [ 1285.179728] Failed to associated timeout policy `ovs_test_tp'
      [ 1587.421221] Unable to handle kernel NULL pointer dereference at virtual address 000000000000004c
      [ 1587.430043] Mem abort info:
      [ 1587.432929]   ESR = 0x96000004
      [ 1587.436025]   EC = 0x25: DABT (current EL), IL = 32 bits
      [ 1587.441377]   SET = 0, FnV = 0
      [ 1587.447279]   EA = 0, S1PTW = 0
      [ 1587.453188] Data abort info:
      [ 1587.458924]   ISV = 0, ISS = 0x00000004
      [ 1587.462977]   CM = 0, WnR = 0
      [ 1587.465945] user pgtable: 4k pages, 48-bit VAs, pgdp=00000003cc276000
      [ 1587.472420] [000000000000004c] pgd=0000000000000000
      [ 1587.477324] Internal error: Oops: 96000004 [#1] PREEMPT SMP
      [ 1587.485641] Modules linked in: act_mirred act_skbedit xt_conntrack xt_MASQUERADE nf_conntrack_netlink xt_addrtype iptable_filter iptable_nat bpfilter br_netfilter bridge xfrm_user xfrm_algo target_core_mod 8021q garp stp mrp llc ovk
      [ 1587.566060] CPU: 2 PID: 2212 Comm: kworker/2:3 Tainted: G OE 5.4.0-1007-bluefield #10-Ubuntu
      [ 1587.578523] Hardware name: https://www.mellanox.com BlueField SoC/BlueField SoC, BIOS BlueField:3.6.0-17-gd17a51a Mar 5 2021
      [ 1587.579483] Unable to handle kernel NULL pointer dereference at virtual address 000000000000006c
      [ 1587.589851] Workqueue: events rht_deferred_worker
      [ 1587.589856] pstate: 80000005 (Nzcv daif -PAN -UAO)
      [ 1587.589859] pc : rhashtable_rehash_table+0xfc/0x410
      [ 1587.589861] lr : rhashtable_rehash_table+0x2e0/0x410
      [ 1587.589862] sp : ffff800013ebbcf0
      [ 1587.589864] x29: ffff800013ebbcf0 x28: ffff0002cd8138b0
      [ 1587.589866] x27: 000000000000004c x26: ffff0002d3c00000
      [ 1587.589873] x25: ffff0002cd8138b1 x24: ffff0002cd800000
      [ 1587.598798] Mem abort info:
      [ 1587.603483] x23: ffff0003ebbf5700 x22: 000000000000270e
      [ 1587.603486] x21: 000000000000004c x20: ffff00030cc8f400
      [ 1587.603487] x19: ffff0002fabf5a28 x18: ffff800008fe8000
      [ 1587.603489] x17: 000000002610869a x16: 00000000e3e01d27
      [ 1587.603491] x15: 0000000000000000 x14: 06000000a46b5000
      [ 1587.603492] x13: 4301001003000030 x12: 0000040000000000
      [ 1587.603494] x11: 0000000000000000 x10: 0000000000000001
      [ 1587.603496] x9 : 0000000020000000 x8 : 0000000000000000
      [ 1587.603497] x7 : 0000000000000001 x6 : ffff0002d3c6c301
      [ 1587.603499] x5 : ffff0002d3c00040 x4 : ffff00030cc8f400
      [ 1587.603501] x3 : 00000000000138b0 x2 : ffff00030cc8f401
      [ 1587.603503] x1 : ffff00030cc8f400 x0 : 0000000000000000
      [ 1587.603505] Call trace:
      [ 1587.603515]  rhashtable_rehash_table+0xfc/0x410
      [ 1587.603517]  rht_deferred_worker+0x18c/0x298
      [ 1587.603523]  process_one_work+0x1c4/0x480
      [ 1587.603531]  worker_thread+0x54/0x430
      [ 1587.603533]  kthread+0x138/0x150
      [ 1587.603537]  ret_from_fork+0x10/0x1c
      [ 1587.603542] Code: d2800014 14000003 aa1b03f4 aa1503fb (f9400375)
      [ 1587.603554] ---[ end trace 8b876994a5c4b259 ]---
      [ 1587.603558] Kernel panic - not syncing: Fatal exception in interrupt
      [ 1587.611162]   ESR = 0x96000004
      [ 1587.911485] SMP: stopping secondary CPUs
      [ 1587.911541] Kernel Offset: disabled
      [ 1587.911545] CPU features: 0x0002,20006008
      [ 1587.911547] Memory Limit: none
      [ 1588.062206] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]---

  New value:
    [SRU Justification]

    = Impact =
    On busy systems, a race between cancelling the timeouts of offloaded
    traffic and those timeouts firing can crash the system.

    = Fix =
    Pick a patch (plus its prerequisite, which only moves code from a local
    file into a header) that sets the timeouts to values large enough that
    they cannot fire accidentally.

    = Testcase =
    See the original description below.
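When running the testcase, it is worth confirming that the connections are really being offloaded, since the race described under Impact concerns offloaded traffic. A minimal sketch, assuming the `conntrack` utility from conntrack-tools is installed on the Arm side and prints an `[OFFLOAD]` flag for entries whose offload bit is set; `count_offloaded` and `watch_offload` are hypothetical helper names, not part of any tool.

```shell
#!/bin/sh
# Sketch only. Assumption: `conntrack -L` (conntrack-tools) marks offloaded
# entries with the "[OFFLOAD]" status flag.

# count_offloaded: read `conntrack -L` output on stdin and print how many
# lines carry the [OFFLOAD] flag (prints 0 when none do).
count_offloaded() {
    # grep -c prints 0 but exits 1 on no match; keep the exit status clean
    grep -c '\[OFFLOAD\]' || true
}

# watch_offload: print a timestamped offloaded-entry count every
# INTERVAL seconds (default 1) until interrupted with Ctrl-C.
watch_offload() {
    interval=${1:-1}
    while :; do
        n=$(conntrack -L 2>/dev/null | count_offloaded)
        printf '%s offloaded=%s\n' "$(date +%T)" "$n"
        sleep "$interval"
    done
}
```

Run `watch_offload 1` in a separate terminal on the Arm side while the HTTP stress traffic is flowing. A count that stays at zero suggests the flows are being handled in software, in which case the testcase is probably not exercising the offload path. `ovs-appctl dpctl/dump-flows type=offloaded` gives a complementary view from the OVS side.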
    = Regression Potential =
    If those large timeouts (set to days, according to the code description)
    never fire and the offload functions do not stop them, traffic could get
    stuck and the system could eventually run out of buffers or memory.

    --- original description ---
    (identical to the Old value above)
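The Fix and Regression Potential entries both rest on one idea: the software timeout of an offloaded connection is pushed so far into the future that it cannot fire while the hardware still owns the flow. The function below is a toy model of that arithmetic, not kernel code. Its thresholds (a one-day horizon, refreshed once less than half a day remains) are assumptions loosely modeled on the upstream `nf_ct_offload_timeout()` helper in the Linux conntrack code; this log does not name the patch, and `refresh_expiry` is a hypothetical name.

```shell
#!/bin/sh
# Toy model (not kernel code). Assumptions for illustration: times are in
# seconds, the horizon is one day, and an entry is refreshed only when less
# than half a day remains before it expires.

DAY=86400

# refresh_expiry NOW EXPIRY: print the (possibly extended) absolute expiry.
refresh_expiry() {
    now=$1
    expiry=$2
    if [ $((expiry - now)) -lt $((DAY / 2)) ]; then
        # Close to expiring: push the deadline a full day past NOW
        echo $((now + DAY))
    else
        # Still far from expiring: leave the deadline alone
        echo "$expiry"
    fi
}
```

For example, `refresh_expiry 1000 1120` prints `87400`: a connection two minutes from expiry is pushed a full day out. The same arithmetic shows the Regression Potential: if the offload code never removes such an entry, it can linger for up to a day and hold its memory the whole time.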
2021-04-20 14:02:11 | Stefan Bader | linux-bluefield (Ubuntu Focal): status | Triaged | In Progress |
2021-04-20 14:02:11 | Stefan Bader | linux-bluefield (Ubuntu Focal): assignee | | Roi Dayan (roidayan) |
2021-04-20 14:34:22 | Stefan Bader | linux-bluefield (Ubuntu Focal): status | In Progress | Fix Committed |
2021-04-24 00:28:30 | Ubuntu Kernel Bot | tags | | verification-needed-focal |
2021-04-28 12:54:34 | Roi Dayan | tags | verification-needed-focal | verification-done-focal |
2021-05-10 19:53:32 | Launchpad Janitor | linux-bluefield (Ubuntu Focal): status | Fix Committed | Fix Released |
2021-05-10 19:53:32 | Launchpad Janitor | cve linked | | 2021-29650 |
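One way to carry out the focal verification recorded above (the verification-needed-focal and verification-done-focal tags) is to rerun the testcase on the proposed kernel and check that the oops does not return. A minimal sketch of the log check, assuming the kernel log from the run is available as text. It looks for two lines taken from the dmesg in the description (the `rhashtable_rehash_table` program counter and the `rht_deferred_worker` workqueue), which is a heuristic rather than a definitive test; `has_rehash_crash` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: heuristic match on the crash signature from this bug's dmesg.

# has_rehash_crash: read a kernel log on stdin; exit 0 if both signature
# lines are present, exit 1 otherwise.
has_rehash_crash() {
    awk '
        /pc : rhashtable_rehash_table/          { pc = 1 }
        /Workqueue: events rht_deferred_worker/ { wq = 1 }
        END { exit !(pc && wq) }
    '
}
```

Because the original crash ends in a kernel panic, the log usually has to come from a serial-console capture or, with a persistent journal, from the previous boot, for example `journalctl -k -b -1 | has_rehash_crash && echo 'crash signature found'`.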