Activity log for bug #1799393

Date Who What changed Old value New value Message
2018-10-23 09:39:25 bugproxy bug added bug
2018-10-23 09:39:27 bugproxy tags architecture-ppc64le bugnameltc-172460 severity-critical targetmilestone-inin---
2018-10-23 09:39:28 bugproxy ubuntu: assignee Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage)
2018-10-23 09:39:30 bugproxy affects ubuntu linux (Ubuntu)
2018-10-23 09:50:07 Frank Heimes bug task added ubuntu-power-systems
2018-10-23 09:50:23 Frank Heimes ubuntu-power-systems: importance Undecided Critical
2018-10-23 09:50:37 Frank Heimes ubuntu-power-systems: assignee Canonical Kernel Team (canonical-kernel-team)
2018-10-23 18:30:32 Joseph Salisbury linux (Ubuntu): importance Undecided Critical
2018-10-23 18:30:38 Joseph Salisbury linux (Ubuntu): assignee Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage) Joseph Salisbury (jsalisbury)
2018-10-23 18:30:42 Joseph Salisbury linux (Ubuntu): status New In Progress
2018-10-23 19:22:06 Frank Heimes ubuntu-power-systems: status New In Progress
2018-10-31 18:00:59 Joseph Salisbury nominated for series Ubuntu Cosmic
2018-10-31 18:00:59 Joseph Salisbury bug task added linux (Ubuntu Cosmic)
2018-10-31 18:01:06 Joseph Salisbury linux (Ubuntu Cosmic): status New In Progress
2018-10-31 18:01:09 Joseph Salisbury linux (Ubuntu Cosmic): importance Undecided Critical
2018-10-31 18:01:11 Joseph Salisbury linux (Ubuntu Cosmic): assignee Joseph Salisbury (jsalisbury)
2018-10-31 18:05:19 Joseph Salisbury description
== Comment: #0 - Michael Ranweiler - 2018-10-18 11:34:40 ==
---Problem Description---
At the system if u do
ethtool -S enP48p1s0f0 | grep wqe_err
     rx_wqe_err: 1
     rx0_wqe_err: 0
     rx1_wqe_err: 0
     rx2_wqe_err: 0
     rx3_wqe_err: 1
     rx4_wqe_err: 0
     rx5_wqe_err: 0
     rx6_wqe_err: 0
     rx7_wqe_err: 0
     rx8_wqe_err: 0
     rx9_wqe_err: 0
     rx10_wqe_err: 0
     rx11_wqe_err: 0
     rx12_wqe_err: 0
     rx13_wqe_err: 0
     rx14_wqe_err: 0
     rx15_wqe_err: 0
Will see that rx side is hitting issue.
---Additional Hardware Info---
Mellanox CX5 Ethernet 100G
lspci
0030:01:00.0 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex]
0030:01:00.1 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex]
Machine Type = P9
---Debugger---
A debugger is not configured
---Steps to Reproduce---
Using a CX5 Ethernet 100G card
lspci
0030:01:00.0 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex]
0030:01:00.1 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex]
just configure IP
ifconfig enP48p1s0f0 33.33.33.33 netmask 255.255.255.0 up
then partner system configure IP and then try ping -f
ping -f 33.33.33.33
PING 33.33.33.33 (33.33.33.33) 56(84) bytes of data.
........................................^C
--- 33.33.33.33 ping statistics ---
5413 packets transmitted, 5373 received, 0% packet loss, time 934ms
rtt min/avg/max/mdev = 0.015/0.019/0.669/0.010 ms, ipg/ewma 0.172/0.020 ms
# ping 33.33.33.33
PING 33.33.33.33 (33.33.33.33) 56(84) bytes of data.
^C
--- 33.33.33.33 ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 1071ms
then at the recv system then do
ethtool -S enP48p1s0f0 | grep wqe_err
     rx_wqe_err: 1
     rx0_wqe_err: 0
     rx1_wqe_err: 0
     rx2_wqe_err: 0
     rx3_wqe_err: 1
     rx4_wqe_err: 0
     rx5_wqe_err: 0
     rx6_wqe_err: 0
     rx7_wqe_err: 0
     rx8_wqe_err: 0
     rx9_wqe_err: 0
     rx10_wqe_err: 0
     rx11_wqe_err: 0
     rx12_wqe_err: 0
     rx13_wqe_err: 0
     rx14_wqe_err: 0
     rx15_wqe_err: 0
you will see rx_wqe_err with a counter non-zero.
This is fixed by this patch:
https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/commit/?id=37fdffb217a45609edccbb8b407d031143f551c0
== Comment: #1 - Carol L. Soto - 2018-10-18 11:46:00 ==
I did a git clone to the cosmic tree and loaded the kernel in a system. kernel 4.18.12 and I can recreate it.
lspci | grep Mell | grep ConnectX-5
0000:01:00.0 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex]
0000:01:00.1 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex]
0030:01:00.0 Infiniband controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex]
0030:01:00.1 Infiniband controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex]
:~# ethtool -S enp1s0f0 | grep wqe_err
     rx_wqe_err: 2
     rx0_wqe_err: 1
     rx1_wqe_err: 1
     rx2_wqe_err: 0
     rx3_wqe_err: 0
     rx4_wqe_err: 0
     rx5_wqe_err: 0
     rx6_wqe_err: 0
     rx7_wqe_err: 0
     rx8_wqe_err: 0
     rx9_wqe_err: 0
     rx10_wqe_err: 0
...
Let me check if the proposed patch needs backport or not.
== Comment: #3 - Carol L. Soto - 2018-10-18 13:34:46 ==
I was able to apply the proposed patch as it to the cosmic git tree and no issue. (no need to backport) using a kernel 4.18.12+. With the proposed patch I do not see wqe err and ping does not stop.
ethtool -S enp1s0f0 | grep wqe_err
     rx_wqe_err: 0
     rx0_wqe_err: 0
     rx1_wqe_err: 0
     rx2_wqe_err: 0
     rx3_wqe_err: 0
     rx4_wqe_err: 0
     rx5_wqe_err: 0
     rx6_wqe_err: 0
     rx7_wqe_err: 0
     rx8_wqe_err: 0
     rx9_wqe_err: 0
     rx10_wqe_err: 0
...
== SRU Justification ==
The requested commit fixes a regression introduced by mainline commit 3a2f70331226, in v4.18-rc1. The commit is only needed in Cosmic. Due to the regression, a Mellanox CX5 stops pinging with rx_wqe_err (mlx5_core).
== Fix ==
37fdffb217a4 ("net/mlx5: WQ, fixes for fragmented WQ buffers API")
== Regression Potential ==
Low. This commit has been cc'd to stable, so it has had additional upstream review.
== Test Case ==
A test kernel was built with this patch and tested by the original bug reporter. The bug reporter states the test kernel resolved the bug.
== Comment: #0 - Michael Ranweiler - 2018-10-18 11:34:40 ==
---Problem Description---
At the system if u do
ethtool -S enP48p1s0f0 | grep wqe_err
     rx_wqe_err: 1
     rx0_wqe_err: 0
     rx1_wqe_err: 0
     rx2_wqe_err: 0
     rx3_wqe_err: 1
     rx4_wqe_err: 0
     rx5_wqe_err: 0
     rx6_wqe_err: 0
     rx7_wqe_err: 0
     rx8_wqe_err: 0
     rx9_wqe_err: 0
     rx10_wqe_err: 0
     rx11_wqe_err: 0
     rx12_wqe_err: 0
     rx13_wqe_err: 0
     rx14_wqe_err: 0
     rx15_wqe_err: 0
Will see that rx side is hitting issue.
---Additional Hardware Info---
Mellanox CX5 Ethernet 100G
lspci
0030:01:00.0 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex]
0030:01:00.1 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex]
Machine Type = P9
---Debugger---
A debugger is not configured
---Steps to Reproduce---
Using a CX5 Ethernet 100G card
lspci
0030:01:00.0 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex]
0030:01:00.1 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex]
just configure IP
ifconfig enP48p1s0f0 33.33.33.33 netmask 255.255.255.0 up
then partner system configure IP and then try ping -f
ping -f 33.33.33.33
PING 33.33.33.33 (33.33.33.33) 56(84) bytes of data.
........................................^C
--- 33.33.33.33 ping statistics ---
5413 packets transmitted, 5373 received, 0% packet loss, time 934ms
rtt min/avg/max/mdev = 0.015/0.019/0.669/0.010 ms, ipg/ewma 0.172/0.020 ms
# ping 33.33.33.33
PING 33.33.33.33 (33.33.33.33) 56(84) bytes of data.
^C
--- 33.33.33.33 ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 1071ms
then at the recv system then do
ethtool -S enP48p1s0f0 | grep wqe_err
     rx_wqe_err: 1
     rx0_wqe_err: 0
     rx1_wqe_err: 0
     rx2_wqe_err: 0
     rx3_wqe_err: 1
     rx4_wqe_err: 0
     rx5_wqe_err: 0
     rx6_wqe_err: 0
     rx7_wqe_err: 0
     rx8_wqe_err: 0
     rx9_wqe_err: 0
     rx10_wqe_err: 0
     rx11_wqe_err: 0
     rx12_wqe_err: 0
     rx13_wqe_err: 0
     rx14_wqe_err: 0
     rx15_wqe_err: 0
you will see rx_wqe_err with a counter non-zero.
This is fixed by this patch:
https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/commit/?id=37fdffb217a45609edccbb8b407d031143f551c0
== Comment: #1 - Carol L. Soto - 2018-10-18 11:46:00 ==
I did a git clone to the cosmic tree and loaded the kernel in a system. kernel 4.18.12 and I can recreate it.
lspci | grep Mell | grep ConnectX-5
0000:01:00.0 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex]
0000:01:00.1 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex]
0030:01:00.0 Infiniband controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex]
0030:01:00.1 Infiniband controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex]
:~# ethtool -S enp1s0f0 | grep wqe_err
     rx_wqe_err: 2
     rx0_wqe_err: 1
     rx1_wqe_err: 1
     rx2_wqe_err: 0
     rx3_wqe_err: 0
     rx4_wqe_err: 0
     rx5_wqe_err: 0
     rx6_wqe_err: 0
     rx7_wqe_err: 0
     rx8_wqe_err: 0
     rx9_wqe_err: 0
     rx10_wqe_err: 0
...
Let me check if the proposed patch needs backport or not.
== Comment: #3 - Carol L. Soto - 2018-10-18 13:34:46 ==
I was able to apply the proposed patch as it to the cosmic git tree and no issue. (no need to backport) using a kernel 4.18.12+. With the proposed patch I do not see wqe err and ping does not stop.
ethtool -S enp1s0f0 | grep wqe_err
     rx_wqe_err: 0
     rx0_wqe_err: 0
     rx1_wqe_err: 0
     rx2_wqe_err: 0
     rx3_wqe_err: 0
     rx4_wqe_err: 0
     rx5_wqe_err: 0
     rx6_wqe_err: 0
     rx7_wqe_err: 0
     rx8_wqe_err: 0
     rx9_wqe_err: 0
     rx10_wqe_err: 0
...
2018-11-07 06:53:42 Khaled El Mously linux (Ubuntu Cosmic): status In Progress Fix Committed
2018-11-12 15:45:17 Frank Heimes linux (Ubuntu): status In Progress Fix Committed
2018-11-12 15:45:20 Frank Heimes ubuntu-power-systems: status In Progress Fix Committed
2018-11-15 11:04:11 Brad Figg tags architecture-ppc64le bugnameltc-172460 severity-critical targetmilestone-inin--- architecture-ppc64le bugnameltc-172460 severity-critical targetmilestone-inin--- verification-needed-cosmic
2018-11-16 03:19:21 bugproxy tags architecture-ppc64le bugnameltc-172460 severity-critical targetmilestone-inin--- verification-needed-cosmic architecture-ppc64le bugnameltc-172460 severity-critical targetmilestone-inin--- verification-done-cosmic
2018-12-03 08:49:32 Launchpad Janitor linux (Ubuntu Cosmic): status Fix Committed Fix Released
2018-12-03 08:49:32 Launchpad Janitor cve linked 2018-18653
2018-12-03 08:49:32 Launchpad Janitor cve linked 2018-18955
2018-12-03 08:49:32 Launchpad Janitor cve linked 2018-6559
2018-12-03 14:44:55 Andrew Cloke linux (Ubuntu): status Fix Committed Fix Released
2018-12-03 14:44:57 Andrew Cloke ubuntu-power-systems: status Fix Committed Fix Released
2019-02-14 12:13:11 Brad Figg tags architecture-ppc64le bugnameltc-172460 severity-critical targetmilestone-inin--- verification-done-cosmic architecture-ppc64le bugnameltc-172460 severity-critical targetmilestone-inin--- verification-done-cosmic verification-needed-bionic
2019-02-14 15:00:37 bugproxy tags architecture-ppc64le bugnameltc-172460 severity-critical targetmilestone-inin--- verification-done-cosmic verification-needed-bionic architecture-ppc64le bugnameltc-172460 severity-critical targetmilestone-inin--- verification-done-bionic verification-done-cosmic
2019-02-18 15:55:59 Jerry Clement bug added subscriber Jerry Clement
2019-07-24 20:54:15 Brad Figg tags architecture-ppc64le bugnameltc-172460 severity-critical targetmilestone-inin--- verification-done-bionic verification-done-cosmic architecture-ppc64le bugnameltc-172460 cscc severity-critical targetmilestone-inin--- verification-done-bionic verification-done-cosmic
2019-12-06 00:09:28 bugproxy tags architecture-ppc64le bugnameltc-172460 cscc severity-critical targetmilestone-inin--- verification-done-bionic verification-done-cosmic architecture-ppc64le bugnameltc-172460 cscc severity-critical targetmilestone-inin1810 verification-done-bionic verification-done-cosmic