2018-10-23 09:39:25 |
bugproxy |
bug |
|
|
added bug |
2018-10-23 09:39:27 |
bugproxy |
tags |
|
architecture-ppc64le bugnameltc-172460 severity-critical targetmilestone-inin--- |
|
2018-10-23 09:39:28 |
bugproxy |
ubuntu: assignee |
|
Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage) |
|
2018-10-23 09:39:30 |
bugproxy |
affects |
ubuntu |
linux (Ubuntu) |
|
2018-10-23 09:50:07 |
Frank Heimes |
bug task added |
|
ubuntu-power-systems |
|
2018-10-23 09:50:23 |
Frank Heimes |
ubuntu-power-systems: importance |
Undecided |
Critical |
|
2018-10-23 09:50:37 |
Frank Heimes |
ubuntu-power-systems: assignee |
|
Canonical Kernel Team (canonical-kernel-team) |
|
2018-10-23 18:30:32 |
Joseph Salisbury |
linux (Ubuntu): importance |
Undecided |
Critical |
|
2018-10-23 18:30:38 |
Joseph Salisbury |
linux (Ubuntu): assignee |
Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage) |
Joseph Salisbury (jsalisbury) |
|
2018-10-23 18:30:42 |
Joseph Salisbury |
linux (Ubuntu): status |
New |
In Progress |
|
2018-10-23 19:22:06 |
Frank Heimes |
ubuntu-power-systems: status |
New |
In Progress |
|
2018-10-31 18:00:59 |
Joseph Salisbury |
nominated for series |
|
Ubuntu Cosmic |
|
2018-10-31 18:00:59 |
Joseph Salisbury |
bug task added |
|
linux (Ubuntu Cosmic) |
|
2018-10-31 18:01:06 |
Joseph Salisbury |
linux (Ubuntu Cosmic): status |
New |
In Progress |
|
2018-10-31 18:01:09 |
Joseph Salisbury |
linux (Ubuntu Cosmic): importance |
Undecided |
Critical |
|
2018-10-31 18:01:11 |
Joseph Salisbury |
linux (Ubuntu Cosmic): assignee |
|
Joseph Salisbury (jsalisbury) |
|
2018-10-31 18:05:19 |
Joseph Salisbury |
description |
== Comment: #0 - Michael Ranweiler - 2018-10-18 11:34:40 ==
---Problem Description---
On the system, if you run:
ethtool -S enP48p1s0f0 | grep wqe_err
rx_wqe_err: 1
rx0_wqe_err: 0
rx1_wqe_err: 0
rx2_wqe_err: 0
rx3_wqe_err: 1
rx4_wqe_err: 0
rx5_wqe_err: 0
rx6_wqe_err: 0
rx7_wqe_err: 0
rx8_wqe_err: 0
rx9_wqe_err: 0
rx10_wqe_err: 0
rx11_wqe_err: 0
rx12_wqe_err: 0
rx13_wqe_err: 0
rx14_wqe_err: 0
rx15_wqe_err: 0
you will see that the RX side is hitting the issue.
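The manual `grep` check above can be turned into a small pass/fail helper. This is a minimal sketch, assuming the `ethtool -S` output format shown in this report (`rx_wqe_err:` for the aggregate counter and `rxN_wqe_err:` per ring); the function name `wqe_err_check` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: report non-zero RX WQE error counters.
# Reads `ethtool -S <iface>` output on stdin, prints every rx*_wqe_err
# counter that is non-zero, and exits 1 if any was found (0 otherwise).
wqe_err_check() {
    awk '
        # Matches the aggregate "rx_wqe_err:" and per-ring "rx3_wqe_err:" lines.
        $1 ~ /^rx[0-9]*_wqe_err:$/ {
            if ($2 + 0 > 0) { print $1, $2; bad = 1 }
        }
        END { exit (bad ? 1 : 0) }
    '
}
```

Usage: `ethtool -S enP48p1s0f0 | wqe_err_check || echo "RX WQE errors detected"`.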
---Additional Hardware Info---
Mellanox CX5 Ethernet 100G
lspci
0030:01:00.0 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex]
0030:01:00.1 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex]
Machine Type = P9
---Debugger---
A debugger is not configured
---Steps to Reproduce---
Using a CX5 Ethernet 100G card
lspci
0030:01:00.0 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex]
0030:01:00.1 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex]
Just configure an IP address:
ifconfig enP48p1s0f0 33.33.33.33 netmask 255.255.255.0 up
Then configure an IP address on the partner system and run ping -f:
ping -f 33.33.33.33
PING 33.33.33.33 (33.33.33.33) 56(84) bytes of data.
........................................^C
--- 33.33.33.33 ping statistics ---
5413 packets transmitted, 5373 received, 0% packet loss, time 934ms
rtt min/avg/max/mdev = 0.015/0.019/0.669/0.010 ms, ipg/ewma 0.172/0.020 ms
# ping 33.33.33.33
PING 33.33.33.33 (33.33.33.33) 56(84) bytes of data.
^C
--- 33.33.33.33 ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 1071ms
Then, on the receiving system, run:
ethtool -S enP48p1s0f0 | grep wqe_err
rx_wqe_err: 1
rx0_wqe_err: 0
rx1_wqe_err: 0
rx2_wqe_err: 0
rx3_wqe_err: 1
rx4_wqe_err: 0
rx5_wqe_err: 0
rx6_wqe_err: 0
rx7_wqe_err: 0
rx8_wqe_err: 0
rx9_wqe_err: 0
rx10_wqe_err: 0
rx11_wqe_err: 0
rx12_wqe_err: 0
rx13_wqe_err: 0
rx14_wqe_err: 0
rx15_wqe_err: 0
You will see a non-zero rx_wqe_err counter.
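Because the symptom is an rx_wqe_err counter that grows while the partner floods this host, a before/after snapshot on the receiving side makes the reproduction easier to judge. A sketch, assuming the interface name from this report and an arbitrary 30-second window; `rx_wqe_err_total` and `watch_wqe_err` are hypothetical helpers.

```shell
#!/bin/sh
# Hypothetical helper: print the aggregate rx_wqe_err value from
# `ethtool -S` output on stdin (prints 0 if the counter is absent).
rx_wqe_err_total() {
    awk '$1 == "rx_wqe_err:" { v = $2 } END { print v + 0 }'
}

# Hypothetical receiver-side check. Defaults: the interface named in this
# report and an arbitrary 30 s window in which the partner runs `ping -f`.
# Returns 0 if rx_wqe_err did not grow, 1 if it did.
# Note: plain POSIX sh has no `local`, so these variables are global.
watch_wqe_err() {
    iface=${1:-enP48p1s0f0}
    window=${2:-30}
    before=$(ethtool -S "$iface" | rx_wqe_err_total)
    echo "rx_wqe_err before: $before (start 'ping -f' on the partner now)"
    sleep "$window"
    after=$(ethtool -S "$iface" | rx_wqe_err_total)
    echo "rx_wqe_err after:  $after"
    # A growing counter matches the symptom reported in this bug.
    [ "$after" -le "$before" ]
}
```

Usage: `watch_wqe_err enP48p1s0f0 30 || echo "rx_wqe_err increased"`.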
This is fixed by this patch:
https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/commit/?id=37fdffb217a45609edccbb8b407d031143f551c0
== Comment: #1 - Carol L. Soto - 2018-10-18 11:46:00 ==
I cloned the Cosmic tree and loaded its kernel (4.18.12) on a system,
and I can recreate the issue.
lspci | grep Mell | grep ConnectX-5
0000:01:00.0 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex]
0000:01:00.1 Ethernet controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex]
0030:01:00.0 Infiniband controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex]
0030:01:00.1 Infiniband controller: Mellanox Technologies MT28800 Family [ConnectX-5 Ex]
:~# ethtool -S enp1s0f0 | grep wqe_err
rx_wqe_err: 2
rx0_wqe_err: 1
rx1_wqe_err: 1
rx2_wqe_err: 0
rx3_wqe_err: 0
rx4_wqe_err: 0
rx5_wqe_err: 0
rx6_wqe_err: 0
rx7_wqe_err: 0
rx8_wqe_err: 0
rx9_wqe_err: 0
rx10_wqe_err: 0
...
Let me check whether the proposed patch needs a backport.
== Comment: #3 - Carol L. Soto - 2018-10-18 13:34:46 ==
I was able to apply the proposed patch as-is to the Cosmic git tree with no issues (no backport needed),
using kernel 4.18.12+.
With the proposed patch, I do not see WQE errors and ping does not stop.
ethtool -S enp1s0f0 | grep wqe_err
rx_wqe_err: 0
rx0_wqe_err: 0
rx1_wqe_err: 0
rx2_wqe_err: 0
rx3_wqe_err: 0
rx4_wqe_err: 0
rx5_wqe_err: 0
rx6_wqe_err: 0
rx7_wqe_err: 0
rx8_wqe_err: 0
rx9_wqe_err: 0
rx10_wqe_err: 0
... |
== SRU Justification ==
The requested commit fixes a regression introduced by mainline commit
3a2f70331226 in v4.18-rc1. The commit is only needed in Cosmic. Due to
the regression, a Mellanox CX5 stops passing ping traffic and reports rx_wqe_err (mlx5_core).
== Fix ==
37fdffb217a4 ("net/mlx5: WQ, fixes for fragmented WQ buffers API")
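To check whether a given kernel tree already carries this fix, one option is to search the history for the upstream subject line rather than the hash, because a cherry-picked copy in a distribution tree gets a new commit ID. A sketch; `has_fix` is a hypothetical helper and the subject string is taken from the Fix line above.

```shell
#!/bin/sh
# Hypothetical helper: report whether a kernel git tree contains the fix
# "net/mlx5: WQ, fixes for fragmented WQ buffers API" (upstream 37fdffb217a4).
# Matching on the subject instead of the hash also finds cherry-picked copies,
# whose commit IDs differ from upstream.
has_fix() {
    repo=${1:-.}
    subject='net/mlx5: WQ, fixes for fragmented WQ buffers API'
    # Search commit messages reachable from HEAD for the literal subject text.
    found=$(git -C "$repo" log -n 1 --format=%h --fixed-strings --grep="$subject" HEAD)
    # Succeeds (and prints the matching abbreviated hash) only if found.
    [ -n "$found" ] && echo "fix present: $found"
}
```

Usage: `has_fix /path/to/kernel-tree || echo "fix not in this tree"`.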
== Regression Potential ==
Low. This commit has been cc'd to stable, so it has had additional
upstream review.
== Test Case ==
A test kernel was built with this patch and tested by the original bug reporter.
The bug reporter states the test kernel resolved the bug. |
|
2018-11-07 06:53:42 |
Khaled El Mously |
linux (Ubuntu Cosmic): status |
In Progress |
Fix Committed |
|
2018-11-12 15:45:17 |
Frank Heimes |
linux (Ubuntu): status |
In Progress |
Fix Committed |
|
2018-11-12 15:45:20 |
Frank Heimes |
ubuntu-power-systems: status |
In Progress |
Fix Committed |
|
2018-11-15 11:04:11 |
Brad Figg |
tags |
architecture-ppc64le bugnameltc-172460 severity-critical targetmilestone-inin--- |
architecture-ppc64le bugnameltc-172460 severity-critical targetmilestone-inin--- verification-needed-cosmic |
|
2018-11-16 03:19:21 |
bugproxy |
tags |
architecture-ppc64le bugnameltc-172460 severity-critical targetmilestone-inin--- verification-needed-cosmic |
architecture-ppc64le bugnameltc-172460 severity-critical targetmilestone-inin--- verification-done-cosmic |
|
2018-12-03 08:49:32 |
Launchpad Janitor |
linux (Ubuntu Cosmic): status |
Fix Committed |
Fix Released |
|
2018-12-03 08:49:32 |
Launchpad Janitor |
cve linked |
|
2018-18653 |
|
2018-12-03 08:49:32 |
Launchpad Janitor |
cve linked |
|
2018-18955 |
|
2018-12-03 08:49:32 |
Launchpad Janitor |
cve linked |
|
2018-6559 |
|
2018-12-03 14:44:55 |
Andrew Cloke |
linux (Ubuntu): status |
Fix Committed |
Fix Released |
|
2018-12-03 14:44:57 |
Andrew Cloke |
ubuntu-power-systems: status |
Fix Committed |
Fix Released |
|
2019-02-14 12:13:11 |
Brad Figg |
tags |
architecture-ppc64le bugnameltc-172460 severity-critical targetmilestone-inin--- verification-done-cosmic |
architecture-ppc64le bugnameltc-172460 severity-critical targetmilestone-inin--- verification-done-cosmic verification-needed-bionic |
|
2019-02-14 15:00:37 |
bugproxy |
tags |
architecture-ppc64le bugnameltc-172460 severity-critical targetmilestone-inin--- verification-done-cosmic verification-needed-bionic |
architecture-ppc64le bugnameltc-172460 severity-critical targetmilestone-inin--- verification-done-bionic verification-done-cosmic |
|
2019-02-18 15:55:59 |
Jerry Clement |
bug |
|
|
added subscriber Jerry Clement |
2019-07-24 20:54:15 |
Brad Figg |
tags |
architecture-ppc64le bugnameltc-172460 severity-critical targetmilestone-inin--- verification-done-bionic verification-done-cosmic |
architecture-ppc64le bugnameltc-172460 cscc severity-critical targetmilestone-inin--- verification-done-bionic verification-done-cosmic |
|
2019-12-06 00:09:28 |
bugproxy |
tags |
architecture-ppc64le bugnameltc-172460 cscc severity-critical targetmilestone-inin--- verification-done-bionic verification-done-cosmic |
architecture-ppc64le bugnameltc-172460 cscc severity-critical targetmilestone-inin1810 verification-done-bionic verification-done-cosmic |
|