[Ubuntu 20.04] Striding RQ as Default for ConnectX-4

Bug #1868113 reported by bugproxy
8
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Release Notes for Ubuntu
Fix Released
Undecided
Frank Heimes
Ubuntu on IBM z Systems
Fix Released
Medium
Unassigned
linux (Ubuntu)
Won't Fix
Medium
Skipper Bug Screeners

Bug Description

ello,
Within our Network Performance runs in the RoCE Express 2(.1) area, we noticed a performance regression with streaming workloads which could be mitigated by using an ethtool setting.

The Commit which switched the default value from "Striding RQ" to "Legacy RQ" for ConnectX-4 devices (RoCE Express 2(.1)) is attached here:
commit 5ffd81943d7a57423f204cd5844bf430b5634472 (refs/bisect/bad)
Author: Tariq Toukan <email address hidden>
Date: Tue Feb 20 15:17:54 2018 +0200
    net/mlx5e: RX, Always prefer Linear SKB configuration
    Prefer the linear SKB configuration of Legacy RQ over the
    non-linear one of Striding RQ.
    This implies that ConnectX-4 LX now uses legacy RQ by default,
    as it does not support the linear configuration of Striding RQ.
    Signed-off-by: Tariq Toukan <email address hidden>
    Signed-off-by: Saeed Mahameed <email address hidden>

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 2c634e50d051..333d4ed52b94 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -4405,9 +4405,16 @@ void mlx5e_build_nic_params(struct mlx5_core_dev *mdev,
        MLX5E_SET_PFLAG(params, MLX5E_PFLAG_RX_CQE_COMPRESS, params->rx_cqe_compress_def);
        /* RQ */
- if (mlx5e_striding_rq_possible(mdev, params))
- MLX5E_SET_PFLAG(params, MLX5E_PFLAG_RX_STRIDING_RQ,
- !slow_pci_heuristic(mdev));
+ /* Prefer Striding RQ, unless any of the following holds:
+ * - Striding RQ configuration is not possible/supported.
+ * - Slow PCI heuristic.
+ * - Legacy RQ would use linear SKB while Striding RQ would use non-linear.
+ */
+ if (!slow_pci_heuristic(mdev) &&
+ mlx5e_striding_rq_possible(mdev, params) &&
+ (mlx5e_rx_mpwqe_is_linear_skb(mdev, params) ||
+ !mlx5e_rx_is_linear_skb(mdev, params)))
+ MLX5E_SET_PFLAG(params, MLX5E_PFLAG_RX_STRIDING_RQ, true);
        mlx5e_set_rq_type(mdev, params);
        mlx5e_init_rq_type_params(mdev, params);

We have modified the upstream-kernel to allow us running of measurements and compare differences between Legacy RQ vs Striding RQ. Here is an example below:

Kernel used: 5.4.0-rc7
The measurements run on a dedicated machine (z14) using uperf with streaming profiles (MTU size 1500).

Example throughput drop:
(traffic via a shared card, i.e. client and server using VFs from the same ConnectX-4)
--------------------------------------------------------------------------
| | Legacy RQ | Striding RQ |
--------------------------------------------------------------------------
|str-writex30k (1 connection) | 24.62Gb/s | 33.47Gb/s |
--------------------------------------------------------------------------
Additionaly, two tests with transactional workload using the ethtool proposed switch:
--------------------------------------------------------------------------
| | Legacy RQ | Striding RQ |
--------------------------------------------------------------------------
| rr1c-200x30k---1 | 4.12Gb/s | 5.66Gb/s |
--------------------------------------------------------------------------
| rr1c-200x30k--10 | 15.10Gb/s | 20.77Gb/s |
--------------------------------------------------------------------------

As concluded in the communication with Mellanox, there is a possibility to use a simple ethtool command to switch between the queuing methods, allowing us to avoid kernel code changes:
ethtool --set-priv-flags DEVNAME rx_striding_rq on
(To list the available settings you may use: ethtool --show-priv-flags DEVNAME)

bugproxy (bugproxy)
tags: added: architecture-s39064 bugnameltc-184497 severity-medium targetmilestone-inin2004
Changed in ubuntu:
assignee: nobody → Skipper Bug Screeners (skipper-screen-team)
affects: ubuntu → linux (Ubuntu)
summary: - [Ubuntu 20.04] Striding RQ als Default für ConnectX-4 in Distros
+ [Ubuntu 20.04] Striding RQ as Default for ConnectX-4
Revision history for this message
Frank Heimes (fheimes) wrote :

Not sure if I understand that ticket correctly - which 5.4-rc7 was used?
Was is directly from upstream?
And why is still a RC kernel in use - 5.4 is released since quite some time?

Well, if only commit "net/mlx5e: RX, Always prefer Linear SKB configuration" is needed, then we are done, since it's already in the Ubuntu focal kernel since quite some time:

focal-clean$ git log --oneline --grep "net/mlx5e: RX, Always prefer Linear SKB configuration"
5ffd81943d7a net/mlx5e: RX, Always prefer Linear SKB configuration
focal-clean$ git tag --contains 5ffd81943d7a
Ubuntu-5.4-5.4.0-10.13
Ubuntu-5.4-5.4.0-11.14
Ubuntu-5.4-5.4.0-12.15
Ubuntu-5.4-5.4.0-13.16
Ubuntu-5.4-5.4.0-14.17
Ubuntu-5.4.0-15.18
Ubuntu-5.4.0-16.19
Ubuntu-5.4.0-17.20
Ubuntu-5.4.0-17.21
Ubuntu-5.4.0-18.22
Ubuntu-5.4.0-8.11
Ubuntu-5.4.0-9.12

Looks to me that you can simply move to the current Ubuntu focal kernel (ideally the one from proposed) and proceed with testing from there ...

Changed in linux (Ubuntu):
status: New → Incomplete
Changed in ubuntu-z-systems:
status: New → Incomplete
Revision history for this message
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2020-03-20 06:01 EDT-------
Hi,

first let me get to your questions regarding the kernel.

As you already guessed, this was an upstream (fedora) kernel and the latest version at the time we pursued the issue. That is also the answer to why we did not use the stable 5.4.
However, the problem appeared already with older kernel versions since the commit named above was introduced with 4.18.

Now getting to the problem.
Before the code change stated above, striding RQ was used with ConnectX-4 devices. By introducing the commit, the default RQ was set to "legacy RQ" for ConnectX-4 devices as the comment in the commit already indicates, "this implies that ConnectX-4 LX now uses legacy RQ by default".

As our performance tests showed for z14 and z15 it is beneficial to use striding RQ with ConnectX-4 (RoCE Express 2(.1)).

That is why we ask to switch the default for ConnectX-4 back to striding RQ when using a z14/z15. This can be done by calling the following command:
ethtool --set-priv-flags DEVNAME rx_striding_rq on

Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2020-03-24 04:25 EDT-------
After all the discussion , please default ConnectX-4 back to striding RQ when using a z14/z15, even it could be set by the ethtool.

Many thanks in advance

Revision history for this message
Frank Heimes (fheimes) wrote :

Reverting the patch is not a good idea and would not help in all the cases mentioned - since it would impact all architectures, not only s390x, would also impact z13 and zEC12 where this should not be changed and would move us away from the upstream code base.

Instead please think about introducing a kernel config option that allows to toggle Striding RQ.
A kernel config option allows to set different defaults for different architectures, hence it could be at least changed for s390x in general.

Changed in ubuntu-z-systems:
importance: Undecided → Medium
Changed in linux (Ubuntu):
importance: Undecided → Medium
Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2020-03-26 06:29 EDT-------
Hi,
our intention was *never* to revert a patch. What we propose is to trigger the mentioned ethtool command (ethtool --set-priv-flags DEVNAME rx_striding_rq on) as soon as the distro is running on a z14/z15 machine and the ConnectX-4 driver is started.
Do you think something like that would be possible?

Revision history for this message
Frank Heimes (fheimes) wrote :

Well, that is unfortunately something that is usually not done and shipped with the distro, since it's too specific for a special generations of a system and the function is even given w/o this.
May I suggest that we pick this up in the release notes, like:

"
Performance tests showed that it is beneficial to use striding RQ with RoCE Express 2(.1) PCIe cards (ConnectX-4) on IBM z14 and LinuxONE Rockhopper II / Emperor II and newer - but by default it is not used.
Hence if one has RoCE 2(.1) hardware plugged in to such a system the enablement of striding RQ should be considered, like:
ethtool --set-priv-flags <ifname> rx_striding_rq on
For the reason of persistence one may also create a service file that sets this during boot-up.
"

Would that be ok make sense?

Frank Heimes (fheimes)
Changed in ubuntu-z-systems:
status: Incomplete → Triaged
Changed in ubuntu-release-notes:
assignee: nobody → Frank Heimes (fheimes)
status: New → Confirmed
Changed in ubuntu-z-systems:
status: Triaged → Confirmed
Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2020-03-30 08:43 EDT-------
Hi,

I think this suggestion is fine, let us do it as the first step. We may discuss later on what our options are in regards of enabling architecture-specific tunings in distros.

Cheers, Danijel

Revision history for this message
Frank Heimes (fheimes) wrote :

Ok, I've added a paragraph to the (preliminary) release notes:
https://wiki.ubuntu.com/FocalFossa/ReleaseNotes#s390x
Please double check if you are fine with the description.

Changed in ubuntu-z-systems:
status: Confirmed → Fix Released
Changed in ubuntu-release-notes:
status: Confirmed → Fix Released
Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2020-03-31 09:58 EDT-------
IBM Bugzilla status-> closed, Fix Released with focal

Mathew Hodson (mhodson)
Changed in linux (Ubuntu):
status: Incomplete → Won't Fix
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.