[Bionic] mlx4 ETH - mlnx_qos failed when set some TC to vendor

Bug #1758662 reported by Talat Batheesh on 2018-03-25
10
This bug affects 1 person
Affects Status Importance Assigned to Milestone
linux (Ubuntu)
Medium
Joseph Salisbury
Bionic
Medium
Joseph Salisbury

Bug Description

reproduce:
[root@reg-l-vrt-41018-010 ~]# /usr/bin/mlnx_qos -i ens8 -s vendor,ets,vendor,ets,ets,strict,ets,vendor -t 0,36,0,55,4,0,5,0
Netlink error: Bad value. see dmesg.

[root@reg-l-vrt-41018-010 ~]# dmesg
[69718.992299] mlx4_en: ens8: TC[0]: Not supported TSA: 255

There is a upstream commit that fix the issue, please add it to bionic

commit a42b63c1ac1986f17f71bc91a6b0aaa14d4dae71
Author: Moni Shoua <email address hidden>
Date: Thu Dec 28 16:26:11 2017 +0200

    net/mlx4_en: Change default QoS settings

    Change the default mapping between TC and TCG as follows:

    Prio | TC/TCG
             | from to
             | (set by FW) (set by SW)
    ---------+-----------------------------------
    0 | 0/0 0/7
    1 | 1/0 0/6
    2 | 2/0 0/5
    3 | 3/0 0/4
    4 | 4/0 0/3
    5 | 5/0 0/2
    6 | 6/0 0/1
    7 | 7/0 0/0

    These new settings cause that a pause frame for any prio stops
    traffic for all prios.

    Fixes: 564c274c3df0 ("net/mlx4_en: DCB QoS support")
    Signed-off-by: Moni Shoua <email address hidden>
    Signed-off-by: Maor Gottlieb <email address hidden>
    Signed-off-by: Tariq Toukan <email address hidden>
    Signed-off-by: David S. Miller <email address hidden>

diff --git a/drivers/net/ethernet/mellanox/mlx4/en_dcb_nl.c b/drivers/net/ethernet/mellanox/mlx4/en_dcb_nl.c
index 5f41dc9..1a0c3bf8 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_dcb_nl.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_dcb_nl.c
@@ -310,6 +310,7 @@ static int mlx4_en_ets_validate(struct mlx4_en_priv *priv, struct ieee_ets *ets)
                }

                switch (ets->tc_tsa[i]) {
+ case IEEE_8021QAZ_TSA_VENDOR:
                case IEEE_8021QAZ_TSA_STRICT:
                        break;
                case IEEE_8021QAZ_TSA_ETS:
@@ -347,6 +348,10 @@ static int mlx4_en_config_port_scheduler(struct mlx4_en_priv *priv,
        /* higher TC means higher priority => lower pg */
        for (i = IEEE_8021QAZ_MAX_TCS - 1; i >= 0; i--) {
                switch (ets->tc_tsa[i]) {
+ case IEEE_8021QAZ_TSA_VENDOR:
+ pg[i] = MLX4_EN_TC_VENDOR;
+ tc_tx_bw[i] = MLX4_EN_BW_MAX;
+ break;
                case IEEE_8021QAZ_TSA_STRICT:
                        pg[i] = num_strict++;
                        tc_tx_bw[i] = MLX4_EN_BW_MAX;
diff --git a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
index 99051a2..21bc17f 100644
--- a/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
+++ b/drivers/net/ethernet/mellanox/mlx4/en_netdev.c
@@ -3336,6 +3336,13 @@ int mlx4_en_init_netdev(struct mlx4_en_dev *mdev, int port,
        priv->msg_enable = MLX4_EN_MSG_LEVEL;
 #ifdef CONFIG_MLX4_EN_DCB
        if (!mlx4_is_slave(priv->mdev->dev)) {
+ u8 prio;
+
+ for (prio = 0; prio < IEEE_8021QAZ_MAX_TCS; ++prio) {
+ priv->ets.prio_tc[prio] = prio;
+ priv->ets.tc_tsa[prio] = IEEE_8021QAZ_TSA_VENDOR;
+ }
+
                priv->dcbx_cap = DCB_CAP_DCBX_VER_CEE | DCB_CAP_DCBX_HOST |
                        DCB_CAP_DCBX_VER_IEEE;
                priv->flags |= MLX4_EN_DCB_ENABLED;
diff --git a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
index 2b72677..7db3d0d 100644
--- a/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
+++ b/drivers/net/ethernet/mellanox/mlx4/mlx4_en.h
@@ -479,6 +479,7 @@ struct mlx4_en_frag_info {
 #define MLX4_EN_BW_MIN 1
 #define MLX4_EN_BW_MAX 100 /* Utilize 100% of the line */

+#define MLX4_EN_TC_VENDOR 0
 #define MLX4_EN_TC_ETS 7

 enum dcb_pfc_type {

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1758662

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
tags: added: bionic
Changed in linux (Ubuntu):
importance: Undecided → Medium
Changed in linux (Ubuntu Bionic):
assignee: nobody → Joseph Salisbury (jsalisbury)
status: Incomplete → In Progress
Joseph Salisbury (jsalisbury) wrote :

I built a test kernel with commit a42b63c1ac1986f17f71bc91a6b0aaa14d4dae71. The test kernel can be downloaded from:
http://kernel.ubuntu.com/~jsalisbury/lp1758662

Can you test this kernel and see if it resolves this bug?

Note, to test this kernel, you need to install both the linux-image and linux-image-extra .deb packages.

Thanks in advance!

Talat Batheesh (talat-b87) wrote :

Thank you,
will test and update.

Talat Batheesh (talat-b87) wrote :

After testing the build, the issue didn't reproduced.
Thanks.

Seth Forshee (sforshee) on 2018-04-02
Changed in linux (Ubuntu Bionic):
status: In Progress → Fix Committed
Launchpad Janitor (janitor) wrote :
Download full text (40.4 KiB)

This bug was fixed in the package linux - 4.15.0-15.16

---------------
linux (4.15.0-15.16) bionic; urgency=medium

  * linux: 4.15.0-15.16 -proposed tracker (LP: #1761177)

  * FFe: Enable configuring resume offset via sysfs (LP: #1760106)
    - PM / hibernate: Make passing hibernate offsets more friendly

  * /dev/bcache/by-uuid links not created after reboot (LP: #1729145)
    - SAUCE: (no-up) bcache: decouple emitting a cached_dev CHANGE uevent

  * Ubuntu18.04:POWER9:DD2.2 - Unable to start a KVM guest with default machine
    type(pseries-bionic) complaining "KVM implementation does not support
    Transactional Memory, try cap-htm=off" (kvm) (LP: #1752026)
    - powerpc: Use feature bit for RTC presence rather than timebase presence
    - powerpc: Book E: Remove unused CPU_FTR_L2CSR bit
    - powerpc: Free up CPU feature bits on 64-bit machines
    - powerpc: Add CPU feature bits for TM bug workarounds on POWER9 v2.2
    - powerpc/powernv: Provide a way to force a core into SMT4 mode
    - KVM: PPC: Book3S HV: Work around transactional memory bugs in POWER9
    - KVM: PPC: Book3S HV: Work around XER[SO] bug in fake suspend mode
    - KVM: PPC: Book3S HV: Work around TEXASR bug in fake suspend state

  * Important Kernel fixes to be backported for Power9 (kvm) (LP: #1758910)
    - powerpc/mm: Fixup tlbie vs store ordering issue on POWER9

  * Ubuntu 18.04 - IO Hang on some namespaces when running HTX with 16
    namespaces (Bolt / NVMe) (LP: #1757497)
    - powerpc/64s: Fix lost pending interrupt due to race causing lost update to
      irq_happened

  * fwts-efi-runtime-dkms 18.03.00-0ubuntu1: fwts-efi-runtime-dkms kernel module
    failed to build (LP: #1760876)
    - [Packaging] include the retpoline extractor in the headers

linux (4.15.0-14.15) bionic; urgency=medium

  * linux: 4.15.0-14.15 -proposed tracker (LP: #1760678)

  * [Bionic] mlx4 ETH - mlnx_qos failed when set some TC to vendor
    (LP: #1758662)
    - net/mlx4_en: Change default QoS settings

  * AT_BASE_PLATFORM in AUXV is absent on kernels available on Ubuntu 17.10
    (LP: #1759312)
    - powerpc/64s: Fix NULL AT_BASE_PLATFORM when using DT CPU features

  * Bionic update to 4.15.15 stable release (LP: #1760585)
    - net: dsa: Fix dsa_is_user_port() test inversion
    - openvswitch: meter: fix the incorrect calculation of max delta_t
    - qed: Fix MPA unalign flow in case header is split across two packets.
    - tcp: purge write queue upon aborting the connection
    - qed: Fix non TCP packets should be dropped on iWARP ll2 connection
    - sysfs: symlink: export sysfs_create_link_nowarn()
    - net: phy: relax error checking when creating sysfs link netdev->phydev
    - devlink: Remove redundant free on error path
    - macvlan: filter out unsupported feature flags
    - net: ipv6: keep sk status consistent after datagram connect failure
    - ipv6: old_dport should be a __be16 in __ip6_datagram_connect()
    - ipv6: sr: fix NULL pointer dereference when setting encap source address
    - ipv6: sr: fix scheduling in RCU when creating seg6 lwtunnel state
    - mlxsw: spectrum_buffers: Set a minimum quota for CPU port traffic
    - net: phy: Tell caller result ...

Changed in linux (Ubuntu Bionic):
status: Fix Committed → Fix Released
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers