net/mlx5e: Add missing capability check for uplink follow

Bug #1921104 reported by bugproxy
Affects                  Status        Importance  Assigned to            Milestone
Ubuntu on IBM z Systems  Fix Released  High        Skipper Bug Screeners
linux (Ubuntu)           Fix Released  Undecided   Frank Heimes
Focal                    Fix Released  High        Canonical Kernel Team
Groovy                   Fix Released  High        Canonical Kernel Team
Hirsute                  Fix Released  Undecided   Frank Heimes

Bug Description

SRU Justification:
==================

[Impact]

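* Older adapter firmware may not support setting the eswitch uplink state, but the driver currently sets it without checking for that support.

* The fix exposes the firmware's indication that it supports setting the eswitch uplink state to follow the physical link, and makes the uplink state setting conditional on it.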

* If a kernel without the backport is used on an adapter which does not have the latest adapter firmware, the adapter silently drops outgoing traffic.

* This is a regression which was introduced with kernel 5.4.0-48.

[Fix]

* upstream fix (as in 5.11):
  9c9be85f6b59d80efe4705109c0396df18d4e11d 9c9be85f6b59 "net/mlx5e: Add missing capability check for uplink follow"

* backport for focal: https://launchpadlibrarian.net/529543695/0001-Backport-net-mlx5e-Add-missing-capability-check-for-.patch

* backport for groovy: https://launchpadlibrarian.net/529775887/0001-Backport-groovy-net-mlx5e-Add-missing-capability-che.patch

[Test Case]

* Two IBM Z or LinuxONE systems, installed with Ubuntu Server 20.04 or 20.10 on LPAR, are needed.

* Each system needs RoCE Express 2.x adapters (Mellanox ConnectX-4/5) attached, with firmware 16.29.1006 or earlier.

* Assign an IP address to the adapters on both systems and try to ping one node from the other.

* The ping fails with the stock Ubuntu kernels (which lack the patch), but succeeds with kernels that include the patch (such as the test builds from the PPA mentioned below).

* Due to the lack of hardware this needs to be verified by IBM.
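
The pass/fail criterion of the test case can be checked mechanically from the ping summary line. The following is only a minimal sketch, assuming the iputils-style summary format seen in the transcripts in this report; the function names are made up for illustration and are not part of any tool mentioned here.

```python
import re
from typing import Optional

# Matches e.g. "13 packets transmitted, 0 received, 100% packet loss, time 12445ms".
# The lazy ".*?" tolerates extra fields (such as "+3 errors,") before the loss figure.
_SUMMARY = re.compile(
    r"(\d+) packets transmitted, (\d+) received,.*?(\d+(?:\.\d+)?)% packet loss"
)


def packet_loss_percent(ping_output: str) -> Optional[float]:
    """Return the packet loss percentage, or None if no summary line is found."""
    match = _SUMMARY.search(ping_output)
    if match is None:
        return None
    return float(match.group(3))


def ping_test_passed(ping_output: str) -> bool:
    """Treat the test as passed if at least one reply was received."""
    match = _SUMMARY.search(ping_output)
    return match is not None and int(match.group(2)) > 0
```

Feeding it the output of a failing run (100% packet loss) yields a failed test; any run with at least one reply counts as passed. Stricter thresholds may be appropriate for a real verification.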

[Regression Potential]

* Undesired or erroneous behavior is possible if the modified if condition is constructed incorrectly.

* Wrong behavior is likewise possible if the changes to the capability bits in mlx5_ifc_cmd_hca_cap_bits are incorrect.

* All modifications are limited to the mlx5 driver.

* The changes are small: effectively two lines removed and four added (three of them only adjust the capability bits).

* The modifications were done and tested by IBM and reviewed by Mellanox (see LP comments), based on a PPA test build.

[Other]

* The above patch/commit was accepted upstream with kernel 5.11.

* Hence the patch is not needed for hirsute, just needs to be SRUed for groovy and focal.

* The commit couldn't be cleanly cherry-picked, mainly due to changed context, hence the backport(s).

__________

Expose firmware indication that it supports setting eswitch uplink state
to follow (follow the physical link). Condition setting the eswitch
uplink admin-state with this capability bit. Older FW may not support
the uplink state setting.
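
The idea of conditioning the uplink state setting on a capability bit can be illustrated with a small model. This is a sketch only, written in Python rather than kernel C; `FirmwareCaps`, `enable_uplink` and the `set_admin_state` callback are hypothetical names for illustration and do not reproduce the actual driver code.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class FirmwareCaps:
    """Hypothetical stand-in for the capability bits reported by firmware."""
    uplink_follow: bool


def enable_uplink(caps: FirmwareCaps, set_admin_state: Callable[[str], None]) -> bool:
    """Request the 'follow' admin state only if firmware advertises support.

    Returns True if the request was issued, False if it was skipped.
    """
    if not caps.uplink_follow:
        # Older firmware: do not touch the uplink admin state at all.
        return False
    set_admin_state("follow")
    return True
```

With `FirmwareCaps(uplink_follow=False)` the callback is never invoked, which models the behavior the fix aims for on older firmware.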

Available fix with kernel 5.11.
https://github.com/torvalds/linux/commit/9c9be85f6b59d80efe4705109c0396df18d4e11d

Now required for Ubuntu 20.04 via backport patch.

bugproxy (bugproxy)
tags: added: architecture-s39064 bugnameltc-192185 severity-high targetmilestone-inin2004
Changed in ubuntu:
assignee: nobody → Skipper Bug Screeners (skipper-screen-team)
affects: ubuntu → linux (Ubuntu)
Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment (attachment only) From <email address hidden> 2021-03-24 07:21 EDT-------

Frank Heimes (fheimes)
Changed in ubuntu-z-systems:
assignee: nobody → Skipper Bug Screeners (skipper-screen-team)
Changed in linux (Ubuntu):
assignee: Skipper Bug Screeners (skipper-screen-team) → Frank Heimes (fheimes)
Changed in ubuntu-z-systems:
importance: Undecided → High
status: New → Triaged
description: updated
Revision history for this message
Frank Heimes (fheimes) wrote : Re: net/mlx5e: Add missing capability check for uplink follow for Ubuntu 20.04

The patch/commit is included as '9c9be85f6b59' in hirsute since Ubuntu-5.11.0-11.12.
Hence marking the hirsute entry as Fix Released.

The patch applies cleanly to focal master-next.
A test kernel is currently being built here:
https://launchpad.net/~fheimes/+archive/ubuntu/lp1921104
So updating the focal entry to Triaged.

Need to check if the patch also applies to groovy's 5.8.

Changed in linux (Ubuntu Focal):
status: New → Triaged
Changed in linux (Ubuntu Hirsute):
status: New → Fix Released
Revision history for this message
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2021-03-25 04:24 EDT-------
I tested Frank's build successfully.

Using adapter firmware 16.27.2008.
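
Whether a firmware level falls into the range named in the test case (16.29.1006 or earlier) can be checked with a simple numeric comparison. This is a sketch with made-up helper names; it only encodes the threshold quoted in the test case and makes no claim about which firmware levels actually support the capability.

```python
from typing import Tuple

# Threshold quoted in the test case: firmware 16.29.1006 or earlier.
TEST_CASE_FW_THRESHOLD = (16, 29, 1006)


def parse_fw_version(text: str) -> Tuple[int, ...]:
    """Split a dotted firmware version such as '16.27.2008' into integers."""
    return tuple(int(part) for part in text.strip().split("."))


def fw_in_test_case_range(text: str) -> bool:
    """True if the firmware level is at or below the test case threshold."""
    return parse_fw_version(text) <= TEST_CASE_FW_THRESHOLD
```

The firmware used for this test, 16.27.2008, is within that range.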

BAD TEST
=====================
root@pok1-qz1-sr1-rk011-s01:~# uname -a
Linux pok1-qz1-sr1-rk011-s01 5.4.0-67-generic #75-Ubuntu SMP Fri Feb 19 18:00:48 UTC 2021 s390x s390x s390x GNU/Linux

root@pok1-qz1-sr1-rk011-s01:~# ifconfig p0 172.31.22.3/31 mtu 9100 up
root@pok1-qz1-sr1-rk011-s01:~# ping 172.31.22.2
PING 172.31.22.2 (172.31.22.2) 56(84) bytes of data.
^C
--- 172.31.22.2 ping statistics ---
13 packets transmitted, 0 received, 100% packet loss, time 12445ms

GOOD TEST
=====================
root@pok1-qz1-sr1-rk011-s01:~# uname -a
Linux pok1-qz1-sr1-rk011-s01 5.4.0-68-generic #76~lp1921104-Ubuntu SMP Wed Mar 24 11:39:26 UTC 2021 s390x s390x s390x GNU/Linux

root@pok1-qz1-sr1-rk011-s01:~# ifconfig p0 172.31.22.3/31 mtu 9100 up
root@pok1-qz1-sr1-rk011-s01:~# ping 172.31.22.2
PING 172.31.22.2 (172.31.22.2) 56(84) bytes of data.
64 bytes from 172.31.22.2: icmp_seq=1 ttl=0 time=122 ms
64 bytes from 172.31.22.2: icmp_seq=2 ttl=0 time=0.089 ms
64 bytes from 172.31.22.2: icmp_seq=3 ttl=0 time=0.075 ms

------- Comment From <email address hidden> 2021-03-25 04:26 EDT-------
We also got positive reviews from the Mellanox team:

Idan Werpoler 18:47
it is ok, they told me

Idan Werpoler 19:02
@Aya Levin reviewed the fix also and it looks fine.

Revision history for this message
Alexander Schmidt (alexs-h) wrote : Re: net/mlx5e: Add missing capability check for uplink follow for Ubuntu 20.04

Attaching backport for groovy.

repro on groovy vanilla kernel
==============================
root@pok1-qz1-sr1-rk011-s01:~# uname -a
Linux pok1-qz1-sr1-rk011-s01 5.8.0-48-generic #54 SMP Thu Mar 25 06:59:15 EDT 2021 s390x s390x s390x GNU/Linux
root@pok1-qz1-sr1-rk011-s01:~# ifconfig p0 172.31.22.3/31 mtu 9100 up
root@pok1-qz1-sr1-rk011-s01:~# ping 172.31.22.2
PING 172.31.22.2 (172.31.22.2) 56(84) bytes of data.
^C
--- 172.31.22.2 ping statistics ---
14 packets transmitted, 0 received, 100% packet loss, time 13489ms

verify fix on groovy kernel with backport
=========================================
root@pok1-qz1-sr1-rk011-s01:~# uname -a
Linux pok1-qz1-sr1-rk011-s01 5.8.0-48-generic #54 SMP Thu Mar 25 11:16:31 EDT 2021 s390x s390x s390x GNU/Linux
root@pok1-qz1-sr1-rk011-s01:~# ifconfig p0 172.31.22.3/31 mtu 9100 up
root@pok1-qz1-sr1-rk011-s01:~# ping 172.31.22.2
PING 172.31.22.2 (172.31.22.2) 56(84) bytes of data.
64 bytes from 172.31.22.2: icmp_seq=2 ttl=0 time=738 ms
64 bytes from 172.31.22.2: icmp_seq=3 ttl=0 time=0.085 ms
64 bytes from 172.31.22.2: icmp_seq=4 ttl=0 time=0.073 ms
64 bytes from 172.31.22.2: icmp_seq=5 ttl=0 time=0.136 ms
64 bytes from 172.31.22.2: icmp_seq=6 ttl=0 time=0.093 ms

Revision history for this message
Amir Tzin (amirtz) wrote :

I reviewed the groovy patch and it looks fine!

Revision history for this message
Frank Heimes (fheimes) wrote :

Thx Alexander and thx Amir - I'll work now on the SRUs for groovy and focal ...

Revision history for this message
Frank Heimes (fheimes) wrote :

A test build of the patched groovy kernel has been started:
https://launchpad.net/~fheimes/+archive/ubuntu/lp1921104

Frank Heimes (fheimes)
Changed in linux (Ubuntu Groovy):
status: New → Triaged
Revision history for this message
Alexander Schmidt (alexs-h) wrote :

Additional comment for the SRU description: this backport fixes a regression which was introduced with the kernel 5.4.0-48 update for Ubuntu 20.04.
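
Combined with the focal release noted later in this report (fixed in 5.4.0-73.82), the affected range of focal archive kernels can be expressed as a small check on the `uname -r` string. This is a rough sketch with hypothetical names. It only looks at the ABI number, so it cannot recognize PPA test builds that carry the patch under a lower ABI, and it says nothing about groovy kernels.

```python
import re
from typing import Optional

FOCAL_REGRESSION_ABI = 48  # regression introduced with 5.4.0-48
FOCAL_FIXED_ABI = 73       # fix released in 5.4.0-73.82

_RELEASE = re.compile(r"^5\.4\.0-(\d+)")


def focal_abi(release: str) -> Optional[int]:
    """Extract the ABI number from a focal release string like '5.4.0-67-generic'."""
    match = _RELEASE.match(release)
    return int(match.group(1)) if match else None


def focal_archive_kernel_lacks_fix(release: str) -> Optional[bool]:
    """True if the kernel is in the regressed range, None if it is not a 5.4.0 kernel."""
    abi = focal_abi(release)
    if abi is None:
        return None
    return FOCAL_REGRESSION_ABI <= abi < FOCAL_FIXED_ABI
```

A kernel in this range only shows the problem on adapters whose firmware lacks the capability.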

Revision history for this message
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2021-03-26 05:16 EDT-------
SRU Justification:

[Impact]
* Since older firmware may not support the uplink state setting, this can lead to problems.
* Now expose firmware indication that it supports setting eswitch uplink state to follow the physical link.

* <... some more details are needed on how things fail today, w/o the patch>

[Fix]

* upstream fix (as in 5.11):
9c9be85f6b59d80efe4705109c0396df18d4e11d 9c9be85f6b59 "net/mlx5e: Add missing capability check for uplink follow"

* backport for focal:
https://launchpadlibrarian.net/529543695/0001-Backport-net-mlx5e-Add-missing-capability-check-for-.patch

* backport for groovy:
https://launchpadlibrarian.net/529775887/0001-Backport-groovy-net-mlx5e-Add-missing-capability-che.patch

[Test Case]

* It requires an IBM Z or LinuxONE system, with groovy/focal installed in LPAR
and RoCE Express 2.x adapters attached.

* Due to the lack of hardware this needs to be verified by IBM.

[Regression Potential]

* This backport fixes a regression which was introduced with the kernel 5.4.0-48 update for Ubuntu 20.04.

[Other]

* The above patch/commit was accepted upstream with kernel 5.11.

* Hence the patch is not needed for hirsute, just SRUs for groovy and focal are needed.

* But the commit couldn't be cleanly cherry-picked, due to changed context, hence the backport(s).

Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2021-03-26 06:23 EDT-------
Additions for the Impact statement:

[Impact]
* Since older firmware may not support the uplink state setting, this can lead to problems.
* Now expose firmware indication that it supports setting eswitch uplink state to follow the physical link.
* If a kernel without the backport is used on an adapter which does not have the latest adapter firmware, the adapter silently drops outgoing traffic.

Frank Heimes (fheimes)
description: updated
Revision history for this message
Frank Heimes (fheimes) wrote : Re: net/mlx5e: Add missing capability check for uplink follow for Ubuntu 20.04

Thx Alex, that allowed me to fill the gaps and I have consolidated the SRU Justification and added it to the bug description.

description: updated
Revision history for this message
Alexander Schmidt (alexs-h) wrote :

Hi Frank, thank you very much!

Frank Heimes (fheimes)
summary: - net/mlx5e: Add missing capability check for uplink follow for Ubuntu
- 20.04
+ net/mlx5e: Add missing capability check for uplink follow
Frank Heimes (fheimes)
Changed in linux (Ubuntu Groovy):
status: Triaged → In Progress
Changed in linux (Ubuntu Focal):
status: Triaged → In Progress
Changed in ubuntu-z-systems:
status: Triaged → In Progress
Changed in linux (Ubuntu Focal):
importance: Undecided → High
Changed in linux (Ubuntu Groovy):
importance: Undecided → High
assignee: nobody → Canonical Kernel Team (canonical-kernel-team)
Changed in linux (Ubuntu Focal):
assignee: nobody → Canonical Kernel Team (canonical-kernel-team)
Revision history for this message
Frank Heimes (fheimes) wrote :

Kernel SRU request submitted for groovy and focal:
https://lists.ubuntu.com/archives/kernel-team/2021-March/thread.html#118559
changing status to 'In Progress' for both entries (G and F).

Changed in linux (Ubuntu Focal):
status: In Progress → Fix Committed
Changed in linux (Ubuntu Groovy):
status: In Progress → Fix Committed
tags: added: sts
Frank Heimes (fheimes)
Changed in ubuntu-z-systems:
status: In Progress → Fix Committed
Revision history for this message
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-focal' to 'verification-done-focal'. If the problem still exists, change the tag 'verification-needed-focal' to 'verification-failed-focal'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-focal
Revision history for this message
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-groovy' to 'verification-done-groovy'. If the problem still exists, change the tag 'verification-needed-groovy' to 'verification-failed-groovy'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-groovy
Revision history for this message
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2021-04-21 08:50 EDT-------
I verified the fix successfully on 5.4.0-73.

Revision history for this message
Frank Heimes (fheimes) wrote :

Thank you Alex, adjusting the tags accordingly ...

tags: added: verification-done-focal verification-done-groovy
removed: verification-needed-focal verification-needed-groovy
Revision history for this message
Fabio Augusto Miranda Martins (fabio.martins) wrote :

Another customer has provided positive feedback that it fixes the issue on Focal:

5.4.0-73-generic #82-Ubuntu SMP Wed Apr 14 17:39:42 UTC 2021

Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 5.4.0-73.82

---------------
linux (5.4.0-73.82) focal; urgency=medium

  * focal/linux: 5.4.0-73.82 -proposed tracker (LP: #1923781)

  * Packaging resync (LP: #1786013)
    - update dkms package versions

  * CIFS DFS entries not accessible with 5.4.0-71.74-generic (LP: #1923670)
    - Revert "cifs: Set CIFS_MOUNT_USE_PREFIX_PATH flag on setting
      cifs_sb->prepath."

  * CVE-2021-29650
    - Revert "netfilter: x_tables: Update remaining dereference to RCU"
    - Revert "netfilter: x_tables: Switch synchronization to RCU"
    - netfilter: x_tables: Use correct memory barriers.

  * LRMv4: switch to signing nvidia modules via the Ubuntu Modules signing key
    (LP: #1918134)
    - [Packaging] dkms-build{,--nvidia-N} sync back from LRMv4

  * 5.4 kernel: when iommu is on crashdump fails (LP: #1922738)
    - iommu/vt-d: Refactor find_domain() helper
    - iommu/vt-d: Add attach_deferred() helper
    - iommu/vt-d: Move deferred device attachment into helper function
    - iommu/vt-d: Do deferred attachment in iommu_need_mapping()
    - iommu/vt-d: Remove deferred_attach_domain()
    - iommu/vt-d: Simplify check in identity_mapping()

  * Backport mlx5e fix for tunnel offload (LP: #1921769)
    - net/mlx5e: Check tunnel offload is required before setting SWP

  * Bcache bypasse writeback on caching device with fragmentation (LP: #1900438)
    - bcache: consider the fragmentation when update the writeback rate

  * Fix implicit declaration warnings for kselftests/memfd test on newer
    releases (LP: #1910323)
    - selftests/memfd: Fix implicit declaration warnings

  * net/mlx5e: Add missing capability check for uplink follow (LP: #1921104)
    - net/mlx5e: Add missing capability check for uplink follow

  * [UBUNUT 21.04] s390/vtime: fix increased steal time accounting
    (LP: #1921498)
    - s390/vtime: fix increased steal time accounting

  * Mute/Mic-mute LEDs are not work on HP 850/840/440 G8 Laptops (LP: #1920030)
    - ALSA: hda/realtek: fix mute/micmute LEDs for HP 840 G8
    - ALSA: hda/realtek: fix mute/micmute LEDs for HP 440 G8
    - ALSA: hda/realtek: fix mute/micmute LEDs for HP 850 G8

  * Focal update: v5.4.106 upstream stable release (LP: #1920246)
    - uapi: nfnetlink_cthelper.h: fix userspace compilation error
    - powerpc/pseries: Don't enforce MSI affinity with kdump
    - ath9k: fix transmitting to stations in dynamic SMPS mode
    - net: Fix gro aggregation for udp encaps with zero csum
    - net: check if protocol extracted by virtio_net_hdr_set_proto is correct
    - net: avoid infinite loop in mpls_gso_segment when mpls_hlen == 0
    - sh_eth: fix TRSCER mask for SH771x
    - can: skb: can_skb_set_owner(): fix ref counting if socket was closed before
      setting skb ownership
    - can: flexcan: assert FRZ bit in flexcan_chip_freeze()
    - can: flexcan: enable RX FIFO after FRZ/HALT valid
    - can: flexcan: invoke flexcan_chip_freeze() to enter freeze mode
    - can: tcan4x5x: tcan4x5x_init(): fix initialization - clear MRAM before
      entering Normal Mode
    - tcp: add sanity tests to TCP_QUEUE_SEQ
    - netfilter: nf_nat: undo erroneous tcp edemux lookup
    - ne...

Changed in linux (Ubuntu Focal):
status: Fix Committed → Fix Released
Revision history for this message
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 5.8.0-53.60

---------------
linux (5.8.0-53.60) groovy; urgency=medium

  * CVE-2021-3491
    - io_uring: fix provide_buffers sign extension
    - io_uring: fix overflows checks in provide buffers
    - SAUCE: proc: Avoid mixing integer types in mem_rw()
    - SAUCE: io_uring: truncate lengths larger than MAX_RW_COUNT on provide
      buffers

  * CVE-2021-3490
    - bpf: Fix a verifier failure with xor
    - SAUCE: bpf: verifier: fix ALU32 bounds tracking with bitwise ops

  * CVE-2021-3489
    - SAUCE: bpf: ringbuf: deny reserve of buffers larger than ringbuf
    - SAUCE: bpf: prevent writable memory-mapping of read-only ringbuf pages

 -- Stefan Bader <email address hidden> Thu, 06 May 2021 07:43:20 +0200

Changed in linux (Ubuntu Groovy):
status: Fix Committed → Fix Released
Frank Heimes (fheimes)
Changed in ubuntu-z-systems:
status: Fix Committed → Fix Released
Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2021-05-19 02:18 EDT-------
IBM Bugzilla status->closed, Fix Released with all requested distros.
