[UBUNTU 20.04] smc: SMC connections hang with later-level implementations

Bug #1882088 reported by bugproxy
This bug affects 1 person
Affects                   Status        Importance  Assigned to            Milestone
Ubuntu on IBM z Systems   Fix Released  Undecided   Skipper Bug Screeners
linux (Ubuntu)            Fix Released  Medium      Frank Heimes
  Focal                   Fix Released  Medium      Frank Heimes
  Groovy                  Fix Released  Medium      Frank Heimes

Bug Description

SRU Justification:
==================

[Impact]

* Connections from later-level SMC (protocol) versions to an SMC-enabled server on Linux hang.

* Later-level versions of SMC (although backwards-compatible) present a higher version number and use larger messages during the CLC handshake.

* The solution to avoid such hangs is to introduce toleration for later version numbers, and support CLC messages of arbitrary length.
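To illustrate what "toleration for later version numbers" means, here is a minimal user-space sketch in C. It is not the code from commit fb4f79264c0f: the header struct, the constant `HYP_CLC_V1` and all function names are hypothetical. Assuming, as stated above, that later versions stay backwards-compatible, it contrasts an exact-match version check with a tolerant one that accepts any version at or above the locally implemented one and then uses the lower of the two.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical constant: the version this side implements. */
#define HYP_CLC_V1 1u

/* Hypothetical, simplified CLC header with only the fields needed here. */
struct hyp_clc_hdr {
    uint8_t  type;     /* message type (proposal, accept, ...) */
    uint16_t length;   /* total message length announced by the peer */
    uint8_t  version;  /* protocol version announced by the peer */
};

/* Strict check: only the exact version implemented locally passes,
 * so a later-level peer fails the check. */
static bool version_ok_strict(const struct hyp_clc_hdr *h)
{
    return h->version == HYP_CLC_V1;
}

/* Tolerant check: any version at or above the local one passes,
 * relying on later versions being backwards-compatible. */
static bool version_ok_tolerant(const struct hyp_clc_hdr *h)
{
    assert(h != NULL);
    return h->version >= HYP_CLC_V1;
}

/* Both sides then operate at the lower of the two versions. */
static uint8_t negotiated_version(const struct hyp_clc_hdr *peer)
{
    const uint8_t ours = HYP_CLC_V1;
    return peer->version < ours ? peer->version : ours;
}
```

With the tolerant check, a peer announcing version 2 is accepted and the exchange continues at version 1, while an invalid version 0 is still refused.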

[Fix]

* fb4f79264c0fc6fd5a68ffe3e31bfff97311e1f1 ("net/smc: tolerate future SMCD versions")

[Test Case]

* Requires two IBM z13/z13s GA2 or LinuxONE Rockhopper/Emperor systems with RoCE Express adapter v2(.1) for SMC-D usage.

* One system needs to run the initial SMC-D version, the other a newer version.

* Establish a connection between both systems and monitor/verify whether it is reliable or hangs.
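For the "monitor/verify whether it hangs" step, a test client can bound every wait so that a hang shows up as a timeout instead of a blocked process. The helper below is a hypothetical C sketch built on POSIX `poll()` and `read()`; the function name, the -2 return convention and the choice of timeout are assumptions, not part of the original test description.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <sys/types.h>
#include <unistd.h>

/* Wait up to timeout_ms for fd to become readable, then read from it.
 * Returns the bytes read (>= 0), -1 on error, or -2 if nothing arrived
 * in time, which a test would report as a suspected hang. */
static ssize_t recv_with_timeout(int fd, void *buf, size_t len, int timeout_ms)
{
    /* A negative timeout makes poll() wait forever, which is exactly
     * the situation this helper is meant to detect. */
    assert(timeout_ms >= 0);

    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int rc;

    do {
        rc = poll(&pfd, 1, timeout_ms); /* restarts the full timeout on EINTR */
    } while (rc < 0 && errno == EINTR);

    if (rc < 0)
        return -1;
    if (rc == 0)
        return -2;                      /* timed out: suspected hang */
    return read(fd, buf, len);
}
```

A test would call it after connecting and sending its request, and flag the run as failed whenever it returns -2.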

[Regression Potential]

* The regression potential can be considered medium to low:

* SMC-D is a rather specialized way of doing shared memory communications and is not that widespread.

* However, the code that is changed is common code.

* But the patch is straightforward and only modifies net/smc/smc_clc.c and net/smc/smc_clc.h.

* It largely bumps limits (allows larger messages), adds a check and introduces toleration, rather than changing the control flow.

[Other]

* The above fix is currently in 'linux-next' and tagged with next-20200709.

* It is still expected to be accepted for 5.8.

* However, since this is not guaranteed, this SRU request covers both focal and groovy, to make sure that no potential regressions are introduced in case the patch does not end up in 5.8.

__________

Description:   smc: SMC connections hang with later-level implementations
Symptom:       Connections from later-level SMC versions to an SMC-enabled
               server on Linux hang.
Problem:       Later-level versions of SMC, although backwards-compatible,
               present a higher version number and use larger messages
               during the CLC handshake.
Solution:      Tolerate later version numbers, and support CLC messages of
               arbitrary length.
Reproduction:  Enable a server on Linux for SMC, and connect using a
               later-level version of SMC.
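The "arbitrary length" part of the solution can be sketched in the same spirit. Assuming the peer announces the total message length in the CLC header, a receiver with a fixed-size buffer can keep the part it understands and drain the rest, so the byte stream stays aligned for the next message. The in-memory stream and every name below are hypothetical; the actual receive path in net/smc/smc_clc.c may be structured differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical in-memory stand-in for the connection's byte stream. */
struct hyp_stream {
    const uint8_t *data;
    size_t len;
    size_t pos;
};

/* Consume up to n bytes; copy them to dst, or discard them if dst is
 * NULL. Returns the number of bytes actually consumed. */
static size_t stream_read(struct hyp_stream *s, uint8_t *dst, size_t n)
{
    size_t avail = s->len - s->pos;

    if (n > avail)
        n = avail;
    if (dst != NULL && n > 0)
        memcpy(dst, s->data + s->pos, n);
    s->pos += n;
    return n;
}

/* Receive one message whose total length `announced` was taken from its
 * header. At most `buflen` bytes are kept in buf; any excess is drained
 * so the next message starts at the right offset.
 * Returns 0 and sets *kept on success, -1 if the stream ends early. */
static int recv_clc_msg(struct hyp_stream *s, size_t announced,
                        uint8_t *buf, size_t buflen, size_t *kept)
{
    assert(s != NULL && kept != NULL);

    size_t keep = announced < buflen ? announced : buflen;
    size_t extra = announced - keep;

    if (stream_read(s, buf, keep) != keep)
        return -1;
    if (stream_read(s, NULL, extra) != extra)   /* drain, don't store */
        return -1;
    *kept = keep;
    return 0;
}
```

In this sketch, a receiver that only understands the older, shorter layout parses the bytes it kept and ignores the remainder, while the announced length still tells it where the next message begins.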

Applicable for: Ubuntu 20.04

bugproxy (bugproxy) wrote : Fix for the indicated issue

Default Comment by Bridge

tags: added: architecture-s39064 bugnameltc-186071 severity-high targetmilestone-inin2004
Changed in ubuntu:
assignee: nobody → Skipper Bug Screeners (skipper-screen-team)
affects: ubuntu → linux (Ubuntu)
Frank Heimes (fheimes)
affects: linux (Ubuntu) → smc-tools (Ubuntu)
affects: smc-tools (Ubuntu) → linux (Ubuntu)
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2020-06-04 10:10 EDT-------
Note: This patch is currently not upstream. It will be part of the full SMC-D v2 implementation (see LP1853291), and was successfully tested with a reference implementation. However, we still have some way to go with our SMC-D v2 code, yet other providers of SMC-D v2 implementations asked us to ship this toleration patch early, so that users have a chance to have their systems up to date by the time they deliver their versions.

Frank Heimes (fheimes) wrote :

Please excuse my ignorance, but why not bring this fix upstream separately now?
I think it will not harm the pre-SMC-D v2 code (otherwise the backport here would introduce issues into the current 20.04 kernel). Having it upstream now would save you some effort later, and for us it would mean a reliable patch that is accepted upstream (signed-off), with no fear of further changes to the code.
We (and I think the same is true for most distributions) rely on upstream-accepted patches for stability, traceability and manageability reasons.
On top of that, it could be marked for an upstream stable release update and could then land automatically in the 5.4 kernel updates.

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2020-06-04 11:02 EDT-------
That's a very valid question.
The problem is that we cannot talk about v2 yet for legal reasons. We are waiting on some internal processing and especially on a new RFC for v2. We assume that we would get a lot of resistance upstream if we posted any code for v2 in the absence of an RFC.

information type: Public → Private
Frank Heimes (fheimes)
information type: Private → Public
Petr Tesarik (ptesarik) wrote :

This patch should definitely appear in the upstream kernel, because all Linux distributions will need it at some point. If IBM engineers are not allowed to post the patch, Frank (or I) can probably take care of it in upstream.

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2020-06-29 08:07 EDT-------
As I wrote in a previous comment, it's not so much a question of 'if' we can post it upstream, but rather 'when': we assume that we would get a lot of resistance upstream if we posted any code for v2 in the absence of an RFC right now, plus we cannot talk about v2 yet for legal reasons.
Once a new IETF RFC detailing v2 is out, we can post our full implementation of v2 upstream, and these patches will be part of that effort.

Frank Heimes (fheimes) wrote :

Changing to Incomplete for now until the discussion has concluded ...

Changed in ubuntu-z-systems:
status: New → Incomplete
assignee: nobody → Skipper Bug Screeners (skipper-screen-team)
Changed in linux (Ubuntu Groovy):
assignee: Skipper Bug Screeners (skipper-screen-team) → nobody
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2020-07-09 07:02 EDT-------
The patch was posted upstream, see https://www.spinics.net/lists/netdev/msg666808.html, and accepted by Dave Miller. I just checked his net-next tree, didn't see it there yet, but I would assume it will appear there soon.

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2020-07-10 11:24 EDT-------
Here is a proper link:
https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/commit/?id=fb4f79264c0fc6fd5a68ffe3e31bfff97311e1f1

Frank Heimes (fheimes) wrote :

Thx for sharing the updated and upstream link, Stefan!

Frank Heimes (fheimes)
Changed in ubuntu-z-systems:
status: Incomplete → Triaged
Frank Heimes (fheimes)
Changed in linux (Ubuntu Groovy):
status: New → Triaged
Changed in linux (Ubuntu Focal):
status: New → Triaged
assignee: nobody → Frank Heimes (fheimes)
Changed in linux (Ubuntu Groovy):
assignee: nobody → Frank Heimes (fheimes)
Frank Heimes (fheimes) wrote :

Kernel SRU request submitted:
https://lists.ubuntu.com/archives/kernel-team/2020-July/thread.html#111977
Updating status to 'In Progress'.

Changed in ubuntu-z-systems:
status: Triaged → In Progress
Changed in linux (Ubuntu Focal):
status: Triaged → In Progress
Changed in linux (Ubuntu Groovy):
status: Triaged → In Progress
description: updated
Stefan Bader (smb)
Changed in linux (Ubuntu Groovy):
importance: Undecided → Medium
Changed in linux (Ubuntu Focal):
importance: Undecided → Medium
Frank Heimes (fheimes) wrote :

A patched kernel (that I created during the SRU preparation and test compile) is available here:
https://people.canonical.com/~fheimes/lp1882088/
for further testing.

Changed in linux (Ubuntu Focal):
status: In Progress → Fix Committed
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-focal' to 'verification-done-focal'. If the problem still exists, change the tag 'verification-needed-focal' to 'verification-failed-focal'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-focal
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2020-08-18 05:13 EDT-------
I can start the test tomorrow.

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2020-08-20 09:32 EDT-------
I have installed an Ubuntu 20.04 system and copied the .deb packages to /root/tmp/fheimes.
But I have never installed additional kernel packages on Ubuntu. It failed like this:

root@s8360032:~/tmp/fheimes# apt install /root/tmp/fheimes/*.deb
Reading package lists... Done
Building dependency tree
Reading state information... Done
Note, selecting 'linux-buildinfo-5.4.0-42-generic' instead of '/root/tmp/fheimes/linux-buildinfo-5.4.0-42-generic_5.4.0-42.46_s390x.deb'
Note, selecting 'linux-headers-5.4.0-42-generic' instead of '/root/tmp/fheimes/linux-headers-5.4.0-42-generic_5.4.0-42.46_s390x.deb'
Note, selecting 'linux-image-unsigned-5.4.0-42-generic' instead of '/root/tmp/fheimes/linux-image-unsigned-5.4.0-42-generic_5.4.0-42.46_s390x.deb'
Note, selecting 'linux-modules-5.4.0-42-generic' instead of '/root/tmp/fheimes/linux-modules-5.4.0-42-generic_5.4.0-42.46_s390x.deb'
Note, selecting 'linux-modules-extra-5.4.0-42-generic' instead of '/root/tmp/fheimes/linux-modules-extra-5.4.0-42-generic_5.4.0-42.46_s390x.deb'
Note, selecting 'linux-tools-5.4.0-42-generic' instead of '/root/tmp/fheimes/linux-tools-5.4.0-42-generic_5.4.0-42.46_s390x.deb'
The following additional packages will be installed:
libdw1 linux-tools-5.4.0-42 linux-tools-common
Suggested packages:
fdutils linux-doc | linux-source-5.4.0 linux-tools
The following packages will be REMOVED:
linux-generic linux-image-5.4.0-42-generic linux-image-generic
The following NEW packages will be installed:
libdw1 linux-buildinfo-5.4.0-42-generic linux-image-unsigned-5.4.0-42-generic linux-tools-5.4.0-42
linux-tools-5.4.0-42-generic linux-tools-common
The following packages will be DOWNGRADED:
linux-headers-5.4.0-42-generic linux-modules-5.4.0-42-generic linux-modules-extra-5.4.0-42-generic
0 upgraded, 6 newly installed, 3 downgraded, 3 to remove and 0 not upgraded.
Need to get 5,185 kB/28.9 MB of archives.
After this operation, 23.6 MB of additional disk space will be used.
Do you want to continue? [Y/n] Y
Get:1 http://ports.ubuntu.com/ubuntu-ports focal/main s390x libdw1 s390x 0.176-1.1build1 [226 kB]
Get:2 /root/tmp/fheimes/linux-image-unsigned-5.4.0-42-generic_5.4.0-42.46_s390x.deb linux-image-unsigned-5.4.0-42-generic s390x 5.4.0-42.46 [6,658 kB]
Get:3 /root/tmp/fheimes/linux-buildinfo-5.4.0-42-generic_5.4.0-42.46_s390x.deb linux-buildinfo-5.4.0-42-generic s390x 5.4.0-42.46 [384 kB]
Get:4 /root/tmp/fheimes/linux-headers-5.4.0-42-generic_5.4.0-42.46_s390x.deb linux-headers-5.4.0-42-generic s390x 5.4.0-42.46 [709 kB]
Get:5 http://ports.ubuntu.com/ubuntu-ports focal-updates/main s390x linux-tools-common all 5.4.0-42.46 [197 kB]
Get:6 http://ports.ubuntu.com/ubuntu-ports focal-updates/main s390x linux-tools-5.4.0-42 s390x 5.4.0-42.46 [4,763 kB]
Get:7 /root/tmp/fheimes/linux-modules-5.4.0-42-generic_5.4.0-42.46_s390x.deb linux-modules-5.4.0-42-generic s390x 5.4.0-42.46 [9,310 kB]
Get:8 /root/tmp/fheimes/linux-modules-extra-5.4.0-42-generic_5.4.0-42.46_s390x.deb linux-modules-extra-5.4.0-42-generic s390x 5.4.0-42.46 [6,332 kB]
Get:9 /root/tmp/fheimes/linux-tools-5.4.0-42-generic_5.4.0-42.46_s390x.deb linux-tools-5.4.0-42-generic s390x ...

Frank Heimes (fheimes) wrote :

Hi, let me recommend a different approach for installing the kernel from -proposed:

# the package 'software-properties-common' contains the 'add-apt-repository' tool and should be installed by default these days, but anyway:
$ sudo apt install software-properties-common

# enable the proposed section of the Ubuntu archive:
$ sudo add-apt-repository "deb http://ports.ubuntu.com/ubuntu-ports/ $(lsb_release -sc)-proposed restricted main universe"

# the above should trigger an archive index update automatically these days, but anyway:
$ sudo apt update

# you will see several kernel-related packages:
$ apt list --upgradable | grep -i ^linux
linux-headers-generic/focal-proposed,focal-proposed 5.4.0.44.48 s390x [upgradable from: 5.4.0.42.46]
linux-image-generic/focal-proposed,focal-proposed 5.4.0.44.48 s390x [upgradable from: 5.4.0.42.46]
linux-libc-dev/focal-proposed,focal-proposed 5.4.0-44.48 s390x [upgradable from: 5.4.0-42.46]
linux-source-5.4.0/focal-proposed,focal-proposed 5.4.0-44.48 all [upgradable from: 5.4.0-42.46]
linux-source/focal-proposed,focal-proposed 5.4.0.44.48 all [upgradable from: 5.4.0.42.46]
linux-tools-common/focal-proposed,focal-proposed 5.4.0-44.48 all [upgradable from: 5.4.0-42.46]
linux-tools-generic/focal-proposed,focal-proposed 5.4.0.44.48 s390x [upgradable from: 5.4.0.42.46]

# you can now update to the new kernel packages (aka install them) in different ways:

# 1) just install the package that's there (and trust that it's the latest version):
$ sudo apt install --install-recommends linux-image-generic
...

# 2) or by explicitly specifying the version number:
$ sudo apt install --install-recommends linux-image-generic=5.4.0.44.48

# 3) or (and probably easiest) upgrading everything to the latest available level in proposed:
$ sudo apt -y -q full-upgrade
...

# you may see messages like these:

  ┌───────────────────────┤ Pending kernel upgrade ├────────────────────────┐
  │                                                                         │
  │  Newer kernel available                                                 │
  │                                                                         │
  │  The currently running kernel version is 5.4.0-42-generic which is not  │
  │  the expected kernel version .                                          │
  │                                                                         │
  │  Restarting the system to load the new kernel will not be handled       │
  │  automatically, so you should consider rebooting.                       │
  │                                                                         │
  │                                  <Ok>                                   │
  │                                                                         │
  └─────────────────────────────────────────────────────────────────────────┘

                ┌────┤ Daemons using outdated libraries ├─────┐
                │ │
                │ │
                │ Which services should be restarted? │
             ...

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2020-08-21 12:42 EDT-------
Yes, it helped. Thx! I am now running kernel 5.4.0-44-generic and started verification.

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2020-08-21 14:10 EDT-------
My tests have been successful. Fix is validated.

Frank Heimes (fheimes) wrote :

Thx a lot, Ursula. I've updated the tags.

tags: added: verification-done-focal
removed: verification-needed-focal
Frank Heimes (fheimes) wrote :

The patch landed in 5.8 and we have 5.8 in groovy, hence updating the status of the groovy part to Fix Released.

Changed in linux (Ubuntu Groovy):
status: In Progress → Fix Released
Changed in ubuntu-z-systems:
status: In Progress → Fix Committed
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 5.4.0-45.49

---------------
linux (5.4.0-45.49) focal; urgency=medium

  * focal/linux: 5.4.0-45.49 -proposed tracker (LP: #1893050)

  * [Potential Regression] dscr_inherit_exec_test from powerpc in
    ubuntu_kernel_selftests failed on B/E/F (LP: #1888332)
    - powerpc/64s: Don't init FSCR_DSCR in __init_FSCR()

linux (5.4.0-44.48) focal; urgency=medium

  * focal/linux: 5.4.0-44.48 -proposed tracker (LP: #1891049)

  * Packaging resync (LP: #1786013)
    - [Packaging] update helper scripts

  * ipsec: policy priority management is broken (LP: #1890796)
    - xfrm: policy: match with both mark and mask on user interfaces

linux (5.4.0-43.47) focal; urgency=medium

  * focal/linux: 5.4.0-43.47 -proposed tracker (LP: #1890746)

  * Packaging resync (LP: #1786013)
    - update dkms package versions

  * Devlink - add RoCE disable kernel support (LP: #1877270)
    - devlink: Add new "enable_roce" generic device param
    - net/mlx5: Document flow_steering_mode devlink param
    - net/mlx5: Handle "enable_roce" devlink param
    - IB/mlx5: Rename profile and init methods
    - IB/mlx5: Load profile according to RoCE enablement state
    - net/mlx5: Remove unneeded variable in mlx5_unload_one
    - net/mlx5: Add devlink reload
    - IB/mlx5: Do reverse sequence during device removal

  * msg_zerocopy.sh in net from ubuntu_kernel_selftests failed (LP: #1812620)
    - selftests/net: relax cpu affinity requirement in msg_zerocopy test

  * Enlarge hisi_sec2 capability (LP: #1890222)
    - Revert "UBUNTU: [Config] Disable hisi_sec2 temporarily"
    - crypto: hisilicon - update SEC driver module parameter

  * Fix missing HDMI/DP Audio on an HP Desktop (LP: #1890441)
    - ALSA: hda/hdmi: Add quirk to force connectivity

  * Fix IOMMU error on AMD Radeon Pro W5700 (LP: #1890306)
    - PCI: Mark AMD Navi10 GPU rev 0x00 ATS as broken

  * ASoC:amd:renoir: the dmic can't record sound after suspend and resume
    (LP: #1890220)
    - SAUCE: ASoC: amd: renoir: restore two more registers during resume

  * No sound, Dummy output on Acer Swift 3 SF314-57G with Ice Lake core-i7 CPU
    (LP: #1877757)
    - ASoC: SOF: Intel: hda: fix generic hda codec support

  * Fix right speaker of HP laptop (LP: #1889375)
    - SAUCE: hda/realtek: Fix right speaker of HP laptop

  * blk_update_request error when mount nvme partition (LP: #1872383)
    - SAUCE: nvme-pci: prevent SK hynix PC400 from using Write Zeroes command

  * soc/amd/renoir: detect dmic from acpi table (LP: #1887734)
    - ASoC: amd: add logic to check dmic hardware runtime
    - ASoC: amd: add ACPI dependency check
    - ASoC: amd: fixed kernel warnings

  * soc/amd/renoir: change the module name to make it work with ucm3
    (LP: #1888166)
    - AsoC: amd: add missing snd- module prefix to the acp3x-rn driver kernel
      module
    - SAUCE: remove a kernel module since its name is changed

  * Focal update: v5.4.55 upstream stable release (LP: #1890343)
    - AX.25: Fix out-of-bounds read in ax25_connect()
    - AX.25: Prevent out-of-bounds read in ax25_sendmsg()
    - dev: Defer free of skbs in flush_backlog
    - drivers/net/wan/x25_asy: Fix to make i...

Changed in linux (Ubuntu Focal):
status: Fix Committed → Fix Released
Frank Heimes (fheimes)
Changed in ubuntu-z-systems:
status: Fix Committed → Fix Released
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2020-09-04 03:06 EDT-------
IBM Bugzilla status-> closed, Fix Released with all requested distros.
