[UBUNTU 20.04] smc: SMC connections hang with later-level implementations

Bug #1882088 reported by bugproxy on 2020-06-04
This bug affects 1 person
Affects                     Importance   Assigned to
Ubuntu on IBM z Systems     Undecided    Skipper Bug Screeners
linux (Ubuntu)              (status tracked in Groovy)
  Focal                     Medium       Frank Heimes
  Groovy                    Medium       Frank Heimes

Bug Description

SRU Justification:
==================

[Impact]

* Connections from later-level SMC (protocol) versions to an SMC-enabled server on Linux hang.

* Later-level versions of SMC (although backwards-compatible) present a higher version number and use larger messages during the CLC handshake.

* The solution to avoid such hangs is to introduce toleration for later version numbers, and support CLC messages of arbitrary length.
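
As a rough illustration of the two ideas named above (accepting a higher version number, and coping with messages longer than the receiver understands), here is a minimal Python sketch. It is not the kernel patch and does not use the real CLC wire format: the 3-byte header layout, KNOWN_VERSION, MAX_KNOWN_LEN and receive_message are all hypothetical names chosen for this example.

```python
import io
import struct

# Hypothetical wire layout for this sketch only (NOT the real SMC CLC format):
#   1 byte  version
#   2 bytes total message length, big-endian, including this header
HEADER = struct.Struct("!BH")
KNOWN_VERSION = 1    # highest version this receiver implements (assumption)
MAX_KNOWN_LEN = 16   # largest message this receiver understands (assumption)


def receive_message(stream):
    """Read one message from a file-like byte stream.

    Returns (version, payload), where payload is trimmed to the part this
    receiver understands. Raises ValueError on malformed input.
    """
    raw = stream.read(HEADER.size)
    if len(raw) < HEADER.size:
        raise ValueError("truncated header")
    version, total_len = HEADER.unpack(raw)

    # Toleration: a newer, backwards-compatible peer is accepted;
    # only versions below the one we implement are rejected.
    if version < KNOWN_VERSION:
        raise ValueError("unsupported version")
    if total_len < HEADER.size:
        raise ValueError("bad length")

    body_len = total_len - HEADER.size
    keep = min(body_len, MAX_KNOWN_LEN - HEADER.size)
    payload = stream.read(keep)
    if len(payload) < keep:
        raise ValueError("truncated body")

    # Drain the bytes we do not understand, so that the next read
    # starts at the next message instead of in the middle of this one.
    remaining = body_len - keep
    while remaining:
        chunk = stream.read(min(remaining, 4096))
        if not chunk:
            raise ValueError("truncated body")
        remaining -= len(chunk)
    return version, payload


# Usage: a "newer" peer sends version 2 with a 40-byte body,
# followed by an ordinary version 1 message.
stream = io.BytesIO(
    HEADER.pack(2, HEADER.size + 40) + b"x" * 40
    + HEADER.pack(1, HEADER.size + 4) + b"abcd"
)
first = receive_message(stream)
second = receive_message(stream)
```

A receiver that insisted on an exact version match, or that read only a fixed number of bytes, would either reject the first message or leave unread bytes in the stream and misparse the second one. In a blocking handshake, that kind of mismatch is one way for both sides to end up waiting on each other.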

[Fix]

* fb4f79264c0fc6fd5a68ffe3e31bfff97311e1f1 fb4f79264c0f "net/smc: tolerate future SMCD versions"

[Test Case]

* Requires two IBM z13/z13s GA2 or LinuxONE Rockhopper/Emperor systems with RoCE Express adapter v2(.1) for SMC-D usage.

* One system needs to run the initial SMC-D version, the other a newer version.

* Establish a connection between both systems and monitor/verify whether it is reliable or whether it hangs.
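
The "hangs or not" check in the last step could be scripted along these lines. This is only a sketch under stated assumptions: it presumes that a simple echo service is listening on the server side, the function name probe and its return strings are made up for this example, and whether the socket actually uses SMC depends on how the process is started (for instance under the smc_run wrapper from smc-tools), which is outside this sketch.

```python
import socket


def probe(host, port, timeout=5.0, payload=b"ping"):
    """Try one echo round trip against host:port.

    Returns "ok" if the payload comes back within `timeout` seconds,
    "hang" if connect/send/receive times out, "closed" if the peer
    closes early, "mismatch" on wrong data, "refused" if nothing listens.
    """
    try:
        # The timeout given here also applies to sendall() and recv() below.
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.sendall(payload)
            data = b""
            while len(data) < len(payload):
                chunk = conn.recv(len(payload) - len(data))
                if not chunk:
                    return "closed"
                data += chunk
            return "ok" if data == payload else "mismatch"
    except socket.timeout:
        return "hang"
    except ConnectionRefusedError:
        return "refused"
```

Running the probe repeatedly, from the older-level system to the newer one and in the opposite direction, gives a simple pass/fail signal for the test case.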

[Regression Potential]

* The regression potential can be considered medium to low:

* SMC-D is a rather specialized way of doing shared memory communications and is not widely used.

* However, the code that is changed is common code.

* On the other hand, the patch is straightforward and only modifies /net/smc/smc_clc.c and /net/smc/smc_clc.h

* It largely bumps limits (allows larger messages), adds a check and introduces toleration, rather than changing control flow.

[Other]

* The above fix is currently in 'linux-next' and tagged with next-20200709.

* It is still expected to be accepted for 5.8.

* However, since this is not guaranteed, this SRU request covers both focal and groovy, to make sure that no regression is introduced in case the patch does not end up in 5.8.

__________

Description: smc: SMC connections hang with later-level implementations
Symptom: Connections from later-level SMC versions to an SMC-enabled
               server on Linux hang.
Problem: Later-level versions of SMC present, although backwards-
               compatible, a higher version number, and use larger messages
               during the CLC handshake.
Solution: Adjust for tolerating later version numbers, and support CLC
               messages of arbitrary length.
Reproduction: Enable a server on Linux for SMC, and connect using a later-
               level version of SMC

Applicable for: Ubuntu 20.04

Default Comment by Bridge

tags: added: architecture-s39064 bugnameltc-186071 severity-high targetmilestone-inin2004
Changed in ubuntu:
assignee: nobody → Skipper Bug Screeners (skipper-screen-team)
affects: ubuntu → linux (Ubuntu)
Frank Heimes (fheimes) on 2020-06-04
affects: linux (Ubuntu) → smc-tools (Ubuntu)
affects: smc-tools (Ubuntu) → linux (Ubuntu)

------- Comment From <email address hidden> 2020-06-04 10:10 EDT-------
Note: This patch is currently not upstream. It will be part of the full SMC-D v2 implementation (see LP1853291), and was successfully tested with a reference implementation. However, we still have some way to go with our SMC-D v2 code, yet other providers of SMC-D v2 implementations asked us to ship this toleration patch early, so that users have a chance to have their systems up to date by the time they deliver their versions.

Frank Heimes (fheimes) wrote :

Please excuse my ignorance, but why not bring this fix upstream separately now?
I think it will not harm the pre-SMC-D v2 code (otherwise the backport here would introduce issues in the current 20.04 kernel). Having it upstream now would save you some effort later, and for us it would mean that we have a reliable patch that is accepted upstream (signed-off), with no fear of further changes to the code.
We (and I think that's similar for most distributions) rely on upstream-accepted patches for stability, traceability and manageability reasons.
On top of that, it could also be marked for an upstream stable release update and would then automatically land in kernel 5.4 updates.

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2020-06-04 11:02 EDT-------
That's a very valid question.
The problem is that we cannot talk about v2 yet for legal reasons. We are waiting on some internal processing and, in particular, a new RFC on v2. We assume that we would get a lot of resistance upstream if we posted any code for v2 in the absence of an RFC.

information type: Public → Private
Frank Heimes (fheimes) on 2020-06-26
information type: Private → Public
Petr Tesarik (ptesarik) wrote :

This patch should definitely appear in the upstream kernel, because all Linux distributions will need it at some point. If IBM engineers are not allowed to post the patch, Frank (or I) can probably take care of it upstream.

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2020-06-29 08:07 EDT-------
As I wrote in a previous comment, it's not so much a question of 'if' we can post it upstream, but rather 'when': We assume that we would get a lot of resistance upstream if we posted any code for v2 in the absence of an RFC right now, plus we cannot talk about v2 yet for legal reasons.
Once a new IETF RFC detailing v2 is out, we can post our full implementation of v2 upstream, and these patches will be part of that effort.

Frank Heimes (fheimes) wrote :

Changing to Incomplete for now until the discussions have concluded ...

Changed in ubuntu-z-systems:
status: New → Incomplete
assignee: nobody → Skipper Bug Screeners (skipper-screen-team)
Changed in linux (Ubuntu Groovy):
assignee: Skipper Bug Screeners (skipper-screen-team) → nobody
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2020-07-09 07:02 EDT-------
The patch was posted upstream, see https://www.spinics.net/lists/netdev/msg666808.html, and accepted by Dave Miller. I just checked his net-next tree, didn't see it there yet, but I would assume it will appear there soon.

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2020-07-10 11:24 EDT-------
Here is a proper link:
https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net.git/commit/?id=fb4f79264c0fc6fd5a68ffe3e31bfff97311e1f1

Frank Heimes (fheimes) wrote :

Thx for sharing the updated and upstream link, Stefan!

Frank Heimes (fheimes) on 2020-07-10
Changed in ubuntu-z-systems:
status: Incomplete → Triaged
Frank Heimes (fheimes) on 2020-07-10
Changed in linux (Ubuntu Groovy):
status: New → Triaged
Changed in linux (Ubuntu Focal):
status: New → Triaged
assignee: nobody → Frank Heimes (fheimes)
Changed in linux (Ubuntu Groovy):
assignee: nobody → Frank Heimes (fheimes)
Frank Heimes (fheimes) wrote :

Kernel SRU request submitted:
https://lists.ubuntu.com/archives/kernel-team/2020-July/thread.html#111977
Updating status to 'In Progress'.

Changed in ubuntu-z-systems:
status: Triaged → In Progress
Changed in linux (Ubuntu Focal):
status: Triaged → In Progress
Changed in linux (Ubuntu Groovy):
status: Triaged → In Progress
description: updated
Stefan Bader (smb) on 2020-07-13
Changed in linux (Ubuntu Groovy):
importance: Undecided → Medium
Changed in linux (Ubuntu Focal):
importance: Undecided → Medium
Frank Heimes (fheimes) wrote :

A patched kernel (that I created during the SRU preparation and test compile) is available here:
https://people.canonical.com/~fheimes/lp1882088/
for further testing.

Changed in linux (Ubuntu Focal):
status: In Progress → Fix Committed