nx842 - CRB request time out (-110) when uninstall NX modules and initiate NX request

Bug #1827755 reported by bugproxy on 2019-05-05
This bug affects 2 people
Affects                            Importance  Assigned to
The Ubuntu-power-systems project   Critical    Manoj Iyer
linux (Ubuntu)                     Critical    Manoj Iyer
Bionic                             Critical    Unassigned

Bug Description

[Impact]
PowerPC 842 hardware compression support is currently broken. This affects workloads such as zswap and others that exploit 842 hardware compression on Power.

[Test]
- Install nx-compress and nx-842-powernv modules
- Initiate NX request
- Uninstall these modules
- Initiate NX request again and we get CRB timeout with error -110
A test kernel is available in the PPA (see comment #4). Comment #5 verifies that the PPA kernel works as expected.

[Fix]
IBM has identified that the following upstream patch fixes the issue:
656ecc16e8fc crypto/nx: Initialize 842 high and normal RxFIFO control registers

[Regression Potential]
The patch only impacts the nx-842 modules, which are only available on the PowerPC architecture, and it has no impact on other architectures or generic code. The risk of regression is very low.

[Other Info]
---Problem Description---
Normally, the nx-compress and nx-842-powernv modules are loaded (if not already loaded) when the 842-nx compressor is selected, and they stay loaded for as long as the system runs. So we will not see this bug in the normal case.

However, we see an NX CRB request timeout after these modules are unloaded and then either loaded again or brought back by selecting the 842-nx compressor.

---uname output---
18.04

Machine Type = P9 system

---Steps to Reproduce---
- Install nx-compress and nx-842-powernv modules
- Initiate NX request
- Uninstall these modules
- Initiate NX request again and we get CRB timeout with error -110
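
The steps above can be sketched as a small shell script. This is only an illustration under stated assumptions: the module names are copied from this report and may differ on a given kernel, and the command that issues an NX request is a placeholder (it defaults to the ./testit script mentioned later in this bug, which is not part of the kernel). By default the script only prints the commands it would run.

```shell
#!/bin/sh
# Sketch of the reproduction steps; not an official test.
# Module names are taken from this report (assumption: override if they differ).
NX_MODULES="${NX_MODULES:-nx-compress nx-842-powernv}"
# Dry-run by default: print commands instead of running them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    # Print the command in dry-run mode, otherwise execute it.
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

nx_request() {
    # Placeholder for "initiate NX request". NX_REQUEST_CMD is a hypothetical
    # variable; point it at whatever triggers 842 compression on your system.
    run "${NX_REQUEST_CMD:-./testit}"
}

reproduce_nx_timeout() {
    reversed=""
    for m in $NX_MODULES; do
        run modprobe "$m"            # step 1: load the modules
        reversed="$m $reversed"
    done
    nx_request                       # step 2: first request, expected to work
    for m in $reversed; do
        run modprobe -r "$m"         # step 3: unload in reverse order
    done
    for m in $NX_MODULES; do
        run modprobe "$m"            # load again before the second request
    done
    nx_request                       # step 4: times out (-110) on affected kernels
}
```

Calling reproduce_nx_timeout with DRY_RUN=1 lists the commands; setting DRY_RUN=0 as root on a P9 system would actually run them.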

Patches are included in 4.19-rc1

6e708000ec2c93c2bde6a46aa2d6c3e80d4eaeb9 - powerpc/powernv: Export opal_check_token symbol
656ecc16e8fc2ab44b3d70e3fcc197a7020d0ca5 - crypto/nx: Initialize 842 high and normal RxFIFO control registers

> Looks like the first commit was included in a recent 18.04 update
> (4.15.0-48.51), see
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1819989
>
> but I don't see the second one there yet.
>
> If this is still needed, I would suggest getting this bug mirrored to LP to
> put on Canonical's radar.

We need the second commit (656ecc16e8fc2ab44b3d70e3fcc197a7020d0ca) to fix the actual issue. The first commit is of no use without the second one: it only exports the opal_check_token symbol, which the second commit uses.
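
One way to check whether a given kernel carries the second commit is to search its changelog (or a git log) for the patch subject quoted above. The helper below is a minimal sketch; the function name is made up for illustration, and it assumes the changelog lists patch subjects one per line, as Ubuntu kernel changelogs usually do.

```shell
#!/bin/sh
# Subject line of the fix, as quoted in this report.
FIX_SUBJECT='crypto/nx: Initialize 842 high and normal RxFIFO control registers'

changelog_has_fix() {
    # Reads changelog or log text on stdin.
    # Exits 0 if the fix subject appears anywhere in it, non-zero otherwise.
    grep -qF -- "$FIX_SUBJECT"
}
```

For example, `apt-get changelog linux-image-$(uname -r) | changelog_has_fix && echo present` would check the running kernel's package, and `git log --oneline | changelog_has_fix` would check a kernel source tree. A subject match is only a hint; it does not prove the patch was applied unmodified.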

bugproxy (bugproxy) on 2019-05-05
tags: added: architecture-ppc64le bugnameltc-166573 severity-medium targetmilestone-inin18041
Changed in ubuntu:
assignee: nobody → Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage)
affects: ubuntu → linux (Ubuntu)
Changed in ubuntu-power-systems:
importance: Undecided → Medium
status: New → Triaged
assignee: nobody → Canonical Kernel Team (canonical-kernel-team)
Manoj Iyer (manjo) on 2019-05-06
Changed in linux (Ubuntu):
assignee: Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage) → Canonical Kernel Team (canonical-kernel-team)
importance: Undecided → Medium
Andrew Cloke (andrew-cloke) wrote :

Next steps: IBM to verify if this issue is already resolved in 19.04, and report back to this bug.

------- Comment From <email address hidden> 2019-05-16 13:25 EDT-------
This issue is already resolved in 19.04. For 18.04, I verified that adding the second commit:
656ecc16e8fc2ab44b3d70e3fcc197a7020d0ca
fixes the problem:

root@ltc-boston25:/home/ubuntu/comp_selftest# cat /sys/kernel/debug/comp_selftest/status
^Creads 16 MBps: 45228/45132 peak 45697/46027 off 0/0/0 len 10000/10000/10000
root@ltc-boston25:/home/ubuntu/comp_selftest# cat /proc/version
Linux version 4.15.0-51-generic (root@ltc-wspoon3) (gcc version 7.3.0 (Ubuntu 7.3.0-27ubuntu1~18.04)) #55 SMP Thu May 16 11:23:12 CDT 2019

Manoj Iyer (manjo) wrote :

I have backported the patch and built a test kernel in ppa:ubuntu-power-triage/lp1827755. Could you please verify that this kernel works for you?

Andrew Cloke (andrew-cloke) wrote :

IBM is also working on a justification for this patch, which they plan to add to this bug.

Manoj Iyer (manjo) on 2019-05-20
Changed in ubuntu-power-systems:
status: Triaged → Incomplete
Changed in linux (Ubuntu):
status: New → Incomplete
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2019-05-20 15:09 EDT-------
I tried that kernel out and it works:
root@ltc-boston25:/home/ubuntu/comp_selftest# cat /proc/version
Linux version 4.15.0-51-generic (buildd@bos02-ppc64el-003) (gcc version 7.4.0 (Ubuntu 7.4.0-1ubuntu1~18.04)) #55~lp1827755+build.1-Ubuntu SMP Fri May 17 18:53:29 UTC 2019
root@ltc-boston25:/home/ubuntu/comp_selftest# insmod comp_selftest.ko
root@ltc-boston25:/home/ubuntu/comp_selftest# ./testit
root@ltc-boston25:/home/ubuntu/comp_selftest# dmesg |tail
<snip>
[ 354.958178] comp_selftest: loading out-of-tree module taints kernel.
[ 354.958396] comp_selftest: module verification failed: signature and/or required key missing - tainting kernel
[ 361.124620] compression self test starting
[ 361.124623] compressor: 842-nx
[ 361.124624] repeat: Y
[ 361.124625] threads: 16
[ 361.124627] offsets 0-0/1, 0-0/1, 0-0/1
[ 361.124630] lengths 10000-10000/1, 10000-10000/1, 10000-10000/1
root@ltc-boston25:/home/ubuntu/comp_selftest# cat /sys/kernel/debug/comp_selftest/status
^Creads 16 MBps: 45192/39510 peak 45509/43884 off 0/0/0 len 10000/10000/10000
root@ltc-boston25:/home/ubuntu/comp_selftest#

Where comp_selftest is the comp_selftest.c/README located here:
https://github.com/sukadev/linux/tree/ee0c8b0c3dcb8856c5998ac21e01916ea430f687/crypto/comp_selftest
and testit is just a copy/paste from the README of the commands to get the test running (up to status).

bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2019-05-20 15:29 EDT-------
We need this fix in 18.04.X. Without these patches, 842 compression is broken, which can affect zswap or other workloads that can exploit HW compression. It is a simple fix that modifies only the NX module.

Changed in ubuntu-power-systems:
status: Incomplete → Triaged
Manoj Iyer (manjo) on 2019-05-21
Changed in ubuntu-power-systems:
importance: Medium → Critical
Changed in linux (Ubuntu):
importance: Medium → Critical
status: Incomplete → In Progress
Changed in ubuntu-power-systems:
status: Triaged → In Progress
Changed in linux (Ubuntu):
assignee: Canonical Kernel Team (canonical-kernel-team) → Manoj Iyer (manjo)
Changed in ubuntu-power-systems:
assignee: Canonical Kernel Team (canonical-kernel-team) → Manoj Iyer (manjo)
Manoj Iyer (manjo) on 2019-05-22
description: updated
Manoj Iyer (manjo) on 2019-05-29
Changed in ubuntu-power-systems:
status: In Progress → Fix Committed
Frank Heimes (frank-heimes) wrote :

Changing to Fix Committed since the patch got applied.

Changed in linux (Ubuntu):
status: In Progress → Fix Committed
Stefan Bader (smb) on 2019-06-13
Changed in linux (Ubuntu Bionic):
importance: Undecided → Critical
status: New → Fix Committed
Changed in linux (Ubuntu):
status: Fix Committed → Fix Released

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-bionic' to 'verification-done-bionic'. If the problem still exists, change the tag 'verification-needed-bionic' to 'verification-failed-bionic'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-bionic
Mike Ranweiler (mranweil) wrote :

I tested it and it looks good:
root@ltc-boston25:/home/ubuntu/comp_selftest# insmod comp_selftest.ko
root@ltc-boston25:/home/ubuntu/comp_selftest# ./testit
root@ltc-boston25:/home/ubuntu/comp_selftest# dmesg |tail
[ 10.561010] audit: type=1400 audit(1561358069.720:8): apparmor="STATUS" operation="profile_load" profile="unconfined" name="/usr/lib/NetworkManager/nm-dhcp-helper" pid=3496 comm="apparmor_parser"
[ 10.561014] audit: type=1400 audit(1561358069.720:9): apparmor="STATUS" operation="profile_load" profile="unconfined" name="/usr/lib/connman/scripts/dhclient-script" pid=3496 comm="apparmor_parser"
[ 920.103139] comp_selftest: loading out-of-tree module taints kernel.
[ 920.103358] comp_selftest: module verification failed: signature and/or required key missing - tainting kernel
[ 931.871741] compression self test starting
[ 931.871744] compressor: 842-nx
[ 931.871745] repeat: Y
[ 931.871745] threads: 16
[ 931.871747] offsets 0-0/1, 0-0/1, 0-0/1
[ 931.871749] lengths 10000-10000/1, 10000-10000/1, 10000-10000/1
root@ltc-boston25:/home/ubuntu/comp_selftest# cat /sys/kernel/debug/comp_selftest/status
^Creads 16 MBps: 47121/42300 peak 47786/47470 off 0/0/0 len 10000/10000/10000
root@ltc-boston25:/home/ubuntu/comp_selftest# cat /proc/version
Linux version 4.15.0-53-generic (buildd@bos02-ppc64el-009) (gcc version 7.3.0 (Ubuntu 7.3.0-16ubuntu3)) #57-Ubuntu SMP Thu Jun 13 09:28:40 UTC 2019
root@ltc-boston25:/home/ubuntu/comp_selftest#

tags: added: verification-done-bionic
removed: verification-needed-bionic
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 4.15.0-55.60

---------------
linux (4.15.0-55.60) bionic; urgency=medium

  * linux: 4.15.0-55.60 -proposed tracker (LP: #1834954)

  * Request backport of ceph commits into bionic (LP: #1834235)
    - ceph: use atomic_t for ceph_inode_info::i_shared_gen
    - ceph: define argument structure for handle_cap_grant
    - ceph: flush pending works before shutdown super
    - ceph: send cap releases more aggressively
    - ceph: single workqueue for inode related works
    - ceph: avoid dereferencing invalid pointer during cached readdir
    - ceph: quota: add initial infrastructure to support cephfs quotas
    - ceph: quota: support for ceph.quota.max_files
    - ceph: quota: don't allow cross-quota renames
    - ceph: fix root quota realm check
    - ceph: quota: support for ceph.quota.max_bytes
    - ceph: quota: update MDS when max_bytes is approaching
    - ceph: quota: add counter for snaprealms with quota
    - ceph: avoid iput_final() while holding mutex or in dispatch thread

  * QCA9377 isn't being recognized sometimes (LP: #1757218)
    - SAUCE: USB: Disable USB2 LPM at shutdown

  * hns: fix ICMP6 neighbor solicitation messages discard problem (LP: #1833140)
    - net: hns: fix ICMP6 neighbor solicitation messages discard problem
    - net: hns: fix unsigned comparison to less than zero

  * Fix occasional boot time crash in hns driver (LP: #1833138)
    - net: hns: Fix probabilistic memory overwrite when HNS driver initialized

  * use-after-free in hns_nic_net_xmit_hw (LP: #1833136)
    - net: hns: fix KASAN: use-after-free in hns_nic_net_xmit_hw()

  * hns: attempt to restart autoneg when disabled should report error
    (LP: #1833147)
    - net: hns: Restart autoneg need return failed when autoneg off

  * systemd 237-3ubuntu10.14 ADT test failure on Bionic ppc64el (test-seccomp)
    (LP: #1821625)
    - powerpc: sys_pkey_alloc() and sys_pkey_free() system calls
    - powerpc: sys_pkey_mprotect() system call

  * [UBUNTU] pkey: Indicate old mkvp only if old and curr. mkvp are different
    (LP: #1832625)
    - pkey: Indicate old mkvp only if old and current mkvp are different

  * [UBUNTU] kernel: Fix gcm-aes-s390 wrong scatter-gather list processing
    (LP: #1832623)
    - s390/crypto: fix gcm-aes-s390 selftest failures

  * System crashes on hot adding a core with drmgr command (4.15.0-48-generic)
    (LP: #1833716)
    - powerpc/numa: improve control of topology updates
    - powerpc/numa: document topology_updates_enabled, disable by default

  * Kernel modules generated incorrectly when system is localized to a non-
    English language (LP: #1828084)
    - scripts: override locale from environment when running recordmcount.pl

  * [UBUNTU] kernel: Fix wrong dispatching for control domain CPRBs
    (LP: #1832624)
    - s390/zcrypt: Fix wrong dispatching for control domain CPRBs

  * CVE-2019-11815
    - net: rds: force to destroy connection if t_sock is NULL in
      rds_tcp_kill_sock().

  * Sound device not detected after resume from hibernate (LP: #1826868)
    - drm/i915: Force 2*96 MHz cdclk on glk/cnl when audio power is enabled
    - drm/i915: Save the old CDCLK atomic state
...
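
A quick way to tell whether a Bionic 4.15 kernel is at or after the release named above (4.15.0-55.60) is to compare the ABI number in its release string. This is a rough sketch with a made-up function name. It only understands 4.15.0-N strings, and it says nothing about -proposed or PPA test builds, which carried the fix under lower ABI numbers (4.15.0-53 and 4.15.0-51 in the comments above).

```shell
#!/bin/sh
# ABI number of the Bionic release named in this bug (4.15.0-55.60).
FIXED_ABI=55

bionic_at_or_after_fix() {
    # $1: a kernel release string such as the output of `uname -r`.
    # Returns 0 if it is 4.15.0-55 or later, 1 if earlier,
    # and 2 if the string is not a 4.15.0-N release at all.
    case "$1" in
        4.15.0-[0-9]*) ;;
        *) return 2 ;;
    esac
    abi="${1#4.15.0-}"      # drop the "4.15.0-" prefix
    abi="${abi%%[!0-9]*}"   # keep only the leading digits of the ABI number
    [ "$abi" -ge "$FIXED_ABI" ]
}
```

Usage: `bionic_at_or_after_fix "$(uname -r)" && echo "at or after 4.15.0-55"`.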

Changed in linux (Ubuntu Bionic):
status: Fix Committed → Fix Released
Changed in ubuntu-power-systems:
status: Fix Committed → Fix Released
Brad Figg (brad-figg) on 2019-07-24
tags: added: cscc

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-xenial' to 'verification-done-xenial'. If the problem still exists, change the tag 'verification-needed-xenial' to 'verification-failed-xenial'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-xenial