【sec-0911】 fail to reset sec module

Bug #1943301 reported by Fred Kimmy
This bug affects 1 person
Affects            Status         Importance   Assigned to   Milestone
kunpeng920         Fix Released   Undecided    Ike Panhc
Ubuntu-20.04       Fix Released   Undecided    Ike Panhc
Ubuntu-20.04-hwe   Fix Released   Undecided    Unassigned
linux (Ubuntu)     Invalid        Undecided    Unassigned
Focal              Fix Released   Medium       Ike Panhc

Bug Description

[Impact]
The crypto accelerator in the Hi1620 SoC cannot be reset.

[Test Plan]
1) echo 1 >/sys/devices/pci0000:74/0000:74:00.0/0000:75:00.1/reset
2) echo 1 >/sys/devices/pci0000:74/0000:74:00.0/0000:75:00.0/reset
3) busybox devmem 0x200141B01018 32 0x1
4) dmesg | grep "FLR resetting"
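
The check in step 4 can be made stricter by requiring both the start and the completion message for a given device. The following is a minimal sketch, assuming the hisi_sec2 messages "FLR resetting..." and "FLR reset complete" that appear in the comments on this bug; flr_reset_ok is a hypothetical helper name, not part of any driver tooling.

```shell
# Sketch only: flr_reset_ok is a hypothetical helper, not an existing tool.
# Reads kernel log text on stdin and succeeds only if the given PCI device
# logged both "FLR resetting..." and "FLR reset complete".
flr_reset_ok() {
    bdf="$1"        # full PCI address, e.g. 0000:76:00.0
    log="$(cat)"    # keep the log so it can be searched twice
    printf '%s\n' "$log" | grep -qF "hisi_sec2 $bdf: FLR resetting..." &&
    printf '%s\n' "$log" | grep -qF "hisi_sec2 $bdf: FLR reset complete"
}

# Usage on a live system (assumes permission to read the kernel log):
#   dmesg | flr_reset_ok 0000:76:00.0 && echo "reset ok"
```

Because the helper reads from stdin, it can also be run against a saved dmesg capture rather than a live machine.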

[Regression Risk]
hisi_qm only affects the kunpeng920 platform, so the risk for other platforms is minimal. A full regression test is needed on kunpeng920.

=======================

[Bug Description]

[Steps to Reproduce]

1) echo 1 >/sys/devices/pci0000:74/0000:74:00.0/0000:75:00.1/reset
2) echo 1 >/sys/devices/pci0000:74/0000:74:00.0/0000:75:00.0/reset
3) busybox devmem 0x200141B01018 32 0x1

[Actual Results]
root@root:~# busybox devmem 0x200141B01018 32 0x1
root@root:~# dmesg
root@root:~#

[Expected Results]
reset ok

[Reproducibility]

[Additional information]
(Firmware version, kernel version, affected hardware, etc. if required):
OS: Ubuntu 20.04.2
DRV(driver version): vermagic: 5.8.0-59-generic SMP mod_unload aarch64

[Resolution]
[v2,3/5] crypto: hisilicon/sec2 - update SEC initialization and reset (this patch has not been merged)
https://patchwork.kernel.org/project/<email address hidden>/

Ike Panhc (ikepanhc) wrote :

Is this the fix?

commit d0228aeb4d65c6550eac6e7c3a91520d2ceedf4f
Author: Longfang Liu <email address hidden>
Date: Tue Jul 7 09:15:39 2020 +0800

    crypto: hisilicon/sec2 - update SEC initialization and reset

    Updates the initialization and reset of SEC driver's
    register operation.

    Signed-off-by: Longfang Liu <email address hidden>
    Signed-off-by: Herbert Xu <email address hidden>

Ike Panhc (ikepanhc) wrote :

Could you provide the reset ok message?

Ike Panhc (ikepanhc) wrote :

Are these messages expected when the reset completes?

$ sudo dmesg --clear
$ lspci | grep SEC
76:00.0 Network and computing encryption device: Huawei Technologies Co., Ltd. HiSilicon SEC Engine (rev 21)
b6:00.0 Network and computing encryption device: Huawei Technologies Co., Ltd. HiSilicon SEC Engine (rev 21)
$ echo 1 | sudo tee /sys/devices/pci0000:74/0000:74:01.0/0000:76:00.0/reset
1
$ echo 1 | sudo tee /sys/devices/pci0000:b4/0000:b4:01.0/0000:b6:00.0/reset
1
ubuntu@kreiken:~$ dmesg
[ 3436.555343] hisi_sec2 0000:76:00.0: FLR resetting...
[ 3436.661879] hisi_sec2 0000:76:00.0: FLR reset complete
[ 3449.145403] hisi_sec2 0000:b6:00.0: FLR resetting...
[ 3449.253377] hisi_sec2 0000:b6:00.0: FLR reset complete
$ cat /proc/version
Linux version 5.11.0-34-generic (buildd@bos02-arm64-015) (gcc (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0, GNU ld (GNU Binutils for Ubuntu) 2.34) #36~20.04.1-Ubuntu SMP Fri Aug 27 08:07:12 UTC 2021

Andrew Cloke (andrew-cloke) wrote :

Marking as incomplete while waiting for confirmation about the reset messages.

Changed in kunpeng920:
status: New → Incomplete
Fred Kimmy (kongzizaixian) wrote (last edit ):

You used FLR reset mode, and that log is OK. You can also use the following step to reproduce the RAS reset mode:
root@ubuntu:~# busybox devmem 0x200141B01018 32 0x1
root@ubuntu:~# dmesg
[49364.175033] {12}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 0
[49364.175058] {12}[Hardware Error]: event severity: recoverable
[49364.175070] {12}[Hardware Error]: Error 0, type: recoverable
[49364.194827] {12}[Hardware Error]: section_type: PCIe error
[49364.200459] {12}[Hardware Error]: version: 4.0
[49364.205054] {12}[Hardware Error]: command: 0x0006, status: 0x0010
[49364.211291] {12}[Hardware Error]: device_id: 0000:b6:00.0
[49364.216837] {12}[Hardware Error]: slot: 0
[49364.221002] {12}[Hardware Error]: secondary_bus: 0x00
[49364.226202] {12}[Hardware Error]: vendor_id: 0x19e5, device_id: 0xa255
[49364.232872] {12}[Hardware Error]: class_code: 100000
[49364.238042] hisi_sec2 0000:b6:00.0: AER: aer_status: 0x00000000, aer_mask: 0x00000000
[49364.245841] hisi_sec2 0000:b6:00.0: AER: aer_layer=Transaction Layer, aer_agent=Receiver ID
[49364.254165] hisi_sec2 0000:b6:00.0: AER: aer_uncor_severity: 0x00000000
[49364.260761] hisi_sec2 0000:b6:00.0: PCI error detected, state(=1)!!
[49364.260768] hisi_sec2 0000:b6:00.0: sec_axi_rresp_err_rint [error status=0x1] found
[49364.268398] hisi_sec2 0000:b6:00.0: Controller resetting...
[49364.268557] hisi_sec2 0000:b6:00.0: QM_DFX_MB_CNT = 0x00000000 ---> 0x00000008
[49364.268566] hisi_sec2 0000:b6:00.0: SEC_PF_ABNORMAL_INT_SOURCE = 0x00000000 ---> 0x00000001
[49364.268572] hisi_sec2 0000:b6:00.0: SEC_RAS_NFE_ENABLE = 0x00000177 ---> 0x00000176
[49364.269819] hisi_sec2 0000:b6:00.0: Controller reset complete
[49364.269841] pci 0000:b4:01.0: AER: device recovery successful
[49365.162750] pci 0000:bd:03.5: Removing from iommu group 46
root@ubuntu:~# uname -r
5.12.0-rc4+
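
A saved log like the one above can be classified per device. This is a rough sketch that assumes the "Controller resetting..." and "Controller reset complete" lines shown in this log; ras_reset_state is a hypothetical helper name.

```shell
# Sketch only: ras_reset_state is a hypothetical helper name.
# Reads kernel log text on stdin and prints one word for the given device:
#   none       - no "Controller resetting..." line was logged
#   incomplete - a reset started but "Controller reset complete" never followed
#   complete   - the last reset that started also completed
ras_reset_state() {
    awk -v dev="hisi_sec2 $1:" '
        index($0, dev " Controller resetting...") { state = "incomplete" }
        index($0, dev " Controller reset complete") {
            if (state == "incomplete") state = "complete"
        }
        END {
            if (state == "") state = "none"
            print state
        }
    '
}

# Usage on a live system (assumes permission to read the kernel log):
#   dmesg | ras_reset_state 0000:b6:00.0
```

On the kernel from the original report, where dmesg stayed empty after the devmem write, this sketch would print "none".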

Ike Panhc (ikepanhc) wrote :

The devmem address does not work on the kunpeng920 machine in our lab. We need an updated address, as the SEC modules are at PCI 75:00.0 and b5:00.0.
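
Since the bus numbers differ between machines, the sysfs reset file can be built from the address that lspci prints instead of hard-coding the bridge path. This is a small sketch, assuming PCI domain 0000 unless told otherwise; sec_reset_path is a hypothetical helper name. It only covers the sysfs reset file, not the devmem address.

```shell
# Sketch only: sec_reset_path is a hypothetical helper name.
# Builds the sysfs reset file for a PCI function from the short address
# that lspci prints (bus:device.function). The PCI domain defaults to 0000,
# which is an assumption that matches the logs on this bug.
sec_reset_path() {
    bdf="$1"
    domain="${2:-0000}"
    printf '/sys/bus/pci/devices/%s:%s/reset\n' "$domain" "$bdf"
}

# Usage on a live system (needs root, and really resets the device):
#   lspci -d 19e5:a255          # list functions with the SEC vendor:device ID
#   echo 1 > "$(sec_reset_path 76:00.0)"
```

The 19e5:a255 ID is taken from the hardware error log earlier in this thread.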

Andrew Cloke (andrew-cloke) wrote :

Next step: attempting to reproduce on a taishan 2280 system

Changed in kunpeng920:
status: Incomplete → In Progress
Ike Panhc (ikepanhc) wrote :

Hi Xinwei,

I can reset SEC module with 5.11 kernel on Taishan 2280 system. Are these messages expected?

ubuntu@kreiken:~$ sudo dmesg --clear
ubuntu@kreiken:~$ lspci | grep SEC
76:00.0 Network and computing encryption device: Huawei Technologies Co., Ltd. HiSilicon SEC Engine (rev 21)
b6:00.0 Network and computing encryption device: Huawei Technologies Co., Ltd. HiSilicon SEC Engine (rev 21)
ubuntu@kreiken:~$ echo 1 | sudo tee /sys/devices/pci0000:74/0000:74:01.0/0000:76:00.0/reset
1
ubuntu@kreiken:~$ echo 1 | sudo tee /sys/devices/pci0000:b4/0000:b4:01.0/0000:b6:00.0/reset
1
ubuntu@kreiken:~$ dmesg
[ 471.629138] hisi_sec2 0000:76:00.0: FLR resetting...
[ 471.734131] hisi_sec2 0000:76:00.0: FLR reset complete
[ 479.167301] hisi_sec2 0000:b6:00.0: FLR resetting...
[ 479.274189] hisi_sec2 0000:b6:00.0: FLR reset complete
ubuntu@kreiken:~$ sudo busybox devmem 0x200141B01018 32 0x1
ubuntu@kreiken:~$ dmesg
[ 471.629138] hisi_sec2 0000:76:00.0: FLR resetting...
[ 471.734131] hisi_sec2 0000:76:00.0: FLR reset complete
[ 479.167301] hisi_sec2 0000:b6:00.0: FLR resetting...
[ 479.274189] hisi_sec2 0000:b6:00.0: FLR reset complete
[ 976.191626] {1}[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 0
[ 976.199885] {1}[Hardware Error]: event severity: recoverable
[ 976.205532] {1}[Hardware Error]: Error 0, type: recoverable
[ 976.211179] {1}[Hardware Error]: section_type: PCIe error
[ 976.216732] {1}[Hardware Error]: version: 4.0
[ 976.221251] {1}[Hardware Error]: command: 0x0006, status: 0x0010
[ 976.227413] {1}[Hardware Error]: device_id: 0000:b6:00.0
[ 976.232885] {1}[Hardware Error]: slot: 0
[ 976.236971] {1}[Hardware Error]: secondary_bus: 0x00
[ 976.242093] {1}[Hardware Error]: vendor_id: 0x19e5, device_id: 0xa255
[ 976.248689] {1}[Hardware Error]: class_code: 100000
[ 976.254045] hisi_sec2 0000:b6:00.0: AER: aer_status: 0x00000000, aer_mask: 0x00000000
[ 976.261886] hisi_sec2 0000:b6:00.0: AER: aer_layer=Transaction Layer, aer_agent=Receiver ID
[ 976.270260] hisi_sec2 0000:b6:00.0: AER: aer_uncor_severity: 0x00000000
[ 976.276909] hisi_sec2 0000:b6:00.0: PCI error detected, state(=1)!!
[ 976.276928] hisi_sec2 0000:b6:00.0: sec_axi_rresp_err_rint [error status=0x1] found
[ 976.284623] hisi_sec2 0000:b6:00.0: Controller resetting...
[ 976.286859] hisi_sec2 0000:b6:00.0: Controller reset complete
[ 976.286983] pci 0000:b4:01.0: AER: device recovery successful
ubuntu@kreiken:~$ uname -a
Linux kreiken 5.11.0-36-generic #40~20.04.1-Ubuntu SMP Sat Sep 18 02:14:50 UTC 2021 aarch64 aarch64 aarch64 GNU/Linux

Ike Panhc (ikepanhc)
Changed in kunpeng920:
assignee: nobody → Ike Panhc (ikepanhc)
Fred Kimmy (kongzizaixian) wrote :

"I can reset SEC module with 5.11 kernel on Taishan 2280 system. Are these messages expected?" => Yes, you can reproduce it.

Ike Panhc (ikepanhc) wrote :

Cherry-picking commit d0228aeb4d65 ("crypto: hisilicon/sec2 - update SEC initialization and reset") causes kernel build failure.

ubuntu-focal/drivers/crypto/hisilicon/sec2/sec_main.c:457:43: error: ‘SEC_QM_ABNORMAL_INT_MASK’ undeclared (first use in this function)
  457 | writel(GENMASK(12, 0), sec->qm.io_base + SEC_QM_ABNORMAL_INT_MASK);
      | ^~~~~~~~~~~~~~~~~~~~~~~~

I checked SEC_QM_ABNORMAL_INT_MASK and found that it is defined in the first commit of sec_main.c, 416d82204df4, but removed in commit eaebf4c3b103 ("crypto: hisilicon - Unify hardware error init/uninit into QM").

Shall we cherry-pick commit eaebf4c3b103 too?
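
A quick way to spot this kind of missing prerequisite before building is to check whether the tree still defines the macros a cherry-picked patch uses. The following is a minimal sketch; macro_defined is a hypothetical helper name and the check is a plain text match, not a real preprocessor pass.

```shell
# Sketch only: macro_defined is a hypothetical helper name.
# Succeeds if any of the given files contains a "#define NAME" line.
# The name must be followed by a non-identifier character or end of line,
# so that NAME does not match NAME_SUFFIX.
macro_defined() {
    name="$1"
    shift
    grep -Eq "^[[:space:]]*#[[:space:]]*define[[:space:]]+${name}([^A-Za-z0-9_]|\$)" "$@"
}

# Usage from the top of a kernel tree:
#   macro_defined SEC_QM_ABNORMAL_INT_MASK drivers/crypto/hisilicon/sec2/*.[ch] \
#       || echo "SEC_QM_ABNORMAL_INT_MASK is not defined in this tree"
```

To find which commits added or removed the definition, git log -S with the macro name on the same directory is an alternative.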

Ike Panhc (ikepanhc)
Changed in kunpeng920:
status: In Progress → Incomplete
Ike Panhc (ikepanhc) wrote :

Backporting every missing mainline patch up to commit d0228aeb4d65 ("crypto: hisilicon/sec2 - update SEC initialization and reset") would require 61 patches, which is too many under the SRU policy. Looking into the patches needed for this bug and bug 1932117, only 4 are needed, which should carry less risk than 61 patches.

d0228aeb4d65 crypto: hisilicon/sec2 - update SEC initialization and reset
a13c97118749 crypto: hisilicon/sec2 - Add workqueue for SEC driver.
57ca81245f4d crypto: hisilicon - Use one workqueue per qm instead of per qp
eaebf4c3b103 crypto: hisilicon - Unify hardware error init/uninit into QM

I am going to build a kernel with these 4 patches and test it.

Ike Panhc (ikepanhc) wrote :

Hi Xinwei,

I built a test kernel for bug 1932117 and bug 1943301 with the 4 patches backported.

https://kernel.ubuntu.com/~ikepanhc/lp1943301.1/

d0228aeb4d65 crypto: hisilicon/sec2 - update SEC initialization and reset
a13c97118749 crypto: hisilicon/sec2 - Add workqueue for SEC driver.
57ca81245f4d crypto: hisilicon - Use one workqueue per qm instead of per qp
eaebf4c3b103 crypto: hisilicon - Unify hardware error init/uninit into QM

Please test it to see whether any problems turn up. I will run the full checkbox test too.

Andrew Cloke (andrew-cloke) wrote :

Full checkbox test passed. Waiting to hear from Xinwei on whether this has passed their testing.

Ike Panhc (ikepanhc) wrote :

Working on the crypto module test case.

Changed in kunpeng920:
status: Incomplete → In Progress
Ike Panhc (ikepanhc)
description: updated
Changed in linux (Ubuntu):
status: New → In Progress
Changed in linux (Ubuntu Focal):
status: New → In Progress
Changed in linux (Ubuntu):
status: In Progress → Invalid
Changed in linux (Ubuntu Focal):
assignee: nobody → Ike Panhc (ikepanhc)
Ike Panhc (ikepanhc) wrote :
Stefan Bader (smb)
Changed in linux (Ubuntu Focal):
importance: Undecided → Medium
Stefan Bader (smb)
Changed in linux (Ubuntu Focal):
status: In Progress → Fix Committed
Ike Panhc (ikepanhc)
Changed in kunpeng920:
status: In Progress → Fix Committed
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux/5.4.0-106.120 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-focal' to 'verification-done-focal'. If the problem still exists, change the tag 'verification-needed-focal' to 'verification-failed-focal'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-focal
Ike Panhc (ikepanhc) wrote :

Thanks. 5.4.0-106.120 kernel works for me.

tags: added: verification-done-focal
removed: verification-needed-focal
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-hwe-5.4/5.4.0-107.121~18.04.1 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-bionic' to 'verification-done-bionic'. If the problem still exists, change the tag 'verification-needed-bionic' to 'verification-failed-bionic'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-bionic
Ike Panhc (ikepanhc) wrote :

It looks like linux-hwe-5.4/5.4.0-107.121~18.04.1 contains a security fix but not the patch for this issue. I will wait and test linux-hwe-5.4/5.4.0-108.

tags: added: verification-failed-bionic
removed: verification-needed-bionic
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 5.4.0-109.123

---------------
linux (5.4.0-109.123) focal; urgency=medium

  * focal/linux: 5.4.0-109.123 -proposed tracker (LP: #1968290)

  * USB devices not detected during boot on USB 3.0 hubs (LP: #1968210)
    - SAUCE: Revert "Revert "xhci: Set HCD flag to defer primary roothub
      registration""
    - SAUCE: Revert "Revert "usb: core: hcd: Add support for deferring roothub
      registration""

linux (5.4.0-108.122) focal; urgency=medium

  * focal/linux: 5.4.0-108.122 -proposed tracker (LP: #1966740)

  * Packaging resync (LP: #1786013)
    - [Packaging] resync dkms-build{,--nvidia-N} from LRMv5
    - debian/dkms-versions -- update from kernel-versions (main/2022.03.21)

  * Low RX performance for 40G Solarflare NICs (LP: #1964512)
    - SAUCE: sfc: The size of the RX recycle ring should be more flexible

  * [UBUNTU 20.04] KVM: Enable storage key checking for intercepted instruction
    (LP: #1962831)
    - selftests: kvm: add _vm_ioctl
    - selftests: kvm: Introduce the TEST_FAIL macro
    - KVM: selftests: Add GUEST_ASSERT variants to pass values to host
    - KVM: s390: gaccess: Refactor gpa and length calculation
    - KVM: s390: gaccess: Refactor access address range check
    - KVM: s390: gaccess: Cleanup access to guest pages
    - s390/uaccess: introduce bit field for OAC specifier
    - s390/uaccess: fix compile error
    - s390/uaccess: Add copy_from/to_user_key functions
    - KVM: s390: Honor storage keys when accessing guest memory
    - KVM: s390: handle_tprot: Honor storage keys
    - KVM: s390: selftests: Test TEST PROTECTION emulation
    - KVM: s390: Add optional storage key checking to MEMOP IOCTL
    - KVM: s390: Add vm IOCTL for key checked guest absolute memory access
    - KVM: s390: Rename existing vcpu memop functions
    - KVM: s390: Add capability for storage key extension of MEM_OP IOCTL
    - KVM: s390: Update api documentation for memop ioctl
    - KVM: s390: Clarify key argument for MEM_OP in api docs
    - KVM: s390: Add missing vm MEM_OP size check

  * 【sec-0911】 fail to reset sec module (LP: #1943301)
    - crypto: hisilicon/sec2 - Add workqueue for SEC driver.
    - crypto: hisilicon/sec2 - update SEC initialization and reset

  * Lots of hisi_qm zombie task slow down system after stress test
    (LP: #1932117)
    - crypto: hisilicon - Use one workqueue per qm instead of per qp

  * Lots of hisi_qm zombie task slow down system after stress test
    (LP: #1932117) // 【sec-0911】 fail to reset sec module (LP: #1943301)
    - crypto: hisilicon - Unify hardware error init/uninit into QM

  * [UBUNTU 20.04] Fix SIGP processing on KVM/s390 (LP: #1962578)
    - KVM: s390: Simplify SIGP Set Arch handling
    - KVM: s390: Add a routine for setting userspace CPU state

  * Move virtual graphics drivers from linux-modules-extra to linux-modules
    (LP: #1960633)
    - [Packaging] Move VM DRM drivers into modules

  * Focal update: v5.4.178 upstream stable release (LP: #1964634)
    - audit: improve audit queue handling when "audit=1" on cmdline
    - ASoC: ops: Reject out of bounds values in snd_soc_put_volsw()
    - ASoC: ops: Reject out of bounds values in snd_...

Changed in linux (Ubuntu Focal):
status: Fix Committed → Fix Released
Ike Panhc (ikepanhc)
Changed in kunpeng920:
status: Fix Committed → Fix Released
Juerg Haefliger (juergh) wrote :

Fix released in 5.4.0-108.122~18.04.1.

tags: added: verification-done-bionic
removed: verification-failed-bionic