[UBUNTU 22.04] Kernel oops while removing device from cio_ignore list

Bug #1980951 reported by bugproxy
This bug affects 1 person
Affects                  Status        Importance  Assigned to            Milestone
Ubuntu on IBM z Systems  Fix Released  Medium      Skipper Bug Screeners
linux (Ubuntu)           Invalid       Undecided   Unassigned
  Jammy                  Fix Released  Medium      Canonical Kernel Team

Bug Description

SRU Justification:
==================

[Impact]

 * A kernel Oops occurs when a device is removed from the cio_ignore list
   (that is used to hide ccw devices) on a system with non-I/O subchannels
   (e.g. SCM or CHSC subchannels).

 * As a result, previously ignored devices cannot be activated again,
   and new devices cannot be found.

[Fix]

 * 0c3812c347bfb0dc213556a195e79850c55702f5 "s390/cio: derive cdev information only for IO-subchannels"

[Test Plan]

 * An IBM zSystems or LinuxONE LPAR with non-I/O subchannels.

 * Ubuntu Server 22.04 LTS (with GA kernel 5.15) installed.

 * Define a cio_ignore list (to hide ccw devices).

 * Remove a device from the cio_ignore list.

 * Due to hardware requirements this test needs to be conducted by IBM.
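
The removal step of the test plan can be sketched as a small shell helper. This is only an illustration: the function name cio_ignore_free and the overridable file argument are hypothetical and not part of the report; the "free all" write itself matches what is used in the verification comment further down. The file argument exists only so the helper can be dry-run against a scratch file on a non-s390x machine.

```shell
# Sketch only: ask the kernel to stop ignoring a device (or all devices).
# $1: device spec; "all" is the value used in this report, a single bus ID
#     or range is also accepted by the kernel interface.
# $2: control file; defaults to /proc/cio_ignore (the override is a
#     hypothetical convenience for dry runs, not something from the report).
cio_ignore_free() {
    cio_spec=${1:-all}
    cio_file=${2:-/proc/cio_ignore}
    if [ ! -w "$cio_file" ]; then
        echo "cannot write $cio_file (not s390x, or not root?)" >&2
        return 1
    fi
    # On an affected kernel with non-I/O subchannels this write triggers the oops.
    printf 'free %s\n' "$cio_spec" > "$cio_file"
}
```

On a real LPAR the helper would be called as root without the second argument, for example with "all" as the device spec.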

[Where problems could occur]

 * General problems may occur with ccw device activation/deactivation
   in case the new initialization is erroneous.

 * Issues may also have an impact on the type of ccw devices,
   and may no longer be limited to non-I/O subchannels.

 * Things could still go wrong in case cdev is still not properly
   derived from sch-type SUBCHANNEL_TYPE_IO.

[Other Info]

 * The commit has been upstream since kernel 5.16 (next-20220315).
__________

---Problem Description from Peter---
A kernel oops occurs when a device is removed from the cio_ignore list on a system with non-I/O subchannels (e.g. SCM or CHSC subchannels). As a result, previously ignored devices cannot be activated, and new devices cannot be found.

---uname output---
Linux localhost 5.15.0-40-generic #43-Ubuntu SMP Wed Jun 15 12:53:53 UTC 2022 s390x s390x s390x GNU/Linux

Machine Type = s390x

---Steps to Reproduce---
On an s390x-LPAR with non-I/O subchannels, remove a device from the cio_ignore list.

Oops output:
 [ 51.597505] Unable to handle kernel pointer dereference in virtual kernel address space
[ 51.597516] Failing address: 2081e99191e98000 TEID: 2081e99191e98803
[ 51.597520] Fault in home space mode while using kernel ASCE.
[ 51.597524] AS:0000000082adc007 R3:0000000000000024
[ 51.597665] Oops: 0038 ilc:3 [#1] SMP
[ 51.597671] Modules linked in: scm_block chsc_sch vfio_ccw mdev vfio_iommu_type1 vfio eadm_sch sch_fq_codel dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua drm i2c_core drm_panel_orientation_quirks ip_tables x_tables btrfs blake2b_generic zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 linear pkey zcrypt crc32_vx_s390 ghash_s390 prng chacha_s390 libchacha aes_s390 des_s390 libdes qeth_l2 bridge stp llc zfcp qeth qdio scsi_transport_fc ccwgroup sha512_s390 sha256_s390 sha1_s390 sha_common
[ 51.597735] CPU: 6 PID: 1418 Comm: cio_ignore Not tainted 5.15.0-40-generic #43-Ubuntu
[ 51.597740] Hardware name: IBM 2964 NC9 702 (LPAR)
[ 51.597742] Krnl PSW : 0704e00180000000 0000000081b0c632 (__unset_online+0x22/0x70)
[ 51.597752] R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI:0 EA:3
[ 51.597756] Krnl GPRS: 0000000000000001 00000000004a3ca8 0000038007e49000 2081e99191e98528
[ 51.597760] 0000000000000000 0000000081b0c610 0000000000000000 0000000081b07f10
[ 51.597762] 000002aa00000000 0000000081b0c610 0000038007e49000 0000038007a6bc98
[ 51.597765] 00000000941f9200 000003ffa8cf95e0 0000000081818900 0000038007a6bbf8
[ 51.597773] Krnl Code: 0000000081b0c624: a784000c brc 8,0000000081b0c63c
[ 51.597773] 0000000081b0c628: e33030200002 ltg %r3,32(%r3)
[ 51.597773] #0000000081b0c62e: a7840007 brc 8,0000000081b0c63c
[ 51.597773] >0000000081b0c632: e33032000012 lt %r3,512(%r3)
[ 51.597773] 0000000081b0c638: a7740007 brc 7,0000000081b0c646
[ 51.597773] 0000000081b0c63c: a7290000 lghi %r2,0
[ 51.597773] 0000000081b0c640: c0f400089854 brcl 15,0000000081c1f6e8
[ 51.597773] 0000000081b0c646: ebeff0880024 stmg %r14,%r15,136(%r15)
[ 51.597818] Call Trace:
[ 51.597820] [<0000000081b0c632>] __unset_online+0x22/0x70
[ 51.597824] ([<00000000818188e6>] bus_for_each_dev+0x66/0xc0)
[ 51.597828] [<0000000081b0e378>] css_schedule_eval_cond+0xe8/0x130
[ 51.597832] [<0000000081b08062>] cio_ignore_write+0x152/0x190
[ 51.597838] [<00000000814be36e>] proc_reg_write+0x9e/0xf0
[ 51.597843] [<00000000813f3470>] vfs_write+0xc0/0x280
[ 51.597848] [<00000000813f59a8>] ksys_write+0x68/0x100
[ 51.597852] [<0000000081b495d0>] __do_syscall+0x1c0/0x1f0
[ 51.597857] [<0000000081b56858>] system_call+0x78/0xa0
[ 51.597862] Last Breaking-Event-Address:
[ 51.597864] [<000000000000a000>] 0xa000
[ 51.597868] ---[ end trace 166ba86e913d2c60 ]---
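
When triaging a log like the one above, the faulting function can be read from the "Krnl PSW" line. The following is a minimal sketch for pulling it out of saved dmesg text; the helper name oops_faulting_symbol is hypothetical, and the pattern assumes the s390 oops layout shown in this report (two hex words followed by the symbol in parentheses).

```shell
# Sketch only: print the symbol+offset from the "Krnl PSW" line of an
# s390 oops. Reads kernel log text on stdin, prints nothing if no such
# line is present.
oops_faulting_symbol() {
    sed -n 's/.*Krnl PSW : [0-9a-f]* [0-9a-f]* (\(.*\))[[:space:]]*$/\1/p'
}
```

Fed with the oops above, this yields the same function that appears at the top of the call trace.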

bugproxy (bugproxy)
tags: added: architecture-s39064 bugnameltc-198892 severity-medium targetmilestone-inin---
Changed in ubuntu:
assignee: nobody → Skipper Bug Screeners (skipper-screen-team)
affects: ubuntu → linux (Ubuntu)
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2022-07-07 09:04 EDT-------
The bug occurs because the following upstream kernel commit is missing from the Ubuntu 22.04 kernel:

commit 0c3812c347bfb0dc213556a195e79850c55702f5
Author: Vineeth Vijayan <email address hidden>
Date: Fri Sep 17 15:04:01 2021 +0200

    s390/cio: derive cdev information only for IO-subchannels

    cdev->online for the purge function must not be checked for the
    non-IO subchannel type. Make sure that we are deriving the cdev only
    from sch-type SUBCHANNEL_TYPE_IO.

    Signed-off-by: Vineeth Vijayan <email address hidden>
    Reviewed-by: Peter Oberparleiter <email address hidden>
    Signed-off-by: Vasily Gorbik <email address hidden>

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=0c3812c347bfb0dc213556a195e79850c55702f5

Frank Heimes (fheimes)
Changed in ubuntu-z-systems:
assignee: nobody → Skipper Bug Screeners (skipper-screen-team)
Changed in linux (Ubuntu):
importance: Undecided → Medium
Changed in ubuntu-z-systems:
importance: Undecided → Low
importance: Low → Medium
Changed in linux (Ubuntu):
assignee: Skipper Bug Screeners (skipper-screen-team) → Frank Heimes (fheimes)
Frank Heimes (fheimes)
description: updated
Frank Heimes (fheimes) wrote :

A patched jammy kernel 5.15 test build in PPA is available here:
https://launchpad.net/~fheimes/+archive/ubuntu/lp1980951

Frank Heimes (fheimes) wrote :

SRU request submitted to the Ubuntu kernel team mailing list for jammy.
https://lists.ubuntu.com/archives/kernel-team/2022-July/thread.html#131703
Changing status to 'In Progress' for jammy.

Changed in linux (Ubuntu):
status: New → In Progress
Changed in ubuntu-z-systems:
status: New → In Progress
Changed in linux (Ubuntu):
assignee: Frank Heimes (fheimes) → Canonical Kernel Team (canonical-kernel-team)
Changed in linux (Ubuntu Jammy):
status: New → In Progress
Changed in linux (Ubuntu):
status: In Progress → Invalid
Changed in linux (Ubuntu Jammy):
importance: Undecided → Medium
Changed in linux (Ubuntu):
importance: Medium → Undecided
Changed in linux (Ubuntu Jammy):
assignee: nobody → Canonical Kernel Team (canonical-kernel-team)
Changed in linux (Ubuntu):
assignee: Canonical Kernel Team (canonical-kernel-team) → nobody
Stefan Bader (smb)
Changed in linux (Ubuntu Jammy):
status: In Progress → Fix Committed
Frank Heimes (fheimes)
Changed in ubuntu-z-systems:
status: In Progress → Fix Committed
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2022-07-08 13:23 EDT-------
(In reply to comment #7)
> A patched jammy kernel 5.15 test build in PPA is available here:
> https://launchpad.net/~fheimes/+archive/ubuntu/lp1980951

I successfully verified that the PPA kernel fixes the problem:

Without fix
===========

root@localhost:~# uname -a
Linux localhost 5.15.0-40-generic #43-Ubuntu SMP Wed Jun 15 12:53:53 UTC 2022 s390x s390x s390x GNU/Linux
root@localhost:~# lscss | wc -l
6
root@localhost:~# dmesg -c >/dev/null
root@localhost:~# echo free all >/proc/cio_ignore
root@localhost:~# dmesg | head -n 5
[ 91.836284] Unable to handle kernel pointer dereference in virtual kernel address space
[ 91.836300] Failing address: 6370753837000000 TEID: 6370753837000803
[ 91.836305] Fault in home space mode while using kernel ASCE.
[ 91.836311] AS:000000000a9e8007 R3:0000000000000024
[ 91.836494] Oops: 0038 ilc:3 [#1] SMP
root@localhost:~# lscss | wc -l
6

With fix applied
================

root@localhost:~# uname -a
Linux localhost 5.15.0-41-generic #44~lp1980951-Ubuntu SMP Thu Jul 7 14:31:02 UTC 2022 s390x s390x s390x GNU/Linux
root@localhost:~# lscss | wc -l
6
root@localhost:~# dmesg -c >/dev/null
root@localhost:~# echo free all >/proc/cio_ignore
root@localhost:~# dmesg
root@localhost:~# lscss | wc -l
2961
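
The manual check above (clear the log, write "free all", look for an oops, compare the lscss line counts) can be condensed into a small evaluation helper. This is a sketch under assumptions: the name check_free_all_result and its output wording are hypothetical, and it only looks for the first line of the oops signature shown in this report. It deliberately takes the counts and log text as inputs so it can be exercised without s390x hardware.

```shell
# Sketch only: evaluate the outcome of writing "free all" to /proc/cio_ignore.
# $1: "lscss | wc -l" before the write
# $2: "lscss | wc -l" after the write
# stdin: kernel log text collected after the write (e.g. output of dmesg)
check_free_all_result() {
    if grep -q 'Unable to handle kernel pointer dereference'; then
        echo "FAIL: kernel oops logged"
        return 1
    fi
    if [ "$2" -gt "$1" ]; then
        echo "OK: subchannel lines $1 -> $2"
    else
        echo "OK: no oops, but no new subchannels ($1 -> $2)"
    fi
}
```

With the numbers from the transcript, the unpatched run (6 before, 6 after, oops in the log) is reported as a failure and the patched run (6 before, 2961 after, empty log) as a success.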

Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux/5.15.0-43.46 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-jammy' to 'verification-done-jammy'. If the problem still exists, change the tag 'verification-needed-jammy' to 'verification-failed-jammy'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

tags: added: verification-needed-jammy
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2022-07-15 09:47 EDT-------
*** Bug 198784 has been marked as a duplicate of this bug. ***

bugproxy (bugproxy)
tags: added: targetmilestone-inin2204 verification-done-jammy
removed: targetmilestone-inin--- verification-needed-jammy
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2022-07-15 10:10 EDT-------
The fix was verified (as in comment #9), hence changing the tag to 'verification-done-jammy'.

Launchpad Janitor (janitor) wrote :

This bug was fixed in the package linux - 5.15.0-43.46

---------------
linux (5.15.0-43.46) jammy; urgency=medium

  * jammy/linux: 5.15.0-43.46 -proposed tracker (LP: #1981243)

  * Packaging resync (LP: #1786013)
    - debian/dkms-versions -- update from kernel-versions (main/2022.07.11)

  * nbd: requests can become stuck when disconnecting from server with qemu-nbd
    (LP: #1896350)
    - nbd: don't handle response without a corresponding request message
    - nbd: make sure request completion won't concurrent
    - nbd: don't clear 'NBD_CMD_INFLIGHT' flag if request is not completed
    - nbd: fix io hung while disconnecting device

  * Ubuntu 22.04 and 20.04 DPC Fixes for Failure Cases of DownPort Containment
    events (LP: #1965241)
    - PCI/portdrv: Rename pm_iter() to pcie_port_device_iter()
    - PCI: pciehp: Ignore Link Down/Up caused by error-induced Hot Reset
    - [Config] Enable config option CONFIG_PCIE_EDR

  * [SRU] Ubuntu 22.04 Feature Request-Add support for a NVMe-oF-TCP CDC Client
    - TP 8010 (LP: #1948626)
    - nvme: add CNTRLTYPE definitions for 'identify controller'
    - nvme: send uevent on connection up
    - nvme: expose cntrltype and dctype through sysfs

  * [UBUNTU 22.04] Kernel oops while removing device from cio_ignore list
    (LP: #1980951)
    - s390/cio: derive cdev information only for IO-subchannels

  * Jammy Charmed OpenStack deployment fails over connectivity issues when using
    converged OVS bridge for control and data planes (LP: #1978820)
    - net/mlx5e: TC NIC mode, fix tc chains miss table

  * Hairpin traffic does not work with centralized NAT gw (LP: #1967856)
    - net: openvswitch: fix misuse of the cached connection on tuple changes

  * alsa: asoc: amd: the internal mic can't be dedected on yellow carp machines
    (LP: #1980700)
    - ASoC: amd: Add driver data to acp6x machine driver
    - ASoC: amd: Add support for enabling DMIC on acp6x via _DSD

  * AMD ACP 6.x DMIC Supports (LP: #1949245)
    - ASoC: amd: add Yellow Carp ACP6x IP register header
    - ASoC: amd: add Yellow Carp ACP PCI driver
    - ASoC: amd: add acp6x init/de-init functions
    - ASoC: amd: add platform devices for acp6x pdm driver and dmic driver
    - ASoC: amd: add acp6x pdm platform driver
    - ASoC: amd: add acp6x irq handler
    - ASoC: amd: add acp6x pdm driver dma ops
    - ASoC: amd: add acp6x pci driver pm ops
    - ASoC: amd: add acp6x pdm driver pm ops
    - ASoC: amd: enable Yellow carp acp6x drivers build
    - ASoC: amd: create platform device for acp6x machine driver
    - ASoC: amd: add YC machine driver using dmic
    - ASoC: amd: enable Yellow Carp platform machine driver build
    - ASoC: amd: fix uninitialized variable in snd_acp6x_probe()
    - [Config] Enable AMD ACP 6 DMIC Support

  * [UBUNTU 20.04] Include patches to avoid self-detected stall with Secure
    Execution (LP: #1979296)
    - KVM: s390: pv: add macros for UVC CC values
    - KVM: s390: pv: avoid stalls when making pages secure

  * [22.04 FEAT] KVM: Attestation support for Secure Execution (crypto)
    (LP: #1959973)
    - drivers/s390/char: Add Ultravisor io device
    - s390/uv_uapi: depend on CONFIG_S390
    - [Co...
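
Since the fix shipped in linux 5.15.0-43.46, a quick way to tell whether a stock Jammy kernel should already contain it is to look at the ABI number in the "uname -r" output. The helper below is a rough sketch: the name jammy_kernel_has_fix is hypothetical, and it assumes that an ABI number of 43 or higher on a 5.15.0 kernel implies the fix. That assumption does not hold for test builds (the PPA kernel 5.15.0-41 with the ~lp1980951 suffix carries the fix despite the lower ABI) and it says nothing about other kernel series.

```shell
# Sketch only: decide from a `uname -r` style string whether a stock
# Jammy 5.15.0 kernel is at or beyond the release that carries the fix.
# Returns 0 if ABI >= 43, 1 if older, 2 if the string is not a 5.15.0 kernel.
jammy_kernel_has_fix() {
    krel=$1
    case $krel in
        5.15.0-*) ;;
        *) return 2 ;;
    esac
    kabi=${krel#5.15.0-}   # e.g. "40-generic"
    kabi=${kabi%%-*}       # e.g. "40"
    case $kabi in
        ''|*[!0-9]*) return 2 ;;
    esac
    [ "$kabi" -ge 43 ]
}
```

The release string of the running kernel, as printed by "uname -r", can be passed in directly.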


Changed in linux (Ubuntu Jammy):
status: Fix Committed → Fix Released
Frank Heimes (fheimes)
Changed in ubuntu-z-systems:
status: Fix Committed → Fix Released
Ubuntu Kernel Bot (ubuntu-kernel-bot) wrote :

This bug is awaiting verification that the linux-gkeop-5.15/5.15.0-1003.5~20.04.2 kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-focal' to 'verification-done-focal'. If the problem still exists, change the tag 'verification-needed-focal' to 'verification-failed-focal'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how to enable and use -proposed. Thank you!

Frank Heimes (fheimes) wrote :

linux-gkeop-5.15 is out of scope for this bug (does not exist for s390x).
To unblock the process I'll add the 'verification-done-focal' tag.

tags: added: verification-done-focal