Hot-unplug of disks leaves broken block devices around in Hirsute on s390x

Bug #1925211 reported by Christian Ehrhardt  on 2021-04-20
This bug affects 1 person
Affects                      Importance   Assigned to
Ubuntu on IBM z Systems      High         bugproxy
linux (Ubuntu)               High         Unassigned
  Hirsute                    High         Unassigned
systemd (Ubuntu)             Undecided    Unassigned
  Hirsute                    Undecided    Unassigned
udev (Ubuntu)                Undecided    Unassigned
  Hirsute                    Undecided    Unassigned

Bug Description

SRU Justification

[Impact]

Hot removal of disks under kvm on s390 does not result in the kernel removing the block device, which can lead to hung tasks and other issues.

[Test Plan]

See steps to reproduce the bug in the original description below. To test, execute these steps and confirm that the block device gets removed as expected.

[Where problems could occur]

The fix is a revert of the changes which introduced this regression. The original commit removed supposedly unused code, but it seems a mistake was made in the logic around unregistering disks. The revert could potentially introduce bugs related to other virt devices, especially if it interacts badly with subsequent driver changes. However, the patch reverted cleanly, and the revert restores the code to the state that has been working well in previous kernels, so it seems like the lowest-risk option until a proper fix is available upstream.

---

Repro:
#1 Get a guest
$ uvt-kvm create --disk 5 --password=ubuntu h release=hirsute arch=s390x label=daily
$ uvt-kvm wait h release=hirsute arch=s390x label=daily

#2 Attach and Detach disk
$ sudo qemu-img create -f qcow2 /var/lib/libvirt/images/test.qcow2 10M
$ virsh attach-disk h /var/lib/libvirt/images/test.qcow2 vdc
$ virsh detach-disk h vdc

From libvirt's POV it is gone at this point
$ virsh domblklist h
 Target Source
------------------------------------------------------------------
 vda /var/lib/uvtool/libvirt/images/hirsute-2nd-zfs.qcow
 vdb /var/lib/uvtool/libvirt/images/hirsute-2nd-zfs-ds.qcow

But the guest still thinks it is present
$ uvt-kvm ssh --insecure hirsute-2nd-zfs lsblk
  ...
  vdc 252:32 0 20M 0 disk

This even remains a while after (not a race).
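
A minimal sketch of how this check could be scripted to rule out a race, assuming a POSIX shell and awk on the host. The helper name blockdev_present is made up for illustration; it only parses lsblk output passed on stdin. The guest name "h" and the device "vdc" are taken from the repro above, and the polling interval is arbitrary.

```shell
#!/bin/sh
# Hypothetical helper (not part of any tool): succeeds if the named disk
# appears in the NAME column of lsblk output read from stdin.
blockdev_present() {
    # $1 = device name, e.g. vdc
    awk -v dev="$1" '$1 == dev { found = 1 } END { exit !found }'
}

# Usage sketch: poll the guest for a minute after "virsh detach-disk h vdc".
#   for i in 1 2 3 4 5 6; do
#       uvt-kvm ssh --insecure h lsblk | blockdev_present vdc || { echo "vdc is gone"; break; }
#       sleep 10
#   done
```

If the loop finishes without printing "vdc is gone", the device has outlived the detach by far more than any udev processing delay.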

Any access to it in the guest will hang (as you'd expect of a non-existing blockdev)
4 0 1758 1739 20 0 12140 4800 - S+ pts/0 0:00 | \_ sudo mkfs.ext4 /dev/vdc
4 0 1759 1758 20 0 6924 1044 - D+ pts/0 0:00 | \_ mkfs.ext4 /dev/vdc
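
The "D+" in the listing above is the uninterruptible-sleep state, which is what a task stuck on I/O to the stale device looks like. As a rough sketch, such tasks could be picked out of a process listing like this; list_dstate is a hypothetical helper name, and it assumes input in the three-column form produced by "ps -eo pid=,stat=,args=".

```shell
#!/bin/sh
# Hypothetical helper: print "PID command" for every task whose STAT
# column starts with D (uninterruptible sleep). Reads ps output on stdin.
list_dstate() {
    awk '$2 ~ /^D/ {
        pid = $1
        $1 = ""; $2 = ""        # drop PID and STAT, keep the command line
        sub(/^ +/, "")          # strip the separators left behind
        print pid, $0
    }'
}

# Usage sketch inside the guest:
#   ps -eo pid=,stat=,args= | list_dstate
```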

The result above was originally found with hirsute-guest@hirsute-host on s390x

I do NOT see the same with groovy-guest@hirsute-host on s390x
I DO see the same with hirsute-guest@groovy-host on s390x
  => Guest version dependent, not Host/Hypervisor dependent
I DO see the same with ZFS disks AND LVM disks being added&removed
  => not type dependent
I do NOT see the same on x86.
  => Arch dependent ??

... the evidence slowly points towards an issue in the guest, damn we are so
close to release - but non-fully detaching disks are critical in my POV :-/

Filing this as-is for awareness, but certainly this will need more debugging.
Unsure where this will eventually end up, so for now I'll file it against kernel/udev/systemd.
If there are any known issues/components that are related let me know please!
---
ProblemType: Bug
AlsaDevices: Error: command ['ls', '-l', '/dev/snd/'] failed with exit code 2: ls: cannot access '/dev/snd/': No such file or directory
AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
ApportVersion: 2.20.11-0ubuntu65
Architecture: s390x
ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
CRDA: N/A
CasperMD5CheckResult: unknown
DistroRelease: Ubuntu 21.04
IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig'
Lspci:

Lspci-vt: -[0000:00]-
Lsusb: Error: command ['lsusb'] failed with exit code 1:
Lsusb-t: Error: command ['lsusb', '-t'] failed with exit code 1: /sys/bus/usb/devices: No such file or directory
Lsusb-v: Error: command ['lsusb', '-v'] failed with exit code 1:
Package: udev
PackageArchitecture: s390x
PciMultimedia:

ProcFB:

ProcKernelCmdLine: root=LABEL=cloudimg-rootfs
ProcVersionSignature: Ubuntu 5.11.0-14.15-generic 5.11.12
RelatedPackageVersions:
 linux-restricted-modules-5.11.0-14-generic N/A
 linux-backports-modules-5.11.0-14-generic N/A
 linux-firmware N/A
RfKill: Error: [Errno 2] No such file or directory: 'rfkill'
Tags: hirsute uec-images
Uname: Linux 5.11.0-14-generic s390x
UpgradeStatus: No upgrade log present (probably fresh install)
UserGroups: adm audio cdrom dialout dip floppy lxd netdev plugdev sudo video
_MarkForUpload: True
acpidump:

Changed in udev (Ubuntu):
importance: Undecided → Critical

This bug is missing log files that will aid in diagnosing the problem. While running an Ubuntu kernel (not a mainline or third-party kernel) please enter the following command in a terminal window:

apport-collect 1925211

and then change the status of the bug to 'Confirmed'.

If, due to the nature of the issue you have encountered, you are unable to run this command, please add a comment stating that fact and change the bug status to 'Confirmed'.

This change has been made by an automated script, maintained by the Ubuntu Kernel Team.

Changed in linux (Ubuntu):
status: New → Incomplete
tags: added: hirsute

This even happens with the most common image-file backed disks which further simplifies the repro:

$ sudo qemu-img create -f qcow2 /var/lib/libvirt/images/test.qcow2 10M
$ virsh attach-disk h /var/lib/libvirt/images/test.qcow2 vdc
$ virsh detach-disk h vdc

description: updated
tags: added: apport-collected uec-images
description: updated

apport information

Changed in linux (Ubuntu):
status: Incomplete → New
status: New → Confirmed

Hirsute - dmesg
# attach
[ 264.065866] crw_info : CRW reports slct=0, oflw=0, chn=1, rsc=3, anc=0, erc=4, rsid=5
[ 264.065906] crw_info : CRW reports slct=0, oflw=0, chn=0, rsc=3, anc=0, erc=4, rsid=0
[ 264.099347] virtio_blk virtio5: [vdc] 385 512-byte logical blocks (197 kB/193 KiB)
# detach
[ 289.702243] crw_info : CRW reports slct=0, oflw=0, chn=1, rsc=3, anc=0, erc=4, rsid=5
[ 289.702267] crw_info : CRW reports slct=0, oflw=0, chn=0, rsc=3, anc=0, erc=4, rsid=0

Groovy - dmesg
# attach
[ 719.712747] crw_info : CRW reports slct=0, oflw=0, chn=1, rsc=3, anc=0, erc=4, rsid=5
[ 719.712758] crw_info : CRW reports slct=0, oflw=0, chn=0, rsc=3, anc=0, erc=4, rsid=0
[ 719.745538] virtio_blk virtio5: [vdc] 385 512-byte logical blocks (197 kB/193 KiB)
[ 719.745542] vdc: detected capacity change from 0 to 197120
# detach
[ 780.425222] crw_info : CRW reports slct=0, oflw=0, chn=1, rsc=3, anc=0, erc=4, rsid=5
[ 780.425233] crw_info : CRW reports slct=0, oflw=0, chn=0, rsc=3, anc=0, erc=4, rsid=0

There is a difference: Hirsute is missing the capacity change message. The newer kernel
might set the capacity immediately instead of first detecting it as zero and then bumping it up.

The sizes reported in the Hirsute guest are correct (e.g. 20M for a 20M zfs case).
So this might be a red herring, unless the missing message makes you very suspicious.

Hirsute - udevadm

# attach
KERNEL[319.020043] add /devices/css0/0.0.0005 (css)
KERNEL[319.020088] add /devices/css0/0.0.0005/0.0.0005 (ccw)
KERNEL[319.020103] bind /devices/css0/0.0.0005/0.0.0005 (ccw)
UDEV [319.022297] add /devices/css0/0.0.0005 (css)
UDEV [319.022802] add /devices/css0/0.0.0005/0.0.0005 (ccw)
UDEV [319.023039] bind /devices/css0/0.0.0005/0.0.0005 (ccw)
KERNEL[319.025073] add /devices/css0/0.0.0005/0.0.0005/virtio5 (virtio)
UDEV [319.025527] add /devices/css0/0.0.0005/0.0.0005/virtio5 (virtio)
KERNEL[319.027524] add /devices/virtual/bdi/252:32 (bdi)
UDEV [319.028389] add /devices/virtual/bdi/252:32 (bdi)
KERNEL[319.048685] add /devices/css0/0.0.0005/0.0.0005/virtio5/block/vdc (block)
KERNEL[319.048743] bind /devices/css0/0.0.0005/0.0.0005/virtio5 (virtio)
UDEV [319.072936] add /devices/css0/0.0.0005/0.0.0005/virtio5/block/vdc (block)
UDEV [319.075862] bind /devices/css0/0.0.0005/0.0.0005/virtio5 (virtio)

# detach
<no entry>

Groovy - udevadm

# attach
KERNEL[719.986637] add /devices/css0/0.0.0005 (css)
KERNEL[719.986669] add /devices/css0/0.0.0005/0.0.0005 (ccw)
KERNEL[719.986685] bind /devices/css0/0.0.0005/0.0.0005 (ccw)
UDEV [719.988667] add /devices/css0/0.0.0005 (css)
KERNEL[719.992750] add /devices/css0/0.0.0005/0.0.0005/virtio5 (virtio)
KERNEL[719.992757] add /devices/virtual/bdi/252:32 (bdi)
UDEV [719.993298] add /devices/css0/0.0.0005/0.0.0005 (ccw)
UDEV [719.993520] bind /devices/css0/0.0.0005/0.0.0005 (ccw)
UDEV [719.994097] add /devices/virtual/bdi/252:32 (bdi)
UDEV [719.995568] add /devices/css0/0.0.0005/0.0.0005/virtio5 (virtio)
KERNEL[720.009523] add /devices/css0/0.0.0005/0.0.0005/virtio5/block/vdc (block)
KERNEL[720.009544] bind /devices/css0/0.0.0005/0.0.0005/virtio5 (virtio)
UDEV [720.058301] add /devices/css0/0.0.0005/0.0.0005/virtio5/block/vdc (block)
UDEV [720.059128] bind /devices/css0/0.0.0005/0.0.0005/virtio5 (virtio)

# detach
KERNEL[780.673663] remove /devices/virtual/bdi/252:32 (bdi)
KERNEL[780.673928] remove /devices/css0/0.0.0005/0.0.0005/virtio5/block/vdc (block)
UDEV [780.676469] remove /devices/css0/0.0.0005/0.0.0005/virtio5/block/vdc (block)
UDEV [780.678185] remove /devices/virtual/bdi/252:32 (bdi)
KERNEL[780.708055] unbind /devices/css0/0.0.0005/0.0.0005/virtio5 (virtio)
KERNEL[780.708078] remove /devices/css0/0.0.0005/0.0.0005/virtio5 (virtio)
KERNEL[780.708088] unbind /devices/css0/0.0.0005/0.0.0005 (ccw)
KERNEL[780.708101] remove /devices/css0/0.0.0005/0.0.0005 (ccw)
KERNEL[780.708109] unbind /devices/css0/0.0.0005 (css)
KERNEL[780.708117] remove /devices/css0/0.0.0005 (css)
UDEV [780.708779] unbind /devices/css0/0.0.0005/0.0.0005/virtio5 (virtio)
UDEV [780.709099] remove /devices/css0/0.0.0005/0.0.0005/virtio5 (virtio)
UDEV [780.709397] unbind /devices/css0/0.0.0005/0.0.0005 (ccw)
UDEV [780.709670] remove /devices/css0/0.0.0005/0.0.0005 (ccw)
UDEV [780.709971] unbind /devices/css0/0.0.0005 (css)
UDEV [780.710194] remove /devices/css0/0.0.0005 (css)

The events on attach are exactly the same, but in sl...
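
To compare captures like the two above without reading them line by line, something along these lines might help. It is only a sketch: kernel_event_counts is a hypothetical helper name, and it assumes the udevadm monitor output was saved to a file in the format shown above.

```shell
#!/bin/sh
# Hypothetical helper: count kernel uevents per action (add, bind, remove,
# unbind, ...) in a saved "udevadm monitor" capture read from stdin.
kernel_event_counts() {
    awk '/^KERNEL\[/ { n[$2]++ } END { for (a in n) print a, n[a] }' | sort
}

# Usage sketch (file names are placeholders):
#   kernel_event_counts < groovy-detach.log
#   kernel_event_counts < hirsute-detach.log
```

For the Groovy detach capture above this would report 5 remove and 3 unbind kernel events; for the Hirsute detach capture, which has no entries, it prints nothing.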

FYI
[15:43] <rbalint> cpaelzer, this may be related https://github.com/systemd/systemd/blob/4d484e14bb9864cef1d124885e625f33bf31e91c/NEWS#L5

And indeed it is an interesting read, but for now the question is why only with 5.11@s390x then? Maybe there is something small to get this back in line? No one is proposing to change "all rules" in the last minute :-)

I was trying mainline builds [1] to pinpoint the kernel change (if any).
I had a few totally failing ones which stalled me a bit, but eventually I can show
this helpful list:

v5.8 - working
v5.10 - working
v5.10.31 - working
v5.11 - failing
v5.11.15 - failing
v5.12-rc8 - failing

So it isn't a >5.10 change that is backported into 5.10 stable.
And it isn't something that was fixed in later master or 5.11 stable releases.

If you want you could bisect the kernel for this, my env is great for testing
but very bad for kernel builds. Do you want to throw kernels my way or should I
try to build my own in another place?

[1]: https://kernel.ubuntu.com/~kernel-ppa/mainline/

Ok build env set up and tested the starting points.
From that I also see:
v5.10 - working
v5.11 - failing

So from here I think I can try a bisect
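
The thread does not say how each bisect step was tested. If one wanted to automate it, "git bisect run" expects a script whose exit code classifies the current commit. The sketch below only shows that mapping; bisect_exit_code and the script name test-unplug.sh are hypothetical, and the build, boot and attach/detach steps are left out.

```shell
#!/bin/sh
# Hypothetical mapping from a test outcome to the exit codes that
# "git bisect run" understands:
#   0 = good, 1..127 except 125 = bad, 125 = skip (commit cannot be tested).
bisect_exit_code() {
    built=$1        # "yes" if the kernel built and the guest booted
    vdc_present=$2  # "yes" if vdc is still listed in the guest after detach
    if [ "$built" != yes ]; then
        echo 125
    elif [ "$vdc_present" = yes ]; then
        echo 1
    else
        echo 0
    fi
}

# Usage sketch:
#   git bisect start v5.11 v5.10      # bad revision first, then good
#   git bisect run ./test-unplug.sh   # script ends with: exit "$(bisect_exit_code ...)"
```

Returning 125 for kernels that fail to build or boot keeps unrelated breakage from being blamed for the unplug regression.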

commit 8cc0dcfdc1c0e0be107d0288f9c0cf1f4201be62
Author: Vineeth Vijayan <email address hidden>
Date: Fri Nov 20 09:36:38 2020 +0100

    s390/cio: remove pm support from ccw bus driver

    As part of removing broken pm-support from s390 arch, remove
    the pm callbacks from ccw-bus driver.The power-management functions
    are unused since the 'commit 394216275c7d ("s390: remove broken
    hibernate / power management support")'.

    Signed-off-by: Vineeth Vijayan <email address hidden>
    Reviewed-by: Peter Oberparleiter <email address hidden>
    Signed-off-by: Heiko Carstens <email address hidden>

 arch/s390/include/asm/ccwdev.h | 10 --
 drivers/s390/cio/cmf.c | 5 -
 drivers/s390/cio/device.c | 247 +----------------------------------------
 drivers/s390/cio/device.h | 1 -
 drivers/s390/cio/device_fsm.c | 6 -
 drivers/s390/cio/io_sch.h | 1 -

It seems it wasn't as unused/broken as they thought :-/
BTW the referenced 394216275c7d ("s390: remove broken hibernate / power management support") was in v5.7

Found by:

$ git bisect log
git bisect start
# good: [2c85ebc57b3e1817b6ce1a6b703928e113a90442] Linux 5.10
git bisect good 2c85ebc57b3e1817b6ce1a6b703928e113a90442
# bad: [f40ddce88593482919761f74910f42f4b84c004b] Linux 5.11
git bisect bad f40ddce88593482919761f74910f42f4b84c004b
# bad: [538fcf57aaee6ad78a05f52b69a99baa22b33418] Merge branches 'acpi-scan', 'acpi-pnp' and 'acpi-sleep'
git bisect bad 538fcf57aaee6ad78a05f52b69a99baa22b33418
# bad: [15b447361794271f4d03c04d82276a841fe06328] mm/lru: revise the comments of lru_lock
git bisect bad 15b447361794271f4d03c04d82276a841fe06328
# good: [b10733527bfd864605c33ab2e9a886eec317ec39] Merge tag 'amd-drm-next-5.11-2020-12-09' of git://people.freedesktop.org/~agd5f/linux into drm-next
git bisect good b10733527bfd864605c33ab2e9a886eec317ec39
# good: [2c075f38a708c578a752b738a45e8c26923eac2e] Merge branch 'radeon-fixes' (Radeon and amdgpu fixes)
git bisect good 2c075f38a708c578a752b738a45e8c26923eac2e
# bad: [76d4acf22b4847f6c7b2f9042366fbdc3d20f578] Merge tag 'perf-kprobes-2020-12-14' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
git bisect bad 76d4acf22b4847f6c7b2f9042366fbdc3d20f578
# bad: [f9b4240b074730f41c1ef8e0d695d10fb5bb1e27] Merge tag 'fixes-v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux
git bisect bad f9b4240b074730f41c1ef8e0d695d10fb5bb1e27
# good: [d889797530c66f699170233474eab3361471e808] Merge remote-tracking branch 'arm64/for-next/fixes' into for-next/core
git bisect good d889797530c66f699170233474eab3361471e808
# good: [2f6ea6fb88ab9d517644a098fc670b4d5dd1735e] s390/tape: remove unsupported PM functions
git bisect good 2f6ea6fb88ab9d517644a098fc670b4d5dd1735e
# bad: [586592478b1fa8bb8cd6875a9191468e9b1a8b13] Merge tag 's390-5.11-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
git bisect bad 586592478b1fa8bb8cd6875a9191468e9b1a8b13
# good: [0b03beface02d519693edb8020f9811c67d5c88f] Merge tag 'm68k-for-v5.11-tag1' of git://git.kernel.org/pub/scm/linux/kernel/git/geert/linux-m68k
git bisect good 0b03beface02d519693edb8020f9811c67d5c88f
# bad: [613775d62ec60202f98d2c5f520e6e9ba6dd4ac4] s390/...

But it makes sense in regard to the broken use case and explains why we see it only on s390x.

@sforshee - what should we do about it now that we know which commit it was - just revert it asap or are there bad dependencies to it?

Seth Forshee (sforshee) wrote :

The commit reverts cleanly. We need to confirm that reverting the commit does fix the issue. I put a test build here, please test.

https://people.canonical.com/~sforshee/lp1925211/

I doubt we can get a new kernel into the release. If it's extremely urgent we can consider a day 0 SRU kernel for hirsute, otherwise we can make sure it gets into the first normal SRU kernel.

summary: - Hot-unplug of disks leaves broken block devices around in Hirsute
+ Hot-unplug of disks leaves broken block devices around in Hirsute on
+ s390x
Changed in linux (Ubuntu Hirsute):
milestone: none → hirsute-updates

Thanks Seth.
I have verified 5.11.0-16-generic #17+lp1925211v202104201520 from the PPA and can confirm that the issue is gone.
=> The revert works as a fix \o/

For the severity/urgency at least we now know that it is s390x only (not "good" but reduces the amount of affected people).
I'll check later (after I'm actually awake) whether it also affects non-KVM disks (e.g. channel I/O detaches); then we can decide whether a 0-day or the next normal round will be OK.

Kai-Heng Feng (kaihengfeng) wrote :

The check was for resuming flag, but now it's inverted. Please test this patch.

Frank Heimes (fheimes) on 2021-04-21
Changed in ubuntu-z-systems:
assignee: nobody → bugproxy (bugproxy)
tags: added: reverse-proxy-bugzilla

Thanks Frank for adding the mirror request to this, because either way we sooner or later want a discussion with the s390x developers on this.

Hi kaihengfeng,
Thanks for your patch suggestion! Semantically I'm not sure it is the right thing. To clarify: your theory is that before the change the code checked !resuming, and the !cdev check was maybe there just to avoid a dereference error. And now you assume that instead of !cdev it should check that there is a cdev.
I'm unsure. If !cdev was indeed just there to protect the dereference, then maybe no check at all would be better. That would then read "if the event is IO_SCH_ORPH_UNREG or IO_SCH_UNREG, then do css_sch_device_unregister".

That I'm not immediately convinced doesn't mean much, though, and it is easy to test and surely worth a try. So I ran v5.11 (bad) plus your patch; the result will be useful to know in any case. It is working fine, that much I can tell you.

But if my thought above was right (the check was only there to avoid the potential dereference error), then why check it at all? If the condition cdev==NULL is possible, the code would now skip fully removing the device - and we might not need the check at all.
And since I brought up the idea of dropping the cdev check entirely, that was worth a try as well. So now the third check of this morning is for:
--- a/drivers/s390/cio/device.c
+++ b/drivers/s390/cio/device.c
@@ -1525,8 +1525,7 @@ static int io_subchannel_sch_event(struct subchannel *sch, int process)
        switch (action) {
        case IO_SCH_ORPH_UNREG:
        case IO_SCH_UNREG:
-               if (!cdev)
-                       css_sch_device_unregister(sch);
+               css_sch_device_unregister(sch);
                break;
        case IO_SCH_ORPH_ATTACH:
        case IO_SCH_UNREG_ATTACH:

My patch with that change - in my test - is working as well.
Neither of the solutions has triggered other regressions in my setup - but then there are so many potential use-cases that I can't be sure without a further review by subject matter experts.
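
The conditions being discussed can be laid out as a small truth table. This is a toy model in shell, not kernel code. The pre-5.11 condition (!cdev || !resuming) and the form of kaihengfeng's patch (check for cdev instead of !cdev) are reconstructed from the discussion in this thread and should be treated as assumptions; all function names are made up.

```shell
#!/bin/sh
# Toy model. cdev: 1 if a ccw device is attached to the subchannel, else 0.
# resuming: 1 during a PM resume, else 0. Each function prints 1 if
# css_sch_device_unregister() would be called for IO_SCH_UNREG.
unreg_v5_10()         { echo $(( ! $1 || ! $2 )); }  # assumed pre-5.11 condition
unreg_v5_11()         { echo $(( ! $1 )); }          # "if (!cdev)" as removed in the diff above
unreg_cdev_check()    { echo $(( $1 != 0 )); }       # assumed form of kaihengfeng's patch
unreg_unconditional() { echo 1; }                    # the diff above

# Hot-unplug outside of any PM operation, so resuming is always 0.
for cdev in 0 1; do
    echo "cdev=$cdev v5.10=$(unreg_v5_10 "$cdev" 0) v5.11=$(unreg_v5_11 "$cdev")" \
         "cdev_check=$(unreg_cdev_check "$cdev") unconditional=$(unreg_unconditional)"
done
# cdev=0 v5.10=1 v5.11=1 cdev_check=0 unconditional=1
# cdev=1 v5.10=1 v5.11=0 cdev_check=1 unconditional=1
```

Under these assumptions only the unconditional call matches the v5.10 behaviour in both rows: v5.11 loses the cdev=1 case, which is the hot-unplug regression, and the cdev check loses the cdev=0 case.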

So a summary of the recent tests:

5.11.0-16-generic #17+lp1925211v202104201520 (Seth's full revert) - working
5.11.0lp1925211-patch-kaihengfeng-dirty - working
5.11.0nocdevcheck-paelzer-dirty - working

I think we'd want an answer from the IBM devs on which solution (full revert, kaihengfeng patch, cpaelzer patch, another approach) they would prefer - then we can submit it upstream for them to include officially, and we can carry it as delta until we rebase onto a version that has it applied anyway.

[1]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=8cc0dcfdc1c0e0be107d0288f9c0cf1f4201be62

Kai-Heng Feng (kaihengfeng) wrote :

Yes, !resuming always evaluates to true because obviously the hot-unplug test was not done in any system PM operations.

I am also unsure whether cdev can be NULL in that context so I left it there. Other functions have similar cdev check too. I think IBM devs will have more insights on this.

I was wondering if I could trigger the same issue on an lpar as it would raise the severity IMHO. I have no claim on completeness of these tests in regard to all that could happen. I tried what I considered low hanging fruits in regard to this cross check.

Pre-condition each time
- a dasd attached to the system
- not used e.g. as a FS
- no aliases enabled
=> this (more or less) matches our former KVM based test case

$ lscss | grep 1523; lsdasd 0.0.1523; ll /dev/dasdc
0.0.1523 0.0.0183 3390/0c 3990/e9 yes f0 f0 ff 10111213 00000000
Bus-ID Status Name Device Type BlkSz Size Blocks
================================================================================
0.0.1523 active dasdc 94:8 ECKD 4096 7043MB 1803060
brw-rw---- 1 root disk 94, 8 Apr 21 06:21 /dev/dasdc

I was tracking the same state after the removal action and ran udevadm monitor to see if an unbind happened.

---

#1 cio purge
$ sudo cio_ignore -a 0.0.1523; sudo cio_ignore --purge

=> can't take away online devices, and I'm not interested in initial blocking ..

---

#2 chzdev
$ sudo chzdev --disable 0.0.1523

=> properly removed

---

#3 remove the dasds on the storage server
"LSS 08 SRV_SS0_0823" is mapped to s1lp5 0.0.1523 - removing that on the storage server

By default that fails:

Error - delete of volume SRV_SS0_0823 failed.
8:28 AM
Error: CMUN02948E IBM.2107-75DXP71/0823 The Delete logical volume task cannot be initiated because the Allow Host Pre-check Control Switch is set to true and the volume that you have specified is online to a host.

In the old UI the force option is available as checkbox - trying via that.
Done.

The system does not realize that the disk is gone, I/O on it (e.g. dasdfmt) goes into a deadlock.
After a while in that hang the system realizes it is in trouble:

dmesg:
Apr 21 06:42:32 s1lp5 kernel: dasd(eckd): I/O status report for device 0.0.1523:
                              dasd(eckd): in req: 00000000e903a5ac CC:00 FC:00 AC:00 SC:00 DS:00 CS:00 RC:-11
                              dasd(eckd): device 0.0.1523: Failing CCW: 0000000000000000
                              dasd(eckd): SORRY - NO VALID SENSE AVAILABLE
Apr 21 06:42:32 s1lp5 kernel: dasd(eckd): Related CP in req: 00000000e903a5ac
                              dasd(eckd): CCW 00000000c3e100c4: 2760000C 014C5FF0 DAT: 18000000 08231c00 00000000
                              dasd(eckd): CCW 00000000335dd238: 3E20401A 00A40000 DAT: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Apr 21 06:42:32 s1lp5 kernel: dasd(eckd):......
Apr 21 06:42:32 s1lp5 kernel: dasd-eckd.adb621: 0.0.1523: ERP failed for the DASD

udevadm:
KERNEL[1313.022835] remove /devices/css0/0.0.0183/0.0.1523/block/dasdc/dasdc1 (block)
UDEV [1313.024648] remove /devices/css0/0.0.0183/0.0.1523/block/dasdc/dasdc1 (block)

Even after the above - the disk is still "present":
$ lscss | grep 1523; lsdasd 0.0.1523; ll /dev/dasdc
0.0.1523 0.0.0183 3390/0c 3990/e9 yes f0 f0 0f 10111213 00000000
Bus-ID Status Name Device Type BlkSz Size Blocks
==================================================================...

Changed in udev (Ubuntu Hirsute):
status: New → Invalid
Changed in systemd (Ubuntu Hirsute):
status: New → Invalid
Changed in linux (Ubuntu Hirsute):
status: Confirmed → Triaged
importance: Undecided → High
Changed in udev (Ubuntu Hirsute):
importance: Critical → Undecided
Changed in ubuntu-z-systems:
status: New → Triaged
importance: Undecided → High
bugproxy (bugproxy) on 2021-04-21
tags: added: architecture-s39064 bugnameltc-192463 severity-high targetmilestone-inin2104
tags: added: patch
Seth Forshee (sforshee) wrote :

The condition for css_sch_device_unregister(sch) also caught my eye, calling it unconditionally is probably closer to right because it was called in the !cdev case before, and in the attached patch it would no longer be called in this case. However I think in the short term the revert is the safest option, since the code will match what we already know was working in the groovy kernel. Once a fix is committed upstream, we can trade out the revert for that patch.

Seth Forshee (sforshee) on 2021-04-21
description: updated

------- Comment From <email address hidden> 2021-04-23 01:50 EDT-------
---snip---
--- a/drivers/s390/cio/device.c
+++ b/drivers/s390/cio/device.c
@@ -1525,8 +1525,7 @@ static int io_subchannel_sch_event(struct subchannel *sch, int process)
        switch (action) {
        case IO_SCH_ORPH_UNREG:
        case IO_SCH_UNREG:
-               if (!cdev)
-                       css_sch_device_unregister(sch);
+               css_sch_device_unregister(sch);
                break;
        case IO_SCH_ORPH_ATTACH:
        case IO_SCH_UNREG_ATTACH:

I think we'd want an answer from the IBM devs on which solution (full revert, kaihengfeng patch, cpaelzer patch, another approach) they would prefer - then we can submit it upstream for them to include officially, and we can carry it as delta until we rebase onto a version that has it applied anyway.
---snip---

Thank you very much for reporting this. Yes. This is a leftover from the pm-remove patch and the right solution is as mentioned above here. We shall prepare the patch and share it to the external mailing-list.

Stefan Bader (smb) on 2021-04-23
Changed in linux (Ubuntu Hirsute):
status: Triaged → Fix Committed
Frank Heimes (fheimes) on 2021-04-23
Changed in ubuntu-z-systems:
status: Triaged → Fix Committed

This bug is awaiting verification that the kernel in -proposed solves the problem. Please test the kernel and update this bug with the results. If the problem is solved, change the tag 'verification-needed-hirsute' to 'verification-done-hirsute'. If the problem still exists, change the tag 'verification-needed-hirsute' to 'verification-failed-hirsute'.

If verification is not done by 5 working days from today, this fix will be dropped from the source code, and this bug will be closed.

See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Thank you!

tags: added: verification-needed-hirsute