Kernel Panic when dasd-fba device is selected for install

Bug #1876011 reported by John George
This bug affects 1 person
Affects                  Status        Importance  Assigned to  Milestone
Ubuntu on IBM z Systems  Fix Released  Undecided   Unassigned
curtin                   Fix Released  Undecided   Unassigned
subiquity                Fix Released  Undecided   Unassigned
linux (Ubuntu)           Fix Released  Undecided   Unassigned

Bug Description

Starting a z/VM install (either subiquity or d-i) and selecting dasd-fba devices leads to a kernel panic.

Details from the installer shell before the panic:

root@ubuntu-server:/# uname -a
Linux ubuntu-server 5.4.0-26-generic #30-Ubuntu SMP Mon Apr 20 16:57:22 UTC 2020 s390x s390x s390x GNU/Linux

root@ubuntu-server:/# cat /proc/cmdline
ip=10.245.208.13::10.245.208.1:255.255.255.0:s5lp1-gen03:enc600:none:10.245.208.1 url=ftp://10.13.0.2:21/ubuntu-live-server-20.04/focal-live-server-s390x.iso http_proxy=http://91.189.89.11:3128 --- quiet
root@ubuntu-server:/#

root@ubuntu-server:/# lsmod
Module Size Used by
dm_multipath 40960 0
scsi_dh_rdac 20480 0
scsi_dh_emc 16384 0
scsi_dh_alua 24576 0
vmur 20480 0
vfio_ccw 36864 0
vfio_mdev 16384 0
mdev 28672 2 vfio_ccw,vfio_mdev
vfio_iommu_type1 32768 0
vfio 36864 3 vfio_ccw,vfio_mdev,vfio_iommu_type1
sch_fq_codel 20480 1
drm 499712 0
drm_panel_orientation_quirks 16384 1 drm
i2c_core 77824 1 drm
ip_tables 32768 0
x_tables 45056 1 ip_tables
overlay 135168 1
nls_utf8 16384 1
isofs 49152 1
qeth_l2 45056 1
lcs 53248 0
zfcp 126976 0
scsi_transport_fc 69632 1 zfcp
raid10 65536 0
raid456 180224 0
async_raid6_recov 20480 1 raid456
async_memcpy 20480 1 raid456
async_pq 20480 1 raid456
async_xor 20480 2 async_pq,raid456
async_tx 20480 5 async_pq,async_memcpy,async_xor,raid456,async_raid6_recov
xor 16384 1 async_xor
raid6_pq 102400 3 async_pq,raid456,async_raid6_recov
libcrc32c 16384 1 raid456
raid1 53248 0
raid0 28672 0
linear 20480 0
pkey 32768 0
crc32_vx_s390 16384 1
ghash_s390 16384 0
prng 20480 4
aes_s390 28672 0
des_s390 20480 0
libdes 28672 1 des_s390
sha512_s390 16384 0
sha256_s390 16384 0
sha1_s390 16384 0
sha_common 16384 3 sha512_s390,sha256_s390,sha1_s390
qeth 135168 1 qeth_l2
dasd_fba_mod 24576 0
dasd_eckd_mod 131072 0
qdio 61440 3 qeth,zfcp,qeth_l2
ccwgroup 20480 3 qeth,lcs,qeth_l2
dasd_mod 143360 2 dasd_eckd_mod,dasd_fba_mod
zcrypt_cex4 20480 0
zcrypt 106496 2 pkey,zcrypt_cex4

root@ubuntu-server:/# dmesg | tail
[ 34.458754] audit: type=1400 audit(1588204779.286:14): apparmor="STATUS" operation="profile_load" profile="unconfined" name="snap.subiquity.subiquity-service" pid=1789 comm="apparmor_parser"
[ 190.647685] ctcm.151d85: CTCM driver initialized
[ 190.663097] dasd-fba.f36f2f: 0.0.0101: New FBA DASD 9336/10 (CU 6310/80) with 16383 MB and 512 B/blk
[ 190.664748] dasda: dasda1
[ 193.364797] dasd-fba.f36f2f: 0.0.0102: New FBA DASD 9336/10 (CU 6310/80) with 16383 MB and 512 B/blk
[ 193.366573] dasdb:(nonl) dasdb1
[ 195.743686] dasd-fba.f36f2f: 0.0.0103: New FBA DASD 9336/10 (CU 6310/80) with 16383 MB and 512 B/blk
[ 195.745327] dasdc:(nonl) dasdc1
[ 198.408631] dasd-fba.f36f2f: 0.0.0104: New FBA DASD 9336/10 (CU 6310/80) with 16384 MB and 512 B/blk
[ 198.411231] dasdd:(nonl) dasdd1

Dropped to shell after partition confirmation to run tail on syslog

root@ubuntu-server:/# tail -f /var/log/syslog
Apr 30 00:10:08 ubuntu-server curtin_log.2234[2946]: Shutdown Plan:
Apr 30 00:10:08 ubuntu-server curtin_log.2234[2946]: {'level': 6, 'device': '/sys/class/block/dm-0', 'dev_type': 'lvm'}
Apr 30 00:10:08 ubuntu-server curtin_log.2234[2946]: {'level': 4, 'device': '/sys/class/block/dasda/dasda1', 'dev_type': 'partition'}
Apr 30 00:10:08 ubuntu-server curtin_log.2234[2946]: {'level': 2, 'device': '/sys/class/block/dasda', 'dev_type': 'disk'}
Apr 30 00:10:08 ubuntu-server curtin_log.2234[2946]: shutdown running on holder type: 'lvm' syspath: '/sys/class/block/dm-0'
Apr 30 00:10:08 ubuntu-server curtin_log.2234[2946]: Running command ['dmsetup', 'splitname', 's5lp1--gen03--vg-s5lp3--gen3--lv', '-c', '--noheadings', '--separator', '=', '-o', 'vg_name,lv_name'] with allowed return codes [0] (capture=True)
Apr 30 00:10:08 ubuntu-server curtin_log.2234[2946]: Wiping lvm logical volume: /dev/s5lp1-gen03-vg/s5lp3-gen3-lv
Apr 30 00:10:08 ubuntu-server curtin_log.2234[2946]: wiping 1M on /dev/s5lp1-gen03-vg/s5lp3-gen3-lv at offsets [0, -1048576]
Apr 30 00:10:08 ubuntu-server curtin_log.2234[2946]: using "lvremove" on s5lp1-gen03-vg/s5lp3-gen3-lv
Apr 30 00:10:08 ubuntu-server curtin_log.2234[2946]: Running command ['lvremove', '--force', '--force', 's5lp1-gen03-vg/s5lp3-gen3-lv'] with allowed return codes [0] (capture=False)
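
The log above shows `dmsetup splitname` turning the device-mapper name `s5lp1--gen03--vg-s5lp3--gen3--lv` into the volume group `s5lp1-gen03-vg` and the logical volume `s5lp3-gen3-lv`. A minimal Python sketch of that name mangling follows, assuming the usual rule that hyphens inside a name are doubled and a single hyphen separates VG from LV. The function name `split_dm_name` is hypothetical and this is not curtin's code.

```python
import re
from typing import Tuple


def split_dm_name(dm_name: str) -> Tuple[str, str]:
    """Split a device-mapper LVM name into (vg_name, lv_name)."""
    # A lone hyphen separates VG from LV; hyphens inside either name are doubled.
    parts = re.split(r"(?<!-)-(?!-)", dm_name, maxsplit=1)
    if len(parts) != 2:
        raise ValueError(f"not a VG-LV device-mapper name: {dm_name!r}")
    vg_name, lv_name = (part.replace("--", "-") for part in parts)
    return vg_name, lv_name


print(split_dm_name("s5lp1--gen03--vg-s5lp3--gen3--lv"))
# ('s5lp1-gen03-vg', 's5lp3-gen3-lv')
```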

x3270 console output:

ubuntu-server login: [ 145.304094] addressing exception: 0005 ilc:3 [#1] SMP
[ 145.304101] Modules linked in: zfs(PO) zunicode(PO) zavl(PO) icp(PO) zlua(PO) zcommon(PO) znvpair(PO) spl(O) zlib_deflate bcache crc64 ctcm fsm dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua vmur vfio_ccw vfio_mdev mdev vfio_iommu_type1 vfio sch_fq_codel drm drm_panel_orientation_quirks i2c_core ip_tables x_tables overlay nls_utf8 isofs qeth_l2 lcs zfcp scsi_transport_fc raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 linear pkey crc32_vx_s390 ghash_s390 prng aes_s390 des_s390 libdes sha512_s390 qeth qdio ccwgroup sha256_s390 sha1_s390 sha_common zcrypt_cex4 dasd_eckd_mod dasd_fba_mod dasd_mod zcrypt
[ 145.304140] CPU: 1 PID: 0 Comm: swapper/1 Tainted: P O 5.4.0-26-generic #30-Ubuntu
[ 145.304145] Hardware name: IBM 2964 N63 400 (z/VM 6.4.0)
[ 145.304150] Krnl PSW : 0404e00180000000 000003ff800e38a2 (dasd_fba_dump_sense+0x282/0x4f0 [dasd_fba_mod])
[ 145.304159] R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI:0 EA:3
02: HCPGSP2629I The virtual machine is placed in CP mode due to a SIGP stop from CPU 02.
03: HCPGSP2629I The virtual machine is placed in CP mode due to a SIGP stop from CPU 03.
00: HCPGSP2629I The virtual machine is placed in CP mode due to a SIGP stop from CPU 00.
00: HCPGSP2629I The virtual machine is placed in CP mode due to a SIGP stop from CPU 01.
02: HCPGSP2629I The virtual machine is placed in CP mode due to a SIGP stop from CPU 01.
03: HCPGSP2629I The virtual machine is placed in CP mode due to a SIGP stop from CPU 01.
[ 145.304168] Krnl GPRS: 0b8e51db0000000f 0000000000000000 00000000e11f518e 000000007e500000
[ 145.304170] 0000000000000004 00000000af2f6538 0000000000000004 00000000e11f5000
[ 145.304171] 000000010000018e 00000000032451e8 0000000003245208 0000000003245208
[ 145.304173] 000000017e169100 00000000e2e267a0 000003ff800e3864 000003e00029fc58
[ 145.304181] Krnl Code: 000003ff800e3892: eb110002000d sllg %r1,%r1,2
[ 145.304181] 000003ff800e3898: a76a0004 ahi %r6,4
[ 145.304181] #000003ff800e389c: e34130000014 lgf %r4,0(%r1,%r3)
[ 145.304181] >000003ff800e38a2: a78a0009 ahi %r8,9
[ 145.304181] 000003ff800e38a6: c030000014d2 larl %r3,000003ff800e624a
[ 145.304181] 000003ff800e38ac: c0e5fffffc00 brasl %r14,000003ff800e30ac
[ 145.304181] 000003ff800e38b2: ec66ffdc207e cij %r6,32,6,000003ff800e386a
[ 145.304181] 000003ff800e38b8: b9140018 lgfr %r1,%r8
[ 145.304197] Call Trace:
[ 145.304200] ([<000003ff800e3864>] dasd_fba_dump_sense+0x244/0x4f0 [dasd_fba_mod])
[ 145.304211] [<000003ff8002d4da>] dasd_block_tasklet+0x25a/0x470 [dasd_mod]
[ 145.304217] [<00000000aede4ab2>] tasklet_action_common.isra.0+0x82/0x160
[ 145.304223] [<00000000af63e6c4>] __do_softirq+0x104/0x360
[ 145.304225] [<00000000aede522e>] irq_exit+0x9e/0xc0
[ 145.304228] [<00000000aed70b18>] do_IRQ+0x78/0xb0
[ 145.304229] [<00000000af63d948>] io_int_handler+0x124/0x28c
[ 145.304232] [<00000000aed675bc>] enabled_wait+0x3c/0xd0
[ 145.304235] Last Breaking-Event-Address:
[ 145.304237] [<00000000af63e938>] __s390_indirect_jump_r14+0x0/0xc
[ 145.304240] Kernel panic - not syncing: Fatal exception in interrupt
01: HCPGIR450W CP entered; disabled wait PSW 00020001 80000000 00000000 AED7349E

John George (jog) wrote :
John George (jog) wrote :
John George (jog)
description: updated
Frank Heimes (fheimes) wrote :

The situation could be recreated on d-i (legacy image) with a very specific disk constellation and configuration.
It looks like this happens when there is an existing but incomplete LVM configuration across the activated FBA DASD disk(s): d-i tries to identify all existing volumes (in preparation for a potential reuse), but fails if one or more volumes are missing.

In detail (this happened on a test system, hence the differing installations):
- a set of four FBA DASDs existed, with an LVM installation done across all four
- during the next install only one of the four was activated, and the installation completed on this single FBA DASD disk
- the next installation again had all four enabled, and the failure happened during LVM setup

What surprised me was that manually clearing the first 512 bytes of the disk(s), like:
dd if=/dev/zero of=/dev/dasd? bs=512 count=1
was not enough - I needed to wipe the _entire_ disk space (in d-i, using dd).
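
One plausible reason a single zeroed sector is not enough (an assumption, not confirmed in this thread) is that LVM keeps its label and metadata beyond the first 512 bytes. The curtin log in the description shows a wider approach, "wiping 1M ... at offsets [0, -1048576]". A minimal Python sketch of that idea follows; the function name `zero_at_offsets` is hypothetical, this is not curtin's code, and whether wiping both ends would have sufficed here is not established above. The demo runs on a scratch file, not a real disk.

```python
import os
import tempfile

MIB = 1024 * 1024


def zero_at_offsets(path, length=MIB, offsets=(0, -MIB)):
    """Overwrite `length` bytes with zeros at each offset.

    Negative offsets count back from the end of the file or device.
    """
    with open(path, "r+b") as dev:
        size = dev.seek(0, os.SEEK_END)
        for offset in offsets:
            start = offset if offset >= 0 else max(size + offset, 0)
            count = min(length, size - start)
            if count <= 0:
                continue  # offset lies beyond the end, nothing to wipe
            dev.seek(start)
            dev.write(b"\0" * count)
        dev.flush()
        os.fsync(dev.fileno())


# Demo: a 4 MiB scratch file filled with 0xff, then wiped at both ends.
with tempfile.NamedTemporaryFile(delete=False) as scratch:
    scratch.write(b"\xff" * (4 * MIB))
zero_at_offsets(scratch.name)
with open(scratch.name, "rb") as handle:
    data = handle.read()
os.unlink(scratch.name)
print(data[:4], data[2 * MIB:2 * MIB + 4], data[-4:])
```

Only the first and last MiB are touched; everything in between is left as it was.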

I consider this a corner case. Because there is a workaround (see above) and d-i is no longer supported in 20.04 (subiquity became the new installer), this will probably need to be set to 'Won't Fix' for d-i.
However, FBA support for subiquity needs to be discussed.

Changed in ubuntu-z-systems:
status: New → Triaged
Ryan Harper (raharper) wrote :

FBA disks are not _yet_ supported in curtin. The stacktrace indicates (to me) there's a kernel issue going on as well. Certainly writing zeros to an FBA shouldn't crash/oops a kernel.

Can we get pointers to documentation on FBA devices?

Changed in curtin:
status: New → Incomplete
Frank Heimes (fheimes) wrote :

Details about DASD disks (ECKD as well as FBAs) can be found in the 'Device Drivers, Features, and Commands on Ubuntu Server 18.04 LTS, SC34-2765' guide:
http://public.dhe.ibm.com/software/dw/linux390/docu/lub0dd01.pdf
(the 18.04 edition is the latest of these guides for Ubuntu; they are only created for LTS releases)

Here is a brief summary of all s390x-specific disk storage device types:

* zFCP (SCSI-over-Fibre Channel (FCP) devices and SCSI devices):
- lszdev | grep zfcp # incl. zfcp-host (FCP devices) and zfcp-lun (zfcp-attached SCSI devices)
- can have up to 15 partitions
- partition with fdisk, parted, partman
- disk device type is just 'fba'
==> already supported

* DASDs in general (FICON-attached Direct Access Storage Devices):
- lsdasd shows both types (ECKD and FBA):
- lszdev shows both types (dasd-eckd and dasd-fba)

   * ECKD - Enhanced Count Key Data (ECKD) DASDs:
   - lszdev | grep dasd-eckd
   - label: msdos partition table
   - can hold up to 3 partitions
   - partition with fdasd
   - requires dasdfmt with disk device type 'cdl' (Compatible disk layout, 'ldl' is legacy and no longer in use)
   ==> already supported

   * FBA - Fixed Block Architecture (FBA) DASDs:
   - lszdev | grep dasd-fba
   - can hold up to 3 partition(s)
   - partition with fdisk, parted, partman
   - disk device type is just 'fba' (here NO dasdfmt usage)
   ==> tbd

Identification/characteristics/tooling:

$ lsdasd
Bus-ID    Status  Name   Device  Type  BlkSz  Size     Blocks
================================================================================
0.0.0101  active  dasda  94:0    FBA   512    16383MB  33554368
0.0.0300  active  dasdb  94:4    ECKD  4096   7042MB   1802880
0.0.0200  active  dasdc  94:8    ECKD  4096   7042MB   1802880
0.0.0400  active  dasdd  94:12   ECKD  4096   21128MB  5409000
0.0.0102  active  dasde  94:16   FBA   512    16383MB  33554368
0.0.0103  active  dasdf  94:20   FBA   512    16383MB  33554368
0.0.0104  active  dasdg  94:24   FBA   512    16384MB  33554592
$ lszdev dasd-eckd
TYPE       ID        ON   PERS  NAMES
dasd-eckd  0.0.0190  no   no
dasd-eckd  0.0.0191  no   no
dasd-eckd  0.0.019d  no   no
dasd-eckd  0.0.019e  no   no
dasd-eckd  0.0.0200  yes  yes   dasdc
dasd-eckd  0.0.0300  yes  yes   dasdb
dasd-eckd  0.0.0400  yes  yes   dasdd
dasd-eckd  0.0.0592  no   no
dasd-eckd  0.0.1607  no   no
$ lszdev dasd-fba
TYPE      ID        ON   PERS  NAMES
dasd-fba  0.0.0101  yes  yes   dasda
dasd-fba  0.0.0102  yes  yes   dasde
dasd-fba  0.0.0103  yes  yes   dasdf
dasd-fba  0.0.0104  yes  yes   dasdg
$ cat /sys/class/block/dasda/device/devtype
9336/10
$ cat /sys/class/block/dasdb/device/devtype
3390/0c
$ sudo dasdview -i /dev/dasda

--- general DASD information --------------------------------------------------
device node : /dev/dasda
busid : 0.0.0101
type : FBA
device type : hex 9336 dec 37686

--- DASD geometry -------------------------------------------------------------
number of cylinders : hex 309 dec 777
tracks per cylinder ...
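
The sysfs `devtype` values above (9336/10 for the FBA disk dasda, 3390/0c for the ECKD disk dasdb) suggest a simple way to tell the two DASD kinds apart. A minimal Python sketch follows, assuming only the two device types seen in this output; the names `dasd_kind` and `read_devtype` are hypothetical, and other device types are reported as unknown rather than guessed.

```python
from pathlib import Path

# Only the device types that appear in the output above.
KNOWN_DEVTYPES = {"9336": "FBA", "3390": "ECKD"}


def dasd_kind(devtype: str) -> str:
    """Map a sysfs devtype string such as '9336/10' to 'FBA', 'ECKD' or 'unknown'."""
    device_type = devtype.strip().split("/")[0]
    return KNOWN_DEVTYPES.get(device_type, "unknown")


def read_devtype(block_name: str) -> str:
    """Read /sys/class/block/<name>/device/devtype (only present on a system with DASDs)."""
    return Path("/sys/class/block", block_name, "device", "devtype").read_text().strip()


print(dasd_kind("9336/10"), dasd_kind("3390/0c"))
# FBA ECKD
```

On a system like the one shown, `dasd_kind(read_devtype("dasda"))` would be the intended use.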


Ryan Harper (raharper) wrote :

Does fdasd work on DASD-fba?

I'm hoping that DASD-fba is compatible enough with the curtin support so that we *don't* have to care about the differences.

Having seen your comment, it sounds like it does work:

"during next install only one of the four got activated and the installation completed on this single FBA DASD disk"

The stack-trace indicates to me that this isn't a curtin issue (at least not yet); we'll need kernel/foundations work on the bug exposed by poking at partial LVM devices over FBA.

Frank Heimes (fheimes) wrote :

Unfortunately, we need to care (at least a bit) about the differences, especially for the partitioning. fdasd only works for DASD ECKD disks.

The statement:
"during next install only one of the four got activated and the installation completed on this single FBA DASD disk"
was while doing a d-i legacy installation (sorry).

Yes, the partial LVM issue with the FBA devices is with d-i and with subiquity.
(The workaround is to wipe the disks manually, but I needed to dd the entire disk; running dd on just the first block or so is not sufficient, and wipefs is not available in the d-i shell.)

Frank Heimes (fheimes) wrote :

I have to correct one of the statements in #5.

DASD FBAs can only hold one partition

(https://www.ibm.com/support/knowledgecenter/linuxonibm/com.ibm.linux.z.lgdd/lgdd_r_dasd_layout_sum.html)
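
Taken together with the summary in #5, the per-type constraints can be written down as a small lookup table. This is a Python sketch of how an installer might encode them, using the values stated in this thread (FBA limit corrected to one partition); the names `DasdRules`, `RULES` and `check_partition_count` are hypothetical and do not come from curtin or subiquity.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class DasdRules:
    max_partitions: int
    partition_tools: Tuple[str, ...]
    needs_dasdfmt: bool


# Values from the summary in this thread, with the FBA limit corrected to 1.
RULES = {
    "ECKD": DasdRules(3, ("fdasd",), True),
    "FBA": DasdRules(1, ("fdisk", "parted", "partman"), False),
}


def check_partition_count(kind: str, count: int) -> DasdRules:
    """Return the rules for `kind`, or raise if `count` partitions cannot fit."""
    rules = RULES[kind]
    if count < 0 or count > rules.max_partitions:
        raise ValueError(
            f"{kind} DASD supports at most {rules.max_partitions} partition(s), got {count}"
        )
    return rules


print(check_partition_count("FBA", 1).partition_tools)
# ('fdisk', 'parted', 'partman')
```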

Frank Heimes (fheimes) wrote :

Checking the focal git tree, this is included in the following Ubuntu kernels:
Ubuntu-5.4.0-55.61
Ubuntu-5.4.0-56.62
Ubuntu-5.4.0-57.63
And Ubuntu-5.4.0-56 just migrated to updates:
 linux-generic | 5.4.0.56.59 | focal-updates | s390x
hence I'm updating this (focal) to Fix Released.
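
To compare a running kernel against the versions listed above, the ABI number in the release string can be parsed out. A minimal Python sketch follows, assuming that 5.4.0-55 (the lowest version listed) is the threshold; the comment does not say whether earlier builds also carry the change. The names `ubuntu_kernel_abi` and `FIRST_LISTED` are hypothetical, and the comparison is only meaningful within the focal 5.4.0 series.

```python
import re

# Lowest version named in the comment above (assumed threshold).
FIRST_LISTED = (5, 4, 0, 55)


def ubuntu_kernel_abi(release: str):
    """Parse '5.4.0-26-generic' or 'Ubuntu-5.4.0-55.61' into (5, 4, 0, <abi>)."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)-(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return tuple(int(part) for part in match.groups())


def at_least_first_listed(release: str) -> bool:
    """True if the release is at or above the lowest version listed above."""
    return ubuntu_kernel_abi(release) >= FIRST_LISTED


print(at_least_first_listed("5.4.0-26-generic"), at_least_first_listed("5.4.0-56-generic"))
# False True
```

The first value corresponds to the kernel from the `uname -a` output in the bug description.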

Changed in linux (Ubuntu):
status: New → Fix Released
Changed in subiquity:
status: New → In Progress
Frank Heimes (fheimes)
Changed in ubuntu-z-systems:
status: Triaged → In Progress
Frank Heimes (fheimes) wrote :

This bug is fixed with
https://github.com/CanonicalLtd/subiquity/releases/tag/21.01.1
for Ubuntu releases H, G and F.

Changed in subiquity:
status: In Progress → Fix Released
Changed in curtin:
status: Incomplete → Fix Released
Changed in ubuntu-z-systems:
status: In Progress → Fix Released