Kernel Panic when dasd-fba device is selected for install

Bug #1876011 reported by John George
This bug affects 1 person
Affects                   Status         Importance   Assigned to   Milestone
Ubuntu on IBM z Systems   Fix Released   Undecided    Unassigned
curtin                    Fix Released   Undecided    Unassigned
subiquity                 Fix Released   Undecided    Unassigned
linux (Ubuntu)            Fix Released   Undecided    Unassigned

Bug Description

Starting a z/VM install (either subiquity or d-i) and selecting dasd-fba devices leads to a kernel panic.

Details from the installer shell before the panic:

root@ubuntu-server:/# uname -a
Linux ubuntu-server 5.4.0-26-generic #30-Ubuntu SMP Mon Apr 20 16:57:22 UTC 2020 s390x s390x s390x GNU/Linux

root@ubuntu-server:/# cat /proc/cmdline
ip=10.245.208.13::10.245.208.1:255.255.255.0:s5lp1-gen03:enc600:none:10.245.208.1 url=ftp://10.13.0.2:21/ubuntu-live-server-20.04/focal-live-server-s390x.iso http_proxy=http://91.189.89.11:3128 --- quiet
root@ubuntu-server:/#

root@ubuntu-server:/# lsmod
Module Size Used by
dm_multipath 40960 0
scsi_dh_rdac 20480 0
scsi_dh_emc 16384 0
scsi_dh_alua 24576 0
vmur 20480 0
vfio_ccw 36864 0
vfio_mdev 16384 0
mdev 28672 2 vfio_ccw,vfio_mdev
vfio_iommu_type1 32768 0
vfio 36864 3 vfio_ccw,vfio_mdev,vfio_iommu_type1
sch_fq_codel 20480 1
drm 499712 0
drm_panel_orientation_quirks 16384 1 drm
i2c_core 77824 1 drm
ip_tables 32768 0
x_tables 45056 1 ip_tables
overlay 135168 1
nls_utf8 16384 1
isofs 49152 1
qeth_l2 45056 1
lcs 53248 0
zfcp 126976 0
scsi_transport_fc 69632 1 zfcp
raid10 65536 0
raid456 180224 0
async_raid6_recov 20480 1 raid456
async_memcpy 20480 1 raid456
async_pq 20480 1 raid456
async_xor 20480 2 async_pq,raid456
async_tx 20480 5 async_pq,async_memcpy,async_xor,raid456,async_raid6_recov
xor 16384 1 async_xor
raid6_pq 102400 3 async_pq,raid456,async_raid6_recov
libcrc32c 16384 1 raid456
raid1 53248 0
raid0 28672 0
linear 20480 0
pkey 32768 0
crc32_vx_s390 16384 1
ghash_s390 16384 0
prng 20480 4
aes_s390 28672 0
des_s390 20480 0
libdes 28672 1 des_s390
sha512_s390 16384 0
sha256_s390 16384 0
sha1_s390 16384 0
sha_common 16384 3 sha512_s390,sha256_s390,sha1_s390
qeth 135168 1 qeth_l2
dasd_fba_mod 24576 0
dasd_eckd_mod 131072 0
qdio 61440 3 qeth,zfcp,qeth_l2
ccwgroup 20480 3 qeth,lcs,qeth_l2
dasd_mod 143360 2 dasd_eckd_mod,dasd_fba_mod
zcrypt_cex4 20480 0
zcrypt 106496 2 pkey,zcrypt_cex4

root@ubuntu-server:/# dmesg | tail
[ 34.458754] audit: type=1400 audit(1588204779.286:14): apparmor="STATUS" operation="profile_load" profile="unconfined" name="snap.subiquity.subiquity-service" pid=1789 comm="apparmor_parser"
[ 190.647685] ctcm.151d85: CTCM driver initialized
[ 190.663097] dasd-fba.f36f2f: 0.0.0101: New FBA DASD 9336/10 (CU 6310/80) with 16383 MB and 512 B/blk
[ 190.664748] dasda: dasda1
[ 193.364797] dasd-fba.f36f2f: 0.0.0102: New FBA DASD 9336/10 (CU 6310/80) with 16383 MB and 512 B/blk
[ 193.366573] dasdb:(nonl) dasdb1
[ 195.743686] dasd-fba.f36f2f: 0.0.0103: New FBA DASD 9336/10 (CU 6310/80) with 16383 MB and 512 B/blk
[ 195.745327] dasdc:(nonl) dasdc1
[ 198.408631] dasd-fba.f36f2f: 0.0.0104: New FBA DASD 9336/10 (CU 6310/80) with 16384 MB and 512 B/blk
[ 198.411231] dasdd:(nonl) dasdd1

Dropped to a shell after partition confirmation to run tail on syslog:

root@ubuntu-server:/# tail -f /var/log/syslog
Apr 30 00:10:08 ubuntu-server curtin_log.2234[2946]: Shutdown Plan:
Apr 30 00:10:08 ubuntu-server curtin_log.2234[2946]: {'level': 6, 'device': '/sys/class/block/dm-0', 'dev_type': 'lvm'}
Apr 30 00:10:08 ubuntu-server curtin_log.2234[2946]: {'level': 4, 'device': '/sys/class/block/dasda/dasda1', 'dev_type': 'partition'}
Apr 30 00:10:08 ubuntu-server curtin_log.2234[2946]: {'level': 2, 'device': '/sys/class/block/dasda', 'dev_type': 'disk'}
Apr 30 00:10:08 ubuntu-server curtin_log.2234[2946]: shutdown running on holder type: 'lvm' syspath: '/sys/class/block/dm-0'
Apr 30 00:10:08 ubuntu-server curtin_log.2234[2946]: Running command ['dmsetup', 'splitname', 's5lp1--gen03--vg-s5lp3--gen3--lv', '-c', '--noheadings', '--separator', '=', '-o', 'vg_name,lv_name'] with allowed return codes [0] (capture=True)
Apr 30 00:10:08 ubuntu-server curtin_log.2234[2946]: Wiping lvm logical volume: /dev/s5lp1-gen03-vg/s5lp3-gen3-lv
Apr 30 00:10:08 ubuntu-server curtin_log.2234[2946]: wiping 1M on /dev/s5lp1-gen03-vg/s5lp3-gen3-lv at offsets [0, -1048576]
Apr 30 00:10:08 ubuntu-server curtin_log.2234[2946]: using "lvremove" on s5lp1-gen03-vg/s5lp3-gen3-lv
Apr 30 00:10:08 ubuntu-server curtin_log.2234[2946]: Running command ['lvremove', '--force', '--force', 's5lp1-gen03-vg/s5lp3-gen3-lv'] with allowed return codes [0] (capture=False)
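
The shutdown plan in the log lists the block-device holders from the top of the storage stack down: the level 6 LVM volume first, then the level 4 partition, then the level 2 disk. A minimal sketch of that ordering, using the three entries from the log. The function name `order_shutdown_plan` is hypothetical and this is not curtin's actual implementation:

```python
# Entries copied from the syslog excerpt above, deliberately shuffled.
plan = [
    {'level': 2, 'device': '/sys/class/block/dasda', 'dev_type': 'disk'},
    {'level': 6, 'device': '/sys/class/block/dm-0', 'dev_type': 'lvm'},
    {'level': 4, 'device': '/sys/class/block/dasda/dasda1', 'dev_type': 'partition'},
]


def order_shutdown_plan(entries):
    """Sort entries so the highest level (top of the stack) is shut down first."""
    return sorted(entries, key=lambda e: e['level'], reverse=True)


for entry in order_shutdown_plan(plan):
    print(entry['level'], entry['dev_type'], entry['device'])
```

Tearing down the LVM volume before the partition and disk beneath it is why `lvremove` is the last command logged before the panic.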

x3270 console output:

ubuntu-server login: [ 145.304094] addressing exception: 0005 ilc:3 [#1] SMP
[ 145.304101] Modules linked in: zfs(PO) zunicode(PO) zavl(PO) icp(PO) zlua(PO) zcommon(PO) znvpair(PO) spl(O) zlib_deflate bcache crc64 ctcm fsm dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua vmur vfio_ccw vfio_mdev mdev vfio_iommu_type1 vfio sch_fq_codel drm drm_panel_orientation_quirks i2c_core ip_tables x_tables overlay nls_utf8 isofs qeth_l2 lcs zfcp scsi_transport_fc raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 linear pkey crc32_vx_s390 ghash_s390 prng aes_s390 des_s390 libdes sha512_s390 qeth qdio ccwgroup sha256_s390 sha1_s390 sha_common zcrypt_cex4 dasd_eckd_mod dasd_fba_mod dasd_mod zcrypt
[ 145.304140] CPU: 1 PID: 0 Comm: swapper/1 Tainted: P O 5.4.0-26-generic #30-Ubuntu
[ 145.304145] Hardware name: IBM 2964 N63 400 (z/VM 6.4.0)
[ 145.304150] Krnl PSW : 0404e00180000000 000003ff800e38a2 (dasd_fba_dump_sense+0x282/0x4f0 [dasd_fba_mod])
[ 145.304159] R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI:0 EA:3
02: HCPGSP2629I The virtual machine is placed in CP mode due to a SIGP stop from CPU 02.
03: HCPGSP2629I The virtual machine is placed in CP mode due to a SIGP stop from CPU 03.
00: HCPGSP2629I The virtual machine is placed in CP mode due to a SIGP stop from CPU 00.
00: HCPGSP2629I The virtual machine is placed in CP mode due to a SIGP stop from CPU 01.
02: HCPGSP2629I The virtual machine is placed in CP mode due to a SIGP stop from CPU 01.
03: HCPGSP2629I The virtual machine is placed in CP mode due to a SIGP stop from CPU 01.
[ 145.304168] Krnl GPRS: 0b8e51db0000000f 0000000000000000 00000000e11f518e 000000007e500000
[ 145.304170]            0000000000000004 00000000af2f6538 0000000000000004 00000000e11f5000
[ 145.304171]            000000010000018e 00000000032451e8 0000000003245208 0000000003245208
[ 145.304173]            000000017e169100 00000000e2e267a0 000003ff800e3864 000003e00029fc58
[ 145.304181] Krnl Code: 000003ff800e3892: eb110002000d sllg %r1,%r1,2
[ 145.304181]            000003ff800e3898: a76a0004 ahi %r6,4
[ 145.304181]           #000003ff800e389c: e34130000014 lgf %r4,0(%r1,%r3)
[ 145.304181]           >000003ff800e38a2: a78a0009 ahi %r8,9
[ 145.304181]            000003ff800e38a6: c030000014d2 larl %r3,000003ff800e624a
[ 145.304181]            000003ff800e38ac: c0e5fffffc00 brasl %r14,000003ff800e30ac
[ 145.304181]            000003ff800e38b2: ec66ffdc207e cij %r6,32,6,000003ff800e386a
[ 145.304181]            000003ff800e38b8: b9140018 lgfr %r1,%r8
[ 145.304197] Call Trace:
[ 145.304200] ([<000003ff800e3864>] dasd_fba_dump_sense+0x244/0x4f0 [dasd_fba_mod])
[ 145.304211]  [<000003ff8002d4da>] dasd_block_tasklet+0x25a/0x470 [dasd_mod]
[ 145.304217]  [<00000000aede4ab2>] tasklet_action_common.isra.0+0x82/0x160
[ 145.304223]  [<00000000af63e6c4>] __do_softirq+0x104/0x360
[ 145.304225]  [<00000000aede522e>] irq_exit+0x9e/0xc0
[ 145.304228]  [<00000000aed70b18>] do_IRQ+0x78/0xb0
[ 145.304229]  [<00000000af63d948>] io_int_handler+0x124/0x28c
[ 145.304232]  [<00000000aed675bc>] enabled_wait+0x3c/0xd0
[ 145.304235] Last Breaking-Event-Address:
[ 145.304237]  [<00000000af63e938>] __s390_indirect_jump_r14+0x0/0xc
[ 145.304240] Kernel panic - not syncing: Fatal exception in interrupt
01: HCPGIR450W CP entered; disabled wait PSW 00020001 80000000 00000000 AED7349E

John George (jog)
description: updated
Frank Heimes (fheimes) wrote :

The situation could be reproduced with d-i (legacy image), but only with a very specific disk layout and configuration.
It appears to happen when an existing but incomplete LVM configuration spans the activated FBA DASD disk(s): d-i tries to identify all existing volumes (in preparation for potential reuse) and fails if one or more volumes are missing.

In detail (this happened on a test system, hence the differing installations):
- a set of four FBA DASDs existed, with an LVM installation spanning all four
- during the next install only one of the four was activated, and the installation completed on this single FBA DASD disk
- the next installation had all four enabled again, and the failure happened during LVM setup

What surprised me was that manually clearing the first 512 bytes of the disk(s), like:
dd if=/dev/zero of=/dev/dasd? bs=512 count=1
was not enough - I needed to wipe the _entire_ disk space (in d-i, using dd).
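
One plausible reason a single zeroed sector is not enough: an LVM2 physical volume label carries the magic string `LABELONE` in one of the first four 512-byte sectors of the PV (by default the second one), and in this report the PV sits on a partition (dasda1) rather than at the start of the disk. The sketch below is an illustration under those assumptions, not a diagnosis of this system. It scans the leading sectors of a device or image file for the label, and the helper names (`find_lvm_label_sectors`, `make_demo_image`, `zero_first_sector`) are hypothetical:

```python
import os
import tempfile

SECTOR = 512
LVM_MAGIC = b"LABELONE"  # signature at the start of an LVM2 PV label sector


def find_lvm_label_sectors(path, sectors=4):
    """Return indexes of the leading sectors of `path` that start with the LVM2 magic."""
    found = []
    with open(path, "rb") as dev:
        for index in range(sectors):
            dev.seek(index * SECTOR)
            if dev.read(len(LVM_MAGIC)) == LVM_MAGIC:
                found.append(index)
    return found


def make_demo_image():
    """Build a scratch file that mimics a PV: label magic in sector 1."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as img:
        img.write(b"\0" * SECTOR)                  # sector 0
        img.write(LVM_MAGIC.ljust(SECTOR, b"\0"))  # sector 1 holds the label
        img.write(b"\0" * SECTOR * 2)              # sectors 2 and 3
    return path


def zero_first_sector(path):
    """Overwrite only the first 512 bytes, like dd bs=512 count=1 conv=notrunc."""
    with open(path, "r+b") as img:
        img.write(b"\0" * SECTOR)


demo = make_demo_image()
zero_first_sector(demo)
print(find_lvm_label_sectors(demo))  # the label in sector 1 survives
os.remove(demo)
```

Running `find_lvm_label_sectors` against the partition node (for example /dev/dasda1) rather than the whole disk would be the way to check a real system, with root privileges.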

I consider this a corner case. Because there is a workaround (see above) and d-i is no longer supported in 20.04 (subiquity became the new installer), this will probably need to be set to 'Won't fix' for d-i.
However, FBA support for subiquity needs to be discussed.

Changed in ubuntu-z-systems:
status: New → Triaged
Ryan Harper (raharper) wrote :

FBA disks are not _yet_ supported in curtin. The stacktrace indicates (to me) there's a kernel issue going on as well. Certainly writing zeros to an FBA shouldn't crash/oops a kernel.

Can we get pointers to documentation on FBA devices?

Changed in curtin:
status: New → Incomplete
Frank Heimes (fheimes) wrote :

Details about DASD disks (ECKD as well as FBAs) can be found in the 'Device Drivers, Features, and Commands on Ubuntu Server 18.04 LTS, SC34-2765' guide:
http://public.dhe.ibm.com/software/dw/linux390/docu/lub0dd01.pdf
(the 18.04 is the latest of these guides for Ubuntu - they are only created for LTS releases)

Here is a brief summary of all s390x-specific disk storage device types:

* zFCP (SCSI-over-Fibre Channel (FCP) devices and SCSI devices):
- lszdev | grep zfcp # incl. zfcp-host (FCP devices) and zfcp-lun (zfcp-attached SCSI devices)
- can have up to 15 partitions
- partition with fdisk, parted, partman
- disk device type is just 'fba'
==> already supported

* DASDs in general (FICON-attached Direct Access Storage Devices):
- lsdasd shows both types (ECKD and FBA):
- lszdev shows both types (dasd-eckd and dasd-fba)

   * ECKD - Enhanced Count Key Data (ECKD) DASDs:
   - lszdev | grep dasd-eckd
   - label: msdos partition table
   - can hold up to 3 partitions
   - partition with fdasd
   - requires dasdfmt with disk device type 'cdl' (Compatible disk layout, 'ldl' is legacy and no longer in use)
   ==> already supported

   * FBA - Fixed Block Architecture (FBA) DASDs:
   - lszdev | grep dasd-fba
   - can hold up to 3 partition(s)
   - partition with fdisk, parted, partman
   - disk device type is just 'fba' (here NO dasdfmt usage)
   ==> tbd

Identification/characteristics/tooling:

$ lsdasd
Bus-ID    Status  Name   Device  Type  BlkSz  Size     Blocks
================================================================================
0.0.0101  active  dasda  94:0    FBA   512    16383MB  33554368
0.0.0300  active  dasdb  94:4    ECKD  4096   7042MB   1802880
0.0.0200  active  dasdc  94:8    ECKD  4096   7042MB   1802880
0.0.0400  active  dasdd  94:12   ECKD  4096   21128MB  5409000
0.0.0102  active  dasde  94:16   FBA   512    16383MB  33554368
0.0.0103  active  dasdf  94:20   FBA   512    16383MB  33554368
0.0.0104  active  dasdg  94:24   FBA   512    16384MB  33554592
$ lszdev dasd-eckd
TYPE       ID        ON   PERS  NAMES
dasd-eckd  0.0.0190  no   no
dasd-eckd  0.0.0191  no   no
dasd-eckd  0.0.019d  no   no
dasd-eckd  0.0.019e  no   no
dasd-eckd  0.0.0200  yes  yes   dasdc
dasd-eckd  0.0.0300  yes  yes   dasdb
dasd-eckd  0.0.0400  yes  yes   dasdd
dasd-eckd  0.0.0592  no   no
dasd-eckd  0.0.1607  no   no
$ lszdev dasd-fba
TYPE      ID        ON   PERS  NAMES
dasd-fba  0.0.0101  yes  yes   dasda
dasd-fba  0.0.0102  yes  yes   dasde
dasd-fba  0.0.0103  yes  yes   dasdf
dasd-fba  0.0.0104  yes  yes   dasdg
$ cat /sys/class/block/dasda/device/devtype
9336/10
$ cat /sys/class/block/dasdb/device/devtype
3390/0c
$ sudo dasdview -i /dev/dasda

--- general DASD information --------------------------------------------------
device node : /dev/dasda
busid : 0.0.0101
type : FBA
device type : hex 9336 dec 37686

--- DASD geometry -------------------------------------------------------------
number of cylinders : hex 309 dec 777
tracks per cylinder ...
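
An installer that has to treat FBA and ECKD DASDs differently first needs to tell them apart. A minimal sketch of doing that from `lsdasd` output, using a few rows from the listing above as sample data. The function names are hypothetical, and the parser assumes every row has all eight columns, which may not hold for devices that are not active:

```python
LSDASD_SAMPLE = """\
Bus-ID Status Name Device Type BlkSz Size Blocks
================================================================================
0.0.0101 active dasda 94:0 FBA 512 16383MB 33554368
0.0.0300 active dasdb 94:4 ECKD 4096 7042MB 1802880
0.0.0104 active dasdg 94:24 FBA 512 16384MB 33554592
"""


def parse_lsdasd(text):
    """Parse whitespace-separated lsdasd rows into dicts keyed by the header names."""
    lines = [ln for ln in text.splitlines()
             if ln.strip() and not ln.startswith("=")]
    header = lines[0].split()
    return [dict(zip(header, ln.split())) for ln in lines[1:]]


def fba_devices(text):
    """Return the kernel names of all DASDs whose Type column is FBA."""
    return [row["Name"] for row in parse_lsdasd(text) if row["Type"] == "FBA"]


print(fba_devices(LSDASD_SAMPLE))
```

The `devtype` sysfs attribute shown above (9336/10 for the FBA disk, 3390/0c for the ECKD disk) would be an alternative source for the same distinction.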

Ryan Harper (raharper) wrote :

Does fdasd work on DASD-fba?

I'm hoping that DASD-fba is compatible enough with the curtin support so that we *don't* have to care about the differences.

Having seen your comment, it sounds like it does work:

"during next install only one of the four got activated and the installation completed on this single FBA DASD disk"

The stack-trace indicates to me that this isn't a curtin issue (at least not yet); we'll need kernel/foundations work on the bug exposed by poking at partial LVM devices over FBA.

Frank Heimes (fheimes) wrote :

Unfortunately we do need to care (at least a bit) about the differences, especially for partitioning. fdasd works only on DASD ECKD disks.

The statement:
"during next install only one of the four got activated and the installation completed on this single FBA DASD disk"
was while doing a d-i legacy installation (sorry).

Yes, the partial LVM issue with the FBA devices occurs with both d-i and subiquity.
(The workaround is to wipe the disks manually, but I needed to dd the entire disk; dd of the first block or so is not sufficient, and wipefs is not available in the d-i shell.)

Frank Heimes (fheimes) wrote :

I have to correct one of the statements in comment #5.

DASD FBAs can only hold one partition

(https://www.ibm.com/support/knowledgecenter/linuxonibm/com.ibm.linux.z.lgdd/lgdd_r_dasd_layout_sum.html)

Frank Heimes (fheimes) wrote :

Checking the focal git tree, the fix is included in the following Ubuntu kernels:
Ubuntu-5.4.0-55.61
Ubuntu-5.4.0-56.62
Ubuntu-5.4.0-57.63
And Ubuntu-5.4.0-56 just migrated to updates:
 linux-generic | 5.4.0.56.59 | focal-updates | s390x
hence I'm updating this (focal) to Fix Released.
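
A minimal sketch for checking whether a given focal kernel is at or beyond the versions listed above. It assumes that 5.4.0-55 is the first focal ABI carrying the fix, which is inferred from it being the earliest tag in the list and is not confirmed elsewhere in this report. The function name `focal_kernel_has_fix` is hypothetical:

```python
import re

# Assumption: 5.4.0-55.61 is the earliest focal tag listed as containing the fix.
FIRST_FIXED_ABI = 55


def focal_kernel_has_fix(release):
    """Return True if a focal 5.4.0-N kernel release string has ABI N >= 55."""
    match = re.fullmatch(r"5\.4\.0-(\d+)(?:-\w+)?", release)
    if match is None:
        raise ValueError(f"not a focal 5.4.0 kernel release: {release!r}")
    return int(match.group(1)) >= FIRST_FIXED_ABI


print(focal_kernel_has_fix("5.4.0-26-generic"))  # kernel from the original report
print(focal_kernel_has_fix("5.4.0-56-generic"))  # kernel that migrated to focal-updates
```

The input is the string printed by `uname -r`; other kernel series are rejected rather than guessed at.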

Changed in linux (Ubuntu):
status: New → Fix Released
Changed in subiquity:
status: New → In Progress
Frank Heimes (fheimes)
Changed in ubuntu-z-systems:
status: Triaged → In Progress
Frank Heimes (fheimes) wrote :

This bug is fixed with
https://github.com/CanonicalLtd/subiquity/releases/tag/21.01.1
for Ubuntu releases H, G and F.

Changed in subiquity:
status: In Progress → Fix Released
Changed in curtin:
status: Incomplete → Fix Released
Changed in ubuntu-z-systems:
status: In Progress → Fix Released