[Ubuntu 20.04] LPAR crashes in block layer under high stress. Might be triggered by scsi errors.

Bug #1881109 reported by bugproxy
This bug affects 1 person

Affects                  Status        Importance  Assigned to            Milestone
Ubuntu on IBM z Systems  Fix Released  Undecided   Skipper Bug Screeners
linux (Ubuntu)           Fix Released  Undecided   Skipper Bug Screeners

Bug Description

We can reproduce a crash in the block layer with lots of stress on lots of SCSI disks (on an XIV storage server).

We seem to have several SCSI stalls in the logs (this needs to be analyzed further), but in the end we crash with the following call trace.

[20832.901147] Failing address: 00007fe00dea8000 TEID: 00007fe00dea8403
[20832.901159] Fault in home space mode while using kernel ASCE.
[20832.901171] AS:000001d3cccf400b R2:000003fd0020800b R3:000003fd0020c007 S:000003fc1cc78800 P:0000000000000400
[20832.901264] Oops: 0011 ilc:2 [#1] SMP
[20832.901280] Modules linked in: vhost_net vhost macvtap macvlan tap xfs xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp ip6table_mangle ip6table_nat iptable_mangle iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfnetlink ip6table_filter ip6_tables iptable_filter bpfilter bridge aufs overlay dm_service_time dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua s390_trng chsc_sch eadm_sch vfio_ccw vfio_mdev mdev vfio_iommu_type1 vfio 8021q garp mrp stp llc sch_fq_codel drm drm_panel_orientation_quirks i2c_core ip_tables x_tables btrfs zstd_compress zlib_deflate raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 linear dm_mirror dm_region_hash dm_log qeth_l2 pkey zcrypt crc32_vx_s390 ghash_s390 prng aes_s390 des_s390 libdes sha3_512_s390 sha3_256_s390 sha512_s390 sha256_s390 sha1_s390 sha_common zfcp scsi_transport_fc dasd_eckd_mod dasd_mod qeth qdio ccwgroup
[20832.901516] CPU: 29 PID: 389709 Comm: CPU 0/KVM Kdump: loaded Not tainted 5.4.0-29-generic #33-Ubuntu
[20832.901530] Hardware name: IBM 8561 T01 708 (LPAR)
[20832.901542] Krnl PSW : 0404e00180000000 000001d3cbd559be (try_to_wake_up+0x4e/0x700)
[20832.901575] R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI:0 EA:3
[20832.901744] Krnl GPRS: 000003fc917cd988 00007fe000000000 00007fe00000001e 0000000000000003
[20832.901750] 0000000000000000 000000000000004c 040003fd005a4600 0000000000000003
[20832.901753] 0000000000000003 00007fe00dea8454 0000000000000000 00007fe00dea7b00
[20832.901754] 000003f44bd9b300 000001d3cc587088 00007fe0021c7ae0 00007fe0021c7a60
[20832.901767] Krnl Code: 000001d3cbd559b2: 41902954 la %r9,2388(%r2)
                          000001d3cbd559b6: 582003ac l %r2,940
                         #000001d3cbd559ba: a7180000 lhi %r1,0
                         >000001d3cbd559be: ba129000 cs %r1,%r2,0(%r9)
                          000001d3cbd559c2: a77401c9 brc 7,000001d3cbd55d54
                          000001d3cbd559c6: e310b0080004 lg %r1,8(%r11)
                          000001d3cbd559cc: b9800018 ngr %r1,%r8
                          000001d3cbd559d0: a774001f brc 7,000001d3cbd55a0e
[20832.901784] Call Trace:
[20832.901816] ([<000001d3cc57e0ac>] cleanup_critical+0x0/0x474)
[20832.901823] [<000001d3cc1d16ba>] rq_qos_wake_function+0x8a/0xa0
[20832.901827] [<000001d3cbd74bde>] __wake_up_common+0x9e/0x1b0
[20832.901829] [<000001d3cbd750e4>] __wake_up_common_lock+0x94/0xe0
[20832.901830] [<000001d3cbd7515a>] __wake_up+0x2a/0x40
[20832.901835] [<000001d3cc1e8640>] wbt_done+0x90/0xe0
[20832.901837] [<000001d3cc1d17be>] __rq_qos_done+0x3e/0x60
[20832.901841] [<000001d3cc1bd5b0>] blk_mq_free_request+0xe0/0x140
[20832.901848] [<000001d3cc35fc60>] dm_softirq_done+0x140/0x230
[20832.901849] [<000001d3cc1bbfbc>] blk_done_softirq+0xbc/0xe0
[20832.901850] [<000001d3cc57e710>] __do_softirq+0x100/0x360
[20832.901853] [<000001d3cbd2525e>] irq_exit+0x9e/0xc0
[20832.901856] [<000001d3cbcb0b18>] do_IRQ+0x78/0xb0
[20832.901859] [<000001d3cc57dc28>] ext_int_handler+0x128/0x12c
[20832.901860] [<000001d3cc57d306>] sie_exit+0x0/0x46
[20832.901866] ([<000001d3cbce944a>] __vcpu_run+0x27a/0xc30)
[20832.901870] [<000001d3cbcf29a8>] kvm_arch_vcpu_ioctl_run+0x2d8/0x840
[20832.901875] [<000001d3cbcdd242>] kvm_vcpu_ioctl+0x282/0x770
[20832.901880] [<000001d3cbf85f66>] do_vfs_ioctl+0x376/0x690
[20832.901881] [<000001d3cbf86304>] ksys_ioctl+0x84/0xb0
[20832.901883] [<000001d3cbf8639a>] __s390x_sys_ioctl+0x2a/0x40
[20832.901885] [<000001d3cc57d5f2>] system_call+0x2a6/0x2c8
[20832.901885] Last Breaking-Event-Address:
[20832.901889] [<000001d3cbd5607e>] wake_up_process+0xe/0x20
[20832.901897] Kernel panic - not syncing: Fatal exception in interrupt

Addl. comment:
In all dumps we've seen so far, a crash similar to what Christian posted in
comment #2 happens soon after the SCSI midlayer went through error recovery,
i.e. after issuing SCSI aborts and potentially SCSI device and target resets.

From our most recent repro:
[20226.914939] sd 1:0:3:16687: [sdatp] tag#1256 Done: TIMEOUT_ERROR Result: hostbyte=DID_OK driverbyte=DRIVER_OK
[20226.914948] sd 1:0:3:16687: [sdatp] tag#1256 CDB: Read(10) 28 00 00 00 25 30 00 00 08 00
[20226.914953] sd 1:0:3:16687: [sdatp] tag#1256 abort scheduled
[20227.626587] sd 1:0:2:10: [sdlw] tag#1071 Done: SUCCESS Result: hostbyte=DID_TRANSPORT_DISRUPTED driverbyte=DRIVER_OK
[20227.626593] sd 1:0:2:10: [sdlw] tag#1071 CDB: Report target port groups a3 0a 00 00 00 00 00 00 10 00 00 00
[20227.626749] sd 1:0:2:2: [sdlb] tag#3388 cmd abort failed
[20227.626862] sd 1:0:2:2: [sdlb] tag#2356 aborting command
[20227.626865] sd 1:0:2:2: [sdlb] tag#2356 cmd abort failed
[20227.626867] sd 1:0:2:2: [sdlb] tag#2353 aborting command
[20227.626868] sd 1:0:2:2: [sdlb] tag#2353 cmd abort failed
[20227.626870] sd 1:0:2:2: [sdlb] tag#2351 aborting command
[20227.626871] sd 1:0:2:2: [sdlb] tag#2351 cmd abort failed
[20227.626873] sd 1:0:2:2: [sdlb] tag#1255 aborting command
[20227.626874] sd 1:0:2:2: [sdlb] tag#1255 cmd abort failed
[20227.626876] sd 1:0:2:2: [sdlb] tag#1254 aborting command
[20227.626877] sd 1:0:2:2: [sdlb] tag#1254 cmd abort failed
[20227.626879] sd 1:0:2:2: [sdlb] tag#1252 aborting command
[20227.626881] sd 1:0:2:2: [sdlb] tag#1252 cmd abort failed
[20227.626883] sd 1:0:2:2: [sdlb] tag#1250 aborting command
[20227.626884] sd 1:0:2:2: [sdlb] tag#1250 cmd abort failed
[20227.626886] sd 1:0:2:2: [sdlb] tag#1248 aborting command
[20227.626889] sd 1:0:2:16687: [sdahy] tag#1072 Done: NEEDS_RETRY Result: hostbyte=DID_TRANSPORT_DISRUPTED driverbyte=DRIVER_OK
[20227.626891] sd 1:0:2:2: [sdlb] tag#1248 cmd abort failed
[20227.626893] sd 1:0:2:2: [sdlb] tag#1245 aborting command
[20227.626895] sd 1:0:2:2: [sdlb] tag#1245 cmd abort failed
[20227.626897] sd 1:0:2:16687: [sdahy] tag#1072 CDB: Read(10) 28 00 00 00 59 98 00 00 08 00
[20227.626899] sd 1:0:2:2: [sdlb] tag#1244 aborting command
[20227.626901] sd 1:0:2:2: [sdlb] tag#1244 cmd abort failed
[20227.626903] sd 1:0:2:2: [sdlb] tag#1242 aborting command
[20227.626904] sd 1:0:2:2: [sdlb] tag#1242 cmd abort failed
[20227.626906] sd 1:0:2:2: [sdlb] tag#1240 aborting command
[20227.626907] sd 1:0:2:2: [sdlb] tag#1240 cmd abort failed
[20227.626909] sd 1:0:2:2: [sdlb] tag#302 aborting command
[20227.626910] sd 1:0:2:2: [sdlb] tag#302 cmd abort failed
[20227.626911] sd 1:0:2:2: [sdlb] tag#297 aborting command
[20227.626912] sd 1:0:2:2: [sdlb] tag#297 cmd abort failed
[20227.626914] sd 1:0:2:2: [sdlb] tag#295 aborting command
[20227.626915] sd 1:0:2:2: [sdlb] tag#295 cmd abort failed
[20227.626917] sd 1:0:2:2: [sdlb] tag#248 aborting command
[20227.626972] sd 1:0:2:2: [sdlb] tag#248 cmd abort failed
[20227.626977] sd 1:0:2:2: [sdlb] tag#246 aborting command
[20227.626979] sd 1:0:2:2: [sdlb] tag#246 cmd abort failed
[20227.626981] sd 1:0:2:2: [sdlb] tag#244 aborting command
[20227.626982] sd 1:0:2:2: [sdlb] tag#244 cmd abort failed
[20227.626984] sd 1:0:2:2: [sdlb] tag#242 aborting command
[20227.626985] sd 1:0:2:2: [sdlb] tag#242 cmd abort failed
[20227.626986] sd 1:0:2:2: [sdlb] tag#241 aborting command
[20227.626988] sd 1:0:2:2: [sdlb] tag#241 cmd abort failed
[20227.626989] sd 1:0:2:2: [sdlb] tag#239 aborting command
[20227.626990] sd 1:0:2:2: [sdlb] tag#239 cmd abort failed
[20227.626991] sd 1:0:2:2: [sdlb] tag#237 aborting command
[20227.626992] sd 1:0:2:2: [sdlb] tag#237 cmd abort failed
[20227.626994] sd 1:0:2:2: [sdlb] tag#236 aborting command
[20227.626995] sd 1:0:2:2: [sdlb] tag#236 cmd abort failed
[20227.626996] sd 1:0:2:2: [sdlb] tag#234 aborting command
[20227.626997] sd 1:0:2:2: [sdlb] tag#234 cmd abort failed
[20227.626999] sd 1:0:2:2: [sdlb] tag#199 aborting command
[20227.627000] sd 1:0:2:2: [sdlb] tag#199 cmd abort failed
[20227.627001] sd 1:0:2:2: [sdlb] tag#195 aborting command
[20227.627003] sd 1:0:2:2: [sdlb] tag#195 cmd abort failed
[20227.627004] sd 1:0:2:2: [sdlb] tag#193 aborting command
[20227.627006] sd 1:0:2:2: [sdlb] tag#193 cmd abort failed
[20227.627009] sd 1:0:3:2: [sdaia] tag#2355 aborting command
[20227.627011] sd 1:0:3:2: [sdaia] tag#2355 cmd abort failed
[20227.627013] sd 1:0:3:2: [sdaia] tag#2354 aborting command
[20227.627014] sd 1:0:3:2: [sdaia] tag#2354 cmd abort failed
[20227.627016] sd 1:0:3:2: [sdaia] tag#2352 aborting command
[20227.627017] sd 1:0:3:2: [sdaia] tag#2352 cmd abort failed
[20227.627019] sd 1:0:3:2: [sdaia] tag#2350 aborting command
[20227.627020] sd 1:0:3:2: [sdaia] tag#2350 cmd abort failed
[20227.627022] sd 1:0:3:2: [sdaia] tag#1253 aborting command
[20227.627023] sd 1:0:3:2: [sdaia] tag#1253 cmd abort failed
[20227.627025] sd 1:0:3:2: [sdaia] tag#1251 aborting command
[20227.627026] sd 1:0:3:2: [sdaia] tag#1251 cmd abort failed
[20227.627028] sd 1:0:3:2: [sdaia] tag#1249 aborting command
[20227.627030] sd 1:0:3:2: [sdaia] tag#1249 cmd abort failed
[20227.627031] sd 1:0:3:2: [sdaia] tag#1247 aborting command
[20227.627032] sd 1:0:3:2: [sdaia] tag#1247 cmd abort failed
[20227.627034] sd 1:0:3:2: [sdaia] tag#1246 aborting command
[20227.627035] sd 1:0:3:2: [sdaia] tag#1246 cmd abort failed
[20227.627037] sd 1:0:3:2: [sdaia] tag#1243 aborting command
[20227.627038] sd 1:0:3:2: [sdaia] tag#1243 cmd abort failed
[20227.627039] sd 1:0:3:2: [sdaia] tag#1241 aborting command
[20227.627041] sd 1:0:3:2: [sdaia] tag#1241 cmd abort failed
[20227.627042] sd 1:0:3:2: [sdaia] tag#1239 aborting command
[20227.627043] sd 1:0:3:2: [sdaia] tag#1239 cmd abort failed
[20227.627045] sd 1:0:3:2: [sdaia] tag#301 aborting command
[20227.627047] sd 1:0:3:2: [sdaia] tag#301 cmd abort failed
[20227.627049] sd 1:0:3:2: [sdaia] tag#299 aborting command
[20227.627050] sd 1:0:3:2: [sdaia] tag#299 cmd abort failed
[20227.627052] sd 1:0:3:2: [sdaia] tag#298 aborting command
[20227.627053] sd 1:0:3:2: [sdaia] tag#298 cmd abort failed
[20227.627055] sd 1:0:3:2: [sdaia] tag#296 aborting command
[20227.627056] sd 1:0:3:2: [sdaia] tag#296 cmd abort failed
[20227.627058] sd 1:0:3:2: [sdaia] tag#294 aborting command
[20227.627059] sd 1:0:3:2: [sdaia] tag#294 cmd abort failed
[20227.627061] sd 1:0:3:2: [sdaia] tag#247 aborting command
[20227.627063] sd 1:0:3:2: [sdaia] tag#247 cmd abort failed
[20227.627064] sd 1:0:3:2: [sdaia] tag#245 aborting command
[20227.627065] sd 1:0:3:2: [sdaia] tag#245 cmd abort failed
[20227.627067] sd 1:0:3:2: [sdaia] tag#243 aborting command
[20227.627068] sd 1:0:3:2: [sdaia] tag#243 cmd abort failed
[20227.627070] sd 1:0:3:2: [sdaia] tag#240 aborting command
[20227.627071] sd 1:0:3:2: [sdaia] tag#240 cmd abort failed
[20227.627073] sd 1:0:3:2: [sdaia] tag#238 aborting command
[20227.627074] sd 1:0:3:2: [sdaia] tag#238 cmd abort failed
[20227.627075] sd 1:0:3:2: [sdaia] tag#235 aborting command
[20227.627077] sd 1:0:3:2: [sdaia] tag#235 cmd abort failed
[20227.627079] sd 1:0:3:2: [sdaia] tag#198 aborting command
[20227.627081] sd 1:0:3:2: [sdaia] tag#198 cmd abort failed
[20227.627082] sd 1:0:3:2: [sdaia] tag#197 aborting command
[20227.627083] sd 1:0:3:2: [sdaia] tag#197 cmd abort failed
[20227.627085] sd 1:0:3:2: [sdaia] tag#196 aborting command
[20227.627086] sd 1:0:3:2: [sdaia] tag#196 cmd abort failed
[20227.627088] sd 1:0:3:2: [sdaia] tag#194 aborting command
[20227.627089] sd 1:0:3:2: [sdaia] tag#194 cmd abort failed
[20227.627091] sd 1:0:3:2: [sdaia] tag#192 aborting command
[20227.627092] sd 1:0:3:2: [sdaia] tag#192 cmd abort failed
[20227.627095] sd 1:0:3:16687: [sdatp] tag#1256 aborting command
[20227.627096] sd 1:0:3:16687: [sdatp] tag#1256 cmd abort failed
[20227.632049] qdio: 0.0.7100 ZFCP on SC 21e using AI:1 QEBSM:0 PRI:1 TDD:1 SIGA: W AP
[20227.705142] scsi host1: scsi_eh_1: waking up 0/57/57
[20227.705150] scsi host1: scsi_eh_prt_fail_stats: cmds failed: 0, cancel: 28
[20227.705282] scsi host1: scsi_eh_prt_fail_stats: cmds failed: 0, cancel: 28
[20227.705436] scsi host1: scsi_eh_prt_fail_stats: cmds failed: 0, cancel: 1
[20227.705437] scsi host1: Total of 57 commands on 3 devices require eh work
[20227.705534] sd 1:0:2:2: scsi_eh_1: Sending BDR
[20228.787915] sd 1:0:2:2: [sdlb] tag#3388 scsi_eh_done result: 2
[20228.787928] sd 1:0:2:2: [sdlb] tag#3388 Done: SUCCESS Result: hostbyte=DID_OK driverbyte=DRIVER_OK
[20228.787931] sd 1:0:2:2: [sdlb] tag#3388 CDB: Test Unit Ready 00 00 00 00 00 00
[20228.787933] sd 1:0:2:2: [sdlb] tag#3388 Sense Key : Unit Attention [current]
[20228.787937] sd 1:0:2:2: [sdlb] tag#3388 Add. Sense: Power on, reset, or bus device reset occurred
[20228.787939] sd 1:0:2:2: [sdlb] tag#3388 scsi_send_eh_cmnd timeleft: 1000
[20228.787940] sd 1:0:2:2: Power-on or device reset occurred
[20228.787950] sd 1:0:2:2: [sdlb] tag#3388 scsi_send_eh_cmnd: scsi_eh_completed_normally 2006
[20228.787952] sd 1:0:2:2: [sdlb] tag#3388 scsi_eh_tur return: 2001
[20228.788219] sd 1:0:2:2: [sdlb] tag#3388 scsi_eh_done result: 0
[20228.788236] sd 1:0:2:2: [sdlb] tag#3388 scsi_send_eh_cmnd timeleft: 1000
[20228.788237] sd 1:0:2:2: [sdlb] tag#3388 scsi_send_eh_cmnd: scsi_eh_completed_normally 2002
[20228.788239] sd 1:0:2:2: [sdlb] tag#3388 scsi_eh_tur return: 2002
[20228.788297] sd 1:0:3:2: scsi_eh_1: Sending BDR
[20241.124934] sd 1:0:3:2: [sdaia] tag#2355 Done: FAILED Result: hostbyte=DID_OK driverbyte=DRIVER_OK
[20241.124948] sd 1:0:3:2: [sdaia] tag#2355 CDB: Test Unit Ready 00 00 00 00 00 00
[20241.124951] sd 1:0:3:2: [sdaia] tag#2355 scsi_send_eh_cmnd timeleft: 0
[20241.124953] sd 1:0:3:2: [sdaia] tag#2355 scsi_eh_tur return: 2003
[20241.125181] sd 1:0:3:16687: scsi_eh_1: Sending BDR
[20241.128570] sd 1:0:3:16687: [sdatp] tag#1256 scsi_eh_done result: c0000
[20241.128574] sd 1:0:3:16687: [sdatp] tag#1256 Done: SUCCESS Result: hostbyte=DID_IMM_RETRY driverbyte=DRIVER_OK
[20241.128576] sd 1:0:3:16687: [sdatp] tag#1256 CDB: Test Unit Ready 00 00 00 00 00 00
[20241.128577] sd 1:0:3:16687: [sdatp] tag#1256 scsi_send_eh_cmnd timeleft: 1000
[20241.128580] sd 1:0:3:16687: [sdatp] tag#1256 scsi_send_eh_cmnd: scsi_eh_completed_normally 2003
[20241.128581] sd 1:0:3:16687: [sdatp] tag#1256 scsi_eh_tur return: 2003
[20241.128583] scsi host1: scsi_eh_1: Sending target reset to target 3
[20247.159773] scsi 1:0:3:0: scsi scan: device exists on 1:0:3:0
[20247.159810] scsi 1:0:3:0: scsi scan: Sending REPORT LUNS to (try 0)
[20247.265072] sd 1:0:3:2: [sdaia] tag#2355 scsi_eh_done result: 2
[20247.265111] sd 1:0:3:2: [sdaia] tag#2355 Done: SUCCESS Result: hostbyte=DID_OK driverbyte=DRIVER_OK
[20247.265117] sd 1:0:3:2: [sdaia] tag#2355 CDB: Test Unit Ready 00 00 00 00 00 00
[20247.265120] sd 1:0:3:2: [sdaia] tag#2355 Sense Key : Unit Attention [current]
[20247.265123] sd 1:0:3:2: [sdaia] tag#2355 Add. Sense: Power on, reset, or bus device reset occurred
[20247.265124] sd 1:0:3:2: [sdaia] tag#2355 scsi_send_eh_cmnd timeleft: 1000
[20247.265128] sd 1:0:3:2: Power-on or device reset occurred
[20247.265141] sd 1:0:3:2: [sdaia] tag#2355 scsi_send_eh_cmnd: scsi_eh_completed_normally 2006
[20247.265143] sd 1:0:3:2: [sdaia] tag#2355 scsi_eh_tur return: 2001
[20247.265254] sd 1:0:3:2: [sdaia] tag#2355 scsi_eh_done result: 0
[20247.265278] sd 1:0:3:2: [sdaia] tag#2355 scsi_send_eh_cmnd timeleft: 1000
[20247.265281] sd 1:0:3:2: [sdaia] tag#2355 scsi_send_eh_cmnd: scsi_eh_completed_normally 2002
[20247.265282] sd 1:0:3:2: [sdaia] tag#2355 scsi_eh_tur return: 2002
[20247.265439] sd 1:0:3:16687: [sdatp] tag#1256 scsi_eh_done result: 2
[20247.265454] sd 1:0:3:16687: [sdatp] tag#1256 Done: SUCCESS Result: hostbyte=DID_OK driverbyte=DRIVER_OK
[20247.265456] sd 1:0:3:16687: [sdatp] tag#1256 CDB: Test Unit Ready 00 00 00 00 00 00
[20247.265458] sd 1:0:3:16687: [sdatp] tag#1256 Sense Key : Unit Attention [current]
[20247.265459] sd 1:0:3:16687: [sdatp] tag#1256 Add. Sense: Power on, reset, or bus device reset occurred
[20247.265461] sd 1:0:3:16687: [sdatp] tag#1256 scsi_send_eh_cmnd timeleft: 1000
[20247.265462] sd 1:0:3:16687: Power-on or device reset occurred
[20247.265469] sd 1:0:3:16687: [sdatp] tag#1256 scsi_send_eh_cmnd: scsi_eh_completed_normally 2006
[20247.265470] sd 1:0:3:16687: [sdatp] tag#1256 scsi_eh_tur return: 2001
[20247.265568] sd 1:0:3:16687: [sdatp] tag#1256 scsi_eh_done result: 0
[20247.265582] sd 1:0:3:16687: [sdatp] tag#1256 scsi_send_eh_cmnd timeleft: 1000
[20247.265584] sd 1:0:3:16687: [sdatp] tag#1256 scsi_send_eh_cmnd: scsi_eh_completed_normally 2002
[20247.265585] sd 1:0:3:16687: [sdatp] tag#1256 scsi_eh_tur return: 2002
[20247.265593] sd 1:0:2:2: [sdlb] tag#3388 scsi_eh_1: flush retry cmd
[20247.265603] sd 1:0:2:2: [sdlb] tag#2356 scsi_eh_1: flush retry cmd
[20247.265607] sd 1:0:2:2: [sdlb] tag#2353 scsi_eh_1: flush retry cmd
[20247.265610] sd 1:0:2:2: [sdlb] tag#2351 scsi_eh_1: flush retry cmd
[20247.265612] sd 1:0:2:2: [sdlb] tag#1255 scsi_eh_1: flush retry cmd
[20247.265615] sd 1:0:2:2: [sdlb] tag#1254 scsi_eh_1: flush retry cmd
[20247.265617] sd 1:0:2:2: [sdlb] tag#1252 scsi_eh_1: flush retry cmd
[20247.265619] sd 1:0:2:2: [sdlb] tag#1250 scsi_eh_1: flush retry cmd
[20247.265621] sd 1:0:2:2: [sdlb] tag#1248 scsi_eh_1: flush retry cmd
[20247.265622] sd 1:0:2:2: [sdlb] tag#1245 scsi_eh_1: flush retry cmd
[20247.265624] sd 1:0:2:2: [sdlb] tag#1244 scsi_eh_1: flush retry cmd
[20247.265625] sd 1:0:2:2: [sdlb] tag#1242 scsi_eh_1: flush retry cmd
[20247.265626] sd 1:0:2:2: [sdlb] tag#1240 scsi_eh_1: flush retry cmd
[20247.265629] sd 1:0:2:2: [sdlb] tag#302 scsi_eh_1: flush retry cmd
[20247.265631] sd 1:0:2:2: [sdlb] tag#297 scsi_eh_1: flush retry cmd
[20247.265633] sd 1:0:2:2: [sdlb] tag#295 scsi_eh_1: flush retry cmd
[20247.265635] sd 1:0:2:2: [sdlb] tag#248 scsi_eh_1: flush retry cmd
[20247.265638] sd 1:0:2:2: [sdlb] tag#246 scsi_eh_1: flush retry cmd
[20247.265640] sd 1:0:2:2: [sdlb] tag#244 scsi_eh_1: flush retry cmd
[20247.265642] sd 1:0:2:2: [sdlb] tag#242 scsi_eh_1: flush retry cmd
[20247.265643] sd 1:0:2:2: [sdlb] tag#241 scsi_eh_1: flush retry cmd
[20247.265645] sd 1:0:2:2: [sdlb] tag#239 scsi_eh_1: flush retry cmd
[20247.265647] sd 1:0:2:2: [sdlb] tag#237 scsi_eh_1: flush retry cmd
[20247.265648] sd 1:0:2:2: [sdlb] tag#236 scsi_eh_1: flush retry cmd
[20247.265649] sd 1:0:2:2: [sdlb] tag#234 scsi_eh_1: flush retry cmd
[20247.265651] sd 1:0:2:2: [sdlb] tag#199 scsi_eh_1: flush retry cmd
[20247.265653] sd 1:0:2:2: [sdlb] tag#195 scsi_eh_1: flush retry cmd
[20247.265654] sd 1:0:2:2: [sdlb] tag#193 scsi_eh_1: flush retry cmd
[20247.265655] sd 1:0:3:2: [sdaia] tag#2355 scsi_eh_1: flush retry cmd
[20247.265658] sd 1:0:3:2: [sdaia] tag#2354 scsi_eh_1: flush retry cmd
[20247.265659] sd 1:0:3:2: [sdaia] tag#2352 scsi_eh_1: flush retry cmd
[20247.265660] sd 1:0:3:2: [sdaia] tag#2350 scsi_eh_1: flush retry cmd
[20247.265662] sd 1:0:3:2: [sdaia] tag#1253 scsi_eh_1: flush retry cmd
[20247.265664] sd 1:0:3:2: [sdaia] tag#1251 scsi_eh_1: flush retry cmd
[20247.265665] sd 1:0:3:2: [sdaia] tag#1249 scsi_eh_1: flush retry cmd
[20247.265666] sd 1:0:3:2: [sdaia] tag#1247 scsi_eh_1: flush retry cmd
[20247.265668] sd 1:0:3:2: [sdaia] tag#1246 scsi_eh_1: flush retry cmd
[20247.265669] sd 1:0:3:2: [sdaia] tag#1243 scsi_eh_1: flush retry cmd
[20247.265670] sd 1:0:3:2: [sdaia] tag#1241 scsi_eh_1: flush retry cmd
[20247.265671] sd 1:0:3:2: [sdaia] tag#1239 scsi_eh_1: flush retry cmd
[20247.265673] sd 1:0:3:2: [sdaia] tag#301 scsi_eh_1: flush retry cmd
[20247.265674] sd 1:0:3:2: [sdaia] tag#299 scsi_eh_1: flush retry cmd
[20247.265676] sd 1:0:3:2: [sdaia] tag#298 scsi_eh_1: flush retry cmd
[20247.265677] sd 1:0:3:2: [sdaia] tag#296 scsi_eh_1: flush retry cmd
[20247.265678] sd 1:0:3:2: [sdaia] tag#294 scsi_eh_1: flush retry cmd
[20247.265679] sd 1:0:3:2: [sdaia] tag#247 scsi_eh_1: flush retry cmd
[20247.265681] sd 1:0:3:2: [sdaia] tag#245 scsi_eh_1: flush retry cmd
[20247.265682] sd 1:0:3:2: [sdaia] tag#243 scsi_eh_1: flush retry cmd
[20247.265683] sd 1:0:3:2: [sdaia] tag#240 scsi_eh_1: flush retry cmd
[20247.265684] sd 1:0:3:2: [sdaia] tag#238 scsi_eh_1: flush retry cmd
[20247.265686] sd 1:0:3:2: [sdaia] tag#235 scsi_eh_1: flush retry cmd
[20247.265687] sd 1:0:3:2: [sdaia] tag#198 scsi_eh_1: flush retry cmd
[20247.265688] sd 1:0:3:2: [sdaia] tag#197 scsi_eh_1: flush retry cmd
[20247.265689] sd 1:0:3:2: [sdaia] tag#196 scsi_eh_1: flush retry cmd
[20247.265690] sd 1:0:3:2: [sdaia] tag#194 scsi_eh_1: flush retry cmd
[20247.265692] sd 1:0:3:2: [sdaia] tag#192 scsi_eh_1: flush retry cmd
[20247.265693] sd 1:0:3:16687: [sdatp] tag#1256 scsi_eh_1: flush retry cmd
[20247.266024] scsi host1: waking up host to restart
[20247.268104] scsi 1:0:3:0: scsi scan: REPORT LUNS successful (try 0) result 0x0
[20247.268107] scsi 1:0:3:0: scsi scan: REPORT LUN scan
[20247.268114] scsi 1:0:3:0: scsi scan: device exists on 1:0:3:0
[20247.268117] sd 1:0:3:1: scsi scan: device exists on 1:0:3:1
[20247.268119] sd 1:0:3:2: scsi scan: device exists on 1:0:3:2
....
[20247.269151] sd 1:0:3:16686: scsi scan: device exists on 1:0:3:16686
[20247.269158] sd 1:0:3:16687: scsi scan: device exists on 1:0:3:16687
[20247.269776] scsi host1: scsi_eh_1: sleeping
[20247.273485] scsi 1:0:0:0: scsi scan: INQUIRY pass 1 length 36
[20247.273912] scsi 1:0:0:0: scsi scan: INQUIRY successful with code 0x0
[20247.273944] scsi 1:0:0:0: scsi scan: INQUIRY pass 2 length 109
[20247.274650] scsi 1:0:0:0: scsi scan: INQUIRY successful with code 0x0
[20247.274685] scsi 1:0:0:0: scsi scan: peripheral device type of 31, no device added
[20247.278681] scsi 1:0:0:0: scsi scan: Sending REPORT LUNS to (try 0)
[20247.279981] scsi 1:0:0:0: scsi scan: REPORT LUNS successful (try 0) result 0x0
[20247.279983] scsi 1:0:0:0: scsi scan: REPORT LUN scan
[20247.279984] scsi 1:0:0:0: scsi scan: device exists on 1:0:0:0
[20247.281405] scsi 1:0:1:0: scsi scan: INQUIRY pass 1 length 36
[20247.281517] scsi 1:0:1:0: scsi scan: INQUIRY successful with code 0x0
[20247.281529] scsi 1:0:1:0: scsi scan: INQUIRY pass 2 length 109
[20247.281607] scsi 1:0:1:0: scsi scan: INQUIRY successful with code 0x0
[20247.281612] scsi 1:0:1:0: scsi scan: peripheral device type of 31, no device added
[20247.282750] scsi 1:0:1:0: scsi scan: Sending REPORT LUNS to (try 0)
[20247.282872] scsi 1:0:1:0: scsi scan: REPORT LUNS successful (try 0) result 0x0
[20247.282873] scsi 1:0:1:0: scsi scan: REPORT LUN scan
[20247.282875] scsi 1:0:1:0: scsi scan: device exists on 1:0:1:0
[20247.295705] sd 1:0:2:2: Power-on or device reset occurred
[20247.295717] sd 1:0:2:2: tag#2125 Done: ADD_TO_MLQUEUE Result: hostbyte=DID_OK driverbyte=DRIVER_OK
[20247.295719] sd 1:0:2:2: tag#2125 CDB: Report target port groups a3 2a 00 00 00 00 00 00 00 80 00 00
[20247.295721] sd 1:0:2:2: tag#2125 Sense Key : Unit Attention [current]
[20247.295723] sd 1:0:2:2: tag#2125 Add. Sense: Power on, reset, or bus device reset occurred
[20247.299125] sd 1:0:2:2: alua: port group 00 state A non-preferred supports tolusnA
[20247.299734] sd 1:0:2:2: alua: port group 00 state A non-preferred supports tolusnA
[20247.315850] sd 1:0:3:16687: alua: port group 00 state A non-preferred supports tolusnA
[20482.227493] Unable to handle kernel pointer dereference in virtual kernel address space
[20482.227509] Failing address: 001fffe00c3c0000 TEID: 001fffe00c3c0403
[20482.227511] Fault in home space mode while using kernel ASCE.
[20482.227513] AS:000003fa68ac800b R2:000003fa3400800b R3:000003fa3400c007 S:000003f9fd093800 P:0000000000000400
[20482.227562] Oops: 0011 ilc:2 [#1] SMP
[20482.227569] Modules linked in: vhost_net vhost macvtap macvlan tap xfs xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp ip6table_mangle ip6table_nat iptable_mangle iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfnetlink ip6table_filter ip6_tables iptable_filter bpfilter bridge aufs overlay dm_service_time dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua s390_trng chsc_sch eadm_sch vfio_ccw vfio_mdev mdev vfio_iommu_type1 vfio 8021q garp mrp stp llc sch_fq_codel drm drm_panel_orientation_quirks i2c_core ip_tables x_tables btrfs zstd_compress zlib_deflate raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 linear dm_mirror dm_region_hash dm_log qeth_l2 pkey zcrypt crc32_vx_s390 ghash_s390 prng aes_s390 des_s390 libdes sha3_512_s390 sha3_256_s390 sha512_s390 sha256_s390 sha1_s390 sha_common zfcp scsi_transport_fc dasd_eckd_mod dasd_mod qeth qdio ccwgroup
[20482.227654] CPU: 17 PID: 0 Comm: swapper/17 Kdump: loaded Not tainted 5.4.0-29-generic #33-Ubuntu
[20482.227656] Hardware name: ...
[20482.227658] Krnl PSW : 0404e00180000000 000003fa67b299be (try_to_wake_up+0x4e/0x700)
[20482.227672] R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI:0 EA:3
[20482.227737] Krnl GPRS: 000003fa273e8588 001fffe000000000 001fffe000000012 0000000000000003
[20482.227739] 0000000000000000 0000000000000033 040003fa67a84b18 0000000000000003
[20482.227741] 0000000000000003 001fffe00c3c0284 0000000000000000 001fffe00c3bf930
[20482.227743] 000003fa330fa200 000003fa6835b088 001fffe001f87ae0 001fffe001f87a60
[20482.227758] Krnl Code: 000003fa67b299b2: 41902954 la %r9,2388(%r2)
                          000003fa67b299b6: 582003ac l %r2,940
                         #000003fa67b299ba: a7180000 lhi %r1,0
                         >000003fa67b299be: ba129000 cs %r1,%r2,0(%r9)
                          000003fa67b299c2: a77401c9 brc 7,000003fa67b29d54
                          000003fa67b299c6: e310b0080004 lg %r1,8(%r11)
                          000003fa67b299cc: b9800018 ngr %r1,%r8
                          000003fa67b299d0: a774001f brc 7,000003fa67b29a0e
[20482.227775] Call Trace:
[20482.227784] ([<000003fa6887c8b0>] __irq_regs+0x0/0x8)
[20482.227791] [<000003fa67fa56ba>] rq_qos_wake_function+0x8a/0xa0
[20482.227795] [<000003fa67b48bde>] __wake_up_common+0x9e/0x1b0
[20482.227797] [<000003fa67b490e4>] __wake_up_common_lock+0x94/0xe0
[20482.227799] [<000003fa67b4915a>] __wake_up+0x2a/0x40
[20482.227804] [<000003fa67fbc640>] wbt_done+0x90/0xe0
[20482.227806] [<000003fa67fa57be>] __rq_qos_done+0x3e/0x60
[20482.227809] [<000003fa67f915b0>] blk_mq_free_request+0xe0/0x140
[20482.227816] [<000003fa68133c60>] dm_softirq_done+0x140/0x230
[20482.227817] [<000003fa67f8ffbc>] blk_done_softirq+0xbc/0xe0
[20482.227822] [<000003fa68352710>] __do_softirq+0x100/0x360
[20482.227826] [<000003fa67af925e>] irq_exit+0x9e/0xc0
[20482.227829] [<000003fa67a84b18>] do_IRQ+0x78/0xb0
[20482.227831] [<000003fa68351c28>] ext_int_handler+0x128/0x12c
[20482.227835] [<000003fa67a7b5bc>] enabled_wait+0x3c/0xd0
[20482.227840] Last Breaking-Event-Address:
[20482.227844] [<000003fa67b2a07e>] wake_up_process+0xe/0x20
[20482.227872] Kernel panic - not syncing: Fatal exception in interrupt

You can see at 20226.914939 the start of common-code scsi-mod error handling
(SCSI EH). It finds a bunch of commands that have been running for too long
(30s IIRC) and tries to issue aborts for those commands. At 20227.632049
the FCP HBA resets; this reset is not directly related to the aborts issued by
SCSI EH, i.e. we don't just reset the adapter each time we get an abort. A
side-effect of the HBA reset is that the abort commands are never issued, hence
SCSI EH escalates to the next step and issues a BDR at 20227.705534, and
subsequently even a target reset at 20241.128583. But the HBA reset also means
that the commands the midlayer wanted to abort are effectively aborted. There
won't be any more completions for them (we also have mutual exclusion between
scsi_times_out() and scsi_mq_done() via a CAS operation on SCMD_STATE_COMPLETE
in the command state).

Subsequently, I/O works again. Evidence for this is that the TURs (Test Unit
Ready commands) the midlayer sends during SCSI EH are successful (e.g. at
20228.787915, directly after the BDR). After that we also see the midlayer
scanning for new devices and such, which all works fine apparently, and is
itself normal I/O.

The reason why our HBA goes through the reset, and why the timeouts happen in
the first place, is a separate issue that we'll research further.

As far as I've been able to find out, the crash in the WBT code is not directly
related to the HBA reset. We couldn't find definitive proof in any of the data,
but it seems the crash is somehow related to the fact that the SCSI midlayer
went through EH and later tried to wake up a thread waiting in WBT via a
pointer that points into nothing. So my current hypothesis is that WBT is
somehow not well integrated with the SCSI midlayer in this error-handling case.

The same test with ECKD storage instead of SCSI does not crash.

We also chased the request that eventually caused the crash in the examples
posted above. Digging into the dump, we could recover the request address
0x3f89b5bb200. This request was serviced in a dm-multipath request queue and
was cloned into 0x3f9cc25f300, which was serviced by the SCSI midlayer. From
there I could also see the LBA that this request was operating on:

crash> print *((struct request *) 0x3f9cc25f300)
$20 = {
  ....
  __data_len = 0x0,
  __sector = 0x1598e5a0,
  bio = 0x0,
  biotail = 0x3f8c86a5a10,
  ....
}

After all data has been serviced, the __sector value is not advanced any
further, so we can see that this request was sent against LBA 0x1598e5a0. When
I went through our HBA traces, this LBA never occurred with an error, so
currently I don't have much evidence pointing to the HBA driver messing this up.

Like I said, I currently believe this crash results from some bad interaction
between the SCSI midlayer and WBT after SCSI EH. But I don't have any strong
evidence for that as of now, so I can't really be certain.

It is also a bit surprising that the time between the end of EH and the crash
is so long:

[20247.315850] sd 1:0:3:16687: alua: port group 00 state A non-preferred supports tolusnA
[20482.227493] Unable to handle kernel pointer dereference in virtual kernel address space

nearly 235 seconds (~3 minutes 55 seconds).

Addl. Comment:

I did a little research, and it is indeed possible to disable WBT at runtime; it's just a bit user-unfriendly with lots of disks, so I wrote a quick-and-dirty udev rules file.

Is it possible for you to retry this test with WBT disabled? I think it might prove useful to know whether we can handle the occasional queue slowdown with that feature disabled, and it should further narrow down which components are affected (in other words, if the test runs fine with WBT disabled, someone needs to audit the WBT code).

Here is the udev rule to disable WBT for every block device on the system:

    t35lp49 (1) ~ # cat /etc/udev/rules.d/99-disable-blk-wbt.rules
    #
    # Possible workaround for LTC 185913:
    # Disable the Writeback Throttling Feature.
    # Main interface is 'queue/wbt_lat_usec'. When writing '0' into it, the
    # feature should be disabled for the queue in question
    #
    # Rule Syntax: https://www.freedesktop.org/software/systemd/man/udev.html
    #

    SUBSYSTEM!="block", GOTO="wbt_disable_end"
    ACTION=="remove", GOTO="wbt_disable_end"

    TEST{0644}=="queue/wbt_lat_usec", ATTR{queue/wbt_lat_usec}!="0", ATTR{queue/wbt_lat_usec}="0"

    LABEL="wbt_disable_end"

It's a bit coarse (like I said, quick and dirty), but since I don't really know your storage setup, this should catch everything. You can also verify that it worked before running the test with `cat /sys/class/block/*/queue/wbt_lat_usec`; every line should display '0'.

Regarding the queue slowdowns, we have a theory as well by now, but testing it will require running a modified kernel, so let's take it one step at a time.

bugproxy (bugproxy)
tags: added: architecture-s3903164 bugnameltc-185913 severity-critical targetmilestone-inin2004
Changed in ubuntu:
assignee: nobody → Skipper Bug Screeners (skipper-screen-team)
affects: ubuntu → linux (Ubuntu)
Frank Heimes (fheimes)
Changed in ubuntu-z-systems:
assignee: nobody → Skipper Bug Screeners (skipper-screen-team)
Revision history for this message
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2020-05-28 12:14 EDT-------
(In reply to comment #18)
> I did a little research, and it is indeed possible to disable WBT during
> runtime [...]
> [udev rule and test instructions quoted in full from the comment above]

The last re-run we did with this in place - and WBT disabled - did not crash anymore, where before it would crash pretty consistently. We still see timeouts/unit resets in the midlayer, but otherwise the system stays up (still not perfect, but better than crashing outright).

So there is at least some evidence that this is something in the communication between the SCSI midlayer, block layer, and WBT when requests go through SCSI EH.

Revision history for this message
Frank Heimes (fheimes) wrote :

Hi Benjamin,
if it's an issue somewhere in the SCSI midlayer/block layer/WBT, wouldn't it then also happen with zFCP on DS8k and on other platforms?
So far we did some testing with zFCP on DS8k (the only storage sub-system we have) as part of the release testing and server certification, and on top of that we constantly have several zFCP systems running on 20.04 (probably smaller systems and/or with less load), but so far we haven't faced a single crash.
So I'm more inclined to assume that this is XIV-related, no?

Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2020-05-28 14:37 EDT-------
(In reply to comment #20)
> Hi Benjamin,
> if it's an issue somewhere in scsi-midlayer/block-layer/wbt wouldn't it then
> also happen with zFCP on DS8k and on other platforms? [...]

Hey Frank,

I suspect this is a follow-on error from SCSI requests running into timeouts and subsequently being aborted, with LUN/target resets being sent by the SCSI error-handling code. Those cause abnormal request terminations (it's rather unusual to have request timeouts) that might trigger this WBT crash. At least that is my working theory so far.

I am looking into why the requests time out in the first place, in parallel to this report, internally. But anyway, I don't think the system should crash even with the timeouts. The last test also shows that if we disable WBT, the setup doesn't seem to crash anymore, although the timeouts are still present - it "just" slows the workload for a time, but ultimately recovers.

At this point I don't have any evidence that XIV causes this problem.

Revision history for this message
bugproxy (bugproxy) wrote :
Download full text (6.0 KiB)

------- Comment From <email address hidden> 2020-06-17 06:59 EDT-------
O.K. we have some new insights here.

@<email address hidden> did some experiments on my behalf with a slightly modified Ubuntu kernel (based on 5.4.0-29) from which I removed commit 3060781f2664 ("s390/qdio: allow to scan all Output SBALs in one go") (https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=3060781f2664d34af641247aeac62696405a3fde). We had a suspicion that this might be related to the queue stalls/slowdowns we always saw in the past before the crash in the WBT code. And to my slight surprise, the queue stalls/slowdowns did disappear, yet the WBT crash still persisted. I checked all available logs and our driver traces from the dump and didn't find any indication whatsoever that SCSI EH was ever invoked, nor that we went through adapter recovery any time after the initial instance boot - no command timeouts or anything.

So in my mind, while I can't prove yet that 3060781f2664 was really responsible for the queue stalls/slowdowns - that might still just be coincidence (although it *did* happen quite persistently before, and now not once, so that is rather suspicious to me) - it shows that the crash in the WBT code is independent. So it seems to be something that can happen without any transport interruptions.

Here is the backtrace from that particular run, where no queue-stalls/-slowdowns were seen, but WBT still crashed:

[22808.815235] Unable to handle kernel pointer dereference in virtual kernel address space
[22808.815247] Failing address: 00007fe010a50000 TEID: 00007fe010a50403
[22808.815249] Fault in home space mode while using kernel ASCE.
[22808.815252] AS:000003dc9b67c00b R2:000003fd0000800b R3:000003fd0000c007 S:000003fba9b84800 P:0000000000000400
[22808.815368] Oops: 0011 ilc:2 [#1] SMP
[22808.815376] Modules linked in: xfs vhost_net vhost macvtap macvlan tap xt_CHECKSUM xt_MASQUERADE xt_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp ip6table_mangle ip6table_nat iptable_mangle iptable_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nf_tables nfnetlink ip6table_filter ip6_tables iptable_filter bpfilter bridge dm_service_time aufs overlay dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua s390_trng chsc_sch eadm_sch vfio_ccw vfio_mdev mdev vfio_iommu_type1 vfio 8021q garp mrp stp llc sch_fq_codel drm drm_panel_orientation_quirks i2c_core ip_tables x_tables btrfs zstd_compress zlib_deflate raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 linear dm_mirror dm_region_hash dm_log qeth_l2 pkey zcrypt crc32_vx_s390 ghash_s390 prng aes_s390 des_s390 libdes sha3_512_s390 sha3_256_s390 sha512_s390 sha256_s390 sha1_s390 sha_common zfcp scsi_transport_fc dasd_eckd_mod dasd_mod qeth qdio ccwgroup
[22808.815519] CPU: 14 PID: 185372 Comm: CPU 0/KVM Kdump: loaded Not tainted 5.4.0-2901-generic #01
[22808.815521] Hardware name: IBM 8561 T01 708 (LPAR)
[22808.815523] Krnl PSW : 0404e00180000000 000003dc9a6dd9be (try_to_wake_up+0x4e/0x700)
[22808.815535] R:0 T:1 IO:0 EX:0 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI:0 EA:3
[22808.815602] Krnl GPRS: 000003fbd9ab7588 00007fe0...

Read more...

Corbin (cpwest11)
information type: Public → Public Security
Revision history for this message
Frank Heimes (fheimes) wrote :

@Corbin, may I ask for the rationale why you changed this from Public → Public Security ?

Frank Heimes (fheimes)
information type: Public Security → Public
Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2020-07-09 06:00 EDT-------
IBM: We can't possibly generally recommend turning WBT off.
What is Canonicals position, to solve this situation?
Many thanks in advance

Revision history for this message
Frank Heimes (fheimes) wrote :

Well, we do drive our storage sub-system from time to time to the limits - especially if we do parallel LPAR deployments for OpenStack environments.
But that's on a z13 and a DS8k - and so far we never saw such issues in this environment.

Further investigation in Launchpad did not turn up references to similar reports with SCSI / wbt (or wbt in general) on focal.

However, I found that there were wbt (respectively blk-wbt) issues in the past with kernels > 4.10 and < 4.19 that partially led to CPU hard lockups on heavy writes (largely reported on NVMe drives).
But those bugs were only reported on bionic (and cosmic) - which fits the kernel range above - and were fixed quite some time ago.
The bionic (and cosmic) kernels were patched via backports of:
2887e41b910b - "blk-wbt: Avoid lock contention and thundering herd issue in wbt_wait"
38cfb5a45ee0 - "blk-wbt: improve waking of tasks"
I just double-checked that the fixes from those tickets are (still) in, and they are.

Having heard about this problem only in this bug here, I agree that recommending to turn WBT off in general would not be good - even if one prefers stability over performance.
(I still suspect that it could be XIV-related, rather than an issue in the general block or SCSI layer...)

However, for now we may add a statement to the s390x section of the release notes pointing to WBT and the udev rule for disabling it on all block devices, in case one hits such issues under high disk I/O stress.

Revision history for this message
Frank Heimes (fheimes) wrote :

I'm wondering if it would make sense (on top of comment #7: https://bugs.launchpad.net/ubuntu-z-systems/+bug/1881109/comments/7) to test this again with an updated kernel (that got patched to fix DIF and DIX), where the fix virtually updates the scsi/zfcp driver to the kernel 5.8 level?

I'm having this bug in mind:
LP 1887124 "[UBUNTU 20.04] DIF and DIX support in zfcp (s390x) is broken and the kernel crashes unconditionally"
https://bugs.launchpad.net/ubuntu-z-systems/+bug/1887124

A patched kernel to test is referenced here:
https://bugs.launchpad.net/ubuntu-z-systems/+bug/1887124/comments/1

Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2020-09-16 09:02 EDT-------
@CAN: Has the patch mentioned in LP comment #8, which was used to build the kernel referenced above, made it into the regular Ubuntu kernel?

Many thx in advance

Revision history for this message
Frank Heimes (fheimes) wrote :

Yes, the kernels that the significant set (of about 30) zFCP-related patches were applied to have
already landed in focal (-updates) and in the groovy kernel, respectively (indicated by the Fix Released status at LP 1887124 - https://bugs.launchpad.net/ubuntu-z-systems/+bug/1887124)

Revision history for this message
Frank Heimes (fheimes) wrote :

In other words, it's reasonable to retry on the latest Ubuntu 20.04 kernel
(after `sudo apt update && sudo apt full-upgrade` and a reboot)
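For completeness, a hedged way to check whether an installed kernel is at or above a given fix level; the version strings below are illustrative, and on a live system one would feed in `uname -r`:

```shell
# version_ge A B: true if version A >= version B under version sort
version_ge() {
    [ "$(printf '%s\n' "$2" "$1" | sort -V | head -n1)" = "$2" ]
}

# Example: is 5.4.0-58 at least the 5.4.0-29 level this bug was seen on?
if version_ge "5.4.0-58" "5.4.0-29"; then
    echo "kernel at or above required level"
fi
```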

Frank Heimes (fheimes)
Changed in ubuntu-z-systems:
status: New → Incomplete
Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2020-12-21 10:42 EDT-------
We have re-run our test suite that originally reproduced this bug consistently, and this time we were unable to reproduce it. It looks like this issue has been fixed in the latest kernel.

```
bender@zt93ke:~$ uname -a
Linux zt93ke 5.4.0-58-generic #64-Ubuntu SMP Wed Dec 9 08:13:08 UTC 2020 s390x s390x s390x GNU/Linux
bender@zt93ke:~$ cat /etc/os-release
NAME="Ubuntu"
VERSION="20.04.1 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.1 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal
```

@Heinz-Werner Seeck good to close +1

Revision history for this message
Frank Heimes (fheimes) wrote :

Hello Max, glad to read that.
That's what I hoped, after the significant patch set of LP 1887124 landed.
I'm closing this bug on our side, too.
Thx

Changed in linux (Ubuntu):
status: New → Fix Released
Changed in ubuntu-z-systems:
status: Incomplete → Fix Released
Revision history for this message
bugproxy (bugproxy) wrote :

------- Comment From <email address hidden> 2021-01-13 09:51 EDT-------
IBM Bugzilla -> closed, Fix Released
