DS8870 path fails to restore during DS8K EI (error injection) test on z15 LPAR

Bug #1852290 reported by bugproxy
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
Ubuntu on IBM z Systems | Invalid | Medium | Skipper Bug Screeners |
multipath-tools (Ubuntu) | Invalid | Medium | Canonical Server |

Bug Description

A path in a zFCP/SCSI multipath environment did not recover while running a DS8870 EI (error injection) test on 18.04 with kernel 4.15 on a z15 LPAR.

bugproxy (bugproxy) wrote : system log

Default Comment by Bridge

tags: added: architecture-s39064 bugnameltc-181838 severity-medium targetmilestone-inin---
bugproxy (bugproxy) wrote : DS8K EI running log

Default Comment by Bridge

bugproxy (bugproxy) wrote : server check log

Default Comment by Bridge

bugproxy (bugproxy) wrote : dbginfo

Default Comment by Bridge

bugproxy (bugproxy) wrote : sosreport

Default Comment by Bridge

Changed in ubuntu:
assignee: nobody → Skipper Bug Screeners (skipper-screen-team)
affects: ubuntu → linux (Ubuntu)
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2019-11-12 09:53 EDT-------
Path did not recover while running DS8K EI

---uname output---
machine: 4.15.0-65-generic #74-Ubuntu SMP Tue Sep 17 17:04:42 UTC 2019 s390x s390x s390x GNU/Linux

Machine Type = z15 lpar

---Debugger---
A debugger is not configured

---Steps to Reproduce---
1. There are 4 paths from the DS8870
2. Run DS8870 EI on the storage side:
DS8K_ClusterWS
DS8K_ClusterFLA
which can cause a path to fail and come back
Expected:
the path comes back in time
Result:
Sometimes the path does not come back

Stack trace output:
no

Oops output:
no

System Dump Info:
The system is not configured to capture a system dump.

From the system log, the error occurred during:
start:
ilzlnx3 root: ILAB_CYCLE34_DS8K_CLUSTERFOFB_LOOP_3_NODE1_CYCLE34_1_20191021214951

end:
ilzlnx3 systemd[1]: Started Session 259 of user root.
The failed disk is
32G_d2_ilsd2107t (36005076303ffc042000000000000012e) dm-26 IBM,2107900
size=13G features='3 queue_if_no_path queue_mode mq' hwhandler='0' wp=rw
`-+- policy='service-time 0' prio=50 status=active
|- 3:0:7:1076772865 sdaw 67:0 active ready running
|- #:#:#:# sdf 8:80 active faulty running
|- 0:0:0:1076772865 sdbj 67:208 active ready running
`- 0:0:2:1076772865 sdbn 68:16 active ready running

Meanwhile, there are some errors related to the path 'sdf':
ilzlnx3 kernel: [212581.399001] sd 3:0:1:1076772865: [sdf] tag#528 Done: ADD_TO_MLQUEUE Result: hostbyte=DID_OK driverbyte=DRIVER_OK
ilzlnx3 kernel: [212581.399002] sd 3:0:1:1076772865: [sdf] tag#528 CDB: Write(10) 2a 00 01 0d 88 40 00 08 00 00
ilzlnx3 kernel: [212581.399003] sd 3:0:1:1076772865: [sdf] tag#528 Sense Key : Unit Attention [current]
ilzlnx3 kernel: [212581.399004] sd 3:0:1:1076772865: [sdf] tag#528 Add. Sense: Power on, reset, or bus device reset occurred
ilzlnx3 kernel: [212581.399097] sd 1:0:11:1078870017: Power-on or dev

Internal validation:
Hello. We found some errors in your setup. For every fc_host, the dev_loss_tmo parameter should be disabled. Please disable it for all your fc_hosts and rerun the EI job.

For more recommendations regarding zFCP setup, please refer to our zFCP Best Practices:
http://public.dhe.ibm.com/software/dw/linux390/lvc/zFCP_Best_Practices-BB-Webcast_201805.pdf?cm_sp=dw-dwtv-_-linuxonz-_-presentation-PDF
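
Before rerunning the EI job, the current setting can be listed per host. The following is a minimal sketch, not part of the original report. It assumes the FC transport class exposes the value at /sys/class/fc_host/host*/dev_loss_tmo; that path and the helper name read_dev_loss_tmo are assumptions for illustration. It only reports the raw values, because this report does not state which value counts as "disabled".

```python
from pathlib import Path


def read_dev_loss_tmo(sysfs_root="/sys/class/fc_host"):
    """Return {host_name: raw dev_loss_tmo string} for each host directory.

    The sysfs location is an assumption; pass a different root if the
    attribute lives elsewhere on the system under test.
    """
    values = {}
    for attr in sorted(Path(sysfs_root).glob("host*/dev_loss_tmo")):
        # attr.parent.name is the host directory name, e.g. "host0"
        values[attr.parent.name] = attr.read_text().strip()
    return values


if __name__ == "__main__":
    # Prints nothing if no matching attribute files are found.
    for host, value in read_dev_loss_tmo().items():
        print(f"{host}: dev_loss_tmo={value}")
```

Running this before and after changing the setting gives a quick record of what each fc_host was using during a given EI cycle.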

Hello. We have analyzed the data you provided.

In the hostcheck output, the 'multipath -ll' command shows the sdf device like this in its multipath group:
#:#:#:# sdf 8:80 active faulty running

But in the dbginfo output, the "multipathd -k'show topo'" command shows the sdf device like this:
3:0:1:1076772865 sdf 8:80 active ready running

Here it is ready, not faulty and there are no '#' signs.
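
The comparison above can be automated when checking many LUNs. The following is a minimal sketch, not part of the original analysis. It parses path lines in the format quoted in this report and flags any path whose checker state is not "ready" or whose H:C:T:L field contains '#'. The names PathEntry, parse_paths and suspicious_paths are hypothetical, and treating every non-"ready" state as suspicious is a simplifying assumption.

```python
import re
from typing import NamedTuple


class PathEntry(NamedTuple):
    hctl: str           # host:channel:target:lun, or "#:#:#:#" when unknown
    dev: str            # block device name, e.g. "sdf"
    major_minor: str    # e.g. "8:80"
    dm_state: str       # e.g. "active"
    checker_state: str  # e.g. "ready" or "faulty"
    online_state: str   # e.g. "running"


# Matches path lines such as "|- 3:0:7:1076772865 sdaw 67:0 active ready running".
# Path group lines ("`-+- policy=...") do not match because of the "+".
PATH_LINE = re.compile(
    r"^[\s|`]*-\s+(?P<hctl>\S+)\s+(?P<dev>\S+)\s+(?P<mm>\d+:\d+)\s+"
    r"(?P<dm>\S+)\s+(?P<chk>\S+)\s+(?P<online>\S+)\s*$"
)


def parse_paths(text):
    """Extract one PathEntry per path line found in multipath topology text."""
    entries = []
    for line in text.splitlines():
        m = PATH_LINE.match(line)
        if m:
            entries.append(PathEntry(m["hctl"], m["dev"], m["mm"],
                                     m["dm"], m["chk"], m["online"]))
    return entries


def suspicious_paths(entries):
    """Return paths that are not 'ready' or have an unresolved H:C:T:L."""
    return [p for p in entries
            if p.checker_state != "ready" or "#" in p.hctl]


if __name__ == "__main__":
    # Path lines copied from this report.
    sample = (
        "|- 3:0:7:1076772865 sdaw 67:0 active ready running\n"
        "|- #:#:#:# sdf 8:80 active faulty running\n"
    )
    for p in suspicious_paths(parse_paths(sample)):
        print(p.dev, p.hctl, p.checker_state)
```

Feeding both the hostcheck and the dbginfo output through the same function makes the discrepancy visible: the hostcheck capture flags sdf, while the dbginfo line for sdf is not flagged.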

We also checked another LUN for the same host and target port, and it looks good in both outputs:
3:0:1:1076707329 sdd 8:48 active ready running

In syslog we did not see any problems with the sdf device or the 32G_d2_ilsd2107t multipath group during the time your Error Injection test was running.

Hi,
I did not see any I/O errors during EI. The only problem is a DS8K path showing a 'fail' state in "multipath -ll".

Hello. Based on i...


Frank Heimes (fheimes)
affects: linux (Ubuntu) → multipath-tools (Ubuntu)
Frank Heimes (fheimes)
Changed in ubuntu-z-systems:
importance: Undecided → Medium
assignee: nobody → Canonical Server Team (canonical-server)
Frank Heimes (fheimes) wrote : Re: ILAB_TUC Z15 guest ilzlnx3 DS8870 path fail to restore during DS8K EI

Moving an excerpt of comment #6 to the bug description...

description: updated
Frank Heimes (fheimes)
summary: - ILAB_TUC Z15 guest ilzlnx3 DS8870 path fail to restore during DS8K EI
+ DS8870 path fails to restore during DS8K EI (error injection) test on
+ z15 LPAR
Christian Ehrhardt  (paelzer) wrote :

Hi,
I was made aware of this today. After quickly going over it, I think the next step is to ask for a retest with the newest versions.

That means the following retests would be interesting:
- use linux-hwe-edge to get a new 5.3 kernel into the mix
- upgrade to Ubuntu 19.10 to retest with the newer multipath-tools 0.7.9
- (future) we plan to bring multipath-tools 0.8.x into Ubuntu 20.04

We'd ping you once the merge of 0.8.x is available in 20.04. Then you'd need to prep the test env only once and could do the testing for all three cases. That would help to identify if this is a patch that needs backporting (and if it is in userspace or kernel) or if this really needs new code even on the latest SW level.

Frank Heimes (fheimes)
Changed in multipath-tools (Ubuntu):
assignee: Skipper Bug Screeners (skipper-screen-team) → Canonical Server Team (canonical-server)
Changed in ubuntu-z-systems:
assignee: Canonical Server Team (canonical-server) → Skipper Bug Screeners (skipper-screen-team)
status: New → Triaged
Changed in multipath-tools (Ubuntu):
status: New → Triaged
importance: Undecided → Medium
Dimitri John Ledkov (xnox) wrote :

0.8.3 shipped in focal, did it improve anything for this issue?

Changed in ubuntu-z-systems:
status: Triaged → Incomplete
Changed in multipath-tools (Ubuntu):
status: Triaged → Incomplete
bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2021-03-08 06:50 EDT-------
IBM Bugzilla status -> closed. The problem currently cannot be reproduced.
If it shows up again, a new ticket will be created.

Frank Heimes (fheimes)
Changed in multipath-tools (Ubuntu):
status: Incomplete → Invalid
Changed in ubuntu-z-systems:
status: Incomplete → Invalid