Comment 3 for bug 1881109

bugproxy (bugproxy) wrote : Comment bridged from LTC Bugzilla

------- Comment From <email address hidden> 2020-05-28 14:37 EDT-------
(In reply to comment #20)
> Hi Benjamin,
> if it's an issue somewhere in scsi-midlayer/block-layer/wbt wouldn't it then
> also happen with zFCP on DS8k and on other platforms?
> So far we did some testing with zFCP on DS8k (the only storage sub-system we
> have) as part of the release testing and server certification and on top we
> have constantly several zFCP systems currently running on 20.04 (probably
> less big systems and/or with less load), but so far we didn't face a single
> crash.
> So I'm assuming more that it is XIV related, no?

Hey Frank,

I suspect this is a follow-on error from SCSI requests running into timeouts and subsequently being aborted, with LUN/Target resets being sent by the SCSI Error Handling code. Those cause abnormal request terminations (it's rather unusual to have request timeouts) that might cause this WBT crash. At least that is my working theory so far.

In parallel to this report, I am looking internally into why the requests time out in the first place. But anyway, I don't think it should crash even with the timeouts. The last test also shows that if we disable WBT the setup doesn't seem to crash anymore, although the timeouts are still present: it "just" slows the workload for a time, but ultimately recovers.
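
For anyone who wants to repeat the "WBT disabled" comparison, here is a minimal sketch of one way to switch writeback throttling off per block device at runtime. It assumes the kernel was built with WBT support and exposes the queue attribute wbt_lat_usec in sysfs, where writing 0 disables the feature. This is not necessarily how the test above disabled it. The helper names and the sysfs_root parameter are made up for illustration.

```python
from pathlib import Path


def wbt_knob(device: str, sysfs_root: str = "/sys") -> Path:
    """Path of the WBT latency attribute for a block device (e.g. 'sda')."""
    return Path(sysfs_root) / "block" / device / "queue" / "wbt_lat_usec"


def read_wbt_latency(device: str, sysfs_root: str = "/sys") -> int:
    """Current WBT target latency in microseconds; 0 means WBT is disabled."""
    return int(wbt_knob(device, sysfs_root).read_text().strip())


def disable_wbt(device: str, sysfs_root: str = "/sys") -> int:
    """Write 0 to wbt_lat_usec and return the previous value.

    Assumption: the attribute exists for this device. Writing to the
    real /sys tree requires root.
    """
    knob = wbt_knob(device, sysfs_root)
    previous = int(knob.read_text().strip())
    knob.write_text("0\n")
    return previous
```

Calling disable_wbt("sda") as root would return the old latency target so it can be restored afterwards; "sda" is only a placeholder for whichever SCSI disk sits behind the zFCP LUN under test.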

At this point I don't have any evidence that XIV causes this problem.