Comment 7 for bug 1881109

Frank Heimes (fheimes) wrote :

Well, we do push our storage subsystem to its limits from time to time - especially when we do parallel LPAR deployments for OpenStack environments.
But that's on a z13 and a DS8k - and so far we have never seen such issues in this environment.

Further investigation in Launchpad did not turn up any other reports similar to this one, with SCSI / wbt (or wbt in general) on focal.

However, I found that there were wbt (or rather blk-wbt) issues in the past with kernels > v4.10 and < v4.19 that in some cases led to CPU hard lockups on heavy writes (largely reported on NVMe drives).
But those bugs were only reported on bionic (and cosmic) - which fits the kernel range above - and got fixed quite some time ago.
The bionic (and cosmic) kernels were patched via backports of:
2887e41b910b - "blk-wbt: Avoid lock contention and thundering herd issue in wbt_wait"
38cfb5a45ee0 - "blk-wbt: improve waking of tasks"
I just double-checked that the fixes from those tickets are (still) in, and they are.

Since this bug is the only place where I have heard of this problem, I agree that a general recommendation to turn WBT off would not be good - even if one prefers stability over performance.
(I still suspect that it could be XIV-related, rather than an issue in the general block or SCSI layer...)

However, for now we may add a statement to the s390x section of the release notes that points to WBT and the udev rule for disabling it on block devices, in case one hits such issues under high disk I/O stress.
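
For illustration only, here is a rough sketch of what disabling WBT amounts to at the sysfs level. It is not the udev rule itself. It relies on the kernel's queue/wbt_lat_usec attribute, where writing 0 turns writeback throttling off for a device. The function name disable_wbt, the "sd" name prefix and the sys_block parameter are my own assumptions, chosen so the logic can be tried against a scratch directory first.

```python
from pathlib import Path
from typing import List


def disable_wbt(sys_block: Path = Path("/sys/block"), prefix: str = "sd") -> List[str]:
    """Write 0 to queue/wbt_lat_usec for matching block devices.

    Returns the names of the devices whose setting was actually changed.
    The "sd" prefix (SCSI disks) is an assumption; adjust it as needed.
    """
    changed = []
    for dev in sorted(sys_block.glob(prefix + "*")):
        knob = dev / "queue" / "wbt_lat_usec"
        if not knob.exists():
            # Device is not registered for writeback throttling.
            continue
        if knob.read_text().strip() != "0":
            # 0 disables WBT; the kernel resets to its default on -1.
            knob.write_text("0\n")
            changed.append(dev.name)
    return changed
```

On a real system this needs root, e.g. calling disable_wbt() with the defaults, and the setting does not survive a reboot or a device re-add, which is why a udev rule is the more practical way to apply it persistently.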