(I'll add the SRU template + testing steps and post to ML shortly.)
A customer reported a CPU soft lockup on Trusty HWE kernel from Xenial
when detaching a virtio-scsi drive, and provided a crashdump that shows
2 things:
1) The soft locked up CPU is waiting for another CPU to finish something,
and that does not happen because the other CPU is infinitely looping in
virtscsi_target_destroy().
2) The loop happens because the 'tgt->reqs' counter is non-zero, and that
probably happened due to a missing decrement in SCSI command requeue path,
exercised when the virtio ring is full.
The reported problem itself happens because of a downstream/SAUCE patch,
coupled with the problem of the missing decrement for the reqs counter.
Introducing a decrement in the SCSI command requeue path resolves the
problem, verified synthetically with QEMU+GDB and with test-case/loop
provided by the customer as problem reproducer.
(I'll add the SRU template + testing steps and post to ML shortly.)
A customer reported a CPU soft lockup on Trusty HWE kernel from Xenial
when detaching a virtio-scsi drive, and provided a crashdump that shows
2 things:
1) The soft locked up CPU is waiting for another CPU to finish something, target_ destroy( ).
and that does not happen because the other CPU is infinitely looping in
virtscsi_
2) The loop happens because the 'tgt->reqs' counter is non-zero, and that
probably happened due to a missing decrement in SCSI command requeue path,
exercised when the virtio ring is full.
The reported problem itself happens because of a downstream/SAUCE patch,
coupled with the problem of the missing decrement for the reqs counter.
Introducing a decrement in the SCSI command requeue path resolves the
problem, verified synthetically with QEMU+GDB and with test-case/loop
provided by the customer as problem reproducer.