2018-10-16 14:37:19 |
Mauricio Faria de Oliveira |
bug |
|
|
added bug |
2018-10-16 15:00:05 |
Ubuntu Kernel Bot |
linux (Ubuntu): status |
New |
Incomplete |
|
2018-10-16 15:00:06 |
Ubuntu Kernel Bot |
tags |
|
xenial |
|
2018-10-16 15:06:10 |
Mauricio Faria de Oliveira |
linux (Ubuntu): status |
Incomplete |
Confirmed |
|
2018-10-16 15:28:48 |
Mauricio Faria de Oliveira |
description |
(I'll add the SRU template + testing steps and post to ML shortly.)
A customer reported a CPU soft lockup on Trusty HWE kernel from Xenial
when detaching a virtio-scsi drive, and provided a crashdump that shows
2 things:
1) The soft locked up CPU is waiting for another CPU to finish something,
and that does not happen because the other CPU is infinitely looping in
virtscsi_target_destroy().
2) The loop happens because the 'tgt->reqs' counter is non-zero, and that
probably happened due to a missing decrement in SCSI command requeue path,
exercised when the virtio ring is full.
The reported problem itself happens because of a downstream/SAUCE patch,
coupled with the problem of the missing decrement for the reqs counter.
Introducing a decrement in the SCSI command requeue path resolves the
problem, verified synthetically with QEMU+GDB and with test-case/loop
provided by the customer as problem reproducer. |
[Impact]
* Detaching virtio-scsi disk in Xenial guest can cause
CPU soft lockup in guest (and take 100% CPU in host).
* It may prevent further progress on other tasks that
depend on resources locked earlier in the SCSI target
removal stack, and/or impact other SCSI functionality.
* The fix resolves a corner case in the requests counter
in the virtio SCSI target, which impacts a downstream
(SAUCE) patch in the virtio-scsi target removal handler
that depends on the requests counter.
[Test Case]
* See LP #1798110 (this bug)'s comment #3 (too long for
this section -- synthetic case with GDB+QEMU) and
comment #4 (organic test case in cloud instance).
[Regression Potential]
* It seems low -- this only affects the SCSI command requeue
path with regard to the reference counter, which is only
used with real chance of problems in our downstream patch
(which is now passing this testcase).
* The other less serious issue would be decrementing it to
a negative / < 0 value, which is not possible with this
driver logic (see commit message), because the reqs counter
is always incremented before calling virtscsi_queuecommand(),
where this decrement operation is inserted.
[Original Description]
A customer reported a CPU soft lockup on Trusty HWE kernel from Xenial
when detaching a virtio-scsi drive, and provided a crashdump that shows
2 things:
1) The soft locked up CPU is waiting for another CPU to finish something,
and that does not happen because the other CPU is infinitely looping in
virtscsi_target_destroy().
2) The loop happens because the 'tgt->reqs' counter is non-zero, and that
probably happened due to a missing decrement in SCSI command requeue path,
exercised when the virtio ring is full.
The reported problem itself happens because of a downstream/SAUCE patch,
coupled with the problem of the missing decrement for the reqs counter.
Introducing a decrement in the SCSI command requeue path resolves the
problem, verified synthetically with QEMU+GDB and with test-case/loop
provided by the customer as problem reproducer. |
|
2018-10-16 15:37:02 |
Mauricio Faria de Oliveira |
description |
[Impact]
* Detaching virtio-scsi disk in Xenial guest can cause
CPU soft lockup in guest (and take 100% CPU in host).
* It may prevent further progress on other tasks that
depend on resources locked earlier in the SCSI target
removal stack, and/or impact other SCSI functionality.
* The fix resolves a corner case in the requests counter
in the virtio SCSI target, which impacts a downstream
(SAUCE) patch in the virtio-scsi target removal handler
that depends on the requests counter.
[Test Case]
* See LP #1798110 (this bug)'s comment #3 (too long for
this section -- synthetic case with GDB+QEMU) and
comment #4 (organic test case in cloud instance).
[Regression Potential]
* It seems low -- this only affects the SCSI command requeue
path with regard to the reference counter, which is only
used with real chance of problems in our downstream patch
(which is now passing this testcase).
* The other less serious issue would be decrementing it to
a negative / < 0 value, which is not possible with this
driver logic (see commit message), because the reqs counter
is always incremented before calling virtscsi_queuecommand(),
where this decrement operation is inserted.
[Original Description]
A customer reported a CPU soft lockup on Trusty HWE kernel from Xenial
when detaching a virtio-scsi drive, and provided a crashdump that shows
2 things:
1) The soft locked up CPU is waiting for another CPU to finish something,
and that does not happen because the other CPU is infinitely looping in
virtscsi_target_destroy().
2) The loop happens because the 'tgt->reqs' counter is non-zero, and that
probably happened due to a missing decrement in SCSI command requeue path,
exercised when the virtio ring is full.
The reported problem itself happens because of a downstream/SAUCE patch,
coupled with the problem of the missing decrement for the reqs counter.
Introducing a decrement in the SCSI command requeue path resolves the
problem, verified synthetically with QEMU+GDB and with test-case/loop
provided by the customer as problem reproducer. |
[Impact]
* Detaching virtio-scsi disk in Xenial guest can cause
CPU soft lockup in guest (and take 100% CPU in host).
* It may prevent further progress on other tasks that
depend on resources locked earlier in the SCSI target
removal stack, and/or impact other SCSI functionality.
* The fix resolves a corner case in the requests counter
in the virtio SCSI target, which impacts a downstream
(SAUCE) patch in the virtio-scsi target removal handler
that depends on the requests counter value to be zero.
[Test Case]
* See LP #1798110 (this bug)'s comment #3 (too long for
this section -- synthetic case with GDB+QEMU) and
comment #4 (organic test case in cloud instance).
[Regression Potential]
* It seems low -- this only affects the SCSI command requeue
path with regard to the reference counter, which is only
used with real chance of problems in our downstream patch
(which is now passing this testcase).
* The other less serious issue would be decrementing it to
a negative / < 0 value, which is not possible with this
driver logic (see commit message), because the reqs counter
is always incremented before calling virtscsi_queuecommand(),
where this decrement operation is inserted.
[Original Description]
A customer reported a CPU soft lockup on Trusty HWE kernel from Xenial
when detaching a virtio-scsi drive, and provided a crashdump that shows
2 things:
1) The soft locked up CPU is waiting for another CPU to finish something,
and that does not happen because the other CPU is infinitely looping in
virtscsi_target_destroy().
2) The loop happens because the 'tgt->reqs' counter is non-zero, and that
probably happened due to a missing decrement in SCSI command requeue path,
exercised when the virtio ring is full.
The reported problem itself happens because of a downstream/SAUCE patch,
coupled with the problem of the missing decrement for the reqs counter.
Introducing a decrement in the SCSI command requeue path resolves the
problem, verified synthetically with QEMU+GDB and with test-case/loop
provided by the customer as problem reproducer. |
|
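The counter imbalance described above can be illustrated with a small toy model. This is only a sketch and not the driver code: the names Target, queue_command, complete_command, target_destroy and detach_after_one_full_ring are hypothetical stand-ins, the "ring full" condition is passed in as a flag, and the wait in the removal handler is bounded so the example terminates instead of spinning forever.

```python
"""Toy model of the 'reqs' counter imbalance described in this bug.

Nothing here is kernel code. All names are hypothetical stand-ins for
the driver paths mentioned in the description.
"""


class Target:
    def __init__(self):
        # In-flight request counter; models tgt->reqs.
        self.reqs = 0


def queue_command(tgt, ring_full, decrement_on_requeue):
    """Model of the submit path. Returns True if the command was queued."""
    # The counter is incremented before the command is queued.
    tgt.reqs += 1
    if ring_full:
        # Requeue path: the command is handed back to be retried later.
        if decrement_on_requeue:
            # The fix: undo the increment made for this failed attempt.
            tgt.reqs -= 1
        return False
    return True


def complete_command(tgt):
    """Model of command completion, which drops one in-flight request."""
    tgt.reqs -= 1


def target_destroy(tgt, max_polls=1000):
    """Model of the removal handler waiting for the counter to reach zero.

    The poll count is bounded (an assumption of this sketch) so that a
    stuck counter is reported as False rather than looping forever.
    """
    for _ in range(max_polls):
        if tgt.reqs == 0:
            return True
    return False


def detach_after_one_full_ring(decrement_on_requeue):
    """Submit once against a full ring, retry, complete, then detach."""
    tgt = Target()
    # First attempt hits a full ring and is requeued.
    assert not queue_command(tgt, True, decrement_on_requeue)
    # The retry is queued and later completes.
    assert queue_command(tgt, False, decrement_on_requeue)
    complete_command(tgt)
    return tgt.reqs, target_destroy(tgt)


if __name__ == "__main__":
    print("without decrement:", detach_after_one_full_ring(False))
    print("with decrement:   ", detach_after_one_full_ring(True))
```

In this model, one requeued attempt without the decrement leaves the counter at 1 after all commands have completed, so the bounded wait gives up; with the decrement the counter returns to 0 and the removal proceeds.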
2018-10-16 18:37:23 |
Joseph Salisbury |
linux (Ubuntu): importance |
Undecided |
Medium |
|
2018-10-16 18:37:26 |
Joseph Salisbury |
linux (Ubuntu): status |
Confirmed |
Triaged |
|
2018-10-16 18:37:40 |
Joseph Salisbury |
nominated for series |
|
Ubuntu Xenial |
|
2018-10-16 18:37:40 |
Joseph Salisbury |
bug task added |
|
linux (Ubuntu Xenial) |
|
2018-10-16 18:37:45 |
Joseph Salisbury |
linux (Ubuntu Xenial): status |
New |
Triaged |
|
2018-10-16 18:37:48 |
Joseph Salisbury |
linux (Ubuntu Xenial): importance |
Undecided |
Medium |
|
2018-10-24 09:58:40 |
Kleber Sacilotto de Souza |
linux (Ubuntu Xenial): status |
Triaged |
Fix Committed |
|
2018-10-25 08:04:39 |
Brad Figg |
tags |
xenial |
verification-needed-xenial xenial |
|
2018-10-25 14:07:17 |
Mauricio Faria de Oliveira |
tags |
verification-needed-xenial xenial |
verification-done-xenial xenial |
|
2018-10-25 14:41:31 |
David Coronel |
bug |
|
|
added subscriber David Coronel |
2018-11-13 17:53:26 |
Launchpad Janitor |
linux (Ubuntu Xenial): status |
Fix Committed |
Fix Released |
|
2018-11-13 17:53:26 |
Launchpad Janitor |
cve linked |
|
2018-7755 |
|
2019-07-24 21:16:27 |
Brad Figg |
tags |
verification-done-xenial xenial |
cscc verification-done-xenial xenial |
|