xenial: virtio-scsi: CPU soft lockup due to loop in virtscsi_target_destroy()
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
linux (Ubuntu) | Triaged | Medium | Unassigned |
Xenial | Fix Released | Medium | Unassigned |
Bug Description
[Impact]
* Detaching a virtio-scsi disk in a Xenial guest can cause a
CPU soft lockup in the guest (and consume 100% CPU in the host).
* It may prevent further progress on other tasks that
depend on resources locked earlier in the SCSI target
removal stack, and/or impact other SCSI functionality.
* The fix resolves a corner case in the requests counter
in the virtio SCSI target, which impacts a downstream
(SAUCE) patch in the virtio-scsi target removal handler
that depends on the requests counter value being zero.
[Test Case]
* See comment #3 of LP #1798110 (this bug) (too long for
this section -- synthetic test case with GDB+QEMU) and
comment #4 (organic test case in a cloud instance).
[Regression Potential]
* It seems low -- this change only affects the SCSI command
requeue path with regard to the reference counter, which only
has a real chance of causing problems in combination with our
downstream patch (which now passes this test case).
* The other, less serious, issue would be decrementing the
counter to a negative (< 0) value, which is not possible with
this driver's logic (see the commit message), because the reqs
counter is always incremented before the call into virtscsi_
where this decrement operation is inserted.
[Original Description]
A customer reported a CPU soft lockup on the Trusty HWE kernel from Xenial
when detaching a virtio-scsi drive, and provided a crashdump that shows
2 things:
1) The soft-locked-up CPU is waiting for another CPU to finish something,
and that never happens because the other CPU is infinitely looping in
virtscsi_target_destroy().
2) The loop happens because the 'tgt->reqs' counter is non-zero, which
probably happened due to a missing decrement in the SCSI command requeue
path, exercised when the virtio ring is full.
The reported problem itself happens because of a downstream/SAUCE patch,
coupled with the problem of the missing decrement for the reqs counter.
Introducing a decrement in the SCSI command requeue path resolves the
problem, verified both synthetically with QEMU+GDB and with the test
case/loop provided by the customer as a reproducer.
description: updated
description: updated
Changed in linux (Ubuntu):
  importance: Undecided → Medium
  status: Confirmed → Triaged
Changed in linux (Ubuntu Xenial):
  status: New → Triaged
  importance: Undecided → Medium
Changed in linux (Ubuntu Xenial):
  status: Triaged → Fix Committed
tags: added: cscc
Problem Analysis
================
The dmesg log 'crash/201809061748/dmesg.201809061748' shows the CPU soft lockup occurs 25 seconds after the 'sdb' virtio-scsi drive is removed.
This seems to indicate the events are related (there is usually an extra 2s-3s between an event and the report of being stuck for 22s or 23s, for some reason).
[ 3002.697474] sd 0:0:2:0: [sdb] Synchronizing SCSI cache
[ 3002.697545] sd 0:0:2:0: [sdb] Synchronize Cache(10) failed: Result: hostbyte=DID_BAD_TARGET driverbyte=DRIVER_OK
[ 3028.294602] NMI watchdog: BUG: soft lockup - CPU#1 stuck for 23s! [id:2887]
CPU 1 is waiting on another CPU (to finish something)
-----------------------------------------------------
The locked-up 'id' process is in the page fault handling stack (which is OK/normal), calling a function on many (SMP) CPUs (see smp_call_function_many() at the top of the stack), and is specifically in the call to one CPU (see smp_call_function_single+0xae/0x110 in the RIP register).
[ 3028.301755] CPU: 1 PID: 2887 Comm: id Not tainted 4.4.0-133-generic #159~14.04.1-Ubuntu
...
[ 3028.301760] RIP: 0010:[<ffffffff81102b1e>]  [<ffffffff81102b1e>] smp_call_function_single+0xae/0x110
...
[ 3028.301797] Call Trace:
[ 3028.301803] [<ffffffff81071f60>] ? do_kernel_range_flush+0x40/0x40
[ 3028.301805] [<ffffffff81102e9e>] smp_call_function_many+0x22e/0x270
[ 3028.301808] [<ffffffff810723f8>] native_flush_tlb_others+0x48/0x120
[ 3028.301810] [<ffffffff8107256d>] flush_tlb_mm_range+0x9d/0x180
[ 3028.301815] [<ffffffff811cb1e3>] ptep_clear_flush+0x53/0x60
[ 3028.301819] [<ffffffff811b75ed>] wp_page_copy.isra.58+0x29d/0x530
[ 3028.301822] [<ffffffff811b955d>] do_wp_page+0x8d/0x590
[ 3028.301824] [<ffffffff811bb826>] handle_mm_fault+0xd86/0x1ac0
[ 3028.301829] [<ffffffff81192c53>] ? free_pages+0x13/0x20
[ 3028.301835] [<ffffffff810aa754>] ? finish_task_switch+0x244/0x2a0
[ 3028.301840] [<ffffffff8106b0fb>] __do_page_fault+0x19b/0x430
[ 3028.301843] [<ffffffff8106b3b2>] do_page_fault+0x22/0x30
[ 3028.301847] [<ffffffff81822928>] page_fault+0x28/0x30
The smp_call_function_many() address in the stack trace (smp_call_function_many+0x22e / ffffffff81102e9e) reflects that it has called smp_call_function_single():
ffffffff81102c70 <smp_call_function_many>:
...
ffffffff81102e99: e8 d2 fb ff ff    callq ffffffff81102a70 <smp_call_function_single>
ffffffff81102e9e: 48 83 c4 10       add   $0x10,%rsp
... which per the address in the RIP register (smp_call_function_single+0xae / ffffffff81102b1e)
is in the (inlined) call to csd_lock_wait() after generic_exec_single().
csd_lock_wait() spins on the value of csd->flags with cpu_relax() / the 'pause' instruction,
waiting for it to be unlocked (i.e., to not have the CSD_FLAG_LOCK flag / 0x1 value in the
flags field / offset 0x18):
ffffffff81102a70 <smp_call_function_single>:
...
ffffffff81102b0f: e8 0c fe ff ff    callq ffffffff81102920 <generic_exec_single>
ffffffff81102b14: 8b 55 e8          mov   -0x18(%rbp),%edx
ffffffff81102b17: 83 e2 01          and   $0x1,%edx
ffffffff81102b1a: 74 de             je    ffffffff81102afa <smp_call_function_singl...>