2020-04-08 18:35:43 |
Matt Coleman |
bug |
|
|
added bug |
2020-04-08 18:35:43 |
Matt Coleman |
attachment added |
|
output of `uname -a` https://bugs.launchpad.net/bugs/1871688/+attachment/5349843/+files/uname-a.log |
|
2020-04-08 18:36:36 |
Matt Coleman |
attachment added |
|
contents of /proc/version_signature https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1871688/+attachment/5349844/+files/version.log |
|
2020-04-08 18:37:09 |
Matt Coleman |
attachment added |
|
dmesg output shortly after booting https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1871688/+attachment/5349845/+files/dmesg.log |
|
2020-04-08 18:37:40 |
Matt Coleman |
attachment added |
|
output of `lspci -vvnn` https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1871688/+attachment/5349846/+files/lspci-vvnn.log |
|
2020-04-08 18:43:39 |
Matt Coleman |
tags |
iscsi lio target xenial |
bionic focal iscsi lio target xenial |
|
2020-04-08 18:44:06 |
Matt Coleman |
description |
The target subsystem (LIO) can hang if multiple threads try to destroy iSCSI sessions simultaneously. This is reproducible on systems that have multiple targets with initiators regularly connecting/disconnecting.
This may happen when a "targetcli iscsi/iqn.../tpg1 disable" command is executed while a logout operation is underway.
The iSCSI target doesn't handle such events correctly: two or more threads may end up sleeping while waiting for the driver to close the remaining connections on the session. When the connections are closed, the driver wakes up only the first thread, which then proceeds to destroy the session structure. The remaining threads stay blocked forever, waiting on a completion synchronization mechanism that no longer exists in memory because the first thread has already freed it.
Note that if the blocked threads are somehow forced to wake up, they will try to free the same iSCSI session structure that the first thread already destroyed, causing double frees, memory corruption, etc.
The driver has been reorganized so that concurrent threads set a flag in the session structure to notify the driver that the session should be destroyed, and then wait for the driver to close the remaining connections. When all the connections are closed, the driver wakes up all the threads and waits for the refcount of the iSCSI session structure to reach zero. When the last thread wakes up, the refcount drops to zero and the driver can destroy the session structure, because nothing references it anymore.
I've witnessed this happening on hundreds of Ubuntu 16.04.5 systems. It is a regression, because this did not occur several years ago. Unfortunately, I don't have detailed records from that far back to determine exactly which kernel I was running that was not affected by this bug (I believe it was either 4.8.x or 4.10.x).
I've attached the requested uname, version_signature, dmesg, and lspci from my system. However, I've seen this happen on a wide array of hardware: 2 to 24 cores, 8GB to 256GB RAM, both AMD and Intel CPUs, onboard storage and PCIe SAS cards, etc.
This has been fixed in the upstream master branch, but it hasn't yet been backported to "-stable".
To fix this in the Ubuntu kernel, these three commits should be backported:
* https://github.com/torvalds/linux/commit/e49a7d994379278d3353d7ffc7994672752fb0ad#diff-b7557d7ed3ba34645f6e9d510f281d3a
* https://github.com/torvalds/linux/commit/57c46e9f33da530a2485fa01aa27b6d18c28c796#diff-b7557d7ed3ba34645f6e9d510f281d3a
* https://github.com/torvalds/linux/commit/626bac73371eed79e2afa2966de393da96cf925e#diff-b7557d7ed3ba34645f6e9d510f281d3a |
The target subsystem (LIO) can hang if multiple threads try to destroy iSCSI sessions simultaneously. This is reproducible on systems that have multiple targets with initiators regularly connecting/disconnecting.
This may happen when a "targetcli iscsi/iqn.../tpg1 disable" command is executed while a logout operation is underway.
The iSCSI target doesn't handle such events correctly: two or more threads may end up sleeping while waiting for the driver to close the remaining connections on the session. When the connections are closed, the driver wakes up only the first thread, which then proceeds to destroy the session structure. The remaining threads stay blocked forever, waiting on a completion synchronization mechanism that no longer exists in memory because the first thread has already freed it.
Note that if the blocked threads are somehow forced to wake up, they will try to free the same iSCSI session structure that the first thread already destroyed, causing double frees, memory corruption, etc.
The driver has been reorganized so that concurrent threads set a flag in the session structure to notify the driver that the session should be destroyed, and then wait for the driver to close the remaining connections. When all the connections are closed, the driver wakes up all the threads and waits for the refcount of the iSCSI session structure to reach zero. When the last thread wakes up, the refcount drops to zero and the driver can destroy the session structure, because nothing references it anymore.
I've witnessed this happening on hundreds of Ubuntu 16.04.5 systems. It is a regression, because this did not occur several years ago. Unfortunately, I don't have detailed records from that far back to determine exactly which kernel I was running that was not affected by this bug (I believe it was either 4.8.x or 4.10.x).
I've attached the requested uname, version_signature, dmesg, and lspci from my system. However, I've seen this happen on a wide array of hardware: 2 to 24 cores, 8GB to 256GB RAM, both AMD and Intel CPUs, onboard storage and PCIe SAS cards, etc.
This has been fixed in the upstream master branch, but it hasn't yet been backported to "-stable".
To fix this in the Ubuntu kernel, these three commits should be backported:
* https://github.com/torvalds/linux/commit/e49a7d994379278d3353d7ffc7994672752fb0ad
* https://github.com/torvalds/linux/commit/57c46e9f33da530a2485fa01aa27b6d18c28c796
* https://github.com/torvalds/linux/commit/626bac73371eed79e2afa2966de393da96cf925e
I'd like these commits to be added to the xenial, bionic, and focal kernels. |
|
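The reorganized teardown described in the bug description (flag, wake all waiters, free on last reference) can be modeled in userspace. This is a minimal sketch that uses POSIX threads in place of the kernel's completion and refcount primitives. `struct session`, `request_close()`, `driver_thread()` and `run_demo()` are hypothetical names and do not correspond to the code in the three upstream commits. Each closing thread only sets a flag and holds a reference. The driver wakes every waiter with `pthread_cond_broadcast()` instead of waking a single one. Whichever thread drops the last reference is the only one that frees the session.

```c
/* Sketch only: a userspace model of "flag + wake all + refcount" teardown.
 * All names are hypothetical; this is not the LIO/iscsi_target code. */
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

struct session {
    pthread_mutex_t lock;
    pthread_cond_t close_req;     /* signalled when close_requested is set */
    pthread_cond_t conns_closed;  /* broadcast when the last connection closes */
    int nr_conns;                 /* connections still open on the session */
    int refcount;                 /* one reference per thread using the session */
    bool close_requested;         /* "this session should be destroyed" flag */
};

static int frees;                 /* how many times a session was freed */
static pthread_mutex_t frees_lock = PTHREAD_MUTEX_INITIALIZER;

/* Drop one reference; only the thread that drops the last one frees. */
static void session_put(struct session *s)
{
    pthread_mutex_lock(&s->lock);
    bool last = (--s->refcount == 0);
    pthread_mutex_unlock(&s->lock);
    if (!last)
        return;
    pthread_mutex_lock(&frees_lock);
    frees++;
    pthread_mutex_unlock(&frees_lock);
    pthread_cond_destroy(&s->close_req);
    pthread_cond_destroy(&s->conns_closed);
    pthread_mutex_destroy(&s->lock);
    free(s);
}

/* Run concurrently by every thread that wants the session gone
 * (e.g. a logout path racing with a "tpg disable" path). */
static void *request_close(void *arg)
{
    struct session *s = arg;
    pthread_mutex_lock(&s->lock);
    s->close_requested = true;
    pthread_cond_signal(&s->close_req);       /* only the driver waits here */
    while (s->nr_conns > 0)
        pthread_cond_wait(&s->conns_closed, &s->lock);
    pthread_mutex_unlock(&s->lock);
    session_put(s);                           /* never frees unless last */
    return NULL;
}

/* Driver side: once asked, close every connection, then wake ALL waiters. */
static void *driver_thread(void *arg)
{
    struct session *s = arg;
    pthread_mutex_lock(&s->lock);
    while (!s->close_requested)
        pthread_cond_wait(&s->close_req, &s->lock);
    s->nr_conns = 0;                          /* stand-in for closing connections */
    pthread_cond_broadcast(&s->conns_closed); /* wake every waiter, not just one */
    pthread_mutex_unlock(&s->lock);
    session_put(s);
    return NULL;
}

/* Start nthreads closers plus the driver; returns the number of frees. */
int run_demo(int nthreads)
{
    assert(nthreads >= 1);                    /* the driver needs a close request */
    struct session *s = calloc(1, sizeof *s);
    pthread_t driver, *t = calloc((size_t)nthreads, sizeof *t);
    if (!s || !t)
        abort();
    pthread_mutex_init(&s->lock, NULL);
    pthread_cond_init(&s->close_req, NULL);
    pthread_cond_init(&s->conns_closed, NULL);
    s->nr_conns = 3;
    s->refcount = nthreads + 1;               /* one per closer plus the driver */
    frees = 0;
    if (pthread_create(&driver, NULL, driver_thread, s) != 0)
        abort();
    for (int i = 0; i < nthreads; i++)
        if (pthread_create(&t[i], NULL, request_close, s) != 0)
            abort();
    pthread_join(driver, NULL);
    for (int i = 0; i < nthreads; i++)
        pthread_join(t[i], NULL);
    free(t);
    return frees;
}
```

In this sketch, calling `run_demo(4)` from `main` (linked with `-pthread`) returns 1. Four closing threads race to tear down one session, but it is freed exactly once, and no thread is left waiting. The buggy design wakes only one waiter and lets it free the session immediately. That leaves the other waiters blocked on freed memory, which is the hang reported here.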
2020-04-08 19:00:20 |
Ubuntu Kernel Bot |
linux (Ubuntu): status |
New |
Confirmed |
|
2020-04-09 12:30:54 |
Kleber Sacilotto de Souza |
nominated for series |
|
Ubuntu Focal |
|
2020-04-09 12:30:54 |
Kleber Sacilotto de Souza |
bug task added |
|
linux (Ubuntu Focal) |
|
2020-04-09 12:30:54 |
Kleber Sacilotto de Souza |
nominated for series |
|
Ubuntu Xenial |
|
2020-04-09 12:30:54 |
Kleber Sacilotto de Souza |
bug task added |
|
linux (Ubuntu Xenial) |
|
2020-04-09 12:30:54 |
Kleber Sacilotto de Souza |
nominated for series |
|
Ubuntu Bionic |
|
2020-04-09 12:30:54 |
Kleber Sacilotto de Souza |
bug task added |
|
linux (Ubuntu Bionic) |
|
2020-04-23 13:16:10 |
Kleber Sacilotto de Souza |
linux (Ubuntu Focal): status |
Confirmed |
In Progress |
|
2020-04-23 13:16:14 |
Kleber Sacilotto de Souza |
linux (Ubuntu Bionic): status |
New |
In Progress |
|
2020-04-24 17:25:22 |
Kelsey Steele |
linux (Ubuntu Bionic): status |
In Progress |
Fix Committed |
|
2020-04-24 17:25:26 |
Kelsey Steele |
linux (Ubuntu Focal): status |
In Progress |
Fix Committed |
|
2020-04-24 17:25:55 |
Kelsey Steele |
linux (Ubuntu Xenial): status |
New |
Won't Fix |
|
2020-04-24 18:20:57 |
Kelsey Steele |
bug task deleted |
linux (Ubuntu Xenial) |
|
|
2020-05-01 01:52:06 |
Ubuntu Kernel Bot |
tags |
bionic focal iscsi lio target xenial |
bionic focal iscsi lio target verification-needed-bionic xenial |
|
2020-05-06 07:12:30 |
Ubuntu Kernel Bot |
tags |
bionic focal iscsi lio target verification-needed-bionic xenial |
bionic focal iscsi lio target verification-needed-bionic verification-needed-focal xenial |
|
2020-05-15 05:13:30 |
Khaled El Mously |
tags |
bionic focal iscsi lio target verification-needed-bionic verification-needed-focal xenial |
bionic focal iscsi lio target verification-done-bionic verification-done-focal xenial |
|
2020-05-18 10:10:47 |
Launchpad Janitor |
linux (Ubuntu Bionic): status |
Fix Committed |
Fix Released |
|
2020-05-18 10:10:47 |
Launchpad Janitor |
cve linked |
|
2020-11494 |
|
2020-05-19 00:17:21 |
Launchpad Janitor |
linux (Ubuntu Focal): status |
Fix Committed |
Fix Released |
|
2020-07-28 00:57:39 |
Launchpad Janitor |
linux (Ubuntu): status |
In Progress |
Fix Released |
|
2020-07-28 00:57:39 |
Launchpad Janitor |
cve linked |
|
2019-16089 |
|
2020-07-28 00:57:39 |
Launchpad Janitor |
cve linked |
|
2019-19642 |
|
2020-07-28 00:57:39 |
Launchpad Janitor |
cve linked |
|
2020-11935 |
|