iSCSI volume detach does not correctly remove the multipath device descriptors
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| OpenStack Compute (nova) | | Low | Unassigned | |
| Ubuntu Cloud Archive | | Undecided | Unassigned | |
| Icehouse | | Undecided | Billy Olsen | |
| Juno | | Undecided | Unassigned | |
| Kilo | | Undecided | Billy Olsen | |
| nova (Ubuntu) | | Low | Unassigned | |
| Trusty | | Low | Billy Olsen | |
Bug Description
[Impact]
iSCSI volume detach does not correctly remove the multipath device descriptors.
The multipath devices are left on the compute node, and multipath tools will occasionally send IOs to these known multipath devices.
[Test Case]
tested environment:
nova-compute on Ubuntu 14.04.1, iscsi_use_
I created 3 cinder volumes and attached them to a nova instance. Then I detached them one by one. The first 2 volumes detached successfully. The 3rd volume also detached successfully but ended up with failed multipaths.
Here is the terminal log for the last volume detach.
openstack@
+------
|
ID
| Status | Name | Size | Volume Type | Bootable |
Attached to
|
+------
| 56a63288-
None
| false | 5bd68785-
+------
openstack@
Fri Sep 19 21:38:13 JST 2014
360060160cf0036
size=1.0G features='1 queue_if_no_path' hwhandler='1 emc' wp=rw
|-+- policy='round-robin 0' prio=-1 status=active
| |- 4:0:0:42 sdb 8:16 active undef running
| |- 5:0:0:42 sdd 8:48 active undef running
| |- 6:0:0:42 sdf 8:80 active undef running
| `- 7:0:0:42 sdh 8:112 active undef running
`-+- policy='round-robin 0' prio=-1 status=enabled
|- 11:0:0:42 sdp 8:240 active undef running
|- 8:0:0:42 sdj 8:144 active undef running
|- 9:0:0:42 sdl 8:176 active undef running
`- 10:0:0:42 sdn 8:208 active undef running
openstack@
Fri Sep 19 21:38:19 JST 2014
tcp: [10] 172.23.
tcp: [3] 172.23.
tcp: [4] 172.23.
tcp: [5] 172.23.
tcp: [6] 172.23.
tcp: [7] 172.23.
tcp: [8] 172.23.
tcp: [9] 172.23.
openstack@
56a63288-
openstack@
openstack@
+------
|
ID
| Status | Name | Size | Volume Type | Bootable |
Attached to
|
+------
| 56a63288-
None
| false | 5bd68785-
+------
openstack@
openstack@
+------
|
ID
| Status | Name | Size | Volume Type | Bootable | Attached to |
+------
| 56a63288-
None
| false |
+------
|
openstack@
Fri Sep 19 21:39:23 JST 2014
360060160cf0036
size=1.0G features='1 queue_if_no_path' hwhandler='1 emc' wp=rw
|-+- policy='round-robin 0' prio=-1 status=active
| |- #:#:#:# - #:# active undef running
| |- #:#:#:# - #:# active undef running
| |- #:#:#:# - #:# active undef running
| `- #:#:#:# - #:# active undef running
`-+- policy='round-robin 0' prio=-1 status=enabled
|- #:#:#:# - #:# active undef running
|- #:#:#:# - #:# active undef running
|- #:#:#:# - #:# active undef running
`- #:#:#:# - #:# active undef running
openstack@
Fri Sep 19 21:39:27 JST 2014
iscsiadm: No active sessions.
Then I manually removed the multipaths,
openstack@
openstack@
openstack@
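The exact commands used for the manual cleanup are not shown above. As a hedged sketch, one common way to flush a single stale map is `multipath -f <map>`. The helper names below (`build_flush_command`, `flush_stale_map`) are hypothetical and are not part of nova. The default is a dry run that only returns the command.

```python
import subprocess


def build_flush_command(wwid):
    """Build the argv for flushing one multipath map with `multipath -f`."""
    # Reject empty names and anything that could be parsed as an option.
    if not wwid or wwid.startswith("-"):
        raise ValueError("expected a multipath map name or WWID")
    return ["multipath", "-f", wwid]


def flush_stale_map(wwid, dry_run=True):
    """Return the flush command; run it only when dry_run is False."""
    cmd = build_flush_command(wwid)
    if not dry_run:
        # Needs root privileges on the compute node.
        subprocess.run(cmd, check=True)
    return cmd
```

For example, `flush_stale_map("33000000100000001")` returns the command without running it, which makes it easy to review before executing on a compute node.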
I think the problem is in,
virt/libvirt/
def _disconnect_
The end of this method executes the following code to call remove_
return
Therefore, the first two volumes worked fine. However, when it comes to the last device (in this case the 3rd one), this method returns without calling _remove_
if not in_use:
# disconnect if no other multipath devices with same iqn
return
It just disconnects them but does not remove them.
One of the reasons why we have to remove them is,
https:/
IMO, we should call _remove_
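The control flow described above can be sketched as follows. This is a simplified model, not nova's actual code: `disconnect_volume`, `remove_descriptor`, `logout`, and the `fixed` flag are hypothetical names, and the order of the logout and removal steps is an assumption.

```python
def disconnect_volume(detached_iqn, remaining_iqns, remove_descriptor,
                      logout, fixed=True):
    """Model of the detach flow: the early return skips descriptor removal."""
    # The target is "in use" if another attached volume shares its IQN.
    in_use = detached_iqn in remaining_iqns
    if not in_use:
        # Last volume on this target: log out of the iSCSI session.
        logout(detached_iqn)
        if not fixed:
            # Reported behaviour: return here, so the multipath
            # descriptor of the last volume is never removed.
            return
    # Proposed behaviour: always remove the multipath device descriptor.
    remove_descriptor(detached_iqn)
```

With `fixed=False`, detaching the last volume only calls `logout`, which matches the leftover `#:#:#:#` paths in the log. With `fixed=True`, the descriptor removal runs in both cases.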
[Regression Potential]
- Low: the change is relatively minor and allows the code to also remove the device when the last path is removed. Should a regression occur, it should be limited to iscsi-multipath device detachment, which is a small portion of installations.
Related branches
- Ubuntu Server Developers: Pending requested 2016-04-04
-
Diff: 157 lines (+137/-0) 3 files modified
debian/changelog (+8/-0)
debian/patches/fix-iscsi-detach.patch (+128/-0)
debian/patches/series (+1/-0)
| Changed in nova: | |
| assignee: | nobody → Sampath Priyankara (sampath-priyankara) |
| Changed in nova: | |
| status: | New → Confirmed |
| assignee: | Sampath Priyankara (sampath-priyankara) → Rafael David Tinoco (inaddy) |
| Mark Brown (mstevenbrown) wrote : | #1 |
| Rafael David Tinoco (inaddy) wrote : | #2 |
Reading the statement above, it looks like the following commit:
https:/
which was meant to fix this specific bug:
https:/
has a minor logic error: it does not remove the last path of one particular device removed (from cinder).
^^^ to summarize and make things clear.
I will provide a hotfix soon so we can check whether it solves this issue.
Thank you
-Rafael Tinoco
| Rafael David Tinoco (inaddy) wrote : | #3 |
Finally, after some time, I could reproduce this issue using a regular lvm+iscsi backend:
root@ostacktrus
root@ostacktrus
<nothing>
root@ostacktrus
root@ostacktrus
33000000100000001 dm-1 IET,VIRTUAL-DISK
size=1.0G features='0' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=1 status=active
`- 3:0:0:1 sda 8:0 active ready running
root@ostacktrus
root@ostacktrus
33000000100000001 dm-1 ,
size=1.0G features='0' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=0 status=active
`- #:#:#:# - #:# active faulty running
Although the last path was "faulty", it was still there.
I'm having some problems getting the second path activated on the nova-compute node (using the cinder lvm backend + libvirt multipath), but this shows the behavior.
My next step is to review libvirt multipath code (for the addition of the second path) AND the removal issue (reported here).
Thank you very much
Rafael Tinoco
| tags: | added: cts |
| Rafael David Tinoco (inaddy) wrote : | #4 |
Actually the code is returning on:
if not devices:
# disconnect if no other multipath devices
return
So it never gets beyond that (at least for my test case).
For two devices, it passes the statement above...
root@ostacktrus
33000000100000001 dm-1 IET,VIRTUAL-DISK
size=1.0G features='0' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=1 status=active
`- 7:0:0:1 sda 8:0 active ready running
33000000200000001 dm-2 IET,VIRTUAL-DISK
size=1.0G features='0' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=1 status=active
`- 5:0:0:1 sdc 8:32 active ready running
Removing multipath device:
root@ostacktrus
root@ostacktrus
33000000100000001 dm-1 ,
size=1.0G features='0' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=0 status=active
`- #:#:#:# - #:# active faulty running
Removing the last device:
root@ostacktrus
root@ostacktrus
33000000100000001 dm-1 ,
size=1.0G features='0' hwhandler='0' wp=rw
`-+- policy='round-robin 0' prio=0 status=enabled
`- #:#:#:# - #:# failed faulty running
The last path stays failed.
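A leftover map like the one above can be spotted by scanning `multipath -ll` output for maps whose every path is the `#:#:#:#` placeholder. The following is a minimal sketch under assumptions: `find_stale_maps` is a hypothetical helper, and the parser assumes each map header line starts with the WWID, as in the logs in this report (it does not handle `user_friendly_names` aliases).

```python
import re

# A map header line starts with the WWID, e.g. "33000000100000001 dm-1 ,".
_HEADER = re.compile(r"^(?P<wwid>[0-9a-f]{10,})\b")
# A path line has a tree prefix ("|-" or "`-") followed by the SCSI address;
# stale paths show "#:#:#:#" instead of host:channel:target:lun.
_PATH = re.compile(r"[|`]-\s+(?P<hctl>\S+:\S+:\S+:\S+)\s")


def find_stale_maps(multipath_ll_output):
    """Return WWIDs whose every listed path is a '#:#:#:#' placeholder."""
    paths = {}
    current = None
    for line in multipath_ll_output.splitlines():
        header = _HEADER.match(line)
        if header:
            current = header.group("wwid")
            paths[current] = []
            continue
        path = _PATH.search(line)
        if path and current is not None:
            paths[current].append(path.group("hctl"))
    return [wwid for wwid, hctls in paths.items()
            if hctls and all(h == "#:#:#:#" for h in hctls)]
```

Maps with at least one real SCSI address are treated as live, so a map that has only lost some of its paths is not reported.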
| Rafael David Tinoco (inaddy) wrote : | #5 |
Removing the multipath on this statement:
if not devices:
# disconnect if no other multipath devices
return
seems to have solved the problem.
I'm creating a PPA with the nova package plus this hotfix for feedback.
Thank you
Rafael Tinoco
| Rafael David Tinoco (inaddy) wrote : | #6 |
Attaching proposed fix to this bug.
A "hotfixed" version can be found in the following PPA:
https:/
Please test the fix and give me feedback.
PS: Meanwhile I'm also proposing this fix upstream.
| SamP (sampath-priyankara) wrote : | #7 |
Thank you for the patch.
I will include this patch in our testing schedule and let you know the result.
The attachment "trusty_
[This is an automated message performed by a Launchpad user owned by ~brian-murray, for any issue please contact him.]
| tags: | added: patch |
| Launchpad Janitor (janitor) wrote : | #9 |
Status changed to 'Confirmed' because the bug affects multiple users.
| Changed in nova (Ubuntu Trusty): | |
| status: | New → Confirmed |
| Changed in nova (Ubuntu): | |
| status: | New → Confirmed |
| SamP (sampath-priyankara) wrote : | #11 |
Hi Tinoco,
I tested this patch with an EMC VNX 5300 as the iSCSI cinder backend.
I confirmed that this patch solves the issue.
Here is the console log from before applying the patch:
http://
And after applying the patch:
http://
The issue no longer exists.
I would really appreciate it if you could backport this to stable/juno.
Thank you.
| tags: | added: juno-backport-potential |
| Rafael David Tinoco (inaddy) wrote : | #12 |
I have created the following PPA:
https:/
containing a hotfixed version of nova for Trusty.
I'll keep this PPA available, updating when needed, until we get upstream fix accepted and backported. As soon as the fix is backported we can ask for a "SRU" for Trusty's nova version.
I'm temporarily assigning this case to Felipe (who is responsible for proposing the fix upstream) while I'm on vacation.
Let me know if there are any issues.
Thank you.
| Changed in nova: | |
| assignee: | Rafael David Tinoco (inaddy) → nobody |
| Changed in nova: | |
| assignee: | nobody → Felipe Reyes (freyes) |
Fix proposed to branch: master
Review: https:/
| Changed in nova: | |
| status: | Confirmed → In Progress |
| Changed in nova: | |
| importance: | Undecided → Low |
| Brian Murray (brian-murray) wrote : | #14 |
Has this been fixed upstream yet?
| Changed in nova (Ubuntu): | |
| importance: | Undecided → Medium |
| importance: | Medium → Low |
| Changed in nova (Ubuntu Trusty): | |
| importance: | Undecided → Low |
| Changed in nova (Ubuntu): | |
| status: | Confirmed → Triaged |
| Changed in nova (Ubuntu Trusty): | |
| status: | Confirmed → Triaged |
| Changed in nova: | |
| assignee: | Felipe Reyes (freyes) → Jorge Niedbalski (niedbalski) |
| Changed in nova: | |
| assignee: | Jorge Niedbalski (niedbalski) → Felipe Reyes (freyes) |
| Changed in nova: | |
| assignee: | Felipe Reyes (freyes) → Jorge Niedbalski (niedbalski) |
| tags: | added: sts |
Change abandoned by Michael Still (<email address hidden>) on branch: master
Review: https:/
Reason: It sounds like this patch is now obsolete. Please restore it if that isn't the case.
| tags: | removed: juno-backport-potential |
| Matt Riedemann (mriedem) wrote : | #16 |
Can this be tried against liberty or mitaka nova, where we're using the os-brick library? os-brick has had fixes for multipath issues that nova itself did not.
| Changed in nova: | |
| status: | In Progress → Incomplete |
| Billy Olsen (billy-olsen) wrote : | #17 |
Marking this as confirmed against the Ubuntu Cloud Archive for Kilo, Juno, and Trusty, which are still supported from the Ubuntu perspective and are known not to include the os-brick library dependencies. The change to os-brick still needs to be tested to verify that the problem is fixed there, so I'm leaving that as incomplete.
| Tobias Urdin (tobias-urdin) wrote : | #19 |
Could you post your multipath.conf configuration? We had the same issue with multipath devices being stuck/not removed which also blocked our ability to do live migrations. See https:/
| Lee Yarwood (lyarwood) wrote : | #20 |
Moving to invalid as this should no longer reproduce against Liberty or Mitaka after the move to os-brick. Please reopen and reassign to os-brick if this issue persists.
| Changed in nova: | |
| assignee: | Jorge Niedbalski (niedbalski) → nobody |
| status: | Incomplete → Invalid |
| Billy Olsen (billy-olsen) wrote : | #21 |
| Billy Olsen (billy-olsen) wrote : | #22 |
| Corey Bryant (corey.bryant) wrote : | #23 |
Thanks for the patches Billy. I've uploaded to kilo-staging and juno-staging and they'll be working their way to *-proposed for testing.
Just a few comments on the icehouse patch. The other bugs that are fixed in the icehouse patch will need to be updated with SRU information (Impact, Test Case, Regression Potential).
Also I moved the drop of 'return' in the 'not in_use path' from d/p/Fix-
And I added the following to the juno and icehouse detach iscsi patch headers:
Forwarded: https:/
Bug: https:/
Thanks,
Corey
Hello SamP, or anyone else affected,
Accepted nova into kilo-proposed. The package will build now and be available in the Ubuntu Cloud Archive in a few hours, and then in the -proposed repository.
Please help us by testing this new package. To enable the -proposed repository:
sudo add-apt-repository cloud-archive:
sudo apt-get update
Your feedback will aid us getting this update out to other Ubuntu users.
If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-
Further information regarding the verification process can be found at https:/
| tags: | added: verification-kilo-needed |
| Corey Bryant (corey.bryant) wrote : | #25 |
Juno is EOL so we're going to stop working on that.
| Changed in nova (Ubuntu Trusty): | |
| status: | Triaged → In Progress |
| assignee: | nobody → Billy Olsen (billy-olsen) |
| Martin Pitt (pitti) wrote : | #26 |
There is an SRU waiting in the trusty-proposed queue for this. Please clarify in which Ubuntu release(s) this is already fixed, or upload the fix to yakkety, so that the trusty SRU can proceed.
| Ryan Beisner (1chb1n) wrote : Update Released | #27 |
The verification of the Stable Release Update for nova has completed successfully and the package has now been released to -updates. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.
| Ryan Beisner (1chb1n) wrote : | #28 |
This bug was fixed in the package nova - 1:2015.1.4-0ubuntu2
---------------
nova (1:2015.
.
* d/p/fix-
- Clear latest path for last remaining iscsi disk to ensure
disk is properly removed.
.
nova (1:2015.
.
* New upstream stable release (LP: #1580334).
* d/p/skip-
they are hitting ProxyError during package builds.
.
nova (1:2015.
.
* New upstream stable release (LP: #1559215).
.
nova (1:2015.
.
* d/control: Bump oslo.concurrency to >= 1.8.2 (LP: #1518016).
.
nova (1:2015.
.
* Resynchronize with stable/kilo (68e9359) (LP: #1506058):
- [68e9359] Fix quota update in init_instance on nova-compute restart
- [d864603] Raise InstanceNotFound when save FK constraint fails
- [db45b1e] Give instance default hostname if hostname is empty
- [61f119e] Relax restrictions on server name
- [2e731eb] Remove unnecessary 'context' param from quotas reserve method
- [5579928] Updated from global requirements
- [08d1153] Don't expect meta attributes in object_compat that aren't in the
db obj
- [5c6f01f] VMware: pass network info to config drive.
- [17b5052] Allow to use autodetection of volume device path
- [5642b17] Delete orphaned instance files from compute nodes
- [8110cdc] Updated from global requirements
- [1f5b385] Hyper-V: Fixes serial port issue on Windows Threshold
- [24251df] Handle FC LUN IDs greater 255 correctly on s390x architectures
- [dcde7e7] Update obj_reset_changes signatures to match
- [e16fcfa] Unshelving volume backed instance fails
- [8fccffd] Make pagination tolerate a deleted marker
- [587092c] Fix live-migrations usage of the wrong connector information
- [8794b93] Don't check flavor disk size when booting from volume
- [c1ad497] Updated from global requirements
- [0b37312] Hyper-V: Removes old instance dirs after live migration
- [2d571b1] Hyper-V: Fixes live migration configdrive copy operation
- [07506f5] Hyper-V: Fix SMBFS volume attach race condition
- [60356bf] Hyper-V: Fix missing WMI namespace issue on Windows 2008 R2
- [83fb8cc] Hyper-V: Fix virtual hard disk detach
- [6c857c2] Updated from global requirements
- [0313351] Compute: replace incorrect instance object with dict
- [9724d50] Don't pass the service catalog when making glance requests
- [b5020a0] libvirt: Kill rsync/scp processes before deleting instance
- [3f337f8] Support host type specific block volume attachment
- [cb2a8fb] Fix serializer supported version reporting in object_backport
- [701c889] Execute _poll_shelved_
> 0
- [eb3b1c8] Fix rebuild of an instance with a volume attached
- [e459add] Handle unexpected clear events call
- [8280575] Support ssh-keygen of OpenS...
| Billy Olsen (billy-olsen) wrote : | #29 |
@pitti - this bug does not apply to openstack releases >= Liberty due to upstream code changes. The only Ubuntu release this applies to that is not EOL is Trusty. The fix is applicable to the following:
- Trusty
- UCA precise-icehouse
- UCA trusty-juno (now EOL, so excluded)
- UCA trusty-kilo
Hope this clarifies!
Hello SamP, or anyone else affected,
Accepted nova into trusty-proposed. The package will build now and be available at https:/
Please help us by testing this new package. See https:/
If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-
Further information regarding the verification process can be found at https:/
| Changed in nova (Ubuntu): | |
| status: | Triaged → Fix Released |
| Changed in nova (Ubuntu Trusty): | |
| status: | In Progress → Fix Committed |
| tags: | added: verification-needed |
| SamP (sampath-priyankara) wrote : | #31 |
Hello Martin,
Thank you for the fix.
I am really sorry to say that I am not able to test this patch in my current environment, since it does not have an iSCSI backend for cinder.
| Ryan Beisner (1chb1n) wrote : | #32 |
Hello SamP, or anyone else affected,
Accepted nova into icehouse-proposed. The package will build now and be available in the Ubuntu Cloud Archive in a few hours, and then in the -proposed repository.
Please help us by testing this new package. To enable the -proposed repository:
sudo add-apt-repository cloud-archive:
sudo apt-get update
Your feedback will aid us getting this update out to other Ubuntu users.
If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-
Further information regarding the verification process can be found at https:/
| tags: | added: verification-icehouse-needed |
| tags: |
added: sts-sru removed: cts |
| Billy Olsen (billy-olsen) wrote : | #33 |
Sorry for the delay, but I finally finished the trusty-proposed SRU verification this evening (Yay).
| tags: |
added: verification-done removed: verification-icehouse-needed verification-kilo-needed verification-needed |
| Launchpad Janitor (janitor) wrote : | #34 |
This bug was fixed in the package nova - 1:2014.
---------------
nova (1:2014.
* Fix live migration usage of the wrong connector (LP: #1475411)
- d/p/Fix-
* Fix wrong used ProcessExecutio
- d/p/Fix-
* Clean up iSCSI multipath devices in Post Live Migration (LP: #1357368)
- d/p/Clean-
* Detach iSCSI latest path for latest disk (LP: #1374999)
- d/p/Detach-
-- Billy Olsen <email address hidden> Fri, 29 Apr 2016 15:35:01 -0700
| Changed in nova (Ubuntu Trusty): | |
| status: | Fix Committed → Fix Released |
| Changed in cloud-archive: | |
| status: | Confirmed → Fix Released |
| tags: | removed: sts-sru |


Adding archive with the openstack cinder and nova logs for Control and Compute nodes.
And also their kernel logs.
W1DEV103 is the Controller node. (Cinder-vol, scheduler, api are also in this host.)
W1CN103 is the Compute node. (nova-compute and its needed other process are running on this host.)
Following detach API request caused the problem. 3751-4bdc- a9bf-a54318b2d9 85.
The request id: req-1f8a6f05-
The PW for the zip file is: "lkjse89'&3" (without the quotes).