When live-migrate fails, lun-id is not rolled back (Havana)
| Affects | Status | Importance | Assigned to | Milestone |
|---|---|---|---|---|
| OpenStack Compute (nova) | Fix Released | Medium | Lee Yarwood | |
| OpenStack Security Advisory | Won't Fix | Undecided | Unassigned | |
Bug Description
Hi guys,
When a live migration fails with an error, the lun-id in the connection_info column of Nova's block_device_mapping table is not rolled back, and the failed VM can end up with another tenant's volume.
My test environment is as follows:
OpenStack version: Havana (2013.2.3)
Compute node OS: 3.5.0-23-generic #35~precise1-Ubuntu SMP Fri Jan 25 17:13:26 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
Compute node multipath: multipath-tools 0.4.9-3ubuntu7.2
Test steps (a scripted sketch of steps 3-5 follows the list):
1) create 2 compute nodes (host#1 and host#2)
2) create 1 VM on host#1 (vm01)
3) create 1 Cinder volume (vol01)
4) attach the volume to vm01 (/dev/vdb)
5) live-migrate vm01 from host#1 to host#2
6) live-migrate succeeds
- check the mappers on host#1 with the multipath command (# multipath -ll); you will find the mapper has not been deleted and the status of its devices is "failed faulty"
- check the lun-id of vol01
7) live-migrate vm01 again, from host#2 back to host#1 (vm01 was migrated to host#2 at step 5)
8) live-migrate fails
- check the mapper on host#1
- check the lun-id of vol01; you will find the LUN now has "two" igroups
- check the connection_info column in Nova's block_device_mapping table
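For convenience, here is a minimal Python sketch of steps 3-5, assuming Havana-era python-novaclient and python-cinderclient. The '2' API versions, credentials, and host names are assumptions; older clients may need '1.1'/v1 and display_name instead of name.

```python
from cinderclient import client as cinder_client
from novaclient import client as nova_client

# Placeholder credentials: username, api_key, project, auth_url.
AUTH = ('admin', 'secret', 'demo', 'http://keystone:5000/v2.0')

nova = nova_client.Client('2', *AUTH)
cinder = cinder_client.Client('2', *AUTH)

vm01 = nova.servers.find(name='vm01')           # created on host#1 (step 2)
vol01 = cinder.volumes.create(1, name='vol01')  # step 3: 1 GiB volume

# Step 4: attach vol01 to vm01 as /dev/vdb.
nova.volumes.create_server_volume(vm01.id, vol01.id, '/dev/vdb')

# Step 5: live-migrate vm01 from host#1 to host#2.
vm01.live_migrate(host='host2', block_migration=False,
                  disk_over_commit=False)
```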
This bug is a critical security issue because the failed VM can end up with another tenant's volume, and every Cinder backend storage can have the same problem because the bug is in live-migration's rollback process.
I suggest the methods below to solve the issue (a sketch of method 2 follows this list):
1) when a live-migrate completes, Nova should delete the mapper devices on the origin host
2) when a live-migrate fails, Nova should roll back the lun-id in the connection_info column
3) when a live-migrate fails, Cinder should delete the mapping between the LUN and the host (NetApp: igroup, EMC: storage group, ...)
4) when a volume-attach is requested, the vendor's Cinder volume driver should generate lun-ids randomly to reduce the probability of mis-mapping
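A minimal sketch of what method 2 could look like. The helper and its wiring are hypothetical; initialize_connection is the real Cinder API for obtaining connection_info for a given host connector.

```python
import json

def rollback_connection_info(context, volume_api, bdm, source_connector):
    """Hypothetical helper (method 2): re-fetch the export details for
    the source host from Cinder and persist them, instead of leaving the
    destination host's lun-id in the connection_info column.
    """
    fresh = volume_api.initialize_connection(
        context, bdm.volume_id, source_connector)
    bdm.connection_info = json.dumps(fresh)
    bdm.save()
```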
Please check this bug.
Thank you.
CVE References
description: | updated |
affects: | cinder → nova-project |
information type: | Private Security → Public Security |
Since this report concerns a possible security risk, an incomplete security advisory task has been added while the core security reviewers for the affected project or projects confirm the bug and discuss the scope of any vulnerability along with potential solutions.
Is this only in Havana, or does it also reproduce on Icehouse/Juno?
Hyun Ha (raymon-ha) wrote : | #2 |
Hi, Tristan
Icehouse and Juno have the same bug.
Please see my report below:
https:/
When live-migration fails, the mapping information between host and LUN can get tangled.
This bug affects not only security but also data integrity (the filesystem can be corrupted when a LUN is mis-mapped).
Thanks.
Thierry Carrez (ttx) wrote : | #3 |
I agree that's a bug with critical consequences; however, unless I'm mistaken, it's not a situation that can be triggered or predicted by an attacker, in which case it may not be considered a vulnerability?
I agree with ttx here: without a way to make the migration process fail, this is a bug with security consequences, but not a vulnerability.
@Hahyun, is there a missing step that an attacker can use to make the live migration (step 8) fail?
Else let's triage this as a class D type of report and remove the advisory task. ( https:/
Hyun Ha (raymon-ha) wrote : | #5 |
Hi,
Thank you for your comment.
The reason I reported the above issue as a vulnerability is that an attacker can deliberately attach another tenant's volume to his own VM.
There are many ways to make live-migration fail.
Firstly, there are two issues on Havana (tag: 2013.2.3).
One is the bug with multipath rescan (https:/
The other is that when a volume is detached, the multipath device is not deleted.
Due to these issues, the live-migration process will fail in the scenario where a VM live-migrates to another compute node and then back to the original compute node.
Secondly, if live-migration is executed while a process inside the VM keeps using a large amount of memory (driven by a benchmark tool or something similar), the migration can remain in a waiting state indefinitely and will eventually fail.
There are other ways to make live-migration fail besides those explained above:
take a NIC of a compute node down and then execute live-migration, for example, and the live-migration will fail (using multipath, iSCSI).
Exploiting the rollback bug is just one way an attacker can attach another tenant's volume to his VM.
I think the important thing is that Nova attaches volumes by lun-id, so if the lun-id is changed by an error or by an attacker, critical security issues follow.
Please think about the following situation:
the attacker gets admin access to the Nova DB,
changes the lun-id in connection_info in the block_device_mapping table,
hard-reboots his VM with the changed lun-id,
and finally gets another tenant's volume on his VM easily.
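For context, this is roughly what an iSCSI connection_info record looks like as persisted in block_device_mapping; all values below are made up, but target_lun is the real field the scenario above tampers with.

```python
# Illustrative iSCSI connection_info as stored (JSON) in the
# block_device_mapping.connection_info column; values are made up.
connection_info = {
    "driver_volume_type": "iscsi",
    "data": {
        "target_iqn": "iqn.1992-08.com.example:sn.12345",
        "target_portal": "192.168.0.10:3260",
        "target_lun": 1,  # flipping this integer re-points the VM's disk
        "volume_id": "4c1e0fa3-0000-0000-0000-000000000000",
    },
}
```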
I think the root cause of this bug is that Nova uses the lun-id to map a VM to a volume.
The lun-id is not unique and can change during attach/detach processing because it is generated dynamically.
I'd like to suggest that Nova attach volumes to VMs using the LUN's unique ID, not the lun-id.
Additionally, the bug I reported should be fixed.
Users who have VMs in an OpenStack-based public cloud may feel their VMs are unsafe if they learn about the possibility of volume mis-mapping, because one compute node hosts VMs of many different customers.
So I think this issue should be triaged as a Class A type.
Thank you.
Jeremy Stanley (fungi) wrote : | #6 |
Unless I'm misreading, you're suggesting potential exploits involving:
1. disconnecting physical network interfaces
2. gaining administrative access to the nova database
Our report taxonomy is not based on feelings and user impressions, but rather on the feasibility of exploiting a bug weighed against its implied risks.
Thierry Carrez (ttx) wrote : | #7 |
I tend to agree that it becomes a security vulnerability if live-migrate regularly fails. Even if the leak can't be triggered or controlled, it is still a privacy issue. We issued one in the past for OSSA-2013-006:
Hyun Ha (raymon-ha) wrote : | #8 |
Hi,
Live-migrate regularly fails on Havana and all branches before Havana.
Live-migrate can fail on Juno and Icehouse under the specific conditions I reported above.
Thank you.
Thierry Carrez (ttx) wrote : | #9 |
Agree that it's a vulnerability in Havana (since live-migration fails so often there). I wouldn't consider it a vulnerability in Icehouse/Juno, since you can't trigger live migration failure without administrative or physical access to the machines.
It is a bug with security consequences there, and it should be fixed as soon as possible.
Changed in nova-project: | |
status: | New → Confirmed |
Thierry Carrez (ttx) wrote : | #10 |
As far as an OSSA goes, I'd rate this class C1 or D.
affects: | nova-project → nova |
Changed in nova: | |
importance: | Undecided → High |
Jeremy Stanley (fungi) wrote : | #11 |
Agreed on C1: this wouldn't qualify for an advisory since Havana is no longer supported by the VMT, but it's still something a distro carrying Havana packages of Nova might fix on their own and request a corresponding CVE to track.
information type: | Public Security → Public |
tags: | added: security |
Changed in ossa: | |
status: | Incomplete → Won't Fix |
Garth Mollett (gmollett) wrote : | #12 |
I'm going to go ahead and request a CVE for this on oss-sec, at least for Havana (which we [Red Hat] still support downstream), unless someone has a good reason not to (or beats me to it).
Matthew Edmonds (edmondsw) wrote : | #13 |
per http://
For purposes of CVE, we typically don't think of vulnerabilities in the way expressed in https:/
Jeremy Stanley (fungi) wrote : | #14 |
And as I indicated in follow-up replies on that thread, the OpenStack VMT doesn't decide whether or not a bug is worthy of getting a CVE assigned (only whether or not we're going to embargo it and/or eventually issue a security advisory about it).
tags: | added: live-migration volumes |
tags: |
added: live-migrate removed: live-migration |
Matt Riedemann (mriedem) wrote : | #15 |
I'm trying to sort this out a bit.
Looking at the nova.virt code:
This is where pre_live_migration is called:
http://
Let's assume we're using the LibvirtISCSIVolumeDriver, whose connect_volume updates the connection_info dict in memory:
http://
That change never gets persisted back to the block_device_mapping table.
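A paraphrase of the flow Matt describes (not the literal Nova source), using a stub driver to show why the in-memory update is lost:

```python
import json

class StubISCSIVolumeDriver:
    def connect_volume(self, connection_info, disk_info):
        # The real LibvirtISCSIVolumeDriver mutates the dict in place,
        # e.g. recording the host device path it discovered.
        connection_info['data']['device_path'] = '/dev/mapper/fake-mpath'

# Simulated block_device_mapping row as read from the DB.
bdm_row = {'connection_info':
           json.dumps({'driver_volume_type': 'iscsi',
                       'data': {'target_lun': 1}})}

connection_info = json.loads(bdm_row['connection_info'])  # deserialize
StubISCSIVolumeDriver().connect_volume(connection_info, disk_info=None)

# The mutation exists only in memory; nothing writes it back, so the DB
# row still holds whatever was last persisted when a migration fails.
assert 'device_path' not in json.loads(bdm_row['connection_info'])['data']
```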
Justin Shepherd (jshepher) wrote : | #16 |
Should this bug remain open, given it is targeted at a no-longer-supported version (Havana)?
Matt Riedemann (mriedem) wrote : | #17 |
I have a feeling that this is fixed via https:/
Matt Riedemann (mriedem) wrote : | #18 |
Per comment 17, maybe not, given that is post-live-migration, which is only called for a successful live migration; the rollback is called for a failed live migration.
Matt Riedemann (mriedem) wrote : | #19 |
@Justin, per comment 16, this was reported against Havana but as far as I can tell this is not yet resolved in master (Liberty right now).
Matt Riedemann (mriedem) wrote : | #20 |
I'm wondering if the fix (https:/
tags: |
added: live-migration removed: live-migrate |
Changed in nova: | |
assignee: | nobody → lvmxh (shaohef) |
Changed in nova: | |
assignee: | lvmxh (shaohef) → nobody |
Tobias Urdin (tobias-urdin) wrote : | #21 |
Can anybody please verify if my bug is a duplicate of this one? https:/
information type: | Public → Public Security |
Jeremy Stanley (fungi) wrote : | #22 |
Since this bug was switched from public back to public security with no comment explaining why, I have reset it to public again. Please, whenever moving a bug to a security type, add a comment with your reasoning.
information type: | Public Security → Public |
Lee Yarwood (lyarwood) wrote : | #23 |
We've seen this downstream against RHEL OSP 7 (Kilo), documented (mostly privately) in the following RHBZ:
iscsi details changed for cinder volume using EMCCLIISCSIDriver
https:/
We manually reverted the changes to the target_luns to work around the issue in this case. This still looks possible against master, so I'm going to propose a change refreshing connection_info on the source host during _rollback_live_migration.
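A hedged sketch of what that refresh could look like, written as a ComputeManager method. BlockDeviceMappingList, convert_volume, and refresh_connection_info are real Nova names, but the exact wiring here is an assumption, not the merged patch.

```python
from nova import objects
from nova.virt import block_device as driver_block_device

def _refresh_source_conn_infos(self, context, instance):
    """Sketch: re-fetch and persist connection_info for the source host
    while rolling back a failed live migration.
    """
    bdms = objects.BlockDeviceMappingList.get_by_instance_uuid(
        context, instance.uuid)
    for bdm in bdms:
        if not bdm.is_volume:
            continue
        # convert_volume wraps the BDM in a DriverVolumeBlockDevice,
        # whose refresh_connection_info re-calls Cinder for this host
        # and saves the fresh connection_info back to the DB.
        driver_bdm = driver_block_device.convert_volume(bdm)
        if driver_bdm:
            driver_bdm.refresh_connection_info(
                context, instance, self.volume_api, self.driver)
```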
Fix proposed to branch: master
Review: https:/
Changed in nova: | |
assignee: | nobody → Lee Yarwood (lyarwood) |
status: | Confirmed → In Progress |
Reviewed: https:/
Committed: https:/
Submitter: Jenkins
Branch: master
commit b83cae02ece4c33
Author: Lee Yarwood <email address hidden>
Date: Thu Jul 14 11:53:09 2016 +0100
block_device: Make refresh_conn_infos py3 compatible
Also add a simple test ensuring that refresh_connection_info is called
for each DriverVolumeBlockDevice.
Related-Bug: #1419577
Partially-
Change-Id: Ib1ff00e7f4f5b5
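For background on the py3 compatibility point: in Python 2, map() and filter() return lists and run eagerly, so using them for side effects works; in Python 3 they return lazy iterators and silently do nothing unless consumed. A self-contained illustration of the usual pitfall such a change addresses (not the literal Nova diff):

```python
calls = []

class FakeDevice:
    def refresh_connection_info(self):
        calls.append(self)  # the side effect the caller relies on

devices = [FakeDevice(), FakeDevice()]

# Python 3 pitfall: map() is a lazy iterator, so nothing runs here.
map(lambda d: d.refresh_connection_info(), devices)
assert calls == []  # on Python 3, no refresh ever ran

# Eager alternative: an explicit loop (or list comprehension) runs the
# side effect on both Python 2 and Python 3.
for d in devices:
    d.refresh_connection_info()
assert len(calls) == 2
```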
Changed in nova: | |
assignee: | Lee Yarwood (lyarwood) → Dan Smith (danms) |
Changed in nova: | |
assignee: | Dan Smith (danms) → Lee Yarwood (lyarwood) |
Fix proposed to branch: master
Review: https:/
Change abandoned by Lee Yarwood (<email address hidden>) on branch: master
Review: https:/
Changed in nova: | |
status: | In Progress → Confirmed |
assignee: | Lee Yarwood (lyarwood) → nobody |
Changed in nova: | |
importance: | High → Medium |
Lee Yarwood (lyarwood) wrote : | #28 |
https:/
Changed in nova: | |
assignee: | nobody → Lee Yarwood (lyarwood) |
status: | Confirmed → In Progress |
Sean Dague (sdague) wrote : | #29 |
Automatically discovered version havana in description. If this is incorrect, please update the description to include 'nova version: ...'
tags: | added: openstack-version.havana |
Change abandoned by Sean Dague (<email address hidden>) on branch: master
Review: https:/
Reason: This review is > 4 weeks without comment and is not mergeable in its current state. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and leaving a 'recheck' comment to get fresh test results.
Change abandoned by Lee Yarwood (<email address hidden>) on branch: master
Review: https:/
Reason: https:/
Changed in nova: | |
status: | In Progress → Fix Released |