Bug #1419577 “when live-migrate failed, lun-id couldn't be rollb...” : Bugs : OpenStack Compute (nova)

Hyun Ha (raymon-ha) on 2015-02-09

description:	updated
affects:	cinder → nova-project

Hyun Ha (raymon-ha) on 2015-02-09

information type:

Private Security → Public Security

Tristan Cacqueray (tristan-cacqueray) wrote on 2015-02-09:

#1

Since this report concerns a possible security risk, an incomplete security advisory task has been added while the core security reviewers for the affected project or projects confirm the bug and discuss the scope of any vulnerability along with potential solutions.

Is this only in Havana or does it also reproduce on Icehouse/Juno ?

Changed in ossa:
status:	New → Incomplete

Hyun Ha (raymon-ha) wrote on 2015-02-09:

#2

Hi, Tristan

Icehouse and Juno have the same bug.

Please see the report I filed here:
https://bugs.launchpad.net/cinder/+bug/1416314

When live-migration fails, the mapping information between host and LUN can become tangled.
This bug is not only a security vulnerability but also a threat to data stability (the filesystem can break when a LUN is mis-mapped).

Thanks.

Thierry Carrez (ttx) wrote on 2015-02-12:

#3

I agree that's a bug with critical consequences, however unless I'm mistaken, it's not a situation that can be triggered or predicted by an attacker, in which case it may not be considered a vulnerability ?

Tristan Cacqueray (tristan-cacqueray) wrote on 2015-02-18:

#4

I agree with ttx here, without a way to make the migration process fail, this is a bug with security consequence, but not a vulnerability.

@Hahyun, is there a missing step that an attacker can use to make the live migration (step 8) fail?

Else let's triage this as a class D type of report and remove the advisory task. ( https://wiki.openstack.org/wiki/Vulnerability_Management#Incident_report_taxonomy )

Hyun Ha (raymon-ha) wrote on 2015-02-23:

#5

Hi,
Thank you for your comment.

I reported this issue as a vulnerability because an attacker can deliberately attach another user's volume to his own VM.
There are many ways to make live-migration fail.

First, there are two issues on Havana (tag: 2013.2.3).
One is the multipath rescan bug (https://bugs.launchpad.net/nova/+bug/1362916).
The other is that the multipath device is not deleted when a volume is detached.
Because of these two issues, live-migration fails when a VM is live-migrated to another compute node and then migrated back to the original compute node.

Second, if live-migration is executed while a process in the VM instance keeps using a large amount of memory (a benchmark tool, for example), the live-migration can stay in the waiting state and eventually fail.

There are other ways to make live-migration fail besides the ones I explained above.
For example, if a NIC of the compute node is brought down and live-migration is then executed, the live-migration will fail (when using multipath and iSCSI).

Using the rollback bug is just one way an attacker can attach another user's volume to his VM.
I think the important point is that nova attaches volumes by lun-id, so if the lun-id is changed, whether by an error or by an attacker, it causes critical security issues.
Please consider the following situation:
The attacker gets admin authority over the nova DB.
He changes the lun-id in connection_info in the block_device_mapping table.
He hard-reboots his VM that has the volume with the changed lun-id.
Finally, the attacker easily gets another user's volume on his VM.

I think the root cause of this bug is that nova uses the "lun-id" to map a VM to a volume.
The lun-id is not unique and can change during the attach/detach process because it is generated dynamically.
I'd like to suggest that nova attach volumes to VMs by the "unique-id" of the LUN, not the lun-id.
In addition, the bug that I reported should be fixed.

Users who have VMs on a public cloud based on OpenStack may feel their VMs are unsafe if they know about the possibility of volume mis-mapping, because one compute node hosts the VMs of many different customers.
So I think this issue should be triaged as Class A.

Thank you.
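
To illustrate the suggestion above, here is a minimal sketch contrasting lookup by LUN number with lookup by a stable identifier. It is a toy model, not nova code: the `Lun` type, the `unique_id` field (something like a WWN is assumed), and both helper functions are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Lun:
    lun_id: int      # per-host number, assigned dynamically at attach time
    unique_id: str   # assumed stable identifier, e.g. a WWN (hypothetical field)


def find_by_lun_id(visible: Iterable[Lun], lun_id: int) -> Optional[Lun]:
    """Return whatever device currently sits at this LUN number."""
    return next((lun for lun in visible if lun.lun_id == lun_id), None)


def find_by_unique_id(visible: Iterable[Lun], unique_id: str) -> Optional[Lun]:
    """Return the device with this identifier, whatever its LUN number is."""
    return next((lun for lun in visible if lun.unique_id == unique_id), None)


# What was recorded when the volume was first attached.
recorded = Lun(lun_id=1, unique_id="wwn-mine")

# What the host sees after LUN numbers were reassigned (made-up values).
visible_now = [Lun(1, "wwn-other"), Lun(2, "wwn-mine")]

by_number = find_by_lun_id(visible_now, recorded.lun_id)
by_identity = find_by_unique_id(visible_now, recorded.unique_id)
```

In this toy case the lookup by number lands on a different volume, while the lookup by identifier still finds the recorded one. Whether a real storage backend exposes such an identifier to the compute host is outside what this sketch shows.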

Jeremy Stanley (fungi) wrote on 2015-02-23:

#6

Unless I'm misreading, you're suggesting potential exploits involving :

1. disconnecting physical network interfaces

2. gaining administrative access to the nova database

Our report taxonomy is not based on feelings and user impressions, but rather on the feasibility of exploiting a bug weighed against its implied risks.

Thierry Carrez (ttx) wrote on 2015-02-26:

#7

I tend to agree that it becomes a security vulnerability if live-migrate regularly fails. Even if the leak can't be triggered or controlled, it is still a privacy issue. We issued one in the past for OSSA-2013-006:

http://security.openstack.org/ossa/OSSA-2013-006.html

Hyun Ha (raymon-ha) wrote on 2015-03-02:

#8

Hi,
Live-migrate regularly fails on Havana and on all branches before Havana.
Live-migrate can fail on Juno and Icehouse under the specific conditions I reported above.
Thank you.

Thierry Carrez (ttx) wrote on 2015-03-02:

#9

Agree that it's a vulnerability in Havana (since live-migration fails so often there). I wouldn't consider it a vulnerability in Icehouse/Juno, since you can't trigger live migration failure without administrative or physical access to the machines.

It is a bug with security consequences there, and it should be fixed as soon as possible.

Changed in nova-project:
status:	New → Confirmed

Thierry Carrez (ttx) wrote on 2015-03-02:

#10

As far as OSSA is going, I'd rate this class C1 or D.

affects:	nova-project → nova
Changed in nova:
importance:	Undecided → High

Jeremy Stanley (fungi) wrote on 2015-03-09:

#11

Agreed on C1: this wouldn't qualify for an advisory since Havana is no longer supported by the VMT, but it's still something a distro carrying Havana packages of Nova might fix on their own and request a corresponding CVE to track.

information type:	Public Security → Public
tags:	added: security

Jeremy Stanley (fungi) on 2015-03-09

Changed in ossa:
status:	Incomplete → Won't Fix

Garth Mollett (gmollett) wrote on 2015-03-17:

#12

I'm going to go ahead and request a CVE for this on oss-sec at least for havana (which we [redhat] still support downstream) unless someone has a good reason not to? (or beats me to it)

Matthew Edmonds (edmondsw) wrote on 2015-04-03:

#13

per http://seclists.org/oss-sec/2015/q1/990:

For purposes of CVE, we typically don't think of vulnerabilities in the way expressed in https://bugs.launchpad.net/nova/+bug/1419577/comments/4 "without a way to make the migration process fail, this is a bug with security consequence, but not a vulnerability." In other words, for a CVE, the attacker can be a person who wishes to have an unauthorized volume attachment after the bug is triggered. The attacker does not need to be a person who has determined a reproducible way to trigger the bug.

Jeremy Stanley (fungi) wrote on 2015-04-03:

#14

And as I indicated in follow-up replies on that thread, the OpenStack VMT doesn't decide whether or not a bug is worthy of getting a CVE assigned (only whether or not we're going to embargo it and/or eventually issue a security advisory about it).

Matt Riedemann (mriedem) on 2015-05-26

tags:

added: live-migration volumes

Pawel Koniszewski (pawel-koniszewski) on 2015-05-29

tags:

added: live-migrate
removed: live-migration

Matt Riedemann (mriedem) wrote on 2015-05-29:

#15

I'm trying to sort this out a bit.

Looking at the nova.virt.libvirt.driver.pre_live_migration() method, I see it's connecting to a volume and the connection_info dictionary is updated in the nova.virt.libvirt.volume code, but I don't see where that connection_info dict comes back to the virt driver's pre_live_migration method and persists the change to the database.

This is where pre_live_migration() connects the volume:

http://git.openstack.org/cgit/openstack/nova/tree/nova/virt/libvirt/driver.py?id=2015.1.0#n5813

Let's assume we're using the LibvirtISCSIVolumeDriver volume driver, the connect_volume method in there will update the connection_info dict here:

http://git.openstack.org/cgit/openstack/nova/tree/nova/virt/libvirt/volume.py?id=2015.1.0#n483

That change never gets persisted back to the block_device_mapping table for the bdm instance, but we've connected the volume potentially on another host so if live migration fails and we never rollback the volume connection_info to the source host (before pre_live_migration), and reboot the instance, then the bdm will be recreated from what's in the database which will be wrong.
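
The failure mode described in this comment can be sketched with a toy model. Nothing below is nova code: `FakeBackend`, `simulate`, the host names, and the assumption that the destination's LUN ends up in the stored record are all hypothetical and chosen only for illustration.

```python
import copy


class FakeBackend:
    """Hypothetical storage array that hands out LUN ids per host."""

    def __init__(self):
        self.next_lun = {}   # host -> next free LUN id on that host
        self.exports = {}    # (host, lun) -> volume id exported there

    def initialize_connection(self, volume_id, host):
        lun = self.next_lun.get(host, 0)
        self.next_lun[host] = lun + 1
        self.exports[(host, lun)] = volume_id
        return {"target_lun": lun}


def simulate(rollback):
    """Return the volume the instance sees after a failed migration and a
    hard reboot on source host "A"."""
    backend = FakeBackend()
    backend.initialize_connection("vol-other", "A")           # LUN 0 on A
    record = backend.initialize_connection("vol-mine", "A")   # LUN 1 on A
    saved = copy.deepcopy(record)

    # Pre-migration step: connect on destination "B". ASSUMPTION: the
    # stored record is overwritten with the destination's details.
    record = backend.initialize_connection("vol-mine", "B")   # LUN 0 on B

    # The migration fails here. With rollback, restore the source details.
    if rollback:
        record = saved

    # Hard reboot on "A" rebuilds the attachment from the stored record.
    return backend.exports[("A", record["target_lun"])]
```

Under these assumptions, `simulate(rollback=False)` resolves to the other tenant's volume, while `simulate(rollback=True)` resolves to the instance's own volume.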

Justin Shepherd (jshepher) wrote on 2015-08-03:

#16

Should this bug remain open as it is targeted to a no longer supported version (havana)?

Matt Riedemann (mriedem) wrote on 2015-08-18:

#17

I have a feeling that this is fixed via https://review.openstack.org/#/c/211051/ .

Matt Riedemann (mriedem) wrote on 2015-08-18:

#18

Per comment 17, maybe not, given that's post live migration, which is only called for a successful live migration; the rollback is called for a failed live migration.

Matt Riedemann (mriedem) wrote on 2015-08-18:

#19

@Justin, per comment 16, this was reported against Havana but as far as I can tell this is not yet resolved in master (Liberty right now).

Matt Riedemann (mriedem) wrote on 2015-09-28:

#20

I'm wondering if the fix (https://review.openstack.org/#/c/202770/) for bug 1475411 plays here, i.e. the case that the live migration from A to B is considered successful even though we didn't disconnect the correct volumes, and then that causes the migration from B back to A to fail.

Paul Murray (pmurray) on 2015-11-06

tags:

added: live-migration
removed: live-migrate

lvmxh (shaohef) on 2015-11-17

Changed in nova:
assignee:	nobody → lvmxh (shaohef)

lvmxh (shaohef) on 2015-12-03

Changed in nova:
assignee:	lvmxh (shaohef) → nobody

Tobias Urdin (tobias-urdin) wrote on 2015-12-14:

#21

Can anybody please verify if my bug is a duplicate of this one? https://bugs.launchpad.net/nova/+bug/1525802

lvmxh (shaohef) on 2016-01-27

information type:

Public → Public Security

Jeremy Stanley (fungi) wrote on 2016-01-27:

#22

Since this bug was switched from public back to public security with no comment explaining why, I have reset it to public again. Please, whenever moving a bug to a security type, add a comment with your reasoning.

information type:

Public Security → Public

Lee Yarwood (lyarwood) wrote on 2016-07-07:

#23

We've seen this downstream against RHEL OSP 7 (kilo) documented (mostly privately) in the following RHBZ:

iscsi details changed for cinder volume using EMCCLIISCSIDriver
https://bugzilla.redhat.com/show_bug.cgi?id=1353147

We manually reverted the changes to the target_luns to work around the issue in this case. This still looks possible against master, so I'm going to propose a change that refreshes connection_info on the source host during _rollback_live_migration.
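
A rough sketch of the idea proposed here, refreshing connection_info for the source host while rolling back. This is not the proposed change itself: `FakeVolumeAPI`, `refresh_on_rollback`, and the dict layout are hypothetical stand-ins, and the real Cinder and nova interfaces differ.

```python
class FakeVolumeAPI:
    """Hypothetical stand-in for the volume service, not the Cinder client."""

    def __init__(self, mappings):
        # (volume_id, host) -> LUN exported to that host (made-up data)
        self._mappings = mappings

    def initialize_connection(self, volume_id, host):
        return {"data": {"target_lun": self._mappings[(volume_id, host)]}}


def refresh_on_rollback(bdms, volume_api, source_host):
    """Replace each record's connection_info with details for the source host.

    `bdms` is a list of plain dicts standing in for block device mappings.
    """
    for bdm in bdms:
        bdm["connection_info"] = volume_api.initialize_connection(
            bdm["volume_id"], source_host
        )
    return bdms


# The record currently holds the destination's LUN (0) after a failed migration.
api = FakeVolumeAPI({("vol-1", "src"): 1, ("vol-1", "dst"): 0})
bdms = [{"volume_id": "vol-1", "connection_info": {"data": {"target_lun": 0}}}]

refresh_on_rollback(bdms, api, "src")
```

After the call, the record points at the LUN exported to the source host again. Comment #28 mentions a variant that resets connection_info from stashed bdms to avoid additional calls to Cinder; this sketch does not model that.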

OpenStack Infra (hudson-openstack) wrote on 2016-07-07: Fix proposed to nova (master)

#24

Fix proposed to branch: master
Review: https://review.openstack.org/338929

Changed in nova:
assignee:	nobody → Lee Yarwood (lyarwood)
status:	Confirmed → In Progress

OpenStack Infra (hudson-openstack) wrote on 2016-10-18: Related fix merged to nova (master)

#25

Reviewed: https://review.openstack.org/342111
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=b83cae02ece4c338e09c3606c6ae69b715bd6f8c
Submitter: Jenkins
Branch: master

commit b83cae02ece4c338e09c3606c6ae69b715bd6f8c
Author: Lee Yarwood <email address hidden>
Date: Thu Jul 14 11:53:09 2016 +0100

block_device: Make refresh_conn_infos py3 compatible

Also add a simple test ensuring that refresh_connection_info is called
for each DriverVolumeBlockDevice derived device provided.

    Related-Bug: #1419577
    Partially-Implements: blueprint goal-python35
    Change-Id: Ib1ff00e7f4f5b599317d7111c322ce9af8a9a2b1

OpenStack Infra (hudson-openstack) on 2016-10-26

Changed in nova:
assignee:	Lee Yarwood (lyarwood) → Dan Smith (danms)

OpenStack Infra (hudson-openstack) on 2016-10-26

Changed in nova:
assignee:	Dan Smith (danms) → Lee Yarwood (lyarwood)

OpenStack Infra (hudson-openstack) wrote on 2016-10-29: Fix proposed to nova (master)

#26

Fix proposed to branch: master
Review: https://review.openstack.org/391598

OpenStack Infra (hudson-openstack) wrote on 2016-11-07: Change abandoned on nova (master)

#27

Change abandoned by Lee Yarwood (<email address hidden>) on branch: master
Review: https://review.openstack.org/391598

Matt Riedemann (mriedem) on 2016-12-09

Changed in nova:
status:	In Progress → Confirmed
assignee:	Lee Yarwood (lyarwood) → nobody

Sean Dague (sdague) on 2016-12-09

Changed in nova:
importance:	High → Medium

Lee Yarwood (lyarwood) wrote on 2016-12-09:

#28

https://review.openstack.org/#/c/338929/ is the correct WIP review that I've been working on. It's currently waiting on https://review.openstack.org/#/c/389608/ so we can use the stashed bdms to reset connection_info without additional calls to Cinder.

OpenStack Infra (hudson-openstack) on 2017-01-17

Changed in nova:
assignee:	nobody → Lee Yarwood (lyarwood)
status:	Confirmed → In Progress

Sean Dague (sdague) wrote on 2017-06-27:

#29

Automatically discovered version havana in description. If this is incorrect, please update the description to include 'nova version: ...'

tags:

added: openstack-version.havana

OpenStack Infra (hudson-openstack) wrote on 2017-08-01:

#30

Change abandoned by Sean Dague (<email address hidden>) on branch: master
Review: https://review.openstack.org/338929
Reason: This review is > 4 weeks without comment, and is not mergeable in its current state. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and leaving a 'recheck' comment to get fresh test results.

OpenStack Infra (hudson-openstack) wrote on 2018-07-02:

#31

Change abandoned by Lee Yarwood (<email address hidden>) on branch: master
Review: https://review.openstack.org/338929
Reason: https://review.openstack.org/#/c/551302/

Lee Yarwood (lyarwood) on 2020-09-15

Changed in nova:
status:	In Progress → Fix Released

Affects		Status	Importance	Assigned to	Milestone
	OpenStack Compute (nova)	Fix Released	Medium	Lee Yarwood
	OpenStack Security Advisory	Won't Fix	Undecided	Unassigned

OpenStack Compute (nova)

when live-migrate failed, lun-id couldn't be rollback in havana
