[OSSA 2015-015] Resize/delete combo allows to overload nova-compute (CVE-2015-3241)
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
OpenStack Compute (nova) | | High | Abhishek Kekane |
Juno | | Undecided | Abhishek Kekane |
Kilo | | Undecided | Abhishek Kekane |
OpenStack Security Advisory | | Medium | Tristan Cacqueray |
Bug Description
If a user creates an instance, resizes it to a larger flavor, and then deletes that instance, the migration process does not stop. This allows the user to repeat the operation many times, overloading the affected compute nodes beyond the user's quota.
Affected installations: the most drastic effect occurs on 'raw-disk' instances without live migration. The whole raw disk (the full size of the flavor) is copied during migration.
If the user deletes the instance, rsync/scp is not terminated and keeps the disk backing file open, even though nova-compute has removed it.
Because rsync/scp of large disks is rather slow, a malicious user has enough time to repeat the operation a few hundred times, causing disk space depletion on compute nodes, heavy load on the management network, and so on.
Proposed solution: abort the migration (kill rsync/scp) as soon as the instance is deleted.
Affected releases: Havana, Icehouse, probably Juno (not tested).
CVE References
description: | updated |
George Shuklin (george-shuklin) wrote : | #2 |
Now that bug is unduplicated from https:/
Thierry Carrez (ttx) wrote : | #3 |
Yes, we might want to issue a single OSSA for both, if a patch can be logged for this one soon enough.
Thierry Carrez (ttx) wrote : | #4 |
I think we can confirm it since the other was confirmed
Changed in ossa: | |
importance: | Undecided → Medium |
status: | Incomplete → Confirmed |
Tony Breeds (o-tony) wrote : | #5 |
Looks like a real issue. Priority set to high just to show it's behind existing issues.
Changed in nova: | |
importance: | Undecided → High |
status: | New → Confirmed |
Thierry Carrez (ttx) wrote : | #6 |
Is the proposed solution (abort the migration by killing rsync/scp as soon as the instance is deleted) something that works for everyone? Is anyone up for proposing a patch that would do that?
George Shuklin (george-shuklin) wrote : | #7 |
For me it works in manual mode (go to the node and kill all scp/rsync processes in sight). Because this bug report is private, most tenants do not know about it, so it happens mostly with my own machines in the lab.
There is no patch so far. Sorry, I'm an operator; it's too deep and complicated for me.
Fix in the https:/
Tushar Patil (tpatil) wrote : | #8 |
In the resize operation, while files are being copied from the source to the destination compute node, the scp/rsync processes are not aborted after the instance is deleted, because the Linux kernel doesn't physically delete the instance files until every process holding an open file handle has closed it. Hence the rsync/scp process keeps running until it has transferred 100% of the file data.
One way to abort the rsync/scp process is to call the truncate command and set the file size to 0 bytes before deleting the instance files in the delete operation. We added code to truncate the files to 0 bytes and found that it aborts the rsync/scp process and the file is not copied to the destination node.
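The two kernel behaviours described in this comment can be illustrated with a minimal sketch. It is not the attached patch: the function names are hypothetical, and a plain `os.read` stands in for the copy tool. It only shows that an already-open descriptor can still read an unlinked file, and that the same descriptor sees end-of-file once the file is truncated to 0 bytes. How a particular tool such as rsync or scp reacts to that is a separate question.

```python
import os
import tempfile


def _make_file(size):
    """Create a temporary file of `size` bytes and return its path."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"x" * size)
    finally:
        os.close(fd)
    return path


def bytes_readable_after_unlink(size=4096):
    """Open a file, unlink it, then read through the still-open descriptor."""
    path = _make_file(size)
    reader = os.open(path, os.O_RDONLY)   # stands in for rsync/scp
    try:
        os.unlink(path)                   # "delete the instance files"
        return len(os.read(reader, size)) # data is still reachable
    finally:
        os.close(reader)


def bytes_readable_after_truncate(size=4096):
    """Open a file, truncate it to 0 bytes, then read through the descriptor."""
    path = _make_file(size)
    reader = os.open(path, os.O_RDONLY)
    try:
        os.truncate(path, 0)              # what the truncate approach does first
        return len(os.read(reader, size)) # reader now hits end-of-file
    finally:
        os.close(reader)
        os.unlink(path)
```

On a typical Linux filesystem the first function returns the full size and the second returns 0, which matches the explanation of why deletion alone does not stop the copy.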
Previously rsync/scp copies 100% of file data but after adding truncate command, if the instance is deleted during copying files using rsync process it raises following ProcessExecutio
ProcessExecutio
Command: rsync --sparse --compress
/opt/stack/
isk
10.69.4.
e98/disk
Exit code: 23
Stdout: u''
Stderr: u'rsync: read errors mapping
"/opt/stack/
disk": No data available (61)\nfile has vanished:
"/opt/stack/
disk"\nrsync error: some files/attrs were not transferred (see previous
errors) (code 23) at main.c(1183) [sender=3.1.0]\n'
Also, timing-wise, we observed that the rsync/scp processes are aborted earlier than when copying 100% of the file, but unfortunately the difference is not that big: merely a 10-15% gain in time.
Results:
The following are the test results for the rsync/scp processes with 25, 50, 75 and 100 GB disks.
*Command rsync, 25 GB*
Master Branch:
Processing Time: 9:40 mins
Source Node Files: Deleted
Destination Node Files: disk file of 25 GB present in instance directory
Truncate Patch:
Processing Time: 7:30 mins
Source Node Files: Deleted
Destination Node Files: empty instance directory
*Command scp, 25 GB*
Master Branch:
Processing Time: 5:11 mins
Source Node Files: Deleted
Destination Node Files: disk file of 25 GB present in instance directory
Truncate Patch:
Processing Time: 4:20 mins
Source Node Files: Deleted
Destination Node Files: disk file of 25 GB present in instance directory
*Command rsync, 50 GB*
Master Branch:
Processing Time: 17:23 mins
Source Node Files: Deleted
Destination Node Files: disk file of 50 GB present in instance directory
Truncate Patch:
Processing Time: 15:40 mins
Source Node Files: Deleted
Destination Node Files: empty instance directory
*Command scp, 50 GB*
Master Branch:
Processing Time: 11:25 mins
Source Node Files: Deleted
Destination Node Files: disk file of 50 GB present in instance directory
Truncate Patch:
Processing Time: 9:35 mins
Source Node Files: Deleted
Destination Node Files: disk file of 50 GB present in instance directory
*Command rsync, 75 GB*
Master Branch:
Processing Time: 25:53 mins
Source Node Files: Deleted
Destination Node Files: disk file of 75 GB present in instance directory
Truncate Pat...
George Shuklin (george-shuklin) wrote : | #9 |
Hello. Seems cool, but is this a documented feature of rsync? If it is a random bug or undocumented behaviour, relying on it to close a CVE is too fragile.
One more thing: please check the code path with scp too. I just tried truncating the file on the source (and destination) hosts and it does not terminate scp. It actually hangs it.
Steps:
fallocate big -l 10G
scp big remote_server: &
truncate big --size 0
ssh remote_server truncate big --size 0
and scp falls into '-stalled-' mode.
Abhishek Kekane (abhishek-kekane) wrote : | #10 |
Hi,
Attached is the Truncate patch which Tushar Patil has referred in comment #8.
@George,
We have tried with the scp process as well. With the attached patch, scp takes less time to copy the file, but it still copies the disk file to the destination node, whereas rsync does not copy the disk file to the destination node.
George Shuklin (george-shuklin) wrote : | #11 |
It does not work with scp. I've rechecked it twice on different systems: if you truncate the source file for scp, it still continues to copy data. It displays the '-stalled-' state, but network activity stays very high for a very long time after the file was truncated.
Try it yourself:
fallocate big -l 10G
scp big remote_server: &
truncate big --size 0
Check the network interface load (with the atop utility, for example) right after running that script. It will stay very high until the copy finishes.
If this trick does not work with scp, would it be possible to use rsync everywhere and drop scp?
George Shuklin (george-shuklin) wrote : | #13 |
In our installation we specifically disable rsync to force nova to use scp, because scp with proper crypto settings is much faster than rsync for raw disks (they have to be copied as-is at full size, and rsync's smart scanning tricks just make things slower).
I would really like to see a fix for scp.
In my opinion it should just send SIGHUP to scp (or rsync).
Proper solution would be to store PID of the process during call in nova/common/
For very dirty trick (but not more dirty than utils.exec(
George Shuklin (george-shuklin) wrote : | #14 |
P.S. There should also be a fallback from 'fuser' to 'truncate' in case fuser is not installed on the host.
Tushar Patil (tpatil) wrote : | #15 |
I have confirmed there is an issue with scp. Also I agree with the solution you have mentioned in comment #13, it would fix this issue for both scp and rsync. We will submit a patch with this proposed solution soon.
George Shuklin (george-shuklin) wrote : | #16 |
I opened question on stackoverflow: http://
Right now there is one concern: if we send kill -9 to every application accessing the file ('fuser -k' sends kill -9), could we by some chance send it to nova-compute or libvirtd?
Some scenarios:
1) Concurrent calls 'create snapshot' and 'delete' from user
2) libvirtd performing scheduled tasks on image
Maybe we should not call fuser, but do the same thing ourselves (search in /proc/PID/fd) and narrow the application list to 'scp' and 'rsync'?
I know it sounds paranoid, but I don't want to see a random kill -9 in production on an unspecified process list.
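The /proc-based narrowing suggested here could look roughly like the sketch below. It is only an illustration under assumptions, not code from any proposed patch: the function name `pids_holding_file` and its `allowed_names` parameter are hypothetical, it is Linux-only, and it can only inspect processes whose /proc entries the caller is permitted to read.

```python
import os


def pids_holding_file(path, allowed_names=("scp", "rsync")):
    """Return PIDs that have `path` open and whose command name is allowed.

    Only processes named in `allowed_names` are ever reported, so a caller
    that signals the result cannot hit nova-compute or libvirtd by accident.
    """
    target = os.path.realpath(path)
    # After unlink, the kernel reports the link target with this suffix.
    wanted = {target, target + " (deleted)"}
    matches = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/comm") as f:
                name = f.read().strip()
            if name not in allowed_names:
                continue
            fd_dir = f"/proc/{entry}/fd"
            for fd in os.listdir(fd_dir):
                try:
                    link = os.readlink(os.path.join(fd_dir, fd))
                except OSError:
                    continue  # descriptor closed while we were looking
                if link in wanted:
                    matches.append(int(entry))
                    break
        except OSError:
            # Process exited in the meantime, or we lack permission.
            continue
    return matches
```

A caller would then signal only the returned PIDs instead of running `fuser -k` against the file. There is still a race between the scan and the signal, so this reduces the risk of hitting the wrong process rather than eliminating it.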
Abhishek Kekane (abhishek-kekane) wrote : | #17 |
Hi,
I have attached patches which will kill the rsync/scp processes if instance is deleted while resizing.
1. oslo_concurrency patch, to store the pid of the process - 0001-Store-
2. nova patch, to pass a callback handler to store/remove the pid and kill the process - 0001-Kill-
To run this patch successfully on master you need to apply the following patches to avoid the nova-compute startup issue:
1. https:/
2. https:/
Note:
1. In the attached nova patch I have used os.kill() to kill the process; this can also be done with a process_utils execute call.
2. If the rsync/scp process is killed while deleting the instance, the instance folder remains on the destination node.
If you apply the periodic patch [1] mentioned in security bug [2], the orphaned instance folder above gets deleted on nova-compute startup.
[1] 0001-Delete-
[2] https:/
Abhishek Kekane (abhishek-kekane) wrote : | #18 |
Michael Still (mikal) wrote : | #19 |
Current state of play:
173794 is merged, 173897 is abandoned in favour of the merged 174288.
0001-Store-
0001-Kill-
Jeremy Stanley (fungi) wrote : | #20 |
In theory oslo.concurrency has a corresponding stable branch where they would backport the necessary functionality, and then tag a corresponding stable point release which either matches the current caps in stable requirements or in such a way that we can adjust that requirement to only cover the new stable release of the library.
We proposed to issue a single advisory for this issue and #1392527, which are similar in terms of impact. However, in case this bug can't be fixed at the same time, we might want to issue a separate advisory (so that one does not block the other). Here is the impact description draft:
Title: Nova instance migration process does not stop when instance is deleted
Reporter: George Shuklin (Webzilla LTD)
Products: Nova
Affects: versions through 2014.1.4, and 2014.2 versions through 2014.2.3, and version 2015.1.0
Description:
George Shuklin from Webzilla LTD reported a vulnerability in the Nova migration process. By repeatedly resizing and deleting an instance, an authenticated user may exceed their quota and overload Nova compute nodes, resulting in a denial of service. All Nova setups are affected.
Changed in ossa: | |
status: | Confirmed → Triaged |
Abhishek Kekane (abhishek-kekane) wrote : | #22 |
Hi All,
For stable/kilo the required oslo.concurrency version is >=1.8.0,<1.9.0, so is it possible to release a new version 1.8.1 of oslo.concurrency with the changes required to fix this security issue?
For stable/juno oslo.concurrency is not used; instead, the changes need to be made in the processutils module of oslo-incubator.
Is it possible to propose a separate patch for stable/juno by making changes in the oslo-incubator processutils module?
Please let me know your opinion.
Changed in nova: | |
assignee: | nobody → Michael Still (mikalstill) |
Jeremy Stanley (fungi) wrote : | #23 |
Yes, in theory we could update nova stable/juno from a fix on the stable/juno branch of oslo-incubator.
Changed in nova: | |
assignee: | Michael Still (mikalstill) → Tony Breeds (o-tony) |
Tony Breeds (o-tony) wrote : | #24 |
I subscribed Dims as the oslo PTL to get his advice on the oslo portions.
I've uploaded: https:/
Once that lands in master I'll follow the appropriate process to get it added to kilo and juno, as outlined in comments 22 and 23.
Tony Breeds (o-tony) wrote : | #25 |
Oh rats I left "Closes-bug: #1387543" in the commit message.
Since this public commit message really disclose the bug: https:/
A CVE has been requested with impact description from comment #21.
Changed in ossa: | |
status: | Triaged → In Progress |
summary: |
- Resize/delete combo allows to overload nova-compute + Resize/delete combo allows to overload nova-compute (CVE-2015-3241) |
information type: | Private Security → Public Security |
Abhishek Kekane (abhishek-kekane) wrote : Re: Resize/delete combo allows to overload nova-compute (CVE-2015-3241) | #27 |
Hi,
Oslo concurrency patch is merged https:/
Should I attach the patch to the bug or submit it to gerrit, as this bug is already marked as public?
Please suggest.
Hi Abhishek,
Now that this bug is public, please submit the fix and backports directly to gerrit.
thanks in advance!
Abhishek Kekane (abhishek-kekane) wrote : | #29 |
Hi Tristan,
Thank you for the update, I will submit the patch to gerrit as soon as I finish with the functional testing.
Changed in nova: | |
assignee: | Tony Breeds (o-tony) → Abhishek Kekane (abhishek-kekane) |
status: | Confirmed → In Progress |
Michael Still (mikal) wrote : | #30 |
We need to bump global requirements so we can use the newly released oslo.concurrency in nova to fix this. That review is at https:/
Changed in nova: | |
assignee: | Abhishek Kekane (abhishek-kekane) → Michael Still (mikalstill) |
Related fix proposed to branch: stable/juno
Review: https:/
Tristan Cacqueray (tristan-cacqueray) wrote : Re: Resize/delete combo allows to overload nova-compute (CVE-2015-3241) | #32 |
It seems that we are still missing:
* kilo requirement update
* kilo change to use the callbacks
* juno change to use the callbacks
Changed in nova: | |
assignee: | Michael Still (mikalstill) → Abhishek Kekane (abhishek-kekane) |
Davanum Srinivas (DIMS) (dims-v) wrote : | #33 |
fyi, we released oslo.concurrency 1.8.1 (stable/kilo) with support for the on_execute and on_completion today
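The two callbacks mentioned here can be pictured with a standard-library sketch. This is not the oslo.concurrency implementation: `run_with_callbacks` is a hypothetical stand-in built on `subprocess`, and it assumes each callback receives the process object, once right after the process starts and once after it finishes.

```python
import subprocess


def run_with_callbacks(cmd, on_execute=None, on_completion=None):
    """Run `cmd`, notifying the caller when it starts and when it ends.

    on_execute(proc) fires right after the process is created, so the
    caller can record proc.pid. on_completion(proc) fires in a finally
    block, so it runs even if waiting for the process raises.
    """
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.PIPE)
    try:
        if on_execute:
            on_execute(proc)
        out, err = proc.communicate()
    finally:
        if on_completion:
            on_completion(proc)
    return proc.returncode, out, err
```

With hooks like these, a caller such as nova can store the PID of a long-running scp/rsync at start and forget it at completion, which is what makes a later targeted kill possible.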
Fix proposed to branch: master
Review: https:/
Changed in nova: | |
assignee: | Abhishek Kekane (abhishek-kekane) → Nikola Đipanov (ndipanov) |
Tushar Patil (tpatil) wrote : Re: Resize/delete combo allows to overload nova-compute (CVE-2015-3241) | #35 |
Abhishek has already submitted patch [1] to fix this issue but it looks like it didn't show up in this bug.
Change abandoned by Nikola Dipanov (<email address hidden>) on branch: master
Review: https:/
Reason: Abandoning in favor of https:/
Changed in nova: | |
assignee: | Nikola Đipanov (ndipanov) → Abhishek Kekane (abhishek-kekane) |
Reviewed: https:/
Committed: https:/
Submitter: Jenkins
Branch: master
commit 7ab75d5b0b75fc3
Author: abhishekkekane <email address hidden>
Date: Mon Jul 6 01:51:26 2015 -0700
libvirt: Kill rsync/scp processes before deleting instance
In the resize operation, during copying files from source to
destination compute node scp/rsync processes are not aborted after
the instance is deleted because linux kernel doesn't delete instance
files physically until all processes using the file handle is closed
completely. Hence rsync/scp process keeps on running until it
transfers 100% of file data.
Added new module instancejobtracker to libvirt driver which will add,
remove or terminate the processes running against particular instances.
Added callback methods to execute call which will store the pid of
scp/rsync process in cache as a key: value pair and to remove the
pid from the cache after process completion. Process id will be used to
kill the process if it is running while deleting the instance. Instance
uuid is used as a key in the cache and pid will be the value.
SecurityImpact
Closes-bug: #1387543
Change-Id: Ie03acc00a7c904
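The tracker described in this commit message can be sketched as follows. This is an illustration, not the merged nova module: the class name `JobTracker`, the helper `spawn_tracked`, the use of a list of PIDs per instance, and the default SIGTERM signal are all assumptions made for the example.

```python
import os
import signal
import subprocess


class JobTracker:
    """Map an instance UUID to the PIDs of copy processes working on it."""

    def __init__(self):
        self.jobs = {}

    def add_job(self, instance_uuid, pid):
        """Record a running process (call this when the process starts)."""
        self.jobs.setdefault(instance_uuid, []).append(pid)

    def remove_job(self, instance_uuid, pid):
        """Forget a process (call this when the process completes)."""
        pids = self.jobs.get(instance_uuid, [])
        if pid in pids:
            pids.remove(pid)
        if not pids:
            self.jobs.pop(instance_uuid, None)

    def terminate_jobs(self, instance_uuid, sig=signal.SIGTERM):
        """Signal every tracked process of the instance, then forget them."""
        for pid in self.jobs.pop(instance_uuid, []):
            try:
                os.kill(pid, sig)
            except ProcessLookupError:
                pass  # already exited; nothing to kill


def spawn_tracked(tracker, instance_uuid, cmd):
    """Start `cmd` and register its PID under the instance UUID."""
    proc = subprocess.Popen(cmd)
    tracker.add_job(instance_uuid, proc.pid)
    return proc
```

In the delete path, calling `terminate_jobs(instance_uuid)` before removing the instance files stops the copy directly instead of relying on how the tool reacts to a vanished or truncated file. Because PIDs can be reused, entries should be removed as soon as a process completes.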
Changed in nova: | |
status: | In Progress → Fix Committed |
tags: | added: juno-backport-potential |
Abhishek Kekane (abhishek-kekane) wrote : Re: Resize/delete combo allows to overload nova-compute (CVE-2015-3241) | #38 |
Hi,
To backport patch [1] to the stable/kilo and stable/juno branches, the following steps are essential.
stable/kilo:
1. Backport oslo.concurrency change [2] to stable/kilo - Done (https:/
2. Bump oslo_concurrency version to be used in nova
3. Backport [1] patch to stable/kilo
stable/juno:
1. Backport oslo.concurrency change [2] to stable/juno - Done (https:/
2. Once above backport is merged, then abandon patch [3] and sync these changes in nova stable/juno (similar to patch [3])
3. Backport [1] patch to stable/juno
[1] https:/
[2] https:/
[3] https:/
Related fix proposed to branch: stable/juno
Review: https:/
Abhishek Kekane (abhishek-kekane) wrote : Re: Resize/delete combo allows to overload nova-compute (CVE-2015-3241) | #40 |
Hi,
oslo.concurrency stable/kilo patch https:/
A new release of oslo.concurrency is now required to backport nova patch [1] to the stable/kilo branch.
Davanum Srinivas (DIMS) (dims-v) wrote : | #41 |
1.8.2 of oslo.concurrency has been cut, please add a review to update global-
In stable/kilo, nova/requirements and requirements/
oslo.concurrenc
Do we need to force >=1.8.2 ?
Change abandoned by Tristan Cacqueray (<email address hidden>) on branch: stable/juno
Review: https:/
Reason: Replaced by https:/
Abhishek Kekane (abhishek-kekane) wrote : Re: Resize/delete combo allows to overload nova-compute (CVE-2015-3241) | #44 |
Hi,
As per comment from Matt on patch https:/
changes from oslo.concurrency stable/juno branch to oslo-incubator stable/juno branch.
[1] https:/
As of now, the following tests are failing on the oslo-incubator stable/juno branch:
tests.unit.
tests.unit.
tests.unit.
These tests are passing on master branch, please let me know what can be done in this case.
Fix proposed to branch: stable/kilo
Review: https:/
Tristan Cacqueray (tristan-cacqueray) wrote : Re: Resize/delete combo allows to overload nova-compute (CVE-2015-3241) | #46 |
To sum-up:
1/ Required Oslo changes:
* Ifc23325eddb523
-> [oslo.concurrency] merged in master/kilo/juno
-> [oslo.incubator] *in progress* in juno
* I22b2d7bde87972
-> [oslo.concurrency] merged in master/kilo/juno
* Ica74dd6c35e6bd
-> [nova] *in progress* for juno
2/ Requirements bump:
* I08693891b2b4c1
-> [nova] merged in master, *patch needed* for kilo
3/ Actual Nova fix:
* Ie03acc00a7c904
-> [nova] merged in master, *in progress* for kilo, *patch needed* for juno
4/ What is missing:
* The priority is the oslo.incubator change ( https:/
* Once this gets in, we can update the sync ( https:/
* Seems like the nova requirements.txt oslo bump needs to be adjusted, 2.1.0 does not have the "ensure on_completion is called" fix...
* The kilo fix ( https:/
* The juno fix ( should depend on the sync )
Am I missing something here ?
Abhishek Kekane (abhishek-kekane) wrote : | #47 |
Hi Tristan,
Thank you for the summary.
We also need * I22b2d7bde87972
Reviewed: https:/
Committed: https:/
Submitter: Jenkins
Branch: stable/kilo
commit b5020a047fc487f
Author: abhishekkekane <email address hidden>
Date: Mon Jul 6 01:51:26 2015 -0700
libvirt: Kill rsync/scp processes before deleting instance
In the resize operation, during copying files from source to
destination compute node scp/rsync processes are not aborted after
the instance is deleted because linux kernel doesn't delete instance
files physically until all processes using the file handle is closed
completely. Hence rsync/scp process keeps on running until it
transfers 100% of file data.
Added new module instancejobtracker to libvirt driver which will add,
remove or terminate the processes running against particular instances.
Added callback methods to execute call which will store the pid of
scp/rsync process in cache as a key: value pair and to remove the
pid from the cache after process completion. Process id will be used to
kill the process if it is running while deleting the instance. Instance
uuid is used as a key in the cache and pid will be the value.
Conflicts:
SecurityImpact
Closes-bug: #1387543
Change-Id: Ie03acc00a7c904
(cherry picked from commit 7ab75d5b0b75fc3
Reviewed: https:/
Committed: https:/
Submitter: Jenkins
Branch: stable/juno
commit bf23643e36c8764
Author: abhishekkekane <email address hidden>
Date: Sat Aug 8 02:28:50 2015 -0700
Sync process utils from oslo for execute callbacks
---
The sync pulls in the following changes:
Ifc23325 Add 2 callbacks to processutils.
I22b2d7b processutils: ensure on_completion callback is always called
I59d5799 Let oslotest manage the six.move setting for mox
I245750f Remove `processutils` dependency on `log`
Ia5bb418 Fix exception message in openstack.
---
Related-Bug: 1387543
Change-Id: I22b2d7bde87972
tags: | added: in-stable-juno |
Fix proposed to branch: stable/juno
Review: https:/
Reviewed: https:/
Committed: https:/
Submitter: Jenkins
Branch: stable/juno
commit 539693e40388c47
Author: abhishekkekane <email address hidden>
Date: Mon Jul 6 01:51:26 2015 -0700
libvirt: Kill rsync/scp processes before deleting instance
In the resize operation, during copying files from source to
destination compute node scp/rsync processes are not aborted after
the instance is deleted because linux kernel doesn't delete instance
files physically until all processes using the file handle is closed
completely. Hence rsync/scp process keeps on running until it
transfers 100% of file data.
Added new module instancejobtracker to libvirt driver which will add,
remove or terminate the processes running against particular instances.
Added callback methods to execute call which will store the pid of
scp/rsync process in cache as a key: value pair and to remove the
pid from the cache after process completion. Process id will be used to
kill the process if it is running while deleting the instance. Instance
uuid is used as a key in the cache and pid will be the value.
Conflicts:
Note: The required unit-tests are manually added to the below path,
as new path for unit-tests is not present in stable/juno release.
nova/
nova/
SecurityImpact
Closes-bug: #1387543
Change-Id: Ie03acc00a7c904
(cherry picked from commit 7ab75d5b0b75fc3
(cherry picked from commit b5020a047fc487f
Tristan Cacqueray (tristan-cacqueray) wrote : Re: Resize/delete combo allows to overload nova-compute (CVE-2015-3241) | #52 |
Great, seems like all required changes are now merged.
In the upcoming OSSA ( https:/
Thanks everyone :)
Changed in ossa: | |
status: | In Progress → Fix Committed |
assignee: | nobody → Tristan Cacqueray (tristan-cacqueray) |
summary: |
- Resize/delete combo allows to overload nova-compute (CVE-2015-3241) + [OSSA 2015-015] Resize/delete combo allows to overload nova-compute + (CVE-2015-3241) |
Changed in ossa: | |
status: | Fix Committed → Fix Released |
George Shuklin (george-shuklin) wrote : | #53 |
Hello.
The CVE page is not filled in yet (http://
Jeremy Stanley (fungi) wrote : | #54 |
It's not uncommon for MITRE to take months or even years to fill in CVE details. We have no control over this.
Changed in nova: | |
milestone: | none → liberty-3 |
status: | Fix Committed → Fix Released |
Changed in nova: | |
milestone: | liberty-3 → 12.0.0 |
Thanks for the report! The OSSA task is set to incomplete pending project core security review.
At first glance, it seems a valid DoS avenue...