Bug #1387543 “[OSSA 2015-015] Resize/delete combo allows to over...” : Bugs : OpenStack Compute (nova)

George Shuklin (george-shuklin) on 2014-10-30

Tristan Cacqueray (tristan-cacqueray) wrote on 2014-10-30:

#1

Thanks for the report! The OSSA task is set to Incomplete pending project core security review.
At first glance, it seems to be a valid DoS avenue...

Changed in ossa:
status:	New → Incomplete

George Shuklin (george-shuklin) wrote on 2015-03-16:

#2

This bug is no longer marked as a duplicate of https://bugs.launchpad.net/bugs/1392527, but it is very similar.

Thierry Carrez (ttx) wrote on 2015-03-17:

#3

Yes, we might want to issue a single OSSA for both, if a patch can be logged for this one soon enough.

Thierry Carrez (ttx) wrote on 2015-03-19:

#4

I think we can confirm it, since the other one was confirmed.

Changed in ossa:
importance:	Undecided → Medium
status:	Incomplete → Confirmed

Tony Breeds (o-tony) wrote on 2015-03-27:

#5

Looks like a real issue. Priority set to high just to show it's behind existing issues.

Changed in nova:
importance:	Undecided → High
status:	New → Confirmed

Thierry Carrez (ttx) wrote on 2015-04-02:

#6

Is the proposed solution (abort the migration by killing rsync/scp as soon as the instance is deleted) something that works for everyone? Anyone up for proposing a patch that would do that?
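To make the proposal concrete, here is a minimal sketch of "kill the copy as soon as the instance is deleted". It is not nova code: the `run_abortable` helper, the `abort_event` flag and the stand-in sleeping command are all hypothetical, and a real patch would have to hook into however nova actually launches rsync/scp on the source host.

```python
import subprocess
import sys
import threading


def run_abortable(cmd, abort_event, poll_interval=0.1):
    """Run cmd and terminate it as soon as abort_event is set.

    Returns (aborted, returncode). Hypothetical helper for illustration.
    """
    proc = subprocess.Popen(cmd)
    while proc.poll() is None:
        # Wait up to poll_interval for a delete request, then re-check
        # whether the copy has finished on its own.
        if abort_event.wait(poll_interval):
            proc.terminate()  # SIGTERM on POSIX
            proc.wait()       # reap the child so no zombie is left behind
            return True, proc.returncode
    return False, proc.returncode


if __name__ == "__main__":
    # A sleeping Python process stands in for a long-running rsync/scp.
    copy_cmd = [sys.executable, "-c", "import time; time.sleep(30)"]
    instance_deleted = threading.Event()
    # Simulate the instance being deleted one second into the copy.
    threading.Timer(1.0, instance_deleted.set).start()
    print(run_abortable(copy_cmd, instance_deleted))
```

The point of the sketch is only that the process doing the copy must be the one that gets signalled, which means the abort has to happen on the host where rsync/scp was started.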

George Shuklin (george-shuklin) wrote on 2015-04-02:

#7

For me it works in manual mode (go to the node and kill all scp/rsync processes in sight). Because this bug report is private, most tenants do not know about it, so it happens mostly with my own machines in the lab.

There is no patch so far. Sorry, I'm an operator; it's too deep and complicated for me.

The fix in https://bugs.launchpad.net/bugs/1392527 works on the destination host (as far as I understand), but scp or rsync should be killed on the source host.

Tushar Patil (tpatil) wrote on 2015-04-02:

#8

During a resize, while files are being copied from the source to the destination compute node, the scp/rsync processes are not aborted when the instance is deleted, because the Linux kernel does not physically delete the instance files until every process holding an open file handle has closed it. Hence the rsync/scp process keeps running until it has transferred 100% of the file data.
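The kernel behaviour described above can be reproduced with a small sketch, assuming a POSIX system such as Linux. The `read_after_unlink` helper is hypothetical and only mimics what rsync/scp experience: the path is gone, but the open handle still sees all the data.

```python
import os
import tempfile


def read_after_unlink(payload: bytes) -> bytes:
    """Write payload to a temp file, unlink it while it is open, then read it."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, payload)
    finally:
        os.close(fd)

    with open(path, "rb") as reader:
        # Remove the directory entry, as the instance delete would do.
        os.unlink(path)
        assert not os.path.exists(path)
        # The data is still reachable through the open handle; the kernel
        # frees the blocks only once the last handle is closed.
        return reader.read()


if __name__ == "__main__":
    data = read_after_unlink(b"x" * 1024)
    print(len(data), "bytes read after unlink")
```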

One way to abort the rsync/scp process is to truncate the files to 0 bytes before deleting the instance files in the delete operation. We have added code to do this and found that it aborts the rsync/scp process, and the file is not copied to the destination node.
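Why truncating helps can be shown with a small sketch, again assuming a POSIX system. The `read_after_truncate` helper is hypothetical and is not the code from our patch. It uses plain unbuffered reads, so the reader simply hits end-of-file early; rsync reports the same situation as a read error instead.

```python
import os
import tempfile


def read_after_truncate(payload: bytes, chunk: int = 4):
    """Read part of a file, truncate and unlink it, then try to read the rest."""
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, payload)
    finally:
        os.close(fd)

    # buffering=0 so nothing is prefetched into a userspace buffer.
    with open(path, "rb", buffering=0) as reader:
        first = reader.read(chunk)  # the copy is already in progress
        os.truncate(path, 0)        # truncate before deleting ...
        os.unlink(path)             # ... then delete, as in the delete operation
        rest = reader.read()        # the reader is now past end-of-file
    return first, rest


if __name__ == "__main__":
    first, rest = read_after_truncate(b"abcdefgh")
    print(first, rest)
```

Unlike a plain unlink, the truncate changes the file itself rather than just its directory entry, so a process that already holds the file open has nothing left to read.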

Previously rsync/scp copied 100% of the file data. With the truncate command added, if the instance is deleted while rsync is copying files, rsync raises the following ProcessExecutionError:

ProcessExecutionError: Unexpected error while running command.
   Command: rsync --sparse --compress /opt/stack/data/nova/instances/49225011-aefb-4262-b89c-50daa967fe98_resize/disk 10.69.4.130:/opt/stack/data/nova/instances/49225011-aefb-4262-b89c-50daa967fe98/disk
   Exit code: 23
   Stdout: u''
   Stderr: u'rsync: read errors mapping "/opt/stack/data/nova/instances/49225011-aefb-4262-b89c-50daa967fe98_resize/disk": No data available (61)\nfile has vanished: "/opt/stack/data/nova/instances/49225011-aefb-4262-b89c-50daa967fe98_resize/disk"\nrsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1183) [sender=3.1.0]\n'

As for timing, we have observed that the rsync/scp processes are aborted earlier than when copying 100% of the file, but unfortunately the difference is not that big: merely a 10-15% gain in time.

Results:

The following are the test results for rsync/scp processes with 25, 50, 75 and 100 GB disks.

*Command rsync, 25 GB*
Master Branch:
Processing Time: 9:40 mins
Source Node Files: Deleted
Destination Node Files: disk file of 25 GB present in instance directory

Truncate Patch:
Processing Time: 7:30 mins
Source Node Files: Deleted
Destination Node Files: empty instance directory

*Command scp, 25 GB*
Master Branch:
Processing Time: 5:11 mins
Source Node Files: Deleted
Destination Node Files: disk file of 25 GB present in instance directory

Truncate Patch:
Processing Time: 4:20 mins
Source Node Files: Deleted
Destination Node Files: disk file of 25 GB present in instance directory

*Command rsync, 50 GB*
Master Branch:
Processing Time: 17:23 mins
Source Node Files: Deleted
Destination Node Files: disk file of 50 GB present in instance directory

Truncate Patch:
Processing Time: 15:40 mins
Source Node Files: Deleted
Destination Node Files: empty instance directory

*Command scp, 50 GB*
Master Branch:
Processing Time: 11:25 mins
Source Node Files: Deleted
Destination Node Files: disk file of 50 GB present in instance directory

Truncate Patch:
Processing Time: 9:35 mins
Source Node Files: Deleted
Destination Node Files: disk file of 50 GB present in instance directory

*Command rsync, 75 GB*
Master Branch:
Processing Time: 25:53 mins
Source Node Files: Deleted
Destination Node Files: disk file of 75 GB present in instance directory

Truncate Pat...
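
For reference, the relative time saving can be worked out from the 25 GB and 50 GB results that appear in full above. This is only an arithmetic sketch; the `to_seconds` and `reduction_percent` helpers and the `RESULTS` table are illustrative names, and the later results are left out because they are cut off.

```python
def to_seconds(mmss: str) -> int:
    """Convert a 'minutes:seconds' string such as '9:40' to seconds."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


def reduction_percent(master: str, patched: str) -> float:
    """Percentage of processing time saved by the truncate patch."""
    before = to_seconds(master)
    after = to_seconds(patched)
    return 100.0 * (before - after) / before


# (command, disk size in GB) -> (master branch time, truncate patch time),
# copied from the results listed above.
RESULTS = {
    ("rsync", 25): ("9:40", "7:30"),
    ("scp", 25): ("5:11", "4:20"),
    ("rsync", 50): ("17:23", "15:40"),
    ("scp", 50): ("11:25", "9:35"),
}

if __name__ == "__main__":
    for (command, size_gb), (master, patched) in RESULTS.items():
        saved = reduction_percent(master, patched)
        print(f"{command} {size_gb} GB: {saved:.1f}% less time")
```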