Libvirt migrations with rsync are slow

Bug #1478800 reported by Kalle Happonen on 2015-07-28
This bug affects 3 people
Affects: OpenStack Compute (nova)
Importance: Wishlist
Assigned to: Marian Horban

Bug Description

Setup:
CentOS 6 + RDO Icehouse (code seems to be the same in trunk)

When doing a nova migrate, the actual backing disk file is copied over with rsync. I assume the code came from this report
https://bugs.launchpad.net/nova/+bug/1025259

The rsync code uses the "-z" flag for compression. This is probably fine for lightly used disks. However, with a disk full of content, it gets very slow. Rsync is not multithreaded, so with a single E5-2670v2 core we get ~12 MB/s transfer speed (CPU bound). Given the modest compression that is achieved, this is significantly slower than copying without compression.

If possible, some speed tests should be done without compression for disk files with different content. There might not be a reason to use compression here at all.
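A rough way to see the effect described above without setting up a migration is to time single-core compression on data of different compressibility. The sketch below uses Python's zlib as a stand-in for rsync's "-z" path; it does not run rsync itself, and the sample size and sample contents are assumptions chosen only for illustration.

```python
import os
import time
import zlib


def compression_stats(data, level=6):
    """Return (compression ratio, throughput in MB/s) for one zlib pass."""
    start = time.perf_counter()
    compressed = zlib.compress(data, level)
    elapsed = time.perf_counter() - start
    ratio = len(data) / len(compressed)
    throughput = len(data) / 1e6 / elapsed if elapsed > 0 else float("inf")
    return ratio, throughput


if __name__ == "__main__":
    size = 16 * 1024 * 1024  # 16 MiB per sample (hypothetical size)
    line = b"def copy_image(src, dest, host=None):\n"
    samples = {
        # Highly repetitive data, standing in for a lightly used disk.
        "text-like": (line * (size // len(line) + 1))[:size],
        # Incompressible data, standing in for a disk full of content.
        "random": os.urandom(size),
    }
    for name, data in samples.items():
        ratio, mb_s = compression_stats(data)
        print(f"{name}: ratio {ratio:.2f}, {mb_s:.0f} MB/s on one core")
```

If the single-core throughput reported for the incompressible sample is below what the network can carry, compression is the bottleneck for that kind of content.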

tags: added: live-migration
Kalle Happonen (kalle-happonen) wrote :

Actually this is not live migration. It's "nova migrate", i.e. shut down vm, qemu-img convert, rsync over file, start vm.

tags: added: migration
removed: live-migration
Marian Horban (mhorban) on 2015-08-05
Changed in nova:
status: New → Confirmed
Marian Horban (mhorban) wrote :

I tested migration of instances with and without the compress flag in rsync in the
nova/virt/libvirt/utils.py:copy_image function and found that migration without the compression flag is faster.
This is because we now transfer files between nodes in qcow2 format, so they are already compressed and additional compression only takes time.
I can see two different solutions:
1. Remove the compression flag from the rsync command in the nova/virt/libvirt/utils.py:copy_image function.
2. Do not change nova's code at all, but configure rsync using the rsyncd.conf file. We can specify the "dont compress" parameter with the suffix "_rbase". Before the rsync command is executed, the image is renamed by adding the "_rbase" suffix. With such a global configuration we can avoid compressing the image even when the --compress option is passed to rsync.
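Option 1 amounts to dropping one flag from the rsync argument list. A minimal sketch of that idea follows; build_rsync_command is a hypothetical helper written for illustration, not nova's actual copy_image code, and the paths and host name in the usage lines are made up.

```python
def build_rsync_command(src, dest, host=None, compress=False):
    """Build an rsync argument list for copying a disk image.

    Hypothetical helper for illustration; not nova's copy_image.
    """
    cmd = ["rsync", "--sparse"]
    if compress:
        # Equivalent to "-z"; single-threaded, so CPU bound on large images.
        cmd.append("--compress")
    # A remote destination is written as host:path.
    target = f"{host}:{dest}" if host else dest
    cmd.extend([src, target])
    return cmd


if __name__ == "__main__":
    # Hypothetical paths and host name.
    print(build_rsync_command("/tmp/disk_rbase", "/tmp/disk", host="node2"))
    print(build_rsync_command("/tmp/disk_rbase", "/tmp/disk", host="node2",
                              compress=True))
```

Making compression an explicit, default-off argument keeps the old behaviour available for slow links without paying the CPU cost on every migration.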

Changed in nova:
importance: Undecided → Wishlist
status: Confirmed → Triaged
assignee: nobody → Marian Horban (mhorban)

Fix proposed to branch: master
Review: https://review.openstack.org/209957

Changed in nova:
status: Triaged → In Progress

How much faster is it without compression? And how much CPU time is saved?

Do you have test scripts and benchmark results that we can use to try and reproduce your results?

Marian Horban (mhorban) wrote :

The efficiency of rsync with compression depends on many variables: content type, processor speed on both sides, network speed, etc.
So I ran a test to compare rsync with and without compression.
In this test I used two qcow images of the same size, 10 GB:
1. The image data consists of text files (copies of the nova sources). This image compresses very well.
2. The image contains copies of an Ubuntu server ISO image. This image does not compress well.
The test machines had Intel i5 2.5 GHz processors and a 1000 Mbps network.
I copied each of the images with and without rsync compression.
rsync version 3.1.0, protocol version 31, was used for the tests.
Results:
1. rsync image with text data without compression:
real 3m37.876s
user 1m6.936s
sys 0m19.919s
user + sys = 87s

2. rsync image with text data with compression:
real 8m16.501s
user 6m55.878s
sys 0m19.695s
user + sys = 436s

3. rsync image with binary data without compression:
real 3m37.846s
user 1m6.098s
sys 0m20.485s
user + sys = 87s

4. rsync image with binary data with compression:
real 8m20.839s
user 6m34.998s
sys 0m20.554s
user + sys = 416s

Analyzing the results:
- I was surprised that rsync with compression used less CPU time for binary data than for text data (416s vs. 436s), although the wall-clock times were nearly the same.
- In the described environment, rsync without compression had a huge advantage for both images.
Since rsync uses only one processor core, we can't expect a big improvement in compression speed soon.
My conclusion is that compression could be useful only for networks with a speed of less than 10Mb.
Since networks with such slow performance are obsolete, rsync compression should be removed from the code. This fix will also allow us to decrease CPU consumption.
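The timings above can be turned into slowdown factors and approximate throughput with a few lines of arithmetic. This is only a sketch: it assumes each image file really occupied 10 GiB on disk (the comment says only "10Gb"), and the dictionary keys and helper names are made up for illustration.

```python
def to_seconds(minutes, seconds):
    """Convert the 'XmY.Zs' values reported by time(1) to seconds."""
    return minutes * 60 + seconds


IMAGE_MIB = 10 * 1024  # assumption: "10Gb" means 10 GiB transferred

# (wall-clock seconds, user + sys CPU seconds) taken from the results above.
RUNS = {
    ("text", "plain"): (to_seconds(3, 37.876), 87),
    ("text", "compressed"): (to_seconds(8, 16.501), 436),
    ("binary", "plain"): (to_seconds(3, 37.846), 87),
    ("binary", "compressed"): (to_seconds(8, 20.839), 416),
}


def slowdown(kind):
    """Wall-clock time with compression divided by time without it."""
    return RUNS[(kind, "compressed")][0] / RUNS[(kind, "plain")][0]


def cpu_ratio(kind):
    """CPU seconds with compression divided by CPU seconds without it."""
    return RUNS[(kind, "compressed")][1] / RUNS[(kind, "plain")][1]


if __name__ == "__main__":
    for (kind, mode), (real, cpu) in RUNS.items():
        print(f"{kind:6s} {mode:10s} {IMAGE_MIB / real:6.1f} MiB/s, "
              f"{cpu} CPU s")
    for kind in ("text", "binary"):
        print(f"{kind}: {slowdown(kind):.2f}x slower, "
              f"{cpu_ratio(kind):.2f}x more CPU with compression")
```

The throughput figures depend entirely on the 10 GiB assumption; the slowdown and CPU ratios do not, since the image size cancels out.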

Reviewed: https://review.openstack.org/209957
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=b7ad5a312a8cfbce60510f598a34da9358f25a4c
Submitter: Jenkins
Branch: master

commit b7ad5a312a8cfbce60510f598a34da9358f25a4c
Author: Marian Horban <email address hidden>
Date: Wed Aug 26 09:41:02 2015 -0400

    libvirt:Rsync compression removed

    During migration libvirt driver copies qcow image between nodes.
    Compression ratio of qcow disk image is small because such type
    of image doesn't allocate the whole image space to a file. It
    grows as data is added. But compression procedure takes many CPU
    time.

    Closes-Bug: #1478800
    Change-Id: Iabcbbb576f7e9411310c540badb805eb1bf21bf5

Changed in nova:
status: In Progress → Fix Committed
Thierry Carrez (ttx) on 2015-09-24
Changed in nova:
milestone: none → liberty-rc1
status: Fix Committed → Fix Released
Thierry Carrez (ttx) on 2015-10-15
Changed in nova:
milestone: liberty-rc1 → 12.0.0