Fix QEMU cache mode used for image conversion and Nova instances

Bug #1818847 reported by Kashyap Chamarthy
This bug affects 1 person
Affects                   Status        Importance  Assigned to        Milestone
OpenStack Compute (nova)  Fix Released  Medium      Kashyap Chamarthy

Bug Description

Nova uses QEMU's disk image cache modes in two main areas:

(1) When deciding what cache mode to use for the target disk image when
    converting (using `qemu-img convert`) images from one format to
    another (qcow2 <-> raw).

    See unprivileged_convert_image() in nova/privsep/qemu.py.

(2) When configuring cache modes for running guests (Nova instances).
    Nova tells libvirt what cache mode to use, and libvirt will in turn
    configure block devices via QEMU (using its '-drive' command-line
    option).

    See disk_cachemode() in nova/virt/libvirt/driver.py. (The "volume
    drivers" for SMBFS and Virtuozzo Storage also use 'writethrough';
    refer to smbfs.py and vzstorage.py.)
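
For area (2), the cache mode that Nova hands to libvirt ends up as the `cache` attribute on the disk's `<driver>` element in the domain XML. A minimal sketch of that element follows; `disk_driver_xml` is a hypothetical helper written for illustration and is not Nova's actual code.

```python
import xml.etree.ElementTree as ET


def disk_driver_xml(image_format: str, cache_mode: str) -> str:
    """Return a libvirt <driver> element for a QEMU-backed disk.

    Hypothetical helper: Nova builds this element through its own
    config classes, not through this function.
    """
    driver = ET.Element(
        "driver",
        {"name": "qemu", "type": image_format, "cache": cache_mode},
    )
    return ET.tostring(driver, encoding="unicode")


# The element libvirt would receive for a qcow2 disk using cache mode 'none'.
print(disk_driver_xml("qcow2", "none"))
```

libvirt then translates the attribute into the matching cache setting on QEMU's '-drive' option.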

In both cases Nova uses a combination of QEMU's cache modes 'none' and
'writethrough'. But that is incorrect, because of our misunderstanding
of how cache modes work. E.g. Nova's libvirt driver currently assumes
(refer to disk_cachemode()) that the 'writethrough' and 'none' cache
modes have the same behaviour with respect to host crash safety, which
is not at all true.

Fix these wrong assumptions.

(Also consult the QEMU Block Layer developers to double-check the
behaviour of cache modes and where they are applicable.)

Tags: libvirt
summary: - Fix QEMU cache mode for image conversion and Nova instances
+ Fix QEMU cache mode used for image conversion and Nova instances
tags: added: libvirt
OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (master)

Fix proposed to branch: master
Review: https://review.openstack.org/641981

Changed in nova:
assignee: nobody → Kashyap Chamarthy (kashyapc)
status: New → In Progress
Changed in nova:
importance: Undecided → Low
Changed in nova:
importance: Low → Medium
Changed in nova:
assignee: Kashyap Chamarthy (kashyapc) → Sylvain Bauza (sylvain-bauza)
Changed in nova:
assignee: Sylvain Bauza (sylvain-bauza) → Kashyap Chamarthy (kashyapc)
Changed in nova:
assignee: Kashyap Chamarthy (kashyapc) → Eric Fried (efried)
OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/640781
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=e7b64eaad82db38dd46f586b650da4ddde42533b
Submitter: Zuul
Branch: master

commit e7b64eaad82db38dd46f586b650da4ddde42533b
Author: Kashyap Chamarthy <email address hidden>
Date: Thu Feb 28 12:33:12 2019 +0100

    qemu: Make disk image conversion dramatically faster

    tl;dr: Use 'writeback' instead of 'writethrough' as the cache mode of
    the target image for `qemu-img convert`. Two reasons: (a) if the image
    conversion completes successfully, then 'writeback' calls fsync() to
    safely write data to the physical disk; and (b) 'writeback' makes the
    image conversion a _lot_ faster.

    Back-of-the-envelope "benchmark" (on an SSD)
    --------------------------------------------

    (Ran both the tests thrice each; version: qemu-img-2.11.0)

    With 'writethrough':

        $> time (qemu-img convert -t writethrough -f qcow2 -O raw \
                Fedora-Cloud-Base-29.qcow2 Fedora-Cloud-Base-29.raw)
        real 1m43.470s
        user 0m8.310s
        sys 0m3.661s

    With 'writeback':

        $> time (qemu-img convert -t writeback -f qcow2 -O raw \
                Fedora-Cloud-Base-29.qcow2 5-Fedora-Cloud-Base-29.raw)

        real 0m7.390s
        user 0m5.179s
        sys 0m1.780s

    I.e. ~103 seconds of elapsed wall-clock time for 'writethrough' vs. ~7
    seconds for 'writeback' -- IOW, 'writeback' is roughly _14_ times faster!
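
The ratio can be checked directly from the two `real` timings quoted above. This is only a sketch of the arithmetic; `to_seconds` is a hypothetical helper for parsing the `time` output format.

```python
import re


def to_seconds(elapsed: str) -> float:
    """Convert a `time`-style value such as '1m43.470s' to seconds."""
    match = re.fullmatch(r"(\d+)m([\d.]+)s", elapsed)
    if match is None:
        raise ValueError(f"unrecognised duration: {elapsed!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


# 'real' timings from the two runs in the benchmark.
writethrough = to_seconds("1m43.470s")
writeback = to_seconds("0m7.390s")
speedup = writethrough / writeback

print(f"{writethrough:.2f}s vs {writeback:.2f}s: {speedup:.1f}x")
# prints "103.47s vs 7.39s: 14.0x"
```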

    Details
    -------

    Nova commit e6ce9557f84cdcdf4ffdd12ce73a008c96c7b94a ("qemu-img do not
    use cache=none if no O_DIRECT support") was introduced to make instances
    boot on filesystems that don't support 'O_DIRECT' (which bypasses the
    host page cache and flushes data directly to the disk), such as 'tmpfs'.
    In doing so it introduced the 'writethrough' cache for the target image
    for `qemu-img convert`.

    This patch proposes to change that to 'writeback'.

    Let's address the 'safety' concern:

      "What about data integrity in the event of a host crash (especially
       on shared file systems such as NFS)?"

    Answer: If the host crashes mid-way during image conversion, then
    neither "data integrity" nor the cache mode in use matters. But if the
    image conversion completes _successfully_, then 'writeback' will safely
    write the data to the physical disk, just as 'writethrough' does.

    So we are as safe as we can be, but with the extra benefit of image
    conversion being _much_ faster.
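
As a rough illustration of the proposed change, the conversion command can be assembled with the target cache mode passed explicitly through `-t`, as in the benchmark above. The function name and its parameters are assumptions made for this sketch; the real logic lives in unprivileged_convert_image() and may differ.

```python
import shlex


def convert_command(source, dest, in_format, out_format,
                    cache_mode="writeback"):
    """Build the argv for `qemu-img convert`.

    Hypothetical helper: only the flags shown in the benchmark
    (-t, -f, -O) are used here.
    """
    return [
        "qemu-img", "convert",
        "-t", cache_mode,      # cache mode for the target image
        "-f", in_format,       # source image format
        "-O", out_format,      # target image format
        source, dest,
    ]


cmd = convert_command(
    "Fedora-Cloud-Base-29.qcow2", "Fedora-Cloud-Base-29.raw",
    "qcow2", "raw",
)
print(shlex.join(cmd))
```

Passing the list to `subprocess.run(cmd, check=True)` would run the conversion on a host where `qemu-img` is installed.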

            * * *

    The `qemu-img convert` command defaults to 'cache=writeback' for the
    source image. And 'cache=unsafe' for the target, because if `qemu-img`
    "crashes during the conversion, the user will throw away the broken
    output file anyway and start over"[1]. And `qemu-img convert`
    supports[2] fsync() for the target image since QEMU 1.1 (2012).

    [1] https://git.qemu.org/?p=qemu.git;a=commitdiff;h=1bd8e175
        -- "qemu-img convert: Use cache=unsafe for output image"
    [2] https://git.qemu.org/?p=qemu.git;a=commitdiff;h=8...


Changed in nova:
status: In Progress → Fix Released
Changed in nova:
assignee: Eric Fried (efried) → Kashyap Chamarthy (kashyapc)
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 19.0.0.0rc1

This issue was fixed in the openstack/nova 19.0.0.0rc1 release candidate.

OpenStack Infra (hudson-openstack) wrote : Fix merged to nova (master)

Reviewed: https://review.openstack.org/641981
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=b9dc86d8d646472195070022ff7ae4c372bef4ca
Submitter: Zuul
Branch: master

commit b9dc86d8d646472195070022ff7ae4c372bef4ca
Author: Kashyap Chamarthy <email address hidden>
Date: Mon Mar 4 17:20:53 2019 +0100

    libvirt: Use 'writeback' QEMU cache mode when 'none' is not viable

    When configuring QEMU cache modes for Nova instances, we use
    'writethrough' when 'none' is not available. But that's not correct,
    because of our misunderstanding of how cache modes work. E.g. the
    function disk_cachemode() in the libvirt driver assumes that
    'writethrough' and 'none' cache modes have the same behaviour with
    respect to host crash safety, which is not at all true.

    The misunderstanding and complexity stems from not realizing that each
    QEMU cache mode is a shorthand to toggle *three* booleans. Refer to the
    convenient cache mode table in the code comment (in
    nova/virt/libvirt/driver.py).

    As Kevin Wolf (thanks!), QEMU Block Layer maintainer, explains (I made
    a couple of micro edits for clarity):

        The thing that makes 'writethrough' so safe against host crashes is
        that it never keeps data in a "write cache", but it calls fsync()
        after _every_ write. This is also what makes it horribly slow. But
        'cache=none' doesn't do this and therefore doesn't provide this kind
        of safety. The guest OS must explicitly flush the cache in the
        right places to make sure data is safe on the disk. And OSes do
        that.

        So if 'cache=none' is safe enough for you, then 'cache=writeback'
        should be safe enough for you, too -- because both of them have the
        boolean 'cache.writeback=on'. The difference is only in
        'cache.direct', but 'cache.direct=on' only bypasses the host kernel
        page cache and data could still sit in other caches that could be
        present between QEMU and the disk (such as commonly a volatile write
        cache on the disk itself).
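
The "three booleans" view can be written down as a small table. The sketch below is an illustration, not the table from Nova's code comment: the third flag, `cache.no-flush`, and the rows for 'directsync' and 'unsafe' come from QEMU's documentation rather than from this report, and `differing_flags` is a hypothetical helper.

```python
FLAGS = ("cache.writeback", "cache.direct", "cache.no-flush")

# Each QEMU cache mode is shorthand for a setting of the three flags above.
CACHE_MODES = {
    "writeback":    (True,  False, False),
    "none":         (True,  True,  False),
    "writethrough": (False, False, False),
    "directsync":   (False, True,  False),
    "unsafe":       (True,  False, True),
}


def differing_flags(mode_a, mode_b):
    """Return the names of the flags on which two cache modes disagree."""
    pairs = zip(FLAGS, CACHE_MODES[mode_a], CACHE_MODES[mode_b])
    return [flag for flag, a, b in pairs if a != b]


print(differing_flags("none", "writeback"))     # ['cache.direct']
print(differing_flags("none", "writethrough"))  # ['cache.writeback', 'cache.direct']
```

'none' and 'writeback' differ only in `cache.direct`, whereas 'none' and 'writethrough' also differ in `cache.writeback`, which is the flag that governs the fsync()-after-every-write behaviour described above.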

    So use 'writeback' mode instead of the debilitatingly slow
    'writethrough' for cases where the O_DIRECT-based 'none' is unsupported.
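
    The resulting fallback rule can be sketched as follows. Both functions
    are hypothetical and simplified: Nova has its own O_DIRECT probe, and
    this one only checks whether a file in the given directory can be opened
    with the flag.

```python
import os
import tempfile


def supports_direct_io(directory):
    """Probe whether a file in `directory` can be opened with O_DIRECT."""
    o_direct = getattr(os, "O_DIRECT", None)
    if o_direct is None:
        # The platform does not define O_DIRECT at all.
        return False
    fd, path = tempfile.mkstemp(dir=directory)
    os.close(fd)
    try:
        os.close(os.open(path, os.O_RDWR | o_direct))
        return True
    except OSError:
        # The filesystem rejected the O_DIRECT open.
        return False
    finally:
        os.unlink(path)


def fallback_cache_mode(direct_io_ok):
    """Prefer 'none'; fall back to 'writeback' when O_DIRECT is unusable."""
    return "none" if direct_io_ok else "writeback"


print(fallback_cache_mode(supports_direct_io(tempfile.gettempdir())))
```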

    Do the minimum required update to the `disk_cachemodes` config help
    text. (In a future patch, rewrite the cache modes documentation to fix
    confusing fragments and outdated information.)

    Closes-Bug: #1818847
    Change-Id: Ibe236988af24a3b43508eec4efbe52a4ed05d45f
    Signed-off-by: Kashyap Chamarthy <email address hidden>
    Looks-good-to-me'd-by: Kevin Wolf <email address hidden>

OpenStack Infra (hudson-openstack) wrote : Related fix merged to nova (master)

Reviewed: https://review.openstack.org/643376
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=44edfa045c69b18b0a991475cb9c1d87b5941d52
Submitter: Zuul
Branch: master

commit 44edfa045c69b18b0a991475cb9c1d87b5941d52
Author: Kashyap Chamarthy <email address hidden>
Date: Thu Mar 14 14:28:12 2019 +0100

    libvirt: vzstorage: Use 'writeback' QEMU cache mode

    The 'writethrough' cache mode is terribly slow. Use the saner
    alternative, "writeback" cache mode.

    See further details in the documentation of change Ibe236988af2
    ("libvirt: Use 'writeback' QEMU cache mode when 'none' is not viable").

    Related-Bug: #1818847

    Change-Id: I5aae681bc8e8feb7703a89f91b942360dd69c35d
    Signed-off-by: Kashyap Chamarthy <email address hidden>

OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.openstack.org/643377
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=425f17e2af864c8c3068044be70f05180ec79b1b
Submitter: Zuul
Branch: master

commit 425f17e2af864c8c3068044be70f05180ec79b1b
Author: Kashyap Chamarthy <email address hidden>
Date: Thu Mar 14 14:20:33 2019 +0100

    libvirt: smbfs: Use 'writeback' QEMU cache mode

    The 'writethrough' cache mode is terribly slow. Use the saner
    alternative, "writeback" cache mode.

    See further details in the documentation of change Ibe236988af2
    ("libvirt: Use 'writeback' QEMU cache mode when 'none' is not viable").

    Related-Bug: #1818847

    Change-Id: I2d84c8576c01202fa4ec73a57aa739e2a2f66d06
    Signed-off-by: Kashyap Chamarthy <email address hidden>

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 20.0.0.0rc1

This issue was fixed in the openstack/nova 20.0.0.0rc1 release candidate.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to nova (stable/stein)

Fix proposed to branch: stable/stein
Review: https://review.opendev.org/747163

OpenStack Infra (hudson-openstack) wrote : Change abandoned on nova (stable/stein)

Change abandoned by "Elod Illes <email address hidden>" on branch: stable/stein
Review: https://review.opendev.org/c/openstack/nova/+/747163
Reason: This branch transitioned to End of Life for this project; open patches need to be closed to be able to delete the branch.
