qemu-img "-t none" parameter fails image conversion on tmpfs

Bug #1375487 reported by Eric Harney
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
Cinder
Fix Released
High
John Griffith

Bug Description

I ran into this running the "test_snapshot_create_with_volume_in_use" test on the RBD driver.

We pass "-t none" to qemu_img which enables O_DIRECT on the output file. If the file system does not support this, the image convert operation fails.

On my machine, the output is in /tmp/ which is mounted with tmpfs, and removing "-t none" allows conversion to work.

We should not pass this option if it is not supported by the destination. (Similar to how we test for O_DIRECT support with dd commands in copy_volume.)

strace will show:
28785 open("/tmp/tmpzP2s8_", O_RDWR|O_DIRECT|O_CLOEXEC) = -1 EINVAL (Invalid argument)
when this happens, resulting in the error:
ImageCopyFailure: Failed to copy image to volume: qemu-img: Could not open '/tmp/tmpzP2s8_': Invalid argument

Introduced in Juno by:
https://review.openstack.org/#/c/112033/
5c5f3c0 Avoid using the disk cache on volume initialisation

Eric Harney (eharney)
tags: added: juno-rc-potential
Revision history for this message
Jon Bernard (jbernard) wrote :

Why not remove the use of O_DIRECT, unless someone has a supporting benchmark (which I doubt), there's likely to be no measurable performance gain. If we're worried about efficient page cache management, then we have other options besides O_DIRECT.

Changed in cinder:
milestone: none → juno-rc2
importance: Undecided → High
Revision history for this message
Duncan Thomas (duncan-thomas) wrote :

-t is needed for the c-group rate limiting code, I believe? Removing it completely will entirely break that. It also solved some data consistency problems for certain drivers.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (master)

Fix proposed to branch: master
Review: https://review.openstack.org/126637

Changed in cinder:
status: New → In Progress
Revision history for this message
John Griffith (john-griffith) wrote :

So just a recap, this was originally added due to a customer issue using StorWiz found that the cache wasn't flushed before disconnect causing a bogus image create on the volume. So the '-t none' addresses that.

I've added a check to make sure the device support direct before setting this now so it should cover both cases.

Changed in cinder:
assignee: nobody → John Griffith (john-griffith)
Revision history for this message
Eric Harney (eharney) wrote :

> -t is needed for the c-group rate limiting code, I believe? Removing it completely will entirely break that.

I saw this comment in the code but can't find any explanation for it... does anyone know where this comes from?

Revision history for this message
John Griffith (john-griffith) wrote :

See my post in comment #4

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (proposed/juno)

Fix proposed to branch: proposed/juno
Review: https://review.openstack.org/127116

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to cinder (master)

Reviewed: https://review.openstack.org/126637
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=c42273fbc1983b146180c82b8a34b0d832a6f431
Submitter: Jenkins
Branch: master

commit c42273fbc1983b146180c82b8a34b0d832a6f431
Author: John Griffith <email address hidden>
Date: Tue Oct 7 11:49:58 2014 -0600

    Make sure device support Direct before setting

    We added '-t none' option to the qemu-img convert operation
    in image_utils.py a while back to accomodate a couple of
    backend devices that didn't flush writes on disconnect.
    (Change: I7a04f683add8c23b9125fe837c4048ccc3ac224d)

    The only problem here is that some backend devices don't
    support Direct mode and raise an exception and fail when
    setting this option.

    This patch adds a simple check using dd to see if the dest
    supports the Direct flag and only sets '-t none' if the device
    does in fact support it.

    Additionally it was brought up that even yet other backends
    are using file devices not blk devices. In their case setting
    Direct will still work, however it's sub-optimal as qemu-convert
    has internal mechanisms to make sure flushing etc are done
    correctly and efficiently for those devices. So to accomodate
    that particular use case I'm also adding a check if blk dev
    that can be used for determining whether to set Direct for the
    qemu-convert process.

    Change-Id: I34127ac373ceadcfb6fc2662628b1a91eb7b0046
    Closes-Bug: 1375487

Changed in cinder:
status: In Progress → Fix Committed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to cinder (proposed/juno)

Reviewed: https://review.openstack.org/127116
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=aaecfcf15e6b9defde5822453f2ae97aaf959408
Submitter: Jenkins
Branch: proposed/juno

commit aaecfcf15e6b9defde5822453f2ae97aaf959408
Author: John Griffith <email address hidden>
Date: Tue Oct 7 11:49:58 2014 -0600

    Make sure device support Direct before setting

    We added '-t none' option to the qemu-img convert operation
    in image_utils.py a while back to accomodate a couple of
    backend devices that didn't flush writes on disconnect.
    (Change: I7a04f683add8c23b9125fe837c4048ccc3ac224d)

    The only problem here is that some backend devices don't
    support Direct mode and raise an exception and fail when
    setting this option.

    This patch adds a simple check using dd to see if the dest
    supports the Direct flag and only sets '-t none' if the device
    does in fact support it.

    Additionally it was brought up that even yet other backends
    are using file devices not blk devices. In their case setting
    Direct will still work, however it's sub-optimal as qemu-convert
    has internal mechanisms to make sure flushing etc are done
    correctly and efficiently for those devices. So to accomodate
    that particular use case I'm also adding a check if blk dev
    that can be used for determining whether to set Direct for the
    qemu-convert process.

    Change-Id: I34127ac373ceadcfb6fc2662628b1a91eb7b0046
    Closes-Bug: 1375487
    (cherry picked from commit c42273fbc1983b146180c82b8a34b0d832a6f431)

Changed in cinder:
status: Fix Committed → Fix Released
Thierry Carrez (ttx)
Changed in cinder:
milestone: juno-rc2 → 2014.2
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (master)

Fix proposed to branch: master
Review: https://review.openstack.org/128920

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on cinder (master)

Change abandoned by Mike Perez (<email address hidden>) on branch: master
Review: https://review.openstack.org/128920

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to cinder (master)
Download full text (11.8 KiB)

Reviewed: https://review.openstack.org/128920
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=66494f54112fdfa135b3974c75aa388c8d1fb49e
Submitter: Jenkins
Branch: master

commit be3d4604dc0566e0838959d998ff1d37755de6d3
Author: Tomoki Sekiyama <email address hidden>
Date: Tue Oct 14 19:09:44 2014 -0400

    Fix LVM iSCSI driver tgtadm CHAP authentication

    Currently CHAP Authentication in LVM iSCSI driver with tgtadm does not work.
    This is because the tgtadm helper creates the target configuration file
    with an 'IncomingUser' entry, which is ignored by tgtd.
    This patch fixes it to 'incominguser'.

    Change-Id: I14871985a2a916834122f849238f05b75726bc1a
    Closes-Bug: #1329214
    (cherry picked from commit e3563891545c801726d227f752cf99488ed5c7dd)

commit f7ee62cc58d8b642af67510a310f6259492a4508
Author: Mitsuhiro Tanino <email address hidden>
Date: Tue Oct 14 12:41:41 2014 -0400

    Export cinder volumes only if the status is 'in-use'

    Currently, cinder volumes are exported both 'in-use' and 'available'
    after restarting cinder-volume service.
    This behavior was introduced following commit.

      commit ffefe18334a9456250e1b6ff88b7b47fb366f374
      Author: Zhiteng Huang <email address hidden>
      Date: Sat Aug 23 18:32:57 2014 +0000

    If the volumes are attached to nova instances, they should be exported
    via tgtd after restarting cinder-volume.
    But the volumes which are not attached to instances must not be exported
    because everyone can connect these volumes.

    This patch changes volume export behavior that exports a volume only if
    the volume status is 'in-use'.

    Change-Id: I4c598c240b9290c81bd8001e5a0720c8c329aeb9
    Signed-off-by: Mitsuhiro Tanino <email address hidden>
    Closes-bug: #1381106
    (cherry picked from commit e2f28b967910625432be0eab6a851adf53ac58ea)

commit 01e7c516852e53df661b2eedc970c327c1ff10ce
Author: Vipin Balachandran <email address hidden>
Date: Fri Oct 10 23:06:27 2014 +0530

    Revert "Relocate volume to compliant datastore"

    Commit 4be8913520f5e9fe4109ade101da9509e4a83360 introduced a regression
    which causes failures during cinder volume re-attach. This patch reverts
    commit 4be8913520f5e9fe4109ade101da9509e4a83360 as an immediate fix.

    Closes-Bug: #1379830
    Change-Id: I5dfbd45533489c3c81db8d256bbfd2f85614a357
    (cherry picked from commit 48cb82971e0418f9a629e2b39d0433dc2c0e6919)

commit 900d49723f65e87658381ff955559f54ac98c487
Author: Andreas Jaeger <email address hidden>
Date: Thu Oct 9 12:25:28 2014 +0200

    Updated translations

    Commands run:-
    $ python setup.py extract_messages
    $ python setup.py update_catalog --no-fuzzy-matching \
      --ignore-obsolete=true
    $ source \
      ../openstack-infra/project-config/jenkins/scripts/common_translation_update.sh
    $ setup_loglevel_vars
    $ cleanup_po_files cinder

    Change-Id: I73f3bdccb4be98df95fa853864e465f4d83a8884

commit 8e94aaa2b28b491314fe8642061ac73e3fe8e966
Author: Navneet Singh <email address hidden>
Date: Thu Aug 28 16:03:41 2014 +0530

    NetApp fix eseries unit test mock clean

 ...

To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.