Spawn may fail when cache=none on block device with logical block size > 512

Bug #1801702 reported by Alexandre arents on 2018-11-05
This bug affects 2 people
Affects                     Importance  Assigned to
OpenStack Compute (nova)    Medium      Dr. Jens Harbott
  Ocata                     Medium      Elod Illes
  Pike                      Medium      Dr. Jens Harbott
  Queens                    Medium      Dr. Jens Harbott
  Rocky                     Medium      Dr. Jens Harbott

Bug Description

Description
===========
When we spawn instances without a cache (cache='none') on a file system, there is a check in the nova code that tests whether the file system supports direct I/O:
https://github.com/openstack/nova/blob/master/nova/privsep/utils.py#L34
Because this test uses a 512-byte alignment size, it seems to fail on newer block devices that have a logical block size > 512 bytes, such as NVMe:

parted /dev/nvme0n1 print | grep "Sector size"
Sector size (logical/physical): 4096B/4096B
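The same logical sector size can also be read without parted. A minimal sketch, assuming Linux sysfs and a whole-disk kernel name such as nvme0n1; the helper `logical_block_size` is hypothetical and not part of nova:

```python
import os


def logical_block_size(device, sysfs_root="/sys"):
    """Return the logical block size, in bytes, of a whole block device.

    Reads the Linux sysfs attribute
    <sysfs_root>/block/<device>/queue/logical_block_size, the value parted
    reports as the logical sector size. `device` is a kernel name such as
    "nvme0n1" (a whole disk, not a partition).
    """
    path = os.path.join(sysfs_root, "block", device, "queue",
                        "logical_block_size")
    with open(path) as f:
        return int(f.read().strip())
```

On the device shown above, `logical_block_size("nvme0n1")` would be expected to return 4096, i.e. larger than the 512-byte alignment the check uses.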

The reason seems to be that the alignment size for direct I/O must be a multiple of the logical block size of the underlying device (not of the file system block size), as explained here:

http://man7.org/linux/man-pages/man2/open.2.html
 O_DIRECT
       ...
       Under Linux 2.4, transfer sizes, and the alignment of the user buffer
       and the file offset must all be multiples of the logical block size
       of the filesystem. Since Linux 2.6.0, alignment to the logical block
       size of the underlying storage (typically 512 bytes) suffices
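The quoted rule can be written as a small predicate. This is an illustrative sketch: `direct_io_aligned` is a hypothetical helper, and the buffer address is passed as a plain integer instead of coming from a real allocation.

```python
def direct_io_aligned(buf_addr, offset, length, logical_block_size):
    """Check the Linux >= 2.6 O_DIRECT rule quoted above.

    Buffer address, file offset and transfer length must all be multiples
    of the logical block size of the underlying storage.
    """
    return all(value % logical_block_size == 0
               for value in (buf_addr, offset, length))


# A 512-byte write from a page-aligned buffer at offset 0:
print(direct_io_aligned(4096, 0, 512, 512))    # True  (512-byte sectors)
print(direct_io_aligned(4096, 0, 512, 4096))   # False (4096-byte sectors)
print(direct_io_aligned(4096, 0, 4096, 4096))  # True  (4096-byte write)
```

The second case is the situation described in this report: the buffer may be aligned, but a 512-byte transfer length is not a multiple of a 4096-byte logical block.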

Because this test fails, the cache value falls back to "writethrough", which has the following consequences:
1) QEMU runs without direct I/O even though the device and file system support it (only with a larger block size)
2) QEMU fails to start, because cache=writethrough may conflict with other device parameters such as "io=native", with the following message:

2018-08-22 20:50:41.226 80512 ERROR oslo_messaging.rpc.server File "/usr/lib/python2.7/dist-packages/libvirt.py", line 1065, in createWithFlags
2018-08-22 20:50:41.226 80512 ERROR oslo_messaging.rpc.server if ret == -1: raise libvirtError ('virDomainCreateWithFlags() failed', dom=self)
2018-08-22 20:50:41.226 80512 ERROR oslo_messaging.rpc.server libvirtError: unsupported configuration: native I/O needs either no disk cache or directsync cache mode, QEMU will fallback to aio=threads
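How a false negative from the probe becomes a start failure can be sketched as below. Both functions are hypothetical simplifications: the fallback mirrors the behaviour described above, and the constraint is taken only from the libvirt error message; the real nova and libvirt code paths are more involved.

```python
def fallback_cache_mode(direct_io_supported):
    """Hypothetical mirror of the fallback described above: cache=none
    when the direct I/O probe succeeds, writethrough otherwise."""
    return "none" if direct_io_supported else "writethrough"


def native_io_allowed(cache_mode):
    """Constraint taken from the libvirt error message above: native I/O
    needs either no disk cache or directsync cache mode."""
    return cache_mode in ("none", "directsync")


# A false negative from the probe turns into a libvirt start failure
# when the disk is also configured with io=native.
mode = fallback_cache_mode(direct_io_supported=False)
print(mode, native_io_allowed(mode))  # writethrough False
```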

Steps to reproduce
==================
To reproduce the spawn issue, put the instances on a file system whose block device has a logical block size > 512 bytes (typically NVMe with a 4096- or 8192-byte sector size), and set in nova.conf:
images_type=raw
preallocate_images=space

Solution
========
Can we consider increasing align_size from 512 bytes to 8192 bytes, as that should work in most cases?
Is there any other reason to keep 512 bytes?

Setting it to 4096 or 8192 fixes the issue in my environment.
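The reasoning behind picking one larger constant is that it must be a multiple of every sector size it is meant to cover. A small sketch of that arithmetic; `covering_alignment` is a hypothetical helper, not nova code:

```python
from functools import reduce
from math import gcd


def lcm(a, b):
    """Least common multiple of two positive integers."""
    return a * b // gcd(a, b)


def covering_alignment(sector_sizes):
    """Smallest alignment that is a multiple of every given sector size."""
    return reduce(lcm, sector_sizes)


print(covering_alignment([512, 4096]))        # 4096
print(covering_alignment([512, 4096, 8192]))  # 8192
```

Because sector sizes are powers of two, the result is simply the largest size in the list; the main cost of a larger value appears to be that the probe writes a few more bytes.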

Environment
===========
I hit the issue on Newton, but the same check with a 512-byte alignment exists on master.

summary: - Spawn may failed when cache=none on block device with logical block size > 512
         + Spawn may fail when cache=none on block device with logical block size > 512
Dr. Jens Harbott (j-harbott) wrote :

We are affected by this on stable/pike and stable/queens now, after we had to enable preallocate_images=space in order to avoid some out-of-space issues. We have been running on NVMe for local storage with a block size of 4096 for some time for performance reasons.

I haven't seen any disks with block size 4096 yet, so setting the value to 4096 would be fine for us.

Dr. Jens Harbott (j-harbott) wrote :

Erm, I meant to write "... any disks with block size 8192 ..." of course.

Matt Riedemann (mriedem) wrote :

Maybe the check should try 8192, 4096 and then 512, and only if all three fail consider direct I/O not supported.
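The suggested ordering can be sketched as a loop over candidate alignments. This is a hypothetical illustration: `first_working_alignment` is not a nova function, and the actual O_DIRECT write is injected as a callable so the ordering logic can be shown without real hardware.

```python
def first_working_alignment(try_write, sizes=(8192, 4096, 512)):
    """Return the first alignment for which try_write(size) succeeds.

    try_write is any callable that attempts an O_DIRECT write of `size`
    bytes and returns True or False. Returns None if every size fails,
    meaning direct I/O is treated as unsupported.
    """
    for size in sizes:
        if try_write(size):
            return size
    return None


# Stand-in writer that behaves like a device with 4096-byte logical blocks:
print(first_working_alignment(lambda size: size % 4096 == 0))  # 8192
```

Note that the first candidate already succeeds on a 4096-byte device: since 8192 is a multiple of both 4096 and 512, a device accepting either smaller alignment should also accept 8192, so the fallbacks mainly guard against unexpected failures.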

Changed in nova:
status: New → Triaged
importance: Undecided → High
Matt Riedemann (mriedem) on 2018-11-08
Changed in nova:
importance: High → Medium

Fix proposed to branch: master
Review: https://review.openstack.org/616580

Changed in nova:
assignee: nobody → Dr. Jens Harbott (j-harbott)
status: Triaged → In Progress
Changed in nova:
assignee: Dr. Jens Harbott (j-harbott) → melanie witt (melwitt)
melanie witt (melwitt) on 2018-11-13
Changed in nova:
assignee: melanie witt (melwitt) → Dr. Jens Harbott (j-harbott)

Reviewed: https://review.openstack.org/616580
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=14d98ef1b48ca7b2ea468a8f1ec967b954955a63
Submitter: Zuul
Branch: master

commit 14d98ef1b48ca7b2ea468a8f1ec967b954955a63
Author: Jens Harbott <email address hidden>
Date: Thu Nov 8 15:06:26 2018 +0000

    Make supports_direct_io work on 4096b sector size

    The current check uses an alignment of 512 bytes and will fail when the
    underlying device has sectors of size 4096 bytes, as is common e.g. for
    NVMe disks. So use an alignment of 4096 bytes, which is a multiple of
    512 bytes and thus will cover both cases.

    Change-Id: I5151ae01e90506747860d9780547b0d4ce91d8bc
    Closes-Bug: 1801702
    Co-Authored-By: Alexandre Arents <email address hidden>
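A single-alignment probe along the lines of this commit might look like the sketch below. It is not nova's actual supports_direct_io(): the function name, the test file name and the EINVAL handling are assumptions. It relies on open(2) and write(2) both reporting EINVAL when O_DIRECT is unsupported or the transfer is not suitably aligned.

```python
import errno
import mmap
import os

ALIGN_SIZE = 4096  # alignment chosen by the fix; a multiple of 512


def supports_direct_io_sketch(dirpath, align_size=ALIGN_SIZE):
    """Return True if a file in dirpath accepts an O_DIRECT write of
    align_size bytes, False if the kernel reports EINVAL."""
    if not hasattr(os, "O_DIRECT"):
        return False  # platform without O_DIRECT (e.g. macOS)
    testfile = os.path.join(dirpath, ".directio.test")
    fd = None
    try:
        fd = os.open(testfile, os.O_CREAT | os.O_WRONLY | os.O_DIRECT)
        # Anonymous mmap memory is page-aligned, so the buffer address is
        # suitable for logical block sizes up to the page size.
        with mmap.mmap(-1, align_size) as buf:
            buf.write(b"x" * align_size)
            os.write(fd, buf)
        return True
    except OSError as exc:
        # EINVAL means "O_DIRECT unsupported" or "not suitably aligned";
        # any other error is re-raised as a real failure.
        if exc.errno == errno.EINVAL:
            return False
        raise
    finally:
        if fd is not None:
            os.close(fd)
        try:
            os.unlink(testfile)
        except OSError:
            pass
```

With a 4096-byte page size, this covers devices with 512- or 4096-byte logical blocks; a device with 8192-byte logical blocks would still be reported as unsupported.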

Changed in nova:
status: In Progress → Fix Released

Reviewed: https://review.openstack.org/619251
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=386e574f4ae6ead225edd65f2759e9aa14c4887b
Submitter: Zuul
Branch: stable/rocky

commit 386e574f4ae6ead225edd65f2759e9aa14c4887b
Author: Jens Harbott <email address hidden>
Date: Thu Nov 8 15:06:26 2018 +0000

    Make supports_direct_io work on 4096b sector size

    The current check uses an alignment of 512 bytes and will fail when the
    underlying device has sectors of size 4096 bytes, as is common e.g. for
    NVMe disks. So use an alignment of 4096 bytes, which is a multiple of
    512 bytes and thus will cover both cases.

    Change-Id: I5151ae01e90506747860d9780547b0d4ce91d8bc
    Closes-Bug: 1801702
    Co-Authored-By: Alexandre Arents <email address hidden>
    (cherry picked from commit 14d98ef1b48ca7b2ea468a8f1ec967b954955a63)

Reviewed: https://review.openstack.org/619220
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=f8eb8a7bc52448ede25cb8ac67fcb1818d3fdd2e
Submitter: Zuul
Branch: stable/queens

commit f8eb8a7bc52448ede25cb8ac67fcb1818d3fdd2e
Author: Jens Harbott <email address hidden>
Date: Thu Nov 8 15:06:26 2018 +0000

    Make supports_direct_io work on 4096b sector size

    The current check uses an alignment of 512 bytes and will fail when the
    underlying device has sectors of size 4096 bytes, as is common e.g. for
    NVMe disks. So use an alignment of 4096 bytes, which is a multiple of
    512 bytes and thus will cover both cases.

    Conflicts:
            nova/privsep/utils.py
            - supports_direct_io() is still in nova/utils.py for older
              stable branches

    Change-Id: I5151ae01e90506747860d9780547b0d4ce91d8bc
    Closes-Bug: 1801702
    Co-Authored-By: Alexandre Arents <email address hidden>
    (cherry picked from commit 14d98ef1b48ca7b2ea468a8f1ec967b954955a63)

Reviewed: https://review.openstack.org/619254
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=ba844f2a7c1d4602017bb0fc600a30b150c28dbc
Submitter: Zuul
Branch: stable/pike

commit ba844f2a7c1d4602017bb0fc600a30b150c28dbc
Author: Jens Harbott <email address hidden>
Date: Thu Nov 8 15:06:26 2018 +0000

    Make supports_direct_io work on 4096b sector size

    The current check uses an alignment of 512 bytes and will fail when the
    underlying device has sectors of size 4096 bytes, as is common e.g. for
    NVMe disks. So use an alignment of 4096 bytes, which is a multiple of
    512 bytes and thus will cover both cases.

    Conflicts:
            nova/privsep/utils.py
            - supports_direct_io() is still in
              nova/virt/libvirt/driver.py for this branch

    Change-Id: I5151ae01e90506747860d9780547b0d4ce91d8bc
    Closes-Bug: 1801702
    Co-Authored-By: Alexandre Arents <email address hidden>
    (cherry picked from commit 14d98ef1b48ca7b2ea468a8f1ec967b954955a63)

This issue was fixed in the openstack/nova 18.1.0 release.

This issue was fixed in the openstack/nova 16.1.7 release.

This issue was fixed in the openstack/nova 17.0.9 release.

This issue was fixed in the openstack/nova 19.0.0.0rc1 release candidate.

Reviewed: https://review.openstack.org/631843
Committed: https://git.openstack.org/cgit/openstack/nova/commit/?id=c77c3eb61ca7a0da64eed62dea084ee3df1d0722
Submitter: Zuul
Branch: stable/ocata

commit c77c3eb61ca7a0da64eed62dea084ee3df1d0722
Author: Jens Harbott <email address hidden>
Date: Thu Nov 8 15:06:26 2018 +0000

    Make supports_direct_io work on 4096b sector size

    The current check uses an alignment of 512 bytes and will fail when the
    underlying device has sectors of size 4096 bytes, as is common e.g. for
    NVMe disks. So use an alignment of 4096 bytes, which is a multiple of
    512 bytes and thus will cover both cases.

    Conflicts:
            nova/privsep/utils.py
            - supports_direct_io() is still in
              nova/virt/libvirt/driver.py for this branch

    Change-Id: I5151ae01e90506747860d9780547b0d4ce91d8bc
    Closes-Bug: 1801702
    Co-Authored-By: Alexandre Arents <email address hidden>
    (cherry picked from commit 14d98ef1b48ca7b2ea468a8f1ec967b954955a63)
    (cherry picked from commit ba844f2a7c1d4602017bb0fc600a30b150c28dbc)
