LVMiSCSI driver can't issue direct I/O through tgtd to iscsi volume

Bug #1336568 reported by Mitsuhiro Tanino on 2014-07-01
This bug affects 2 people
Affects            Importance   Assigned to
Cinder             Undecided    Mitsuhiro Tanino
cinder (Ubuntu)    Undecided    Billy Olsen
  Trusty           Undecided    Billy Olsen

Bug Description

I found that the LVMiSCSI driver cannot issue direct I/O through tgtd to an iSCSI volume.

In the current implementation, qemu-kvm opens the device (storage volume) with cache='none' on the nova side, but tgtd opens the device without "--bsoflags direct" on the cinder side.

Therefore, I/O from guest instances is cached on the control node even though the compute node issues O_DIRECT I/O to the iSCSI volume.
As a result, if the control node crashes, the cached data is lost, which causes data loss for the guest instance.

I will propose a fix for this issue.

Here are the test environment and the confirmation results.
(1) Control node
    - Control node has nova, cinder(c-sch, c-api, c-vol), glance, horizon
      services.
    - Use LVMiSCSI driver for cinder backend.
(2) Compute node has n-cpu and n-net services.

[Confirmation at compute node]
On the compute node, qemu opens the device file using cache='none'. This means the instance can issue direct I/O from the guest to the device.

[root@compute ~]# cat /etc/libvirt/qemu/instance-0000000b.xml
<domain type='kvm'>
  <name>instance-0000000b</name>
  <uuid>9e1eb5cc-4c40-4023-bca5-a7d1720c6f51</uuid>
....
    <disk type='block' device='disk'>
      <driver name='qemu' type='raw' cache='none'/>
      ######### Open the device without cache.
      <source dev='/dev/disk/by-path/ip-10.16.42.67:3260-iscsi-iqn.2010-10.org.openstack:volume-fdd23217-6e95-4aee-a586-6ef174567ba5-lun-1'/>
      <target dev='vdc' bus='virtio'/>
      <serial>fdd23217-6e95-4aee-a586-6ef174567ba5</serial>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </disk>
....
</domain>

Check the file descriptor on the compute node to confirm whether the device is opened with O_DIRECT.

=> qemu Process ID is "24836"
[root@compute ~]# ps uax | grep qemu
root 11421 0.0 0.0 112672 912 pts/6 S+ 17:13 0:00 grep --color=auto qemu
qemu 24836 13.8 16.0 4638484 1312668 ? Sl Jun29 462:12 /usr/bin/qemu-system-x86_64 -machine accel=kvm -name instance-0000000b .....

=> Device file of iscsi cinder volume is "/dev/sde".
[root@compute ~]# ls -la /dev/disk/by-path/
.....
-rw-r--r-- 1 root root 349525333 Jun 27 10:01 ip-10.16.42.67
lrwxrwxrwx 1 root root 9 Jun 30 00:16 ip-10.16.42.67:3260-iscsi-iqn.2010-10.org.openstack:volume-fdd23217-6e95-4aee-a586-6ef174567ba5-lun-1 -> ../../sde

=> fd 18 refers to /dev/sde
[root@compute ~]# ls -la /proc/24836/fd
total 0
dr-x------ 2 qemu qemu 0 Jun 29 09:40 .
dr-xr-xr-x 9 qemu qemu 0 Jun 29 09:31 ..
.....
lrwx------ 1 qemu qemu 64 Jun 29 09:40 18 -> /dev/sde

=> The flags value is "02140002" (octal). The O_DIRECT flag is 040000 in octal (0x4000 in hex) on x86. This flag is set on the compute node side.
[root@compute ~]# cat /proc/24836/fdinfo/18
pos: 10737418240
flags: 02140002
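
The flags value in fdinfo is printed in octal, so the check can be reproduced with a little bit arithmetic. The following is a minimal sketch, not part of the original report: the constants are the x86/x86_64 Linux values (an assumption, other architectures use different numbers), and `decode_flags` is a name made up for illustration. Bits that are not in the table are ignored.

```python
# Open-flag bits as used on x86/x86_64 Linux, written in octal.
# Assumed values; other architectures differ.
O_ACCMODE = 0o3
ACCESS_MODES = {0o0: "O_RDONLY", 0o1: "O_WRONLY", 0o2: "O_RDWR"}
FLAG_BITS = {
    0o40000: "O_DIRECT",
    0o100000: "O_LARGEFILE",
    0o2000000: "O_CLOEXEC",
}


def decode_flags(octal_text):
    """Return the flag names encoded in an fdinfo 'flags:' value."""
    value = int(octal_text, 8)
    names = [ACCESS_MODES.get(value & O_ACCMODE, "unknown access mode")]
    for bit, name in sorted(FLAG_BITS.items()):
        if value & bit:
            names.append(name)
    return names


# Value reported for the qemu descriptor on the compute node.
print(decode_flags("02140002"))
# -> ['O_RDWR', 'O_DIRECT', 'O_LARGEFILE', 'O_CLOEXEC']
```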

[Confirmation at control node]
Confirm the iSCSI target status and the exported disk.
The backing store path is /dev/stack-volumes/volume-fdd23217-6e95-4aee-a586-6ef174567ba5

[mtanino@control ~]$ sudo tgt-admin -s
.....
Target 3: iqn.2010-10.org.openstack:volume-fdd23217-6e95-4aee-a586-6ef174567ba5
    System information:
        Driver: iscsi
...
        LUN: 1
...
            Backing store path: /dev/stack-volumes/volume-fdd23217-6e95-4aee-a586-6ef174567ba5
            Backing store flags:
    Account information:
    ACL information:
        ALL

=> Confirm the device mapper file of the backing store

[mtanino@control ~]$ ls -la /dev/disk/by-id/ | grep stack
lrwxrwxrwx 1 root root 10 Jun 30 00:16 dm-name-stack--volumes-volume--fdd23217--6e95--4aee--a586--6ef174567ba5 -> ../../dm-0

=> tgtd Process ID is "31010"
[mtanino@control ~]$ ps aux | grep tgtd
root 31010 2.0 0.0 476584 900 ? Ssl Jun30 52:27 /usr/sbin/tgtd -f

=> fd 11 refers to /dev/dm-0
[mtanino@control ~]$ sudo ls -la /proc/31010/fd
...
lrwx------ 1 root root 64 Jul 1 16:11 11 -> /dev/dm-0
lrwx------ 1 root root 64 Jul 1 16:11 12 -> /dev/sdb2
...

=> The flags value is "0100002" (octal). The O_DIRECT flag is 040000 in octal (0x4000 in hex) on x86. This flag is not set on the control node side.
[mtanino@control ~]$ sudo cat /proc/31010/fdinfo/11
pos: 0
flags: 0100002
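
The manual steps (list the process's descriptors, find the one that points at the device, read its fdinfo) can be automated. This is only a sketch under stated assumptions: `fds_for_path` and its `proc_root` parameter are hypothetical names introduced here, the O_DIRECT value is the x86 Linux one, and reading another user's /proc entries normally requires root.

```python
import os

O_DIRECT = 0o40000  # x86/x86_64 Linux value (assumed)


def fds_for_path(pid, target, proc_root="/proc"):
    """Return [(fd, uses_o_direct), ...] for descriptors of pid that point at target."""
    fd_dir = os.path.join(proc_root, str(pid), "fd")
    results = []
    for fd in sorted(os.listdir(fd_dir), key=int):
        # Each entry in /proc/<pid>/fd is a symlink to the opened file.
        if os.readlink(os.path.join(fd_dir, fd)) != target:
            continue
        info_path = os.path.join(proc_root, str(pid), "fdinfo", fd)
        with open(info_path) as info:
            for line in info:
                if line.startswith("flags:"):
                    flags = int(line.split()[1], 8)  # fdinfo prints octal
                    results.append((int(fd), bool(flags & O_DIRECT)))
    return results


# Example call for the tgtd process shown above (run as root):
#   fds_for_path(31010, "/dev/dm-0")
```

The `proc_root` argument exists only so that the function can be pointed at a copy of the directory tree for testing.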

Regards,
Mitsuhiro Tanino

========================================================================
[Impact]

 * May see data loss without the ability to use write-through caching
   (write-cache off) option instead of write-back (write-cache on)
   option for iscsi targets.

[Test Case]

 * Configure Cinder to use LVMiSCSIDriver
 * Create cinder volume (cinder create --display-name foo 1G)
 * Attach volume to nova instance (nova volume-attach my-instance <vol-uuid>)

 * Observe the write-cache policy specified per cinder volume (found in)
   - /var/lib/cinder/volumes/volume-<uuid>

 * Observe above information (detailed by Mitsuhiro)

[Regression Potential]

 * Low risk of regression, as the feature is enabled through a
   configurable option whose default value preserves the original behavior.

description: updated
Changed in cinder:
assignee: nobody → Mitsuhiro Tanino (mitsuhiro-tanino)
description: updated

After issuing direct I/O from the guest instance, the dirty cache on the control node increases from 0 KB to 202436 KB.
This is roughly the size of the I/O issued from the guest instance (200 MiB = 204800 KB).

[Guest]
[root@driver-test fio]# dd if=/dev/zero of=/dev/vdd oflag=direct bs=1024k count=200
200+0 records in
200+0 records out

[Control node]
[root@control rpmbuild]# cat /proc/meminfo
MemTotal: 8164312 kB
MemFree: 971492 kB
MemAvailable: 3838688 kB
Buffers: 1509816 kB
Cached: 1413724 kB
SwapCached: 37264 kB
Active: 4148700 kB
Inactive: 2574248 kB
Active(anon): 2796988 kB
Inactive(anon): 1016204 kB
Active(file): 1351712 kB
Inactive(file): 1558044 kB
Unevictable: 26172 kB
Mlocked: 26172 kB
SwapTotal: 4079612 kB
SwapFree: 2563736 kB
Dirty: 202436 kB <**************** Cached I/O
Writeback: 0 kB
AnonPages: 3797040 kB
Mapped: 56800 kB
Shmem: 2240 kB
Slab: 269472 kB
SReclaimable: 210844 kB
SUnreclaim: 58628 kB
KernelStack: 5624 kB
PageTables: 68488 kB
NFS_Unstable: 0 kB
Bounce: 0 kB
WritebackTmp: 0 kB
CommitLimit: 8161768 kB
Committed_AS: 10106272 kB
VmallocTotal: 34359738367 kB
VmallocUsed: 300076 kB
VmallocChunk: 34359336928 kB
HardwareCorrupted: 0 kB
AnonHugePages: 237568 kB
HugePages_Total: 0
HugePages_Free: 0
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
DirectMap4k: 148068 kB
DirectMap2M: 8230912 kB
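
To compare the Dirty value with the amount written by dd without reading the whole listing by eye, the field can be pulled out of the meminfo text. A minimal sketch follows; `meminfo_kb` is a hypothetical helper, and the sample text contains only the two relevant lines copied from the output above.

```python
def meminfo_kb(text, field):
    """Return the value in kB of one field from /proc/meminfo-style text."""
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        if name.strip() == field:
            # The number is the first token after the colon; anything
            # after it (unit, annotations) is ignored.
            return int(rest.split()[0])
    raise KeyError(field)


sample = """\
Dirty: 202436 kB <**************** Cached I/O
Writeback: 0 kB
"""

written_kb = 200 * 1024  # dd bs=1024k count=200
dirty_kb = meminfo_kb(sample, "Dirty")
print(written_kb, dirty_kb, written_kb - dirty_kb)
# -> 204800 202436 2364
```

On a live system the same function can be fed `open("/proc/meminfo").read()` before and after the dd run.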

There are two parameters related to caching.

(1) "write-cache off"
By default the write cache is enabled (write-cache on). Therefore write I/O is issued in write-back mode
and is held in the dirty cache.

If we use "write-cache off", write I/O is issued in write-through mode and is not cached.
Read I/O is still cached even when we use this parameter, so read I/O keeps good performance.

(2) "--bsoflags direct"
When we use this parameter, both read and write I/O bypass the dirty cache and the buffer cache.
With this parameter, we can keep tgtd's I/O from growing the buffer cache.

I think these parameters should be chosen on a case-by-case basis.

Fix proposed to branch: master
Review: https://review.openstack.org/104714

Changed in cinder:
status: New → In Progress

Reviewed: https://review.openstack.org/104714
Committed: https://git.openstack.org/cgit/openstack/cinder/commit/?id=617e59bde660f919df0818611b911ae35f8b7247
Submitter: Jenkins
Branch: master

commit 617e59bde660f919df0818611b911ae35f8b7247
Author: Mitsuhiro Tanino <email address hidden>
Date: Tue Jul 8 15:52:11 2014 -0400

    Configure write cache option of tgtd iscsi driver

    Cinder LVMiSCSI driver is using default value of write-cache parameter
    of tgtd iscsi driver. In this setting, write I/O from guest instance is
    cached on dirty cache of a host.(write-back mode)

    In this case, data lost may be occurred if the host crashes before
    flushing dirty cache. This may cause a lot of instances to lose
    their data.

    In order to avoid this issue, it is better to turn off the write cache.
    (write-through mode)

    This patch adds "iscsi_write_cache" parameter to configure a behavior of
    write cache. The default value is "iscsi_write_cache=on".(write-back mode)

    Closes-Bug: 1336568
    DocImpact

    Change-Id: I7a495bc6118d4254576bdf1620a04ac537b3078d
    Signed-off-by: Mitsuhiro Tanino <email address hidden>

Changed in cinder:
status: In Progress → Fix Committed
Changed in cinder:
milestone: none → juno-2
status: Fix Committed → Fix Released
Thierry Carrez (ttx) on 2014-10-16
Changed in cinder:
milestone: juno-2 → 2014.2
Changed in cinder (Ubuntu):
assignee: nobody → Billy Olsen (billy-olsen)
Billy Olsen (billy-olsen) wrote :

This patch was included in the juno release of cinder and thus is available already in utopic, vivid, etc. It is not available in icehouse/trusty - which this patch is for.

description: updated
tags: added: sts
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in cinder (Ubuntu):
status: New → Confirmed

Hello Mitsuhiro, or anyone else affected,

Accepted cinder into trusty-proposed. The package will build now and be available at https://launchpad.net/ubuntu/+source/cinder/1:2014.1.5-0ubuntu2 in a few hours, and then in the -proposed repository.

Please help us by testing this new package. See https://wiki.ubuntu.com/Testing/EnableProposed for documentation how to enable and use -proposed. Your feedback will aid us getting this update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug, mentioning the version of the package you tested, and change the tag from verification-needed to verification-done. If it does not fix the bug for you, please add a comment stating that, and change the tag to verification-failed. In either case, details of your testing will help us make a better decision.

Further information regarding the verification process can be found at https://wiki.ubuntu.com/QATeam/PerformingSRUVerification . Thank you in advance!

Changed in cinder (Ubuntu Trusty):
status: New → Fix Committed
tags: added: verification-needed
Billy Olsen (billy-olsen) wrote :

Was able to verify the fix for this bug today. Installed cinder from the trusty-proposed pocket and ran the following tests to confirm:

# Test one, ensure default option remains to write-cache on
1. create volume
2. attach iscsi volume to instance
3. Verify generated xml in /var/lib/cinder/volumes/<volume-id> is generated with write-cache on.

# Test two, change iscsi_write_cache to off and restart cinder volumes
1. Set iscsi_write_cache = off in /etc/cinder/cinder.conf
2. Create lvm volume
3. Attach via iscsi to instance
4. Verify generated xml in /var/lib/cinder/volumes/<volume-id> is generated with write-cache off
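
The last step of each test can be scripted. The sketch below is an illustration only: `write_cache_setting` is a hypothetical helper, and the sample text assumes the generated per-volume file is a tgt-style target stanza containing a "write-cache" line; the exact layout of the file written by cinder may differ.

```python
def write_cache_setting(conf_text):
    """Return the value of the first 'write-cache' line, or None if absent."""
    for line in conf_text.splitlines():
        words = line.split()
        if len(words) == 2 and words[0] == "write-cache":
            return words[1]
    return None


# Assumed layout of a generated per-volume file (illustrative only).
sample = """\
<target iqn.2010-10.org.openstack:volume-fdd23217-6e95-4aee-a586-6ef174567ba5>
    backing-store /dev/stack-volumes/volume-fdd23217-6e95-4aee-a586-6ef174567ba5
    write-cache off
</target>
"""

print(write_cache_setting(sample))
# -> off
```

For a real volume, the text would be read from the file under /var/lib/cinder/volumes/ named in the steps above.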

tags: added: verification-done
removed: verification-needed

The verification of the Stable Release Update for cinder has completed successfully and the package has now been released to -updates. Subsequently, the Ubuntu Stable Release Updates Team is being unsubscribed and will not receive messages about this bug report. In the event that you encounter a regression using the package from -updates please report a new bug using ubuntu-bug and tag the bug report regression-update so we can easily find any regressions.

Changed in cinder (Ubuntu):
status: Confirmed → Fix Released
Changed in cinder (Ubuntu Trusty):
assignee: nobody → Billy Olsen (billy-olsen)
Launchpad Janitor (janitor) wrote :

This bug was fixed in the package cinder - 1:2014.1.5-0ubuntu2

---------------
cinder (1:2014.1.5-0ubuntu2) trusty; urgency=medium

  * Enable iscsi_write_cache option for tgtadm backends (LP: #1336568):
    - d/p/tgtadmin-iscsi-write-cache-config.patch - Includes backport of
      change from the juno release for enabling iscsi write cache policy
      for tgtadm.

 -- Billy Olsen <email address hidden> Mon, 27 Jul 2015 17:35:57 -0700

Changed in cinder (Ubuntu Trusty):
status: Fix Committed → Fix Released