I found a problem which LVMiSCSI driver can't issue direct I/O through tgtd to iscsi volume.
In current implementation of qemu-kvm opens device(storage volume) using cache='node'
at nova side, however, tgtd opens device without "--bsoflags direct" at cinder side
.
Therefore, I/O from guest instances are cached at control node even though the compute
node issues O_DIRECT I/O to iSCSI volume.
As a result, if control node has a crash, cached data will be lost. This causes data lost problem
of guest instance.
I will propose a fix of this issue.
Here are test environment and confirmation results.
(1) Control node
- Control node has nova, cinder(c-sch, c-api, c-vol), glance, horizon services.
- Use LVMiSCSI driver for cinder backend.
(2) Compute node has n-cpu and n-net services.
[Confirmation at compute node]
On a compute node, qemu opens device file using cache='none'. This means instance can issue
direct I/O from guest to the device.
=> Device file of iscsi cinder volume is "/dev/sde".
[root@compute ~]# ls -la /dev/disk/by-path/
.....
-rw-r--r-- 1 root root 349525333 Jun 27 10:01 ip-10.16.42.67
lrwxrwxrwx 1 root root 9 Jun 30 00:16 ip-10.16.42.67:3260-iscsi-iqn.2010-10.org.openstack:volume-fdd23217-6e95-4aee-a586-6ef174567ba5-lun-1 -> ../../sde
=> "fd18" is infomation of /dev/sde
[root@compute ~]# ls -la /proc/24836/fd
total 0
dr-x------ 2 qemu qemu 0 Jun 29 09:40 .
dr-xr-xr-x 9 qemu qemu 0 Jun 29 09:31 ..
.....
lrwx------ 1 qemu qemu 64 Jun 29 09:40 18 -> /dev/sde
=> The flags is "02140002". O_DIRECT flag is "0x40000". This flag is raised at compute node side.
[root@compute ~]# cat /proc/24836/fdinfo/18
pos: 10737418240
flags: 02140002
[Confirmation at control node]
Confirm iscsi target status and exported disk.
Backing store path is /dev/stack-volumes/volume-fdd23217-6e95-4aee-a586-6ef174567ba5
[mtanino@control ~]$ sudo tgt-admin -s
.....
Target 3: iqn.2010-10.org.openstack:volume-fdd23217-6e95-4aee-a586-6ef174567ba5
System information:
Driver: iscsi
...
LUN: 1
...
Backing store path: /dev/stack-volumes/volume-fdd23217-6e95-4aee-a586-6ef174567ba5
Backing store flags:
Account information:
ACL information:
ALL
=>Condirm device mapper file of the backing store
[mtanino@control ~]$ ls -la /dev/disk/by-id/ | grep stack
lrwxrwxrwx 1 root root 10 Jun 30 00:16 dm-name-stack--volumes-volume--fdd23217--6e95--4aee--a586--6ef174567ba5 -> ../../dm-0
=> tgtd Process ID is "31010"
[mtanino@control ~]$ ps aux | grep tgtd
root 31010 2.0 0.0 476584 900 ? Ssl Jun30 52:27 /usr/sbin/tgtd -f
=> The flags is "0100002". O_DIRECT flag is "0x40000". This flag is not raised at control node side.
[mtanino@control ~]$ sudo cat /proc/31010/fdinfo/11
pos: 0
flags: 0100002
I found a problem which LVMiSCSI driver can't issue direct I/O through tgtd to iscsi volume.
In current implementation of qemu-kvm opens device(storage volume) using cache='node'
at nova side, however, tgtd opens device without "--bsoflags direct" at cinder side
.
Therefore, I/O from guest instances are cached at control node even though the compute
node issues O_DIRECT I/O to iSCSI volume.
As a result, if control node has a crash, cached data will be lost. This causes data lost problem
of guest instance.
I will propose a fix of this issue.
Here are test environment and confirmation results.
(1) Control node
- Control node has nova, cinder(c-sch, c-api, c-vol), glance, horizon services.
- Use LVMiSCSI driver for cinder backend.
(2) Compute node has n-cpu and n-net services.
[Confirmation at compute node]
On a compute node, qemu opens device file using cache='none'. This means instance can issue
direct I/O from guest to the device.
[root@compute ~]# cat /etc/libvirt/ qemu/instance- 0000000b. xml instance- 0000000b< /name> 9e1eb5cc- 4c40-4023- bca5-a7d1720c6f 51</uuid> disk/by- path/ip- 10.16.42. 67:3260- iscsi-iqn. 2010-10. org.openstack: volume- fdd23217- 6e95-4aee- a586-6ef174567b a5-lun- 1'/> serial> fdd23217- 6e95-4aee- a586-6ef174567b a5</serial>
<domain type='kvm'>
<name>
<uuid>
....
<disk type='block' device='disk'>
<driver name='qemu' type='raw' cache='none'/> .........### Open the device without cache.
<source dev='/dev/
<target dev='vdc' bus='virtio'/>
<
<address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
</disk>
....
</domain>
Confirm a file descriptor whether the device is opened with O_DIRECT or not at compute node.
=> qemu Process ID is "24836" qemu-system- x86_64 -machine accel=kvm -name instance-0000000b .....
[root@compute ~]# ps uax | grep qemu
root 11421 0.0 0.0 112672 912 pts/6 S+ 17:13 0:00 grep --color=auto qemu
qemu 24836 13.8 16.0 4638484 1312668 ? Sl Jun29 462:12 /usr/bin/
=> Device file of iscsi cinder volume is "/dev/sde". 42.67:3260- iscsi-iqn. 2010-10. org.openstack: volume- fdd23217- 6e95-4aee- a586-6ef174567b a5-lun- 1 -> ../../sde
[root@compute ~]# ls -la /dev/disk/by-path/
.....
-rw-r--r-- 1 root root 349525333 Jun 27 10:01 ip-10.16.42.67
lrwxrwxrwx 1 root root 9 Jun 30 00:16 ip-10.16.
=> "fd18" is infomation of /dev/sde
[root@compute ~]# ls -la /proc/24836/fd
total 0
dr-x------ 2 qemu qemu 0 Jun 29 09:40 .
dr-xr-xr-x 9 qemu qemu 0 Jun 29 09:31 ..
.....
lrwx------ 1 qemu qemu 64 Jun 29 09:40 18 -> /dev/sde
=> The flags is "02140002". O_DIRECT flag is "0x40000". This flag is raised at compute node side. fdinfo/ 18
[root@compute ~]# cat /proc/24836/
pos: 10737418240
flags: 02140002
[Confirmation at control node] volumes/ volume- fdd23217- 6e95-4aee- a586-6ef174567b a5
Confirm iscsi target status and exported disk.
Backing store path is /dev/stack-
[mtanino@control ~]$ sudo tgt-admin -s 10.org. openstack: volume- fdd23217- 6e95-4aee- a586-6ef174567b a5 volumes/ volume- fdd23217- 6e95-4aee- a586-6ef174567b a5
.....
Target 3: iqn.2010-
System information:
Driver: iscsi
...
LUN: 1
...
Backing store path: /dev/stack-
Backing store flags:
Account information:
ACL information:
ALL
=>Condirm device mapper file of the backing store
[mtanino@control ~]$ ls -la /dev/disk/by-id/ | grep stack stack-- volumes- volume- -fdd23217- -6e95-- 4aee--a586- -6ef174567ba5 -> ../../dm-0
lrwxrwxrwx 1 root root 10 Jun 30 00:16 dm-name-
=> tgtd Process ID is "31010"
[mtanino@control ~]$ ps aux | grep tgtd
root 31010 2.0 0.0 476584 900 ? Ssl Jun30 52:27 /usr/sbin/tgtd -f
=> "fd11" is infomation of /dev/dm-0
[mtanino@control ~]$ sudo ls -la /proc/31010/fd
...
lrwx------ 1 root root 64 Jul 1 16:11 11 -> /dev/dm-0
lrwx------ 1 root root 64 Jul 1 16:11 12 -> /dev/sdb2
...
=> The flags is "0100002". O_DIRECT flag is "0x40000". This flag is not raised at control node side. fdinfo/ 11
[mtanino@control ~]$ sudo cat /proc/31010/
pos: 0
flags: 0100002
Regards,
Mitsuhiro Tanino