LVM-backed iSCSI device not reporting same size to nova

Bug #1956887 reported by Mark Olie
This bug affects 1 person
Affects: Cinder
Status: New
Importance: Medium
Assigned to: Unassigned
Milestone: (none)

Bug Description

Environment:

- Compute nodes: CentOS 8 Stream (latest) (Supermicro)
- Storage nodes: CentOS 8 Stream (latest) (Supermicro hardware with 18 TB storage, hardware RAID 5)
- Controller nodes: CentOS 8 Stream (latest) (Supermicro)

OpenStack version: Wallaby, deployed by kolla-ansible, not using Ironic.

When using LVM-backed devices exposed to nova through the iSCSI protocol, the size of the device as seen by the guest differs from the size of the backing logical volume.

Example:
Using Horizon, create a volume of 20 GB or larger and attach it to a virtual machine.

Exact device size in bytes, obtained by running the following on the storage node:
fdisk -l /dev/cinder-volumes/volume-UUID
4398046511104 (example: a 4 TB volume on the storage node, in bytes)

However, when running fdisk on /dev/vda inside the virtual machine (the volume attached to the nova instance on the compute node), the size reported is:

4398066466816 (the same 4 TB volume as seen by the virtual machine, in bytes)

If the sizes were the other way around this would not be a problem, but the disk size seen by the VM is larger than the real size of the iSCSI-backed LVM volume.
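For comparison, the exact byte sizes on both sides can also be read with blockdev; this is a quick sketch, where the volume UUID and guest device name are placeholders:

blockdev --getsize64 /dev/cinder-volumes/volume-<UUID>   # on the storage node
blockdev --getsize64 /dev/vda                            # inside the guest

In the 4 TB example above, the guest sees 4398066466816 - 4398046511104 = 19955712 bytes (about 19 MiB) more than the backing logical volume actually provides.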

The following is from a VM backed by a 20 GB volume, since smaller VM disks hit the problem sooner rather than later. The size mismatch results in messages like these in the kernel log on the VM:
[111761.391344] blk_update_request: I/O error, dev vda, sector 17777976
[111761.394839] blk_update_request: I/O error, dev vda, sector 17778984
[111761.396241] blk_update_request: I/O error, dev vda, sector 17779992
[111761.397782] blk_update_request: I/O error, dev vda, sector 17781000
[111761.399343] blk_update_request: I/O error, dev vda, sector 17782008
[111761.400929] blk_update_request: I/O error, dev vda, sector 17783016
[111761.402189] blk_update_request: I/O error, dev vda, sector 17784024
[111761.403377] blk_update_request: I/O error, dev vda, sector 17785032
[111761.404569] blk_update_request: I/O error, dev vda, sector 17786040
[111761.406165] blk_update_request: I/O error, dev vda, sector 17787048
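(For reference: assuming the usual 512-byte sectors in these messages, a failing sector maps to a byte offset as offset = sector * 512; comparing that offset against blockdev --getsize64 on the backing logical volume shows whether the failed I/O lands beyond the real end of the volume.)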

Double-checked by creating an all-in-one node from a storage node, so that network issues can be ruled out.

The issue did not go away.

Changed in cinder:
importance: Undecided → Medium
Revision history for this message
Sofia Enriquez (lsofia-enriquez) wrote :

Greetings Mark Olie,
I'd like to ask you the following questions:
- Is the VM allowed to read/write into the "extra" space?
- Is this scenario happening only for <20G, or have you only tried with 20G?
Thanks in advance

Revision history for this message
Mark Olie (lfmolie) wrote : Re: [Bug 1956887] Re: LVM backed ISCSI device not reporting same size to nova

Hi Sofia,

Thanks for reaching out,

The VM is not allowed to read or write into the space beyond the real size of the volume.

Example:

The message at 2.986: here the machine finishes booting and cloud-init takes over, running the user-data script.

I focused on vda / XFS / iSCSI and network errors in dmesg, which gives me the following:

[ 2.986795] XFS (vda1): Ending clean mount
[ 284.439066] blk_update_request: I/O error, dev vda, sector 2641440
[ 284.443818] blk_update_request: I/O error, dev vda, sector 2642448
[ 284.445159] blk_update_request: I/O error, dev vda, sector 2643456
[ 284.446351] blk_update_request: I/O error, dev vda, sector 2644464
[ 284.447578] blk_update_request: I/O error, dev vda, sector 2645472
[ 284.448825] blk_update_request: I/O error, dev vda, sector 2646480
[ 284.450015] blk_update_request: I/O error, dev vda, sector 2647488
[ 284.451242] blk_update_request: I/O error, dev vda, sector 2648496
[ 284.452505] blk_update_request: I/O error, dev vda, sector 2649504
[ 284.942258] XFS (vda1): writeback error on sector 2556768

and so on. (Note: there are no iSCSI errors in the virtual server's log; none were expected either, but it is good to note.)
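One way to probe this directly from inside the guest is a raw write just past the real end of the backing LV but still inside the guest-visible size. For the 4 TB example from the description, the first sector past the real end is 4398046511104 / 512 = 8589934592. A sketch (destructive, so only on a scratch volume; the device name is a placeholder):

dd if=/dev/zero of=/dev/vdX bs=512 count=1 seek=8589934592 oflag=direct

If that write fails with an I/O error (mirrored by a blk_update_request line in dmesg), the guest cannot actually use the extra space.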

On the hypervisor, I do see iSCSI errors, namely:

[14153.160954] iSCSI/iqn.1994-05.com.redhat:44b41107f44: Unsupported SCSI Opcode 0xa3, sending CHECK_CONDITION.
[14158.338313] iSCSI/iqn.1994-05.com.redhat:44b41107f44: Unsupported SCSI Opcode 0xa3, sending CHECK_CONDITION.
[14162.583635] iSCSI/iqn.1994-05.com.redhat:44b41107f44: Unsupported SCSI Opcode 0xa3, sending CHECK_CONDITION.
[17015.456426] iSCSI/iqn.1994-05.com.redhat:8565d64f61b1: Unsupported SCSI Opcode 0xa3, sending CHECK_CONDITION.
[17086.182440] iSCSI/iqn.1994-05.com.redhat:8565d64f61b1: Unsupported SCSI Opcode 0xa3, sending CHECK_CONDITION.
[17089.104229] iSCSI/iqn.1994-05.com.redhat:8565d64f61b1: Unsupported SCSI Opcode 0xa3, sending CHECK_CONDITION.

As to question 2:

I have tried with 10 G, 20 G, 40 G, 50 G, 4 T and 9 T volumes.

I have also tried with a simple AIO cluster and a 9 T image; it worked fine when Wallaby was the newest release (that cluster was built about two months before Xena came out).

The biggest problem is when I try to attach a 'data' volume to a VM and then create a filesystem on it, because both xfs ...


Revision history for this message
Mark Olie (lfmolie) wrote :

Hi,

Could it be that this bug is also affected by, or caused by, the following QEMU bug?

https://gitlab.com/qemu-project/qemu/-/issues/649

My correctly working node (the one created two weeks before Xena) has the following packages (as seen from the nova-libvirt container deployed by kolla):

qemu-kvm-block-curl-6.0.0-31.el8s.x86_64
qemu-kvm-block-ssh-6.0.0-31.el8s.x86_64
qemu-kvm-6.0.0-31.el8s.x86_64
qemu-kvm-ui-spice-6.0.0-31.el8s.x86_64
qemu-kvm-block-rbd-6.0.0-31.el8s.x86_64
qemu-kvm-docs-6.0.0-31.el8s.x86_64
qemu-kvm-ui-opengl-6.0.0-31.el8s.x86_64
qemu-kvm-block-iscsi-6.0.0-31.el8s.x86_64
qemu-kvm-hw-usbredir-6.0.0-31.el8s.x86_64
qemu-kvm-common-6.0.0-31.el8s.x86_64
qemu-kvm-block-gluster-6.0.0-31.el8s.x86_64
qemu-kvm-core-6.0.0-31.el8s.x86_64
libvirt-daemon-kvm-7.6.0-4.el8s.x86_64

The newer cluster, with the read/write I/O errors, has the following (as seen from the nova-libvirt container deployed by kolla):

qemu-kvm-ui-opengl-6.1.0-4.module_el8.6.0+983+a7505f3f.x86_64
qemu-kvm-block-rbd-6.1.0-4.module_el8.6.0+983+a7505f3f.x86_64
qemu-kvm-core-6.1.0-4.module_el8.6.0+983+a7505f3f.x86_64
libvirt-daemon-kvm-7.9.0-1.module_el8.6.0+983+a7505f3f.x86_64
qemu-kvm-docs-6.1.0-4.module_el8.6.0+983+a7505f3f.x86_64
qemu-kvm-common-6.1.0-4.module_el8.6.0+983+a7505f3f.x86_64
qemu-kvm-block-iscsi-6.1.0-4.module_el8.6.0+983+a7505f3f.x86_64
qemu-kvm-6.1.0-4.module_el8.6.0+983+a7505f3f.x86_64
qemu-kvm-block-gluster-6.1.0-4.module_el8.6.0+983+a7505f3f.x86_64
qemu-kvm-hw-usbredir-6.1.0-4.module_el8.6.0+983+a7505f3f.x86_64
qemu-kvm-ui-spice-6.1.0-4.module_el8.6.0+983+a7505f3f.x86_64
qemu-kvm-block-curl-6.1.0-4.module_el8.6.0+983+a7505f3f.x86_64
qemu-kvm-block-ssh-6.1.0-4.module_el8.6.0+983+a7505f3f.x86_64
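
For reference, both package lists can be regenerated on a kolla-ansible deployment with something like the following, assuming the container is named nova_libvirt:

docker exec nova_libvirt rpm -qa 'qemu-kvm*' 'libvirt-daemon-kvm*'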

Kind regards,

Mark Olie
