Comment 2 for bug 1956887

Mark Olie (lfmolie) wrote: Re: [Bug 1956887] Re: LVM backed ISCSI device not reporting same size to nova

Hi Sofia,

Thanks for reaching out,

No, the VM is not able to read or write in the space that extends beyond
the real size of the volume; attempts to do so fail with I/O errors.

Example:

The message at 2.986 seconds is where the machine finishes booting and
cloud-init takes over to run the user-data script.

I focused on vda, XFS, iSCSI and network errors in dmesg, which gives
the following:

[ 2.986795] XFS (vda1): Ending clean mount
[ 284.439066] blk_update_request: I/O error, dev vda, sector 2641440
[ 284.443818] blk_update_request: I/O error, dev vda, sector 2642448
[ 284.445159] blk_update_request: I/O error, dev vda, sector 2643456
[ 284.446351] blk_update_request: I/O error, dev vda, sector 2644464
[ 284.447578] blk_update_request: I/O error, dev vda, sector 2645472
[ 284.448825] blk_update_request: I/O error, dev vda, sector 2646480
[ 284.450015] blk_update_request: I/O error, dev vda, sector 2647488
[ 284.451242] blk_update_request: I/O error, dev vda, sector 2648496
[ 284.452505] blk_update_request: I/O error, dev vda, sector 2649504
[ 284.942258] XFS (vda1): writeback error on sector 2556768

and so on. (Note: there are no iSCSI errors in the virtual server's log;
none are expected there either, but it is good to note.)
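
As an aside, a quick way to poke at one of these offsets directly would be
something like the following (purely illustrative; the sector number is
taken from the log above, and whether the original failing requests were
reads or writes is not visible in those log lines):

# Read 16 sectors starting at one of the offsets that failed above,
# bypassing the page cache so the request really reaches the disk.
dd if=/dev/vda of=/dev/null bs=512 skip=2641440 count=16 iflag=direct
# If the offset lies beyond the real size of the backing LV, the read
# fails with an I/O error and another blk_update_request line shows up
# in dmesg.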

On the hypervisor, however, I do see iSCSI errors, namely:

[14153.160954] iSCSI/iqn.1994-05.com.redhat:44b41107f44: Unsupported SCSI Opcode 0xa3, sending CHECK_CONDITION.
[14158.338313] iSCSI/iqn.1994-05.com.redhat:44b41107f44: Unsupported SCSI Opcode 0xa3, sending CHECK_CONDITION.
[14162.583635] iSCSI/iqn.1994-05.com.redhat:44b41107f44: Unsupported SCSI Opcode 0xa3, sending CHECK_CONDITION.
[17015.456426] iSCSI/iqn.1994-05.com.redhat:8565d64f61b1: Unsupported SCSI Opcode 0xa3, sending CHECK_CONDITION.
[17086.182440] iSCSI/iqn.1994-05.com.redhat:8565d64f61b1: Unsupported SCSI Opcode 0xa3, sending CHECK_CONDITION.
[17089.104229] iSCSI/iqn.1994-05.com.redhat:8565d64f61b1: Unsupported SCSI Opcode 0xa3, sending CHECK_CONDITION.

As to question 2:

I have tried with 10G, 20G, 40G, 50G, 4T and 9T volumes.

I have also tried with a simple AIO cluster and a 9T image; that worked
fine while Wallaby was the newest release (the cluster was built about
two months before Xena came out).

The biggest problem occurs when I try to attach a 'data' volume to a VM
and then create a filesystem on it, because both XFS and ext4 try to
write a copy of the partition table to the last block.

So far I have worked around that by using LVM inside the VM and creating
a logical volume of at most 95% of the reported disk size.
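
For completeness, the workaround looks roughly like this (a minimal
sketch; the device name /dev/vdb and the volume group name are
placeholders, not taken from the real setup):

# Inside the VM, on the freshly attached data volume:
pvcreate /dev/vdb
vgcreate datavg /dev/vdb
# Claim only 95% of the space the guest thinks it has, so nothing ever
# writes to the blocks beyond the real size of the backing volume.
lvcreate -n data -l 95%VG datavg
mkfs.xfs /dev/datavg/data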

Normal Glance-image-backed volumes are less affected, because the
filesystem is grown from the qcow2 image and things carry on from there,
so as long as the disks don't fill above 95% I am safe.

Exception: when installing a Kubernetes cluster through a Heat template,
the "master" node also reports disk errors, whereas the two slave nodes
attach their volumes cleanly.

I would love to get this issue debugged, because I recently started
hosting Kubernetes classrooms and the students sometimes complain about it.

Kind Regards,

Mark.

Sofia Enriquez wrote on 2022-01-12 16:49:

> Greetings Mark Olie,
> I'd like to ask you the following questions:
> - Is the vm allowed to read/write into the "extra" space?
> - Is this scenario happening only for <20G or have you only tried with 20G?
> Thanks in advance
>
> https://bugs.launchpad.net/bugs/1956887
>
> Title:
> LVM backed ISCSI device not reporting same size to nova
>
> Status in Cinder:
> New
>
> Bug description:
> Environment:
>
> - Compute nodes: CentOS 8 Stream (latest) (Supermicro)
> - Storage nodes: CentOS 8 Stream (latest)(Supermicro hardware with 18TB Storage hardware raid5)
> - Controller nodes: CentOS 8 Stream (latest) (Supermicro)
>
> OpenStack version: Wallaby, deployed by kolla-ansible, not using
> Ironic.
>
> When using LVM-backed devices exposed to nova through the iSCSI
> protocol, the size of the device as seen by the guest differs from the
> size of the backing volume.
>
> Example:
> Using horizon, create a volume 20GB or bigger and attach it to a Virtual Machine.
>
> Exact device size in bytes, obtained by running this on the storage node:
> fdisk -l /dev/cinder-volumes/volume-UUID
> 4398046511104 (Example 4 Terabyte volume on storage-node, in bytes)
>
> However, when running fdisk on /dev/vda inside the virtual machine
> (the volume attached to the nova instance on the compute node):
>
> 4398066466816 (Example 4 Terabyte volume as seen by the virtual
> machine, in bytes)
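>
> (For a byte-exact comparison, the sizes can also be read with blockdev;
> the device paths are the ones used above:)
>
> # On the storage node: size of the backing LV, in bytes
> blockdev --getsize64 /dev/cinder-volumes/volume-UUID
> # Inside the VM: size of the attached disk, in bytes
> blockdev --getsize64 /dev/vda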
>
> If the sizes were the other way around this would not be a problem,
> but the disk size seen by the VM is bigger than the real size of the
> iSCSI-backed LVM volume.
>
> The example below is from a VM backed by a 20GB volume, because smaller
> VM disks are affected sooner rather than later. It results in the
> following messages in the kernel log on the VM:
> [111761.391344] blk_update_request: I/O error, dev vda, sector 17777976
> [111761.394839] blk_update_request: I/O error, dev vda, sector 17778984
> [111761.396241] blk_update_request: I/O error, dev vda, sector 17779992
> [111761.397782] blk_update_request: I/O error, dev vda, sector 17781000
> [111761.399343] blk_update_request: I/O error, dev vda, sector 17782008
> [111761.400929] blk_update_request: I/O error, dev vda, sector 17783016
> [111761.402189] blk_update_request: I/O error, dev vda, sector 17784024
> [111761.403377] blk_update_request: I/O error, dev vda, sector 17785032
> [111761.404569] blk_update_request: I/O error, dev vda, sector 17786040
> [111761.406165] blk_update_request: I/O error, dev vda, sector 17787048
>
> Double-checked by creating an all-in-one node from a storage node, so
> that network issues can be ruled out.
>
> The issue did not go away.