Add support for Pacific to RBD driver

Bug #1931004 reported by Jon Bernard
This bug affects 9 people
Affects               Status        Importance  Assigned to
Cinder                Fix Released  Low         Jon Bernard
Ubuntu Cloud Archive  New           Undecided   Unassigned
  Wallaby             New           Undecided   Unassigned
  Xena                New           Undecided   Unassigned
glance (Ubuntu)       Confirmed     Undecided   Unassigned
  Hirsute             Won't Fix     Undecided   Unassigned
  Impish              Won't Fix     Undecided   Unassigned

Bug Description

When using Ceph Pacific, volume-from-image operations where both glance and cinder are configured to use RBD result in an exception when calling clone():

    rbd.InvalidArgument: [errno 22] RBD invalid argument (error creating clone)

    ERROR cinder.volume.manager Traceback (most recent call last):
    ERROR cinder.volume.manager File "/usr/local/lib/python3.9/site-packages/taskflow/engines/action_engine/executor.py", line 53, in _execute_task
    ERROR cinder.volume.manager result = task.execute(**arguments)
    ERROR cinder.volume.manager File "/opt/stack/cinder/cinder/volume/flows/manager/create_volume.py", line 1132, in execute
    ERROR cinder.volume.manager model_update = self._create_from_image(context,
    ERROR cinder.volume.manager File "/opt/stack/cinder/cinder/utils.py", line 638, in _wrapper
    ERROR cinder.volume.manager return r.call(f, *args, **kwargs)
    ERROR cinder.volume.manager File "/usr/local/lib/python3.9/site-packages/tenacity/__init__.py", line 411, in call
    ERROR cinder.volume.manager return self.__call__(*args, **kwargs)
    ERROR cinder.volume.manager File "/usr/local/lib/python3.9/site-packages/tenacity/__init__.py", line 423, in __call__
    ERROR cinder.volume.manager do = self.iter(retry_state=retry_state)
    ERROR cinder.volume.manager File "/usr/local/lib/python3.9/site-packages/tenacity/__init__.py", line 360, in iter
    ERROR cinder.volume.manager return fut.result()
    ERROR cinder.volume.manager File "/usr/lib64/python3.9/concurrent/futures/_base.py", line 438, in result
    ERROR cinder.volume.manager return self.__get_result()
    ERROR cinder.volume.manager File "/usr/lib64/python3.9/concurrent/futures/_base.py", line 390, in __get_result
    ERROR cinder.volume.manager raise self._exception
    ERROR cinder.volume.manager File "/usr/local/lib/python3.9/site-packages/tenacity/__init__.py", line 426, in __call__
    ERROR cinder.volume.manager result = fn(*args, **kwargs)
    ERROR cinder.volume.manager File "/opt/stack/cinder/cinder/volume/flows/manager/create_volume.py", line 998, in _create_from_image
    ERROR cinder.volume.manager model_update, cloned = self.driver.clone_image(context,
    ERROR cinder.volume.manager File "/opt/stack/cinder/cinder/volume/drivers/rbd.py", line 1571, in clone_image
    ERROR cinder.volume.manager volume_update = self._clone(volume, pool, image, snapshot)
    ERROR cinder.volume.manager File "/opt/stack/cinder/cinder/volume/drivers/rbd.py", line 1023, in _clone
    ERROR cinder.volume.manager self.RBDProxy().clone(src_client.ioctx,
    ERROR cinder.volume.manager File "/usr/local/lib/python3.9/site-packages/eventlet/tpool.py", line 190, in doit
    ERROR cinder.volume.manager result = proxy_call(self._autowrap, f, *args, **kwargs)
    ERROR cinder.volume.manager File "/usr/local/lib/python3.9/site-packages/eventlet/tpool.py", line 148, in proxy_call
    ERROR cinder.volume.manager rv = execute(f, *args, **kwargs)
    ERROR cinder.volume.manager File "/usr/local/lib/python3.9/site-packages/eventlet/tpool.py", line 129, in execute
    ERROR cinder.volume.manager six.reraise(c, e, tb)
    ERROR cinder.volume.manager File "/usr/local/lib/python3.9/site-packages/six.py", line 719, in reraise
    ERROR cinder.volume.manager raise value
    ERROR cinder.volume.manager File "/usr/local/lib/python3.9/site-packages/eventlet/tpool.py", line 83, in tworker
    ERROR cinder.volume.manager rv = meth(*args, **kwargs)
    ERROR cinder.volume.manager File "rbd.pyx", line 698, in rbd.RBD.clone
    ERROR cinder.volume.manager rbd.InvalidArgument: [errno 22] RBD invalid argument (error creating clone)
    ERROR cinder.volume.manager

In Pacific, a check was added to ensure that during a clone operation the child's stripe unit is not smaller than its parent's. Failing this condition returns -EINVAL, which python-rbd then raises as an exception. This maps to the 'order' argument in clone(), where order is log base 2 of the stripe unit; Ceph's default object size is 4 MiB. The reason we're seeing EINVAL exceptions in the Pacific CI is that when OpenStack is configured to use Ceph for both cinder and glance, volume-from-image tests fail because Glance's default stripe unit is 8 MiB (distinctly larger than Cinder's 4 MiB). This results in an order calculation of 22, which is too small to be valid for clone() against an 8 MiB parent.
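The arithmetic behind the failure can be sketched in a few lines of Python (the names here are illustrative, not taken from the driver):

```python
import math

MiB = 1024 * 1024

def order_for(stripe_unit_bytes):
    # Ceph's 'order' is log base 2 of the object/stripe-unit size.
    return int(math.log2(stripe_unit_bytes))

cinder_order = order_for(4 * MiB)  # Cinder's 4 MiB default -> order 22
glance_order = order_for(8 * MiB)  # Glance's 8 MiB stripe unit -> order 23

# Pacific rejects a clone whose stripe unit is smaller than the
# parent's, so passing order 22 against an order-23 parent fails
# with -EINVAL (raised as rbd.InvalidArgument by python-rbd).
print(cinder_order, glance_order)  # 22 23
```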

I see two possible solutions and have proposed patches:

1. Increase Cinder's default chunk size to match Glance's. I think this makes sense for both consistency and performance.

2. When doing a clone(), consider both the configured chunk size /and/ the stripe unit of the parent volume, and choose the larger value.

Either of these approaches prevents the failures we're seeing, and I think they are both useful individually as well.
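The second option amounts to a one-line comparison. A minimal sketch (the function name is hypothetical):

```python
MiB = 1024 * 1024

def choose_stripe_unit(configured_chunk_bytes, parent_stripe_unit_bytes):
    # Never clone with a stripe unit smaller than the parent image's.
    return max(configured_chunk_bytes, parent_stripe_unit_bytes)

# Cinder configured for 4 MiB, parent Glance image striped at 8 MiB:
print(choose_stripe_unit(4 * MiB, 8 * MiB) // MiB)  # 8
```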

Changed in cinder:
status: New → In Progress
Revision history for this message
Sofia Enriquez (lsofia-enriquez) wrote :

Hi Jon,

Do you mind sharing the steps you are following? I'm not able to reproduce it. These are the steps I'm taking:
- Create vol1 from an image.
- Create vol2 from vol1.

Cheers,
Sofi

Changed in cinder:
importance: Undecided → Low
tags: added: clone glance image rbd
Changed in cinder:
assignee: nobody → Jon Bernard (jbernard)
Revision history for this message
Matthew Teehee (mvteehee) wrote :

Howdy

We have been having this issue since updating to Wallaby. We have been able to reproduce it with the following method:

1.) Upload a RAW image, or use an existing RAW image. So far we have tried Ubuntu 18, Cirros, and Ubuntu 20. When uploaded, the image will have an rbd direct_url.
2.) Attempt to create a volume from the image; it immediately fails:

```
2021-06-03 22:04:02.379 58557 ERROR cinder.scheduler.filter_scheduler [req-f36207d7-e298-4ce1-a6b2-224bdf37b775 c04f7008f7154d2093b350d1c58686c8 3c256bf48de5461e9fe3b839f7dc66a2 - - -] Error scheduling 982537fb-7554-4dda-b399-9c19c212ba28 from last vol-service: volume-f30e3dbe-2400-53b9-a38b-5247723dea12@rbd#RBD : ['Traceback (most recent call last):\n', ' File "/usr/lib/python3/dist-packages/taskflow/engines/action_engine/executor.py", line 53, in _execute_task\n result = task.execute(**arguments)\n', ' File "/usr/lib/python3/dist-packages/cinder/volume/flows/manager/create_volume.py", line 1132, in execute\n model_update = self._create_from_image(context,\n', ' File "/usr/lib/python3/dist-packages/cinder/utils.py", line 614, in _wrapper\n return r.call(f, *args, **kwargs)\n', ' File "/usr/lib/python3/dist-packages/tenacity/__init__.py", line 409, in call\n do = self.iter(retry_state=retry_state)\n', ' File "/usr/lib/python3/dist-packages/tenacity/__init__.py", line 356, in iter\n return fut.result()\n', ' File "/usr/lib/python3.8/concurrent/futures/_base.py", line 432, in result\n return self.__get_result()\n', ' File "/usr/lib/python3.8/concurrent/futures/_base.py", line 388, in __get_result\n raise self._exception\n', ' File "/usr/lib/python3/dist-packages/tenacity/__init__.py", line 412, in call\n result = fn(*args, **kwargs)\n', ' File "/usr/lib/python3/dist-packages/cinder/volume/flows/manager/create_volume.py", line 998, in _create_from_image\n model_update, cloned = self.driver.clone_image(context,\n', ' File "/usr/lib/python3/dist-packages/cinder/volume/drivers/rbd.py", line 1567, in clone_image\n volume_update = self._clone(volume, pool, image, snapshot)\n', ' File "/usr/lib/python3/dist-packages/cinder/volume/drivers/rbd.py", line 1019, in _clone\n self.RBDProxy().clone(src_client.ioctx,\n', ' File "/usr/lib/python3/dist-packages/eventlet/tpool.py", line 193, in doit\n result = proxy_call(self._autowrap, f, *args, **kwargs)\n', ' File 
"/usr/lib/python3/dist-packages/eventlet/tpool.py", line 151, in proxy_call\n rv = execute(f, *args, **kwargs)\n', ' File "/usr/lib/python3/dist-packages/eventlet/tpool.py", line 132, in execute\n six.reraise(c, e, tb)\n', ' File "/usr/lib/python3/dist-packages/six.py", line 703, in reraise\n raise value\n', ' File "/usr/lib/python3/dist-packages/eventlet/tpool.py", line 86, in tworker\n rv = meth(*args, **kwargs)\n', ' File "rbd.pyx", line 698, in rbd.RBD.clone\n', 'rbd.InvalidArgument: [errno 22] RBD invalid argument (error creating clone)\n']
```
However, when doing this same process with a qcow2 image, there is no issue.

Revision history for this message
Jon Bernard (jbernard) wrote :

Hi Sofi, Matthew, that's right: qcow2 images will not trigger the bug, as the image must be converted to raw prior to import, which prevents a COW clone from occurring. To reproduce this you need glance and cinder both using the same Ceph cluster, and an image in glance whose 'disk-format' is 'raw'; then simply create a volume from that image. The raw-format glance image will allow cinder to call clone(), and this is where the issue arises.

Revision history for this message
Javier Diaz Jr (javierdiazcharles) wrote :

Can we bump this issue's importance to something higher than Low? Cinder/Ceph is a pretty standard config, so I assume this bug will impact a large number of people. We are just the lucky first few to encounter it.

Revision history for this message
Matthew Teehee (mvteehee) wrote :

Hello, Sofi, Jon: Jon is correct; in our environment glance and cinder both use the same backend storage cluster. I agree with Javier: people who use this setup are going to have issues with this process.

Revision history for this message
Matthew Teehee (mvteehee) wrote :

Hello, Sofi, Jon: I have tested the proposed fix in our testing environment, but now we are getting permission issues:

Any updates on this?

root@volume-25018ecd-acf9-51dd-bcaa-d20cb0d49ae5:/etc/ceph# rbd -c /etc/ceph/ceph.conf -n client.volumes -k /etc/ceph/ceph.client.volumes.keyring -p images snap create 90f29707-8d2e-45bb-af05-baa2ab325d93@test --debug-ms=1 --debug-rbd=20
2021-06-14T23:09:37.098+0000 7fb2f30c41c0 1 Processor -- start
2021-06-14T23:09:37.098+0000 7fb2f30c41c0 1 -- start start
2021-06-14T23:09:37.098+0000 7fb2f30c41c0 1 --2- >> v2:172.18.0.7:3300/0 conn(0x55d46cecaee0 0x55d46cecb2c0 unknown :-1 s=NONE pgs=0 cs=0 l=0 rev1=0 rx=0 tx=0).connect
2021-06-14T23:09:37.098+0000 7fb2f30c41c0 1 --2- >> v2:172.18.0.6:3300/0 conn(0x55d46ce39a80 0x55d46ced2cf0 unknown :-1 s=NONE pgs=0 cs=0 l=0 rev1=0 rx=0 tx=0).connect
2021-06-14T23:09:37.098+0000 7fb2f30c41c0 1 -- --> v1:172.18.0.6:6789/0 -- auth(proto 0 32 bytes epoch 0) v1 -- 0x55d46cda3370 con 0x55d46cecb7b0
2021-06-14T23:09:37.098+0000 7fb2f30c41c0 1 -- --> v2:172.18.0.6:3300/0 -- mon_getmap magic: 0 v1 -- 0x55d46cdb4bf0 con 0x55d46ce39a80
2021-06-14T23:09:37.098+0000 7fb2f30c41c0 1 -- --> v2:172.18.0.7:3300/0 -- mon_getmap magic: 0 v1 -- 0x55d46cdb4d30 con 0x55d46cecaee0
2021-06-14T23:09:37.102+0000 7fb2f1cf9700 1 --2- >> v2:172.18.0.6:3300/0 conn(0x55d46ce39a80 0x55d46ced2cf0 unknown :-1 s=BANNER_CONNECTING pgs=0 cs=0 l=0 rev1=0 rx=0 tx=0)._handle_peer_banner_payload supported=1 required=0
2021-06-14T23:09:37.102+0000 7fb2f0cf7700 1 --1- >> v1:172.18.0.6:6789/0 conn(0x55d46cecb7b0 0x55d46ce391d0 :-1 s=CONNECTING_WAIT_BANNER_AND_IDENTIFY pgs=0 cs=0 l=0).handle_server_banner_and_identify peer v1:172.18.0.6:6789/0 says I am v1:172.18.0.28:60048/0 (socket says 172.18.0.28:60048)
2021-06-14T23:09:37.102+0000 7fb2f0cf7700 1 -- 172.18.0.28:0/3517608923 learned_addr learned my addr 172.18.0.28:0/3517608923 (peer_addr_for_me v1:172.18.0.28:0/0)
2021-06-14T23:09:37.102+0000 7fb2f14f8700 1 --2- 172.18.0.28:0/3517608923 >> v2:172.18.0.7:3300/0 conn(0x55d46cecaee0 0x55d46cecb2c0 unknown :-1 s=BANNER_CONNECTING pgs=0 cs=0 l=0 rev1=0 rx=0 tx=0)._handle_peer_banner_payload supported=1 required=0
2021-06-14T23:09:37.102+0000 7fb2f1cf9700 1 --2- 172.18.0.28:0/3517608923 >> v2:172.18.0.6:3300/0 conn(0x55d46ce39a80 0x55d46ced2cf0 unknown :-1 s=AUTH_CONNECTING pgs=0 cs=0 l=0 rev1=1 rx=0 tx=0).handle_auth_bad_method method=2 result (13) Permission denied, allowed methods=[2], allowed modes=[2,1]
2021-06-14T23:09:37.102+0000 7fb2f1cf9700 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [2]
2021-06-14T23:09:37.102+0000 7fb2f1cf9700 1 -- 172.18.0.28:0/3517608923 >> v2:172.18.0.6:3300/0 conn(0x55d46ce39a80 msgr2=0x55d46ced2cf0 unknown :-1 s=STATE_CONNECTION_ESTABLISHED l=0).mark_down
2021-06-14T23:09:37.102+0000 7fb2f1cf9700 1 --2- 172.18.0.28:0/3517608923 >> v2:172.18.0.6:3300/0 conn(0x55d46ce39a80 0x55d46ced2cf0 unknown :-1 s=AUTH_CONNECTING pgs=0 cs=0 l=0 rev1=1 rx=0 tx=0).stop
2021-06-14T23:09:37.102+0000 7fb2e3fff700 1 -- 172.18.0.28:0/3517608923 <== mon.0 v1:172.18.0.6:6789/0 1 ==== auth_reply(proto 2 0 (0) Success) v1 ==== 33+0+0 (...


Revision history for this message
Jon Bernard (jbernard) wrote :

Hi Matthew, I'm not able to connect the command you're running in your paste to the proposed patch; can you elaborate on what you did and what you're seeing? The patch is working both locally and in our CI job, and I need more info.

Revision history for this message
kyle schleich (poptar7) wrote :

We ran into this issue with Wallaby and Pacific.

Manually applying the patch allows the creation of new volumes from images again.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on cinder (master)

Change abandoned by "Jon Bernard <email address hidden>" on branch: master
Review: https://review.opendev.org/c/openstack/cinder/+/786266
Reason: See my last review response.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to cinder (master)

Reviewed: https://review.opendev.org/c/openstack/cinder/+/786260
Committed: https://opendev.org/openstack/cinder/commit/49a2c85eda9fd3cddc75fd904fe62c87a6b50735
Submitter: "Zuul (22348)"
Branch: master

commit 49a2c85eda9fd3cddc75fd904fe62c87a6b50735
Author: Jon Bernard <email address hidden>
Date: Wed Apr 14 11:14:13 2021 -0400

    RBD: use correct stripe unit in clone operation

    The recent release of Ceph Pacific saw a change to the clone() logic
    where invalid values of stripe unit would cause an error to be returned
    where previous versions would correct the value at runtime. This
    becomes a problem when creating a volume from an image, where the source
    RBD image may have a larger stripe unit than cinder's RBD driver is
    configured for. When this happens, clone() is called with a stripe unit
    that is too small given that of the source image and the clone fails.

    The RBD driver in Cinder has a configuration parameter
    'rbd_store_chunk_size' that stores the preferred object size for cloned
    images. If clone() is called without a stripe_unit passed in, the
    stripe unit defaults to the object size, which is 4MB by default. The
    issue arises when creating a volume from a Glance image, where Glance is
    creating images with a default stripe unit of 8MB (distinctly larger
    than that of Cinder). If we do not consider the incoming stripe unit
    and select the larger of the two, Ceph cannot clone an RBD image with a
    smaller stripe unit and raises an error.

    This patch adds a function in our driver's clone logic to select the
    larger of the two stripe unit values so that the appropriate stripe unit
    is chosen.

    It should also be noted that we're determining the correct stripe unit,
    but using the 'order' argument to clone(). Ceph will set the stripe
    unit equal to the object size (order) by default and we rely on this
    behaviour for the following reason: passing stripe-unit alone or with
    an order argument causes an invalid argument exception to be raised in
    pre-Pacific releases of Ceph, as its argument parsing appears to have
    limitations.

    Closes-Bug: #1931004
    Change-Id: Iec111ab83e9ed8182c9679c911e3d90927d5a7c3
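The approach the commit message describes, picking the larger stripe unit but expressing it only through the order argument, can be sketched as follows (a hypothetical helper, not the driver's actual code):

```python
import math

MiB = 1024 * 1024

def clone_kwargs(configured_chunk_bytes, parent_stripe_unit_bytes):
    # Select the larger of the two stripe units, then pass only
    # 'order': Ceph defaults the stripe unit to the object size,
    # and an explicit stripe_unit argument trips argument-parsing
    # limitations on pre-Pacific releases.
    stripe_unit = max(configured_chunk_bytes, parent_stripe_unit_bytes)
    return {'order': int(math.log2(stripe_unit))}

print(clone_kwargs(4 * MiB, 8 * MiB))  # {'order': 23}
```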

Changed in cinder:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (stable/wallaby)

Fix proposed to branch: stable/wallaby
Review: https://review.opendev.org/c/openstack/cinder/+/804265

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to cinder (stable/wallaby)

Reviewed: https://review.opendev.org/c/openstack/cinder/+/804265
Committed: https://opendev.org/openstack/cinder/commit/5db58159feec3d2d39d1abf3637310f5ac60a3cf
Submitter: "Zuul (22348)"
Branch: stable/wallaby

commit 5db58159feec3d2d39d1abf3637310f5ac60a3cf
Author: Jon Bernard <email address hidden>
Date: Wed Apr 14 11:14:13 2021 -0400

    RBD: use correct stripe unit in clone operation

    The recent release of Ceph Pacific saw a change to the clone() logic
    where invalid values of stripe unit would cause an error to be returned
    where previous versions would correct the value at runtime. This
    becomes a problem when creating a volume from an image, where the source
    RBD image may have a larger stripe unit than cinder's RBD driver is
    configured for. When this happens, clone() is called with a stripe unit
    that is too small given that of the source image and the clone fails.

    The RBD driver in Cinder has a configuration parameter
    'rbd_store_chunk_size' that stores the preferred object size for cloned
    images. If clone() is called without a stripe_unit passed in, the
    stripe unit defaults to the object size, which is 4MB by default. The
    issue arises when creating a volume from a Glance image, where Glance is
    creating images with a default stripe unit of 8MB (distinctly larger
    than that of Cinder). If we do not consider the incoming stripe unit
    and select the larger of the two, Ceph cannot clone an RBD image with a
    smaller stripe unit and raises an error.

    This patch adds a function in our driver's clone logic to select the
    larger of the two stripe unit values so that the appropriate stripe unit
    is chosen.

    It should also be noted that we're determining the correct stripe unit,
    but using the 'order' argument to clone(). Ceph will set the stripe
    unit equal to the object size (order) by default and we rely on this
    behaviour for the following reason: passing stripe-unit alone or with
    an order argument causes an invalid argument exception to be raised in
    pre-Pacific releases of Ceph, as its argument parsing appears to have
    limitations.

    Closes-Bug: #1931004
    Change-Id: Iec111ab83e9ed8182c9679c911e3d90927d5a7c3
    (cherry picked from commit 49a2c85eda9fd3cddc75fd904fe62c87a6b50735)

tags: added: in-stable-wallaby
Revision history for this message
Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in glance (Ubuntu Hirsute):
status: New → Confirmed
Changed in glance (Ubuntu):
status: New → Confirmed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/cinder 18.1.0

This issue was fixed in the openstack/cinder 18.1.0 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (stable/victoria)

Fix proposed to branch: stable/victoria
Review: https://review.opendev.org/c/openstack/cinder/+/808475

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (stable/ussuri)

Fix proposed to branch: stable/ussuri
Review: https://review.opendev.org/c/openstack/cinder/+/809181

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to cinder (stable/train)

Fix proposed to branch: stable/train
Review: https://review.opendev.org/c/openstack/cinder/+/809190

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to cinder (stable/victoria)

Reviewed: https://review.opendev.org/c/openstack/cinder/+/808475
Committed: https://opendev.org/openstack/cinder/commit/07ead73eec0ac6b962b533b07861d6a81226fa37
Submitter: "Zuul (22348)"
Branch: stable/victoria

commit 07ead73eec0ac6b962b533b07861d6a81226fa37
Author: Jon Bernard <email address hidden>
Date: Wed Apr 14 11:14:13 2021 -0400

    RBD: use correct stripe unit in clone operation

    The recent release of Ceph Pacific saw a change to the clone() logic
    where invalid values of stripe unit would cause an error to be returned
    where previous versions would correct the value at runtime. This
    becomes a problem when creating a volume from an image, where the source
    RBD image may have a larger stripe unit than cinder's RBD driver is
    configured for. When this happens, clone() is called with a stripe unit
    that is too small given that of the source image and the clone fails.

    The RBD driver in Cinder has a configuration parameter
    'rbd_store_chunk_size' that stores the preferred object size for cloned
    images. If clone() is called without a stripe_unit passed in, the
    stripe unit defaults to the object size, which is 4MB by default. The
    issue arises when creating a volume from a Glance image, where Glance is
    creating images with a default stripe unit of 8MB (distinctly larger
    than that of Cinder). If we do not consider the incoming stripe unit
    and select the larger of the two, Ceph cannot clone an RBD image with a
    smaller stripe unit and raises an error.

    This patch adds a function in our driver's clone logic to select the
    larger of the two stripe unit values so that the appropriate stripe unit
    is chosen.

    It should also be noted that we're determining the correct stripe unit,
    but using the 'order' argument to clone(). Ceph will set the stripe
    unit equal to the object size (order) by default and we rely on this
    behaviour for the following reason: passing stripe-unit alone or with
    an order argument causes an invalid argument exception to be raised in
    pre-Pacific releases of Ceph, as its argument parsing appears to have
    limitations.

    Closes-Bug: #1931004
    Change-Id: Iec111ab83e9ed8182c9679c911e3d90927d5a7c3
    (cherry picked from commit 49a2c85eda9fd3cddc75fd904fe62c87a6b50735)
    (cherry picked from commit 5db58159feec3d2d39d1abf3637310f5ac60a3cf)
    Conflicts:
            cinder/tests/unit/volume/drivers/test_rbd.py

tags: added: in-stable-victoria
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/cinder 19.0.0.0rc1

This issue was fixed in the openstack/cinder 19.0.0.0rc1 release candidate.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to cinder (stable/ussuri)

Reviewed: https://review.opendev.org/c/openstack/cinder/+/809181
Committed: https://opendev.org/openstack/cinder/commit/06b32da4be8b69e626eb7eb8091f695cbdcd92e7
Submitter: "Zuul (22348)"
Branch: stable/ussuri

commit 06b32da4be8b69e626eb7eb8091f695cbdcd92e7
Author: Jon Bernard <email address hidden>
Date: Wed Apr 14 11:14:13 2021 -0400

    RBD: use correct stripe unit in clone operation

    The recent release of Ceph Pacific saw a change to the clone() logic
    where invalid values of stripe unit would cause an error to be returned
    where previous versions would correct the value at runtime. This
    becomes a problem when creating a volume from an image, where the source
    RBD image may have a larger stripe unit than cinder's RBD driver is
    configured for. When this happens, clone() is called with a stripe unit
    that is too small given that of the source image and the clone fails.

    The RBD driver in Cinder has a configuration parameter
    'rbd_store_chunk_size' that stores the preferred object size for cloned
    images. If clone() is called without a stripe_unit passed in, the
    stripe unit defaults to the object size, which is 4MB by default. The
    issue arises when creating a volume from a Glance image, where Glance is
    creating images with a default stripe unit of 8MB (distinctly larger
    than that of Cinder). If we do not consider the incoming stripe unit
    and select the larger of the two, Ceph cannot clone an RBD image with a
    smaller stripe unit and raises an error.

    This patch adds a function in our driver's clone logic to select the
    larger of the two stripe unit values so that the appropriate stripe unit
    is chosen.

    It should also be noted that we're determining the correct stripe unit,
    but using the 'order' argument to clone(). Ceph will set the stripe
    unit equal to the object size (order) by default and we rely on this
    behaviour for the following reason: passing stripe-unit alone or with
    an order argument causes an invalid argument exception to be raised in
    pre-Pacific releases of Ceph, as its argument parsing appears to have
    limitations.

    Closes-Bug: #1931004
    Change-Id: Iec111ab83e9ed8182c9679c911e3d90927d5a7c3
    (cherry picked from commit 49a2c85eda9fd3cddc75fd904fe62c87a6b50735)
    (cherry picked from commit 5db58159feec3d2d39d1abf3637310f5ac60a3cf)
    Conflicts:
            cinder/tests/unit/volume/drivers/test_rbd.py
    (cherry picked from commit 07ead73eec0ac6b962b533b07861d6a81226fa37)

tags: added: in-stable-ussuri
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/cinder 17.2.0

This issue was fixed in the openstack/cinder 17.2.0 release.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to cinder (stable/train)

Reviewed: https://review.opendev.org/c/openstack/cinder/+/809190
Committed: https://opendev.org/openstack/cinder/commit/7fd4d46a8ea1fa18cc641845ec932b84e45c4657
Submitter: "Zuul (22348)"
Branch: stable/train

commit 7fd4d46a8ea1fa18cc641845ec932b84e45c4657
Author: Jon Bernard <email address hidden>
Date: Wed Apr 14 11:14:13 2021 -0400

    RBD: use correct stripe unit in clone operation

    The recent release of Ceph Pacific saw a change to the clone() logic
    where invalid values of stripe unit would cause an error to be returned
    where previous versions would correct the value at runtime. This
    becomes a problem when creating a volume from an image, where the source
    RBD image may have a larger stripe unit than cinder's RBD driver is
    configured for. When this happens, clone() is called with a stripe unit
    that is too small given that of the source image and the clone fails.

    The RBD driver in Cinder has a configuration parameter
    'rbd_store_chunk_size' that stores the preferred object size for cloned
    images. If clone() is called without a stripe_unit passed in, the
    stripe unit defaults to the object size, which is 4MB by default. The
    issue arises when creating a volume from a Glance image, where Glance is
    creating images with a default stripe unit of 8MB (distinctly larger
    than that of Cinder). If we do not consider the incoming stripe unit
    and select the larger of the two, Ceph cannot clone an RBD image with a
    smaller stripe unit and raises an error.

    This patch adds a function in our driver's clone logic to select the
    larger of the two stripe unit values so that the appropriate stripe unit
    is chosen.

    It should also be noted that we're determining the correct stripe unit,
    but using the 'order' argument to clone(). Ceph will set the stripe
    unit equal to the object size (order) by default and we rely on this
    behaviour for the following reason: passing stripe-unit alone or with
    an order argument causes an invalid argument exception to be raised in
    pre-Pacific releases of Ceph, as its argument parsing appears to have
    limitations.

    Closes-Bug: #1931004
    Change-Id: Iec111ab83e9ed8182c9679c911e3d90927d5a7c3
    (cherry picked from commit 49a2c85eda9fd3cddc75fd904fe62c87a6b50735)
    (cherry picked from commit 5db58159feec3d2d39d1abf3637310f5ac60a3cf)
    Conflicts:
            cinder/tests/unit/volume/drivers/test_rbd.py
    (cherry picked from commit 07ead73eec0ac6b962b533b07861d6a81226fa37)
    (cherry picked from commit 06b32da4be8b69e626eb7eb8091f695cbdcd92e7)
    Conflicts:
            cinder/tests/unit/volume/drivers/test_rbd.py

tags: added: in-stable-train
Revision history for this message
Bartosz Bezak (bbezak) wrote :

I recently upgraded an environment to the latest stable/victoria build, and I am not able to boot from an rbd volume; in fact cinder cannot fetch glance images from RBD. It worked on train and ussuri. Ceph is Pacific (16.2.6) - https://paste.opendev.org/raw/809512/
Looks like it is related to https://review.opendev.org/c/openstack/cinder/+/808475 (it is not merged to ussuri yet).
I've tested rbd_store_chunk_size = 8 (the same as in glance), but I hit the same issue.

I've used an older cinder Victoria build and it worked.

           |__Flow 'volume_create_manager': rbd.PermissionError: [errno 1] error opening image b'7e0beafa-f1a2-403e-8394-f5e6900d0785' at snapshot None
2021-09-22 18:23:19.974 72 ERROR cinder.volume.manager Traceback (most recent call last):
2021-09-22 18:23:19.974 72 ERROR cinder.volume.manager File "/var/lib/kolla/venv/lib/python3.6/site-packages/taskflow/engines/action_engine/executor.py", line 53, in _execute_task
2021-09-22 18:23:19.974 72 ERROR cinder.volume.manager result = task.execute(**arguments)
2021-09-22 18:23:19.974 72 ERROR cinder.volume.manager File "/var/lib/kolla/venv/lib/python3.6/site-packages/cinder/volume/flows/manager/create_volume.py", line 1164, in execute
2021-09-22 18:23:19.974 72 ERROR cinder.volume.manager **volume_spec)
2021-09-22 18:23:19.974 72 ERROR cinder.volume.manager File "/var/lib/kolla/venv/lib/python3.6/site-packages/cinder/utils.py", line 694, in _wrapper
2021-09-22 18:23:19.974 72 ERROR cinder.volume.manager return r.call(f, *args, **kwargs)
2021-09-22 18:23:19.974 72 ERROR cinder.volume.manager File "/var/lib/kolla/venv/lib/python3.6/site-packages/tenacity/__init__.py", line 409, in call
2021-09-22 18:23:19.974 72 ERROR cinder.volume.manager do = self.iter(retry_state=retry_state)
2021-09-22 18:23:19.974 72 ERROR cinder.volume.manager File "/var/lib/kolla/venv/lib/python3.6/site-packages/tenacity/__init__.py", line 356, in iter
2021-09-22 18:23:19.974 72 ERROR cinder.volume.manager return fut.result()
2021-09-22 18:23:19.974 72 ERROR cinder.volume.manager File "/usr/lib64/python3.6/concurrent/futures/_base.py", line 425, in result
2021-09-22 18:23:19.974 72 ERROR cinder.volume.manager return self.__get_result()
2021-09-22 18:23:19.974 72 ERROR cinder.volume.manager File "/usr/lib64/python3.6/concurrent/futures/_base.py", line 384, in __get_result
2021-09-22 18:23:19.974 72 ERROR cinder.volume.manager raise self._exception
2021-09-22 18:23:19.974 72 ERROR cinder.volume.manager File "/var/lib/kolla/venv/lib/python3.6/site-packages/tenacity/__init__.py", line 412, in call
2021-09-22 18:23:19.974 72 ERROR cinder.volume.manager result = fn(*args, **kwargs)
2021-09-22 18:23:19.974 72 ERROR cinder.volume.manager File "/var/lib/kolla/venv/lib/python3.6/site-packages/cinder/volume/flows/manager/create_volume.py", line 1032, in _create_from_image
2021-09-22 18:23:19.974 72 ERROR cinder.volume.manager image_service)
2021-09-22 18:23:19.974 72 ERROR cinder.volume.manager File "/var/lib/kolla/venv/lib/python3.6/site-packages/cinder/volume/drivers/rbd.py", line 1584, in clone_image
2021-09-22 18:23:19.974 72 ERROR cinder.volume.manager volume_update ...


Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/cinder 16.4.2

This issue was fixed in the openstack/cinder 16.4.2 release.

Revision history for this message
Brian Murray (brian-murray) wrote :

The Hirsute Hippo has reached End of Life, so this bug will not be fixed for that release.

Changed in glance (Ubuntu Hirsute):
status: Confirmed → Won't Fix
Revision history for this message
Brian Murray (brian-murray) wrote :

Ubuntu 21.10 (Impish Indri) has reached end of life, so this bug will not be fixed for that specific release.

Changed in glance (Ubuntu Impish):
status: Confirmed → Won't Fix
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/cinder train-eol

This issue was fixed in the openstack/cinder train-eol release.

