External ceph fails to create volume with error 22

Bug #1938595 reported by Boris Lukashev
This bug affects 1 person
Affects: kolla-ansible
Status: New
Importance: Undecided
Assigned to: Unassigned

Bug Description

Using Wallaby and Pacific, with Ceph and LVM servicing Cinder on multiple back-ends, I'm unable to create volumes from images due to:
```
Error scheduling b9be1947-8dea-4abf-b2d5-fffff629a1ab from last vol-service: control0@rbd-1#rbd-1 :
Traceback (most recent call last):
  File "/var/lib/kolla/venv/lib/python3.8/site-packages/taskflow/engines/action_engine/executor.py", line 53, in _execute_task
    result = task.execute(**arguments)
  File "/var/lib/kolla/venv/lib/python3.8/site-packages/cinder/volume/flows/manager/create_volume.py", line 1132, in execute
    model_update = self._create_from_image(context,
  File "/var/lib/kolla/venv/lib/python3.8/site-packages/cinder/utils.py", line 614, in _wrapper
    return r.call(f, *args, **kwargs)
  File "/var/lib/kolla/venv/lib/python3.8/site-packages/tenacity/__init__.py", line 411, in call
    return self.__call__(*args, **kwargs)
  File "/var/lib/kolla/venv/lib/python3.8/site-packages/tenacity/__init__.py", line 423, in __call__
    do = self.iter(retry_state=retry_state)
  File "/var/lib/kolla/venv/lib/python3.8/site-packages/tenacity/__init__.py", line 360, in iter
    return fut.result()
  File "/usr/lib/python3.8/concurrent/futures/_base.py", line 437, in result
    return self.__get_result()
  File "/usr/lib/python3.8/concurrent/futures/_base.py", line 389, in __get_result
    raise self._exception
  File "/var/lib/kolla/venv/lib/python3.8/site-packages/tenacity/__init__.py", line 426, in __call__
    result = fn(*args, **kwargs)
  File "/var/lib/kolla/venv/lib/python3.8/site-packages/cinder/volume/flows/manager/create_volume.py", line 998, in _create_from_image
    model_update, cloned = self.driver.clone_image(context,
  File "/var/lib/kolla/venv/lib/python3.8/site-packages/cinder/volume/drivers/rbd.py", line 1567, in clone_image
    volume_update = self._clone(volume, pool, image, snapshot)
  File "/var/lib/kolla/venv/lib/python3.8/site-packages/cinder/volume/drivers/rbd.py", line 1019, in _clone
    self.RBDProxy().clone(src_client.ioctx,
  File "/var/lib/kolla/venv/lib/python3.8/site-packages/eventlet/tpool.py", line 190, in doit
    result = proxy_call(self._autowrap, f, *args, **kwargs)
  File "/var/lib/kolla/venv/lib/python3.8/site-packages/eventlet/tpool.py", line 148, in proxy_call
    rv = execute(f, *args, **kwargs)
  File "/var/lib/kolla/venv/lib/python3.8/site-packages/eventlet/tpool.py", line 129, in execute
    six.reraise(c, e, tb)
  File "/usr/local/lib/python3.8/dist-packages/six.py", line 703, in reraise
    raise value
  File "/var/lib/kolla/venv/lib/python3.8/site-packages/eventlet/tpool.py", line 83, in tworker
    rv = meth(*args, **kwargs)
  File "rbd.pyx", line 698, in rbd.RBD.clone
rbd.InvalidArgument: [errno 22] RBD invalid argument (error creating clone)
```
When I was dealing with this in Canonical's implementation, the fix was to add `show_image_direct_url=False` to the Glance configs.
Having added that and reconfigured:
```
 grep -r show_image_direct_url /etc/kolla/
/etc/kolla/glance-api/glance-api.conf:show_image_direct_url = False
/etc/kolla/glance-api/glance-cache.conf:show_image_direct_url = False

```
I'm still getting the same error.

Boris Lukashev (rageltman) wrote :

Well, this is heartening: the problem is caused by #1938594. It looks like Cinder depends on Swift, for some reason, to talk to Glance when downloading volumes, so when the v3 token issued to Swift and used by the RGWs expires, it somehow creates the same breaking condition seen in Canonical's default config (`show_image_direct_url=True`) when using multiple Cinder back-ends.

I'm not sure we should close this as a duplicate until #1938594 is resolved because anyone searching for that error code will run into the "other cause" in Google search results instead of being led here.

Radosław Piliszek (yoctozepto) wrote :

I'm not really following your comment as the other issue is about RGW.

Anyhow, there is likely something to fix here and for now you can try to also add this to the glance.conf:

  show_multiple_locations = False

to hide RBD URLs.
Both go in the [DEFAULT] section (case sensitive).

Boris Lukashev (rageltman) wrote :

Thanks, trying that now.
So far I've narrowed this down (after rebooting all the nodes underpinning both the Ceph and OpenStack containers, because that makes such perfect sense as a fix for the inability to auth via UNAP for RGW :) ) to only affecting raw-format images going from Glance (the Ceph images pool) to Cinder's RBD backend (the volumes pool; the VMs pool seems to create volumes just fine out of Glance).

Boris Lukashev (rageltman) wrote :

Thank you - that seems to have done it. With Glance configs injecting
```
[DEFAULT]
show_image_direct_url = False
show_multiple_locations = False
```
and with the control hosts under the Ceph/OpenStack control planes rebooted after installation, I now see even raw images getting created correctly.
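
For anyone landing here from a search, here is a minimal sketch (not part of kolla-ansible or Glance) of checking that a rendered Glance config carries both options in `[DEFAULT]`. The function name `check_glance_defaults` and the set of accepted "false" spellings are assumptions made for illustration. Python's `configparser` stands in for Glance's own config parsing; it treats section names as case sensitive, so options placed under a `[default]` header are reported as missing.

```python
import configparser

# Options from this thread that should both be False in [DEFAULT].
REQUIRED_FALSE = ("show_image_direct_url", "show_multiple_locations")

# Spellings treated as "false" here; an assumption for this sketch.
FALSE_VALUES = {"false", "0", "no", "off"}


def check_glance_defaults(conf_text):
    """Return {option: problem} for options not set to False in [DEFAULT]."""
    # interpolation=None keeps any '%' in other values from being expanded;
    # strict=False tolerates duplicate sections or options in the file.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)
    defaults = parser.defaults()  # only the literal [DEFAULT] section
    problems = {}
    for option in REQUIRED_FALSE:
        value = defaults.get(option)
        if value is None:
            problems[option] = "missing from [DEFAULT]"
        elif value.strip().lower() not in FALSE_VALUES:
            problems[option] = "set to %r, expected False" % value
    return problems


if __name__ == "__main__":
    sample = """\
[DEFAULT]
show_image_direct_url = False
show_multiple_locations = False
"""
    print(check_glance_defaults(sample) or "both options are False")
```

To check the file from the grep output earlier in this report, read `/etc/kolla/glance-api/glance-api.conf` and pass its contents to `check_glance_defaults`; an empty result means both options are present and false.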

Mitchell Walls (miwalls) wrote :