Cinder volume type creation race condition failure

Bug #1588777 reported by Samuel Matzek
Affects: OpenStack-Ansible
Status: Fix Released
Importance: Medium
Assigned to: Chhavi Agarwal

Bug Description

If you have multiple Ceph Cinder volume drivers on different HA-proxied hosts pointing to the same Ceph cluster, the OSA deploy can fail due to a race condition when creating the volume types.

It fails like this:
TASK: [os_cinder | Add in cinder devices types]
failed: [sm14_cinder_volumes_container-ade7a532] => (item={'key': u'ceph', 'value': {u'rbd_store_chunk_size': 4, u'rbd_ceph_conf': u'/etc/ceph/ceph.conf', u'volume_backend_name': u'ceph', u'rbd_secret_uuid': u'4f8dc617-4c46-4925-bd11-2c14397b17f2', u'volume_driver': u'cinder.volume.drivers.rbd.RBDDriver', u'rados_connect_timeout': -1, u'rbd_pool': u'volumes', u'rbd_flatten_volume_from_snapshot': u'false', u'rbd_max_clone_depth': 5, u'rbd_user': u'cinder'}}) => {"changed": true, "cmd": ". /root/openrc\n if ! /openstack/venvs/cinder-13.1.2/bin/cinder type-list | grep \"ceph\"; then\n /openstack/venvs/cinder-13.1.2/bin/cinder type-create \"ceph\"\n /openstack/venvs/cinder-13.1.2/bin/cinder type-key \"ceph\" set volume_backend_name=\"ceph\"\n fi", "delta": "0:00:03.847680", "end": "2016-06-02 09:32:20.565656", "item": {"key": "ceph", "value": {"rados_connect_timeout": -1, "rbd_ceph_conf": "/etc/ceph/ceph.conf", "rbd_flatten_volume_from_snapshot": "false", "rbd_max_clone_depth": 5, "rbd_pool": "volumes", "rbd_secret_uuid": "4f8dc617-4c46-4925-bd11-2c14397b17f2", "rbd_store_chunk_size": 4, "rbd_user": "cinder", "volume_backend_name": "ceph", "volume_driver": "cinder.volume.drivers.rbd.RBDDriver"}}, "rc": 1, "start": "2016-06-02 09:32:16.717976", "warnings": []}
stderr: ERROR: Multiple volumetype matches found for 'ceph', use an ID to be more specific.
stdout: +--------------------------------------+------+-------------+-----------+
| ID | Name | Description | Is_Public |
+--------------------------------------+------+-------------+-----------+
| 2a2578d6-1097-4991-b92d-83a16949a0ed | ceph | - | True |
+--------------------------------------+------+-------------+-----------+

#####################################
openstack_user_config.yml:
storage_hosts:
  sm13:
    ip: 172.29.236.13
    container_vars:
      cinder_backends:
        limit_container_types: cinder_volume
        ceph:
          volume_driver: cinder.volume.drivers.rbd.RBDDriver
          rbd_pool: volumes
          rbd_ceph_conf: /etc/ceph/ceph.conf
          rbd_flatten_volume_from_snapshot: 'false'
          rbd_max_clone_depth: 5
          rbd_store_chunk_size: 4
          rados_connect_timeout: -1
          volume_backend_name: ceph
          rbd_user: "{{ cinder_ceph_client }}"
          rbd_secret_uuid: "{{ cinder_ceph_client_uuid }}"
  sm14:
    ip: 172.29.236.14
    container_vars:
      cinder_backends:
        limit_container_types: cinder_volume
        ceph:
          volume_driver: cinder.volume.drivers.rbd.RBDDriver
          rbd_pool: volumes
          rbd_ceph_conf: /etc/ceph/ceph.conf
          rbd_flatten_volume_from_snapshot: 'false'
          rbd_max_clone_depth: 5
          rbd_store_chunk_size: 4
          rados_connect_timeout: -1
          volume_backend_name: ceph
          rbd_user: "{{ cinder_ceph_client }}"
          rbd_secret_uuid: "{{ cinder_ceph_client_uuid }}"

#####################################
What we see is that two volume_types with the name 'ceph' get created, but neither of them has the volume_backend_name extra spec set:

root@sm14-cinder-volumes-container-ade7a532:~# /openstack/venvs/cinder-13.1.2/bin/cinder --insecure type-list
+--------------------------------------+------+-------------+-----------+
| ID | Name | Description | Is_Public |
+--------------------------------------+------+-------------+-----------+
| 2a2578d6-1097-4991-b92d-83a16949a0ed | ceph | - | True |
| 9c89fb16-2b07-40e2-8537-90192c809056 | ceph | - | True |
+--------------------------------------+------+-------------+-----------+

root@sm13-cinder-volumes-container-c3ba6bdc:~# /openstack/venvs/cinder-13.1.2/bin/cinder --insecure type-show 9c89fb16-2b07-40e2-8537-90192c809056
+---------------------------------+--------------------------------------+
| Property | Value |
+---------------------------------+--------------------------------------+
| description | None |
| extra_specs | {} |
| id | 9c89fb16-2b07-40e2-8537-90192c809056 |
| is_public | True |
| name | ceph |
| os-volume-type-access:is_public | True |
| qos_specs_id | None |
+---------------------------------+--------------------------------------+

root@sm13-cinder-volumes-container-c3ba6bdc:~# /openstack/venvs/cinder-13.1.2/bin/cinder --insecure type-show 2a2578d6-1097-4991-b92d-83a16949a0ed
+---------------------------------+--------------------------------------+
| Property | Value |
+---------------------------------+--------------------------------------+
| description | None |
| extra_specs | {} |
| id | 2a2578d6-1097-4991-b92d-83a16949a0ed |
| is_public | True |
| name | ceph |
| os-volume-type-access:is_public | True |
| qos_specs_id | None |
+---------------------------------+--------------------------------------+

This is in line with the task code:
- name: Add in cinder devices types
  shell: |
    . {{ ansible_env.HOME }}/openrc
    if ! {{ cinder_bin }}/cinder {{ keystone_service_adminuri_insecure | bool | ternary('--insecure','') }} type-list | grep "{{ item.key }}"; then
      {{ cinder_bin }}/cinder {{ keystone_service_adminuri_insecure | bool | ternary('--insecure','') }} type-create "{{ item.key }}"
      {{ cinder_bin }}/cinder {{ keystone_service_adminuri_insecure | bool | ternary('--insecure','') }} type-key "{{ item.key }}" set volume_backend_name="{{ item.value.volume_backend_name }}"
    fi

What is likely happening is that both hosts pass the "if it already exists, don't do it" check at the same time, both run type-create, and then both fail on the type-key set because two volume types now share the same name and the set command would need the UUID to distinguish them.

A fix for this would be to make this task run serially.

Samuel Matzek (smatzek)
Changed in openstack-ansible:
assignee: nobody → Samuel Matzek (smatzek)
Revision history for this message
Samuel Matzek (smatzek) wrote :

Tasks can't be serialized with the 'serial' keyword; you can only serialize at the playbook level. Looking for other options to fix it.

Revision history for this message
Logan V (loganv) wrote :

Maybe volume type creation should be run_once
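A minimal sketch of that suggestion, reusing the task quoted in the bug description (the loop variable and task name are taken from the role output above; this is not the merged patch):

```yaml
# Sketch only: adding run_once to the existing task so the
# check-then-create sequence executes on a single host per play,
# eliminating the concurrent type-create race.
- name: Add in cinder devices types
  shell: |
    . {{ ansible_env.HOME }}/openrc
    if ! {{ cinder_bin }}/cinder type-list | grep "{{ item.key }}"; then
      {{ cinder_bin }}/cinder type-create "{{ item.key }}"
      {{ cinder_bin }}/cinder type-key "{{ item.key }}" set volume_backend_name="{{ item.value.volume_backend_name }}"
    fi
  with_dict: "{{ cinder_backends }}"
  run_once: true  # only one host in the play executes this task
```

Since the volume type is a cluster-wide Cinder object rather than per-host state, running the creation once is sufficient for every backend host.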

Revision history for this message
Jesse Pretorius (jesse-pretorius) wrote :

Two solutions here:

One is to pull this task into the playbook and make it only run once on the group (or only execute against the first host in the group).

Another is to only make that task execute against the first member of the group through the use of a conditional. This is similar to https://github.com/openstack/openstack-ansible-os_neutron/blob/master/tasks/main.yml#L55-L57
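A sketch of the second option, modeled on the linked os_neutron pattern (the group name `cinder_volume` is assumed here for illustration):

```yaml
# Sketch only: gate the task on being the first member of the group,
# so only one host ever runs the check-then-create sequence.
- name: Add in cinder devices types
  shell: |
    . {{ ansible_env.HOME }}/openrc
    if ! {{ cinder_bin }}/cinder type-list | grep "{{ item.key }}"; then
      {{ cinder_bin }}/cinder type-create "{{ item.key }}"
    fi
  with_dict: "{{ cinder_backends }}"
  when: inventory_hostname == groups['cinder_volume'][0]
```

The effect is equivalent to run_once, but the conditional form keeps working when the task is also delegated or tagged differently per host.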

Changed in openstack-ansible:
status: New → Confirmed
importance: Undecided → Medium
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to openstack-ansible-os_cinder (master)

Fix proposed to branch: master
Review: https://review.openstack.org/329325

Changed in openstack-ansible:
assignee: Samuel Matzek (smatzek) → Chhavi Agarwal (chhagarw)
status: Confirmed → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to openstack-ansible-os_cinder (master)

Reviewed: https://review.openstack.org/329325
Committed: https://git.openstack.org/cgit/openstack/openstack-ansible-os_cinder/commit/?id=27a895cbd5602a35e797adbb619520553e0e7eb8
Submitter: Jenkins
Branch: master

commit 27a895cbd5602a35e797adbb619520553e0e7eb8
Author: Chhavi Agarwal <email address hidden>
Date: Tue Jun 14 03:47:58 2016 -0500

    Cinder volume type creation race condition

    If multiple cinder volume drivers on different HA hosts
    are configured OSA deploy fails due to race condition while
    creating volume types. In such cases voluem type creation
    should be only run once.

    Change-Id: Ifcadde08de66f87a35754d6e4b2c6004888d49aa
    Closes-Bug: #1588777

Changed in openstack-ansible:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to openstack-ansible-os_cinder (stable/mitaka)

Fix proposed to branch: stable/mitaka
Review: https://review.openstack.org/329906

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to openstack-ansible-os_cinder (stable/mitaka)

Reviewed: https://review.openstack.org/329906
Committed: https://git.openstack.org/cgit/openstack/openstack-ansible-os_cinder/commit/?id=a4afeb94600e6fb1006a009a512525266a0fd4fd
Submitter: Jenkins
Branch: stable/mitaka

commit a4afeb94600e6fb1006a009a512525266a0fd4fd
Author: Chhavi Agarwal <email address hidden>
Date: Tue Jun 14 03:47:58 2016 -0500

    Cinder volume type creation race condition

    If multiple cinder volume drivers on different HA hosts
    are configured OSA deploy fails due to race condition while
    creating volume types. In such cases volume type creation
    should be only run once.

    Change-Id: Ifcadde08de66f87a35754d6e4b2c6004888d49aa
    Closes-Bug: #1588777
    (cherry picked from commit 27a895cbd5602a35e797adbb619520553e0e7eb8)

tags: added: in-stable-mitaka
Revision history for this message
Doug Hellmann (doug-hellmann) wrote : Fix included in openstack/openstack-ansible-os_cinder 13.1.4

This issue was fixed in the openstack/openstack-ansible-os_cinder 13.1.4 release.

Revision history for this message
Doug Hellmann (doug-hellmann) wrote : Fix included in openstack/openstack-ansible-os_cinder 14.0.0.0b2

This issue was fixed in the openstack/openstack-ansible-os_cinder 14.0.0.0b2 development milestone.
