external ceph cinder volume config breaks volumes on ussuri upgrade

Bug #1904062 reported by Alexander Diana
This bug affects 2 people
Affects         Status        Importance  Assigned to       Milestone
kolla-ansible   In Progress   High        Michal Nasiadka   -
  Ussuri        Won't Fix     High        Unassigned        -
  Victoria      Won't Fix     High        Unassigned        -
  Wallaby       Won't Fix     High        Michal Nasiadka   -

Bug Description

**Bug Report**

What happened:
When refactoring to use the new external-ceph templates in Ussuri, the cinder-volume agents came up under their own hostnames, which results in three "different" storage hosts.

This results in all pre-Ussuri volumes being unmanageable, as they are still tied to rbd:volumes@rbd-1, and new volumes will also become unmanageable if their host's agent goes down.
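
For anyone checking their own deployment, the host a volume is tied to can be read from the admin-only os-vol-host-attr:host field, e.g.:

openstack volume show <volume-id> -c os-vol-host-attr:host
# pre-Ussuri deployments show a value like rbd:volumes@rbd-1 (as above),
# while the Ussuri templates produce <controller-hostname>@rbd-1 instead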

What you expected to happen:

cinder-volume services to come up under a single host, so that a single node failure does not result in unmanageable volumes.

How to fix:
cinder.conf needs backend_host=rbd:volumes added to the [rbd-1] section as a sane default, which matches the previous recommendations and expected behavior.
This will make existing deployments work without changes and fix the single-node-failure condition of the current settings.
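
A minimal sketch of the resulting section, assuming the Ussuri template quoted in the comments below (pool, user and secret values are deployment-specific):

[rbd-1]
volume_driver = cinder.volume.drivers.rbd.RBDDriver
volume_backend_name = rbd-1
# Pin every cinder-volume instance of this backend to one logical host,
# matching the pre-Ussuri rbd:volumes@rbd-1 naming.
backend_host = rbd:volumes
rbd_pool = {{ ceph_cinder_pool_name }}
rbd_ceph_conf = /etc/ceph/ceph.conf
rbd_user = {{ ceph_cinder_user }}
rbd_secret_uuid = {{ cinder_rbd_secret_uuid }}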

How to reproduce it (minimal and precise):

**Environment**:
* Kolla-Ansible version: stable/ussuri

Mark Goddard (mgoddard)
Changed in kolla-ansible:
status: New → Triaged
importance: Undecided → High
Revision history for this message
Mark Goddard (mgoddard) wrote :

Train external ceph docs: https://docs.openstack.org/kolla-ansible/train/reference/storage/external-ceph-guide.html#cinder

[rbd-1]
rbd_ceph_conf=/etc/ceph/ceph.conf
rbd_user=cinder
backend_host=rbd:volumes
rbd_pool=volumes
volume_backend_name=rbd-1
volume_driver=cinder.volume.drivers.rbd.RBDDriver
rbd_secret_uuid = {{ cinder_rbd_secret_uuid }}

Ussuri made the integration simpler, adding the following to cinder.conf:

{% if cinder_backend_ceph | bool %}
[rbd-1]
volume_driver = cinder.volume.drivers.rbd.RBDDriver
volume_backend_name = rbd-1
rbd_pool = {{ ceph_cinder_pool_name }}
rbd_ceph_conf = /etc/ceph/ceph.conf
rbd_flatten_volume_from_snapshot = false
rbd_max_clone_depth = 5
rbd_store_chunk_size = 4
rados_connect_timeout = 5
rbd_user = {{ ceph_cinder_user }}
rbd_secret_uuid = {{ cinder_rbd_secret_uuid }}
report_discard_supported = True
image_upload_use_cinder_backend = True
{% endif %}

This is missing backend_host=rbd:volumes. There is a related TripleO bug [1], which explains that this option is used to set the same host for all backends in an environment with multiple cinder-volume services representing a single storage cluster.

[1] https://bugs.launchpad.net/bugs/1753596
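
As a stop-gap, operators should (assuming kolla-ansible's usual config merging from node_custom_config) be able to restore the old behavior with a small override file such as /etc/kolla/config/cinder/cinder-volume.conf:

[rbd-1]
backend_host = rbd:volumes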

summary: - external_ceph cinder-volume config break volumes on ussuri upgrade
+ external ceph cinder volume config breaks volumes on ussuri upgrade
Revision history for this message
Mark Goddard (mgoddard) wrote :

Actually, this OpenStack Ansible bug suggests that backend_host is not recommended: https://bugs.launchpad.net/cinder/+bug/1837403. We might need to set [DEFAULT] cluster to use active/active cinder-volume though.
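
A very rough sketch of that alternative (the cluster name below is invented; active/active additionally requires a tooz coordination backend, not shown here):

[DEFAULT]
# cinder-volume services sharing this value form one cluster and can take
# over each other's volumes.
cluster = rbd-cluster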

Revision history for this message
Mark Goddard (mgoddard) wrote :

OSA highlights the 'cinder-manage volume update_host' command in their release notes. It's not clear to me what the right solution is at this point though.
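
For reference, that command is run once per affected backend, along these lines (host strings are examples; the current value comes from os-vol-host-attr:host):

cinder-manage volume update_host \
    --currenthost controller1@rbd-1 \
    --newhost rbd:volumes@rbd-1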

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to kolla-ansible (master)

Fix proposed to branch: master
Review: https://review.opendev.org/763011

Changed in kolla-ansible:
assignee: nobody → Michal Nasiadka (mnasiadka)
status: Triaged → In Progress
Revision history for this message
Radosław Piliszek (yoctozepto) wrote :

The relevant openstack-discuss ML thread: http://lists.openstack.org/pipermail/openstack-discuss/2020-November/018838.html (thank you all for answering our questions!)

tags: added: ceph cinder rbd
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to kolla-ansible (stable/victoria)

Related fix proposed to branch: stable/victoria
Review: https://review.opendev.org/c/openstack/kolla-ansible/+/808003

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to kolla-ansible (stable/ussuri)

Related fix proposed to branch: stable/ussuri
Review: https://review.opendev.org/c/openstack/kolla-ansible/+/808004

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to kolla-ansible (stable/victoria)

Reviewed: https://review.opendev.org/c/openstack/kolla-ansible/+/808003
Committed: https://opendev.org/openstack/kolla-ansible/commit/f97e752018affcb81604230e7e9b0101960cec83
Submitter: "Zuul (22348)"
Branch: stable/victoria

commit f97e752018affcb81604230e7e9b0101960cec83
Author: Radosław Piliszek <email address hidden>
Date: Fri Dec 18 21:46:35 2020 +0100

    [CI] Cinder upgrade testing

    To gain visibility into how our upgrades affect existing Cinder
    volumes, a new testing path is required.
    This patch adds it.

    Additionally, it refactors the repeated actions and ensures that
    we wait for volume deletions as well.

    Change-Id: Ic08d461e6fdf91c378a87860765a489c2f86d690
    Related-Bug: #1904062
    (cherry picked from commit 62b8c6b68413330da032d14b45c6fbd340ec9e2d)

tags: added: in-stable-victoria
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on kolla-ansible (stable/ussuri)

Change abandoned by "Radosław Piliszek <email address hidden>" on branch: stable/ussuri
Review: https://review.opendev.org/c/openstack/kolla-ansible/+/808004
Reason: Ussuri is going into Extended Maintenance (EM), thus we are not extending CI.

Revision history for this message
Sven Kieske (s-kieske) wrote :

Hi,

can someone provide an update on this bug?

Because we hit this in real life deployments:

1. A volume has a specific OS controller node as `os-vol-host-attr:host`
2. That controller node goes into maintenance
3. The VM instance with the attached volume gets deleted
4. Nova throws: openstack.nova nova-compute c84d9828-1277-457a-828d-db7dc3c03216 [instance: d5832d72-0b70-422e-ba94-12b24f1a75e1] Ignoring unknown cinder exception for volume 615d4759-bed3-4a84-91a8-8fce612bfb2a: Gateway Time-out (HTTP 504): cinderclient.exceptions.ClientException: Gateway Time-out (HTTP 504)

The volume then still exists and still claims to be attached to a nonexistent VM.

we can of course clean this up manually.

AFAIU, kolla-ansible needs to add an active/active cinder-volume deployment with a coordination service like Pacemaker or etcd?

Would it be possible to mimic what TripleO does? As far as I understand, they have implemented an active/active cinder deployment with an etcd coordinator.
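
Not an authoritative answer, but mimicking the TripleO setup would roughly mean deploying a coordination service and pointing cinder at it, combined with the [DEFAULT] cluster option sketched earlier in this bug. Purely illustrative values below (kolla-ansible does ship an enable_etcd switch in globals.yml):

# globals.yml
enable_etcd: "yes"

# cinder.conf override
[coordination]
# tooz etcd3gw driver URL; host and port depend on the deployment
backend_url = etcd3+http://192.0.2.10:2379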

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to kolla-ansible (stable/yoga)

Fix proposed to branch: stable/yoga
Review: https://review.opendev.org/c/openstack/kolla-ansible/+/847151

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on kolla-ansible (stable/yoga)

Change abandoned by "Radosław Piliszek <email address hidden>" on branch: stable/yoga
Review: https://review.opendev.org/c/openstack/kolla-ansible/+/847151
Reason: proper docs are the way to go

Tom Fifield (fifieldt)
tags: added: docs
Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Change abandoned by "Michal Nasiadka <email address hidden>" on branch: stable/yoga
Review: https://review.opendev.org/c/openstack/kolla-ansible/+/847151
