ceph-rgw installation/configuration lacks idempotency, configuration changes break the setup

Bug #1802195 reported by Florian Haas
Affects: OpenStack-Ansible
Status: Fix Released
Importance: Medium
Assigned to: Unassigned

Bug Description

Deploying a Ceph radosgw with the ceph-rgw-install.yml playbook is not idempotent: rerunning the playbook after a configuration change produces an unexpected, broken ceph.conf, and rolling the change back does not repair it.

Steps to reproduce:

(1) Create a user_variables.yml configuration that includes the following items:

ceph_conf_overrides_rgw:
  "client.rgw.{{ ansible_hostname }}":
    host: "{{ ansible_hostname }}"
    rgw keystone accepted roles: "Member, member, _member_, admin, swiftoperator"
    rgw keystone admin domain: default
    rgw keystone admin password: "{{ radosgw_admin_password }}"
    rgw keystone admin project: "{{ radosgw_admin_tenant }}"
    rgw keystone admin tenant: "{{ radosgw_admin_tenant }}"
    rgw keystone admin user: "{{ radosgw_admin_user }}"
    rgw keystone api version: 3
    rgw keystone revocation interval: 900
    rgw keystone token cache size: 10000
    rgw keystone url: "{{ keystone_service_adminuri }}"
    rgw swift account in url: "true"
    rgw enable apis: swift
    rgw swift url prefix: ""
    rgw swift versioning enabled: true

(2) Run setup-everything.yml.
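
(On a standard deployment this boils down to roughly the following; the /opt/openstack-ansible path is an assumption based on the default checkout location, not something from the original report.)

    # on the deployment host (default checkout location assumed)
    cd /opt/openstack-ansible/playbooks
    openstack-ansible setup-everything.yml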

(3) Shell into (or attach to) one of the rgw containers.
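
(With the default LXC-based deployment this is done from the physical host that runs the container; the container name below is taken from the ceph.conf output that follows.)

    # on the container's physical host (daisy, in this example)
    lxc-attach -n daisy-ceph-rgw-container-44ba9503
    cat /etc/ceph/ceph.conf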

You'll see a ceph.conf similar to this:

[client.rgw.daisy-ceph-rgw-container-44ba9503]
host = daisy-ceph-rgw-container-44ba9503
keyring = /var/lib/ceph/radosgw/ceph-rgw.daisy-ceph-rgw-container-44ba9503/keyring
log file = /var/log/ceph/ceph-rgw-daisy-ceph-rgw-container-44ba9503.log
rgw frontends = civetweb port=192.168.122.219:8080 num_threads=100

[client.rgw.eric-ceph-rgw-container-ceae85c1]
host = eric-ceph-rgw-container-ceae85c1
keyring = /var/lib/ceph/radosgw/ceph-rgw.eric-ceph-rgw-container-ceae85c1/keyring
log file = /var/log/ceph/ceph-rgw-eric-ceph-rgw-container-ceae85c1.log
rgw frontends = civetweb port=192.168.122.191:8080 num_threads=100

[client.rgw.frank-ceph-rgw-container-9e419c91]
host = frank-ceph-rgw-container-9e419c91
keyring = /var/lib/ceph/radosgw/ceph-rgw.frank-ceph-rgw-container-9e419c91/keyring
log file = /var/log/ceph/ceph-rgw-frank-ceph-rgw-container-9e419c91.log
rgw frontends = civetweb port=192.168.122.190:8080 num_threads=100

[client.rgw.daisy-ceph-rgw-container-44ba9503]
host = daisy-ceph-rgw-container-44ba9503
rgw enable apis = swift
rgw keystone accepted roles = Member, member, _member_, admin, swiftoperator
rgw keystone admin domain = default
rgw keystone admin password = <password>
rgw keystone admin project = service
rgw keystone admin tenant = service
rgw keystone admin user = radosgw
rgw keystone api version = 3
rgw keystone revocation interval = 900
rgw keystone token cache size = 10000
rgw keystone url = http://192.168.122.101:5000
rgw swift account in url = true
rgw swift url prefix =
rgw swift versioning enabled = True

(4) Change your user_variables.yml:

ceph_conf_overrides_rgw:
  "client.rgw.{{ ansible_hostname }}":
    host: "{{ ansible_hostname }}"
    rgw keystone accepted roles: "Member, member, _member_, admin, swiftoperator"
    rgw keystone admin domain: default
    rgw keystone admin password: "{{ radosgw_admin_password }}"
    rgw keystone admin project: "{{ radosgw_admin_tenant }}"
    rgw keystone admin tenant: "{{ radosgw_admin_tenant }}"
    rgw keystone admin user: "{{ radosgw_admin_user }}"
    rgw keystone api version: 3
    rgw keystone revocation interval: 900
    rgw keystone token cache size: 10000
    rgw keystone url: "{{ keystone_service_adminuri }}"
    rgw swift account in url: "true"
    rgw enable apis: swift
    rgw swift url prefix: "/"
    rgw swift versioning enabled: true

(The only change is "rgw swift url prefix" going from "" to "/".)

(5) Run the ceph-rgw-install.yml playbook.
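
(Again a sketch assuming the default checkout location; this time only the rgw playbook needs to be rerun.)

    # on the deployment host
    cd /opt/openstack-ansible/playbooks
    openstack-ansible ceph-rgw-install.yml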

(6) Again, look into the generated ceph.conf in one of the rgw containers:

[client.rgw.]
host =
keyring = /var/lib/ceph/radosgw/ceph-rgw./keyring
log file = /var/log/ceph/ceph-rgw-frank-ceph-rgw-container-9e419c91.log
log file = /var/log/ceph/ceph-rgw-eric-ceph-rgw-container-ceae85c1.log
log file = /var/log/ceph/ceph-rgw-daisy-ceph-rgw-container-44ba9503.log
rgw frontends = civetweb port=192.168.122.190:8080 num_threads=100
rgw frontends = civetweb port=192.168.122.191:8080 num_threads=100
rgw frontends = civetweb port=192.168.122.219:8080 num_threads=100

[client.rgw.daisy-ceph-rgw-container-44ba9503]
host = daisy-ceph-rgw-container-44ba9503
rgw enable apis = swift
rgw keystone accepted roles = Member, member, _member_, admin, swiftoperator
rgw keystone admin domain = default
rgw keystone admin password = 306f87d512e8b36d316c96cc2c57303a3872f9417ea152cdd1
rgw keystone admin project = service
rgw keystone admin tenant = service
rgw keystone admin user = radosgw
rgw keystone api version = 3
rgw keystone revocation interval = 900
rgw keystone token cache size = 10000
rgw keystone url = http://192.168.122.101:5000
rgw swift account in url = true
rgw swift url prefix = /
rgw swift versioning enabled = True

# Ansible managed
[global]
cluster network = 192.168.155.0/24
fsid = 8c4846f1-8834-48e0-9fda-2cdeeb8914b0
mon host = 192.168.155.202,192.168.155.216,192.168.155.151
osd pool default min size = 1
osd pool default size = 2
public network = 192.168.155.0/24

Observe the configuration differences:

- a bogus [client.rgw.] section (as if the hostname were empty)
- a keyring option pointing to a non-existent keyring file
- the log file options for all 3 rgw hosts squashed into that one section
- three conflicting rgw frontends options, now useless
- the correctly named host-specific section now lacks the rgw frontends option, so radosgw falls back to its default and listens on port 7480 instead of 8080 (see the check after this list)
- all haproxy backends are now down, because haproxy still expects rgw to listen on port 8080
- the radosgw service, and hence the cloud's Swift API endpoint, is now non-functional
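
(A quick way to confirm the port regression; the commands are a sketch and are not part of the original report.)

    # inside one of the rgw containers (sketch, not from the report)
    ss -tlnp | grep radosgw
    # with the broken config, radosgw binds to its default port 7480
    # rather than the 8080 that the haproxy backends expect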

(7) Try rolling back the change by restoring "rgw swift url prefix" to "" in user_variables.yml and rerunning ceph-rgw-install.yml.

(8) Observe that the only change this produces is in the host-specific section: the bogus [client.rgw.] section remains, and the service is still non-functional.

[client.rgw.]
host =
keyring = /var/lib/ceph/radosgw/ceph-rgw./keyring
log file = /var/log/ceph/ceph-rgw-frank-ceph-rgw-container-9e419c91.log
log file = /var/log/ceph/ceph-rgw-eric-ceph-rgw-container-ceae85c1.log
log file = /var/log/ceph/ceph-rgw-daisy-ceph-rgw-container-44ba9503.log
rgw frontends = civetweb port=192.168.122.190:8080 num_threads=100
rgw frontends = civetweb port=192.168.122.191:8080 num_threads=100
rgw frontends = civetweb port=192.168.122.219:8080 num_threads=100

[client.rgw.daisy-ceph-rgw-container-44ba9503]
host = daisy-ceph-rgw-container-44ba9503
rgw enable apis = swift
rgw keystone accepted roles = Member, member, _member_, admin, swiftoperator
rgw keystone admin domain = default
rgw keystone admin password = 306f87d512e8b36d316c96cc2c57303a3872f9417ea152cdd1
rgw keystone admin project = service
rgw keystone admin tenant = service
rgw keystone admin user = radosgw
rgw keystone api version = 3
rgw keystone revocation interval = 900
rgw keystone token cache size = 10000
rgw keystone url = http://192.168.122.101:5000
rgw swift account in url = true
rgw swift url prefix =
rgw swift versioning enabled = True

So it seems that making any change to ceph_conf_overrides_rgw after the initial deployment is enough to render the service non-functional.
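
(A minimal way to demonstrate the missing idempotency, offered as a sketch rather than something from the original report: snapshot ceph.conf, rerun the playbook, and diff. On an idempotent run the diff would be empty.)

    # inside an rgw container, before the rerun (sketch)
    cp /etc/ceph/ceph.conf /tmp/ceph.conf.before
    # ... rerun ceph-rgw-install.yml from the deployment host ...
    diff -u /tmp/ceph.conf.before /etc/ceph/ceph.conf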

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to openstack-ansible (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/616479

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to openstack-ansible (stable/rocky)

Fix proposed to branch: stable/rocky
Review: https://review.openstack.org/616481

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to openstack-ansible (master)

Reviewed: https://review.openstack.org/616479
Committed: https://git.openstack.org/cgit/openstack/openstack-ansible/commit/?id=40812a7e46145129628ce020f30e6b5ccec1a39d
Submitter: Zuul
Branch: master

commit 40812a7e46145129628ce020f30e6b5ccec1a39d
Author: Florian Haas <email address hidden>
Date: Thu Nov 8 10:32:55 2018 +0100

    Track a stable branch, not master, for ceph-ansible

    http://docs.ceph.com/ceph-ansible/master/ states: "The master branch
    should be considered experimental and used with caution." So that
    really can't be considered a solid basis for building a cloud on. Use
    a stable branch instead.

    At the time of this patch, the stable-3.2 branch in ceph-ansible is at
    the v3.2.0rc1 tag, so not currently considered fully ready for
    production deployments either. However, per discussion with odyssey4me
    this is OK for the OSA master branch at this stage in the cycle.

    Change-Id: Ibd8f64d9889009cf6e80b92254fd580ff559b1be
    Related-Bug: 1802195
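
(For context: openstack-ansible pins the ceph-ansible checkout via its entry in ansible-role-requirements.yml. After a change like this the entry looks roughly as follows; the src URL shown is an assumption, the version field is the relevant part.)

    # sketch of the ansible-role-requirements.yml entry (src URL assumed)
    - name: ceph-ansible
      scm: git
      src: https://github.com/ceph/ceph-ansible
      version: stable-3.2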

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to openstack-ansible (stable/rocky)

Reviewed: https://review.openstack.org/616481
Committed: https://git.openstack.org/cgit/openstack/openstack-ansible/commit/?id=cc9e9996c3863767330d8b10bd955a1d48923fd4
Submitter: Zuul
Branch: stable/rocky

commit cc9e9996c3863767330d8b10bd955a1d48923fd4
Author: Florian Haas <email address hidden>
Date: Thu Nov 8 10:32:55 2018 +0100

    Reset ceph-ansible to the current HEAD of stable-3.1

    * Instead of pointing at v3.2.0beta1-130, use the current HEAD of
      stable-3.1.
    * Add a release note detailing specific considerations
      related to this change.
    * Bump openstack_release to 18.1.0.

    Change-Id: Ibd8f64d9889009cf6e80b92254fd580ff559b1be
    Closes-Bug: 1802195

tags: added: in-stable-rocky
Mohammed Naser (mnaser)
Changed in openstack-ansible:
status: New → Confirmed
status: Confirmed → Fix Released
importance: Undecided → Medium
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/openstack-ansible 18.1.0

This issue was fixed in the openstack/openstack-ansible 18.1.0 release.
