deployed_ceph cannot specify ceph cluster name which breaks DCN deployments

Bug #1966559 reported by John Fulton
Affects: tripleo
Status: Fix Released
Importance: High
Assigned to: John Fulton
Milestone: (not set)

Bug Description

If I deploy a cluster using `openstack overcloud ceph deploy` and then deploy an overcloud that uses it with CephClusterName set to "central", or anything other than its default ("ceph"), the overcloud deployment fails with this:

000000000073 | FATAL | Assimilate configuration from tripleo_cephadm_assimilate_conf | oc0-controller-0 | error={"changed": false, "cmd": ["podman", "run", "--rm", "--net=host", "--ipc=host", "--volume", "/etc/ceph:/etc/ceph:z", "--volume", "/home/ceph-admin/assimilate_central.conf:/home/assimilate_central.conf:z", "--entrypoint", "ceph", "undercloud.ctlplane.mydomain.tld:8787/ceph/daemon:v6.0.7-stable-6.0-pacific-centos-stream8", "--fsid", "c7b1574d-40f6-5d6a-8a86-c387957696ed", "-c", "/etc/ceph/central.conf", "-k", "/etc/ceph/central.client.admin.keyring", "config", "assimilate-conf", "-i", "/home/assimilate_central.conf"], "delta": "0:00:00.448455", "end": "2022-03-26 18:14:40.294976", "msg": "non-zero return code", "rc": 1, "start": "2022-03-26 18:14:39.846521", "stderr": "Error initializing cluster client: ObjectNotFound('RADOS object not found (error calling conf_read_file)',)", "stderr_lines": ["Error initializing cluster client: ObjectNotFound('RADOS object not found (error calling conf_read_file)',)"], "stdout": "", "stdout_lines": []}

The above failure comes from the following command being run during overcloud deployment:

podman run --rm --net=host --ipc=host --volume /etc/ceph:/etc/ceph:z --volume /home/ceph-admin/assimilate_central.conf:/home/assimilate_central.conf:z --entrypoint ceph undercloud.ctlplane.mydomain.tld:8787/ceph/daemon:v6.0.7-stable-6.0-pacific-centos-stream8 --fsid c7b1574d-40f6-5d6a-8a86-c387957696ed -c /etc/ceph/central.conf -k /etc/ceph/central.client.admin.keyring config assimilate-conf -i /home/assimilate_central.conf

It fails because neither /etc/ceph/central.client.admin.keyring nor /etc/ceph/central.conf exists. If I change the file names in the above command to use the default cluster name:

podman run --rm --net=host --ipc=host --volume /etc/ceph:/etc/ceph:z --volume /home/ceph-admin/assimilate_central.conf:/home/assimilate_central.conf:z --entrypoint ceph undercloud.ctlplane.mydomain.tld:8787/ceph/daemon:v6.0.7-stable-6.0-pacific-centos-stream8 --fsid c7b1574d-40f6-5d6a-8a86-c387957696ed -c /etc/ceph/ceph.conf -k /etc/ceph/ceph.client.admin.keyring config assimilate-conf -i /home/assimilate_central.conf

Then it works. However, we need the ability to deploy differently named conf files and cephx keys when deploying DCN, as we'll have multiple conf files and cephx keys on DCN nodes. Even though the FSID in the path keeps overwrites from happening, being limited to the default name will break the behavior of CephExternalMultiConfig.
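The mismatch described above can be confirmed on the affected node before rerunning the deployment. The following is a minimal sketch, not part of TripleO: `check_ceph_files` is a hypothetical helper that takes a cluster name and, optionally, a directory (assumed to default to /etc/ceph, the path mounted into the container in the command above), and reports whether the conf file and admin keyring named after that cluster exist.

```shell
# Hypothetical helper (not part of TripleO or Ceph): report whether the
# <cluster>.conf and <cluster>.client.admin.keyring files exist.
# $1: cluster name, e.g. "central" or "ceph"
# $2: directory to inspect (assumed default: /etc/ceph)
check_ceph_files() {
    local cluster="$1"
    local dir="${2:-/etc/ceph}"
    local missing=0
    local f
    for f in "${dir}/${cluster}.conf" "${dir}/${cluster}.client.admin.keyring"; do
        if [ -e "$f" ]; then
            echo "found:   $f"
        else
            echo "missing: $f"
            missing=1
        fi
    done
    # Return non-zero if either file is absent.
    return "$missing"
}
```

In the situation reported here, `check_ceph_files central` would be expected to list both files as missing, while `check_ceph_files ceph` would find them.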

To address this, we should give the deployed ceph user a --cluster-name option which overrides the tripleo_cephadm_cluster variable in tripleo_ansible so that the naming convention can follow the needs of the CephExternalMultiConfig parameter.

https://github.com/openstack/tripleo-ansible/blob/master/tripleo_ansible/roles/tripleo_cephadm/defaults/main.yml#L17-L19

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-ansible (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/tripleo-ansible/+/835372

OpenStack Infra (hudson-openstack) wrote : Fix proposed to python-tripleoclient (master)
Changed in tripleo:
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-operator-ansible (master)
OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-ansible (master)

Reviewed: https://review.opendev.org/c/openstack/tripleo-ansible/+/835372
Committed: https://opendev.org/openstack/tripleo-ansible/commit/c4136fc8f5e3a50668a6107aee1e5fc3535a6781
Submitter: "Zuul (22348)"
Branch: master

commit c4136fc8f5e3a50668a6107aee1e5fc3535a6781
Author: John Fulton <email address hidden>
Date: Sat Mar 26 16:12:19 2022 -0400

    Pass CephClusterName in deployed Ceph template

    If the user overrides the tripleo_cephadm_cluster variable,
    then we should pass the same value through to the deployed
    Ceph template to ensure that it is consistent during overcloud
    deployment.

    Also, make ApplyCephConfigOverridesOnUpdate the last parameter
    in the template so that it is easier for users to see and follow
    the commented recommendation.

    Related-Bug: #1966559
    Change-Id: I4daab327c0bdac2bcc569cc1d8d629c648ddf292

OpenStack Infra (hudson-openstack) wrote : Fix merged to python-tripleoclient (master)

Reviewed: https://review.opendev.org/c/openstack/python-tripleoclient/+/835376
Committed: https://opendev.org/openstack/python-tripleoclient/commit/ebe72e7056ddebdc8304bb18b1c9ba9ac82f0a77
Submitter: "Zuul (22348)"
Branch: master

commit ebe72e7056ddebdc8304bb18b1c9ba9ac82f0a77
Author: John Fulton <email address hidden>
Date: Sat Mar 26 16:54:34 2022 -0400

    Allow user to override Ceph cluster name

    Add --cluster (default 'ceph') option to the
    'openstack overcloud ceph deploy' command.
    Whatever string is passed via this option will
    be used to override the tripleo_cephadm_cluster
    variable when the cli-deployed-ceph.yaml playbook
    is called.

    Closes-Bug: #1966559
    Depends-On: I4daab327c0bdac2bcc569cc1d8d629c648ddf292
    Change-Id: I07dfbd819e57f26cc4798b0d58ffacb3ba73fdb2

Changed in tripleo:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-ansible (stable/wallaby)

Related fix proposed to branch: stable/wallaby
Review: https://review.opendev.org/c/openstack/tripleo-ansible/+/835507

OpenStack Infra (hudson-openstack) wrote : Fix proposed to python-tripleoclient (stable/wallaby)

Fix proposed to branch: stable/wallaby
Review: https://review.opendev.org/c/openstack/python-tripleoclient/+/835508

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-docs (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/tripleo-docs/+/835757

OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-docs (master)

Reviewed: https://review.opendev.org/c/openstack/tripleo-docs/+/835757
Committed: https://opendev.org/openstack/tripleo-docs/commit/7791a6bb4c7efd41ac4e87add7653f7c46a2115f
Submitter: "Zuul (22348)"
Branch: master

commit 7791a6bb4c7efd41ac4e87add7653f7c46a2115f
Author: John Fulton <email address hidden>
Date: Tue Mar 29 15:59:54 2022 -0400

    Document the --cluster option for deployed ceph

    This includes documenting a helpful tip for those who use this
    option but then have difficulty using 'cephadm shell'.

    Also, fix typo in Ceph Spec Options.

    Change-Id: Ib2f6f059a0873c8160f804755ddfd481977a3e9d
    Related-Bug: #1966559

OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-ansible (stable/wallaby)

Reviewed: https://review.opendev.org/c/openstack/tripleo-ansible/+/835507
Committed: https://opendev.org/openstack/tripleo-ansible/commit/665450045824726296d3c9b7a0b4e138231d42f9
Submitter: "Zuul (22348)"
Branch: stable/wallaby

commit 665450045824726296d3c9b7a0b4e138231d42f9
Author: John Fulton <email address hidden>
Date: Sat Mar 26 16:12:19 2022 -0400

    Pass CephClusterName in deployed Ceph template

    If the user overrides the tripleo_cephadm_cluster variable,
    then we should pass the same value through to the deployed
    Ceph template to ensure that it is consistent during overcloud
    deployment.

    Also, make ApplyCephConfigOverridesOnUpdate the last parameter
    in the template so that it is easier for users to see and follow
    the commented recommendation.

    Related-Bug: #1966559
    Change-Id: I4daab327c0bdac2bcc569cc1d8d629c648ddf292
    (cherry picked from commit c4136fc8f5e3a50668a6107aee1e5fc3535a6781)

tags: added: in-stable-wallaby
OpenStack Infra (hudson-openstack) wrote : Fix merged to python-tripleoclient (stable/wallaby)

Reviewed: https://review.opendev.org/c/openstack/python-tripleoclient/+/835508
Committed: https://opendev.org/openstack/python-tripleoclient/commit/34243123b9f54988960c0020012a7a5e5a908c48
Submitter: "Zuul (22348)"
Branch: stable/wallaby

commit 34243123b9f54988960c0020012a7a5e5a908c48
Author: John Fulton <email address hidden>
Date: Sat Mar 26 16:54:34 2022 -0400

    Allow user to override Ceph cluster name

    Add --cluster (default 'ceph') option to the
    'openstack overcloud ceph deploy' command.
    Whatever string is passed via this option will
    be used to override the tripleo_cephadm_cluster
    variable when the cli-deployed-ceph.yaml playbook
    is called.

    Conflicts:
    - tripleoclient/tests/v2/overcloud_ceph/test_overcloud_ceph.py
    - tripleoclient/v2/overcloud_ceph.py

    Closes-Bug: #1966559
    Depends-On: I4daab327c0bdac2bcc569cc1d8d629c648ddf292
    Change-Id: I07dfbd819e57f26cc4798b0d58ffacb3ba73fdb2
    (cherry picked from commit ebe72e7056ddebdc8304bb18b1c9ba9ac82f0a77)

OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-operator-ansible (master)

Reviewed: https://review.opendev.org/c/openstack/tripleo-operator-ansible/+/835377
Committed: https://opendev.org/openstack/tripleo-operator-ansible/commit/31394ed8f9f46c983aba7b708d71bf3abc24388f
Submitter: "Zuul (22348)"
Branch: master

commit 31394ed8f9f46c983aba7b708d71bf3abc24388f
Author: John Fulton <email address hidden>
Date: Sat Mar 26 17:29:22 2022 -0400

    Add --cluster to tripleo_ceph_deploy role

    Add tripleo_ceph_deploy_cluster variable so that
    'openstack overcloud ceph deploy --cluster $CLUSTER'
    may be called from tripleo_ceph_deploy role.

    Change-Id: I0d632e3bb2b3e54caa495fd1bcf0c504ee2b46a5
    Related-Bug: #1966559
    Depends-On: I07dfbd819e57f26cc4798b0d58ffacb3ba73fdb2

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/python-tripleoclient 19.0.0

This issue was fixed in the openstack/python-tripleoclient 19.0.0 release.
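With a release that includes the fix, the cluster name can be passed when Ceph is deployed. The sketch below only prints the command so it can be reviewed first. The `ceph_deploy_cmd` function and the file names `deployed_metal.yaml` and `deployed_ceph.yaml` are assumptions for illustration; `--cluster` is the option added by the python-tripleoclient change above, and "central" is the name used in this report.

```shell
# Hypothetical dry-run helper: print the deploy command instead of running it.
# $1: baremetal environment file (assumed name, passed as the positional argument)
# $2: environment file to generate (assumed name, passed via --output)
# $3: Ceph cluster name, overriding the default "ceph"
ceph_deploy_cmd() {
    printf 'openstack overcloud ceph deploy %s --output %s --cluster %s\n' \
        "$1" "$2" "$3"
}

# Review the printed command, then run it on the undercloud.
ceph_deploy_cmd deployed_metal.yaml deployed_ceph.yaml central
```

Per the related tripleo-ansible change, the generated deployed Ceph template should then carry the same CephClusterName, so the overcloud deployment looks for files named after that cluster (here central.conf and central.client.admin.keyring) rather than ceph.conf.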
