deploy w/ external ceph and local ganesha fails while enabling ceph-admin user

Bug #1986988 reported by John Fulton
This bug affects 1 person
Affects: tripleo
Status: Fix Released
Importance: Critical
Assigned to: John Fulton

Bug Description

In Wallaby, while deploying with the following environment files as per the doc [1]:

-e /usr/share/openstack-tripleo-heat-templates/environments/external-ceph.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/manila-cephfsganesha-config.yaml \

The overcloud deployment fails during config-download step 2:

2022-08-18 16:36:29.317298 | 5254002e-fef3-4802-b4dc-00000000848e | OK | Notify user about upcoming cephadm execution(s) | undercloud | result={
    "changed": false,
    "msg": "Running 1 cephadm playbook(s) (immediate log at /home/stack/overcloud-deploy/overcloud/config-download/overcloud/cephadm/cephadm_command.log)"
}
2022-08-18 16:36:30.564099 | 5254002e-fef3-4802-b4dc-000000008490 | FATAL | search triple_run_cephadm_output of cephadm run(s) non-zero return codes | undercloud | error={"censored": "the output has been hidden due to the fact that 'no_log: true' was specified for this result"}

Inspecting the cephadm_command.log [2] shows:

2022-08-18 16:36:30,391 p=104780 u=stack n=ansible | [WARNING]: Could not match supplied host pattern, ignoring: ceph_mon

The command which failed [3] shows that the following playbook ran:

https://github.com/openstack/tripleo-ansible/blob/stable/wallaby/tripleo_ansible/playbooks/cephadm.yml#L17

This playbook uses the following hosts:

 hosts: ceph_mon[0]

No hosts were matched because there is no ceph_mon group in the Ansible inventory: the Ceph monitors are external to this deployment.

This playbook execution shouldn't be necessary. A problem in the logic has resulted in internal Ceph deployment playbooks running in an external Ceph deployment scenario.
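
To illustrate why the pattern matches nothing, here is a minimal Python sketch. It is a simplified model of host-pattern resolution and not Ansible's real API; the function name and the host names are made up for the example.

```python
# Simplified model of how a pattern like "ceph_mon[0]" resolves.
# This is NOT Ansible's API; names and hosts are hypothetical.
def match_first_host(inventory, group):
    """Return the hosts matched by '<group>[0]' as a list."""
    hosts = inventory.get(group, [])
    # Slicing keeps the result empty when the group is missing or empty.
    return hosts[:1]


# Internal Ceph: TripleO deploys the monitors, so ceph_mon is populated.
internal_ceph = {
    "ceph_mon": ["controller-0", "controller-1"],
    "ceph_nfs": ["controller-0"],
}
# External Ceph with local ganesha: no ceph_mon group at all.
external_ceph = {"ceph_nfs": ["controller-0"]}

print(match_first_host(internal_ceph, "ceph_mon"))
print(match_first_host(external_ceph, "ceph_mon"))
```

In the second case the play has no hosts to run on, which is consistent with the "Could not match supplied host pattern" warning in the log.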

[1] https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/ceph_external.html#deployment-of-an-overcloud-with-external-ceph

[2] https://paste.opendev.org/show/b5hIGJ5FfuRgYtmmmDDC/
[3] https://paste.opendev.org/show/bCv5bRGQ7hr5LyvmX6fL/

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-heat-templates (master)
Changed in tripleo:
status: Triaged → In Progress
Revision history for this message
John Fulton (jfulton-org) wrote :

The THT patch [1] rendered config-download ansible [2] which avoids the bootstrap code.
However, the deployment then fails during client configuration [3] when running podman commands [4].

[1] https://review.opendev.org/c/openstack/tripleo-heat-templates/+/853699
[2] https://paste.opendev.org/show/bclgJhgfBMS6pd4tOfNH/
[3] https://paste.opendev.org/show/bB2djLtxQ45IQ9tZl6VF/
[4] https://github.com/openstack/tripleo-ansible/blob/stable/wallaby/tripleo_ansible/roles/tripleo_cephadm/tasks/nfs.yaml#L36-L42

Revision history for this message
Giulio Fidente (gfidente) wrote :

I think the problem is that the commands at [1] and [2] assume the tripleo_cephadm_ceph_cli variable works, but when Ceph is external we don't have the cephadm CLI container.

The ganesha deployment (up to wallaby) should always be TripleO-managed, not cephadm-managed, regardless of whether the Ceph cluster is TripleO-deployed or external. Hence I think we want to fix the two commands to launch "rados" within the ganesha container instead, and also pass "-n client.manila", similarly to how ceph-ansible was doing it [3].

[1] https://github.com/openstack/tripleo-ansible/blob/stable/wallaby/tripleo_ansible/roles/tripleo_cephadm/tasks/nfs.yaml#L28
[2] https://github.com/openstack/tripleo-ansible/blob/stable/wallaby/tripleo_ansible/roles/tripleo_cephadm/tasks/nfs.yaml#L38
[3] https://github.com/ceph/ceph-ansible/blob/main/roles/ceph-nfs/tasks/start_nfs.yml#L5
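
A rough sketch of what such a command could look like, written as a Python helper that only builds the argument list. The helper name and the container name are hypothetical, and the pool and object names are borrowed from the failing task in this bug; this is not the code in tripleo-ansible or ceph-ansible.

```python
def ganesha_rados_cmd(container, pool, args, client="client.manila"):
    """Build a 'podman exec' argv that runs rados inside a running container."""
    return ["podman", "exec", container, "rados", "-n", client, "-p", pool, *args]


# "ceph-nfs-example" is a made-up container name used only for illustration.
cmd = ganesha_rados_cmd(
    "ceph-nfs-example",
    "cephfs.manila.data",
    ["put", "ganesha-export-index", "/dev/null"],
)
print(" ".join(cmd))
```

The point of the sketch is only that the command no longer depends on a cephadm CLI container: it reuses the ganesha container and authenticates as client.manila.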

Revision history for this message
Francesco Pantano (fmount) wrote :

Hi,
we don't use cephadm shell -- commands in tripleo-ansible, but podman is run against Ceph to build the right cmd [0].
I have to investigate this bug further, but looking at the error in [1]:

```
2022-08-18 20:47:10.018126 | 5254002e-fef3-d891-1c37-00000000851c | FATAL | create an empty rados index object | undercloud | error={"changed": true, "cmd": ["podman", "run", "--rm", "--net=host", "--ipc=host", "--volume", "/etc/ceph:/etc/ceph:z", "--volume", "/home/ceph-admin/assimilate_ceph.conf:/home/assimilate_ceph.conf:z", "--entrypoint", "rados", "undercloud-0.ctlplane.redhat.local:8787/rh-osbs/rhceph:5-270", "--fsid", "cdf24729-bd06-46c2-ba47-7018ae220197", "-c", "/etc/ceph/ceph.conf", "-k", "/etc/ceph/ceph.client.manila.keyring", "-n", "client.manila", "-p", "cephfs.manila.data", "--cluster", "ceph", "put", "ganesha-export-index", "/dev/null"], "delta": "0:00:00.083149", "end": "2022-08-18 20:47:09.996850", "msg": "non-zero return code", "rc": 125, "start": "2022-08-18 20:47:09.913701", "stderr": "Error: statfs /etc/ceph: no such file or directory", "stderr_lines": ["Error: statfs /etc/ceph: no such file or directory"], "stdout": "", "stdout_lines": []}
```
It looks like the client role (which is supposed to run before the nfs.yaml set of tasks) has not created the ceph.conf that the command requires and that is mounted at the podman level.

I'm going to update the bug as soon as I have more info.

[0] https://github.com/openstack/tripleo-ansible/blob/master/tripleo_ansible/roles/tripleo_cephadm/tasks/ceph_cli.yaml
[1] https://paste.opendev.org/show/bB2djLtxQ45IQ9tZl6VF/
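
The "statfs /etc/ceph: no such file or directory" error means podman could not find the host side of a bind mount. A small pre-flight check along these lines could surface that before the container is started. This is a minimal sketch in Python, assuming volume specs in the "src:dst[:opts]" form seen in the command above; the helper name and the demo paths are hypothetical.

```python
import os
import tempfile


def missing_mount_sources(volumes):
    """Return host-side paths of 'src:dst[:opts]' specs that do not exist."""
    sources = [spec.split(":", 1)[0] for spec in volumes]
    return [src for src in sources if not os.path.exists(src)]


# Demo: one source that exists (a temporary directory) and one that does not.
with tempfile.TemporaryDirectory() as existing:
    specs = [
        f"{existing}:/etc/ceph:z",
        "/no/such/dir-for-demo:/etc/ceph:z",
    ]
    missing = missing_mount_sources(specs)

print(missing)
```

On the failing node, a check like this would be expected to report /etc/ceph, pointing at the client role not having run (or not having created the directory) before the nfs.yaml tasks.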

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-ansible (master)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-ansible (master)

Reviewed: https://review.opendev.org/c/openstack/tripleo-ansible/+/853791
Committed: https://opendev.org/openstack/tripleo-ansible/commit/96afe1302ca0959f80558bf0c135a9978bc65b0b
Submitter: "Zuul (22348)"
Branch: master

commit 96afe1302ca0959f80558bf0c135a9978bc65b0b
Author: Francesco Pantano <email address hidden>
Date: Fri Aug 19 08:19:45 2022 +0200

    Add missing keyrings when ganesha is deployed standalone

    When Ceph is external but ganesha is deployed by TripleO the Ceph cli
    should be created accordingly and the keyring should be rendered in the
    expected location. This patch adds the missing steps to copy and render
    the manila keyring in the right locations and fixes the cli generation
    to properly handle this use case.

    Related-Bug: #1986988
    Change-Id: I2651c2850debd8110da93df2adc5fd8768a00db0

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-ansible (stable/wallaby)

Related fix proposed to branch: stable/wallaby
Review: https://review.opendev.org/c/openstack/tripleo-ansible/+/854021

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-heat-templates (stable/wallaby)

Fix proposed to branch: stable/wallaby
Review: https://review.opendev.org/c/openstack/tripleo-heat-templates/+/854022

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on tripleo-ansible (stable/wallaby)

Change abandoned by "Goutham Pacha Ravi <email address hidden>" on branch: stable/wallaby
Review: https://review.opendev.org/c/openstack/tripleo-ansible/+/854021
Reason: Upset zuul with this cherry-pick; will restore when the THT change merges on master

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Change abandoned on tripleo-heat-templates (stable/wallaby)

Change abandoned by "Goutham Pacha Ravi <email address hidden>" on branch: stable/wallaby
Review: https://review.opendev.org/c/openstack/tripleo-heat-templates/+/854022
Reason: Abandoning temporarily to allow the master change to merge

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-heat-templates (master)

Reviewed: https://review.opendev.org/c/openstack/tripleo-heat-templates/+/853699
Committed: https://opendev.org/openstack/tripleo-heat-templates/commit/e031058520ea36feff7e1956ae0d146ef5b19817
Submitter: "Zuul (22348)"
Branch: master

commit e031058520ea36feff7e1956ae0d146ef5b19817
Author: John Fulton <email address hidden>
Date: Thu Aug 18 16:09:07 2022 -0400

    Introduce ExternalCeph boolean

    Add explicit parameter for when external ceph is used.
    This parameter defaults to false but is true if the
    deployment uses -e environments/external-ceph.yaml.

    When external ceph is used with Ganesha, the ceph_mon
    group is empty but the ceph_nfs group is not. When
    internal ceph is used with Ganesha, both the ceph_mon
    and ceph_nfs groups are non-empty. Rather than solve
    the related bug by adding another condition based on
    these groups which is compatible existing logic, it's
    safer to have an explicit parameter for when external
    ceph is used.

    Change-Id: Id3e397d81dbca9a48d0456588784bcc20737093f
    Depends-On: I2651c2850debd8110da93df2adc5fd8768a00db0
    Closes-Bug: #1986988
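
The trade-off described in the commit message can be sketched in Python. This is only an illustration: the group-based condition below is a hypothetical stand-in, not the actual logic in tripleo-heat-templates, and the function and host names are invented.

```python
def group_based_guess(groups):
    """Hypothetical condition: any non-empty Ceph group implies internal Ceph."""
    return bool(groups.get("ceph_mon") or groups.get("ceph_nfs"))


def explicit_flag(external_ceph):
    """Run the internal Ceph playbooks only when Ceph is not external."""
    return not external_ceph


# The two Ganesha scenarios described in the commit message.
external_with_ganesha = {"ceph_mon": [], "ceph_nfs": ["controller-0"]}
internal_with_ganesha = {"ceph_mon": ["controller-0"], "ceph_nfs": ["controller-0"]}

# The group-based guess cannot tell the two scenarios apart...
print(group_based_guess(external_with_ganesha), group_based_guess(internal_with_ganesha))
# ...while the explicit boolean states the intent directly.
print(explicit_flag(True), explicit_flag(False))
```

Under this assumed condition, both scenarios look "internal" because ceph_nfs is non-empty in each, whereas an explicit boolean does not depend on which groups happen to be populated.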

Changed in tripleo:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-ansible (stable/wallaby)

Reviewed: https://review.opendev.org/c/openstack/tripleo-ansible/+/854021
Committed: https://opendev.org/openstack/tripleo-ansible/commit/a335aeb940b5ea0c55487ed05984f7e64bb438d2
Submitter: "Zuul (22348)"
Branch: stable/wallaby

commit a335aeb940b5ea0c55487ed05984f7e64bb438d2
Author: Francesco Pantano <email address hidden>
Date: Fri Aug 19 08:19:45 2022 +0200

    Add missing keyrings when ganesha is deployed standalone

    When Ceph is external but ganesha is deployed by TripleO the Ceph cli
    should be created accordingly and the keyring should be rendered in the
    expected location. This patch adds the missing steps to copy and render
    the manila keyring in the right locations and fixes the cli generation
    to properly handle this use case.

    Related-Bug: #1986988
    Change-Id: I2651c2850debd8110da93df2adc5fd8768a00db0
    (cherry picked from commit 96afe1302ca0959f80558bf0c135a9978bc65b0b)

tags: added: in-stable-wallaby
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-heat-templates (stable/wallaby)

Reviewed: https://review.opendev.org/c/openstack/tripleo-heat-templates/+/854022
Committed: https://opendev.org/openstack/tripleo-heat-templates/commit/a08f4b1d297c915f4b4c7e5286af0e7a2177c99f
Submitter: "Zuul (22348)"
Branch: stable/wallaby

commit a08f4b1d297c915f4b4c7e5286af0e7a2177c99f
Author: John Fulton <email address hidden>
Date: Thu Aug 18 16:09:07 2022 -0400

    Introduce ExternalCeph boolean

    Add explicit parameter for when external ceph is used.
    This parameter defaults to false but is true if the
    deployment uses -e environments/external-ceph.yaml.

    When external ceph is used with Ganesha, the ceph_mon
    group is empty but the ceph_nfs group is not. When
    internal ceph is used with Ganesha, both the ceph_mon
    and ceph_nfs groups are non-empty. Rather than solve
    the related bug by adding another condition based on
    these groups which is compatible existing logic, it's
    safer to have an explicit parameter for when external
    ceph is used.

    Change-Id: Id3e397d81dbca9a48d0456588784bcc20737093f
    Depends-On: I2651c2850debd8110da93df2adc5fd8768a00db0
    Closes-Bug: #1986988
    (cherry picked from commit e031058520ea36feff7e1956ae0d146ef5b19817)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-heat-templates 17.0.0

This issue was fixed in the openstack/tripleo-heat-templates 17.0.0 release.
