Impossible to deploy manila-cephfsganesha-config due to a problem with ceph-nfs

Bug #1882554 reported by Juan Badia Payno
Affects: tripleo
Status: Fix Released
Importance: Medium
Assigned to: Tom Barron
Milestone: victoria-1

Bug Description

I deployed TripleO with tripleo-quickstart: 3 Controllers, 2 Computes, 3 Ceph nodes (3Cont-2Comp-3Ceph).

I first deployed without manila-ganesha but with Ceph; I needed to apply the workaround from https://bugs.launchpad.net/bugs/1880579

The overcloud deploy includes all of the following Ceph-related environment files (a sketch of how they fit into the full command is shown after the list):

-e /usr/share/openstack-tripleo-heat-templates/environments/ceph-ansible/ceph-ansible.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/ceph-ansible/ceph-mds.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/ceph-ansible/ceph-rgw.yaml \
-e /usr/share/openstack-tripleo-heat-templates/environments/manila-cephfsganesha-config.yaml \
-n /usr/share/openstack-tripleo-heat-templates/network_data_ganesha.yaml
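
For context, a minimal sketch of how these files might fit into the full deploy command; the stack name, roles file, network configuration, and any additional environment files are assumptions and will differ per environment:

# sketch only: additional environment files and deploy options are assumptions
openstack overcloud deploy --templates \
  -n /usr/share/openstack-tripleo-heat-templates/network_data_ganesha.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/ceph-ansible/ceph-ansible.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/ceph-ansible/ceph-mds.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/ceph-ansible/ceph-rgw.yaml \
  -e /usr/share/openstack-tripleo-heat-templates/environments/manila-cephfsganesha-config.yaml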

The error that shows up is:

2020-06-08 12:25:29,217 p=698084 u=root n=ansible | TASK [ceph-nfs : set_fact container_exec_cmd_nfs] ******************************
2020-06-08 12:25:29,218 p=698084 u=root n=ansible | Monday 08 June 2020 12:25:29 +0000 (0:00:00.364) 0:09:47.108 ***********
2020-06-08 12:25:29,286 p=698084 u=root n=ansible | ok: [overcloud-controller-0] => changed=false
  ansible_facts:
    exec_cmd_nfs: podman run --rm --net=host -v /etc/ceph:/etc/ceph:z -v /var/lib/ceph/:/var/lib/ceph/:z -v /var/log/ceph/:/var/log/ceph/:z --entrypoint=rados 192.168.24.1:8787/ceph/daemon:v4.0.10-stable-4.0-nautilus-centos-7-x86_64 -n client..rgw.overcloud-controller-0 -k /var/lib/ceph/radosgw/overcloud-controller-0
2020-06-08 12:25:29,337 p=698084 u=root n=ansible | ok: [overcloud-controller-1] => changed=false
  ansible_facts:
    exec_cmd_nfs: podman run --rm --net=host -v /etc/ceph:/etc/ceph:z -v /var/lib/ceph/:/var/lib/ceph/:z -v /var/log/ceph/:/var/log/ceph/:z --entrypoint=rados 192.168.24.1:8787/ceph/daemon:v4.0.10-stable-4.0-nautilus-centos-7-x86_64 -n client..rgw.overcloud-controller-1 -k /var/lib/ceph/radosgw/overcloud-controller-1
2020-06-08 12:25:29,366 p=698084 u=root n=ansible | ok: [overcloud-controller-2] => changed=false
  ansible_facts:
    exec_cmd_nfs: podman run --rm --net=host -v /etc/ceph:/etc/ceph:z -v /var/lib/ceph/:/var/lib/ceph/:z -v /var/log/ceph/:/var/log/ceph/:z --entrypoint=rados 192.168.24.1:8787/ceph/daemon:v4.0.10-stable-4.0-nautilus-centos-7-x86_64 -n client..rgw.overcloud-controller-2 -k /var/lib/ceph/radosgw/overcloud-controller-2
2020-06-08 12:25:29,407 p=698084 u=root n=ansible | TASK [ceph-nfs : check if rados index object exists] ***************************
2020-06-08 12:25:29,407 p=698084 u=root n=ansible | Monday 08 June 2020 12:25:29 +0000 (0:00:00.189) 0:09:47.298 ***********
2020-06-08 12:25:31,781 p=698084 u=root n=ansible | ok: [overcloud-controller-0] => changed=false
  cmd: podman run --rm --net=host -v /etc/ceph:/etc/ceph:z -v /var/lib/ceph/:/var/lib/ceph/:z -v /var/log/ceph/:/var/log/ceph/:z --entrypoint=rados 192.168.24.1:8787/ceph/daemon:v4.0.10-stable-4.0-nautilus-centos-7-x86_64 -n client..rgw.overcloud-controller-0 -k /var/lib/ceph/radosgw/overcloud-controller-0 -p manila_data --cluster ceph ls|grep ganesha-export-index
  delta: '0:00:01.941098'
  end: '2020-06-08 12:26:13.826030'
  failed_when_result: false
  msg: non-zero return code
  rc: 1
  start: '2020-06-08 12:26:11.884932'
  stderr: |-
    2020-06-08 12:26:13.667 7f46f471a880 -1 auth: unable to find a keyring on /var/lib/ceph/radosgw/overcloud-controller-0: (2) No such file or directory
    2020-06-08 12:26:13.667 7f46f471a880 -1 AuthRegistry(0x55ba1902a7f8) no keyring found at /var/lib/ceph/radosgw/overcloud-controller-0, disabling cephx
    2020-06-08 12:26:13.668 7f46f471a880 -1 auth: unable to find a keyring on /var/lib/ceph/radosgw/overcloud-controller-0: (2) No such file or directory
    2020-06-08 12:26:13.668 7f46f471a880 -1 AuthRegistry(0x7ffd58c3dd98) no keyring found at /var/lib/ceph/radosgw/overcloud-controller-0, disabling cephx
    2020-06-08 12:26:13.673 7f46e503b700 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [1]
    2020-06-08 12:26:13.679 7f46e4039700 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [1]
    2020-06-08 12:26:13.710 7f46e483a700 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [1]
    2020-06-08 12:26:13.710 7f46f471a880 -1 monclient: authenticate NOTE: no keyring found; disabled cephx authentication
    failed to fetch mon config (--no-mon-config to skip)
  stderr_lines: <omitted>
  stdout: ''
  stdout_lines: <omitted>
2020-06-08 12:25:31,829 p=698084 u=root n=ansible | TASK [ceph-nfs : create an empty rados index object] ***************************
2020-06-08 12:25:31,829 p=698084 u=root n=ansible | Monday 08 June 2020 12:25:31 +0000 (0:00:02.422) 0:09:49.720 ***********
2020-06-08 12:25:33,995 p=698084 u=root n=ansible | fatal: [overcloud-controller-0]: FAILED! => changed=true
  cmd:
  - podman
  - run
  - --rm
  - --net=host
  - -v
  - /etc/ceph:/etc/ceph:z
  - -v
  - /var/lib/ceph/:/var/lib/ceph/:z
  - -v
  - /var/log/ceph/:/var/log/ceph/:z
  - --entrypoint=rados
  - 192.168.24.1:8787/ceph/daemon:v4.0.10-stable-4.0-nautilus-centos-7-x86_64
  - -n
  - client..rgw.overcloud-controller-0
  - -k
  - /var/lib/ceph/radosgw/overcloud-controller-0
  - -p
  - manila_data
  - --cluster
  - ceph
  - put
  - ganesha-export-index
  - /dev/null
  delta: '0:00:01.724827'
  end: '2020-06-08 12:26:16.037111'
  msg: non-zero return code
  rc: 1
  start: '2020-06-08 12:26:14.312284'
  stderr: |-
    2020-06-08 12:26:15.852 7f549fdcb880 -1 auth: unable to find a keyring on /var/lib/ceph/radosgw/overcloud-controller-0: (2) No such file or directory
    2020-06-08 12:26:15.852 7f549fdcb880 -1 AuthRegistry(0x5632aef0c818) no keyring found at /var/lib/ceph/radosgw/overcloud-controller-0, disabling cephx
    2020-06-08 12:26:15.875 7f549fdcb880 -1 auth: unable to find a keyring on /var/lib/ceph/radosgw/overcloud-controller-0: (2) No such file or directory
    2020-06-08 12:26:15.875 7f549fdcb880 -1 AuthRegistry(0x7ffc33f18688) no keyring found at /var/lib/ceph/radosgw/overcloud-controller-0, disabling cephx
    2020-06-08 12:26:15.880 7f548f6ea700 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [1]
    2020-06-08 12:26:15.884 7f54906ec700 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [1]
    2020-06-08 12:26:15.884 7f548feeb700 -1 monclient(hunting): handle_auth_bad_method server allowed_methods [2] but i only support [1]
    2020-06-08 12:26:15.884 7f549fdcb880 -1 monclient: authenticate NOTE: no keyring found; disabled cephx authentication
    failed to fetch mon config (--no-mon-config to skip)
  stderr_lines: <omitted>
  stdout: ''
  stdout_lines: <omitted>
2020-06-08 12:25:33,995 p=698084 u=root n=ansible | NO MORE HOSTS LEFT *************************************************************
2020-06-08 12:25:33,998 p=698084 u=root n=ansible | PLAY RECAP *********************************************************************
2020-06-08 12:25:33,998 p=698084 u=root n=ansible | overcloud-cephstorage-0 : ok=135 changed=6 unreachable=0 failed=0 skipped=215 rescued=0 ignored=0
2020-06-08 12:25:33,998 p=698084 u=root n=ansible | overcloud-cephstorage-1 : ok=126 changed=4 unreachable=0 failed=0 skipped=207 rescued=0 ignored=0
2020-06-08 12:25:33,998 p=698084 u=root n=ansible | overcloud-cephstorage-2 : ok=136 changed=8 unreachable=0 failed=0 skipped=207 rescued=0 ignored=0
2020-06-08 12:25:33,998 p=698084 u=root n=ansible | overcloud-controller-0 : ok=404 changed=19 unreachable=0 failed=1 skipped=438 rescued=0 ignored=0
2020-06-08 12:25:33,998 p=698084 u=root n=ansible | overcloud-controller-1 : ok=347 changed=8 unreachable=0 failed=0 skipped=387 rescued=0 ignored=0
2020-06-08 12:25:33,998 p=698084 u=root n=ansible | overcloud-controller-2 : ok=347 changed=8 unreachable=0 failed=0 skipped=387 rescued=0 ignored=0
2020-06-08 12:25:33,999 p=698084 u=root n=ansible | overcloud-novacompute-0 : ok=96 changed=5 unreachable=0 failed=0 skipped=222 rescued=0 ignored=0
2020-06-08 12:25:33,999 p=698084 u=root n=ansible | overcloud-novacompute-1 : ok=86 changed=3 unreachable=0 failed=0 skipped=214 rescued=0 ignored=0
2020-06-08 12:25:33,999 p=698084 u=root n=ansible | INSTALLER STATUS ***************************************************************
2020-06-08 12:25:34,002 p=698084 u=root n=ansible | Install Ceph Monitor : Complete (0:01:01)
2020-06-08 12:25:34,002 p=698084 u=root n=ansible | Install Ceph Manager : Complete (0:00:55)
2020-06-08 12:25:34,002 p=698084 u=root n=ansible | Install Ceph OSD : Complete (0:01:51)
2020-06-08 12:25:34,002 p=698084 u=root n=ansible | Install Ceph MDS : Complete (0:01:21)
2020-06-08 12:25:34,002 p=698084 u=root n=ansible | Install Ceph RGW : Complete (0:00:48)
2020-06-08 12:25:34,002 p=698084 u=root n=ansible | Install Ceph NFS : In Progress (0:00:57)
2020-06-08 12:25:34,002 p=698084 u=root n=ansible | This phase can be restarted by running: roles/ceph-nfs/tasks/main.yml
2020-06-08 12:25:34,002 p=698084 u=root n=ansible | Install Ceph Client : Complete (0:00:43)
2020-06-08 12:25:34,002 p=698084 u=root n=ansible | Monday 08 June 2020 12:25:34 +0000 (0:00:02.172) 0:09:51.893 ***********
2020-06-08 12:25:34,002 p=698084 u=root n=ansible | ===============================================================================
2020-06-08 12:25:34,004 p=698084 u=root n=ansible | gather and delegate facts ---------------------------------------------- 17.04s
2020-06-08 12:25:34,004 p=698084 u=root n=ansible | ceph-client : create cephx key(s) -------------------------------------- 14.41s
2020-06-08 12:25:34,004 p=698084 u=root n=ansible | ceph-osd : generate keys ----------------------------------------------- 10.92s
2020-06-08 12:25:34,004 p=698084 u=root n=ansible | ceph-osd : assign application to pool(s) -------------------------------- 8.76s
2020-06-08 12:25:34,004 p=698084 u=root n=ansible | ceph-osd : list existing pool(s) ---------------------------------------- 8.41s
2020-06-08 12:25:34,004 p=698084 u=root n=ansible | ceph-config : create ceph initial directories --------------------------- 7.97s
2020-06-08 12:25:34,005 p=698084 u=root n=ansible | ceph-osd : copy ceph key(s) if needed ----------------------------------- 7.63s
2020-06-08 12:25:34,005 p=698084 u=root n=ansible | ceph-config : create ceph initial directories --------------------------- 7.33s
2020-06-08 12:25:34,005 p=698084 u=root n=ansible | ceph-osd : set pg_autoscale_mode value on pool(s) ----------------------- 7.26s
2020-06-08 12:25:34,005 p=698084 u=root n=ansible | ceph-mds : create filesystem pools -------------------------------------- 7.23s
2020-06-08 12:25:34,005 p=698084 u=root n=ansible | ceph-config : create ceph initial directories --------------------------- 7.13s
2020-06-08 12:25:34,005 p=698084 u=root n=ansible | ceph-config : create ceph initial directories --------------------------- 6.73s
2020-06-08 12:25:34,005 p=698084 u=root n=ansible | ceph-osd : customize pool size ------------------------------------------ 6.54s
2020-06-08 12:25:34,005 p=698084 u=root n=ansible | ceph-mds : set pg_autoscale_mode value on pool(s) ----------------------- 6.14s
2020-06-08 12:25:34,005 p=698084 u=root n=ansible | ceph-mon : check if monitor initial keyring already exists -------------- 6.09s
2020-06-08 12:25:34,005 p=698084 u=root n=ansible | ceph-config : create ceph initial directories --------------------------- 5.72s
2020-06-08 12:25:34,005 p=698084 u=root n=ansible | ceph-osd : systemd start osd -------------------------------------------- 5.66s
2020-06-08 12:25:34,005 p=698084 u=root n=ansible | ceph-mgr : create ceph mgr keyring(s) on a mon node --------------------- 5.43s
2020-06-08 12:25:34,005 p=698084 u=root n=ansible | ceph-osd : get keys from monitors --------------------------------------- 5.18s
2020-06-08 12:25:34,005 p=698084 u=root n=ansible | ceph-mgr : get keys from monitors --------------------------------------- 4.73s

Revision history for this message
Juan Badia Payno (jbadiapa) wrote :

Adding some clarification:

To start, I deployed the 3Cont-2Comp-3Ceph nodes; the problem described above appeared when I then tried to add manila-ganesha to the deployment.

Revision history for this message
Tom Barron (tpb) wrote :

The command run by ceph-ansible to determine if there's a rados object with the export index for ceph-nfs (ganesha) is failing.

Inspection of the file containing that command, roles/ceph-nfs/tasks/start_nfs.yml, shows that it is at commit ea2b654d951f0ddb4abed3d4e96d66458baf80f8. That commit introduced two issues with the rados index check, one fixed by cf460274c7489940968fed176c113ad473b22f4d and the other by 8a890306ad870f0174f76c6445644d7f8db6396e.

We need to get these fixes into the ceph-ansible package used here. The currently installed version is:

(undercloud) [stack@undercloud ~]$ sudo yum list installed ceph-ansible
Installed Packages
ceph-ansible.noarch 4.0.19-1.el8 @quickstart-centos-ceph-nautilus
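
A rough way to confirm whether the installed copy already carries the fixes is to look for the '/keyring' suffix that commit cf460274 adds to the exec_cmd_nfs fact (see the commit message below); this is only an assumption-based spot check, not an authoritative test:

# spot check: assumes the fix adds a '/keyring' suffix to the keyring path in this task file
grep -n 'keyring' /usr/share/ceph-ansible/roles/ceph-nfs/tasks/start_nfs.yml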

Here is the git log showing the three commits in question:

$ git log roles/ceph-nfs/tasks/start_nfs.yml
commit 8a890306ad870f0174f76c6445644d7f8db6396e
Author: Dimitri Savineau <email address hidden>
Date: Wed May 6 09:31:34 2020 -0400

    ceph-nfs: fix internal ganesha deployment

    Since ea2b654d9 we're not running the rados command from the monitor
    nodes but from the ganesha node. Unfortunately we don't have the
    required keyring on that node to run the rados command as we don't
    import the right keyring.
    This commit restores the workflow for internal ganesha deployment like
    before ea2b654d9 but keeps the rados commands from the ganesha node for
    external deployment until we have a better design.

    Signed-off-by: Dimitri Savineau <email address hidden>

commit cf460274c7489940968fed176c113ad473b22f4d
Author: Guillaume Abrioux <email address hidden>
Date: Thu Apr 30 16:21:14 2020 +0200

    nfs: fix 2 typo

    The condition is missing an index here which makes the playbook failing.

    Typical error:
    ```
    The conditional check 'not item.get('skipped', False)' failed. The error was: error while evaluating conditional (not item.get('skipped', False)): 'list object' has no attribute 'get'",
    ```

    Also, adds the missing '/keyring' on the `exec_cmd_nfs` fact.

    Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1831342

    Signed-off-by: Guillaume Abrioux <email address hidden>

commit ea2b654d951f0ddb4abed3d4e96d66458baf80f8
Author: Guillaume Abrioux <email address hidden>
Date: Fri Apr 10 11:05:25 2020 +0200

    nfs: create empty rados index object for nfs standalone

    This commit creates an empty rados index object even when deploying
    standalone nfs-ganesha.

    Closes: https://bugzilla.redhat.com/show_bug.cgi?id=1822328

    Signed-off-by: Guillaume Abrioux <email address hidden>

Revision history for this message
Tom Barron (tpb) wrote :

The rados command using an rgw key for ceph-nfs is curious, to say the least, and is in any case inappropriate for an integrated deployment (such as this one) where the Ceph daemons are deployed by TripleO alongside OpenStack. Commit 8a890306ad870f0174f76c6445644d7f8db6396e restored the internal-deployment form of the rados command that checks for the export index.
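
For illustration, in an internal deployment the check is expected to run against the cluster with a keyring that actually exists on the controller, for example via the monitor container and the admin keyring; the exact form used by ceph-ansible after 8a890306 may differ, so treat this as a sketch only:

# sketch: assumes the check runs on a controller hosting a mon container, using the admin keyring inside it
podman exec ceph-mon-$(hostname -s) rados -p manila_data ls | grep ganesha-export-index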

We need to get the CentOS Storage SIG to update the ceph-ansible package.

Revision history for this message
Juan Badia Payno (jbadiapa) wrote :

Workaround:
  Using ceph-ansible v4.0.23 instead of the packaged version fixed the issue.

(undercloud) [stack@undercloud ~]$ sudo rm -rf /usr/share/ceph-ansible
(undercloud) [stack@undercloud ~]$ git clone https://github.com/ceph/ceph-ansible.git
(undercloud) [stack@undercloud ~]$ git -C ceph-ansible checkout v4.0.23
(undercloud) [stack@undercloud ~]$ sudo ln -s /home/stack/ceph-ansible/ /usr/share/ceph-ansible
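
As a quick sanity check before re-running the deploy, one can verify that the symlink points at the checked-out tree and that the expected tag is in use (these commands are a suggestion, not part of the original workaround):

ls -l /usr/share/ceph-ansible
git -C /home/stack/ceph-ansible describe --tags    # expect v4.0.23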

Tom Barron (tpb)
Changed in tripleo:
status: New → In Progress
importance: Undecided → Medium
assignee: nobody → Tom Barron (tpb)
milestone: none → victoria-1
Revision history for this message
Tom Barron (tpb) wrote :

ceph-ansible 4.0.23 is now available in the quickstart-centos-ceph-nautilus repository, so I'm closing this one out.
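
For anyone hitting this on an existing undercloud, a hedged sketch of picking up the fixed package now that it is published (package and repository names as above; the exact version string may differ):

sudo yum update ceph-ansible
sudo yum list installed ceph-ansible    # expect 4.0.23 or later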

Changed in tripleo:
status: In Progress → Fix Released