msg: 'Failed to get information on remote file (/home/stack/config-download/overcloud/octavia-ansible/group_vars/octavia_vars.yaml): Permission denied'

Bug #1881420 reported by Vasileios Baousis
This bug affects 2 people
Affects  Status        Importance  Assigned to       Milestone
tripleo  Fix Released  Undecided   Brendan Shephard

Bug Description

Description
===========
The problem seen with the ceph-ansible deployment against an external Ceph cluster (see https://bugs.launchpad.net/tripleo/+bug/1880579) also affects the Octavia deployment. Permission issues with the Ansible user prevent read/write access to /home/stack/config-download/overcloud/octavia-ansible/

Steps to reproduce
==================
Deploy overcloud with Octavia.

Environment
==================
(undercloud) [stack@under-ussuri01 ~]$ rpm -aq | grep octavia
python3-octaviaclient-2.0.1-0.20200429054432.1783650.el8.noarch
puppet-octavia-16.3.1-0.20200518065149.35e4432.el8.noarch

(undercloud) [stack@under-ussuri01 ~]$ rpm -aq | grep ansible
ansible-role-openstack-operations-0.0.1-0.20200507053741.274739e.el8.noarch
python3-ansible-runner-1.4.5-1.1.el8.noarch
ansible-role-chrony-1.0.2-0.20200507053030.03e7fbe.el8.noarch
ansible-role-atos-hsm-0.1.1-0.20200526161950.e51c244.el8.noarch
ansible-2.9.7-1.el8.noarch
ansible-tripleo-ipa-0.2.1-0.20200521150732.79862dd.el8.noarch
ceph-ansible-4.0.19-1.el8.noarch
python3-heat-agent-ansible-2.0.1-0.20200526185440.b639e78.el8.noarch
ansible-role-thales-hsm-0.2.1-0.20200526163944.99b3d39.el8.noarch
ansible-role-tripleo-modify-image-1.2.0-0.20200521172644.bb6f78d.el8.noarch
ansible-tripleo-ipsec-9.3.0-0.20200521172422.0c8693c.el8.noarch
ansible-pacemaker-1.0.4-0.20200526160932.5847167.el8.noarch
tripleo-ansible-1.4.1-0.20200526191928.af95b95.el8.noarch
ansible-config_template-1.1.1-0.20200526122433.8e18f42.el8.noarch
ansible-role-container-registry-1.2.0-0.20200521173118.7eca2dd.el8.noarch
ansible-freeipa-0.1.8-2.el8.noarch

(undercloud) [stack@under-ussuri01 ~]$ cat /home/stack/templates/environments/services/octavia.yaml
resource_registry:
  OS::TripleO::Services::OctaviaApi: ../../deployment/octavia/octavia-api-container-puppet.yaml
  OS::TripleO::Services::OctaviaHousekeeping: ../../deployment/octavia/octavia-housekeeping-container-puppet.yaml
  OS::TripleO::Services::OctaviaHealthManager: ../../deployment/octavia/octavia-health-manager-container-puppet.yaml
  OS::TripleO::Services::OctaviaWorker: ../../deployment/octavia/octavia-worker-container-puppet.yaml
  OS::TripleO::Services::OctaviaDeploymentConfig: ../../deployment/octavia/octavia-deployment-config.yaml

parameter_defaults:
    NeutronEnableForceMetadata: true

    # This flag enables internal generation of certificates for communication
    # with amphorae. Use OctaviaCaCert, OctaviaCaKey, OctaviaCaKeyPassphrase,
    # OctaviaClient and OctaviaServerCertsKeyPassphrase cert to configure
    # secure production environments.
    OctaviaGenerateCerts: true
    NeutronEnableForceMetadata: true
    OctaviaCaCert: |
      -----BEGIN CERTIFICATE-----
      REMOVED
      -----END CERTIFICATE-----

    OctraviaCaKey: |
      -----BEGIN RSA PRIVATE KEY-----
      REMOVED
      -----END RSA PRIVATE KEY-----

    OctaviaClientCert: |
      -----BEGIN RSA PRIVATE KEY-----
      REMOVED
      -----END RSA PRIVATE KEY-----
      -----BEGIN CERTIFICATE-----
      REMOVED
      -----END CERTIFICATE-----

    OctaviaCaKeyPassphrase: ******
    # This flag enables internal generation of certificates for communication
    # with amphorae. Use OctaviaCaCert, OctaviaCaKey, OctaviaCaKeyPassphrase,
    # OctaviaClient and OctaviaServerCertsKeyPassphrase cert to configure
    # secure production environments.
    OctaviaGenerateCerts: true

Logs
==================

TASK [Make needed directories on the undercloud] *******************************
Saturday 30 May 2020 16:22:22 +0000 (0:00:00.092) 0:45:18.096 **********
changed: [undercloud] => (item=/home/stack/config-download/overcloud/octavia-ansible)
changed: [undercloud] => (item=/home/stack/config-download/overcloud/octavia-ansible/local_dir)
changed: [undercloud] => (item=/home/stack/config-download/overcloud/octavia-ansible/group_vars)

TASK [Write group_vars file] ***************************************************
Saturday 30 May 2020 16:22:22 +0000 (0:00:00.652) 0:45:18.749 **********
fatal: [undercloud]: FAILED! =>
  msg: 'Failed to get information on remote file (/home/stack/config-download/overcloud/octavia-ansible/group_vars/octavia_vars.yaml): Permission denied'

NO MORE HOSTS LEFT *************************************************************

PLAY RECAP *********************************************************************
overcloud-controller-0 : ok=1043 changed=244 unreachable=0 failed=0 skipped=713 rescued=0 ignored=0
overcloud-controller-1 : ok=1001 changed=235 unreachable=0 failed=0 skipped=697 rescued=0 ignored=0
overcloud-controller-2 : ok=1001 changed=235 unreachable=0 failed=0 skipped=697 rescued=0 ignored=0
overcloud-novacompute-0 : ok=459 changed=98 unreachable=0 failed=0 skipped=307 rescued=0 ignored=0
overcloud-novacompute-1 : ok=455 changed=98 unreachable=0 failed=0 skipped=307 rescued=0 ignored=0
overcloud-novacompute-2 : ok=455 changed=98 unreachable=0 failed=0 skipped=307 rescued=0 ignored=0
overcloud-novacompute-3 : ok=464 changed=178 unreachable=0 failed=0 skipped=277 rescued=0 ignored=0
overcloud-novacompute-4 : ok=464 changed=178 unreachable=0 failed=0 skipped=277 rescued=0 ignored=0
undercloud : ok=80 changed=31 unreachable=0 failed=1 skipped=100 rescued=0 ignored=0
Saturday 30 May 2020 16:22:23 +0000 (0:00:00.264) 0:45:19.014 **********
===============================================================================
Pre-fetch all the containers ------------------------------------------- 63.74s
tripleo_container_manage : Check podman create status ------------------ 36.47s
tripleo_container_image_prepare : Run tripleo_container_image_prepare logged to: /var/log/tripleo-container-image-prepare.log -- 28.41s
tripleo_container_manage : Check podman create status ------------------ 26.88s
tripleo_container_manage : Check podman create status ------------------ 26.87s
tripleo_container_manage : Check podman create status ------------------ 26.83s
Run NetworkConfig script ----------------------------------------------- 26.42s
tripleo_container_manage : Check podman create status ------------------ 21.76s
tripleo_container_manage : Check podman create status ------------------ 21.71s
tripleo_container_manage : Create systemd services files --------------- 21.46s
tripleo_firewall : Manage firewall rules ------------------------------- 21.26s
Write kolla config json files ------------------------------------------ 20.98s
tripleo_container_manage : Check podman create status ------------------ 20.94s
tripleo_container_manage : Start or restart systemd services ----------- 19.76s
Creating container startup configs for step_4 -------------------------- 17.68s
Pre-fetch all the containers ------------------------------------------- 16.94s
tripleo_container_manage : Check podman create status ------------------ 16.63s
tripleo_container_manage : Check podman create status ------------------ 16.62s
tripleo_container_manage : Check podman create status ------------------ 16.56s
tripleo_container_manage : Check podman create status ------------------ 16.55s

Vasileios Baousis (bbaous) wrote :

Openstack version : ussuri
OS : CentOS 8

tags: added: ansible octavia
Brendan Shephard (bshephar) wrote :

This first task works to create all of the directories because we're using become: True
https://opendev.org/openstack/tripleo-heat-templates/src/branch/master/deployment/octavia/octavia-deployment-config.j2.yaml#L281-L290

TASK [Make needed directories on the undercloud] *******************************
Saturday 06 June 2020 21:15:50 +1000 (0:00:00.381) 0:51:53.073 *********
ok: [undercloud] => (item=/home/stack/config-download/overcloud/octavia-ansible)
ok: [undercloud] => (item=/home/stack/config-download/overcloud/octavia-ansible/local_dir)
ok: [undercloud] => (item=/home/stack/config-download/overcloud/octavia-ansible/group_vars)

But then we fail at the next task because tripleo-admin doesn't have permissions for /home/stack:
TASK [Write group_vars file] ***************************************************
Saturday 06 June 2020 21:15:52 +1000 (0:00:01.665) 0:51:54.739 *********
fatal: [undercloud]: FAILED! =>
  msg: 'Failed to get information on remote file (/home/stack/config-download/overcloud/octavia-ansible/group_vars/octavia_vars.yaml): Permission denied'
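To check which user is being denied, a small diagnostic playbook could stat the directory with and without privilege escalation. This is only a sketch and not part of tripleo-heat-templates: the host name "undercloud" is assumed from the logs above, and the variable and register names are made up for illustration.

```yaml
# Diagnostic sketch only. Assumes an inventory host called "undercloud",
# as shown in the play recap above.
- hosts: undercloud
  gather_facts: false
  vars:
    # Path taken from the failing task; the variable name is hypothetical.
    octavia_group_vars_dir: /home/stack/config-download/overcloud/octavia-ansible/group_vars
  tasks:
    - name: Stat the directory as the connecting user
      stat:
        path: "{{ octavia_group_vars_dir }}"
      register: unprivileged_stat
      # Keep going so the privileged result can be compared.
      ignore_errors: true

    - name: Stat the same directory with privilege escalation
      stat:
        path: "{{ octavia_group_vars_dir }}"
      register: privileged_stat
      become: true

    - name: Report both results
      debug:
        msg:
          - "stat without become failed: {{ unprivileged_stat is failed }}"
          - "stat with become found the directory: {{ privileged_stat.stat.exists }}"
```

If the first stat fails and the second succeeds, that would be consistent with the connecting user being unable to traverse /home/stack.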

I assume this is an issue now because we were previously using /var/lib/mistral; since the move to tripleo-ansible, using the home directory has broken it. Additionally, looking at tripleo-ci, we're not picking this up because it only runs this test against a standalone deployment, and it also appears to be skipping the tasks around this point. So there must be something different about the directories and what is required in a standalone deployment.

bc764e20-0ff8-80b0-1cbd-00000000016d | SKIPPED | Make needed directories on the undercloud | undercloud
bc764e20-0ff8-80b0-1cbd-00000000016e | TASK | Write group_vars file
bc764e20-0ff8-80b0-1cbd-00000000016d | TIMING | Make needed directories on the undercloud | 0:11:49.657 | 0.10s
bc764e20-0ff8-80b0-1cbd-00000000016e | SKIPPED | Write group_vars file | undercloud

I have already tried adding become: true to that entire block, but I haven't got a successful deployment yet.
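For reference, a block-level become looks roughly like the sketch below. The task bodies are simplified for illustration and are not copied from octavia-deployment-config.j2.yaml; the block name and the octavia_group_vars variable are hypothetical, while the paths come from the logs above.

```yaml
# Illustrative sketch only; not the actual template content.
- name: Prepare Octavia files on the undercloud   # hypothetical block name
  become: true   # inherited by every task inside the block
  block:
    - name: Make needed directories on the undercloud
      file:
        path: "{{ item }}"
        state: directory
      loop:
        - /home/stack/config-download/overcloud/octavia-ansible
        - /home/stack/config-download/overcloud/octavia-ansible/local_dir
        - /home/stack/config-download/overcloud/octavia-ansible/group_vars

    - name: Write group_vars file
      copy:
        dest: /home/stack/config-download/overcloud/octavia-ansible/group_vars/octavia_vars.yaml
        # octavia_group_vars is a hypothetical variable holding the settings.
        content: "{{ octavia_group_vars | to_nice_yaml }}"
```

One thing to keep in mind with this approach: files written under become are owned by the escalated user (root by default) unless owner and group are set explicitly on the task.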

Brendan Shephard (bshephar) wrote :

After adding a become: true to the block here: https://opendev.org/openstack/tripleo-heat-templates/src/branch/master/deployment/octavia/octavia-deployment-config.j2.yaml#L277

I get further:

    PLAY [octavia_nodes[0]] ********************************************************

    TASK [Gathering Facts] *********************************************************
    fatal: [overcloud-controller-0]: UNREACHABLE! => {"changed": false, "msg": "Failed to connect to the host via ssh: tripleo-admin@192.168.24.21: Permission denied (publickey,gssapi-keyex,gssapi-with-mic).", "unreachable": true}

    PLAY RECAP *********************************************************************
    overcloud-controller-0 : ok=0 changed=0 unreachable=1 failed=0 skipped=0 rescued=0 ignored=0
    undercloud-0 : ok=7 changed=3 unreachable=0 failed=0 skipped=6 rescued=0 ignored=0
  stdout_lines: <omitted>

Looks like it's probably missing something from here now:
https://opendev.org/openstack/tripleo-heat-templates/src/branch/master/deployment/octavia/octavia-deployment-config.j2.yaml#L334-L342

Brendan Shephard (bshephar) wrote :

So it's trying to run this as the playbook command:
ANSIBLE_CONFIG="/home/stack/config-download/overcloud/ansible.cfg" ansible-playbook -i "/home/stack/config-download/overcloud/octavia-ansible/inventory.yaml" --extra-vars @/home/stack/config-download/overcloud/octavia-ansible/group_vars/octavia_vars.yaml /usr/share/ansible/tripleo-playbooks/octavia-files.yaml

If I run that manually, it's failing to verify the SSL cert for the overcloud:
TASK [octavia_undercloud : upload pub key to overcloud] *************************************************************************************************************************************************************************************
fatal: [undercloud-0]: FAILED! => {"changed": true, "cmd": "openstack keypair show octavia-ssh-key || openstack keypair create --public-key /tmp/ansible.tj8091u6 octavia-ssh-key", "delta": "0:00:05.703551", "end": "2020-06-07 09:50:28.455648", "msg": "non-zero return code", "rc": 1, "start": "2020-06-07 09:50:22.752097", "stderr": "Failed to discover available identity versions when contacting https://172.20.10.25:13000/v3. Attempting to parse version from URL.\nSSL exception connecting to https://172.20.10.25:13000/v3/auth/tokens: HTTPSConnectionPool(host='172.20.10.25', port=13000): Max retries exceeded with url: /v3/auth/tokens (Caused by SSLError(SSLError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:897)'),))\nFailed to discover available identity versions when contacting https://172.20.10.25:13000/v3. Attempting to parse version from URL.\nSSL exception connecting to https://172.20.10.25:13000/v3/auth/tokens: HTTPSConnectionPool(host='172.20.10.25', port=13000): Max retries exceeded with url: /v3/auth/tokens (Caused by SSLError(SSLError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:897)'),))", "stderr_lines": ["Failed to discover available identity versions when contacting https://172.20.10.25:13000/v3. Attempting to parse version from URL.", "SSL exception connecting to https://172.20.10.25:13000/v3/auth/tokens: HTTPSConnectionPool(host='172.20.10.25', port=13000): Max retries exceeded with url: /v3/auth/tokens (Caused by SSLError(SSLError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:897)'),))", "Failed to discover available identity versions when contacting https://172.20.10.25:13000/v3. Attempting to parse version from URL.", "SSL exception connecting to https://172.20.10.25:13000/v3/auth/tokens: HTTPSConnectionPool(host='172.20.10.25', port=13000): Max retries exceeded with url: /v3/auth/tokens (Caused by SSLError(SSLError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:897)'),))"], "stdout": "", "stdout_lines": []}

Brendan Shephard (bshephar) wrote :

So that last issue is because we don't pass OS_CACERT in the "upload pub key to overcloud" task:

- name: upload pub key to overcloud
  shell: |-
    openstack keypair show {{ amp_ssh_key_name }} || \
      openstack keypair create --public-key {{ amp_ssh_key_path_final }} {{ amp_ssh_key_name }}
  environment:
    OS_USERNAME: "{{ auth_username }}"
    OS_PASSWORD: "{{ auth_password }}"
    OS_PROJECT_NAME: "{{ auth_project_name }}"

Adding it in like this makes the playbook work:
- name: upload pub key to overcloud
  shell: |-
    openstack keypair show {{ amp_ssh_key_name }} || \
      openstack keypair create --public-key {{ amp_ssh_key_path_final }} {{ amp_ssh_key_name }}
  environment:
    OS_USERNAME: "{{ auth_username }}"
    OS_PASSWORD: "{{ auth_password }}"
    OS_PROJECT_NAME: "{{ auth_project_name }}"
    OS_CACERT: "/home/stack/certs/overcloud-cacert.pem"

We might make that one a separate bug and work out what logic is required to add in the OS_CACERT env variable.
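One possible shape for that logic, as a sketch only: build the environment as a dictionary and merge in OS_CACERT only when a CA path is provided. The variable overcloud_cacert_path is hypothetical (nothing in the role is known to define it), and only the three environment entries quoted above are reproduced here.

```yaml
# Sketch under assumptions; overcloud_cacert_path is a hypothetical variable.
- name: upload pub key to overcloud
  shell: |-
    openstack keypair show {{ amp_ssh_key_name }} || \
      openstack keypair create --public-key {{ amp_ssh_key_path_final }} {{ amp_ssh_key_name }}
  vars:
    # Entries that are always passed, as in the task quoted above.
    keypair_base_env:
      OS_USERNAME: "{{ auth_username }}"
      OS_PASSWORD: "{{ auth_password }}"
      OS_PROJECT_NAME: "{{ auth_project_name }}"
    # Empty dict unless a CA bundle path has been supplied.
    keypair_cacert_env: >-
      {{ {'OS_CACERT': overcloud_cacert_path}
         if (overcloud_cacert_path | default(''))
         else {} }}
  environment: "{{ keypair_base_env | combine(keypair_cacert_env) }}"
```

With this shape, deployments without TLS on the public endpoints leave OS_CACERT unset instead of exporting an empty value, and the hard-coded /home/stack/certs/overcloud-cacert.pem path is avoided.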

OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-heat-templates (master)

Fix proposed to branch: master
Review: https://review.opendev.org/733974

Changed in tripleo:
assignee: nobody → Brendan Shephard (bshephar)
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: master
Review: https://review.opendev.org/733975

OpenStack Infra (hudson-openstack) wrote : Change abandoned on tripleo-heat-templates (master)

Change abandoned by Brendan Shephard (<email address hidden>) on branch: master
Review: https://review.opendev.org/733974
Reason: This patch seems to have missed some of the newer changes to the file. Abandoning this one and submitting a clean one.

OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-heat-templates (master)

Reviewed: https://review.opendev.org/733975
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=96e26ef81280d108d5c26ede1fd6b5a908ecd27e
Submitter: Zuul
Branch: master

commit 96e26ef81280d108d5c26ede1fd6b5a908ecd27e
Author: Brendan <email address hidden>
Date: Sun Jun 7 10:42:52 2020 +1000

    Elevated privs are required to access files in home directory

    Now that the config-download playbooks are located in the home
    directory. We (tripleo-admin) needs elevated privs on the tasks
    that require access to those file. This patch adds the required
    become: true statements to those tasks.

    Additionally, the current playbook looks for a specific ssh key,
    if that key doesn't exist, it fails to ssh to the overcloud nodes.
    To fix this, we can reference the ENV variable we're already
    setting during the config-download deployment.

    Change-Id: Ia97a5b0054bed697fc0390674fe4dba8317386a1
    Closes-Bug: #1881420

Changed in tripleo:
status: In Progress → Fix Released