Non root login prevented on overcloud machines

Bug #1873892 reported by Amol Kahat
This bug affects 2 people
Affects: tripleo
Status: Fix Released
Importance: Critical
Assigned to: Amol Kahat
Milestone: ussuri-rc1

Bug Description

Description
===========

Non-root login is prevented on the overcloud machines,
which causes the ansible playbook run to fail.

Actual Results
==============

2020-04-20 11:54:38 | 2020-04-20 11:54:34Z [overcloud.AllNodesDeploySteps.ControllerExtraConfigPost]: CREATE_IN_PROGRESS state changed
2020-04-20 11:55:21 | 2020-04-20 11:54:34Z [overcloud.AlWarning: Permanently added '192.168.24.12' (ECDSA) to the list of known hosts.
2020-04-20 11:55:21 | System is booting up. See pam_nologin(8)
2020-04-20 11:55:21 | Authentication failed.
2020-04-20 11:55:21 | Couldn't not import keys to one of [u'192.168.24.12', u'192.168.24.30', u'192.168.24.9']. Check if the user/ip are corrects.
2020-04-20 11:55:21 |
2020-04-20 11:55:22 | Waiting for messages on queue 'tripleo' with no timeout.
2020-04-20 12:00:24 | lNodesDeploySteps.ComputeExtraConfigPost]: CREATE_COMPLETE state changed
2020-04-20 12:00:24 | 2020-04-20 11:54:35Z [overcloud.AllNodesDeploySteps.ExternalDeployTasks]: CREATE_IN_PROGRESS state changed
2020-04-20 12:00:24 | 2020-04-20 11:54:35Z [overcloud.AllNodesDeploySteps.ExternalDeployTasks]: CREATE_COMPLETE state changed
2020-04-20 12:00:24 | 2020-04-20 11:54:36Z [overcloud.AllNodesDeploySteps.ControllerExtraConfigPost]: CREATE_COMPLETE state changed
2020-04-20 12:00:24 | 2020-04-20 11:54:36Z [overcloud.AllNodesDeploySteps.ExternalUpgradeTasks]: CREATE_IN_PROGRESS state changed
2020-04-20 12:00:24 | 2020-04-20 11:54:36Z [overcloud.AllNodesDeploySteps.ComputePostConfig]: CREATE_IN_PROGRESS state changed
2020-04-20 12:00:24 | 2020-04-20 11:54:36Z [overcloud.AllNodesDeploySteps.ControllerPostConfig]: CREATE_IN_PROGRESS state changed
2020-04-20 12:00:24 | 2020-04-20 11:54:36Z [overcloud.AllNodesDeploySteps.ExternalUpgradeTasks]: CREATE_COMPLETE state changed
2020-04-20 12:00:24 | 2020-04-20 11:54:36Z [overcloud.AllNodesDeploySteps.ComputePostConfig]: CREATE_COMPLETE state changed
2020-04-20 12:00:24 | 2020-04-20 11:54:36Z [overcloud.AllNodesDeploySteps.ControllerPostConfig]: CREATE_COMPLETE state changed
2020-04-20 12:00:24 | 2020-04-20 11:54:37Z [overcloud.AllNodesDeploySteps.ExternalUpdateTasks]: CREATE_IN_PROGRESS state changed
2020-04-20 12:00:24 | 2020-04-20 11:54:37Z [overcloud.AllNodesDeploySteps.ExternalUpdateTasks]: CREATE_COMPLETE state changed
2020-04-20 12:00:24 | 2020-04-20 11:54:38Z [overcloud.AllNodesDeploySteps.ScaleTasks]: CREATE_IN_PROGRESS state changed
2020-04-20 12:00:24 | 2020-04-20 11:54:38Z [overcloud.AllNodesDeploySteps.ScaleTasks]: CREATE_COMPLETE state changed
2020-04-20 12:00:24 | 2020-04-20 11:54:39Z [overcloud.AllNodesDeploySteps.BootstrapServerId]: CREATE_IN_PROGRESS state changed
2020-04-20 12:00:24 | 2020-04-20 11:54:39Z [overcloud.AllNodesDeploySteps.BootstrapServerId]: CREATE_COMPLETE state changed
2020-04-20 12:00:24 | 2020-04-20 11:54:40Z [overcloud.AllNodesDeploySteps.ExternalPostDeployTasks]: CREATE_IN_PROGRESS state changed
2020-04-20 12:00:24 | 2020-04-20 11:54:40Z [overcloud.AllNodesDeploySteps.ExternalPostDeployTasks]: CREATE_COMPLETE state changed
2020-04-20 12:00:24 | 2020-04-20 11:54:40Z [overcloud.AllNodesDeploySteps]: CREATE_COMPLETE Stack CREATE completed successfully
2020-04-20 12:00:24 | 2020-04-20 11:54:40Z [overcloud.AllNodesDeploySteps]: CREATE_COMPLETE state changed
2020-04-20 12:00:24 | 2020-04-20 11:54:41Z [overcloud]: CREATE_COMPLETE Stack CREATE completed successfully
2020-04-20 12:00:24 |
2020-04-20 12:00:24 | Stack overcloud/de9dc7cd-648a-49e9-aa8e-0e36c29dc9a5 CREATE_COMPLETE
2020-04-20 12:00:24 |
2020-04-20 12:00:24 | Deploying overcloud configuration
2020-04-20 12:00:24 | Enabling ssh admin (tripleo-admin) for hosts:
2020-04-20 12:00:24 | 192.168.24.12 192.168.24.30 192.168.24.9
2020-04-20 12:00:24 | Using ssh user heat-admin for initial connection.
2020-04-20 12:00:24 | Using ssh key at /home/zuul/.ssh/id_rsa for initial connection.
2020-04-20 12:00:24 | Inserting TripleO short term key for 192.168.24.12
2020-04-20 12:00:24 | Removing short term keys locally
2020-04-20 12:00:24 | Config downloaded at /var/lib/mistral/overcloud
2020-04-20 12:00:24 | Inventory generated at /var/lib/mistral/overcloud/tripleo-ansible-inventory.yaml
2020-04-20 12:00:24 | Running ansible playbook at /var/lib/mistral/overcloud/deploy_steps_playbook.yaml. See log file at /var/lib/mistral/overcloud/ansible.log for progress. ...
2020-04-20 12:00:24 |
2020-04-20 12:00:24 | Using /var/lib/mistral/overcloud/ansible.cfg as config file
2020-04-20 12:00:24 |
2020-04-20 12:00:24 | PLAY [Gather facts from undercloud] ********************************************
2020-04-20 12:00:24 |
2020-04-20 12:00:24 | PLAY [Gather facts from overcloud] *********************************************
2020-04-20 12:00:24 |
2020-04-20 12:00:24 | TASK [Gathering Facts] *********************************************************
2020-04-20 12:00:24 | Monday 20 April 2020 11:56:27 +0000 (0:00:00.229) 0:00:00.229 **********
2020-04-20 12:00:24 | [WARNING]: Unhandled error in Python interpreter discovery for host overcloud-
2020-04-20 12:00:24 |
2020-04-20 12:00:24 | controller-0: Failed to connect to the host via ssh: Warning: Permanently added
2020-04-20 12:00:24 | '192.168.24.12' (ECDSA) to the list of known hosts. Permission denied
2020-04-20 12:00:24 | (publickey,gssapi-keyex,gssapi-with-mic).
2020-04-20 12:00:24 | [WARNING]: Unhandled error in Python interpreter discovery for host overcloud-
2020-04-20 12:00:24 | novacompute-1: Failed to connect to the host via ssh: Warning: Permanently
2020-04-20 12:00:24 | added '192.168.24.9' (ECDSA) to the list of known hosts. Permission denied
2020-04-20 12:00:24 | (publickey,gssapi-keyex,gssapi-with-mic).
2020-04-20 12:00:24 | [WARNING]: Unhandled error in Python interpreter discovery for host overcloud-
2020-04-20 12:00:24 | novacompute-0: Failed to connect to the host via ssh: Warning: Permanently
2020-04-20 12:00:24 | added '192.168.24.30' (ECDSA) to the list of known hosts. Permission denied
2020-04-20 12:00:24 | (publickey,gssapi-keyex,gssapi-with-mic).
2020-04-20 12:00:24 | fatal: [overcloud-controller-0]: UNREACHABLE! => {
2020-04-20 12:00:24 |
2020-04-20 12:00:27 | "chOvercloud configuration failed.
2020-04-20 12:00:27 | anged": false,
2020-04-20 12:00:27 | "unreachable": true
2020-04-20 12:00:27 | }
2020-04-20 12:00:27 |
2020-04-20 12:00:27 | MSG:
2020-04-20 12:00:27 |
2020-04-20 12:00:27 | Data could not be sent to remote host "192.168.24.12". Make sure this host can be reached over ssh: Warning: Permanently added '192.168.24.12' (ECDSA) to the list of known hosts.
2020-04-20 12:00:27 | Permission denied (publickey,gssapi-keyex,gssapi-with-mic).
2020-04-20 12:00:27 |
2020-04-20 12:00:27 | fatal: [overcloud-novacompute-0]: UNREACHABLE! => {
2020-04-20 12:00:27 | "changed": false,
2020-04-20 12:00:27 | "unreachable": true
2020-04-20 12:00:27 | }
2020-04-20 12:00:27 |
2020-04-20 12:00:27 | MSG:
2020-04-20 12:00:27 |
2020-04-20 12:00:27 | Data could not be sent to remote host "192.168.24.30". Make sure this host can be reached over ssh: Warning: Permanently added '192.168.24.30' (ECDSA) to the list of known hosts.
2020-04-20 12:00:27 | Permission denied (publickey,gssapi-keyex,gssapi-with-mic).
2020-04-20 12:00:27 |
2020-04-20 12:00:27 | fatal: [overcloud-novacompute-1]: UNREACHABLE! => {
2020-04-20 12:00:27 | "changed": false,
2020-04-20 12:00:27 | "unreachable": true
2020-04-20 12:00:27 | }
2020-04-20 12:00:27 |
2020-04-20 12:00:27 | MSG:
2020-04-20 12:00:27 |
2020-04-20 12:00:27 | Data could not be sent to remote host "192.168.24.9". Make sure this host can be reached over ssh: Warning: Permanently added '192.168.24.9' (ECDSA) to the list of known hosts.
2020-04-20 12:00:27 | Permission denied (publickey,gssapi-keyex,gssapi-with-mic).
2020-04-20 12:00:27 |
2020-04-20 12:00:27 |
2020-04-20 12:00:27 | NO MORE HOSTS LEFT *************************************************************
2020-04-20 12:00:27 |
2020-04-20 12:00:27 | PLAY RECAP *********************************************************************
2020-04-20 12:00:27 | overcloud-controller-0 : ok=0 changed=0 unreachable=1 failed=0 skipped=0 rescued=0 ignored=0
2020-04-20 12:00:27 | overcloud-novacompute-0 : ok=0 changed=0 unreachable=1 failed=0 skipped=0 rescued=0 ignored=0
2020-04-20 12:00:27 | overcloud-novacompute-1 : ok=0 changed=0 unreachable=1 failed=0 skipped=0 rescued=0 ignored=0
2020-04-20 12:00:27 |
2020-04-20 12:00:27 | Monday 20 April 2020 12:00:22 +0000 (0:03:54.925) 0:03:55.154 **********
2020-04-20 12:00:27 | ===============================================================================

Build link: https://review.rdoproject.org/zuul/builds?job_name=periodic-tripleo-ci-centos-7-ovb-1ctlr_2comp-featureset021-train

Cédric Jeanneret (cjeanner) wrote :

It seems the undercloud wants to use "tripleo-admin", but that user is unknown on the overcloud nodes:
Apr 22 00:52:15 overcloud-controller-0 sshd[9121]: input_userauth_request: invalid user tripleo-admin [preauth]
Apr 22 00:52:15 overcloud-controller-0 sshd[9121]: Connection closed by 192.168.24.1 port 39806 [preauth]

Fun fact, right after those failures, we have:
Apr 22 00:53:04 overcloud-controller-0 sshd[9124]: Accepted publickey for heat-admin from 192.168.24.1 port 41042 ssh2: RSA SHA256:fRwfBVrOBcn9OpHTu4Z1lhfJTfHtP0uw/12WLiTpmmM
Apr 22 00:53:04 overcloud-controller-0 sshd[9124]: pam_unix(sshd:session): session opened for user heat-admin by (uid=0)
Apr 22 00:53:04 overcloud-controller-0 sshd[9124]: pam_unix(sshd:session): session closed for user heat-admin

But on the undercloud, the timeout happens at 00:52. My guess is that the heat-admin connection at 00:53 is Zuul/CI collecting logs and other artifacts.

Amol Kahat (amolkahat) wrote :

According to the logs (/var/log/secure), the undercloud is trying to connect as tripleo-admin instead of heat-admin.

Harald Jensås (harald-jensas) wrote :

2020-04-22 00:52:16 | Deploying overcloud configuration
2020-04-22 00:52:16 | Enabling ssh admin (tripleo-admin) for hosts:
2020-04-22 00:52:16 | 192.168.24.27 192.168.24.14 192.168.24.26
2020-04-22 00:52:16 | Using ssh user heat-admin for initial connection.
2020-04-22 00:52:16 | Using ssh key at /home/zuul/.ssh/id_rsa for initial connection.
2020-04-22 00:52:16 | Inserting TripleO short term key for 192.168.24.27
2020-04-22 00:52:16 | Removing short term keys locally
2020-04-22 00:52:16 | Config downloaded at /var/lib/mistral/overcloud
2020-04-22 00:52:16 | Inventory generated at /var/lib/mistral/overcloud/tripleo-ansible-inventory.yaml
2020-04-22 00:52:16 | Running ansible playbook at /var/lib/mistral/overcloud/deploy_steps_playbook.yaml. See log file at /var/lib/mistral/overcloud/ansible.log for progress. ...

Why is it only "Inserting TripleO short term key for 192.168.24.27"?

Don't we expect that to happen for all the hosts?
https://opendev.org/openstack/python-tripleoclient/src/branch/stable/train/tripleoclient/workflows/deployment.py#L253-L263

Could the big try: ... finally: block there be masking errors?
https://opendev.org/openstack/python-tripleoclient/src/branch/stable/train/tripleoclient/workflows/deployment.py#L234-L314

wes hayutin (weshayutin)
Changed in tripleo:
status: New → Triaged
importance: Undecided → High
assignee: nobody → amolkahat (amolkahat)
milestone: none → ussuri-rc3
OpenStack Infra (hudson-openstack) wrote : Fix proposed to python-tripleoclient (stable/train)

Fix proposed to branch: stable/train
Review: https://review.opendev.org/723343

OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: stable/train
Review: https://review.opendev.org/723824

OpenStack Infra (hudson-openstack) wrote : Fix merged to python-tripleoclient (stable/train)

Reviewed: https://review.opendev.org/723343
Committed: https://git.openstack.org/cgit/openstack/python-tripleoclient/commit/?id=5627d8d9cce2ea8f803cccd46e9eeede5cae5e9a
Submitter: Zuul
Branch: stable/train

commit 5627d8d9cce2ea8f803cccd46e9eeede5cae5e9a
Author: Rabi Mishra <email address hidden>
Date: Mon Apr 27 14:12:14 2020 +0530

    [stable-only] Raise error for temp_ssh_key import failure

    We should not be ignoring the error as we would skip the
    admin-enablement workflow and running ansible playbooks would
    fail later.

    Change-Id: I69c79b3bfec4467210ca6c480f61e14cbf1d0a44
    Partial-Bug: #1873892
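
The idea in this commit message can be illustrated with a minimal sketch. It is not the code from python-tripleoclient: `KeyImportError`, `enable_ssh_admin`, `insert_key` and `run_admin_workflow` are hypothetical names chosen for illustration. The point is only that a failed key import raises immediately instead of being logged and ignored, so the deployment stops before the ansible run hits "Permission denied".

```python
class KeyImportError(Exception):
    """Raised when the temporary SSH key cannot be installed on a host."""


def enable_ssh_admin(hosts, insert_key, run_admin_workflow):
    """Insert the temporary key on every host, then run the admin workflow.

    insert_key(host) is a hypothetical callable that returns True on success.
    run_admin_workflow(hosts) stands in for the admin-enablement step.
    """
    failed = [host for host in hosts if not insert_key(host)]
    if failed:
        # Fail fast: if the key is missing, the admin user is never set up
        # and any later ansible run against these hosts cannot log in.
        raise KeyImportError(
            "Could not import keys to: %s" % ", ".join(failed))
    run_admin_workflow(hosts)
```

With this shape, a caller sees the key import failure at the point where it happens rather than as UNREACHABLE hosts several minutes later.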

tags: added: in-stable-train
tags: added: promotion-blocker
tags: added: alert
Changed in tripleo:
importance: High → Critical
milestone: ussuri-rc3 → ussuri-rc1
OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.opendev.org/723824
Committed: https://git.openstack.org/cgit/openstack/python-tripleoclient/commit/?id=da9dc6eb25d4edb2f3b5d4929d429399187fd663
Submitter: Zuul
Branch: stable/train

commit da9dc6eb25d4edb2f3b5d4929d429399187fd663
Author: Rabi Mishra <email address hidden>
Date: Tue Apr 28 12:45:34 2020 +0530

    [stable-only] Add retry for inserting temp_ssh_key

    For slow nodes, we don't wait for node to boot completely after the
    heat stack is CREATE/UPDATE_COMPLETE. This would ensure we try a
    few times before failing.

    Adds tenacity to requirements and bumps lower-constraints.

    Change-Id: Iee8f3200a3c108375c7ca296734db1a51914cd69
    Closes-Bug: #1873892
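
The retry behaviour described above can be sketched as follows. The commit itself adds the tenacity library; this sketch deliberately uses only the standard library and does not reproduce the merged change. The function name `retry` and its parameters (`attempts`, `delay`, `sleep`) are assumptions made for illustration.

```python
import time


def retry(func, attempts=5, delay=1.0, sleep=time.sleep):
    """Call func() until it succeeds or the attempts run out.

    Re-raises the last exception if every attempt fails, so the
    caller still sees the error instead of silently continuing.
    """
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except Exception:
            if attempt == attempts:
                raise
            # Give a slow node time to finish booting before trying again.
            sleep(delay)
```

A caller would wrap the key insertion for a single host in `func`. A node that still answers "System is booting up" on the first tries gets a few more chances before the deployment is failed.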

wes hayutin (weshayutin)
Changed in tripleo:
status: Triaged → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/python-tripleoclient 12.4.0

This issue was fixed in the openstack/python-tripleoclient 12.4.0 release.
