Non root login prevented on overcloud machines

Bug #1873892 reported by Amol Kahat
This bug affects 2 people
Affects | Status | Importance | Assigned to | Milestone
tripleo | Fix Released | Critical | Amol Kahat |

Bug Description

Description
===========

Non-root login is refused on the overcloud machines while they are
still booting (see the pam_nologin message in the log below), so the
Ansible playbook run fails.

Actual Results
==============

2020-04-20 11:54:38 | 2020-04-20 11:54:34Z [overcloud.AllNodesDeploySteps.ControllerExtraConfigPost]: CREATE_IN_PROGRESS state changed
2020-04-20 11:55:21 | 2020-04-20 11:54:34Z [overcloud.AlWarning: Permanently added '192.168.24.12' (ECDSA) to the list of known hosts.
2020-04-20 11:55:21 | System is booting up. See pam_nologin(8)
2020-04-20 11:55:21 | Authentication failed.
2020-04-20 11:55:21 | Couldn't not import keys to one of [u'192.168.24.12', u'192.168.24.30', u'192.168.24.9']. Check if the user/ip are corrects.
2020-04-20 11:55:21 |
2020-04-20 11:55:22 | Waiting for messages on queue 'tripleo' with no timeout.
2020-04-20 12:00:24 | lNodesDeploySteps.ComputeExtraConfigPost]: CREATE_COMPLETE state changed
2020-04-20 12:00:24 | 2020-04-20 11:54:35Z [overcloud.AllNodesDeploySteps.ExternalDeployTasks]: CREATE_IN_PROGRESS state changed
2020-04-20 12:00:24 | 2020-04-20 11:54:35Z [overcloud.AllNodesDeploySteps.ExternalDeployTasks]: CREATE_COMPLETE state changed
2020-04-20 12:00:24 | 2020-04-20 11:54:36Z [overcloud.AllNodesDeploySteps.ControllerExtraConfigPost]: CREATE_COMPLETE state changed
2020-04-20 12:00:24 | 2020-04-20 11:54:36Z [overcloud.AllNodesDeploySteps.ExternalUpgradeTasks]: CREATE_IN_PROGRESS state changed
2020-04-20 12:00:24 | 2020-04-20 11:54:36Z [overcloud.AllNodesDeploySteps.ComputePostConfig]: CREATE_IN_PROGRESS state changed
2020-04-20 12:00:24 | 2020-04-20 11:54:36Z [overcloud.AllNodesDeploySteps.ControllerPostConfig]: CREATE_IN_PROGRESS state changed
2020-04-20 12:00:24 | 2020-04-20 11:54:36Z [overcloud.AllNodesDeploySteps.ExternalUpgradeTasks]: CREATE_COMPLETE state changed
2020-04-20 12:00:24 | 2020-04-20 11:54:36Z [overcloud.AllNodesDeploySteps.ComputePostConfig]: CREATE_COMPLETE state changed
2020-04-20 12:00:24 | 2020-04-20 11:54:36Z [overcloud.AllNodesDeploySteps.ControllerPostConfig]: CREATE_COMPLETE state changed
2020-04-20 12:00:24 | 2020-04-20 11:54:37Z [overcloud.AllNodesDeploySteps.ExternalUpdateTasks]: CREATE_IN_PROGRESS state changed
2020-04-20 12:00:24 | 2020-04-20 11:54:37Z [overcloud.AllNodesDeploySteps.ExternalUpdateTasks]: CREATE_COMPLETE state changed
2020-04-20 12:00:24 | 2020-04-20 11:54:38Z [overcloud.AllNodesDeploySteps.ScaleTasks]: CREATE_IN_PROGRESS state changed
2020-04-20 12:00:24 | 2020-04-20 11:54:38Z [overcloud.AllNodesDeploySteps.ScaleTasks]: CREATE_COMPLETE state changed
2020-04-20 12:00:24 | 2020-04-20 11:54:39Z [overcloud.AllNodesDeploySteps.BootstrapServerId]: CREATE_IN_PROGRESS state changed
2020-04-20 12:00:24 | 2020-04-20 11:54:39Z [overcloud.AllNodesDeploySteps.BootstrapServerId]: CREATE_COMPLETE state changed
2020-04-20 12:00:24 | 2020-04-20 11:54:40Z [overcloud.AllNodesDeploySteps.ExternalPostDeployTasks]: CREATE_IN_PROGRESS state changed
2020-04-20 12:00:24 | 2020-04-20 11:54:40Z [overcloud.AllNodesDeploySteps.ExternalPostDeployTasks]: CREATE_COMPLETE state changed
2020-04-20 12:00:24 | 2020-04-20 11:54:40Z [overcloud.AllNodesDeploySteps]: CREATE_COMPLETE Stack CREATE completed successfully
2020-04-20 12:00:24 | 2020-04-20 11:54:40Z [overcloud.AllNodesDeploySteps]: CREATE_COMPLETE state changed
2020-04-20 12:00:24 | 2020-04-20 11:54:41Z [overcloud]: CREATE_COMPLETE Stack CREATE completed successfully
2020-04-20 12:00:24 |
2020-04-20 12:00:24 | Stack overcloud/de9dc7cd-648a-49e9-aa8e-0e36c29dc9a5 CREATE_COMPLETE
2020-04-20 12:00:24 |
2020-04-20 12:00:24 | Deploying overcloud configuration
2020-04-20 12:00:24 | Enabling ssh admin (tripleo-admin) for hosts:
2020-04-20 12:00:24 | 192.168.24.12 192.168.24.30 192.168.24.9
2020-04-20 12:00:24 | Using ssh user heat-admin for initial connection.
2020-04-20 12:00:24 | Using ssh key at /home/zuul/.ssh/id_rsa for initial connection.
2020-04-20 12:00:24 | Inserting TripleO short term key for 192.168.24.12
2020-04-20 12:00:24 | Removing short term keys locally
2020-04-20 12:00:24 | Config downloaded at /var/lib/mistral/overcloud
2020-04-20 12:00:24 | Inventory generated at /var/lib/mistral/overcloud/tripleo-ansible-inventory.yaml
2020-04-20 12:00:24 | Running ansible playbook at /var/lib/mistral/overcloud/deploy_steps_playbook.yaml. See log file at /var/lib/mistral/overcloud/ansible.log for progress. ...
2020-04-20 12:00:24 |
2020-04-20 12:00:24 | Using /var/lib/mistral/overcloud/ansible.cfg as config file
2020-04-20 12:00:24 |
2020-04-20 12:00:24 | PLAY [Gather facts from undercloud] ********************************************
2020-04-20 12:00:24 |
2020-04-20 12:00:24 | PLAY [Gather facts from overcloud] *********************************************
2020-04-20 12:00:24 |
2020-04-20 12:00:24 | TASK [Gathering Facts] *********************************************************
2020-04-20 12:00:24 | Monday 20 April 2020 11:56:27 +0000 (0:00:00.229) 0:00:00.229 **********
2020-04-20 12:00:24 | [WARNING]: Unhandled error in Python interpreter discovery for host overcloud-
2020-04-20 12:00:24 |
2020-04-20 12:00:24 | controller-0: Failed to connect to the host via ssh: Warning: Permanently added
2020-04-20 12:00:24 | '192.168.24.12' (ECDSA) to the list of known hosts. Permission denied
2020-04-20 12:00:24 | (publickey,gssapi-keyex,gssapi-with-mic).
2020-04-20 12:00:24 | [WARNING]: Unhandled error in Python interpreter discovery for host overcloud-
2020-04-20 12:00:24 | novacompute-1: Failed to connect to the host via ssh: Warning: Permanently
2020-04-20 12:00:24 | added '192.168.24.9' (ECDSA) to the list of known hosts. Permission denied
2020-04-20 12:00:24 | (publickey,gssapi-keyex,gssapi-with-mic).
2020-04-20 12:00:24 | [WARNING]: Unhandled error in Python interpreter discovery for host overcloud-
2020-04-20 12:00:24 | novacompute-0: Failed to connect to the host via ssh: Warning: Permanently
2020-04-20 12:00:24 | added '192.168.24.30' (ECDSA) to the list of known hosts. Permission denied
2020-04-20 12:00:24 | (publickey,gssapi-keyex,gssapi-with-mic).
2020-04-20 12:00:24 | fatal: [overcloud-controller-0]: UNREACHABLE! => {
2020-04-20 12:00:24 |
2020-04-20 12:00:27 | "chOvercloud configuration failed.
2020-04-20 12:00:27 | anged": false,
2020-04-20 12:00:27 | "unreachable": true
2020-04-20 12:00:27 | }
2020-04-20 12:00:27 |
2020-04-20 12:00:27 | MSG:
2020-04-20 12:00:27 |
2020-04-20 12:00:27 | Data could not be sent to remote host "192.168.24.12". Make sure this host can be reached over ssh: Warning: Permanently added '192.168.24.12' (ECDSA) to the list of known hosts.
2020-04-20 12:00:27 | Permission denied (publickey,gssapi-keyex,gssapi-with-mic).
2020-04-20 12:00:27 |
2020-04-20 12:00:27 | fatal: [overcloud-novacompute-0]: UNREACHABLE! => {
2020-04-20 12:00:27 | "changed": false,
2020-04-20 12:00:27 | "unreachable": true
2020-04-20 12:00:27 | }
2020-04-20 12:00:27 |
2020-04-20 12:00:27 | MSG:
2020-04-20 12:00:27 |
2020-04-20 12:00:27 | Data could not be sent to remote host "192.168.24.30". Make sure this host can be reached over ssh: Warning: Permanently added '192.168.24.30' (ECDSA) to the list of known hosts.
2020-04-20 12:00:27 | Permission denied (publickey,gssapi-keyex,gssapi-with-mic).
2020-04-20 12:00:27 |
2020-04-20 12:00:27 | fatal: [overcloud-novacompute-1]: UNREACHABLE! => {
2020-04-20 12:00:27 | "changed": false,
2020-04-20 12:00:27 | "unreachable": true
2020-04-20 12:00:27 | }
2020-04-20 12:00:27 |
2020-04-20 12:00:27 | MSG:
2020-04-20 12:00:27 |
2020-04-20 12:00:27 | Data could not be sent to remote host "192.168.24.9". Make sure this host can be reached over ssh: Warning: Permanently added '192.168.24.9' (ECDSA) to the list of known hosts.
2020-04-20 12:00:27 | Permission denied (publickey,gssapi-keyex,gssapi-with-mic).
2020-04-20 12:00:27 |
2020-04-20 12:00:27 |
2020-04-20 12:00:27 | NO MORE HOSTS LEFT *************************************************************
2020-04-20 12:00:27 |
2020-04-20 12:00:27 | PLAY RECAP *********************************************************************
2020-04-20 12:00:27 | overcloud-controller-0 : ok=0 changed=0 unreachable=1 failed=0 skipped=0 rescued=0 ignored=0
2020-04-20 12:00:27 | overcloud-novacompute-0 : ok=0 changed=0 unreachable=1 failed=0 skipped=0 rescued=0 ignored=0
2020-04-20 12:00:27 | overcloud-novacompute-1 : ok=0 changed=0 unreachable=1 failed=0 skipped=0 rescued=0 ignored=0
2020-04-20 12:00:27 |
2020-04-20 12:00:27 | Monday 20 April 2020 12:00:22 +0000 (0:03:54.925) 0:03:55.154 **********
2020-04-20 12:00:27 | ===============================================================================

Build link: https://review.rdoproject.org/zuul/builds?job_name=periodic-tripleo-ci-centos-7-ovb-1ctlr_2comp-featureset021-train

Revision history for this message
Cédric Jeanneret (cjeanner) wrote :

It seems the undercloud wants to connect as "tripleo-admin", but that user is unknown on the overcloud nodes:
Apr 22 00:52:15 overcloud-controller-0 sshd[9121]: input_userauth_request: invalid user tripleo-admin [preauth]
Apr 22 00:52:15 overcloud-controller-0 sshd[9121]: Connection closed by 192.168.24.1 port 39806 [preauth]

Fun fact, right after those failures, we have:
Apr 22 00:53:04 overcloud-controller-0 sshd[9124]: Accepted publickey for heat-admin from 192.168.24.1 port 41042 ssh2: RSA SHA256:fRwfBVrOBcn9OpHTu4Z1lhfJTfHtP0uw/12WLiTpmmM
Apr 22 00:53:04 overcloud-controller-0 sshd[9124]: pam_unix(sshd:session): session opened for user heat-admin by (uid=0)
Apr 22 00:53:04 overcloud-controller-0 sshd[9124]: pam_unix(sshd:session): session closed for user heat-admin

But on the undercloud, the timeout occurs at 00:52. My guess is that the heat-admin connection is made by Zuul/CI to collect the logs and so on.
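
To make the comparison between rejected and accepted users easier to repeat, here is a minimal sketch that scans sshd lines such as the ones quoted above. The function name `summarize_sshd` and the two regular expressions are assumptions written for this illustration; they only cover the two message shapes shown in this comment, not every sshd log format.

```python
import re

# Patterns for the two kinds of sshd lines quoted in this comment.
INVALID_USER = re.compile(r"sshd\[\d+\]: input_userauth_request: invalid user (\S+)")
ACCEPTED = re.compile(r"sshd\[\d+\]: Accepted publickey for (\S+) from (\S+)")


def summarize_sshd(lines):
    """Return (rejected_users, accepted_users) found in sshd log lines."""
    rejected, accepted = set(), set()
    for line in lines:
        match = INVALID_USER.search(line)
        if match:
            rejected.add(match.group(1))
            continue
        match = ACCEPTED.search(line)
        if match:
            accepted.add(match.group(1))
    return rejected, accepted


# Two of the lines from /var/log/secure quoted above.
SAMPLE = [
    "Apr 22 00:52:15 overcloud-controller-0 sshd[9121]: "
    "input_userauth_request: invalid user tripleo-admin [preauth]",
    "Apr 22 00:53:04 overcloud-controller-0 sshd[9124]: "
    "Accepted publickey for heat-admin from 192.168.24.1 port 41042 ssh2: "
    "RSA SHA256:fRwfBVrOBcn9OpHTu4Z1lhfJTfHtP0uw/12WLiTpmmM",
]

if __name__ == "__main__":
    rejected, accepted = summarize_sshd(SAMPLE)
    print("rejected:", sorted(rejected))
    print("accepted:", sorted(accepted))
```

Feeding it the full /var/log/secure from a node would show at a glance whether tripleo-admin was ever created there.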

Revision history for this message
Amol Kahat (amolkahat) wrote :

Per the logs (/var/log/secure), the undercloud is trying to connect as tripleo-admin instead of heat-admin.

Revision history for this message
Harald Jensås (harald-jensas) wrote :

2020-04-22 00:52:16 | Deploying overcloud configuration
2020-04-22 00:52:16 | Enabling ssh admin (tripleo-admin) for hosts:
2020-04-22 00:52:16 | 192.168.24.27 192.168.24.14 192.168.24.26
2020-04-22 00:52:16 | Using ssh user heat-admin for initial connection.
2020-04-22 00:52:16 | Using ssh key at /home/zuul/.ssh/id_rsa for initial connection.
2020-04-22 00:52:16 | Inserting TripleO short term key for 192.168.24.27
2020-04-22 00:52:16 | Removing short term keys locally
2020-04-22 00:52:16 | Config downloaded at /var/lib/mistral/overcloud
2020-04-22 00:52:16 | Inventory generated at /var/lib/mistral/overcloud/tripleo-ansible-inventory.yaml
2020-04-22 00:52:16 | Running ansible playbook at /var/lib/mistral/overcloud/deploy_steps_playbook.yaml. See log file at /var/lib/mistral/overcloud/ansible.log for progress. ...

Why does it only log "Inserting TripleO short term key for 192.168.24.27"?

Don't we expect that to happen for all the hosts?
https://opendev.org/openstack/python-tripleoclient/src/branch/stable/train/tripleoclient/workflows/deployment.py#L253-L263

There is a big try: ... finally: block there. Could it be masking errors?
https://opendev.org/openstack/python-tripleoclient/src/branch/stable/train/tripleoclient/workflows/deployment.py#L234-L314
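
As a general illustration of the question above (not the actual tripleoclient code), the sketch below shows that a try/finally on its own lets an exception propagate, while an inner except that only records a message lets the caller carry on as if nothing had failed. Both function names and the error text are made up for this example.

```python
def cleanup_only():
    """finally runs the cleanup, but the exception still reaches the caller."""
    try:
        raise RuntimeError("key import failed")
    finally:
        pass  # cleanup would go here, e.g. removing temporary keys


def logs_and_continues(log):
    """An inner except that only logs hides the failure from the caller."""
    try:
        try:
            raise RuntimeError("key import failed")
        except RuntimeError as exc:
            # The error is recorded but not re-raised.
            log.append("Could not import keys: %s" % exc)
        return "continued"
    finally:
        log.append("cleanup")


if __name__ == "__main__":
    messages = []
    print(logs_and_continues(messages), messages)
    try:
        cleanup_only()
    except RuntimeError as exc:
        print("propagated:", exc)
```

If the key insertion step follows the second pattern, the deployment would go on to run the playbook even though no key was installed, which matches the symptoms in the log.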

wes hayutin (weshayutin)
Changed in tripleo:
status: New → Triaged
importance: Undecided → High
assignee: nobody → amolkahat (amolkahat)
milestone: none → ussuri-rc3
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to python-tripleoclient (stable/train)

Fix proposed to branch: stable/train
Review: https://review.opendev.org/723343

Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Fix proposed to branch: stable/train
Review: https://review.opendev.org/723824

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to python-tripleoclient (stable/train)

Reviewed: https://review.opendev.org/723343
Committed: https://git.openstack.org/cgit/openstack/python-tripleoclient/commit/?id=5627d8d9cce2ea8f803cccd46e9eeede5cae5e9a
Submitter: Zuul
Branch: stable/train

commit 5627d8d9cce2ea8f803cccd46e9eeede5cae5e9a
Author: Rabi Mishra <email address hidden>
Date: Mon Apr 27 14:12:14 2020 +0530

    [stable-only] Raise error for temp_ssh_key import failure

    We should not be ignoring the error as we would skip the
    admin-enablement workflow and running ansible playbooks would
    fail later.

    Change-Id: I69c79b3bfec4467210ca6c480f61e14cbf1d0a44
    Partial-Bug: #1873892
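
The idea in this commit message can be sketched as follows. This is a hypothetical illustration, not the merged patch: `KeyImportError`, `import_temp_key`, and the injectable `command` argument are names invented here so the example runs without a real ssh target.

```python
import subprocess
import sys


class KeyImportError(Exception):
    """Hypothetical exception type used only in this sketch."""


def import_temp_key(host, command):
    """Run `command` for `host` and raise if it exits non-zero.

    Raising here stops the deployment before the admin-enablement
    workflow is skipped and the playbook fails later.
    """
    try:
        subprocess.check_call(
            command, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
    except subprocess.CalledProcessError as exc:
        raise KeyImportError(
            "Could not import temporary key to %s (exit code %d)"
            % (host, exc.returncode)
        ) from exc


if __name__ == "__main__":
    # Stand-in for an ssh invocation that fails with exit code 255.
    failing = [sys.executable, "-c", "import sys; sys.exit(255)"]
    try:
        import_temp_key("192.168.24.12", failing)
    except KeyImportError as exc:
        print(exc)
```

In a real deployment the command would be the ssh call that appends the short term key on the node; the sketch only shows the error being surfaced instead of printed and ignored.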

tags: added: in-stable-train
tags: added: promotion-blocker
tags: added: alert
Changed in tripleo:
importance: High → Critical
milestone: ussuri-rc3 → ussuri-rc1
Revision history for this message
OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.opendev.org/723824
Committed: https://git.openstack.org/cgit/openstack/python-tripleoclient/commit/?id=da9dc6eb25d4edb2f3b5d4929d429399187fd663
Submitter: Zuul
Branch: stable/train

commit da9dc6eb25d4edb2f3b5d4929d429399187fd663
Author: Rabi Mishra <email address hidden>
Date: Tue Apr 28 12:45:34 2020 +0530

    [stable-only] Add retry for inserting temp_ssh_key

    For slow nodes, we don't wait for node to boot completely after the
    heat stack is CREATE/UPDATE_COMPLETE. This would ensure we try a
    few times before failing.

    Adds tenacity to requirements and bumps lower-constraints.

    Change-Id: Iee8f3200a3c108375c7ca296734db1a51914cd69
    Closes-Bug: #1873892
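
A rough sketch of the retry behaviour described in this commit message follows. The commit says it adds tenacity to the requirements; to stay dependency-free, this example swaps in a plain loop instead, so it is not the merged code. The helper name `retry`, the default of 5 attempts, and the 10 second delay are assumptions chosen for illustration.

```python
import time


def retry(func, attempts=5, delay=10.0, sleep=time.sleep):
    """Call func() until it succeeds or `attempts` tries are used up."""
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except Exception:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            sleep(delay)  # give a slow node more time to finish booting


if __name__ == "__main__":
    state = {"calls": 0}

    def insert_key():
        # Simulates a node that refuses logins on the first two tries.
        state["calls"] += 1
        if state["calls"] < 3:
            raise RuntimeError("System is booting up. See pam_nologin(8)")
        return "key inserted"

    print(retry(insert_key, delay=0.0), "after", state["calls"], "calls")
```

Re-raising after the last attempt keeps the behaviour from the earlier fix: a node that never becomes reachable still fails the deployment with a clear error.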

wes hayutin (weshayutin)
Changed in tripleo:
status: Triaged → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/python-tripleoclient 12.4.0

This issue was fixed in the openstack/python-tripleoclient 12.4.0 release.
