train OVB jobs are failing overcloud deployment "couldn't open temporary file /home/heat-admin/.ssh/sedSAD1k3: Permission denied"

Bug #1942356 reported by Ronelle Landy
This bug affects 1 person
Affects: tripleo
Status: Fix Released
Importance: Critical
Assigned to: Unassigned

Bug Description

OVB jobs in the train integration and tripleo component lines are failing to deploy the overcloud with the following error:

2021-08-31 22:00:54 | sed: couldn't open temporary file /home/heat-admin/.ssh/sedSAD1k3: Permission denied
2021-08-31 22:00:54 | Could not import keys to one of ['192.168.24.13', '192.168.24.19', '192.168.24.26', '192.168.24.18']. Original error message: Command '['ssh', '-o', 'ConnectionAttempts=6', '-o', 'ConnectTimeout=30', '-o', 'StrictHostKeyChecking=no', '-o', 'PasswordAuthentication=no', '-o', 'UserKnownHostsFile=/dev/null', '-i', '/home/zuul/.ssh/id_rsa', '-l', 'heat-admin', '192.168.24.13', "sed -i -e '/TripleO split stack short term key/d' $HOME/.ssh/authorized_keys"]' returned non-zero exit status 4.
2021-08-31 22:00:54 |
2021-08-31 22:00:55 | Host 10.0.0.5 not found in /home/zuul/.ssh/known_hosts
2021-08-31 22:00:55 | Exception occured while running the command
2021-08-31 22:00:55 | Traceback (most recent call last):
2021-08-31 22:00:55 | File "/usr/lib/python3.6/site-packages/tripleoclient/command.py", line 32, in run
2021-08-31 22:00:55 | super(Command, self).run(parsed_args)
2021-08-31 22:00:55 | File "/usr/lib/python3.6/site-packages/osc_lib/command/command.py", line 41, in run
2021-08-31 22:00:55 | return super(Command, self).run(parsed_args)
2021-08-31 22:00:55 | File "/usr/lib/python3.6/site-packages/cliff/command.py", line 185, in run
2021-08-31 22:00:55 | return_code = self.take_action(parsed_args) or 0
2021-08-31 22:00:55 | File "/usr/lib/python3.6/site-packages/tripleoclient/v1/overcloud_deploy.py", line 1179, in take_action
2021-08-31 22:00:55 | raise(deploy_trace)
2021-08-31 22:00:55 | File "/usr/lib/python3.6/site-packages/tripleoclient/v1/overcloud_deploy.py", line 1121, in take_action
2021-08-31 22:00:55 | parsed_args.overcloud_ssh_port_timeout
2021-08-31 22:00:55 | File "/usr/lib/python3.6/site-packages/tripleoclient/workflows/deployment.py", line 204, in get_hosts_and_enable_ssh_admin
2021-08-31 22:00:55 | enable_ssh_timeout, enable_ssh_port_timeout)
2021-08-31 22:00:55 | File "/usr/lib/python3.6/site-packages/tripleoclient/workflows/deployment.py", line 324, in enable_ssh_admin
2021-08-31 22:00:55 | subprocess.check_call(rm_tmp_key_command, stderr=subprocess.STDOUT)
2021-08-31 22:00:55 | File "/usr/lib64/python3.6/subprocess.py", line 311, in check_call
2021-08-31 22:00:55 | raise CalledProcessError(retcode, cmd)
2021-08-31 22:00:55 | subprocess.CalledProcessError: Command '['ssh', '-o', 'ConnectionAttempts=6', '-o', 'ConnectTimeout=30', '-o', 'StrictHostKeyChecking=no', '-o', 'PasswordAuthentication=no', '-o', 'UserKnownHostsFile=/dev/null', '-i', '/home/zuul/.ssh/id_rsa', '-l', 'heat-admin', '192.168.24.13', "sed -i -e '/TripleO split stack short term key/d' $HOME/.ssh/authorized_keys"]' returned non-zero exit status 4.
2021-08-31 22:00:55 | Command '['ssh', '-o', 'ConnectionAttempts=6', '-o', 'ConnectTimeout=30', '-o', 'StrictHostKeyChecking=no', '-o', 'PasswordAuthentication=no', '-o', 'UserKnownHostsFile=/dev/null', '-i', '/home/zuul/.ssh/id_rsa', '-l', 'heat-admin', '192.168.24.13', "sed -i -e '/TripleO split stack short term key/d' $HOME/.ssh/authorized_keys"]' returned non-zero exit status 4.

Example logs are below:

https://logserver.rdoproject.org/openstack-component-tripleo/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-8-ovb-3ctlr_1comp-featureset001-tripleo-train/b80c17c/logs/undercloud/home/zuul/overcloud_deploy.log.txt.gz

https://logserver.rdoproject.org/openstack-periodic-integration-stable4/opendev.org/openstack/tripleo-ci/9d5ff817ea8d75d837d26cf34d1dd6b949d7b4e0/periodic-tripleo-ci-centos-8-ovb-3ctlr_1comp-featureset001-train/a6299bf/logs/undercloud/home/zuul/overcloud_deploy.log.txt.gz

The failure in the tripleo component line started on 08/28:
periodic-tripleo-ci-centos-8-ovb-3ctlr_1comp-featureset001-tripleo-train

Ronelle Landy (rlandy)
Changed in tripleo:
milestone: none → xena-3
importance: Undecided → Critical
status: New → Triaged
tags: added: ci promotion-blocker
Ronelle Landy (rlandy) wrote :
Douglas Viroel (dviroel) wrote :

On the overcloud nodes, the '.ssh' directory is owned by 'root', so heat-admin cannot create files in it:

[heat-admin@overcloud-novacompute-0 ~]$ ls -lha /home/heat-admin/
total 12K
drwx------. 3 heat-admin heat-admin 74 Sep 1 19:02 .
drwxr-xr-x. 3 root root 24 Sep 1 19:02 ..
-rw-r--r--. 1 heat-admin heat-admin 18 Jul 27 14:21 .bash_logout
-rw-r--r--. 1 heat-admin heat-admin 141 Jul 27 14:21 .bash_profile
-rw-r--r--. 1 heat-admin heat-admin 376 Jul 27 14:21 .bashrc
drwxr-xr-x. 2 root root 29 Sep 1 19:02 .ssh
[heat-admin@overcloud-novacompute-0 ~]$ touch .ssh/temp_file
touch: cannot touch '.ssh/temp_file': Permission denied
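
The listing above can be turned into a quick check. This is a minimal sketch, assuming plain POSIX permission bits only (no ACLs or SELinux); `describe_dir_access` is a hypothetical helper, not part of tripleoclient. It reports who owns a directory and whether a given uid could create files in it, which is what `sed -i` needs when it writes its temporary file.

```python
import os
import stat


def describe_dir_access(path, uid, gids):
    """Report the owner of `path` and whether `uid` could create files in it.

    Only classic owner/group/other permission bits are considered.
    """
    st = os.stat(path)
    mode = stat.S_IMODE(st.st_mode)
    if uid == 0:
        # root bypasses the permission bits
        writable = True
    elif st.st_uid == uid:
        writable = bool(mode & stat.S_IWUSR) and bool(mode & stat.S_IXUSR)
    elif st.st_gid in gids:
        writable = bool(mode & stat.S_IWGRP) and bool(mode & stat.S_IXGRP)
    else:
        writable = bool(mode & stat.S_IWOTH) and bool(mode & stat.S_IXOTH)
    return {
        "owner_uid": st.st_uid,
        "mode": oct(mode),
        "can_create_files": writable,
    }
```

For a directory like the one listed (mode 755, owned by root), a non-root uid that is not in the root group falls into the "other" branch and gets `can_create_files: False`.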

Douglas Viroel (dviroel) wrote :

Cloud-init is setting different ownership on the '.ssh' directory in failed and succeeded jobs:

Failed job:
[DEBUG]: Changing the ownership of /home/heat-admin/.ssh to 0:0
https://logserver.rdoproject.org/openstack-component-tripleo/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-8-ovb-3ctlr_1comp-featureset001-tripleo-train/cbc1d80/logs/overcloud-controller-0/var/log/cloud-init.log.txt.gz
Cloud-init version:
cloud-init.noarch 21.1-6.el8 @appstream

Succeeded job:
[DEBUG]: Changing the ownership of /home/heat-admin/.ssh to 1000:1001
https://logserver.rdoproject.org/openstack-component-tripleo/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-8-ovb-3ctlr_1comp-featureset001-tripleo-train/a284598/logs/overcloud-controller-1/var/log/cloud-init.log.txt.gz
Cloud-init version:
cloud-init.noarch 21.1-3.el8 @appstream
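
To compare other job logs the same way, the ownership lines can be pulled out of cloud-init.log with a small filter. This is a sketch, assuming the DEBUG message keeps the wording quoted above; `find_root_owned_ssh_dirs` is a hypothetical name.

```python
import re

# Matches the cloud-init DEBUG message quoted in this comment.
OWNERSHIP_RE = re.compile(
    r"Changing the ownership of (?P<path>\S+) to (?P<uid>\d+):(?P<gid>\d+)"
)


def find_root_owned_ssh_dirs(log_lines):
    """Return the .ssh paths that cloud-init chowned to uid 0."""
    hits = []
    for line in log_lines:
        match = OWNERSHIP_RE.search(line)
        if not match:
            continue
        if match.group("path").endswith("/.ssh") and match.group("uid") == "0":
            hits.append(match.group("path"))
    return hits
```

Fed the two log lines above, it flags only the one from the failed job.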

Douglas Viroel (dviroel) wrote :
Ronelle Landy (rlandy) wrote :
xiaoyi chen (xiachen-rh) wrote :

The fix is tracked in the cloud-init bug:
https://bugs.launchpad.net/cloud-init/+bug/1940233

It is believed to be fixed in cloud-init version 21.3.
If this is still a problem for you, please add a comment and set the state of LP#1940233 back to New.

Thanks

OpenStack Infra (hudson-openstack) wrote : Fix merged to python-tripleoclient (stable/train)

Reviewed: https://review.opendev.org/c/openstack/python-tripleoclient/+/806993
Committed: https://opendev.org/openstack/python-tripleoclient/commit/ffa2440ee13be9b53a2f7fe216dc7558a2dcad02
Submitter: "Zuul (22348)"
Branch: stable/train

commit ffa2440ee13be9b53a2f7fe216dc7558a2dcad02
Author: Douglas Viroel <email address hidden>
Date: Wed Sep 1 18:31:45 2021 -0300

    Fix edit authorized_keys file in-place

    This fix is to avoid the issue described in #1942356
    Cloud-init may not give permissions to the entire .ssh directory
    which breaks sed --in-place operation, since it always create a
    temporary file in the same directory.
    New releases already use ssh-admin playbook instead of shell
    commands.

    Closes-Bug: #1942356
    Change-Id: If2db45f09eface56b5b26646b97e1fdbfb7b5a3e
    Signed-off-by: Douglas Viroel <email address hidden>
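
For illustration only, and not the code from the merged change: a sketch of one way to drop the temporary-key lines without creating a new file in the directory. It assumes authorized_keys itself is writable by the user even though '.ssh' is not; `remove_marked_keys` is a hypothetical helper.

```python
MARKER = "TripleO split stack short term key"


def remove_marked_keys(path, marker=MARKER):
    """Drop lines containing `marker` by rewriting `path` through the same file.

    No temporary file is created, so write permission on the containing
    directory is not needed. The rewrite is not atomic.
    """
    with open(path, "r+") as f:
        kept = [line for line in f if marker not in line]
        f.seek(0)
        f.writelines(kept)
        f.truncate()  # cut off the leftover tail of the old content
    return len(kept)
```

The trade-off against `sed -i` is atomicity: sed writes a temporary file next to the original and renames it over the original, while this version overwrites the existing file directly.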

tags: added: in-stable-train
wes hayutin (weshayutin) wrote :
Changed in tripleo:
status: Triaged → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/python-tripleoclient train-eol

This issue was fixed in the openstack/python-tripleoclient train-eol release.
