[queens promotion] fs001 fails overcloud deploy with 'Authentication failed'

Bug #1750874 reported by Ronelle Landy
This bug affects 2 people
Affects   Status        Importance  Assigned to   Milestone
tripleo   Fix Released  Critical    Alex Schultz

Bug Description

In the latest queens promotion run, the featureset001 test failed to deploy the overcloud with:

2018-02-21 16:26:21 | 2018-02-21 16:26:14Z [overcloud.AllNodesDeplStarting workflow to create ssh admin on deployed servers.
2018-02-21 16:26:21 | SSH user: heat-admin
2018-02-21 16:26:21 | SSH key file: /home/jenkins/.ssh/id_rsa
2018-02-21 16:26:21 | Hosts: 192.168.24.9 192.168.24.11 192.168.24.16 192.168.24.15
2018-02-21 16:26:21 |
2018-02-21 16:26:21 | Inserting TripleO short term key for 192.168.24.9
2018-02-21 16:26:21 | Warning: Permanently added '192.168.24.9' (ECDSA) to the list of known hosts.
2018-02-21 16:28:21 | Authentication failed.
2018-02-21 16:28:21 | END return value: 1
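In this excerpt, `Authentication failed.` is printed exactly two minutes after the `Inserting TripleO short term key` line. The fs042 and zuul excerpts quoted later in this report show the same gap. A small triage helper can measure that gap directly from a console log. The following is a minimal Python sketch. It assumes the `YYYY-MM-DD HH:MM:SS | ` prefix format seen above. The function name `auth_failure_delay` is hypothetical and is not part of any TripleO tooling.

```python
from datetime import datetime

# Assumption: CI console lines carry a "YYYY-MM-DD HH:MM:SS | " prefix.
STAMP_FORMAT = "%Y-%m-%d %H:%M:%S"

def auth_failure_delay(lines):
    """Hypothetical helper: seconds between the first 'Inserting TripleO short
    term key' line and the following 'Authentication failed.' line, or None
    if either marker is missing."""
    start = None
    for line in lines:
        stamp, sep, message = line.partition(" | ")
        if not sep:
            continue  # not a prefixed console line
        if start is None and message.startswith("Inserting TripleO short term key"):
            start = datetime.strptime(stamp.strip(), STAMP_FORMAT)
        elif start is not None and message.strip() == "Authentication failed.":
            end = datetime.strptime(stamp.strip(), STAMP_FORMAT)
            return (end - start).total_seconds()
    return None

# Lines copied from the excerpt above.
console_lines = [
    "2018-02-21 16:26:21 | Inserting TripleO short term key for 192.168.24.9",
    "2018-02-21 16:26:21 | Warning: Permanently added '192.168.24.9' (ECDSA) to the list of known hosts.",
    "2018-02-21 16:28:21 | Authentication failed.",
    "2018-02-21 16:28:21 | END return value: 1",
]
print(auth_failure_delay(console_lines))  # 120.0
```

Applied to the fs042 excerpt (14:43:25 to 14:45:25), the helper gives the same 120 seconds.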

The full overcloud deploy log is:

https://logs.rdoproject.org/openstack-periodic/periodic-tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001-queens/778d44f/undercloud/home/jenkins/overcloud_deploy.log.txt.gz

See error log:

https://logs.rdoproject.org/openstack-periodic/periodic-tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001-queens/778d44f/undercloud/var/log/extra/errors.txt.gz

Note that we were still w/o https://review.openstack.org/#/c/546574/ when this promotion kicked.

Ronelle Landy (rlandy)
tags: added: ci promotion-blocker
Changed in tripleo:
milestone: none → queens-rc1
importance: Undecided → Critical
status: New → Triaged
wes hayutin (weshayutin) wrote :
tags: added: alert
Alan Pevec (apevec) wrote :

> Note that we were still w/o https://review.openstack.org/#/c/546574/ when this promotion kicked.

It was included via a temporary patch in the RPM: https://github.com/rdo-packages/heat-distgit/commit/625bf99fccc27af9b51baae9401b20697b33483e
That patch has since been reverted because the change merged upstream.

Thomas Herve (therve) wrote :

I think it's an infra issue; all the subnodes look inaccessible.

yatin (yatinkarel) wrote :

It reproduced again in the current pipeline run: https://logs.rdoproject.org/openstack-periodic/periodic-tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001-queens/4a63131/undercloud/home/jenkins/overcloud_deploy.log.txt.gz

I also reproduced it locally, but I don't currently have the environment.
INFO: Some time after the failure was reported, I was able to SSH to the overcloud from the undercloud. Immediately after the failure, the nodes were not accessible. To confirm this, I tried to reproduce the failure again, without success: the run gets stuck at overcloud-prep-containers.

yatin (yatinkarel) wrote :
wes hayutin (weshayutin) wrote :

2018-02-22 16:11:02 | SSH user: heat-admin
2018-02-22 16:11:02 | SSH key file: /home/zuul/.ssh/id_rsa
2018-02-22 16:11:02 | Hosts: 192.168.24.18 192.168.24.17 192.168.24.12 192.168.24.8
2018-02-22 16:11:02 |
2018-02-22 16:11:04 | Inserting TripleO short term key for 192.168.24.18
2018-02-22 16:11:04 | Warning: Permanently added '192.168.24.18' (ECDSA) to the list of known hosts.
2018-02-22 16:13:04 | Authentication failed.
2018-02-22 16:13:04 | /usr/share/openstack-tripleo-heat-templates/deployed-server/scripts/enable-ssh-admin.sh failed.

(undercloud) [zuul@undercloud ~]$ nc -v 192.168.24.18 22
Ncat: Version 6.40 ( http://nmap.org/ncat )
Ncat: Connected to 192.168.24.18:22.
SSH-2.0-OpenSSH_7.4

Protocol mismatch.
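The `nc` check above shows that TCP port 22 on the node is reachable from the undercloud and that sshd answers with its identification string. A scripted version of that probe could look like the minimal Python sketch below. `read_ssh_banner` and `parse_ssh_banner` are hypothetical helpers. The parsing follows the `SSH-protoversion-softwareversion` identification format from RFC 4253 and ignores any other lines a server may send before it.

```python
import socket

def parse_ssh_banner(data):
    """Split an SSH identification line (RFC 4253, section 4.2) into
    (protoversion, softwareversion); return None if it is not one."""
    line = data.split(b"\n", 1)[0].rstrip(b"\r").decode("ascii", errors="replace")
    if not line.startswith("SSH-"):
        return None
    proto, _, rest = line[len("SSH-"):].partition("-")
    # Optional comments follow the software version after a single space.
    software = rest.split(" ", 1)[0]
    return proto, software

def read_ssh_banner(host, port=22, timeout=5.0):
    """Connect like `nc -v host 22`, read only the server's identification
    line (nothing is sent to the server), and parse it."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        data = b""
        # The identification line is at most 255 bytes including CR LF.
        while not data.endswith(b"\n") and len(data) < 255:
            chunk = sock.recv(1)
            if not chunk:
                break
            data += chunk
    return parse_ssh_banner(data)

print(parse_ssh_banner(b"SSH-2.0-OpenSSH_7.4\r\n"))  # ('2.0', 'OpenSSH_7.4')
```

Against a node from the failing run, the call would be, for example, `read_ssh_banner("192.168.24.18")`. A parsed banner only proves that sshd is up. It says nothing about whether key authentication will succeed.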

wes hayutin (weshayutin) wrote :

<mwhahaha> yea the issue is the newer docker package i think
<mwhahaha> https://logs.rdoproject.org/openstack-periodic/periodic-tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001-queens/4a63131/rpm-qa.txt.gz
<mwhahaha> docker-1.13.1-47.2.gitf43d177.el7.x86_64
<weshay> aye.. the one that worked was
<weshay> docker.x86_64 2:1.12.6-68.gitec8512b.el7.centos @extras
<mwhahaha> right so we have a fix pending
<mwhahaha> waiting on ovb jobs and i'll merge those in
<weshay> +1
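The comparison in this exchange can be automated against a job's `rpm-qa.txt`: docker 1.13.1 in the failing run versus 1.12.6 in the last passing one. The following is a minimal Python sketch. It assumes plain `name-version-release.arch` lines as printed by `rpm -qa`. `find_package_version` and `version_tuple` are hypothetical helpers, and the sketch ignores the epoch. The numeric tuple comparison is a simplification, not rpm's own version-comparison algorithm.

```python
def find_package_version(rpm_qa_lines, name):
    """Return (version, release) of package `name` from `rpm -qa` style
    name-version-release.arch lines, or None if it is not listed."""
    for line in rpm_qa_lines:
        # The package name may itself contain dashes, so split the
        # version and release off from the right.
        parts = line.strip().rsplit("-", 2)
        if len(parts) != 3:
            continue
        pkg, version, release_arch = parts
        if pkg == name:
            release = release_arch.rsplit(".", 1)[0]  # drop the ".x86_64" arch
            return version, release
    return None

def version_tuple(version):
    """Naive numeric key for dotted versions such as "1.13.1"."""
    return tuple(int(part) for part in version.split(".") if part.isdigit())

# The docker line from the failing job's rpm-qa.txt.
rpm_qa = ["docker-1.13.1-47.2.gitf43d177.el7.x86_64"]
version, release = find_package_version(rpm_qa, "docker")
print(version, release)                                   # 1.13.1 47.2.gitf43d177.el7
print(version_tuple(version) > version_tuple("1.12.6"))   # True
```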

wes hayutin (weshayutin) wrote :

hrm.. I added a test network to the deployment, attached each overcloud node to it, and associated a floating IP with each node. Now I'm able to SSH in via either the floating IP or the ctrlplane IP.

Ronelle Landy (rlandy) wrote :

Also showing up in fs042 ...

2018-02-22 14:43:25 | SSH user: heat-admin
2018-02-22 14:43:25 | SSH key file: /home/jenkins/.ssh/id_rsa
2018-02-22 14:43:25 | Hosts: 192.168.24.11 192.168.24.16 192.168.24.15 192.168.24.6
2018-02-22 14:43:25 |
2018-02-22 14:43:25 | Inserting TripleO short term key for 192.168.24.11
2018-02-22 14:43:25 | Warning: Permanently added '192.168.24.11' (ECDSA) to the list of known hosts.
2018-02-22 14:45:25 | Authentication failed.
2018-02-22 14:45:25 | /usr/share/openstack-tripleo-heat-templates/deployed-server/scripts/enable-ssh-admin.sh failed.
2018-02-22 14:45:25 | END return value: 1

https://logs.rdoproject.org/37/545837/2/openstack-check/gate-tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset042-master-tht/Z9c215563864040be80b9c56ce2f9cded/undercloud/home/jenkins/overcloud_deploy.log.txt.gz

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to instack-undercloud (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/547221

OpenStack Infra (hudson-openstack) wrote : Fix proposed to instack-undercloud (master)

Fix proposed to branch: master
Review: https://review.openstack.org/547281

Changed in tripleo:
assignee: nobody → Alex Schultz (alex-schultz)
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Change abandoned on instack-undercloud (master)

Change abandoned by Alex Schultz (<email address hidden>) on branch: master
Review: https://review.openstack.org/547221
Reason: https://review.openstack.org/547281

Changed in tripleo:
assignee: Alex Schultz (alex-schultz) → Harald Jensås (harald-jensas)
Changed in tripleo:
assignee: Harald Jensås (harald-jensas) → Alex Schultz (alex-schultz)
Ronelle Landy (rlandy) wrote :

We are seeing a similar trace on tripleo-ci-centos-7-containers-multinode:

2018-02-26 15:51:30 | 2018-0Please set $OVERCLOUD_HOSTS
2018-02-26 15:51:30 | /usr/share/openstack-tripleo-heat-templates/deployed-server/scripts/enable-ssh-admin.sh failed.
2018-02-26 15:51:30 | END return value: 1

http://logs.openstack.org/53/547153/4/check/tripleo-ci-centos-7-containers-multinode/7acf21e/logs/undercloud/home/zuul/overcloud_deploy.log.txt.gz#_2018-02-26_15_51_30

OpenStack Infra (hudson-openstack) wrote : Fix merged to instack-undercloud (master)

Reviewed: https://review.openstack.org/547281
Committed: https://git.openstack.org/cgit/openstack/instack-undercloud/commit/?id=bfb758b5e792c83e5cde9847bcad424fcfaf071d
Submitter: Zuul
Branch: master

commit bfb758b5e792c83e5cde9847bcad424fcfaf071d
Author: Alex Schultz <email address hidden>
Date: Thu Feb 22 23:01:49 2018 -0700

    Fix bootstrap NAT

    Docker will switch the FORWARD filter to DROP if it sets the ip_forward
    to 1. Previously we were doing this in a post configuration element
    rather than in the puppet run itself. This change moves the ip_forward=1
    to puppet so it runs prior to docker being installed. Additionally we
    are ensuring that the full set of network rules are being added to the
    FORWARD filter because previously we were only setting half of them.
    This would allow us to actually not have to use ACCEPT as the default
    for the FORWARD filter but this would require additional testing.

    Previously we had tried switching the default policy back to ACCEPT,
    however given that docker is not configuring the iptables rule until
    it's installed and started, the puppet rules do not actually apply on
    the installation of the undercloud. The puppet management of the
    defaults for the FORWARD chain only gets updated on a subsequent run of
    the installer which will not work.

    Change-Id: Ieae6a74f7269bd64606fd80a2a08b2058c24d2c5
    Closes-Bug: #1750194
    Closes-Bug: #1750874
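The commit message describes two things to look for on an undercloud:

- a FORWARD chain whose default policy was switched to DROP once Docker enabled forwarding;
- an incomplete set of FORWARD rules ("only setting half of them").

A minimal Python sketch can check both in the text printed by `iptables -S`. Several parts are illustrative assumptions, not taken from the instack-undercloud puppet manifests:

- the helper names;
- the sample rules;
- modelling the two halves as `-s` and `-d` ACCEPT rules for a 192.168.24.0/24 subnet (inferred from the node addresses in the logs above).

```python
def forward_policy(iptables_s_output):
    """Return the FORWARD chain's default policy from `iptables -S` output."""
    for line in iptables_s_output.splitlines():
        fields = line.split()
        # Default policies are printed as e.g. "-P FORWARD DROP".
        if len(fields) == 3 and fields[:2] == ["-P", "FORWARD"]:
            return fields[2]
    return None

def forward_accept_directions(iptables_s_output, subnet):
    """Hypothetical check: return which of "-s" (traffic from subnet) and
    "-d" (traffic to subnet) have a plain ACCEPT rule in the FORWARD chain."""
    found = set()
    for line in iptables_s_output.splitlines():
        fields = line.split()
        if fields[:2] != ["-A", "FORWARD"] or fields[-2:] != ["-j", "ACCEPT"]:
            continue
        for flag in ("-s", "-d"):
            if flag in fields:
                i = fields.index(flag)
                if i + 1 < len(fields) and fields[i + 1] == subnet:
                    found.add(flag)
    return found

# Illustrative input, not captured from the failing job.
iptables_sample = """-P INPUT ACCEPT
-P FORWARD DROP
-P OUTPUT ACCEPT
-A FORWARD -s 192.168.24.0/24 -j ACCEPT"""

print(forward_policy(iptables_sample))                                        # DROP
print(sorted(forward_accept_directions(iptables_sample, "192.168.24.0/24")))  # ['-s']
```

On a live undercloud, the input would come from `sudo iptables -S`. Under the assumptions above, a DROP policy combined with only one direction present would match the broken state the commit describes.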

Changed in tripleo:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to instack-undercloud (stable/pike)

Fix proposed to branch: stable/pike
Review: https://review.openstack.org/548616

OpenStack Infra (hudson-openstack) wrote : Fix merged to instack-undercloud (stable/pike)

Reviewed: https://review.openstack.org/548616
Committed: https://git.openstack.org/cgit/openstack/instack-undercloud/commit/?id=50217d7a93dce7fdc17c0dfbb04260f86fd3ac7d
Submitter: Zuul
Branch: stable/pike

commit 50217d7a93dce7fdc17c0dfbb04260f86fd3ac7d
Author: Alex Schultz <email address hidden>
Date: Thu Feb 22 23:01:49 2018 -0700

    Fix bootstrap NAT

    Docker will switch the FORWARD filter to DROP if it sets the ip_forward
    to 1. Previously we were doing this in a post configuration element
    rather than in the puppet run itself. This change moves the ip_forward=1
    to puppet so it runs prior to docker being installed. Additionally we
    are ensuring that the full set of network rules are being added to the
    FORWARD filter because previously we were only setting half of them.
    This would allow us to actually not have to use ACCEPT as the default
    for the FORWARD filter but this would require additional testing.

    Conflicts:
     instack_undercloud/tests/test_undercloud.py
     instack_undercloud/undercloud.py

    Change-Id: Ieae6a74f7269bd64606fd80a2a08b2058c24d2c5
    Closes-Bug: #1750194
    Closes-Bug: #1750874
    (cherry picked from commit bfb758b5e792c83e5cde9847bcad424fcfaf071d)

tags: added: in-stable-pike
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/instack-undercloud 8.3.0

This issue was fixed in the openstack/instack-undercloud 8.3.0 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/instack-undercloud 7.4.10

This issue was fixed in the openstack/instack-undercloud 7.4.10 release.

OpenStack Infra (hudson-openstack) wrote : Fix proposed to instack-undercloud (stable/ocata)

Fix proposed to branch: stable/ocata
Review: https://review.openstack.org/551335

OpenStack Infra (hudson-openstack) wrote : Fix proposed to instack-undercloud (stable/newton)

Fix proposed to branch: stable/newton
Review: https://review.openstack.org/551340

Bogdan Dobrelya (bogdando) wrote :
Alex Schultz (alex-schultz) wrote :
OpenStack Infra (hudson-openstack) wrote : Fix merged to instack-undercloud (stable/newton)

Reviewed: https://review.openstack.org/551340
Committed: https://git.openstack.org/cgit/openstack/instack-undercloud/commit/?id=663dad2a37fed795e78c6911e0338c03977c66e6
Submitter: Zuul
Branch: stable/newton

commit 663dad2a37fed795e78c6911e0338c03977c66e6
Author: Alex Schultz <email address hidden>
Date: Thu Feb 22 23:01:49 2018 -0700

    Fix bootstrap NAT

    Docker will switch the FORWARD filter to DROP if it sets the ip_forward
    to 1. Previously we were doing this in a post configuration element
    rather than in the puppet run itself. This change moves the ip_forward=1
    to puppet so it runs prior to docker being installed. Additionally we
    are ensuring that the full set of network rules are being added to the
    FORWARD filter because previously we were only setting half of them.
    This would allow us to actually not have to use ACCEPT as the default
    for the FORWARD filter but this would require additional testing.

    Conflicts:
     elements/puppet-stack-config/puppet-stack-config.yaml.template
     elements/undercloud-install/os-refresh-config/post-configure.d/98-undercloud-setup

    Change-Id: Ieae6a74f7269bd64606fd80a2a08b2058c24d2c5
    Closes-Bug: #1750194
    Closes-Bug: #1750874
    (cherry picked from commit bfb758b5e792c83e5cde9847bcad424fcfaf071d)
    (cherry picked from commit 50217d7a93dce7fdc17c0dfbb04260f86fd3ac7d)

tags: added: in-stable-newton
OpenStack Infra (hudson-openstack) wrote : Fix merged to instack-undercloud (stable/ocata)

Reviewed: https://review.openstack.org/551335
Committed: https://git.openstack.org/cgit/openstack/instack-undercloud/commit/?id=256fecbf508f0753175835e9c685e3e49399b88d
Submitter: Zuul
Branch: stable/ocata

commit 256fecbf508f0753175835e9c685e3e49399b88d
Author: Alex Schultz <email address hidden>
Date: Thu Feb 22 23:01:49 2018 -0700

    Fix bootstrap NAT

    Docker will switch the FORWARD filter to DROP if it sets the ip_forward
    to 1. Previously we were doing this in a post configuration element
    rather than in the puppet run itself. This change moves the ip_forward=1
    to puppet so it runs prior to docker being installed. Additionally we
    are ensuring that the full set of network rules are being added to the
    FORWARD filter because previously we were only setting half of them.
    This would allow us to actually not have to use ACCEPT as the default
    for the FORWARD filter but this would require additional testing.

    Conflicts:
     elements/puppet-stack-config/puppet-stack-config.yaml.template
     elements/undercloud-install/os-refresh-config/post-configure.d/98-undercloud-setup

    Change-Id: Ieae6a74f7269bd64606fd80a2a08b2058c24d2c5
    Closes-Bug: #1750194
    Closes-Bug: #1750874
    (cherry picked from commit bfb758b5e792c83e5cde9847bcad424fcfaf071d)
    (cherry picked from commit 50217d7a93dce7fdc17c0dfbb04260f86fd3ac7d)

tags: added: in-stable-ocata
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/instack-undercloud 6.1.6

This issue was fixed in the openstack/instack-undercloud 6.1.6 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/instack-undercloud 5.3.8

This issue was fixed in the openstack/instack-undercloud 5.3.8 release.
