OVB fs001 in centos8 master fails to push certificate contents to controllers

Bug #1873770 reported by Gabriele Cerami
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
tripleo | Fix Committed | Critical | Gabriele Cerami |

Bug Description

The periodic job periodic-tripleo-ci-centos-8-ovb-3ctlr_1comp-featureset001-master fails to push certificate contents to the controllers during server deployment.

In the logs at

https://logserver.rdoproject.org/openstack-periodic-master/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-8-ovb-3ctlr_1comp-featureset001-master/66f723d/logs/undercloud/home/zuul/overcloud_deploy.log.txt.gz

we see:

2020-04-20 04:12:51 | TASK [push certificate content] ************************************************
2020-04-20 04:12:51 | Monday 20 April 2020 04:12:51 +0000 (0:00:00.108) 0:02:16.736 **********
2020-04-20 04:12:51 | fatal: [overcloud-controller-0]: FAILED! => changed=false
2020-04-20 04:12:51 | censored: 'the output has been hidden due to the fact that ''no_log: true'' was specified for this result'
2020-04-20 04:12:51 | fatal: [overcloud-controller-1]: FAILED! => changed=false
2020-04-20 04:12:51 | censored: 'the output has been hidden due to the fact that ''no_log: true'' was specified for this result'
2020-04-20 04:12:51 | fatal: [overcloud-controller-2]: FAILED! => changed=false
2020-04-20 04:12:51 | censored: 'the output has been hidden due to the fact that ''no_log: true'' was specified for this result'

The logs are hidden for security reasons, but the task is a simple copy from inline content, defined here:

https://opendev.org/openstack/tripleo-heat-templates/src/branch/master/deployment/haproxy/haproxy-public-tls-inject.yaml

so the error is either a missing target directory or an undefined variable in the inline content.

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-heat-templates (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/721247

OpenStack Infra (hudson-openstack) wrote :

Related fix proposed to branch: master
Review: https://review.opendev.org/721255

Gabriele Cerami (gcerami) wrote :

Investigating with Cedric to offer complete logs in CI to debug this.

OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-heat-templates (master)

Reviewed: https://review.opendev.org/721247
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=cabed543fa74014d17a6ac603bc2239237864b21
Submitter: Zuul
Branch: master

commit cabed543fa74014d17a6ac603bc2239237864b21
Author: Cédric Jeanneret <email address hidden>
Date: Mon Apr 20 14:02:21 2020 +0200

    Introduce new HideSensitiveLogs parameter

    This one toggles the no_log parameter. Directly related to #1873770 in
    order to allow a deeper debug within CI.

    Change-Id: I27f677467263c0e6cc78d775edff55b3811fec1f
    Related-Bug: #1873770

yatin (yatinkarel) wrote :

With debug enabled, the following error is reported:
2020-04-23 08:24:59 | TASK [push certificate content] ************************************************
2020-04-23 08:24:59 | Thursday 23 April 2020 08:24:59 +0000 (0:00:00.116) 0:02:20.602 ********
2020-04-23 08:25:00 | fatal: [overcloud-controller-0]: FAILED! => changed=false
2020-04-23 08:25:00 | checksum: 63c1ca987a881f8562930669e09f8053fcd47750
2020-04-23 08:25:00 | msg: Destination directory /etc/pki/tls/private does not exist
2020-04-23 08:25:00 | fatal: [overcloud-controller-1]: FAILED! => changed=false
2020-04-23 08:25:00 | checksum: 63c1ca987a881f8562930669e09f8053fcd47750
2020-04-23 08:25:00 | msg: Destination directory /etc/pki/tls/private does not exist
2020-04-23 08:25:00 | fatal: [overcloud-controller-2]: FAILED! => changed=false
2020-04-23 08:25:00 | checksum: 63c1ca987a881f8562930669e09f8053fcd47750
2020-04-23 08:25:00 | msg: Destination directory /etc/pki/tls/private does not exist

Logs: https://logserver.rdoproject.org/12/26712/1/check/periodic-tripleo-ci-centos-8-ovb-3ctlr_1comp-featureset001-master/bbd78ab/logs/undercloud/home/zuul/overcloud_deploy.log.txt.gz
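
To confirm that every controller fails for the same reason, the deploy log can be summarized by error message. The following is a minimal Python sketch, assuming the log keeps the line layout shown above (a `fatal: [host]: FAILED!` line followed later by a `msg:` line); the function name `failures_by_message` is hypothetical and not part of any TripleO tooling.

```python
import re
from collections import defaultdict

# Matches lines such as:
#   2020-04-23 08:25:00 | fatal: [overcloud-controller-0]: FAILED! => changed=false
FATAL_RE = re.compile(r"\|\s*fatal: \[(?P<host>[^\]]+)\]: FAILED!")
# Matches the follow-up line that carries the error text:
#   2020-04-23 08:25:00 | msg: Destination directory ... does not exist
MSG_RE = re.compile(r"\|\s*msg:\s*(?P<msg>.+)$")


def failures_by_message(lines):
    """Map each error message to the list of hosts that reported it."""
    grouped = defaultdict(list)
    current_host = None
    for line in lines:
        line = line.rstrip("\n")
        fatal = FATAL_RE.search(line)
        if fatal:
            # Remember the host until its "msg:" line shows up.
            current_host = fatal.group("host")
            continue
        msg = MSG_RE.search(line)
        if msg and current_host is not None:
            grouped[msg.group("msg").strip()].append(current_host)
            current_host = None
    return dict(grouped)
```

The function accepts any iterable of lines, so a downloaded log can be passed in through `gzip.open(path, "rt")`. Results hidden by `no_log` have no `msg:` line and are not reported.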

The /etc/pki/tls/private directory is part of the openssl-libs package and should exist by default. The issue (deletion of directories) is the same as https://bugs.launchpad.net/tripleo/+bug/1867602.

I collected the following info from the latest overcloud image, which has the issue:
$ curl -O https://images.rdoproject.org/centos8/master/rdo_trunk/724c195e098a6a85afd12fffc11ffbc0/overcloud-full.tar
$ tar -xvf overcloud-full.tar
$ sudo LIBGUESTFS_BACKEND=direct guestmount -i -a overcloud-full.qcow2 /mnt

$ sudo chroot /mnt rpm -Va openssl-libs
missing /etc/pki/tls/misc
missing /etc/pki/tls/private

^^ These directories have been removed somehow.
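
The same check can be made without rpm by testing for the directories directly under the mounted image. This is a rough Python sketch, assuming the image is mounted at /mnt as in the commands above; `missing_directories` is a hypothetical helper name.

```python
import os


def missing_directories(root, expected):
    """Return the entries of `expected` that are not directories under `root`."""
    missing = []
    for path in expected:
        # Strip the leading "/" so the path is resolved inside the image
        # root rather than on the host filesystem.
        candidate = os.path.join(root, path.lstrip("/"))
        if not os.path.isdir(candidate):
            missing.append(path)
    return missing
```

For example, `missing_directories("/mnt", ["/etc/pki/tls/misc", "/etc/pki/tls/private"])` would be expected to return both paths on the affected image. Note that `os.path.isdir` follows symlinks, so absolute symlinks inside the image resolve against the host.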

The issue is also not specific to openssl-libs; it affects other directories as well:

$ sudo chroot /mnt rpm -Va |grep missing
missing /var/lib/iscsi/ifaces
missing /var/lib/iscsi/isns
missing /var/lib/iscsi/nodes
missing /var/lib/iscsi/send_targets
missing /var/lib/iscsi/slp
missing /var/lib/iscsi/static
missing /etc/qemu-ga/fsfreeze-hook.d
missing /var/log/qemu-ga
missing /usr/lib64/libxslt-plugins
missing /usr/share/kdump
missing /var/crash
missing /usr/share/doc/python3-pycurl/tests/tmp
missing /usr/share/i18n
missing /usr/share/i18n/charmaps
missing /usr/share/i18n/locales
missing /etc/NetworkManager/conf.d
missing /etc/NetworkManager/dispatcher.d/no-wait.d
missing /etc/NetworkManager/dispatcher.d/pre-down.d
missing /etc/NetworkManager/dispatcher.d/pre-up.d
missing /etc/NetworkManager/dnsmasq-shared.d
missing /etc/NetworkManager/dnsmasq.d
missing /etc/NetworkManager/system-connections
missing /usr/lib/NetworkManager
missing /usr/lib/NetworkManager/VPN
missing /usr/lib/NetworkManager/conf.d
missing /usr/lib/NetworkManager/system-connections
missing /var/lib/NetworkManager
missing /etc/modules-load.d
missing /usr/lib/systemd/system-sleep
missing /var/empty/sshd
missing /etc/sudoers.d
missing /var/db/sudo
missing /var/db/sudo/lectured
...
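
To get a feel for how widespread the deletions are, the `rpm -Va` output can be bucketed by leading path components. This is a small Python sketch, assuming the "missing <path>" line format shown above and paths without spaces; `missing_by_prefix` is a hypothetical name.

```python
from collections import Counter


def missing_by_prefix(rpm_va_output, depth=2):
    """Count "missing" entries from `rpm -Va` output by leading path components."""
    counts = Counter()
    for line in rpm_va_output.splitlines():
        fields = line.split()
        # Only "missing ... /path" lines matter here; other verification
        # results (size, checksum, mtime changes) are skipped.
        if not fields or fields[0] != "missing":
            continue
        # The path is the last field; rpm may print a file-type marker
        # (such as "c" for config files) between "missing" and the path.
        parts = [p for p in fields[-1].split("/") if p]
        counts["/" + "/".join(parts[:depth])] += 1
    return counts
```

Feeding it the captured output of `sudo chroot /mnt rpm -Va` and calling `.most_common()` on the result lists the most affected areas first.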


OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.opendev.org/721255
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=5eeadb2114016f1844565b77cab5555b51679190
Submitter: Zuul
Branch: master

commit 5eeadb2114016f1844565b77cab5555b51679190
Author: Gabriele Cerami <email address hidden>
Date: Mon Apr 20 13:20:39 2020 +0100

    Enable sensitive logs for OVB HA

    to debug correctly http://bugs.launchpad.net/tripleo/+bug/1873770

    Change-Id: I7f92b2a815b3fade3bdb933046ce611d0d68c020
    Related-Bug: #1873770

wes hayutin (weshayutin) wrote :

Let's make sure this is consistently reproducible. There is a prior bug and escalation on pretty much the same issue.

Changed in tripleo:
status: Triaged → Incomplete
wes hayutin (weshayutin)
Changed in tripleo:
milestone: ussuri-rc1 → ussuri-rc3
wes hayutin (weshayutin)
Changed in tripleo:
milestone: ussuri-rc3 → victoria-1
Changed in tripleo:
milestone: victoria-1 → victoria-3
wes hayutin (weshayutin)
tags: removed: alert ci promotion-blocker
Changed in tripleo:
status: Incomplete → Fix Committed
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-heat-templates (stable/train)

Related fix proposed to branch: stable/train
Review: https://review.opendev.org/c/openstack/tripleo-heat-templates/+/803438

OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-heat-templates (stable/train)

Reviewed: https://review.opendev.org/c/openstack/tripleo-heat-templates/+/803438
Committed: https://opendev.org/openstack/tripleo-heat-templates/commit/d2def42c539b78510ac8a6ba5c9850f91ef00d88
Submitter: "Zuul (22348)"
Branch: stable/train

commit d2def42c539b78510ac8a6ba5c9850f91ef00d88
Author: Cédric Jeanneret <email address hidden>
Date: Mon Apr 20 14:02:21 2020 +0200

    Introduce new HideSensitiveLogs parameter

    This one toggles the no_log parameter. Directly related to #1873770 in
    order to allow a deeper debug within CI.

    Change-Id: I27f677467263c0e6cc78d775edff55b3811fec1f
    Related-Bug: #1873770
    (cherry picked from commit cabed543fa74014d17a6ac603bc2239237864b21)

tags: added: in-stable-train