Apply-failed for cert-manager

Bug #1875449 reported by Nicolae Jascanu
Affects: StarlingX
Status: Invalid
Importance: Critical
Assigned to: Nicolae Jascanu
Milestone: (none listed)

Bug Description

Brief Description
-----------------
Ansible playbook fails to apply cert-manager

Severity
--------
Critical: Provision crashes

Steps to Reproduce
------------------
Run the provisioning steps from the daily sanity setup on layered image 20200426T020107Z.

Expected Behavior
------------------
Provisioning should complete successfully.

Actual Behavior
----------------
Provisioning fails with cert-manager apply-failed error

Reproducibility
---------------
100%

System Configuration
--------------------
Simplex, Duplex, Standard, Standard External - Baremetal
controller-0:~$ cat /etc/build.info
###
### StarlingX
### Built from master
###
OS="centos"
SW_VERSION="20.01"
BUILD_TARGET="Host Installer"
BUILD_TYPE="Formal"
BUILD_ID="20200426T020107Z"
JOB="STX_build_layer_flock_master_master"
<email address hidden>"
BUILD_NUMBER="96"
BUILD_HOST="starlingx_mirror"
BUILD_DATE="2020-04-26 02:01:07 +0000"
FLOCK_OS="centos"
FLOCK_JOB="STX_build_layer_flock_master_master"
<email address hidden>"
FLOCK_BUILD_NUMBER="96"
FLOCK_BUILD_HOST="starlingx_mirror"
FLOCK_BUILD_DATE="2020-04-26 02:01:07 +0000"

[sysadmin@controller-0 ~(keystone_admin)]$ cat /etc/platform/platform.conf
nodetype=controller
subfunction=controller,worker
system_type=All-in-one
security_profile=standard
management_interface=lo
http_port=8080
INSTALL_UUID=368ddab0-8ccd-4300-bf2c-87658e19fc2d
UUID=518834be-492d-4e95-a6fc-9599a4852953
sdn_enabled=no
region_config=no
system_mode=simplex
sw_version=20.01
security_feature="nopti nospectre_v2 nospectre_v1"
vswitch_type=none
region_config=False

Branch/Pull Time/Commit
-----------------------
Layered build taken from: http://mirror.starlingx.cengn.ca/mirror/starlingx/master/centos/flock/20200426T020107Z/outputs/iso/bootimage.iso

Last Pass
---------
This is the first attempt with the layered build. The issue does not occur on the master branch.

Timestamp/Logs

Full ansible log collected from SIMPLEX is at http://paste.openstack.org/show/792770/

Extract from log
--------------
2020-04-27 09:54:45,289 p=12140 u=sysadmin | fatal: [localhost]: FAILED! => {"attempts": 30, "changed": true, "cmd": "source /etc/platform/openrc; system application-show cert-manager --column status --format value", "delta": "0:00:01.708557", "end": "2020-04-27 09:54:45.272058", "rc": 0, "start": "2020-04-27 09:54:43.563501", "stderr": "", "stderr_lines": [], "stdout": "apply-failed", "stdout_lines": ["apply-failed"]}
2020-04-27 09:54:45,291 p=12140 u=sysadmin | PLAY RECAP *********************************************************************
2020-04-27 09:54:45,291 p=12140 u=sysadmin | localhost : ok=308 changed=175 unreachable=0 failed=1
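For triage, the JSON payload on the "fatal" line can be pulled apart to see how many times the status check was retried and what the final application status was. The following is only a minimal sketch: the helper name `parse_fatal_line` and the regular expression are illustrative assumptions, not part of the StarlingX playbooks, and the sample line is copied from the extract above.

```python
import json
import re

# Hypothetical helper (not part of StarlingX): extract the JSON payload
# that Ansible appends to a "fatal: [host]: FAILED! =>" log line.
FATAL_RE = re.compile(
    r"fatal: \[(?P<host>[^\]]+)\]: FAILED! => (?P<payload>\{.*\})\s*$"
)


def parse_fatal_line(line):
    """Return host, attempts, rc and final status, or None if no match."""
    match = FATAL_RE.search(line)
    if match is None:
        return None
    result = json.loads(match.group("payload"))
    return {
        "host": match.group("host"),
        "attempts": result.get("attempts"),
        "rc": result.get("rc"),
        "status": result.get("stdout", "").strip(),
    }


# Sample line taken from the log extract in this report.
SAMPLE = (
    '2020-04-27 09:54:45,289 p=12140 u=sysadmin | fatal: [localhost]: FAILED! => '
    '{"attempts": 30, "changed": true, '
    '"cmd": "source /etc/platform/openrc; system application-show cert-manager '
    '--column status --format value", '
    '"delta": "0:00:01.708557", "end": "2020-04-27 09:54:45.272058", "rc": 0, '
    '"start": "2020-04-27 09:54:43.563501", "stderr": "", "stderr_lines": [], '
    '"stdout": "apply-failed", "stdout_lines": ["apply-failed"]}'
)
```

Note that `rc` is 0 here: the `system application-show` command itself succeeded, and the task failed only because the reported status was still `apply-failed` after 30 attempts.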

Test Activity
-------------
Sanity (for layered build).

Frank Miller (sensfan22) wrote :

The log file at http://paste.openstack.org/show/792770/ is incomplete.
It contains only the first 592 lines, all of which are OK, and does not show where the failure occurs. Please provide the full set of logs.

Ghada Khalil (gkhalil)
summary: - Apply-failed for cert-manager
+ Layered Build: Apply-failed for cert-manager
Changed in starlingx:
status: New → Incomplete
Nicolae Jascanu (njascanu-intel) wrote : Re: Layered Build: Apply-failed for cert-manager

We are seeing the same issue on the sanity validation for image 20200427T233018Z.
I've collected debug info from SIMPLEX and DUPLEX and uploaded it to: https://files.starlingx.kube.cengn.ca/launchpad/1875449

controller-0:~$ cat /etc/build.info
###
### StarlingX
### Built from master
###

OS="centos"
SW_VERSION="20.01"
BUILD_TARGET="Host Installer"
BUILD_TYPE="Formal"
BUILD_ID="20200427T233018Z"

JOB="STX_build_layer_flock_master_master"
<email address hidden>"
BUILD_NUMBER="98"
BUILD_HOST="starlingx_mirror"
BUILD_DATE="2020-04-27 23:30:18 +0000"

FLOCK_OS="centos"
FLOCK_JOB="STX_build_layer_flock_master_master"
<email address hidden>"
FLOCK_BUILD_NUMBER="98"
FLOCK_BUILD_HOST="starlingx_mirror"
FLOCK_BUILD_DATE="2020-04-27 23:30:18 +0000"

Ghada Khalil (gkhalil)
tags: added: stx.build
Frank Miller (sensfan22) wrote :

The collect files may be helpful, but first we need the ansible.log file that was created when you ran the playbook that applies cert-manager. If the playbook was run on the controller, please provide the ansible.log file from /home/sysadmin. If Ansible was run remotely, provide it from the user's home directory on the remote server that was used.

Ghada Khalil (gkhalil)
tags: added: stx.config
removed: stx.build
Frank Miller (sensfan22)
Changed in starlingx:
status: Incomplete → Invalid
assignee: nobody → Nicolae Jascanu (njascanu-intel)
Frank Miller (sensfan22) wrote :

The Ansible failure occurs because the cert-manager images are not found in the private registry being used:

From sysinv.log:
sysinv 2020-04-28 08:26:08.397 118990 ERROR sysinv.conductor.kube_app [-] Image 192.168.100.60/jetstack/cert-manager-cainjector:v0.15.0-alpha.1 download failed from public/privateregistry: 404 Client Error: Not Found ("manifest for 192.168.100.60/jetstack/cert-manager-cainjector:v0.15.0-alpha.1 not found"): NotFound: 404 Client Error: Not Found ("manifest for 192.168.100.60/jetstack/cert-manager-cainjector:v0.15.0-alpha.1 not found")
sysinv 2020-04-28 08:26:08.405 118990 ERROR sysinv.conductor.kube_app [-] Image 192.168.100.60/jetstack/cert-manager-acmesolver:v0.15.0-alpha.1 download failed from public/privateregistry: 404 Client Error: Not Found ("manifest for 192.168.100.60/jetstack/cert-manager-acmesolver:v0.15.0-alpha.1 not found"): NotFound: 404 Client Error: Not Found ("manifest for 192.168.100.60/jetstack/cert-manager-acmesolver:v0.15.0-alpha.1 not found")
sysinv 2020-04-28 08:26:08.573 118990 ERROR sysinv.conductor.kube_app [-] Image 192.168.100.60/jetstack/cert-manager-webhook:v0.15.0-alpha.1 download failed from public/privateregistry: 404 Client Error: Not Found ("manifest for 192.168.100.60/jetstack/cert-manager-webhook:v0.15.0-alpha.1 not found"): NotFound: 404 Client Error: Not Found ("manifest for 192.168.100.60/jetstack/cert-manager-webhook:v0.15.0-alpha.1 not found")
sysinv 2020-04-28 08:26:08.690 118990 ERROR sysinv.conductor.kube_app [-] Image 192.168.100.60/jetstack/cert-manager-controller:v0.15.0-alpha.1 download failed from public/privateregistry: 404 Client Error: Not Found ("manifest for 192.168.100.60/jetstack/cert-manager-controller:v0.15.0-alpha.1 not found"): NotFound: 404 Client Error: Not Found ("manifest for 192.168.100.60/jetstack/cert-manager-controller:v0.15.0-alpha.1 not found")

Please update your private registry with the new images, as indicated in the email from Sabeel Ansari on April 28.
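To check which images a registry is missing, the "download failed" errors in sysinv.log can be scanned for image references. This is a rough sketch only: the regular expression is derived from the four log lines above and may not match other sysinv releases, the helper name `missing_images` is made up for illustration, and the sample lines are shortened copies of the real entries.

```python
import re

# Assumed pattern, based only on the sysinv.log lines quoted in this report.
IMAGE_ERROR_RE = re.compile(
    r"ERROR sysinv\.conductor\.kube_app \[-\] Image (?P<image>\S+) download failed"
)


def missing_images(log_lines):
    """Return the image references that failed to download, in log order."""
    found = []
    for line in log_lines:
        match = IMAGE_ERROR_RE.search(line)
        if match and match.group("image") not in found:
            found.append(match.group("image"))
    return found


# Shortened copies of two entries from the sysinv.log extract, plus one
# unrelated, made-up line that the pattern should ignore.
SAMPLE_LINES = [
    "sysinv 2020-04-28 08:26:08.397 118990 ERROR sysinv.conductor.kube_app [-] "
    "Image 192.168.100.60/jetstack/cert-manager-cainjector:v0.15.0-alpha.1 "
    "download failed from public/privateregistry: 404 Client Error: Not Found",
    "sysinv 2020-04-28 08:26:08.573 118990 ERROR sysinv.conductor.kube_app [-] "
    "Image 192.168.100.60/jetstack/cert-manager-webhook:v0.15.0-alpha.1 "
    "download failed from public/privateregistry: 404 Client Error: Not Found",
    "an unrelated line that is not an image download error",
]
```

Calling `missing_images(SAMPLE_LINES)` yields the two image references with the `192.168.100.60` registry prefix, which can then be compared against the contents of the private registry.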

Ghada Khalil (gkhalil) wrote :

The required images are:
quay.io/jetstack/cert-manager-controller:v0.15.0-alpha.1
quay.io/jetstack/cert-manager-webhook:v0.15.0-alpha.1
quay.io/jetstack/cert-manager-cainjector:v0.15.0-alpha.1
quay.io/jetstack/cert-manager-acmesolver:v0.15.0-alpha.1
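One possible way to populate a private registry with these images is to pull each one from quay.io, retag it for the registry, and push it. The sketch below only generates the `docker pull`, `docker tag` and `docker push` command lines and does not run them. The target naming (upstream host replaced by the registry address) is inferred from the sysinv.log errors, the registry address 192.168.100.60 is the one from that log, and the helper name `mirror_commands` is hypothetical. A real registry may also need `docker login` or an insecure-registry setting, which is not covered here.

```python
# Image list from the comment above, written as standard image references.
REQUIRED_IMAGES = [
    "quay.io/jetstack/cert-manager-controller:v0.15.0-alpha.1",
    "quay.io/jetstack/cert-manager-webhook:v0.15.0-alpha.1",
    "quay.io/jetstack/cert-manager-cainjector:v0.15.0-alpha.1",
    "quay.io/jetstack/cert-manager-acmesolver:v0.15.0-alpha.1",
]


def mirror_commands(images, registry):
    """Build pull/tag/push command lines for each image (nothing is executed)."""
    commands = []
    for source in images:
        # Drop the upstream host (text before the first "/") and keep the
        # repository path and tag, e.g. "jetstack/cert-manager-webhook:v0...".
        _, _, path = source.partition("/")
        target = f"{registry}/{path}"
        commands.append(f"docker pull {source}")
        commands.append(f"docker tag {source} {target}")
        commands.append(f"docker push {target}")
    return commands


if __name__ == "__main__":
    # Registry address assumed from the sysinv.log errors in this report.
    for command in mirror_commands(REQUIRED_IMAGES, "192.168.100.60"):
        print(command)
```

Printing the commands instead of executing them keeps the sketch safe to run anywhere; the output can be reviewed and then pasted into a shell on a host that can reach both quay.io and the private registry.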

summary: - Layered Build: Apply-failed for cert-manager
+ Apply-failed for cert-manager
Changed in starlingx:
importance: Undecided → Critical