DC: create pod with image from external registry failed on subclouds

Bug #1891377 reported by Andy
This bug affects 1 person
Affects    Status        Importance  Assigned to  Milestone
StarlingX  Fix Released  Medium      Andy

Bug Description

Brief Description
-----------------
In a subcloud of a DC system, creating a pod with an image from an external registry that uses a CA-signed certificate fails with an image pull error. containerd reports that the registry's certificate is signed by an unknown authority, even though the CA certificate has been installed in the subcloud.

Severity
--------
Critical

Steps to Reproduce
------------------
- In the subcloud, install the CA certificate that signed the registry's certificate with:
  system certificate-install -m ssl_ca <CA cert>
- Create a pod with an image from the registry, using either the command line or a YAML file
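For the second step, one option is to generate a minimal Pod object and pipe it to `kubectl apply -f -`, which accepts JSON as well as YAML. This is a sketch: the helper name `pod_manifest`, the pod name `hellokitty` and the `default` namespace are hypothetical; only the image reference comes from the log in this report.

```python
import json


def pod_manifest(name: str, image: str, namespace: str = "default") -> dict:
    """Build a minimal v1 Pod that pulls `image` (hypothetical helper)."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # A single container; the pull of this image is what fails in the bug.
            "containers": [{"name": name, "image": image}],
        },
    }


if __name__ == "__main__":
    manifest = pod_manifest("hellokitty", "registry.central:9001/gwaines/hellokitty:v1.0")
    print(json.dumps(manifest, indent=2))
```

Usage: `python3 make_pod.py | kubectl apply -f -` (the script file name is arbitrary).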

Expected Behavior
------------------
The pod launches successfully.

Actual Behavior
----------------
The pod fails to launch:

- kubectl get pod -n <namespace> shows the pod in the "ErrImagePull" state
- kubectl describe pod <pod name> -n <namespace> shows the error:
  x509: certificate signed by unknown authority
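The same state can be checked programmatically from `kubectl get pod <pod name> -n <namespace> -o json`. The sketch below assumes the standard Pod status layout (`status.containerStatuses[].state.waiting.reason`); the function names are hypothetical, and `ImagePullBackOff` is included because kubelet moves to that reason between pull retries.

```python
import json
import subprocess

# Waiting reasons that indicate an image pull problem.
PULL_ERRORS = {"ErrImagePull", "ImagePullBackOff"}


def image_pull_failures(pod: dict) -> list:
    """Return (container, reason, message) for each container stuck on an image pull."""
    failures = []
    for status in pod.get("status", {}).get("containerStatuses", []):
        waiting = status.get("state", {}).get("waiting")
        if waiting and waiting.get("reason") in PULL_ERRORS:
            failures.append((status["name"], waiting["reason"], waiting.get("message", "")))
    return failures


def get_pod(name: str, namespace: str) -> dict:
    """Fetch the Pod object via kubectl (requires Python 3.7+ and cluster access)."""
    out = subprocess.run(
        ["kubectl", "get", "pod", name, "-n", namespace, "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)
```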

Reproducibility
---------------
100% reproducible

System Configuration
--------------------
Any

Branch/Pull Time/Commit
-----------------------
stx master

Last Pass
---------
Unknown

Timestamp/Logs
--------------

Warning Failed 57s (x4 over 2m34s) kubelet, controller-0 Failed to pull image "registry.central:9001/gwaines/hellokitty:v1.0": rpc error: code = Unknown desc = failed to pull and unpack image "registry.central:9001/gwaines/hellokitty:v1.0": failed to resolve reference "registry.central:9001/gwaines/hellokitty:v1.0": failed to do request: Head https://registry.central:9001/v2/gwaines/hellokitty/manifests/v1.0: x509: certificate signed by unknown authority

Note: The above log is from a DC system where I set up registry.central as the external registry, using a CA-signed certificate.
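When triaging many such events, it can help to extract the image reference and the x509 error from the kubelet message. This is a sketch that assumes the message format shown in the log above; the regular expressions and the function name are illustrative, not part of any Kubernetes tooling.

```python
import re
from typing import Optional, Tuple

# Patterns matched against kubelet "Failed to pull image" event text (format as in this report).
IMAGE_RE = re.compile(r'Failed to pull image "([^"]+)"')
X509_RE = re.compile(r"x509: (.+)$")


def summarize_pull_error(event: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (image reference, x509 error text); None for any part not found."""
    image = IMAGE_RE.search(event)
    cause = X509_RE.search(event)
    return (
        image.group(1) if image else None,
        cause.group(1).strip() if cause else None,
    )
```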

Test Activity
-------------
Developer Testing

Workaround
----------
Restart containerd after the CA certificate is installed:
  systemctl restart containerd
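The workaround only matters if the CA bundle changed after containerd started. The sketch below restarts containerd only in that case. Assumptions: the bundle path is the CentOS-style `/etc/pki/tls/certs/ca-bundle.crt` (the report does not name the file), certificate installation rewrites that file, and `ps -o etimes=` is available (procps-ng); the function names are hypothetical.

```python
import os
import subprocess
import time

# Assumption: CentOS-style consolidated trust bundle; adjust for other distributions.
CA_BUNDLE = "/etc/pki/tls/certs/ca-bundle.crt"


def needs_restart(service_started_at: float, bundle_mtime: float) -> bool:
    """True if the bundle (epoch seconds) changed after the service started."""
    return bundle_mtime > service_started_at


def service_start_time(unit: str) -> float:
    """Approximate start time (epoch seconds) of the unit's main process."""
    # `systemctl show -p MainPID <unit>` prints e.g. "MainPID=1234".
    out = subprocess.run(["systemctl", "show", "-p", "MainPID", unit],
                         check=True, capture_output=True, text=True).stdout
    pid = int(out.strip().split("=", 1)[1])
    if pid == 0:
        raise RuntimeError(f"{unit} is not running")
    # `ps -o etimes=` prints the seconds elapsed since the process started.
    elapsed = subprocess.run(["ps", "-o", "etimes=", "-p", str(pid)],
                             check=True, capture_output=True, text=True).stdout
    return time.time() - int(elapsed.strip())


def restart_if_stale(unit: str = "containerd", bundle: str = CA_BUNDLE) -> bool:
    """Restart `unit` if the CA bundle is newer than the running process."""
    # getmtime follows symlinks, so a symlinked bundle reports the real file's time.
    if needs_restart(service_start_time(unit), os.path.getmtime(bundle)):
        subprocess.run(["systemctl", "restart", unit], check=True)
        return True
    return False
```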

Andy (andy.wrs)
Changed in starlingx:
assignee: nobody → Andy (andy.wrs)
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix proposed to stx-puppet (master)

Fix proposed to branch: master
Review: https://review.opendev.org/745949

Ghada Khalil (gkhalil)
tags: added: stx.5.0 stx.distcloud
Ghada Khalil (gkhalil) wrote :

stx.5.0 / medium priority as there is a workaround

Changed in starlingx:
importance: Undecided → Medium
OpenStack Infra (hudson-openstack) wrote : Fix merged to stx-puppet (master)

Reviewed: https://review.opendev.org/745949
Committed: https://git.openstack.org/cgit/starlingx/stx-puppet/commit/?id=b88aa4ed31267d046673d39e487e83552a7806f1
Submitter: Zuul
Branch: master

commit b88aa4ed31267d046673d39e487e83552a7806f1
Author: Andy Ning <email address hidden>
Date: Wed Aug 12 14:25:02 2020 -0400

    Restart containerd after CA certificate is installed

    containerd doesn't reload the system trusted CA bundle after new CA
    certificates are installed. As a result, launching a pod from a
    registry whose signing CA certificate has been installed fails with an
    image pull error, because containerd can't validate the registry's
    certificate.

    This change updates the platform config runtime puppet manifest to
    restart containerd so that it reloads the updated trusted CA bundle
    after new CA certificates are installed.

    Change-Id: Ifbe2ec1190299fc5b6505347a37d4e4f232993be
    Closes-Bug: 1891377
    Signed-off-by: Andy Ning <email address hidden>
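The commit message above describes a daemon that keeps using the trusted CA set it loaded at startup. One plausible underlying reason (not stated in the report) is that containerd is a Go program, and Go's crypto/x509 package loads the system root pool only once per process. The toy model below, which is not containerd's code, shows why installing a CA does not help until the process reloads (in practice, restarts). Each "certificate" is reduced to a name per line purely for illustration.

```python
import os
import tempfile


class TrustStore:
    """Toy trust store that snapshots a CA bundle file when constructed.

    Models a long-running daemon that reads the CA bundle once at startup.
    """

    def __init__(self, bundle_path: str):
        self.bundle_path = bundle_path
        self.reload()

    def reload(self) -> None:
        # Re-read the bundle; in the real system, restarting containerd plays this role.
        with open(self.bundle_path) as f:
            self._cas = {line.strip() for line in f if line.strip()}

    def trusts(self, ca_name: str) -> bool:
        return ca_name in self._cas


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        bundle = os.path.join(d, "ca-bundle.txt")
        with open(bundle, "w") as f:
            f.write("Public Root CA\n")
        store = TrustStore(bundle)            # daemon starts
        with open(bundle, "a") as f:          # CA certificate installed afterwards
            f.write("Registry Signing CA\n")
        print(store.trusts("Registry Signing CA"))  # False: stale snapshot
        store.reload()                        # equivalent of the restart
        print(store.trusts("Registry Signing CA"))  # True
```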

Changed in starlingx:
status: In Progress → Fix Released
Bart Wensley (bartwensley) wrote :

The fix caused sanity to fail and has been reverted.

Changed in starlingx:
status: Fix Released → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix proposed to stx-puppet (master)

Fix proposed to branch: master
Review: https://review.opendev.org/747469

OpenStack Infra (hudson-openstack) wrote : Fix merged to stx-puppet (master)

Reviewed: https://review.opendev.org/747469
Committed: https://git.openstack.org/cgit/starlingx/stx-puppet/commit/?id=43646466b08eea34a969efd4d52536203e046633
Submitter: Zuul
Branch: master

commit 43646466b08eea34a969efd4d52536203e046633
Author: Andy Ning <email address hidden>
Date: Fri Aug 21 15:37:25 2020 -0400

    Restart containerd after CA certificate is installed

    This is an updated version of commit
    b88aa4ed31267d046673d39e487e83552a7806f1. The docker.service is actually
    set up such that when containerd is restarted by systemd, docker is
    restarted as well.

    So this change removes the docker restart, leaving only the containerd
    restart. This fixes the containerd/docker continuous restart issue
    caused by the previous commit.

    Change-Id: I444849eb1aa6c86d79f32a1ec64964637fc4efb4
    Closes-Bug: 1891377
    Signed-off-by: Andy Ning <email address hidden>
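The commit message above notes that restarting containerd already restarts docker. The sketch below models that, assuming docker.service declares a restart-propagating dependency on containerd.service such as systemd's `PartOf=` (the commit does not say which directive is used). The dependency map and function name are hypothetical. Under this model an explicit docker restart is redundant; the report does not detail why the extra restart produced a loop, so the sketch shows only the redundancy.

```python
from collections import deque

# Hypothetical map: unit -> units that declare PartOf= on it (restart propagates to them).
PART_OF_DEPENDENTS = {"containerd": ["docker"], "docker": []}


def restart_set(unit: str, dependents: dict) -> set:
    """Units restarted by `systemctl restart <unit>`, following propagation transitively."""
    seen = {unit}
    queue = deque([unit])
    while queue:
        current = queue.popleft()
        for dep in dependents.get(current, []):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen
```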

Changed in starlingx:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to stx-puppet (f/centos8)

Fix proposed to branch: f/centos8
Review: https://review.opendev.org/762919
