User cannot launch pods on subcloud using image from registry.central

Bug #1887392 reported by Bart Wensley
This bug affects 1 person
Affects: StarlingX
Status: Fix Released
Importance: Medium
Assigned to: Jerry Sun
Milestone: (none listed)

Bug Description

Brief Description
-----------------
User cannot launch pods on subcloud using image from registry.central. The kubelet fails to pull the image with the following error:
Failed to pull image "registry.central:9001/busybox:latest": rpc error: code = Unknown desc = failed to pull and unpack image "registry.central:9001/busybox:latest": failed to resolve reference "registry.central:9001/busybox:latest": failed to do request: Head https://registry.central:9001/v2/busybox/manifests/latest: x509: certificate signed by unknown authority

Severity
--------
Major: This functionality is completely broken.

Steps to Reproduce
------------------
1. Push an image to registry.central on the system controller.
2. Create a secret on a subcloud with the credentials for the registry.central.
3. Attempt to create a pod on a subcloud using the image from registry.central and the secret.

Expected Behavior
------------------
The pod launches successfully.

Actual Behavior
----------------
The pod gets stuck in ImagePullBackOff because it cannot pull the image.
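
One way to confirm this state from a script is to inspect the pod's container statuses. The sketch below assumes the JSON shape produced by `kubectl get pod <name> -o json`; the sample pod is hand-written for illustration and is not output captured from the affected system, and the function name is hypothetical.

```python
import json

# Waiting reasons that indicate the kubelet could not pull the image.
PULL_FAILURE_REASONS = ("ImagePullBackOff", "ErrImagePull")


def image_pull_failures(pod):
    """Return (container, image, reason) for each container stuck on an image pull."""
    failures = []
    statuses = pod.get("status", {}).get("containerStatuses", [])
    for status in statuses:
        waiting = status.get("state", {}).get("waiting")
        if waiting and waiting.get("reason") in PULL_FAILURE_REASONS:
            failures.append((status.get("name"), status.get("image"), waiting["reason"]))
    return failures


# Illustrative sample only (not a captured log).
SAMPLE_POD = json.loads("""
{
  "status": {
    "containerStatuses": [
      {
        "name": "busybox-central",
        "image": "registry.central:9001/busybox:latest",
        "state": {"waiting": {"reason": "ImagePullBackOff"}}
      }
    ]
  }
}
""")

for name, image, reason in image_pull_failures(SAMPLE_POD):
    print(f"{name}: {reason} while pulling {image}")
```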

Reproducibility
---------------
Reproducible

System Configuration
--------------------
Distributed Cloud

Branch/Pull Time/Commit
-----------------------
Designer load built from starlingx master on July 10, 2020.

Last Pass
---------
Unknown - not sure if anyone has ever tested this before.

Timestamp/Logs
--------------
See above.

Test Activity
-------------
Developer Testing

Workaround
----------
Use the local registry instead of the central registry.

tags: added: stx.containers stx.distcloud
Bart Wensley (bartwensley) wrote :

Here is the yaml file I used:

# cat busybox-central.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: busybox-central
  namespace: default
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  selector:
    matchLabels:
      run: busybox-central
  template:
    metadata:
      labels:
        run: busybox-central
    spec:
      containers:
      - args:
        - sh
        image: registry.central:9001/busybox:latest
        imagePullPolicy: Always
        name: busybox-central
        stdin: true
        tty: true
      restartPolicy: Always
      imagePullSecrets:
      - name: testuser-central-registry-secret

Here is the command I used to create the secret:

# kubectl create secret docker-registry testuser-central-registry-secret \
--docker-server=registry.central:9001 --docker-username=admin \
--docker-password=<PASSWORD> --<email address hidden>
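
For reference, a secret of this type stores a base64-encoded `.dockerconfigjson` payload keyed by the registry server. The following is a minimal sketch of that structure, not the kubectl implementation; the function names and the password value are hypothetical placeholders.

```python
import base64
import json


def docker_config_json(server, username, password):
    """Build the docker config payload that a docker-registry secret carries."""
    auth = base64.b64encode(f"{username}:{password}".encode()).decode()
    entry = {"username": username, "password": password, "auth": auth}
    return json.dumps({"auths": {server: entry}})


def secret_manifest(name, server, username, password, namespace="default"):
    """Return a Secret manifest (as a dict) of type kubernetes.io/dockerconfigjson."""
    payload = docker_config_json(server, username, password)
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name, "namespace": namespace},
        "type": "kubernetes.io/dockerconfigjson",
        "data": {".dockerconfigjson": base64.b64encode(payload.encode()).decode()},
    }


# "example-password" is a placeholder, not a real credential.
manifest = secret_manifest(
    "testuser-central-registry-secret",
    "registry.central:9001",
    "admin",
    "example-password",
)
print(json.dumps(manifest, indent=2))
```

Note that the secret only carries credentials. It does not make the container runtime trust the registry's certificate, which is why the pull can still fail with an x509 error.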

Greg Waines (greg-waines) wrote :

Debugged this with Bart.

'docker pull registry.central:9001/busybox:latest' WORKS.
'crictl pull registry.central:9001/busybox:latest' does NOT work.

The Docker client checks here for registry certs that should be TRUSTED:
/etc/docker/certs.d/<hostname>[:<port>]/<name>.crt
There can be several.
E.g. on subcloud,
/etc/docker/certs.d/registry.local\:9001/registry-cert.crt
/etc/docker/certs.d/registry.central\:9001/registry-cert.crt
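
A quick way to see which certs the Docker client would pick up for a given registry is to list the `.crt` files in its directory. This is a minimal sketch assuming the `/etc/docker/certs.d/<hostname>[:<port>]/` layout described above; the helper names are hypothetical.

```python
from pathlib import Path

DEFAULT_ROOT = "/etc/docker/certs.d"


def docker_cert_dir(registry, root=DEFAULT_ROOT):
    """Directory the Docker client consults for a registry such as 'registry.central:9001'."""
    return Path(root) / registry


def trusted_certs(registry, root=DEFAULT_ROOT):
    """Return the sorted names of the .crt files found for the registry (empty if none)."""
    cert_dir = docker_cert_dir(registry, root)
    if not cert_dir.is_dir():
        return []
    return sorted(path.name for path in cert_dir.glob("*.crt"))


for registry in ("registry.local:9001", "registry.central:9001"):
    print(registry, trusted_certs(registry))
```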

crictl talks to containerd, which uses the entries in its config.toml file to specify certs that should be TRUSTED:
e.g.
[plugins.cri.registry.configs."registry.local:9001".tls]
      ca_file = "/etc/docker/certs.d/registry.local:9001/registry-cert.crt"
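
The entry format can be rendered for any registry with a small helper. This is only a sketch: the function name is hypothetical, and the default `ca_file` location assumes the same `certs.d` layout used in the example above.

```python
def tls_entry(registry, ca_file=None):
    """Render a containerd CRI registry TLS stanza for the given registry."""
    if ca_file is None:
        # Assumption: the cert lives in the Docker certs.d directory for this registry.
        ca_file = f"/etc/docker/certs.d/{registry}/registry-cert.crt"
    return (
        f'[plugins.cri.registry.configs."{registry}".tls]\n'
        f'      ca_file = "{ca_file}"\n'
    )


print(tls_entry("registry.local:9001"), end="")
```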

However on subcloud there is ONLY an entry for registry.local ...
i.e. there is NO ENTRY for registry.central ... where there SHOULD be
e.g.
[plugins.cri.registry.configs."registry.central:9001".tls]
      ca_file = "/etc/docker/certs.d/registry.central:9001/registry-cert.crt"
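
The missing entry can be detected by scanning the config text for registry TLS tables. The sketch below is a rough line scan rather than a full TOML parser, and it assumes the `plugins.cri.registry.configs` table naming shown above; the function names are hypothetical.

```python
import re

# Matches headers such as: [plugins.cri.registry.configs."registry.local:9001".tls]
TLS_HEADER = re.compile(r'^\s*\[plugins\.cri\.registry\.configs\."([^"]+)"\.tls\]\s*$')
ANY_HEADER = re.compile(r'^\s*\[')
CA_FILE = re.compile(r'^\s*ca_file\s*=\s*"([^"]*)"\s*$')


def trusted_registries(config_text):
    """Return {registry: ca_file} for every registry TLS table that sets ca_file."""
    found = {}
    current = None
    for line in config_text.splitlines():
        header = TLS_HEADER.match(line)
        if header:
            current = header.group(1)
            continue
        if ANY_HEADER.match(line):
            current = None  # some other table started
            continue
        ca = CA_FILE.match(line)
        if ca and current is not None:
            found[current] = ca.group(1)
    return found


def missing_registries(config_text, wanted):
    """Return the registries from `wanted` that have no ca_file entry."""
    trusted = trusted_registries(config_text)
    return [name for name in wanted if name not in trusted]


# Mirrors the subcloud state described above: only registry.local is present.
SUBCLOUD_CONFIG = '''
[plugins.cri.registry.configs."registry.local:9001".tls]
      ca_file = "/etc/docker/certs.d/registry.local:9001/registry-cert.crt"
'''

print(missing_registries(SUBCLOUD_CONFIG, ["registry.local:9001", "registry.central:9001"]))
```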

Andy (andy.wrs) wrote :

I think this is because we stopped synchronizing the docker_registry cert in DC systems.

Ghada Khalil (gkhalil)
Changed in starlingx:
assignee: nobody → Jerry Sun (jerry-sun-u)
Ghada Khalil (gkhalil) wrote :

stx.5.0 / medium priority - the workaround is to use the local registry on the subclouds

Changed in starlingx:
importance: Undecided → Medium
status: New → Triaged
tags: added: stx.5.0
Jerry Sun (jerry-sun-u)
Changed in starlingx:
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix proposed to stx-puppet (master)

Fix proposed to branch: master
Review: https://review.opendev.org/743400

OpenStack Infra (hudson-openstack) wrote : Fix proposed to ansible-playbooks (master)

Fix proposed to branch: master
Review: https://review.opendev.org/746790

OpenStack Infra (hudson-openstack) wrote : Fix merged to stx-puppet (master)

Reviewed: https://review.opendev.org/743400
Committed: https://git.openstack.org/cgit/starlingx/stx-puppet/commit/?id=9a856058c25285fa38afeb7c0257f8fd82be2d58
Submitter: Zuul
Branch: master

commit 9a856058c25285fa38afeb7c0257f8fd82be2d58
Author: Jerry Sun <email address hidden>
Date: Mon Jul 27 16:23:27 2020 -0400

    Update containerd config with registry.central cert

    Registry.central's certificate is not in the containerd config file.
    This causes failures when deploying kubernetes pods that uses images
    from registry.central in a distributed cloud deployment.
    Registry.central's certificate could be self signed. This commit
    specifies registry.central's certificate to be trusted by containerd.

    Closes-bug: 1887392

    Change-Id: I8debac4fb8b994deafc02416c500a1d412a8aeca
    Signed-off-by: Jerry Sun <email address hidden>

Changed in starlingx:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix merged to ansible-playbooks (master)

Reviewed: https://review.opendev.org/746790
Committed: https://git.openstack.org/cgit/starlingx/ansible-playbooks/commit/?id=4c88426af0c18b0a222dd2dd31f36de89205ec2e
Submitter: Zuul
Branch: master

commit 4c88426af0c18b0a222dd2dd31f36de89205ec2e
Author: Jerry Sun <email address hidden>
Date: Tue Aug 18 15:06:06 2020 -0400

    Update containerd config with registry.central cert

    Registry.central's certificate is not in the containerd config file.
    This causes failures when deploying kubernetes pods that uses images
    from registry.central in a distributed cloud deployment.
    Registry.central's certificate could be self signed. This commit
    updates the config file during ansible bootstrap.

    Depends-On: https://review.opendev.org/#/c/743400/
    Partial-bug: 1887392

    Change-Id: Id641e932f6f752339b318b3fd3e1360c4d278f00
    Signed-off-by: Jerry Sun <email address hidden>

OpenStack Infra (hudson-openstack) wrote : Fix proposed to stx-puppet (f/centos8)

Fix proposed to branch: f/centos8
Review: https://review.opendev.org/762919
