Application apply failed at waiting for ingress pods due to fetch token failure

Bug #1881353 reported by Ghada Khalil
This bug affects 2 people
Affects: StarlingX
Status: Fix Released
Importance: High
Assigned to: Bob Church
Milestone: (none listed)

Bug Description

Brief Description
-----------------
Application apply failed at waiting for ingress pods due to fetch token failure

Severity
--------
Major

Steps to Reproduce
------------------
- Install and configure system with openstack
- Lock and unlock both controllers
- Configure additional applications or attempt to re-apply openstack

Expected Behavior
------------------
The new apply works

Actual Behavior
----------------
The apply fails with this error in the logs:
Warning FailedCreatePodSandBox 4m37s (x70 over 19m) kubelet, controller-0 Failed to create pod sandbox: rpc error: code = Unknown desc = failed to get sandbox image "registry.local:9001/k8s.gcr.io/pause:3.2": failed to pull image "registry.local:9001/k8s.gcr.io/pause:3.2": failed to pull and unpack image "registry.local:9001/k8s.gcr.io/pause:3.2": failed to resolve reference "registry.local:9001/k8s.gcr.io/pause:3.2": failed to authorize: failed to fetch anonymous token: unexpected status: 401 Unauthorized

Reproducibility
---------------
Intermittent - but seen frequently

System Configuration
--------------------
any

Branch/Pull Time/Commit
-----------------------
2020-05-13_20-00-00 and more recent builds

Last Pass
---------
Unknown. This appears to be related to the number of images in the system.

Timestamp/Logs
--------------
The logs show an image pull failure as follows:
Warning FailedCreatePodSandBox 4m37s (x70 over 19m) kubelet, controller-0 Failed to create pod sandbox: rpc error: code = Unknown desc = failed to get sandbox image "registry.local:9001/k8s.gcr.io/pause:3.2": failed to pull image "registry.local:9001/k8s.gcr.io/pause:3.2": failed to pull and unpack image "registry.local:9001/k8s.gcr.io/pause:3.2": failed to resolve reference "registry.local:9001/k8s.gcr.io/pause:3.2": failed to authorize: failed to fetch anonymous token: unexpected status: 401 Unauthorized
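
When triaging similar reports, it can help to confirm that an event matches this exact failure signature (a pull of an image that ends in a 401 on the anonymous token fetch). The following is a minimal sketch, not part of the fix; the helper name and the regular expression are assumptions inferred only from the single event quoted above.

```python
import re

# Hypothetical pattern, inferred from the one kubelet event quoted in this bug.
_PULL_AUTH_FAILURE = re.compile(
    r'failed to pull image "(?P<image>[^"]+)".*'
    r'failed to fetch anonymous token: unexpected status: 401 Unauthorized'
)


def find_unauthorized_pull(event_text):
    """Return the image reference if the event reports a 401 pull failure, else None."""
    match = _PULL_AUTH_FAILURE.search(event_text)
    return match.group("image") if match else None


# The event message from this report, split across lines for readability.
event = (
    'Failed to create pod sandbox: rpc error: code = Unknown desc = '
    'failed to get sandbox image "registry.local:9001/k8s.gcr.io/pause:3.2": '
    'failed to pull image "registry.local:9001/k8s.gcr.io/pause:3.2": '
    'failed to pull and unpack image "registry.local:9001/k8s.gcr.io/pause:3.2": '
    'failed to resolve reference "registry.local:9001/k8s.gcr.io/pause:3.2": '
    'failed to authorize: failed to fetch anonymous token: '
    'unexpected status: 401 Unauthorized'
)

print(find_unauthorized_pull(event))
```

Other containerd versions may word the error differently, so the pattern would likely need adjusting before use elsewhere.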

Test Activity
-------------
Testing

Workaround
----------
None

Ghada Khalil (gkhalil)
tags: added: stx. stx.containers
tags: added: stx.4.0
removed: stx.
Changed in starlingx:
assignee: nobody → Bob Church (rchurch)
description: updated
Changed in starlingx:
importance: Undecided → High
status: New → Triaged
Brent Rowsell (brent-rowsell) wrote :

The containerd config file needs to be updated to add the admin credentials:

config.toml

[plugins.cri.registry.configs."registry.local:9001".auth]
      username = ""
      password = ""
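
As an illustration of the auth section above, here is a minimal sketch of how it could be rendered once the registry credentials are known. This is not the code from the merged fix: the function names are hypothetical, and the credential values in the usage line are placeholders, not real defaults.

```python
def toml_basic_string(value):
    """Quote a value as a TOML basic string, escaping backslashes and quotes.

    Assumption: the value contains no control characters.
    """
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def render_registry_auth(registry, username, password):
    """Render the containerd CRI auth section for one registry (hypothetical helper)."""
    header = '[plugins.cri.registry.configs."{}".auth]'.format(registry)
    lines = [
        header,
        "      username = " + toml_basic_string(username),
        "      password = " + toml_basic_string(password),
    ]
    return "\n".join(lines) + "\n"


# Placeholder credentials for illustration only.
print(render_registry_auth("registry.local:9001", "admin", "example-password"))
```

Escaping the values matters because a password containing a double quote or backslash would otherwise produce an invalid config.toml.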

OpenStack Infra (hudson-openstack) wrote : Fix proposed to stx-puppet (master)

Fix proposed to branch: master
Review: https://review.opendev.org/732724

Changed in starlingx:
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix proposed to ansible-playbooks (master)

Fix proposed to branch: master
Review: https://review.opendev.org/732726

OpenStack Infra (hudson-openstack) wrote : Fix merged to stx-puppet (master)

Reviewed: https://review.opendev.org/732724
Committed: https://git.openstack.org/cgit/starlingx/stx-puppet/commit/?id=7f921608842190b41e31c1cbea805d2e45ae8436
Submitter: Zuul
Branch: master

commit 7f921608842190b41e31c1cbea805d2e45ae8436
Author: Robert Church <email address hidden>
Date: Mon Jun 1 20:27:25 2020 -0400

    Update containerd registry.local configuration

    As part of bootstrap, k8s.gcr.io/pause:3.2 is pulled via crictl from
    registry.local with explicitly provided credentials. If this image is
    manually removed or removed due to garbage collection, containerd is
    unable to pull it from registry.local.

    Provide a complete registry.local configuration by:
     - Completing the configuration values for the TLS configuration.
     - Adding the auth setting for the auth configuration.

    Change-Id: I3214f68ef0b38d7267428f8a21b34ffc681e1f85
    Partial-Bug: #1881353
    Signed-off-by: Robert Church <email address hidden>

Changed in starlingx:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix merged to ansible-playbooks (master)

Reviewed: https://review.opendev.org/732726
Committed: https://git.openstack.org/cgit/starlingx/ansible-playbooks/commit/?id=f6b33a95d97e7314c1f817fef62ea94c3c6f9d73
Submitter: Zuul
Branch: master

commit f6b33a95d97e7314c1f817fef62ea94c3c6f9d73
Author: Robert Church <email address hidden>
Date: Mon Jun 1 21:36:38 2020 -0400

    Update containerd registry.local configuration

    As part of bootstrap, k8s.gcr.io/pause:3.2 is pulled via crictl from
    registry.local with explicitly provided credentials. If this image is
    manually removed or removed due to garbage collection, containerd is
    unable to pull it from registry.local.

    Lookup the registry credentials so that they can be applied to the
    registry.local auth configuration in containerd's config.toml. This will
    allow containerd pull access when needed.

    Change-Id: Ie29a797a09879d7dff28356a2335980ab6c49bed
    Depends-On: https://review.opendev.org/#/c/732724/
    Closes-Bug: #1881353
    Signed-off-by: Robert Church <email address hidden>

Ghada Khalil (gkhalil) wrote :

These commits caused an issue on duplex systems:
https://bugs.launchpad.net/starlingx/+bug/1882251

The code was reverted. Reopening this bug to rework and resubmit the original fix.

Changed in starlingx:
status: Fix Released → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix proposed to stx-puppet (master)

Fix proposed to branch: master
Review: https://review.opendev.org/733941

OpenStack Infra (hudson-openstack) wrote : Fix proposed to ansible-playbooks (master)

Fix proposed to branch: master
Review: https://review.opendev.org/733942

OpenStack Infra (hudson-openstack) wrote : Fix merged to stx-puppet (master)

Reviewed: https://review.opendev.org/733941
Committed: https://git.openstack.org/cgit/starlingx/stx-puppet/commit/?id=048a95bf156a66eced3fb1f4f308acdfb5929229
Submitter: Zuul
Branch: master

commit 048a95bf156a66eced3fb1f4f308acdfb5929229
Author: Robert Church <email address hidden>
Date: Mon Jun 1 20:27:25 2020 -0400

    Update containerd registry.local configuration

    As part of bootstrap, k8s.gcr.io/pause:3.2 is pulled via crictl from
    registry.local with explicitly provided credentials. If this image is
    manually removed or removed due to garbage collection, containerd is
    unable to pull it from registry.local.

    Provide a complete registry.local configuration by:
     - Completing the configuration values for the TLS configuration.
     - Adding the auth setting for the auth configuration.

    Change-Id: I52529bb42cda64612a1c202b250db9135241ccc0
    Partial-Bug: #1881353
    Signed-off-by: Robert Church <email address hidden>

Changed in starlingx:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix merged to ansible-playbooks (master)

Reviewed: https://review.opendev.org/733942
Committed: https://git.openstack.org/cgit/starlingx/ansible-playbooks/commit/?id=d9f0a9896f697c6b4cebaceb919eefb726a188c2
Submitter: Zuul
Branch: master

commit d9f0a9896f697c6b4cebaceb919eefb726a188c2
Author: Robert Church <email address hidden>
Date: Mon Jun 1 21:36:38 2020 -0400

    Update containerd registry.local configuration

    As part of bootstrap, k8s.gcr.io/pause:3.2 is pulled via crictl from
    registry.local with explicitly provided credentials. If this image is
    manually removed or removed due to garbage collection, containerd is
    unable to pull it from registry.local.

    Lookup the registry credentials so that they can be applied to the
    registry.local auth configuration in containerd's config.toml. This will
    allow containerd pull access when needed.

    Change-Id: I5095abbe44c4e9bab36726a336654284482e44b4
    Depends-On: https://review.opendev.org/#/c/733941/
    Closes-Bug: #1881353
    Signed-off-by: Robert Church <email address hidden>

OpenStack Infra (hudson-openstack) wrote : Fix proposed to stx-puppet (f/centos8)

Fix proposed to branch: f/centos8
Review: https://review.opendev.org/762919
