SNMP docker images are not being downloaded correctly

Bug #1952654 reported by Takamasa Takenaka
This bug affects 2 people
Affects: StarlingX
Status: Fix Released
Importance: Medium
Assigned to: Takamasa Takenaka

Bug Description

Brief Description
-----------------
The system fails to apply the snmp application. The snmp pod gets stuck in ImagePullBackOff while trying to pull the trap-subagent and subagent images.
When the container registry URL for docker.io is changed, all snmp containers are expected to be pulled from the configured URL.
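The expected behavior can be sketched as a simple prefixing rule. This is a minimal illustration only, not sysinv's implementation: the helper name `rewrite_image` and the registry host `registry.example.com:9001` are hypothetical, and the output form simply mirrors the `[configured url]/docker.io/...` references in the sysinv.log excerpt under Timestamp/Logs.

```python
DOCKER_IO = "docker.io/"


def rewrite_image(image: str, registry_url: str) -> str:
    """Hypothetical helper: redirect a docker.io image to a configured registry.

    Images explicitly hosted on docker.io get the configured registry URL
    prepended; any other reference is returned unchanged.
    """
    if image.startswith(DOCKER_IO):
        return f"{registry_url.rstrip('/')}/{image}"
    return image


# Example with a hypothetical registry host:
print(rewrite_image("docker.io/starlingx/stx-fm-subagent:stx.6.0-v1.0.3",
                    "registry.example.com:9001"))
```

Under this rule, all three snmp images (stx-snmp, stx-fm-subagent, stx-fm-trap-subagent) would resolve to the configured registry; the bug is that only stx-snmp went through the rewrite.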

Severity
--------
Critical: System/Feature is not usable after the defect

Steps to Reproduce
------------------
1. Upload snmp application:
system application-upload /usr/local/share/applications/helm/snmp-1.0-24.tgz

2. Apply the snmp application:
system application-apply snmp

Expected Behavior
------------------
Application should reach 'applied' status.

Actual Behavior
----------------
Application reaches 'apply-failed' status.

Reproducibility
---------------
Reproducible: 100% on IPv6 systems (it may also occur on IPv4).

System Configuration
--------------------
Two-node system (though it may occur on any system configuration).

Timestamp/Logs
--------------
Based on sysinv.log, only one of the three images was pulled from the configured registry:

sysinv 2021-11-25 12:05:44.169 1163158 INFO sysinv.conductor.kube_app [-] Remove image [configured url]/docker.io/starlingx/stx-snmp:stx.6.0-v1.0.1 after push to local registry.

But the other two are being pulled from docker.io instead:

kubectl describe pod -n kube-system ns-snmp-69bfddbbc7-8rsnz
  Warning Failed 4m43s kubelet Failed to pull image "docker.io/starlingx/stx-fm-trap-subagent:stx.6.0-v1.0.2": rpc error: code = Unknown desc = failed to pull and unpack image "docker.io/starlingx/stx-fm-trap-subagent:stx.6.0-v1.0.2": failed to copy: httpReaderSeeker: failed open: failed to do request: Get "https://production.cloudflare.docker.com/registry-v2/docker/registry/v2/blobs/sha256/91/9139a807109ca64d9dafef774e4d980586a5fe21b47c65e86bf3fee6b1326df7/data?verify=1637844987-8IFJT8qnyy0zCYyvjxH%2BzbcJ5yE%3D": dial tcp [2606:4700::6812:7a19]:443: connect: network is unreachable
  Warning Failed 4m30s kubelet Failed to pull image "docker.io/starlingx/stx-fm-subagent:stx.6.0-v1.0.3": rpc error: code = Unknown desc = failed to pull and unpack image "docker.io/starlingx/stx-fm-subagent:stx.6.0-v1.0.3": failed to copy: httpReaderSeeker: failed open: failed to do request: Get "https://production.cloudflare.docker.com/registry-v2/docker/registry/v2/blobs/sha256/54/544d5d06968212fa25360cfb067505ad4edf8a1ffc312d874d49e26025158b9c/data?verify=1637845000-0iRgYFQHPXRSjaNKmaUXePhZMQA%3D": dial tcp [2606:4700::6812:7919]:443: connect: network is unreachable
  Warning Failed 4m18s (x2 over 4m43s) kubelet Error: ErrImagePull
  Warning Failed 4m18s kubelet Failed to pull image "docker.io/starlingx/stx-fm-trap-subagent:stx.6.0-v1.0.2": rpc error: code = Unknown desc = failed to pull and unpack image "docker.io/starlingx/stx-fm-trap-subagent:stx.6.0-v1.0.2": failed to copy: httpReaderSeeker: failed open: failed to do request: Get "https://production.cloudflare.docker.com/registry-v2/docker/registry/v2/blobs/sha256/91/9139a807109ca64d9dafef774e4d980586a5fe21b47c65e86bf3fee6b1326df7/data?verify=1637845012-Po3pO%2B6C0OVmVEBLi32uOkRuoFs%3D": dial tcp [2606:4700::6812:7d19]:443: connect: network is unreachable
  Warning Failed 4m17s kubelet Error: ImagePullBackOff
  Normal BackOff 4m17s kubelet Back-off pulling image "docker.io/starlingx/stx-fm-trap-subagent:stx.6.0-v1.0.2"
  Warning Failed 4m17s kubelet Error: ImagePullBackOff
  Normal Pulling 4m3s (x3 over 5m7s) kubelet Pulling image "docker.io/starlingx/stx-fm-subagent:stx.6.0-v1.0.3"
  Warning Failed 3m51s (x3 over 4m55s) kubelet Error: ErrImagePull
  Normal Pulling 3m51s (x3 over 4m55s) kubelet Pulling image "docker.io/starlingx/stx-fm-trap-subagent:stx.6.0-v1.0.2"
  Warning Failed 3m51s kubelet Failed to pull image "docker.io/starlingx/stx-fm-subagent:stx.6.0-v1.0.3": rpc error: code = Unknown desc = failed to pull and unpack image "docker.io/starlingx/stx-fm-subagent:stx.6.0-v1.0.3": failed to copy: httpReaderSeeker: failed open: failed to do request: Get "https://production.cloudflare.docker.com/registry-v2/docker/registry/v2/blobs/sha256/54/544d5d06968212fa25360cfb067505ad4edf8a1ffc312d874d49e26025158b9c/data?verify=1637845039-1G8cHlH9ZCGVEYdu9e4isWi3gCA%3D": dial tcp [2606:4700::6812:7c19]:443: connect: network is unreachable
  Warning Failed 3m39s kubelet Failed to pull image "docker.io/starlingx/stx-fm-trap-subagent:stx.6.0-v1.0.2": rpc error: code = Unknown desc = failed to pull and unpack image "docker.io/starlingx/stx-fm-trap-subagent:stx.6.0-v1.0.2": failed to copy: httpReaderSeeker: failed open: failed to do request: Get "https://production.cloudflare.docker.com/registry-v2/docker/registry/v2/blobs/sha256/91/9139a807109ca64d9dafef774e4d980586a5fe21b47c65e86bf3fee6b1326df7/data?verify=1637845051-MSxAq00j8OZn9ftddiI8Br3WcFM%3D": dial tcp [2606:4700::6812:7b19]:443: connect: network is unreachable
  Normal BackOff 15s (x11 over 4m17s) kubelet Back-off pulling image "docker.io/starlingx/stx-fm-subagent:stx.6.0-v1.0.3"

Pulling these images fails whenever there is no connection to docker.io.

Test Activity
-------------
Regression Testing

Workaround
----------
Override the snmp Helm chart values directly with the configured URL for stx-fm-subagent and stx-fm-trap-subagent, then reapply the application:

system helm-override-update --set image.repository_subagent=[configured url]/docker.io/starlingx/stx-fm-subagent --reuse-values snmp snmp kube-system
system helm-override-update --set image.repository_trap_subagent=[configured url]/docker.io/starlingx/stx-fm-trap-subagent --reuse-values snmp snmp kube-system
system application-apply snmp

Changed in starlingx:
assignee: nobody → Takamasa Takenaka (ttakenak)
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote: Fix proposed to snmp-armada-app (master)
Ghada Khalil (gkhalil)
Changed in starlingx:
importance: Undecided → Critical
importance: Critical → Medium
tags: added: stx.6.0 stx.apps
OpenStack Infra (hudson-openstack) wrote: Fix merged to snmp-armada-app (master)

Reviewed: https://review.opendev.org/c/starlingx/snmp-armada-app/+/819698
Committed: https://opendev.org/starlingx/snmp-armada-app/commit/e9fa90a028af92d6cd4fcc5d59bd9b00309c2c5b
Submitter: "Zuul (22348)"
Branch: master

commit e9fa90a028af92d6cd4fcc5d59bd9b00309c2c5b
Author: Takamasa Takenaka <email address hidden>
Date: Mon Nov 29 15:55:02 2021 -0300

    Modify values.yml to match expected format

    When running system application-apply, the script downloads
    the containers and uploads them to the local registry.
    (This process includes rewriting the repository according to
    the system parameter configuration.)

    Currently, in values.yaml of the snmp armada app, only the
    stx-snmp container configuration matches the
    expected pattern, so the script downloaded
    only the stx-snmp container and not the other two.

    This fix modifies the format in values.yaml to
    match the expected pattern (no changes to the
    repository or tag values).

    Test Plan:
    PASS: Apply snmp-armada-app and confirm the following:
          - Status becomes "applied"
          - All 3 containers are downloaded from the configured repository
          - All 3 containers are in the local repository

    Closes-bug: 1952654

    Signed-off-by: Takamasa Takenaka <email address hidden>
    Change-Id: I8b742a2e211717b343f459443b15e947e5c6bd92
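The commit message does not say what the "expected pattern" is. As a hypothetical illustration only, assume an image collector that treats any mapping holding both `repository` and `tag` keys as one image. Flat keys such as `repository_subagent` (the key used in the workaround above) would then be invisible to it, which matches the symptom of only stx-snmp being downloaded. The `collect_images` function, the `tag_subagent`/`tag_trap_subagent` keys, and the nested `subagent`/`trap_subagent` layout are all invented for this sketch and may not match the real values.yaml before or after the fix.

```python
from typing import Any


def collect_images(values: Any) -> list[str]:
    """Hypothetical collector: return "repository:tag" for every mapping
    that has both a 'repository' and a 'tag' string key, at any depth."""
    found: list[str] = []
    if isinstance(values, dict):
        repo, tag = values.get("repository"), values.get("tag")
        if isinstance(repo, str) and isinstance(tag, str):
            found.append(f"{repo}:{tag}")
        for child in values.values():
            found.extend(collect_images(child))
    elif isinstance(values, list):
        for item in values:
            found.extend(collect_images(item))
    return found


# Flat per-container keys: only the stx-snmp entry matches the pattern.
before = {
    "image": {
        "repository": "docker.io/starlingx/stx-snmp",
        "tag": "stx.6.0-v1.0.1",
        "repository_subagent": "docker.io/starlingx/stx-fm-subagent",
        "tag_subagent": "stx.6.0-v1.0.3",  # hypothetical key name
        "repository_trap_subagent": "docker.io/starlingx/stx-fm-trap-subagent",
        "tag_trap_subagent": "stx.6.0-v1.0.2",  # hypothetical key name
    }
}

# Hypothetical nested layout: every container matches the pattern.
after = {
    "image": {"repository": "docker.io/starlingx/stx-snmp", "tag": "stx.6.0-v1.0.1"},
    "subagent": {"image": {"repository": "docker.io/starlingx/stx-fm-subagent",
                           "tag": "stx.6.0-v1.0.3"}},
    "trap_subagent": {"image": {"repository": "docker.io/starlingx/stx-fm-trap-subagent",
                                "tag": "stx.6.0-v1.0.2"}},
}

print(len(collect_images(before)), len(collect_images(after)))
```

The repository and tag values are identical in both layouts; only their shape differs, consistent with the commit's note that no repository or tag values changed.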

Changed in starlingx:
status: In Progress → Fix Released
Ghada Khalil (gkhalil)
tags: added: stx.fault