build-tools: docker image tagging script fails due to rate limit

Bug #2003898 reported by Davlet Panech
Affects    Status        Importance  Assigned to    Milestone
StarlingX  Fix Released  Low         Davlet Panech

Bug Description

Brief Description
-----------------
The Docker image tagging script has recently started to fail. Daily builds are not affected, but new Docker image tags can now only be created manually.

The script is:

https://opendev.org/starlingx/root/src/branch/master/build-tools/build-docker-images/tag-management/retag-images.sh

To determine whether each tag exists, the script makes Docker REST API calls in quick succession, using Docker's public unauthenticated API. It seems that Docker recently added a rate limit to that API, so that only a certain number of calls can be made per second. As a result, the script starts to fail after the first few images and wrongly attempts to push tags that already exist on Docker Hub.

Most likely we need to update the script to read Docker credentials from $HOME/.docker/config.json and pass them to curl. Better error handling would be great as well.
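
The failure mode described above comes from treating every unsuccessful lookup as "tag does not exist". A minimal sketch of the error-handling idea, assuming the lookup is a plain HTTP request whose status code is available: only 404 is read as "missing", and anything else that is not 200 (rate limiting, server errors, no response) is reported as an error. The function names and the idea of passing in a ready-made URL are illustrative assumptions, not taken from retag-images.sh.

```shell
#!/bin/bash

# Map an HTTP status code from a "does this tag exist?" request to one of
# three outcomes, so a throttled or failed request is never read as "missing".
classify_status() {
    case "$1" in
        200) echo exists ;;
        404) echo missing ;;
        *)   echo error ;;   # e.g. 429 (rate limited), 5xx, 000 (no response)
    esac
}

# Hypothetical helper: query a caller-supplied tag URL and classify the reply.
# curl prints 000 as the status code when it gets no HTTP response at all.
probe_tag_url() {
    local url="$1" code
    code=$(curl --silent --output /dev/null --write-out '%{http_code}' "$url") || true
    classify_status "$code"
}
```

A caller would then push only when the result is "missing" and stop when it is "error", instead of falling through to a push.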

Severity
--------
Major

Steps to Reproduce
------------------
Run the script in dry-run mode:

./retag-images.sh --dryrun image-tags.yaml

Expected Behavior
------------------
It should output lines similar to:

Image tag exists: docker.io/starlingx/intel-fpga-admissionwebhook:stx.4.0-v0.11.0-103-g4f28657

for each image in the .yaml file

Actual Behavior
----------------

It outputs "Image tag exists ..." for the first few images, then attempts to push the rest of them. Apparently the "Does tag exist?" API call starts returning false once the rate limit is hit.

Reproducibility
---------------
Reproducible

System Configuration
--------------------
N/A

Branch/Pull Time/Commit
-----------------------
master/2023-01-25

Last Pass
---------
Unknown

Timestamp/Logs
--------------
N/A

Test Activity
-------------
N/A

Workaround
----------
Tag and push the image manually using raw docker commands.

Ghada Khalil (gkhalil)
tags: added: stx.build stx.tools
Ghada Khalil (gkhalil)
Changed in starlingx:
assignee: nobody → Davlet Panech (dpanech)
importance: Undecided → Low
OpenStack Infra (hudson-openstack) wrote : Fix proposed to root (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/starlingx/root/+/882485

Changed in starlingx:
status: New → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to root (master)

Reviewed: https://review.opendev.org/c/starlingx/root/+/882485
Committed: https://opendev.org/starlingx/root/commit/775ad108af1b882bc1aced30d17eb4d6b92d23a5
Submitter: "Zuul (22348)"
Branch: master

commit 775ad108af1b882bc1aced30d17eb4d6b92d23a5
Author: Davlet Panech <email address hidden>
Date: Fri May 5 16:57:39 2023 -0400

    docker-images: better registry error handling

    This commit enables better error detection when checking whether an
    image/tag exists in a remote registry. Current implementation sometimes
    falsely believes a remote tag is missing and attempts to (re-)push the
    images, potentially overwriting them.

    Examples:
    - Registry is not reachable due to a temporary network outage
    - With docker.io: we exceed the request rate limit. Original script
      looked for remote tags by enumerating all tags. This resulted in
      dozens of REST calls per image, occasionally exceeding Dockerhub's
      request limit.

    Solution: add new script that exits on connectivity errors, rather than
    returning false. Script requires an external tool, regctl:
      https://github.com/regclient/regclient

    TESTS
    ====================================
    - Test with missing/existing images in Harbor, DockerHub and
      AWS ECR registries, as well as various connectivity errors.
    - Run retag-images.sh and make sure it still works

    Closes-Bug: 2003898
    Change-Id: Id9dd0c30580748c0c4c4bfbbd520d4d38bdd2ec6
    Signed-off-by: Davlet Panech <email address hidden>
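
The approach in the commit message (exit on connectivity errors rather than returning false) can be sketched on the calling side as follows. This is not the merged script: the exit-code convention (0 = tag exists, 1 = tag definitely missing, anything else = registry could not be queried) and the names `decide_action` and `checker` are assumptions made for illustration, and the actual regctl invocation is left out because the commit message does not show it.

```shell
#!/bin/bash

# Decide what to do with one image, given a checker command that follows the
# assumed convention: 0 = exists, 1 = missing, any other status = error.
decide_action() {
    local checker="$1" image="$2" rc=0
    "$checker" "$image" || rc=$?
    case "$rc" in
        0) echo "skip $image" ;;
        1) echo "push $image" ;;
        *) echo "abort: cannot determine whether $image exists (rc=$rc)" >&2
           return 2 ;;
    esac
}
```

With this shape, a rate-limited or unreachable registry stops the run instead of triggering a push over an existing tag.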

Changed in starlingx:
status: In Progress → Fix Released
Ghada Khalil (gkhalil)
tags: added: stx.9.0