cs9 master container build job is failing while pushing containers to quay rdo registry

Bug #1999749 reported by chandan kumar
This bug affects 1 person
| Affects | Status | Importance | Assigned to | Milestone |
|---------|--------|------------|-------------|-----------|
| tripleo | Fix Released | Critical | Unassigned | |

Bug Description

The job https://review.rdoproject.org/zuul/builds?job_name=periodic-tripleo-ci-build-containers-ubi-9-quay-push-master&skip=0
started failing on 2022-12-14 at 00:20:12. Build log of a failing run:
https://logserver.rdoproject.org/openstack-periodic-integration-main/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-build-containers-ubi-9-quay-push-master/3899865/logs/build.log
```
Command: sudo -E buildah push --tls-verify=False quay.rdoproject.org/tripleomastercentos9/openstack-tripleo-ansible-ee:33364cd38e49e95a1eccba4ac2cdb166 docker://quay.rdoproject.org/tripleomastercentos9/openstack-tripleo-ansible-ee:33364cd38e49e95a1eccba4ac2cdb166
2022-12-14 19:36:52 | Exit code: 125
2022-12-14 19:36:52 | Stdout: ''
2022-12-14 19:36:52 | Stderr: 'Getting image source signatures\nCopying blob sha256:0e431ea2052de8e32a3e1aea4d844f6de74c21fbd7e8f86293eb206d16f3d1a7\nCopying blob sha256:ce30e79639ed7bcea95407e6642e31b7d634ace0ddbe515e6d9e618dea424d55\nCopying blob sha256:910cf1adb0bba60e3951be8478c697bf5ae2783aa35f87228cf21bf028a63a83\nCopying blob sha256:31fcaf884188949a00b8647c0ae879ae4e6582b08995b660808f83af7d24aa11\nCopying blob sha256:29224f33151994fc0f5be854f5788a3f79ea6a1b362241653f4043cc87a1542b\nCopying blob sha256:d83d3545d486e78412f93573fcb90b403d772b5233c1c95c0322106f75e33d6a\nCopying blob sha256:5214be01f3874bd729471110c309dd1d6bfd2218cabd88115e5d79c14f9f5972\nCopying blob sha256:1e90abc37649edc3b38c775ad02999ceb573b6df9298a7d5c3c77240578c3ddc\nCopying blob sha256:d956ae89574ebac50556ffbc93e95dd61a1856e883ec12367d9068e5108c3fc6\nCopying blob sha256:c493e50fc760b1a6addf57ee749b7e076680839f1aae26416e9209a88ea92332\nCopying blob sha256:f345fcf87f5283759dd15d3c503501c120b6f9225cbbb8d8be719dd8e0ccaebd\nCopying blob sha256:253793bd0072a587c601ea86d7883a47dd41f51fe6517a62ca26368c1dc571a2\nCopying blob sha256:4ae6f44d85ebc8868abb2970909c0a869f3c2240c98bfedfb639b4c1f732dbcf\nError: pushing image "quay.rdoproject.org/tripleomastercentos9/openstack-tripleo-ansible-ee:33364cd38e49e95a1eccba4ac2cdb166" to "docker://quay.rdoproject.org/tripleomastercentos9/openstack-tripleo-ansible-ee:33364cd38e49e95a1eccba4ac2cdb166": writing blob: initiating layer upload to /v2/tripleomastercentos9/openstack-tripleo-ansible-ee/blobs/uploads/ in quay.rdoproject.org: unauthorized: access to the requested resource is not authorized\n'
```
Compare this with a passing run:
https://logserver.rdoproject.org/openstack-periodic-integration-main/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-build-containers-ubi-9-quay-push-master/5c43dd3/logs/build.log

```
 sudo buildah push --tls-verify=False quay.rdoproject.org/tripleomastercentos9/openstack-tripleo-ansible-ee:684a6a1d7e5378dd94520991e3b20e2a docker://quay.rdoproject.org/tripleomastercentos9/openstack-tripleo-ansible-ee:684a6a1d7e5378dd94520991e3b20e2a
```
Taking a deeper look: in the passing job the command was "sudo buildah", but the failing one uses "sudo -E buildah".

Change 867080, "Preserve environment variables with buildah" (https://review.opendev.org/c/openstack/tripleo-common/+/867080), added the -E flag, and that broke the container build.
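Some background on the flag: by default `sudo` resets the environment of the command it runs (the sudoers `env_reset` option), while `sudo -E` asks it to preserve the caller's environment, if the security policy allows it. The sketch below imitates the two cases without needing sudo. `env -i` keeping only `PATH` stands in for `env_reset` (a simplification, since real sudo keeps a few variables such as those listed in `env_keep`), and `XDG_RUNTIME_DIR=/run/user/1000` is an assumed example value for a uid-1000 session.

```shell
#!/bin/sh
# Assumed example value: the runtime directory of a uid-1000 login session.
export XDG_RUNTIME_DIR=/run/user/1000

# Approximates plain `sudo CMD`: the child starts from an almost empty
# environment (only PATH is kept here), so XDG_RUNTIME_DIR is gone.
view_with_reset_env() {
  env -i PATH="$PATH" sh -c 'printf "%s\n" "${XDG_RUNTIME_DIR:-<unset>}"'
}

# Approximates `sudo -E CMD`: the caller's environment is passed through.
view_with_preserved_env() {
  sh -c 'printf "%s\n" "${XDG_RUNTIME_DIR:-<unset>}"'
}

view_with_reset_env      # prints <unset>
view_with_preserved_env  # prints /run/user/1000
```

This is why the same command can behave differently under `sudo` and `sudo -E`, even though both run as root.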

chandan kumar (chkumar246) wrote :

To fix this issue, we need to pass an authfile to buildah login:
```
❯ buildah login --help
Login to a container registry on a specified server.

Usage:
  buildah login [flags]

Examples:
  buildah login quay.io

Flags:
      --authfile string path of the authentication file. Use REGISTRY_AUTH_FILE environment variable to override
      --cert-dir string use certificates at the specified path to access the registry
      --get-login return the current login user for the registry (default true)
  -h, --help help for login
  -p, --password string Password for registry
      --password-stdin Take the password from stdin
      --tls-verify require HTTPS and verify certificates when accessing the registry. TLS verification cannot be used when talking to an insecure registry. (default true)
  -u, --username string Username for registry
  -v, --verbose Write more detailed information to stdout
```
The tripleo-podman role in tripleo-ansible has no authfile support: https://opendev.org/openstack/tripleo-ansible/src/branch/master/tripleo_ansible/roles/tripleo_podman/tasks/buildah_login.yml#L35

The container build job uses https://github.com/rdo-infra/review.rdoproject.org-config/blob/5caeb76df5feeee515163635334c4cc6dd7f8620/playbooks/tripleo-rdo-base/container-login.yaml#L88

So I think we need to revert the above patch, add authfile support to tripleo-podman, and then re-revert the patch.
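A minimal sketch of the authfile idea: pin login and push to one explicit credentials file, so both steps agree on where the credentials live no matter which environment sudo passes through. `--authfile` is the `buildah login` flag shown in the help output above, and `buildah push` accepts the same flag. The helper names `registry_login` and `registry_push` and the default path in `AUTHFILE` are hypothetical; they are not taken from tripleo-ansible or the CI config.

```shell
#!/bin/sh
# Hypothetical default: any path works as long as login and push share it.
AUTHFILE=${AUTHFILE:-/run/containers/0/auth.json}

# Log in and write the credentials to $AUTHFILE.
# $1: registry user, $2: registry host; the password is read from stdin.
registry_login() {
  sudo -E buildah login --authfile "$AUTHFILE" -u "$1" --password-stdin "$2"
}

# Push with the same credentials file, bypassing the default-path lookup.
# $1: local image reference, $2: destination reference without transport.
registry_push() {
  sudo -E buildah push --authfile "$AUTHFILE" "$1" "docker://$2"
}
```

For example, `printf '%s' "$REGISTRY_PASSWORD" | registry_login tripleo quay.rdoproject.org` followed by `registry_push IMAGE IMAGE`, where `REGISTRY_PASSWORD` and `IMAGE` are placeholders.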

Amol Kahat (amolkahat) wrote :

I tried removing buildcontainer_venv: false [1], but that does not work.

Review [2] reverts the -E option in tripleo-common, but the job won't pick up that change.

Further debugging points to an issue with buildah login. After holding the node, I could not find buildah logged in to quay.rdoproject.org.

[1] https://review.rdoproject.org/r/c/config/+/46222
[2] https://review.opendev.org/c/openstack/tripleo-common/+/867641

Ronelle Landy (rlandy) wrote :

We held a node and investigated the login.

This is a master-only problem. When we log in to the registry manually from the node, the container push works, even with -E.

We also verified that the master registry password is correct.

Looking at https://trunk.rdoproject.org/centos9-master/report.html we also picked up two tripleo-ansible changes: the role rename in https://review.opendev.org/c/openstack/tripleo-ansible/+/865047 and the removal of the old roles in https://github.com/openstack/tripleo-ansible/commit/3efd098300e4972578ec516bf1a205992cf9f72d
However, https://github.com/rdo-infra/review.rdoproject.org-config/commit/0fe66337e3817bf804d4e01a6b6e8bfdb9e5556c should have corrected for that.

Ronelle Landy (rlandy) wrote :

Following up on Chandan's comment:

> authfile support is not present in tripleo-ansible tripleo-podman role https://opendev.org/openstack/tripleo-ansible/src/branch/master/tripleo_ansible/roles/tripleo_podman/tasks/buildah_login.yml#L35

Does the -E change then force an authfile requirement?

This works in Zed: https://github.com/openstack/tripleo-ansible/blob/stable/zed/tripleo_ansible/roles/tripleo_podman/tasks/tripleo_podman_buildah_login.yml#L40
The only difference there is the --tls-verify option.

Takashi Kajinami (kajinamit) wrote :

If we still have access to the node, can we check the following items?
 1. /home/zuul/containers/auth.json
 2. /root/containers/auth.json
 3. sudo printenv | grep REGISTRY_AUTH_FILE
 4. sudo -E printenv | grep REGISTRY_AUTH_FILE

I suspect the REGISTRY_AUTH_FILE environment variable is overridden incorrectly and
the file initially created by the login user is somehow hidden.
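The four checks can be bundled into one helper. This is a minimal sketch, assuming passwordless sudo as on a held CI node; `check_auth_files` is a hypothetical name, and the two paths are exactly the ones listed in the comment.

```shell
#!/bin/sh
# Run the suggested checks in one go (hypothetical helper).
check_auth_files() {
  # Items 1 and 2: test as root so /root is readable.
  for f in /home/zuul/containers/auth.json /root/containers/auth.json; do
    if sudo test -e "$f"; then
      echo "present: $f"
    else
      echo "missing: $f"
    fi
  done
  # Items 3 and 4: REGISTRY_AUTH_FILE with a reset vs. a preserved environment.
  echo "sudo:    $(sudo printenv REGISTRY_AUTH_FILE)"
  echo "sudo -E: $(sudo -E printenv REGISTRY_AUTH_FILE)"
}
```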

chandan kumar (chkumar246) wrote :

Hello Takashi,

I held the node and checked the following items:
```
[zuul@node-0003383040 ~]$ ls /home/zuul/containers/auth.json
ls: cannot access '/home/zuul/containers/auth.json': No such file or directory
[zuul@node-0003383040 ~]$ ls /root/containers/auth.json
ls: cannot access '/root/containers/auth.json': Permission denied
[zuul@node-0003383040 ~]$ sudo ls /root/containers/auth.json
ls: cannot access '/root/containers/auth.json': No such file or directory
[zuul@node-0003383040 ~]$ sudo printenv | grep REGISTRY_AUTH_FILE
[zuul@node-0003383040 ~]$ sudo -E printenv | grep REGISTRY_AUTH_FILE
[zuul@node-0003383040 ~]$
```

chandan kumar (chkumar246) wrote :

Takashi is looking into why REGISTRY_AUTH_FILE is getting overridden.

No upstream job exercises registry login. The periodic container build logs in to the registry to push containers, which is why we detected the problem here.
In the meantime, I think we can merge the revert at https://review.opendev.org/c/openstack/tripleo-common/+/867641 to unblock the master integration line.

Changed in tripleo:
status: Triaged → In Progress
Takashi Kajinami (kajinamit) wrote :

It seems buildah login writes the auth.json file to /run/user/<uid>/containers/, and if we use -E, the buildah command looks in /run/user/1000/containers instead of /run/user/1/containers.
(1000 is the uid of the zuul user.)
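The observation can be modeled with a small function. This is a simplified sketch, not buildah's code: it assumes the Linux default lookup order of containers/image (see containers-auth.json(5)), namely REGISTRY_AUTH_FILE if set, else `$XDG_RUNTIME_DIR/containers/auth.json`, else `/run/containers/<uid>/auth.json`, which matches the paths buildah printed on the held node. It ignores the `--authfile` flag and the legacy Docker config fallbacks; `auth_file_for` is a hypothetical name.

```shell
#!/bin/sh
# Simplified model of the default auth-file choice on Linux.
# $1: uid of the buildah process
# $2: XDG_RUNTIME_DIR as seen by that process (empty if unset)
# $3: REGISTRY_AUTH_FILE as seen by that process (empty if unset)
auth_file_for() {
  if [ -n "$3" ]; then
    printf '%s\n' "$3"                          # explicit override wins
  elif [ -n "$2" ]; then
    printf '%s\n' "$2/containers/auth.json"     # per-session runtime dir
  else
    printf '%s\n' "/run/containers/$1/auth.json" # per-uid fallback
  fi
}

# Root process that does not see XDG_RUNTIME_DIR (plain sudo):
auth_file_for 0 "" ""               # /run/containers/0/auth.json
# Root process that inherits zuul's XDG_RUNTIME_DIR (sudo -E):
auth_file_for 0 /run/user/1000 ""   # /run/user/1000/containers/auth.json
```

In this model the two root processes resolve different files, which is consistent with the "unauthorized" error on push; an explicit REGISTRY_AUTH_FILE, when it actually reaches the process, takes precedence over both defaults.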

Takashi Kajinami (kajinamit) wrote :

Can we try this alternative approach to check whether my observation is correct?
 https://review.opendev.org/c/openstack/tripleo-ansible/+/868057

We probably want to fix the original patch to override only the specific environment variable, though.

chandan kumar (chkumar246) wrote :

Thank you, Takashi, for digging into that; I think you are correct. I tried buildah login on the box:
```
[zuul@node-0003383040 ~]$ buildah login -v -u ** quay.rdoproject.org
Password:
Used: /run/user/1000/containers/auth.json
Login Succeeded!
[zuul@node-0003383040 ~]$ sudo buildah login -v -u ** quay.rdoproject.org
Password:
Used: /run/containers/0/auth.json
Login Succeeded!
[zuul@node-0003383040 ~]$
```
In the periodic container build job we run buildah with sudo, so we need to point
REGISTRY_AUTH_FILE to /run/containers/0/auth.json.

```
[zuul@node-0003383040 ~]$ export REGISTRY_AUTH_FILE=/run/containers/0/auth.json
[zuul@node-0003383040 ~]$ sudo buildah login -v -u tripleo quay.rdoproject.org
Password:
[zuul@node-0003383040 ~]$ sudo -E buildah login -v quay.rdoproject.org
Authenticating with existing credentials for quay.rdoproject.org
Existing credentials are valid. Already logged in to quay.rdoproject.org
[zuul@node-0003383040 ~]$ sudo buildah login -v quay.rdoproject.org
Authenticating with existing credentials for quay.rdoproject.org
Existing credentials are valid. Already logged in to quay.rdoproject.org
```
I think we can fix it on the config side by passing this environment variable. Let's try it out.

Marios Andreou (marios-b) wrote :

Thanks, Takashi. We are going with the revert, since we have been blocked for a few days now: https://review.opendev.org/c/openstack/tripleo-common/+/867641/2#message-d6bf6a5148855b9974a335b451559d2726c79539

Then we can revert the revert and also include Takashi's fix, which we can merge without the pressure of a gate blocker: https://review.opendev.org/c/openstack/tripleo-ansible/+/868057

OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-ci (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/tripleo-ci/+/868063

OpenStack Infra (hudson-openstack) wrote : Change abandoned on tripleo-ci (master)

Change abandoned by "chandan kumar <email address hidden>" on branch: master
Review: https://review.opendev.org/c/openstack/tripleo-ci/+/868063
Reason: It does not work, abandoning it.

OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-common (master)

Reviewed: https://review.opendev.org/c/openstack/tripleo-common/+/867641
Committed: https://opendev.org/openstack/tripleo-common/commit/8d2f7e7a4213dfed1e7d1fefc157c38ed3439f3d
Submitter: "Zuul (22348)"
Branch: master

commit 8d2f7e7a4213dfed1e7d1fefc157c38ed3439f3d
Author: chandan kumar <email address hidden>
Date: Thu Dec 15 15:07:06 2022 +0000

    Revert "Preserve environment variables with buildah"

    This reverts commit 140fc48e736e9df5cb36fd30b0e683c79df5c333.

    Reason for revert:
    The original change breaks authentication without authfile.

    Closes-Bug: #1999749
    Change-Id: Ifbfba09c8ac552cc3c18adb482540541a282488b

Changed in tripleo:
status: In Progress → Fix Released