tripleo-ci-centos-9-scenario00{1,4}-ceph-nightly cannot pull quay.rdoproject.org/tripleomastercentos9/daemon:latest-pacific-devel

Bug #1981329 reported by John Fulton
This bug affects 1 person
Affects: tripleo
Status: Fix Released
Importance: High
Assigned to: Douglas Viroel
Milestone: (none)

Bug Description

The nightly ceph jobs [1] have been failing to pull quay.rdoproject.org/tripleomastercentos9/daemon:latest-pacific-devel, as the ansible log [2] shows:

[1] https://zuul.opendev.org/t/openstack/builds?job_name=tripleo-ci-centos-9-scenario001-ceph-nightly&job_name=tripleo-ci-centos-9-scenario004-ceph-nightly&skip=0

[2] https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_175/periodic/opendev.org/openstack/tripleo-ci/master/tripleo-ci-centos-9-scenario004-ceph-nightly/1754ef1/logs/undercloud/home/zuul/ansible.log

2022-07-11 03:51:48,488 p=64590 u=root n=ansible | 2022-07-11 03:51:48.487781 | bc764e20-0814-29c3-e9b9-000000000070 | FATAL | Run cephadm bootstrap | standalone.localdomain | error={"changed": true, "cmd": "/usr/sbin/cephadm --image quay.rdoproject.org/tripleomastercentos9/daemon:latest-pacific-devel \\bootstrap --skip-firewalld --ssh-private-key /home/ceph-admin/.ssh/id_rsa --ssh-public-key /home/ceph-admin/.ssh/id_rsa.pub --ssh-user ceph-admin --allow-fqdn-hostname --output-keyring /etc/ceph/ceph.client.admin.keyring --output-config /etc/ceph/ceph.conf --fsid 9dd4d899-0dcc-5f50-a744-1d88701e712a --config /home/ceph-admin/assimilate_ceph.conf \\--single-host-defaults \\--skip-monitoring-stack --skip-dashboard --log-to-file --skip-mon-network \\--mon-ip 192.168.42.1\n", "delta": "0:00:01.519565", "end": "2022-07-11 03:51:48.461089", "msg": "non-zero return code", "rc": 1, "start": "2022-07-11 03:51:46.941524", "stderr": "Verifying podman|docker is present...\nVerifying lvm2 is present...\nVerifying time synchronization is in place...\nUnit chronyd.service is enabled and running\nRepeating the final host check...\npodman (/bin/podman) version 4.1.1 is present\nsystemctl is present\nlvcreate is present\nUnit chronyd.service is enabled and running\nHost looks OK\nCluster fsid: 9dd4d899-0dcc-5f50-a744-1d88701e712a\nVerifying IP 192.168.42.1 port 3300 ...\nVerifying IP 192.168.42.1 port 6789 ...\nInternal network (--cluster-network) has not been provided, OSD replication will default to the public_network\nAdjusting default settings to suit single-host cluster...\nPulling container image quay.rdoproject.org/tripleomastercentos9/daemon:latest-pacific-devel...\nNon-zero exit code 125 from /bin/podman pull quay.rdoproject.org/tripleomastercentos9/daemon:latest-pacific-devel\n/bin/podman: stderr Trying to pull quay.rdoproject.org/tripleomastercentos9/daemon:latest-pacific-devel...\n/bin/podman: stderr Error: initializing source 
docker://quay.rdoproject.org/tripleomastercentos9/daemon:latest-pacific-devel: reading manifest latest-pacific-devel in quay.rdoproject.org/tripleomastercentos9/daemon: manifest unknown: manifest unknown\nERROR: Failed command: /bin/podman pull quay.rdoproject.org/tripleomastercentos9/daemon:latest-pacific-devel", "stderr_lines": ["Verifying podman|docker is present...", "Verifying lvm2 is present...", "Verifying time synchronization is in place...", "Unit chronyd.service is enabled and running", "Repeating the final host check...", "podman (/bin/podman) version 4.1.1 is present", "systemctl is present", "lvcreate is present", "Unit chronyd.service is enabled and running", "Host looks OK", "Cluster fsid: 9dd4d899-0dcc-5f50-a744-1d88701e712a", "Verifying IP 192.168.42.1 port 3300 ...", "Verifying IP 192.168.42.1 port 6789 ...", "Internal network (--cluster-network) has not been provided, OSD replication will default to the public_network", "Adjusting default settings to suit single-host cluster...", "Pulling container image quay.rdoproject.org/tripleomastercentos9/daemon:latest-pacific-devel...", "Non-zero exit code 125 from /bin/podman pull quay.rdoproject.org/tripleomastercentos9/daemon:latest-pacific-devel", "/bin/podman: stderr Trying to pull quay.rdoproject.org/tripleomastercentos9/daemon:latest-pacific-devel...", "/bin/podman: stderr Error: initializing source docker://quay.rdoproject.org/tripleomastercentos9/daemon:latest-pacific-devel: reading manifest latest-pacific-devel in quay.rdoproject.org/tripleomastercentos9/daemon: manifest unknown: manifest unknown", "ERROR: Failed command: /bin/podman pull quay.rdoproject.org/tripleomastercentos9/daemon:latest-pacific-devel"], "stdout": "", "stdout_lines": []}
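
For triage, the relevant part of the output above is the "manifest unknown" error: the registry answered, but the requested tag does not exist in that repository. A minimal sketch of pulling the missing image references out of such a stderr blob follows. The function name and the regular expression are illustrative assumptions, written against the message format seen in this one log, not against any documented podman output contract.

```python
import re
from typing import List

# Pattern based on the podman error text quoted in this bug report.
# Assumption: the wording "reading manifest <tag> in <repo>: manifest unknown"
# is stable enough for ad-hoc log triage.
MANIFEST_ERR = re.compile(
    r"reading manifest (?P<tag>\S+) in (?P<repo>\S+): manifest unknown"
)


def missing_manifests(stderr: str) -> List[str]:
    """Return the unique 'repo:tag' references reported as 'manifest unknown'."""
    found = {f"{m['repo']}:{m['tag']}" for m in MANIFEST_ERR.finditer(stderr)}
    return sorted(found)


sample = (
    "/bin/podman: stderr Error: initializing source "
    "docker://quay.rdoproject.org/tripleomastercentos9/daemon:latest-pacific-devel: "
    "reading manifest latest-pacific-devel in "
    "quay.rdoproject.org/tripleomastercentos9/daemon: "
    "manifest unknown: manifest unknown"
)

for ref in missing_manifests(sample):
    print("tag not found in registry:", ref)
```

The result is deduplicated because the same message shows up in both "stderr" and "stderr_lines" of the Ansible error payload.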

John Fulton (jfulton-org) wrote :

09:22 <dviroel|rover> fpantano: hey there, fyi: we need to fix upstream ceph-nightly jobs later, the new ceph deploy changed some bits, the job is not getting nightly image from the correct registry/namespace
10:11 <fultonj> dviroel|rover: i see it couldn't pull quay.rdoproject.org/tripleomastercentos9/daemon:latest-pacific-devel
10:13 <fpantano> is that something related to the same issue we see a couple of weeks ago?
10:13 <fpantano> the container being purged ?
10:13 <dviroel|rover> fultonj: we don't push latest-pacific-devel to quay.rdoproject, we need to get it from ceph quay
10:14 <fpantano> fultonj: we pull it from https://quay.io/repository/ceph/daemon?tab=tags
10:14 <fpantano> dviroel|rover: and push it under tripleomastercentos9 ?
10:16 <dviroel|rover> fpantano: we dont push latest-devel one, nightly job pulls it from quay.io/ceph
10:17 <fpantano> dviroel|rover: ah I see, right, we don't push that container daily, is this issue related to the recent quickstart patches?
10:17 <dviroel|rover> ack
10:18 <dviroel|rover> fpantano: yes, most probably
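
To make the mismatch from the chat above concrete: the job asks for the latest-pacific-devel tag under the RDO registry/namespace, while that tag is only published under quay.io/ceph. Below is a minimal sketch that splits the failing reference into its parts and rebuilds it against the registry and namespace named in the discussion. The helper names (ImageRef, parse_image, retarget) are hypothetical and exist only for this illustration; they are not part of tripleo-quickstart-extras or cephadm.

```python
from typing import NamedTuple


class ImageRef(NamedTuple):
    registry: str
    namespace: str
    name: str
    tag: str


def parse_image(ref: str) -> ImageRef:
    """Split 'registry/namespace/name:tag' into its four parts.

    Simplification: assumes exactly one namespace level and an explicit tag,
    which holds for the references quoted in this bug.
    """
    repo, sep, tag = ref.rpartition(":")
    if not sep or "/" in tag:
        raise ValueError(f"no tag in image reference: {ref!r}")
    registry, namespace, name = repo.split("/", 2)
    return ImageRef(registry, namespace, name, tag)


def retarget(ref: str, registry: str, namespace: str) -> str:
    """Keep image name and tag, but swap the registry and namespace."""
    img = parse_image(ref)
    return f"{registry}/{namespace}/{img.name}:{img.tag}"


failing = "quay.rdoproject.org/tripleomastercentos9/daemon:latest-pacific-devel"
print(parse_image(failing))
print(retarget(failing, "quay.io", "ceph"))
```

The second print gives the reference the nightly job is expected to pull according to the chat, i.e. the same image name and tag under quay.io/ceph.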

John Fulton (jfulton-org) wrote :

Nightly jobs worked until June 24th and have been failing since then [1].
On the 24th we merged the following change to fix https://bugs.launchpad.net/tripleo/+bug/1978998

  https://review.opendev.org/c/openstack/tripleo-quickstart-extras/+/846231/10/roles/standalone/tasks/ceph-install.yml

So I suspect the above change broke the ceph periodic jobs in upstream master.

We need to fix it without re-introducing LP 1978998.

[1] https://zuul.opendev.org/t/openstack/builds?job_name=tripleo-ci-centos-9-scenario001-ceph-nightly&job_name=tripleo-ci-centos-9-scenario004-ceph-nightly&skip=0

John Fulton (jfulton-org) wrote :

We should be able to patch tripleo-quickstart-extras and then add a Depends-On to the patch in the following change so that we can test the periodic job:

  https://review.rdoproject.org/r/c/testproject/+/36256

OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-quickstart-extras (master)
Changed in tripleo:
status: Triaged → In Progress
John Fulton (jfulton-org) wrote :

11:19 <dviroel|rover> fultonj: the ceph promotion jobs seems to be working properly: https://logserver.rdoproject.org/periodic-promote-ceph-daemon/opendev.org/openstack/tripleo-ci/master/tripleo-ci-centos-9-scenario001-standalone-ceph-updates-master/49f712a/logs/undercloud/var/log/ceph/cephadm.log.txt.gz
11:19 <dviroel|rover> right?
11:20 <dviroel|rover> the issue is only with nightly
11:28 <opendevreview> Douglas Viroel proposed openstack/tripleo-ci master: Enable container_ceph_updates for ceph nightly https://review.opendev.org/c/openstack/tripleo-ci/+/849551
11:35 <dviroel|rover> fultonj: trying a small fix on nightly jobs ^ - testing on rdo

OpenStack Infra (hudson-openstack) wrote : Change abandoned on tripleo-quickstart-extras (master)

Change abandoned by "John Fulton <email address hidden>" on branch: master
Review: https://review.opendev.org/c/openstack/tripleo-quickstart-extras/+/849545
Reason: Trying this instead https://review.opendev.org/c/openstack/tripleo-ci/+/849551

John Fulton (jfulton-org) wrote :
Changed in tripleo:
assignee: nobody → Douglas Viroel (dviroel)
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-ci (master)

Reviewed: https://review.opendev.org/c/openstack/tripleo-ci/+/849551
Committed: https://opendev.org/openstack/tripleo-ci/commit/9c2b0cbe9714be28a79d4d12a3c69f7a29957ebe
Submitter: "Zuul (22348)"
Branch: master

commit 9c2b0cbe9714be28a79d4d12a3c69f7a29957ebe
Author: Douglas Viroel <email address hidden>
Date: Tue Jul 12 12:22:24 2022 -0300

    Enable container_ceph_updates for ceph nightly

    This patch enables standalone_container_ceph_updates flag
    in ceph nightly jobs. It was working before [1] because consumer
    jobs were able to define different ceph tags and namespaces in
    standalone role [2], but after [1], we restricted that only to
    jobs that have that flag enabled.

    Closes-Bug: #1981329

    [1] https://review.opendev.org/c/openstack/tripleo-quickstart-extras/+/846231
    [2] https://opendev.org/openstack/tripleo-quickstart-extras/src/commit/7482df1efe87428b264fdea9b72ee4c7d383dbd2/roles/standalone/tasks/containers.yml#L128

    Change-Id: I1cb85435d58f17bb8157a2dc01a9ba021b669959
    Signed-off-by: Douglas Viroel <email address hidden>

Changed in tripleo:
status: In Progress → Fix Released