update/upgrades jobs are failing in check/gate after promotions - "Not found image:"

Bug #1946659 reported by Ronelle Landy
This bug affects 2 people
Affects: tripleo
Status: Fix Released
Importance: Critical
Assigned to: Arx Cruz

Bug Description

update/upgrades jobs are failing in check/gate after promotions - when the current-tripleo hash changes.

Example: https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_1b2/813332/1/gate/tripleo-ci-centos-8-scenario000-multinode-oooq-container-updates-wallaby/1b24950/logs/undercloud/home/zuul/undercloud_install.log

2021-10-11 15:16:39 | 2021-10-11 15:16:39.440014 | bc764e10-15c2-b688-bae4-000000000dc5 | FATAL | Container image prepare | undercloud | error={"changed": false, "error": "Not found image: http://104.130.219.216:5001/v2/tripleowallaby/openstack-heat-api/manifests/c8b1d5f1870396b76e2993c4c38bcc4a", "msg": "Error running container image prepare: Not found image: http://104.130.219.216:5001/v2/tripleowallaby/openstack-heat-api/manifests/c8b1d5f1870396b76e2993c4c38bcc4a", "params": {}, "success": false}

The failure is actually in the undercloud install - and that install should reference the hash used on the content-provider job.

The job is looking for hash c8b1d5f1870396b76e2993c4c38bcc4a, but the content-provider job used a different one:

@/home/zuul/workspace/.quickstart/config/release/tripleo-ci/CentOS-8/wallaby.yml
  "container_build_id": "72ff71f4a1ce42926290f04e69c16ff4"

Ronelle Landy (rlandy) wrote :

current-tripleo/ 2021-10-11 14:25 -
72ff71f4a1ce42926290f04e69c16ff4/ 2021-10-10 13:56 -

Changed in tripleo:
milestone: none → xena-3
importance: Undecided → Critical
status: New → Triaged
assignee: nobody → Arx Cruz (arxcruz)
tags: added: ci
Ronelle Landy (rlandy) wrote :

On container-multinode job:

2021-10-11 15:03:59.718299 | primary | TASK [container-prep : echo container_build_id] ********************************
2021-10-11 15:03:59.718434 | primary | Monday 11 October 2021 15:03:59 +0000 (0:00:00.062) 0:05:33.523 ********
2021-10-11 15:03:59.751342 | primary | ok: [undercloud] => {
2021-10-11 15:03:59.751381 | primary | "container_build_id": "72ff71f4a1ce42926290f04e69c16ff4"
2021-10-11 15:03:59.751398 | primary | }

2021-10-11 14:58:40.304801 | primary | MSG:
2021-10-11 14:58:40.304816 | primary |
2021-10-11 14:58:40.304829 | primary | ['the tq release is: wallaby', 'the distro is: centos8', 'dlrn_hash is: 72ff71f4a1ce42926290f04e69c16ff4']

Ronelle Landy (rlandy) wrote :

On updates job:

2021-10-11 15:01:32.446752 | primary | TASK [repo-setup : print out dlrn, release, distro info] ***********************
2021-10-11 15:01:32.446847 | primary | Monday 11 October 2021 15:01:32 +0000 (0:00:00.061) 0:00:13.179 ********
2021-10-11 15:01:32.478415 | primary | ok: [subnode-1] => {}
2021-10-11 15:01:32.478475 | primary |
2021-10-11 15:01:32.478492 | primary | MSG:
2021-10-11 15:01:32.478507 | primary |
2021-10-11 15:01:32.478521 | primary | ['the tq release is: wallaby', 'the distro is: centos8', 'dlrn_hash is: c8b1d5f1870396b76e2993c4c38bcc4a']
2021-10-11 15:01:32.493641 | primary |

Ronelle Landy (rlandy) wrote :

On container-multinode:

2021-10-11 15:05:21.458735 | primary | TASK [repo-setup : print out dlrn, release, distro info] ***********************
2021-10-11 15:05:21.458782 | primary | Monday 11 October 2021 15:05:21 +0000 (0:00:00.066) 0:00:19.159 ********
2021-10-11 15:05:21.491643 | primary | ok: [subnode-1] => {}
2021-10-11 15:05:21.491736 | primary |
2021-10-11 15:05:21.491761 | primary | MSG:
2021-10-11 15:05:21.491797 | primary |
2021-10-11 15:05:21.491821 | primary | ['the tq release is: wallaby', 'the distro is: centos8', 'dlrn_hash is: 72ff71f4a1ce42926290f04e69c16ff4']
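
The mismatch shown in the log excerpts above can be checked mechanically. The following is a minimal sketch, assuming only the "dlrn_hash is: <hash>" text that the repo-setup task prints; the function names are hypothetical and are not part of tripleo-ci.

```python
import re

# Matches the "dlrn_hash is: <32 hex chars>" fragment seen in the job output.
DLRN_HASH_RE = re.compile(r"dlrn_hash is: ([0-9a-f]{32})")


def extract_dlrn_hash(log_text):
    """Return the first dlrn_hash found in log_text, or None if absent."""
    match = DLRN_HASH_RE.search(log_text)
    return match.group(1) if match else None


def hashes_match(provider_log, consumer_log):
    """True only when both logs report a hash and the two hashes are equal."""
    provider_hash = extract_dlrn_hash(provider_log)
    consumer_hash = extract_dlrn_hash(consumer_log)
    return provider_hash is not None and provider_hash == consumer_hash
```

Run against the container-multinode and updates job output quoted in this bug, such a check would report a mismatch (72ff71f4... versus c8b1d5f1...).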

Ronelle Landy (rlandy)
tags: added: promotion-blocker
Douglas Viroel (dviroel)
Changed in tripleo:
milestone: xena-3 → yoga-1
Jiri Podivin (jpodivin) wrote :
chandan kumar (chkumar246) wrote :

Here is an update:
The master current-tripleo containers were pruned from the RDO registry due to a Let's Encrypt certificate validation issue (https://review.rdoproject.org/r/c/rdo-infra/rdo-infra-playbooks/+/36326 will fix it).

We are rebuilding and pushing the master containers via this patch: https://review.rdoproject.org/r/c/rdo-jobs/+/36327

Once the job finishes, please recheck the affected patch.

Marios Andreou (marios-b) wrote :

https://review.opendev.org/c/openstack/tripleo-ci/+/813629/9..10#message-e0a0151ce11c43a71e3e03e7464dab7e99234d77

So, as commented on that change and as discussed in yesterday's scrum, we will also need to address the '--install-hash-override' at some point, i.e. the version that is installed.

This change addresses the '--upgrade-hash-override', i.e. the target/upgrade-to version.

FTR, I found an example of the issue for the upgrade-from version this morning:

        * https://e4cf1ab71b6ca5f5dc46-fe390436bababd65005a5c1c9412b532.ssl.cf5.rackcdn.com/815775/1/gate/tripleo-ci-centos-8-undercloud-upgrade/de8cbed/logs/undercloud/home/zuul/undercloud_install.log
        * 2021-10-28 20:43:10.182778 | bc764e10-04ed-6e6e-aa45-000000000da8 | FATAL | Container image prepare | undercloud | error={"changed": false, "error": "Not found image: http://104.130.219.215:5001/v2/tripleowallaby/openstack-rsyslog/manifests/8019012886459075100400fbcc95c36a", "msg": "Error running container image prepare: Not found image: http://104.130.219.215:5001/v2/tripleowallaby/openstack-rsyslog/manifests/8019012886459075100400fbcc95c36a", "params": {}, "success": false}

 09:55 < marios> arxcruz: just seen a case of missing containers for the 'install' version (not the 'upgrade to' version) there.

That being said, I think we should go ahead with this, especially if we are seeing the issue hit the target/upgrade-to version more frequently, but bear in mind that we aren't quite done until we can override the install version too.

OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-ci (master)

Reviewed: https://review.opendev.org/c/openstack/tripleo-ci/+/813629
Committed: https://opendev.org/openstack/tripleo-ci/commit/0b0aa53a9bc3298ca593ac43afe873e8b3c5dfe9
Submitter: "Zuul (22348)"
Branch: master

commit 0b0aa53a9bc3298ca593ac43afe873e8b3c5dfe9
Author: Arx Cruz <email address hidden>
Date: Tue Oct 12 15:18:24 2021 +0200

    Ensure upgrade job read the proper hash

    From time to time, due a promotion happening between the time
    that the content provider job executes, and the upgrade job
    the upgrade job fails because it can not find the hash in the
    content provider. This is because the upgrade job, instead of
    rely on the zuul job variable passed by the content provider,
    it reads in execution time, the current-tripleo hash from
    rdo registry. This patch ensure that if there is a zuul variable
    passed by content provider, it will use that hash, instead of
    read it from rdo registry.

    Related-Bug: #1946659
    Change-Id: I273243ced8b83aecec9da775e7f7a3c3865b4fc3
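
The selection rule described in this commit message can be sketched roughly as follows. This is an illustration only, not the code from the merged change: `select_dlrn_hash` and its two parameters are hypothetical names, and the registry lookup is passed in as a callable so the sketch stays self-contained.

```python
def select_dlrn_hash(provider_hash, fetch_current_tripleo_hash):
    """Pick the hash a consumer job should use.

    provider_hash: hash handed over by the content provider job through a
        zuul variable, or None/empty when no content provider is involved
        (hypothetical parameter).
    fetch_current_tripleo_hash: zero-argument callable that looks up the
        current-tripleo hash at run time (hypothetical stand-in for the
        registry lookup).
    """
    if provider_hash:
        # The content provider built or pulled containers for this hash, so
        # a promotion that happens in the meantime cannot change the answer.
        return provider_hash
    # No content provider value: fall back to the run-time lookup.
    return fetch_current_tripleo_hash()
```

Because the lookup is only called when no provider hash exists, a promotion between the content provider run and the consumer job should no longer alter the hash the consumer uses.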

Marios Andreou (marios-b) wrote :

Related patch at https://review.opendev.org/c/openstack/tripleo-ci/+/816991: Properly set the hash for undercloud upgrade jobs

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-ci (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/tripleo-ci/+/821694

OpenStack Infra (hudson-openstack) wrote : Change abandoned on tripleo-ci (master)

Change abandoned by "Marios Andreou <email address hidden>" on branch: master
Review: https://review.opendev.org/c/openstack/tripleo-ci/+/821694
Reason: was meant to update that https://review.opendev.org/c/openstack/tripleo-ci/+/816991/15

Marios Andreou (marios-b) wrote :

I have good results with https://review.opendev.org/c/openstack/tripleo-ci/+/816991/16#message-199f53afaae2f5a30fe54be0bdf3c57b348e8b54

> test looks good there https://review.rdoproject.org/r/c/testproject/+/37144/2#message-a17dedf7719520e6c68697ac973184fc73f1e981
>
> * https://logserver.rdoproject.org/44/37144/2/check/tripleo-ci-centos-8-undercloud-upgrade/55489ae/logs/quickstart_files/emit_releases_file.log
> Using hash override c167f95a3640d3950756d193c0a3dc25 for branch master
> Doing an undercloud upgrade
> Using hash override 4a9f570662f801b4340cb5e257eb6b5f for branch wallaby
>
> * https://logserver.rdoproject.org/44/37144/2/check/tripleo-ci-centos-8-undercloud-upgrade/55489ae/logs/quickstart_files/releases.sh
> export UNDERCLOUD_INSTALL_RELEASE="wallaby"
> export UNDERCLOUD_INSTALL_HASH="4a9f570662f801b4340cb5e257eb6b5f"
> export UNDERCLOUD_TARGET_RELEASE="master"
> export UNDERCLOUD_TARGET_HASH="c167f95a3640d3950756d193c0a3dc25"
>
> * https://logserver.rdoproject.org/44/37144/2/check/tripleo-ci-centos-8-undercloud-upgrade/55489ae/zuul-info/inventory.yaml
> provider_dlrn_hash_tag_branch: &id005
> master: c167f95a3640d3950756d193c0a3dc25
> wallaby: 4a9f570662f801b4340cb5e257eb6b5f
>

So I am trying to get https://review.opendev.org/c/openstack/tripleo-ci/+/816991/16 merged.

OpenStack Infra (hudson-openstack) wrote :

Change abandoned by "Ronelle Landy <email address hidden>" on branch: master
Review: https://review.opendev.org/c/openstack/tripleo-ci/+/816991
Reason: https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_874/821597/1/gate/tripleo-ci-centos-8-scenario000-multinode-oooq-container-updates-train/87484a1/job-output.txt

fails gate jobs below:

2021-12-15 20:07:24.031174 | primary | File "/home/zuul/src/opendev.org/openstack/tripleo-ci/scripts/emit_releases_file/emit_releases_file.py", line 550, in <module>

2021-12-15 20:07:24.031401 | primary | AttributeError: 'NoneType' object has no attribute 'split'

OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-ci (master)

Reviewed: https://review.opendev.org/c/openstack/tripleo-ci/+/816991
Committed: https://opendev.org/openstack/tripleo-ci/commit/4a6b1f0bcf57816890f36c3a6c385bfae9fd1965
Submitter: "Zuul (22348)"
Branch: master

commit 4a6b1f0bcf57816890f36c3a6c385bfae9fd1965
Author: Arx Cruz <email address hidden>
Date: Mon Nov 8 12:17:04 2021 +0100

    Allow override for install and target hash in emit releases undercloud

    As described in related-bug, this allows us to override the install
    and target delorean hash for use by consumer jobs. There was a previous
    attempt with [1] which added the ability to override the target hash.

    As commented in the bug (see comment #7) we need to override both target
    and install versions. Since this is nested bash/jinja/python :/ we do this
    by passing the content provider branches as a string
    "branch1:hash1;branch2:hash2" generated in the jinja templating
    which is then decoded on the emit-releases python side.

    [1] https://review.opendev.org/c/openstack/tripleo-ci/+/813629
    Co-Authored-By: Marios Andreou <email address hidden>
    Related-Bug: 1946659

    Change-Id: I9e0162f88cf262957234bf946ad3c013f6213891
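
The "branch1:hash1;branch2:hash2" encoding mentioned in this commit message could be produced and decoded along the following lines. This is a minimal sketch under stated assumptions: both function names are hypothetical, the real encoding happens in jinja templating rather than Python, and the actual emit-releases code may differ. The empty-input guard is included because a missing value would otherwise fail on `.split`, similar to the AttributeError quoted earlier in this bug.

```python
def encode_branch_hashes(mapping):
    """Flatten {branch: hash} into a 'branch1:hash1;branch2:hash2' string."""
    return ";".join(
        f"{branch}:{dlrn_hash}" for branch, dlrn_hash in mapping.items()
    )


def decode_branch_hashes(encoded):
    """Parse 'branch1:hash1;branch2:hash2' back into a {branch: hash} dict.

    Returns an empty dict for None or an empty string, so callers that run
    without a content provider do not hit an error on `.split`.
    """
    if not encoded:
        return {}
    result = {}
    for item in encoded.split(";"):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing ';'
        branch, separator, dlrn_hash = item.partition(":")
        if not separator or not branch or not dlrn_hash:
            raise ValueError(f"malformed branch:hash pair: {item!r}")
        result[branch] = dlrn_hash
    return result
```

With the inventory values quoted in the earlier comment, the decoded mapping would give master as c167f95a3640d3950756d193c0a3dc25 and wallaby as 4a9f570662f801b4340cb5e257eb6b5f, from which the install and target hashes can be looked up by branch.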

Changed in tripleo:
status: Triaged → Fix Released
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-ci (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/tripleo-ci/+/822482

Marios Andreou (marios-b) wrote :
Changed in tripleo:
status: Fix Released → In Progress
OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-ci (master)

Reviewed: https://review.opendev.org/c/openstack/tripleo-ci/+/822482
Committed: https://opendev.org/openstack/tripleo-ci/commit/4f84e22440f81e8a141d78cdb1d4f99d9506d0c3
Submitter: "Zuul (22348)"
Branch: master

commit 4f84e22440f81e8a141d78cdb1d4f99d9506d0c3
Author: Marios Andreou <email address hidden>
Date: Tue Dec 21 13:00:57 2021 +0200

    Allow hash override for emit releases script for minor update

    In [1] we addressed related-bug for the undercloud upgrade. This
    addresses the minor update case.

    Tests at [2] (DNM patches on tht branches) & [3] (rdo).

    [1] https://review.opendev.org/c/openstack/tripleo-ci/+/816991
    [2] https://review.opendev.org/q/topic:lp1946659
    [3] https://review.rdoproject.org/r/c/testproject/+/37144/2..3#message-f194606c5a1d3e495e65fe092673796250394437

    Related-Bug: 1946659
    Change-Id: Ib9b7d16d6e7888a16c9fae22a3035e48600519ab

Marios Andreou (marios-b) wrote :

Moving to Fix Released now that https://review.opendev.org/c/openstack/tripleo-ci/+/822482 has merged; that change addresses the issue for the minor update.

Changed in tripleo:
status: In Progress → Fix Released