RHEL 8 master fs01 fails while getting image expected checksum after multiple retries

Bug #1855826 reported by chandan kumar
This bug affects 2 people
Affects: tripleo
Status: Fix Released
Importance: Critical
Assigned to: Unassigned
Milestone: (none)

Bug Description

The RHEL 8 master periodic job fails repeatedly at the "Get image expected checksum" task. Log: http://logs.rdoproject.org/openstack-periodic-master/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-rhel-8-ovb-3ctlr_1comp-featureset001-master/9dcb66d/job-output.txt

2019-12-09 14:25:12.130765 | primary | TASK [fetch-images : Get image expected checksum] ******************************
2019-12-09 14:25:12.148354 | primary | Monday 09 December 2019 14:25:12 -0500 (0:00:00.077) 0:05:12.933 *******
2019-12-09 14:25:12.576790 | primary | FAILED - RETRYING: Get image expected checksum (10 retries left).
2019-12-09 14:25:27.939287 | primary | FAILED - RETRYING: Get image expected checksum (9 retries left).
2019-12-09 14:25:43.376268 | primary | FAILED - RETRYING: Get image expected checksum (8 retries left).
2019-12-09 14:25:58.780638 | primary | FAILED - RETRYING: Get image expected checksum (7 retries left).
2019-12-09 14:26:14.196125 | primary | FAILED - RETRYING: Get image expected checksum (6 retries left).
2019-12-09 14:26:29.604925 | primary | FAILED - RETRYING: Get image expected checksum (5 retries left).
2019-12-09 14:26:44.994088 | primary | FAILED - RETRYING: Get image expected checksum (4 retries left).
2019-12-09 14:27:00.369601 | primary | FAILED - RETRYING: Get image expected checksum (3 retries left).
2019-12-09 14:27:15.769492 | primary | FAILED - RETRYING: Get image expected checksum (2 retries left).
2019-12-09 14:27:31.162452 | primary | FAILED - RETRYING: Get image expected checksum (1 retries left).
2019-12-09 14:27:46.581736 | primary | fatal: [undercloud]: FAILED! => {
2019-12-09 14:27:46.581810 | primary | "attempts": 10,
2019-12-09 14:27:46.581829 | primary | "changed": true,
2019-12-09 14:27:46.581843 | primary | "cmd": [
2019-12-09 14:27:46.581853 | primary | "curl",
2019-12-09 14:27:46.581858 | primary | "-skfL",
2019-12-09 14:27:46.581862 | primary | "http://38.145.34.141/rcm-guest/images/redhat8/master/rdo_trunk/bd316fa91fad3df7c4b2e7847399d03e16625a42_3599f536/overcloud-full.tar.md5"
2019-12-09 14:27:46.581877 | primary | ],
2019-12-09 14:27:46.581883 | primary | "delta": "0:00:00.011974",
2019-12-09 14:27:46.581888 | primary | "end": "2019-12-09 14:27:46.556492",
2019-12-09 14:27:46.581893 | primary | "rc": 22,
2019-12-09 14:27:46.581897 | primary | "start": "2019-12-09 14:27:46.544518"
2019-12-09 14:27:46.581917 | primary | }
2019-12-09 14:27:46.581924 | primary |

This has been seen multiple times.
Trying the same command by hand on rdo-cloud gives the same result:
[cloud-user@devbox ~]$ curl -skfL http://38.145.34.141/rcm-guest/images/redhat8/master/rdo_trunk/bd316fa91fad3df7c4b2e7847399d03e16625a42_3599f536/overcloud-full.tar.md5
[cloud-user@devbox ~]$ echo $?
22

It may be that the image with hash bd316fa91fad3df7c4b2e7847399d03e16625a42_3599f536 does not exist on rdo-cloud. We need to improve the task so that it reports the actual failure, i.e. that the image does not exist.
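
One way the task could surface the real cause, as a rough sketch only. curl exits with 22 when -f is set and the server answers with an HTTP status of 400 or above, so the status code itself is lost. The helpers below drop -f and ask curl for the status code with -w '%{http_code}' instead. The names explain_status and get_checksum are hypothetical and are not part of the fetch-images role.

```shell
# Hypothetical helper: turn an HTTP status code into a readable message.
explain_status() {
    case "$1" in
        2??) echo "OK: $2 returned HTTP $1" ;;
        404) echo "ERROR: image checksum not found (HTTP 404): $2" ;;
        000) echo "ERROR: no HTTP response (connection problem): $2" ;;
        *)   echo "ERROR: unexpected HTTP $1: $2" ;;
    esac
}

# Hypothetical helper: print the checksum file on success, or a clear
# error on stderr. The body runs in a subshell ( ... ), so the variables
# below do not leak into the caller.
get_checksum() (
    url="$1"
    body="$(mktemp)"
    # Without -f curl exits 0 on a 404; -w prints the final status code.
    # If curl itself fails (e.g. connection refused), fall back to 000.
    status="$(curl -skL -o "$body" -w '%{http_code}' "$url")" || status="000"
    case "$status" in
        2??) cat "$body"; rc=0 ;;
        *)   explain_status "$status" "$url" >&2; rc=1 ;;
    esac
    rm -f "$body"
    exit "$rc"
)
```

Called with the URL from the failing job, get_checksum would print the "not found" message on stderr if the server answers 404, rather than a bare rc 22.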

Tags: alert ci
Marios Andreou (marios-b) wrote :

o/ chandan... adding for context this was filed from https://bugs.launchpad.net/tripleo/+bug/1853978/comments/16

I think there may be some issue with https://review.opendev.org/#/c/697423/ but I'm not sure yet.

I logged onto the rcm guest and indeed there is no overcloud-full:

        [centos@rcn-share images]$ ll redhat8/master/rdo_trunk/bd316fa91fad3df7c4b2e7847399d03e16625a42_3599f536/
        total 506604
        -rw-rw-r--. 1 centos centos 518758400 Dec 9 12:34 ironic-python-agent.tar
        -rw-rw-r--. 1 centos centos 58 Dec 9 12:34 ironic-python-agent.tar.md5
        [centos@rcn-share images]$

BUT before that change (/#/c/697423/) it would have been 'tripleo-ci-testing', and there is no overcloud-full there either!

        [centos@rcn-share rdo_trunk]$ ll tripleo-ci-testing/
        total 506636
        -rw-rw-r--. 1 centos centos 518789120 Dec 9 18:33 ironic-python-agent.tar
        -rw-rw-r--. 1 centos centos 58 Dec 9 18:34 ironic-python-agent.tar.md5

current-tripleo and previous look OK

        [centos@rcn-share rdo_trunk]$ ll current-tripleo/
        total 1634800
        -rw-rw-r--. 1 centos centos 518799360 Nov 18 00:30 ironic-python-agent.tar
        -rw-rw-r--. 1 centos centos 58 Nov 18 00:30 ironic-python-agent.tar.md5
        -rw-rw-r--. 1 centos centos 1155225600 Nov 18 00:35 overcloud-full.tar
        -rw-rw-r--. 1 centos centos 53 Nov 18 00:35 overcloud-full.tar.md5
        [centos@rcn-share rdo_trunk]$ ll previous-current-tripleo/
        total 1626692
        -rw-rw-r--. 1 centos centos 518768640 Nov 14 13:59 ironic-python-agent.tar
        -rw-rw-r--. 1 centos centos 58 Nov 14 13:59 ironic-python-agent.tar.md5
        -rw-rw-r--. 1 centos centos 1146951680 Nov 14 14:03 overcloud-full.tar
        -rw-rw-r--. 1 centos centos 53 Nov 14 14:04 overcloud-full.tar.md5
        [centos@rcn-share rdo_trunk]$
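
The manual check above can be scripted. A minimal sketch, assuming a hash directory should hold the same four files that current-tripleo has (missing_images is a hypothetical helper name):

```shell
# Hypothetical helper: print each expected image file that is absent
# from the directory given as the first argument.
missing_images() {
    for f in ironic-python-agent.tar ironic-python-agent.tar.md5 \
             overcloud-full.tar overcloud-full.tar.md5; do
        [ -e "$1/$f" ] || echo "$f"
    done
}
```

Run against the bd316fa91fad3df7c4b2e7847399d03e16625a42_3599f536 directory listed earlier, it would print overcloud-full.tar and overcloud-full.tar.md5.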

Marios Andreou (marios-b) wrote :

Fix from chkumar: "Fix RHEL8 periodic fs01 job dependencies", Change-Id: I3a806d4059a5f419a25e8ed9aa2bf9e195d6b199, https://review.rdoproject.org/r/#/c/24058/

chandan kumar (chkumar246) wrote :
Marios Andreou (marios-b) wrote :

looks good in latest run after https://review.rdoproject.org/r/#/c/24058/ merged:

http://logs.rdoproject.org/openstack-periodic-master/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-rhel-8-ovb-3ctlr_1comp-featureset001-master/083861e/job-output.txt

2019-12-10 20:23:04.560709 | primary | TASK [fetch-images : Get image expected checksum] ******************************
2019-12-10 20:23:04.575455 | primary | Tuesday 10 December 2019 20:23:04 -0500 (0:00:00.063) 0:05:17.421 ******
2019-12-10 20:23:04.988724 | primary | changed: [undercloud]

Changed in tripleo:
status: Confirmed → Fix Released
Marios Andreou (marios-b) wrote :

Seeing this on the train job though; just added a comment at https://bugs.launchpad.net/tripleo/+bug/1853978/comments/24 referencing this bug.
