tripleo-get-hash current-tripleo for train on centos7 "error": "No module named yaml"

Bug #1943968 reported by wes hayutin
Affects: tripleo
Status: Fix Released
Importance: Critical
Assigned to: Marios Andreou
Milestone: (none)

Bug Description

2021-09-17 02:32:58.502924 | primary | TASK [repo-setup : tripleo-get-hash current-tripleo for train on centos7 from http://mirror.gra1.ovh.opendev.org:8080/rdo] ***
2021-09-17 02:32:58.503173 | primary | Friday 17 September 2021 02:32:58 +0000 (0:00:00.063) 0:00:46.140 ******
2021-09-17 02:32:59.485937 | primary | FAILED - RETRYING: tripleo-get-hash current-tripleo for train on centos7 from http://mirror.gra1.ovh.opendev.org:8080/rdo (5 retries left).
2021-09-17 02:33:05.092522 | primary | FAILED - RETRYING: tripleo-get-hash current-tripleo for train on centos7 from http://mirror.gra1.ovh.opendev.org:8080/rdo (4 retries left).
2021-09-17 02:33:10.683214 | primary | FAILED - RETRYING: tripleo-get-hash current-tripleo for train on centos7 from http://mirror.gra1.ovh.opendev.org:8080/rdo (3 retries left).
2021-09-17 02:33:16.386850 | primary | FAILED - RETRYING: tripleo-get-hash current-tripleo for train on centos7 from http://mirror.gra1.ovh.opendev.org:8080/rdo (2 retries left).
2021-09-17 02:33:22.452867 | primary | FAILED - RETRYING: tripleo-get-hash current-tripleo for train on centos7 from http://mirror.gra1.ovh.opendev.org:8080/rdo (1 retries left).
2021-09-17 02:33:28.074908 | primary | fatal: [subnode-1]: FAILED! => {
2021-09-17 02:33:28.075123 | primary | "attempts": 5,
2021-09-17 02:33:28.075232 | primary | "changed": false,
2021-09-17 02:33:28.075360 | primary | "error": "No module named yaml",
2021-09-17 02:33:28.075451 | primary | "success": false
2021-09-17 02:33:28.075494 | primary | }
2021-09-17 02:33:28.075532 | primary |
2021-09-17 02:33:28.075580 | primary | MSG:
2021-09-17 02:33:28.075616 | primary |

https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_15c/809495/1/check/tripleo-ci-centos-7-containers-multinode/15ccf8c/job-output.txt

https://7e0affbd464b261bcc2a-f8803ca7587e43f6b66c9edae98e760b.ssl.cf5.rackcdn.com/809427/2/gate/tripleo-ci-centos-7-containers-multinode/cd5b7bd/job-output.txt

https://zuul.opendev.org/t/openstack/builds?job_name=tripleo-ci-centos-7-containers-multinode

Tags: alert ci
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-quickstart (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/tripleo-quickstart/+/809670

Marios Andreou (marios-b) wrote :

We tested in the RDO testenv with [1], but some difference in the
upstream environment appears to be causing this bug.

I am testing the partial revert from comment #1 above with a DNM (do not merge) patch at [2].

[1] https://review.rdoproject.org/r/c/testproject/+/35492/4#message-69eb12e4dfc23d16b1e2d2b96e6fed54b4a191d6
[2] https://review.opendev.org/c/openstack/tripleo-heat-templates/+/809687

OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-quickstart (master)

Reviewed: https://review.opendev.org/c/openstack/tripleo-quickstart/+/809670
Committed: https://opendev.org/openstack/tripleo-quickstart/commit/7bc104915dbb4c972030e75936542edf6778c4ec
Submitter: "Zuul (22348)"
Branch: master

commit 7bc104915dbb4c972030e75936542edf6778c4ec
Author: Marios Andreou <email address hidden>
Date: Fri Sep 17 16:10:15 2021 +0300

    Partial revert for tripleo-quickstart/+/791486 for centos7

    This is a partial revert of [1] specifically for the centos7
    case. We tested in the RDO testenv with [2] but it seems there is
    some difference in the upstream which is causing related-bug.

    This will allow us to unblock train while we consider how we can
    use get-hash with centos7.

    [1] https://review.opendev.org/c/openstack/tripleo-quickstart/+/791486
    [2] https://review.rdoproject.org/r/c/testproject/+/35492

    Related-Bug: 1943968
    Change-Id: I010e57b9123d7694e35539619264d49d6a6cbc34

Marios Andreou (marios-b) wrote :

We are no longer blocked by this, since the partial revert in comment #3 has merged.

Poking at this a bit today, I can reproduce it on my local system.

It comes down to the job installing PyYAML for python3 but then running the module with python2:

        * https://7e0affbd464b261bcc2a-f8803ca7587e43f6b66c9edae98e760b.ssl.cf5.rackcdn.com/809427/2/gate/tripleo-ci-centos-7-containers-multinode/cd5b7bd/job-output.txt
        * 2021-09-17 10:04:00.530959 | primary | +(/home/zuul/src/opendev.org/openstack/tripleo-ci/toci_gate_test.sh:47): sudo /usr/bin/yum -y '--exclude=python2*' install python3-setuptools python3-requests python3-urllib3 python3-PyYAML
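
One way to confirm this kind of mismatch on a node is to ask each interpreter directly whether it can import the module. The following is a minimal sketch and not part of the job: the helper name `can_import` and the interpreter names passed to it are assumptions made for illustration.

```python
import shutil
import subprocess


def can_import(interpreter, module):
    """Check whether `interpreter` can import `module`.

    Returns True or False, or None when the interpreter cannot be found.
    """
    path = shutil.which(interpreter)
    if path is None:
        return None
    # The module name travels via argv, so it is never pasted into the code string.
    probe = "import importlib, sys; importlib.import_module(sys.argv[1])"
    result = subprocess.run(
        [path, "-c", probe, module],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


if __name__ == "__main__":
    # Interpreter names are assumptions; adjust them to what the node provides.
    for name in ("python2", "python3"):
        print(name, "can import yaml:", can_import(name, "yaml"))
```

On a node in the state described above, one would expect the python2 check to fail and the python3 check to succeed, matching the "No module named yaml" error from the module run.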

I have a workaround/hack for it: force-installing the python2 PyYAML. I will post a review for it, likely tomorrow, so we can see whether it is viable in the upstream CI and then decide whether we want to use it. We can also try explicitly setting ansible_python_interpreter for the job.

We can always just continue using the 'legacy' get-hash for centos7 (i.e. https://opendev.org/openstack/tripleo-quickstart/src/commit/e16d99a2888c9cb2461a3958eeebae70d8e3cd4b/roles/repo-setup/tasks/get-dlrn-hash.yml#L16-L36).

Changed in tripleo:
status: Triaged → In Progress
assignee: nobody → Marios Andreou (marios-b)
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to tripleo-quickstart (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/tripleo-quickstart/+/810410

Marios Andreou (marios-b) wrote :

Confirmed the issue described in comment #4.

The root of the bug is at [0], where we set:

 interpreter_python = auto

From the ansible docs [1]:

"Detects the target OS platform, distribution, and version, then consults a table listing the correct Python interpreter and path for each platform/distribution/version"

Since the jobs run on centos7, interpreter discovery decides that python2 is the right python for the ansible execution. However, we are using python3 in those jobs, and PyYAML is only installed for python3, so the module fails to import yaml.
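
The lookup described in the docs can be pictured as a table keyed by distribution and major version. The sketch below is illustrative only: the table contents, the `discover_interpreter` name and the fallback path are assumptions, not Ansible's actual data or code. It shows why a centos7 target ends up on python2 unless an interpreter is set explicitly.

```python
# Hypothetical discovery table for illustration; NOT Ansible's real data.
INTERPRETER_TABLE = {
    ("centos", "7"): "/usr/bin/python",  # python2 on centos7
    ("centos", "8"): "/usr/libexec/platform-python",
}

# Assumed fallback for platforms missing from the table above.
FALLBACK = "/usr/bin/python3"


def discover_interpreter(distro, major_version, explicit=None):
    """Return the interpreter path for a target host.

    An explicit setting always wins; otherwise the table is consulted.
    """
    if explicit:
        return explicit
    key = (distro.lower(), str(major_version))
    return INTERPRETER_TABLE.get(key, FALLBACK)


if __name__ == "__main__":
    print(discover_interpreter("centos", 7))
    print(discover_interpreter("centos", 7, explicit="/usr/bin/python3"))
```

An explicit setting short-circuits the table, which is the effect of pinning python3 for the get-hash task rather than relying on `auto`.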

With the patch at [2] I explicitly set the interpreter for get-hash to python3. The test results at [3] are good; the centos7 runs are at [4] and [5]:

        [4]
        * 2021-09-22 11:43:29.960743 | primary | TASK [repo-setup : tripleo-get-hash current-tripleo for train on centos7 from https://trunk.rdoproject.org] ***
        * 2021-09-22 11:43:32.405325 | primary | TASK [repo-setup : tripleo-get-hash current for train on centos7 from https://trunk.rdoproject.org] ***
        * 2021-09-22 11:43:33.846358 | primary | ['the tq release is: train', 'the distro is: centos7', 'dlrn_hash is: c17c316f23f45892665cd6cc958e54f5a7ebec39_5447c0c9']

        [5]
        * 2021-09-22 13:14:40.508154 | primary | TASK [repo-setup : tripleo-get-hash current-tripleo for train on centos7 from https://trunk.rdoproject.org] ***
        * 2021-09-22 13:14:43.754305 | primary | TASK [repo-setup : tripleo-get-hash current for train on centos7 from https://trunk.rdoproject.org] ***
        * 2021-09-22 13:14:45.160935 | primary | ['the tq release is: train', 'the distro is: centos7', 'dlrn_hash is: c17c316f23f45892665cd6cc958e54f5a7ebec39_5447c0c9']

[0] https://opendev.org/openstack/tripleo-quickstart/src/commit/156bbec5cf1aeeedb08439621faff3f6da9eb2de/ansible.cfg#L13

[1] https://docs.ansible.com/ansible/latest/reference_appendices/interpreter_discovery.html

[2] https://review.opendev.org/c/openstack/tripleo-quickstart/+/810410

[3] https://review.opendev.org/c/openstack/tripleo-heat-templates/+/809687/3#message-1af73fc66b50695ae80c32ab29f0838724126958

[4] https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_e4a/809687/3/check/tripleo-ci-centos-7-content-provider/e4a2e4e/job-output.txt

[5] https://a4a2534e9b63b785667d-c90ace1b1592b04a13c7070fc984a3af.ssl.cf2.rackcdn.com/809687/3/check/tripleo-ci-centos-7-containers-multinode/3c33cf2/job-output.txt

OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-quickstart (master)

Reviewed: https://review.opendev.org/c/openstack/tripleo-quickstart/+/810410
Committed: https://opendev.org/openstack/tripleo-quickstart/commit/bc637b1bbbe116a69037302bdb50981387a3a5e8
Submitter: "Zuul (22348)"
Branch: master

commit bc637b1bbbe116a69037302bdb50981387a3a5e8
Author: Marios Andreou <email address hidden>
Date: Wed Sep 22 13:51:03 2021 +0300

    Explicitly set python3 for ansible with get-hash to fix c7 jobs

    As described in related bug (see comments 4,6) the ansible python
    interpreter is set as 'auto' so for the centos7 jobs the python2
    interpreter is used which caused the bug. Explicitly setting to
    python3 addresses the issue and we can wireup get-hash without
    needing a special case for centos7.

    Related-Bug: 1943968

    Change-Id: I299bdb7c74fa5eb05b1bffc856abaeaca2497dca

Changed in tripleo:
status: In Progress → Fix Released