centos-8 standalone-upgrade-ussuri fails in build-test-packages: issue creating /root/DLRN

Bug #1895138 reported by Marios Andreou
This bug affects 1 person
Affects: tripleo
Status: Fix Released
Importance: Critical
Assigned to: wes hayutin

Bug Description

The tripleo-ci-centos-8-standalone-upgrade-ussuri job fails during build-test-packages [1][2][3] with a trace like:

        2020-09-08 22:23:08.779092 | primary | TASK [build-test-packages : Ensure DLRN dir is present] ************************
        2020-09-08 22:23:08.779111 | primary | Tuesday 08 September 2020 22:23:08 +0000 (0:00:00.111) 0:01:35.467 *****
        2020-09-08 22:23:09.596004 | primary | fatal: [undercloud]: FAILED! => {
        2020-09-08 22:23:09.596072 | primary | "changed": false,
        2020-09-08 22:23:09.596110 | primary | "path": "/root/DLRN/"
        2020-09-08 22:23:09.596128 | primary | }
        2020-09-08 22:23:09.596145 | primary |
        2020-09-08 22:23:09.596162 | primary | MSG:
        2020-09-08 22:23:09.596178 | primary |
        2020-09-08 22:23:09.596194 | primary | There was an issue creating /root/DLRN as requested: [Errno 13] Permission denied: b'/root/DLRN'

The error originates in [4]; a fix is in progress at [5], with a test change at [6].

[1] https://504e67c1d54aa801a3ba-eb4966d6f73a1195abf0bef83b33defe.ssl.cf2.rackcdn.com/750794/1/check/tripleo-ci-centos-8-standalone-upgrade-ussuri/e68f8e5/job-output.txt
[2] https://46c50e344e3b02ed42fd-1e7408434efc5127489aebb82c7dd76e.ssl.cf5.rackcdn.com/750455/1/check/tripleo-ci-centos-8-standalone-upgrade-ussuri/7a796e6/job-output.txt
[3] https://ed14f8e2080515123465-885ebcd8132e7c53c86923fd2056bf1b.ssl.cf2.rackcdn.com/750485/1/check/tripleo-ci-centos-8-standalone-upgrade-ussuri/8e7518e/job-output.txt
[4] https://opendev.org/openstack/tripleo-quickstart-extras/src/commit/593fd63167daae9f0fd60bfa289e350f5af8fc23/playbooks/multinode-standalone-upgrade.yml#L47
[5] https://review.opendev.org/#/c/750443/
[6] https://review.opendev.org/739457
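The "[Errno 13] Permission denied: b'/root/DLRN'" in the trace above is an ordinary EACCES from directory creation: the unprivileged connecting user cannot create a directory inside root's home. The b'' prefix suggests the path was handled as bytes, which is how Python renders an OSError raised for a bytes filename. The Python sketch below reproduces the same error class; it uses a hypothetical read-only temporary directory as a stand-in for /root instead of touching /root itself, and it succeeds rather than fails when run as root, since root bypasses directory permission checks.

```python
import os
import tempfile


def try_makedirs(path):
    """Try to create path; return the OSError raised, or None on success."""
    try:
        os.makedirs(path)
    except OSError as exc:
        return exc
    return None


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical stand-in for /root: the owner may read/traverse, not write.
    locked = os.path.join(tmp, "locked")
    os.mkdir(locked, 0o500)
    # Pass the path as bytes, mirroring the b'...' in the Ansible error.
    err = try_makedirs(os.path.join(locked, "DLRN").encode())
    if err is None:
        print("created: root bypasses the directory permission check")
    else:
        # e.g. [Errno 13] Permission denied: b'/tmp/tmpabc123/locked/DLRN'
        print(err)
    # Restore write permission so TemporaryDirectory can clean up.
    os.chmod(locked, 0o700)
```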

Changed in tripleo:
assignee: nobody → Marios Andreou (marios-b)
Marios Andreou (marios-b) wrote :

Working on it at https://review.opendev.org/#/c/750443/ (WIP: pass build_repo_dir + fix tempest cloud name for standalone upgrade).

Marios Andreou (marios-b) wrote :

As discussed in scrum, I checked logstash [1]. Make sure to widen the time range and select 'build_name' in the output: you will see that the failing job is always tripleo-ci-centos-8-standalone-upgrade or tripleo-ci-centos-8-standalone-upgrade-ussuri.

[1] http://logstash.openstack.org/#/dashboard/file/logstash.json?query=message:%5C%22issue%20creating%20%2Froot%2FDLRN%20as%20requested:%20%5BErrno%2013%5D%20Permission%20denied%5C%22
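The logstash link in [1] is an exact-phrase query on the message field, URL-encoded into the dashboard URL. A small Python sketch (the helper name logstash_url is hypothetical) shows how that link is built, so the same search can be repeated for a different error string:

```python
from urllib.parse import quote

LOGSTASH = "http://logstash.openstack.org/#/dashboard/file/logstash.json"


def logstash_url(message: str) -> str:
    # Exact-phrase search; the escaped quotes (\") match the form used
    # in the link above.
    query = 'message:\\"' + message + '\\"'
    # Keep ':' literal; everything else that is not a letter, digit or
    # one of '_.-~' (including '/', spaces, brackets, '\' and '"') is
    # percent-encoded.
    return LOGSTASH + "?query=" + quote(query, safe=":")


url = logstash_url(
    "issue creating /root/DLRN as requested: [Errno 13] Permission denied"
)
print(url)
```

Keeping ':' in the safe set reproduces the original link exactly; with the default safe='/', the slashes in /root/DLRN would stay literal and the colons would become %3A.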

OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-quickstart-extras (master)

Fix proposed to branch: master
Review: https://review.opendev.org/752013

Changed in tripleo:
assignee: Marios Andreou (marios-b) → Sagi (Sergey) Shnaidman (sshnaidm)
status: Triaged → In Progress
Marios Andreou (marios-b) wrote :
Marios Andreou (marios-b) wrote :

I propose that we merge https://review.opendev.org/#/c/750443/; for now it gets us past this bug [1]:

        2020-09-15 13:54:35.141246 | primary | TASK [build-test-packages : Ensure DLRN dir is present] ************************
        2020-09-15 13:54:35.141263 | primary | Tuesday 15 September 2020 13:54:35 +0000 (0:00:00.106) 0:03:01.435 *****
        2020-09-15 13:54:35.988195 | primary | ok: [undercloud]

It might not be the root-cause fix, but we need this workaround because we then hit a new issue. I will file a new bug to track it.

[1] https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_ee1/739457/27/check/tripleo-ci-centos-8-standalone-upgrade-ussuri/ee18aa6/job-output.txt

Marios Andreou (marios-b) wrote :

New issue filed at https://bugs.launchpad.net/tripleo/+bug/1895822, per comment #5 above.

wes hayutin (weshayutin) wrote :

Found something:

https://storage.gra.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_1cb/752014/3/check/tripleo-ci-centos-8-standalone-upgrade/1cb93b5/logs/undercloud/var/log/extra/dump_variables_vars.json

        "ansible_env": {
            "HOME": "/root",
            "LANG": "en_US.UTF-8",
            "LOGNAME": "root",
            "MAIL": "/var/mail/root",
            "PATH": "/sbin:/bin:/usr/sbin:/usr/bin:/usr/local/bin/",
            "PWD": "/home/zuul",
            "SHELL": "/bin/bash",
            "SHLVL": "1",
            "SUDO_COMMAND": "/bin/sh -c echo BECOME-SUCCESS-nmgrowmddqgvfpurfzlejqmmlgwlogcr ; /usr/libexec/platform-python",
            "SUDO_GID": "1000",
            "SUDO_UID": "1000",
            "SUDO_USER": "zuul",
            "TERM": "unknown",
            "USER": "root",
            "_": "/usr/libexec/platform-python"
        }

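The telling fields in this dump are USER and LOGNAME (root) next to SUDO_USER (zuul), plus the BECOME-SUCCESS marker in SUDO_COMMAND: the facts were gathered through become, so user-derived facts describe root rather than the zuul user that connected. A hypothetical Python check over the dumped ansible_env makes that explicit; the function names are illustrative and not part of any tool, and effective_user follows the order in which Python's getpass.getuser() consults environment variables (LOGNAME, USER, LNAME, USERNAME).

```python
def effective_user(env: dict) -> str:
    """Return the user name the way getpass.getuser() resolves it from env."""
    for name in ("LOGNAME", "USER", "LNAME", "USERNAME"):
        if env.get(name):
            return env[name]
    raise KeyError("no user variable set in env")


def facts_gathered_via_become(env: dict) -> bool:
    """True if sudo switched away from the connecting user.

    sudo records the invoking user in SUDO_USER; if that differs from the
    effective user, the facts were collected with privilege escalation.
    """
    sudo_user = env.get("SUDO_USER")
    return sudo_user is not None and sudo_user != effective_user(env)


# Subset of the ansible_env fact from dump_variables_vars.json above.
ansible_env = {
    "HOME": "/root",
    "LOGNAME": "root",
    "PWD": "/home/zuul",
    "SUDO_USER": "zuul",
    "USER": "root",
}

print(effective_user(ansible_env))             # root
print(ansible_env["SUDO_USER"])                # zuul
print(facts_gathered_via_become(ansible_env))  # True
```

The same check returns False for a dump gathered without become, where SUDO_USER is absent.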

OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-quickstart-extras (master)

Reviewed: https://review.opendev.org/750443
Committed: https://git.openstack.org/cgit/openstack/tripleo-quickstart-extras/commit/?id=a1621e622614d5be8b6db32d3d66bb3dfff4e997
Submitter: Zuul
Branch: master

commit a1621e622614d5be8b6db32d3d66bb3dfff4e997
Author: Marios Andreou <email address hidden>
Date: Tue Sep 8 18:28:55 2020 +0300

    Pass build_repo_dir + fix tempest cloud name for standalone upgrade

    As discussed in related-bug, we are failing during build-test-packages.
    As a workaround we can explicitly pass the build_repo_dir.

    Also fixes the tempest cloud name making it the same as used in the
    tests running after deploy (and matching the standalone env).

    Related-Bug: 1895138
    Change-Id: I7e1868d5dea3863cfa15c5ef86b6666cac3b0370
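The workaround helps because an explicitly passed variable takes precedence over a value derived from facts. A minimal Python model follows; it assumes (not confirmed in this report) that the role falls back to ansible_user_dir when build_repo_dir is not passed, which would explain the /root/DLRN path, and the /home/zuul value is likewise only illustrative, not the value used in the patch.

```python
def dlrn_dir(facts, extra_vars):
    """Hypothetical model of where build-test-packages creates DLRN.

    Assumption: build_repo_dir falls back to ansible_user_dir when it is
    not passed explicitly.
    """
    base = extra_vars.get("build_repo_dir") or facts["ansible_user_dir"]
    return base.rstrip("/") + "/DLRN"


# Facts gathered with become: true report root's home directory.
facts = {"ansible_user_dir": "/root"}

print(dlrn_dir(facts, {}))                                # /root/DLRN
print(dlrn_dir(facts, {"build_repo_dir": "/home/zuul"}))  # /home/zuul/DLRN
```

This only masks the symptom: the wrong ansible_user_dir fact is still there for any other task that relies on it, which is why the root cause was investigated further.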

Marios Andreou (marios-b) wrote :

This is no longer blocking, thanks to the fix from https://review.opendev.org/750443, but per comment #7 above the root cause is still under investigation.

Changed in tripleo:
assignee: Sagi (Sergey) Shnaidman (sshnaidm) → wes hayutin (weshayutin)
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-quickstart-extras (master)

Reviewed: https://review.opendev.org/752681
Committed: https://git.openstack.org/cgit/openstack/tripleo-quickstart-extras/commit/?id=01bc68dd3ad795e6804b93a6c709d5aa02db9abd
Submitter: Zuul
Branch: master

commit 01bc68dd3ad795e6804b93a6c709d5aa02db9abd
Author: Wes Hayutin <email address hidden>
Date: Fri Sep 18 06:33:48 2020 -0600

    always gather facts in build-test-packages

    Ensure that facts are gathered so that
    the user and user_dir settings are correct
    and not using root.

    Closes-Bug: #1895138
    Change-Id: I3b44c75ca42579fdaca7559b0991aed9a24eda4f

Changed in tripleo:
status: In Progress → Fix Released
Marios Andreou (marios-b) wrote :

After the gather facts fix (https://review.opendev.org/752681) merged, I posted a partial revert (https://review.opendev.org/#/c/753034/) of the earlier workaround (https://review.opendev.org/#/c/750443/) for verification.

In testing at https://77fcbb6bcb190ff77af8-4535b5f6c3d3d8c9a7fe219932ac3358.ssl.cf2.rackcdn.com/739457/29/check/tripleo-ci-centos-8-standalone-upgrade-ussuri/015cefa/job-output.txt you can see that the 'Ensure DLRN dir is present' task is OK both times it runs:

        2020-09-21 23:28:48.477637 | primary | TASK [build-test-packages : Ensure DLRN dir is present] ************************
        2020-09-21 23:28:48.477762 | primary | Monday 21 September 2020 23:28:48 +0000 (0:00:00.042) 0:09:35.005 ******
        2020-09-21 23:28:49.087139 | primary | changed: [undercloud]

        2020-09-22 00:15:03.296054 | primary |
        2020-09-22 00:15:03.296096 | primary | TASK [build-test-packages : Ensure DLRN dir is present] ************************
        2020-09-22 00:15:03.296109 | primary | Tuesday 22 September 2020 00:15:03 +0000 (0:00:00.076) 0:01:35.537 *****
        2020-09-22 00:15:03.915046 | primary | ok: [undercloud]
        2020-09-22 00:15:03.928407 | primary |

We can merge the partial revert https://review.opendev.org/#/c/753034

OpenStack Infra (hudson-openstack) wrote : Change abandoned on tripleo-quickstart-extras (master)

Change abandoned by Sagi Shnaidman (<email address hidden>) on branch: master
Review: https://review.opendev.org/752013

Sandeep Yadav (sandeepyadav93) wrote :

Fix proposed to branch: master
Review: https://review.opendev.org/#/c/753540/

Remove "become" from baremetal-prep-virthost.yml

When running a playbook with become: true, it also runs the facts
module with sudo, so ansible_user_dir will have the value of the
root user, rather than the expected home directory of the
ansible_ssh_user.

Merging patch [1] uncovered this issue, and the task below now fails
during image build.
~~~
TASK [oooci-build-images : ironic-python-agent]
"Error when writing tar.tar archive at /root/ironic-python-agent.tar:
[Errno 13] Permission denied: '/root/ironic-python-agent.tar'"
~~~

Cause: when the gather facts task in the build-test-packages role runs
with become: true, the ansible_user_dir fact is changed to "/root".

This patch removes the unneeded become: true.

[1] https://review.opendev.org/#/c/752681/
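The mechanism described in this commit message comes down to which account runs the fact-gathering code: that account's passwd entry supplies the home directory. A Python sketch, assuming (a simplification, not a quote of Ansible's code) that ansible_user_dir is the passwd home directory of the login name that getpass.getuser() reports; user_dir_fact is a hypothetical helper:

```python
import getpass
import pwd


def user_dir_fact(username=None):
    """Approximate ansible_user_dir for the given (or current) user.

    Assumption: resolve the login name as getpass.getuser() does, then
    take the home directory from that user's passwd entry.
    """
    name = username or getpass.getuser()
    return pwd.getpwnam(name).pw_dir


# With become: true the facts module runs as root, so the lookup is for
# root no matter which user opened the SSH connection.
print(user_dir_fact("root"))  # /root on typical Linux systems

# Without become, calling user_dir_fact() with no argument resolves the
# connecting user instead (for example zuul and its own home directory).
```

This is why dropping the play-level become: true lets the gather facts task in build-test-packages report the connecting user's home again.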

OpenStack Infra (hudson-openstack) wrote : Related fix merged to tripleo-quickstart-extras (master)

Reviewed: https://review.opendev.org/753540
Committed: https://git.openstack.org/cgit/openstack/tripleo-quickstart-extras/commit/?id=901a7428166da1188b711fabce4c19ebbd4b8715
Submitter: Zuul
Branch: master

commit 901a7428166da1188b711fabce4c19ebbd4b8715
Author: Sandeep Yadav <email address hidden>
Date: Wed Sep 23 13:35:43 2020 +0530

    Remove "become" from baremetal-prep-virthost.yml

    When running a playbook with become: true, it also runs the facts
    module with sudo, so ansible_user_dir will have the value of the
    root user, rather than the expected home directory of the
    ansible_ssh_user.

    Merging patch [1] uncovered this issue, and the task below now fails
    during image build.
    ~~~
    TASK [oooci-build-images : ironic-python-agent]
    "Error when writing tar.tar archive at /root/ironic-python-agent.tar:
    [Errno 13] Permission denied: '/root/ironic-python-agent.tar'"
    ~~~

    Cause: when the gather facts task in the build-test-packages role runs
    with become: true, the ansible_user_dir fact is changed to "/root".

    This patch removes the unneeded become: true.

    [1] https://review.opendev.org/#/c/752681/

    Related-Bug: 1895138
    Change-Id: I7264cf9f3f91384b52bd91a1b57d9a23ff87d0b0

OpenStack Infra (hudson-openstack) wrote :

Reviewed: https://review.opendev.org/753034
Committed: https://git.openstack.org/cgit/openstack/tripleo-quickstart-extras/commit/?id=24989a57570769bb5d6e380b7eeac3260f6877d1
Submitter: Zuul
Branch: master

commit 24989a57570769bb5d6e380b7eeac3260f6877d1
Author: Marios Andreou <email address hidden>
Date: Mon Sep 21 18:28:28 2020 +0300

    Partial revert build-test-packages fix (build directory)

    Partial revert of [1] as discussed in related-bug this is no longer
    necessary after [2] merged.

    Related-Bug: 1895138
    [1] https://review.opendev.org/#/c/750443/
    [2] https://review.opendev.org/#/c/752681/
    Change-Id: I27755af71063f03d5f182782c6cc8a2f7475d3a5
