tqe build images failing on workaround for the old libguestfs

Bug #1716487 reported by wes hayutin
This bug affects 1 person
Affects   Status         Importance   Assigned to     Milestone
tripleo   Fix Released   High         Attila Darazs

Bug Description

https://ci.centos.org/job/tripleo-quickstart-promote-pike-build-images/42/consoleFull

00:03:19.685 TASK [provision/local : Register fact for current user group] ******************
00:03:19.685 task path: /home/centos/workspace/tripleo-quickstart-promote-pike-build-images/tripleo-quickstart/roles/provision/local/tasks/main.yml:12
00:03:19.713 Monday 11 September 2017 19:57:31 +0000 (0:00:00.278) 0:00:08.040 ******
00:03:19.761 ok: [localhost] => {"ansible_facts": {"current_group_local": "centos"}, "changed": false}
00:03:19.779
00:03:19.779 TASK [provision/local : Ensure local working dir exists] ***********************
00:03:19.779 task path: /home/centos/workspace/tripleo-quickstart-promote-pike-build-images/tripleo-quickstart/roles/provision/local/tasks/main.yml:18
00:03:19.809 Monday 11 September 2017 19:57:31 +0000 (0:00:00.095) 0:00:08.135 ******
00:03:20.016 fatal: [localhost]: FAILED! => {"changed": false, "failed": true, "module_stderr": "sudo: a password is required\n", "module_stdout": "", "msg": "MODULE FAILURE"}

This is causing failures in the build-images and collect-logs roles.

Tags: ci quickstart

Alan Pevec (apevec) wrote:

* Last known good was
https://ci.centos.org/job/tripleo-quickstart-promote-pike-build-images/38/consoleFull
and it also had "sudo: a password is required" in the collect-logs task.

* First known bad was
https://ci.centos.org/job/tripleo-quickstart-promote-pike-build-images/39/consoleFull
where virt-customize failed while running the repo_setup script:

16:22:40 fatal: [172.19.2.187]: FAILED! => {"changed": true, "cmd": "virt-customize --run /var/lib/oooq-images/repo_setup.sh -a /var/lib/oooq-images/isolation-image.qcow2 > /home/stack/_var_lib_oooq-images_repo_setup.sh.log 2>&1", "delta": "0:00:31.307324", "end": "2017-09-11 17:22:40.863682", "failed": true, "rc": 1, "start": "2017-09-11 17:22:09.556358", "stderr": "", "stdout": "", "stdout_lines": [], "warnings": []}

Alan Pevec (apevec) wrote:

The sudo issue in log collection has been there for as far back as the job history goes:
https://ci.centos.org/job/tripleo-quickstart-promote-pike-build-images/13/console

Alan Pevec (apevec) wrote:

The latest failure, in job #42, was the same virt-customize failure in repo_setup, but since log collection is not working we cannot see why it is failing.
Let's fix log collection first and try running the script on the CI slave manually in parallel.

summary: - tq build images sudo required for centos user
+ tq build images log collection: sudo required for centos user

Alan Pevec (apevec) wrote: Re: tq build images log collection: sudo required for centos user

Until log collection is fixed, could we temporarily remove the stdout/stderr redirection in the repo_setup task, so that the error message shows up on the console?
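Rather than dropping the redirection altogether, the output could be sent to both the log file and the console. A minimal sketch in POSIX shell, assuming the task runs its command through a shell; "run_logged" is a hypothetical helper, not something that exists in tripleo-quickstart-extras. It pipes the command through "tee" and records the command's own exit status in a temporary file, because a plain pipeline would report the status of "tee" and hide the failure.

```shell
# Hypothetical helper (not part of tripleo-quickstart-extras): run a command,
# copy its stdout and stderr to the console AND to a log file, and return the
# command's own exit status rather than the status of tee.
run_logged() {
    rl_log=$1
    shift
    rl_status_file=$(mktemp)
    # The exit code is written to a temp file because the pipeline as a whole
    # reports the status of tee, which would hide the command's failure.
    { if "$@" 2>&1; then rl_rc=0; else rl_rc=$?; fi; echo "$rl_rc" > "$rl_status_file"; } | tee "$rl_log"
    rl_status=$(cat "$rl_status_file")
    rm -f "$rl_status_file"
    return "$rl_status"
}

# Usage sketch (not run here), mirroring the failing task from job #39:
# run_logged /home/stack/_var_lib_oooq-images_repo_setup.sh.log \
#     virt-customize --run /var/lib/oooq-images/repo_setup.sh \
#     -a /var/lib/oooq-images/isolation-image.qcow2
```

The job would still fail with the command's exit code, but the error text would also appear in the console output.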

OpenStack Infra (hudson-openstack) wrote: Related fix proposed to tripleo-quickstart-extras (master)

Related fix proposed to branch: master
Review: https://review.openstack.org/502919

Alan Pevec (apevec) wrote: Re: tq build images log collection: sudo required for centos user
summary: - tq build images log collection: sudo required for centos user
+ tqe build images log collection: sudo required for centos user

Attila Darazs (adarazs) wrote: Re: tqe build images log collection: sudo required for centos user

The main problem is that the image build runs on the local machine, and our usual log collection is not set up for that case, because localhost is skipped in the collect-logs.yml playbook:

https://github.com/openstack/tripleo-quickstart-extras/blob/master/playbooks/collect-logs.yml#L4

I can't simply change this, because all the other jobs, where sudo on localhost is not needed at all, would start failing.

Attila Darazs (adarazs) wrote:

OK, I found the problem in the configuration.

The build_images.yml config file sets "artcl_collect: true". This is an internal variable that the log collection uses to run in two passes: the first pass collects from every machine except localhost, and the second pass uses localhost to upload the collected logs.

Setting "artcl_collect: true" in the config overrode the default in the playbook. It forced the play to collect from localhost (which we don't normally do, hence the sudo errors) and also prevented the logs from being uploaded.
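The override can be modelled in a few lines of shell. This is a hypothetical model, not the actual collect-logs.yml logic: it assumes the job config reaches ansible-playbook as extra vars (for example "-e @build_images.yml"), and in Ansible extra vars take precedence over variables set on a play, so a job-level "artcl_collect: true" replaces the playbook's per-pass value. "effective_value" and "localhost_pass" are illustrative names only.

```shell
# Hypothetical model (not the real playbook): an extra var, when set, wins
# over the value the play itself defines, mirroring Ansible's precedence.
effective_value() {
    play_value=$1
    extra_var=$2   # empty string means "not set by the job config"
    if [ -n "$extra_var" ]; then
        echo "$extra_var"
    else
        echo "$play_value"
    fi
}

# Second pass, which runs on localhost. Its own value is assumed to be false,
# meaning "upload what the first pass collected" rather than "collect here".
localhost_pass() {
    job_config_value=$1
    if [ "$(effective_value false "$job_config_value")" = true ]; then
        echo "collect from localhost (needs sudo), no upload"
    else
        echo "upload collected logs"
    fi
}

localhost_pass ""     # job config leaves artcl_collect alone
localhost_pass true   # job config sets artcl_collect: true
```

With the variable left unset, the second pass prints "upload collected logs"; with it forced to true, it prints "collect from localhost (needs sudo), no upload", which matches the two symptoms described above.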

See:

https://github.com/openstack/tripleo-quickstart/blob/master/config/general_config/build_images.yml#L14

vs.

https://github.com/openstack/tripleo-quickstart-extras/blob/master/playbooks/collect-logs.yml#L11

OpenStack Infra (hudson-openstack) wrote: Fix proposed to tripleo-quickstart (master)

Fix proposed to branch: master
Review: https://review.openstack.org/502984

Changed in tripleo:
assignee: nobody → Attila Darazs (adarazs)
status: Triaged → In Progress

wes hayutin (weshayutin) wrote: Re: tqe build images log collection: sudo required for centos user

OK. Originally we were seeing sudo issues in the main job and sudo issues in collect-logs (the latter were known).

The main job issues are fixed by the following
==============================================
Adding Javier. We noticed the image build is running directly on
cloudslave05, where the centos user is not a sudoer.
Javier has now added centos to the sudoers on cloudslave05; the promotion
re-run is https://ci.centos.org/view/rdo/view/promotion-pipeline/job/rdo_trunk-promote-pike-current-tripleo/45/

==============================================
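The failing Ansible task needed passwordless sudo for the centos user. One way to check that on a slave before running the job is "sudo -n true": with "-n" (non-interactive), sudo exits with an error such as "sudo: a password is required" instead of prompting. The helper name below is hypothetical.

```shell
# Hypothetical helper: succeed only if the current user can run sudo without
# being asked for a password. "-n" makes sudo fail instead of prompting.
has_passwordless_sudo() {
    sudo -n true 2>/dev/null
}

if has_passwordless_sudo; then
    echo "passwordless sudo: yes"
else
    echo "passwordless sudo: no (for example, the user is not in sudoers)"
fi
```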

The sudo issues and collect-logs problems should be fixed by
https://review.openstack.org/502984

==============================================

The logs can be retrieved from the job workspace:

https://ci.centos.org/job/tripleo-quickstart-promote-pike-build-images/ws/collected_files/172.19.3.97/

wes hayutin (weshayutin) wrote:

From the repo setup log I see:
=======================================

+ sudo yum repolist
Loaded plugins: fastestmirror, priorities
http://mirror.centos.org/centos/7/storage/x86_64/ceph-jewel/repodata/repomd.xml: [Errno 14] HTTP Error 404 - Not Found
Trying other mirror.
To address this issue please refer to the below knowledge base article

https://access.redhat.com/articles/1320623

If above article doesn't help to resolve this issue please create a bug on https://bugs.centos.org/

http://mirror.centos.org/centos/7/storage/x86_64/ceph-jewel/repodata/repomd.xml: [Errno 14] HTTP Error 404 - Not Found
Trying other mirror.
http://mirror.centos.org/centos/7/virt/x86_64/kvm-common/repodata/repomd.xml: [Errno 14] HTTP Error 404 - Not Found
Trying other mirror.
repo id repo name status
base/7/x86_64 CentOS-7 - Base 9591
centos-ceph-jewel/7/x86_64 CentOS-7 - Ceph Jewel 0
delorean delorean 957
delorean-pike-testing/x86_64 dlrn-pike-testing 1756
extras/7/x86_64 CentOS-7 - Extras 254
rdo-qemu-ev/x86_64 RDO CentOS-7 - QEMU EV 0
updates/7/x86_64 CentOS-7 - Updates 638
repolist: 13196
+ sudo yum update -y
Loaded plugins: fastestmirror, priorities
http://mirror.centos.org/centos/7/storage/x86_64/ceph-jewel/repodata/repomd.xml: [Errno 14] HTTP Error 404 - Not Found
=====================================================================
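In the repolist above, the repositories whose metadata returned HTTP 404 show a package count of 0 (centos-ceph-jewel, and presumably rdo-qemu-ev, whose base URL would be the kvm-common path). A small sketch for spotting such repositories in saved "yum repolist" output, assuming the count is the last whitespace-separated column, as it is in this log; "empty_repos" is a hypothetical name.

```shell
# Hypothetical helper: read "yum repolist" output on stdin and print the id of
# every repo whose package count (the last column) is 0.
empty_repos() {
    awk '$NF == "0" { print $1 }'
}

# Example, using lines from the log above:
printf '%s\n' \
    'base/7/x86_64 CentOS-7 - Base 9591' \
    'centos-ceph-jewel/7/x86_64 CentOS-7 - Ceph Jewel 0' \
    'rdo-qemu-ev/x86_64 RDO CentOS-7 - QEMU EV 0' \
    'repolist: 13196' | empty_repos
# prints:
# centos-ceph-jewel/7/x86_64
# rdo-qemu-ev/x86_64
```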

wes hayutin (weshayutin) wrote:

OpenStack Infra (hudson-openstack) wrote: Change abandoned on tripleo-quickstart-extras (master)

Change abandoned by Alan Pevec (<email address hidden>) on branch: master
Review: https://review.openstack.org/502919
Reason: Fix for log collection is https://review.openstack.org/502984

Alan Pevec (apevec) wrote: Re: tqe build images log collection: sudo required for centos user

The actual error was in the libguestfs workaround for EL 7.3:
https://github.com/openstack/tripleo-quickstart-extras/blob/master/roles/build-images/templates/overcloud-image-build.sh.j2#L6-L13

The last known good run used libguestfs-1.32.7-3.el7_3.3.x86_64; it started failing after EL 7.4 content was enabled in ci.centos and we got libguestfs-1.36.3-6.el7_4.3.x86_64:

+ mkdir /dev/pts
mkdir: cannot create directory '/dev/pts': File exists
virt-customize: error:
/var/lib/oooq-images/overcloud_image_build_script.sh: command exited with
an error
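The step fails because a bare "mkdir" treats an already existing /dev/pts as an error, and in the appliance used with libguestfs 1.36 the directory already exists, as the log shows. A minimal sketch of making such a step tolerant of both the old and the new appliance, assuming the workaround only needs the directory to exist; this is not necessarily what review 503048 changed, and "ensure_dir" is a hypothetical name.

```shell
# Hypothetical helper: create a directory only when it is missing, so running
# the step on an appliance that already provides it is not an error.
ensure_dir() {
    if [ ! -d "$1" ]; then
        mkdir "$1"
    fi
}

# In the image build script this would read:
#   ensure_dir /dev/pts
# ("mkdir -p /dev/pts" would have the same effect in a single command.)
```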

Alan Pevec (apevec) wrote:

Adjustment of the libguestfs workaround for the new libguestfs: https://review.openstack.org/503048

summary: - tqe build images log collection: sudo required for centos user
+ tqe build images libguestfs workaround needs update for 7.4
summary: - tqe build images libguestfs workaround needs update for 7.4
+ tqe build images failing on workaround for the old libguestfs

OpenStack Infra (hudson-openstack) wrote: Fix merged to tripleo-quickstart (master)

Reviewed: https://review.openstack.org/502984
Committed: https://git.openstack.org/cgit/openstack/tripleo-quickstart/commit/?id=9c576ff5369131716e07ba23c26a7848ba723418
Submitter: Jenkins
Branch: master

commit 9c576ff5369131716e07ba23c26a7848ba723418
Author: Attila Darazs <email address hidden>
Date: Tue Sep 12 14:56:03 2017 +0200

    build_images: remove some collect-logs related config

    * "artcl_collect" variable should not be specified in any config, as it
      is used internally by the role, this caused the bug
    * "artcl_collect_dir" value is the default for the role and so it is not
      necessary to specify
    * "artcl_gzip_only" and "artcl_tar_gz" variables should be specific to
      the environment and should not be part of a job config

    Closes-bug: #1716487
    Change-Id: Ic26455dffd5a16f9f5d3ee7fd17dc0ef4fa8b7ee

Changed in tripleo:
status: In Progress → Fix Released

OpenStack Infra (hudson-openstack) wrote: Fix included in openstack/tripleo-quickstart 2.1.1

This issue was fixed in the openstack/tripleo-quickstart 2.1.1 release.
