TripleO-Quickstart hanging on virt-customize call during execution of modify-image role

Bug #1718965 reported by Harry Rybacki on 2017-09-22
This bug affects 2 people
Affects: tripleo
Importance: Medium
Assigned to: Unassigned

Bug Description

Description
===========

A vanilla run of TripleO-Quickstart hangs indefinitely on the virt-customize call that runs repo_setup.sh on the overcloud image[1].

After letting a run hang overnight, I called virt-customize on the overcloud image manually, in verbose mode. virt-customize completes execution of the script and then hangs after a stray '"' appears:

  [stack@ipa test]$ virt-customize --run repo_setup.sh --verbose -a overcloud-full.qcow2
  [ 0.0] Examining the guest ...
  libguestfs: launch: program=virt-customize

  <snip>

  sudo yum remove -y rdo-release centos-release-ceph-* centos-release-openstack-* || true;
  sudo rm -rf /etc/yum.repos.d/CentOS-OpenStack-*.repo /etc/yum.repos.d/CentOS-Ceph-*.repo
  /etc/yum.repos.d/CentOS-QEMU-EV.repo;
  sudo rm -rf /etc/yum.repos.d/*.rpmsave;
  sudo yum repolist;
  sudo yum update -y

  ### --stop_docs

  " <-- this is the point of hanging

I modified the virt-customize call within quickstart-extras[2] and ran a vanilla deployment against the master-tripleo-ci release as a sanity check. It hung at the same point with the same output.

Here is a copy of the repo_setup.sh[3] script created during execution.

It appears to get stuck right after the final `### --stop_docs` marker. Perhaps this is an issue with virt-customize?
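
To keep a hang like this from blocking a run overnight, the call could be wrapped in a time limit. The following is only a sketch: `run_with_timeout` is a hypothetical helper, not part of quickstart, and the 1800-second limit in the usage line is an arbitrary value. It relies on timeout(1) from GNU coreutils, which exits with status 124 when the limit is reached.

```shell
#!/bin/sh
# Hypothetical helper: run a command under a time limit and report a timeout.
run_with_timeout() {
    rwt_limit="$1"
    shift
    rwt_rc=0
    # -k 10: send SIGKILL if the command ignores SIGTERM for 10 more seconds.
    timeout -k 10 "$rwt_limit" "$@" || rwt_rc=$?
    if [ "$rwt_rc" -eq 124 ]; then
        # 124 is the status timeout(1) returns when the limit was hit.
        echo "timed out after ${rwt_limit}s: $*" >&2
    fi
    return "$rwt_rc"
}

# Example (the limit is illustrative only):
# run_with_timeout 1800 virt-customize --run repo_setup.sh --verbose -a overcloud-full.qcow2
```

A non-zero status from the helper then distinguishes a hang (124) from an ordinary failure of the wrapped command.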

Steps to reproduce
==================

1. Call quickstart in a vanilla fashion against master-tripleo-ci

Expected result
===============

Quickstart deploys as expected

Actual result
=============

Quickstart gets hung up during the modify-image role's execution

Environment
===========

From virthost:

libguestfs-tools-c-1.36.3-6.el7_4.3.x86_64
libguestfs-1.36.3-6.el7_4.3.x86_64
libguestfs-tools-1.36.3-6.el7_4.3.noarch
python-libguestfs-1.36.3-6.el7_4.3.x86_64
[stack@ipa ~]$ virt-customize --version
virt-customize 1.36.3rhel=7,release=6.el7_4.3,libvirt

[stack@ipa ~]$ cat /etc/redhat-release
CentOS Linux release 7.4.1708 (Core)

[1] - https://paste.fedoraproject.org/paste/59x96nb~kC8iI3WP1K9i3Q
[2] - https://github.com/openstack/tripleo-quickstart-extras/blob/master/roles/modify-image/tasks/libguestfs.yml#L42

Harry Rybacki (hrybacki-h) wrote :

Attempting to run quickstart against master rather than master-tripleo-ci. Maybe the latter's image was somehow corrupt.

Harry Rybacki (hrybacki-h) wrote :

Hash for the overcloud image in question: 835667355e69ca32390c60f8fde2d676
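
For anyone comparing a local image against this hash, a minimal sketch follows. It assumes the value is an MD5 digest (it is 32 hex characters) and that it was taken over the qcow2 file; the report does not state either, and `check_md5` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: compare a file's MD5 digest with an expected value.
check_md5() {
    # md5sum prints "<digest>  <file>"; keep only the digest.
    cm_actual=$(md5sum "$1" | cut -d' ' -f1)
    if [ "$cm_actual" = "$2" ]; then
        echo "OK: $1"
    else
        echo "MISMATCH: $1 has $cm_actual, expected $2" >&2
        return 1
    fi
}

# Example (assumes the hash above was computed over the qcow2 file):
# check_md5 overcloud-full.qcow2 835667355e69ca32390c60f8fde2d676
```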

Harry Rybacki (hrybacki-h) wrote :

This only seems to be affecting deployments against the master-tripleo-ci release. Run against master worked fine. Earlier releases are probably not affected.

Changed in tripleo:
status: New → Triaged
importance: Undecided → Medium
milestone: none → queens-1

John Trowbridge (trown) wrote :

This was resolved by manually yum updating the image in CI so that it does not need to update from 7.3->7.4.

I tested master-tripleo-ci release in quickstart this morning and it works fine.
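
One way to see whether an image would still need the 7.3->7.4 update is to read its release file before running quickstart. This is a sketch only: `release_version` is a hypothetical helper, and the usage line assumes the image keeps its release string in /etc/redhat-release, as the virthost above does.

```shell
#!/bin/sh
# Hypothetical helper: extract the version number from a redhat-release
# line read on stdin, e.g. "CentOS Linux release 7.4.1708 (Core)".
release_version() {
    sed -n 's/.*release \([0-9][0-9.]*\).*/\1/p'
}

# Example: read the file straight out of the image with virt-cat.
# virt-cat -a overcloud-full.qcow2 /etc/redhat-release | release_version
```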

Changed in tripleo:
status: Triaged → Fix Released

Harry Rybacki (hrybacki-h) wrote :

Thanks for the speedy resolution, @trown!

We've hit this in the tripleo-quickstart-extras-gate-master-tripleo-ci-delorean-full-minimal_pacemaker job, and I see a lot of "[Errno 5] [Errno 12] Cannot allocate memory" errors. Moreover, the run ends with this error during `yum update`:

--> Finished Dependency Resolution
Error: Package: 1:grub2-efi-modules-2.02-0.44.el7.centos.x86_64 (@base)
           Requires: grub2-tools = 1:2.02-0.44.el7.centos
           Removing: 1:grub2-tools-2.02-0.44.el7.centos.x86_64 (installed)
               grub2-tools = 1:2.02-0.44.el7.centos
           Obsoleted By: 1:grub2-tools-minimal-2.02-0.64.el7.centos.x86_64 (quickstart-centos-base)
               Not found
           Installing: 1:grub2-tools-2.02-0.64.el7.centos.x86_64 (quickstart-centos-base)
               grub2-tools = 1:2.02-0.64.el7.centos
 You could try using --skip-broken to work around the problem
** Found 293 pre-existing rpmdb problem(s), 'yum check' output follows:

After a 240-minute timeout, here is the log of the repo_setup.sh script:
https://ci.centos.org/artifacts/rdo/jenkins-tripleo-quickstart-extras-gate-master-tripleo-ci-delorean-full-minimal_pacemaker-3376/172.19.2.24/home/stack/_home_stack_repo_setup.sh.log.gz
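
To gauge how often the memory error shows up in a log like this one, a simple count is enough. A minimal sketch, assuming the log has already been decompressed (for example with gunzip); `count_alloc_errors` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: count log lines that report a failed allocation.
count_alloc_errors() {
    # grep -c prints the number of matching lines. It exits non-zero when
    # there are none, so "|| true" keeps the helper usable under "set -e".
    grep -c 'Cannot allocate memory' "$1" || true
}

# Example (the file name is the decompressed log from the link above):
# count_alloc_errors _home_stack_repo_setup.sh.log
```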

John Trowbridge (trown) wrote :

I uploaded a new overcloud-full image, and tested a local run.

Not entirely sure what happened between yesterday and today, but this should be resolved.

Adam Huffman (adam-huffman) wrote :

I'm seeing this today, with the image:

94f385d5742f61448587f21efcc56ddc.tar

which is from

https://images.rdoproject.org/master/rdo_trunk/3a65b17da7b98c83dbfda432af88bb56d3501de9_dba04735/overcloud-full.tar

Adam Huffman (adam-huffman) wrote :

It's a Skylake node, running CentOS 7.5, kernel 3.10.0-862.3.3

Adam Huffman (adam-huffman) wrote :

Appears to be related to the recent problems with libguestfs and 7.5 kernels, which for me are also apparently causing the undercloud VM not to boot properly.

Kashyap Chamarthy (kashyapc) wrote :

Hi,

I'd appreciate it if someone could provide a reproducer here that uses only 'virt-customize' (with its full command line).

Adam Huffman says he's running kernel 3.10.0-862.3.3, and alludes to "recent problems with libguestfs and 7.5". I just talked to the libguestfs upstream folks, and they said, "there were some problems with initial Meltdown / Spectre fixes; but if you're running the latest CentOS kernel then you should be good".

What is the missing piece here?

    * * *

That said, I'm unable to reproduce the problem in a RHEL environment that was provided by Michal Pryc from Red Hat (here's the downstream bug: https://bugzilla.redhat.com/show_bug.cgi?id=1655542). The following are three attempts (after some trial and error) at reproducers. All of them succeed at `yum update`.

Method-A1
---------

Get a fresh copy of 'overcloud-full.qcow2', and now try to run (notice
the "--skip-broken"):

    $ time virt-customize -v -x --selinux-relabel \
        --run-command 'yum -y update --skip-broken -d 10 -v' \
        -a v4-overcloud-full.qcow2 \
        |& tee method-A1.txt

The `yum update` succeeds. The full log is in the attachment.

Method-A2
---------

Again, get a fresh copy of 'overcloud-full.qcow2', and now try to run
(here we run *without* the "--skip-broken"):

    $ time virt-customize -v -x --selinux-relabel \
        --run-command 'yum -y update -d 10 -v' \
        -a v4-overcloud-full.qcow2 \
        |& tee method-A2.txt

Here, too, the `yum update` succeeds. The full log is in the
attachment.

Method-B
--------

Trying to make `yum` run manually from the same OverCloud image in the
environment:

(1) On the UnderCloud environment, navigate to the "backup-overcloud"
    directory, and copy the QCOW2 file, 'vmlinuz' and 'initrd' into a
    directory called "v3"

    $ mkdir /home/stack/v3
    $ cd /home/stack/backup-overcloud/
    $ cp overcloud-full.qcow2 ~/v3/v3-overcloud-full.qcow2
    $ cp overcloud-full.vmlinuz overcloud-full.initrd ~/v3/

(2) Reset the root password to "empty" on the 'v3-overcloud-full.qcow2'

    $ cd /home/stack/v3/
    $ virt-edit -a v3-overcloud-full.qcow2 /etc/passwd -e 's/^root:.*?:/root::/'

(3) Then, import the 'v3-overcloud-full.qcow2' image (with the given
    'kernel' and 'initrd') into libvirt:

    $ sudo virt-install --name v3-overcloud-full-with-kernel \
    --ram 2048 --disk path=./v3-overcloud-full.qcow2,format=qcow2 \
    --machine q35 --os-variant fedora27 --cpu host-passthrough \
    --nographics --network default \
    --boot kernel=`pwd`/overcloud-full.vmlinuz,initrd=`pwd`/overcloud-full.initrd,kernel_args="panic=1 console=ttyS0 root=/dev/vda selinux=0" \
    --import

(4) Then, SSH into the guest, and run:

    $ yum update -y -d 10 -v

The `yum update` succeeds.
