OVB jobs failing in overcloud-prep-images with timeout on introspection

Bug #1829468 reported by Marios Andreou on 2019-05-17
This bug affects 2 people
Assigned to: Sorin Sbarnea

Bug Description

There are many examples across branches, such as [1][2][3] (master/stein/rocky), and it looks like this affects all OVB jobs. The trace from overcloud_prep_images.log.txt.gz looks like:

    2019-05-16 17:12:33 | + openstack overcloud node introspect --all-manageable
    2019-05-16 17:12:36 | Waiting for messages on queue 'tripleo' with no timeout.
    2019-05-16 17:14:26 | Waiting for introspection to finish...
    2019-05-16 17:14:26 |
    2019-05-16 17:14:26 | Introspection completed.
    2019-05-16 17:14:26 | + openstack overcloud node provide --all-manageable
    2019-05-16 17:14:29 | Waiting for messages on queue 'tripleo' with no timeout.

And nothing else happens. I see these in errors [4]:

    2019-05-16 21:06:47.682 ERROR /var/log/containers/nova/nova-compute.log: 8 ERROR nova.virt.ironic.driver [req-5877e70e-8475-4fd3-8d4b-f85800271e5f - - - - -] An unknown error has occurred when trying to get the list of nodes from the Ironic inventory. Error: StrictVersion instance has no attribute 'version'
    2019-05-16 22:08:06.931 ERROR /var/log/containers/nova/nova-compute.log: 8 ERROR oslo_service.periodic_task raise exception.VirtDriverNotReady()
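
To make the stall in the prep-images trace easier to spot, here is a minimal Python sketch that finds the last traced command and how much output followed it. It assumes the `timestamp | message` layout seen in the excerpt above and that traced commands start with `+ `; the helper names `parse_line` and `last_command` are made up for illustration and are not part of any TripleO tooling.

```python
from datetime import datetime

# Excerpt copied from the overcloud_prep_images trace in this report.
TRACE = """\
2019-05-16 17:12:33 | + openstack overcloud node introspect --all-manageable
2019-05-16 17:12:36 | Waiting for messages on queue 'tripleo' with no timeout.
2019-05-16 17:14:26 | Waiting for introspection to finish...
2019-05-16 17:14:26 |
2019-05-16 17:14:26 | Introspection completed.
2019-05-16 17:14:26 | + openstack overcloud node provide --all-manageable
2019-05-16 17:14:29 | Waiting for messages on queue 'tripleo' with no timeout.
"""


def parse_line(line):
    """Split 'YYYY-MM-DD HH:MM:SS | message' into (datetime, message)."""
    stamp, _, message = line.partition("|")
    when = datetime.strptime(stamp.strip(), "%Y-%m-%d %H:%M:%S")
    return when, message.strip()


def last_command(trace):
    """Return (command, start_time, following_lines) for the last '+ ' command."""
    command, started, following = None, None, []
    for line in trace.splitlines():
        if not line.strip():
            continue
        when, message = parse_line(line)
        if message.startswith("+ "):
            # A new traced command begins; forget output of the previous one.
            command, started, following = message[2:], when, []
        elif command is not None:
            following.append((when, message))
    return command, started, following


command, started, following = last_command(TRACE)
last_seen = following[-1][0] if following else started
print("last command:", command)
print("seconds of output after it started:", (last_seen - started).total_seconds())
```

On this excerpt the last command is the `node provide` step, and the only output after it is the "Waiting for messages" line, which matches the observation that nothing else happens.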

[1] http://logs.rdoproject.org/openstack-periodic-master/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-7-ovb-1ctlr_1comp-featureset002-master-upload/f42ce71/logs/undercloud/home/zuul/overcloud_prep_images.log.txt.gz
[2] http://logs.rdoproject.org/openstack-periodic-latest-released/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001-stein/cdfb1e9/logs/undercloud/home/zuul/overcloud_prep_images.log.txt.gz
[3] http://logs.rdoproject.org/openstack-periodic-24hr/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-7-ovb-1ctlr_1comp-featureset020-rocky/b7e3274/logs/undercloud/home/zuul/overcloud_prep_images.log.txt.gz
[4] http://logs.rdoproject.org/openstack-periodic-master/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-7-ovb-1ctlr_1comp-featureset002-master-upload/f42ce71/logs/undercloud/var/log/extra/errors.txt.txt.gz

Tags: ci
Marios Andreou (marios-b) wrote :

    (ykarel) Master/stein are also affected; check https://review.opendev.org/#/c/653279/1 (possible breakage), https://review.opendev.org/#/c/659592/ (possible fix) and https://review.opendev.org/#/c/659612/1 (blocking invalid tags). We can proactively do this in rdoinfo.

tags: added: promotion-blocker
Marios Andreou (marios-b) wrote :

We have that to pin us until the fixes become available. The fixes are at https://review.opendev.org/#/q/I3b25f4fb170aa93159ffa8074dc74fa6f50671b7

We are pinning ironicclient in RDO with https://review.rdoproject.org/r/20787

Marios Andreou (marios-b) wrote :

This should no longer be blocking us. With the pin in place we should not be seeing this in CI jobs.

Unfortunately, it looks like ironicclient is not released very often, judging by https://pypi.org/project/python-ironicclient/#history

We will need to keep the pin https://review.rdoproject.org/r/20787 until a v2.7.2 release becomes available.
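
The "keep the pin until 2.7.2" rule can be written down as a small check. This is only a sketch: `parse_version` and `pin_still_needed` are hypothetical helpers, only plain dotted integer versions are handled, and the version lists in the usage lines are made-up inputs, not the real release history.

```python
def parse_version(text):
    """Turn '2.7.2' into (2, 7, 2); only plain dotted integers are handled."""
    return tuple(int(part) for part in text.split("."))


def pin_still_needed(released, fixed_in="2.7.2"):
    """Return True while no released version is at or above the fixed one."""
    target = parse_version(fixed_in)
    return not any(parse_version(version) >= target for version in released)


# Illustrative inputs only (assumed, not taken from PyPI).
print(pin_still_needed(["2.6.0", "2.7.0"]))
print(pin_still_needed(["2.6.0", "2.7.0", "2.7.2"]))
```

Comparing tuples of integers avoids the string-comparison trap where "2.10.0" would sort before "2.7.2".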

tags: removed: promotion-blocker
Marios Andreou (marios-b) wrote :

Removed promotion-blocker; see comment #3.

Changed in tripleo:
milestone: none → train-1
wes hayutin (weshayutin) on 2019-05-20
Changed in tripleo:
status: Triaged → Incomplete
Rafael Folco (rafaelfolco) wrote :

Ruck/rover can verify whether this is a valid bug.

Changed in tripleo:
milestone: train-1 → train-2
Changed in tripleo:
assignee: nobody → Sorin Sbarnea (ssbarnea)
Changed in tripleo:
milestone: train-2 → train-3
