[ocata promotion] phase1 (ci.centos) job tripleo-quickstart-promote-ocata-rdo_trunk-minimal fails introspection/deploy "No valid host found"

Bug #1774079 reported by Ronelle Landy
Affects: tripleo
Status: Fix Released
Importance: Critical
Assigned to: yatin

Bug Description

https://ci.centos.org/job/tripleo-quickstart-promote-ocata-rdo_trunk-minimal/ has been failing since 05/22, when it was re-enabled. It had been disabled on 05/09 (it was passing on that day).

The job fails consistently but with different errors each time.
The most common errors are in introspection and overcloud deployment:

Introspection error:
--------------------

 BaremetalIntrospectionAction.introspect failed: <class 'ironic_inspector_client.common.http.ClientError'>: Internal server error (NotFound): No valid host was found. Reason: No conductor service registered which supports driver pxe_ipmitool. (HTTP 404)"

https://ci.centos.org/artifacts/rdo/jenkins-tripleo-quickstart-promote-ocata-rdo_trunk-minimal-353/undercloud/home/stack/overcloud_prep_images.log.gz

Deployment error:
-----------------

2018-05-27 06:00:33 | Error: resources[0].resources.NovaCompute: Went to status ERROR due to "Message: No valid host was found. There are not enough hosts available., Code: 500"

https://ci.centos.org/artifacts/rdo/jenkins-tripleo-quickstart-promote-ocata-rdo_trunk-minimal-352/undercloud/home/stack/overcloud_deploy.log.gz
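
Since the job fails with a different error on each run, it can help to tally which of the two signatures above shows up in a given log. The following is a minimal sketch, not part of the job itself: the regular expressions are taken from the two messages quoted in this report, while the signature names and the helper functions are made up for illustration.

```python
import re

# Signatures copied from the two error messages quoted in this report.
# The dictionary keys are hypothetical labels, not names used by TripleO.
SIGNATURES = {
    "introspection_no_conductor": re.compile(
        r"No conductor service registered which supports driver (\S+?)\."
    ),
    "deploy_no_valid_host": re.compile(
        r"No valid host was found\. There are not enough hosts available"
    ),
}


def classify_failure(line):
    """Return the label of the first signature found in the line, or None."""
    for name, pattern in SIGNATURES.items():
        if pattern.search(line):
            return name
    return None


def summarize(lines):
    """Count how often each signature appears in an iterable of log lines."""
    counts = {name: 0 for name in SIGNATURES}
    for line in lines:
        name = classify_failure(line)
        if name is not None:
            counts[name] += 1
    return counts
```

Feeding the lines of overcloud_prep_images.log and overcloud_deploy.log from several runs through summarize() would show how the failures split between introspection and deployment.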

Ocata running in OVB jobs in RDO cloud passes:

https://logs.rdoproject.org/openstack-periodic-24hr/periodic-tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001-ocata/9f4961f/undercloud/home/jenkins/

We have considered that the hardware may be an issue (OOM, node size, etc.), but similar jobs running pike/queens/master pass on the same hardware (https://ci.centos.org/job/tripleo-quickstart-promote-pike-rdo_trunk-minimal/).

The pike (and later) jobs, however, use the ipmi driver as opposed to ipmitool (https://ci.centos.org/artifacts/rdo/jenkins-tripleo-quickstart-promote-pike-rdo_trunk-minimal-143/undercloud/home/stack/instackenv.json.gz)
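
To compare the power driver between the ocata and pike jobs, the pm_type of each node can be read out of the respective instackenv.json. This is a rough sketch that assumes the file is a JSON object with a top-level "nodes" list whose entries carry a "pm_type" key; the function name is hypothetical.

```python
import json
from collections import Counter


def driver_counts(instackenv_text):
    """Tally the pm_type value of every node in an instackenv.json document."""
    data = json.loads(instackenv_text)
    # Assumption: nodes are listed under a top-level "nodes" key.
    nodes = data.get("nodes", [])
    # Nodes without a pm_type are counted under a placeholder label.
    return Counter(node.get("pm_type", "<missing>") for node in nodes)
```

Running it on the decompressed instackenv.json of each job gives a per-driver node count that can be compared side by side.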

Revision history for this message
Ronelle Landy (rlandy) wrote :

Marking this as critical since the ocata phase 1 promotion has now been delayed by 17 days.

tags: added: ci promotion-blocker
Changed in tripleo:
status: New → Triaged
importance: Undecided → Critical
milestone: none → rocky-2
Revision history for this message
Ronelle Landy (rlandy) wrote :

Note that when we see the overcloud deployment errors, introspection has passed - which means the "No conductor service registered which supports driver pxe_ipmitool. (HTTP 404)" error is not consistent.

Revision history for this message
Arx Cruz (arxcruz) wrote :

From the errors file, we can see several errors related to neutron, database, ironic and nova:
https://ci.centos.org/artifacts/rdo/jenkins-tripleo-quickstart-promote-ocata-rdo_trunk-minimal-356/undercloud/var/log/extra/errors.txt.gz

Revision history for this message
Ronelle Landy (rlandy) wrote :

<ykarel> rlandy|rover, yes services are being started around every 10 minutes. Possibly the cause of the different services going down on ironic, neutron, nova
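
One way to confirm the "around every 10 minutes" pattern is to extract the start times of a service from its log or the journal and look at the gaps between them. The sketch below assumes the timestamps have already been extracted as strings in the "YYYY-MM-DD HH:MM:SS" form used in the deploy log; the function names, the 10-minute period and the 1-minute tolerance are illustrative choices, not values taken from the job.

```python
from datetime import datetime, timedelta


def restart_intervals(timestamps, fmt="%Y-%m-%d %H:%M:%S"):
    """Return the gaps between consecutive service start times."""
    times = sorted(datetime.strptime(t, fmt) for t in timestamps)
    return [later - earlier for earlier, later in zip(times, times[1:])]


def looks_periodic(timestamps, period=timedelta(minutes=10),
                   tolerance=timedelta(minutes=1)):
    """True if there are at least two gaps and each is within tolerance of period."""
    gaps = restart_intervals(timestamps)
    return len(gaps) >= 2 and all(abs(gap - period) <= tolerance for gap in gaps)
```

If looks_periodic() holds for ironic, neutron and nova alike, that points at a single periodic trigger on the undercloud rather than at independent service crashes.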

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to instack-undercloud (stable/ocata)

Fix proposed to branch: stable/ocata
Review: https://review.openstack.org/571513

Ronelle Landy (rlandy)
Changed in tripleo:
status: Triaged → In Progress
assignee: nobody → Ronelle Landy (rlandy)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to instack-undercloud (stable/ocata)

Reviewed: https://review.openstack.org/571513
Committed: https://git.openstack.org/cgit/openstack/instack-undercloud/commit/?id=a2647bf669608e2364282a8e7873678ded6253d4
Submitter: Zuul
Branch: stable/ocata

commit a2647bf669608e2364282a8e7873678ded6253d4
Author: Alex Schultz <email address hidden>
Date: Thu Mar 8 11:00:41 2018 -0700

    Remove cloud-init and disable os-collect-config

    A user uses a guest image for the undercloud, cloud-init may be
    installed which can also cause other services like os-collect-config to
    be running. We should ensure that cloud-init is removed and that the
    os-collect-config service is disable to prevent it from interfering with
    overcloud deployments.

    Change-Id: I58f6fc4b299c8f1f561205ac9a2de75c46467ba8
    Closes-Bug: #1754426
    Closes-Bug: #1774079
    (cherry picked from commit 998230da5cb6c0b725b8e67c82bf2157b9f0c46b)
    (cherry picked from commit c95254293b8068057c42ddae8b45eadec87c5f70)
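
    A quick way to check whether an undercloud still has the condition this commit addresses is to ask rpm and systemctl directly. The following is only a sketch and not code from instack-undercloud: it relies on `rpm -q` and `systemctl is-enabled` / `is-active` returning exit status 0 on a positive answer, and the function names and message strings are made up. The command runner is injectable so the logic can be exercised without touching a real host.

```python
import subprocess


def _run(cmd):
    """Run a command and return its exit status, discarding its output."""
    return subprocess.run(
        cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    ).returncode


def undercloud_findings(run=_run):
    """List the conditions that the commit above is meant to eliminate."""
    findings = []
    # rpm -q exits 0 when the package is installed.
    if run(["rpm", "-q", "cloud-init"]) == 0:
        findings.append("cloud-init is installed")
    # systemctl is-enabled exits 0 when the unit is enabled
    # (or in a comparable state such as static).
    if run(["systemctl", "is-enabled", "os-collect-config"]) == 0:
        findings.append("os-collect-config is enabled")
    # systemctl is-active exits 0 when the unit is currently running.
    if run(["systemctl", "is-active", "os-collect-config"]) == 0:
        findings.append("os-collect-config is active")
    return findings
```

    An empty list from undercloud_findings() on the undercloud would suggest that neither cloud-init nor os-collect-config is in a position to interfere with the deployment.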

tags: added: in-stable-ocata
Revision history for this message
Ronelle Landy (rlandy) wrote :
Changed in tripleo:
status: In Progress → Fix Released
assignee: Ronelle Landy (rlandy) → yatin (yatinkarel)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/instack-undercloud 6.1.8

This issue was fixed in the openstack/instack-undercloud 6.1.8 release.
