introspection hangs due to broken ipxe config

Bug #1604770 reported by Steven Hardy on 2016-07-20
This bug affects 1 person
Affects: tripleo
Status: Fix Released
Importance: Critical
Assigned to: Dmitry Tantsur
Milestone: newton-3

Bug Description

After updating my undercloud the ipxe config is broken:

# cat inspector.ipxe
#!ipxe

:retry_dhcp
dhcp || goto retry_dhcp

:retry_boot
imgfree
kernel http://192.0.2.1:%{hiera('ironic::drivers::deploy::http_port')/agent.kernel ipa-inspection-callback-url=http://192.0.2.1:5050/v1/continue ipa-inspection-collectors=default,extra-hardware,logs systemd.journald.forward_to_console=yes BOOTIF=${mac} ipa-debug=1 ipa-inspection-dhcp-all-interfaces=1 initrd=agent.ramdisk || goto retry_boot
initrd http://192.0.2.1:%{hiera('ironic::drivers::deploy::http_port')/agent.ramdisk || goto retry_boot

This makes the introspection hang forever
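The literal `%{hiera(...)` text in the URLs above suggests the template placeholder was written out verbatim instead of being replaced by a port number, so the kernel fetch presumably fails every time and `|| goto retry_boot` loops forever. A minimal sketch of a check for such leftover tokens in a rendered file follows; the function name `find_unrendered` and the shortened sample are illustrative assumptions, not part of instack-undercloud.

```python
import re

# Matches a leftover hiera interpolation token such as
# "%{hiera('some::key')", with or without its closing brace.
UNRENDERED = re.compile(r"%\{hiera\([^)]*\)\}?")


def find_unrendered(text):
    """Return (line_number, token) pairs for every leftover hiera token."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        for match in UNRENDERED.finditer(line):
            hits.append((number, match.group(0)))
    return hits


# Shortened copy of the broken inspector.ipxe from this report.
sample = """#!ipxe
:retry_boot
imgfree
kernel http://192.0.2.1:%{hiera('ironic::drivers::deploy::http_port')/agent.kernel BOOTIF=${mac} || goto retry_boot
initrd http://192.0.2.1:%{hiera('ironic::drivers::deploy::http_port')/agent.ramdisk || goto retry_boot
"""

for number, token in find_unrendered(sample):
    print(f"line {number}: unrendered token {token}")
```

Legitimate iPXE variables such as `${mac}` are not flagged, since the pattern only looks for the `%{hiera(` prefix.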

Fix proposed to branch: master
Review: https://review.openstack.org/344792

Changed in tripleo:
assignee: nobody → Dmitry Tantsur (divius)
status: New → In Progress
Dmitry Tantsur (divius) on 2016-07-20
Changed in tripleo:
importance: Undecided → Critical

Reviewed: https://review.openstack.org/344792
Committed: https://git.openstack.org/cgit/openstack/instack-undercloud/commit/?id=33e9d78dcdbc9a3737bfc009660a5bb2698313e4
Submitter: Jenkins
Branch: master

commit 33e9d78dcdbc9a3737bfc009660a5bb2698313e4
Author: Dmitry Tantsur <email address hidden>
Date: Wed Jul 20 14:22:35 2016 +0200

    Fix wrong template in puppet-stack-config.yaml.template

    Change-Id: I0cd804d743358674c3959d0b3430f9ad8819ee02
    Closes-Bug: #1604770

Changed in tripleo:
status: In Progress → Fix Released
wes hayutin (weshayutin) wrote :

sorry.. mistral has the correct status.
 u'description': u"Introspect all nodes in a 'manageable' state.", u'version': u'2.0', u'input': [{u'queue_name': u'tripleo'}], u'type': u'direct', u'name': u'introspect_manageable_nodes'}}, u'introspected_nodes': {u'e089b005-b8da-4ffe-aafd-0f474166a224': {u'finished': True, u'error': u'Introspection timeout'},

wes hayutin (weshayutin) wrote :

hrm.. this worked on my local hardware.. again.

cat undercloud.qcow2.md5
1c679f13f19b69d7969d08774e80a247 undercloud.qcow2

This is the same image used in the failing job

https://ci.centos.org/artifacts/rdo/jenkins-tripleo-quickstart-promote-master-delorean-minimal-379/console.txt.gz

changed: [172.19.3.91] => {"changed": true, "cmd": ["curl", "-sf", "http://artifacts.ci.centos.org/artifacts/rdo/images/master/delorean/testing/undercloud.qcow2.md5"], "delta": "0:00:00.026540", "end": "2016-07-20 22:05:05.665640", "rc": 0, "start": "2016-07-20 22:05:05.639100", "stderr": "", "stdout": "1c679f13f19b69d7969d08774e80a247 undercloud.qcow2", "stdout_lines": ["1c679f13f19b69d7969d08774e80a247 undercloud.qcow2"], "warnings": ["Consider using get_url or uri module rather than running curl"]}

+ openstack baremetal import --json instackenv.json
WARNING: openstackclient.common.utils is deprecated and will be removed after Jun 2017. Please use osc_lib.utils
Successfully registered node UUID ad57542a-3a32-46c2-8b48-e00173eeafdc
Successfully registered node UUID 05297d6d-afbf-4030-9ba5-f7ed3f720cb6
Successfully set all nodes to available.
+ openstack baremetal configure boot
WARNING: openstackclient.common.utils is deprecated and will be removed after Jun 2017. Please use osc_lib.utils
+ openstack baremetal introspection bulk start
WARNING: openstackclient.common.utils is deprecated and will be removed after Jun 2017. Please use osc_lib.utils
Setting nodes for introspection to manageable...
Starting introspection of manageable nodes
Waiting for introspection to finish...
Introspection for UUID ad57542a-3a32-46c2-8b48-e00173eeafdc finished successfully.
Introspection for UUID 05297d6d-afbf-4030-9ba5-f7ed3f720cb6 finished successfully.
Introspection completed.
Setting manageable nodes to available...

Steven Hardy (shardy) on 2016-07-21
Changed in tripleo:
milestone: none → newton-3
Steven Hardy (shardy) wrote :

Wes - Is it possible the CI job is using a cached IPA image/kernel?

The commit linked below added configuration of a new timeout argument, and in my local environment introspection fails: it spins forever failing to boot the IPA kernel, with an unrecognised-option error for --timeout. Rebuilding the IPA image/kernel fixed it.

https://github.com/openstack/instack-undercloud/commit/474abe263cb68ab239003354474d9bb6bc22ee3b

Steven Hardy (shardy) wrote :

Ok, correction - rebuilding the image didn't fix it. I had also changed the ipxe config, and after adding the --timeout options back it still fails (doh!)

dtantsur explained that this depends on the ipxe ROM version (on the virt host, not the undercloud). The screenshot below shows the failing version: c4bce43 does not work, and I think 6366fa7a is the version we require:

http://people.redhat.com/~shardy/screenshots/Screenshot_baremetalbrbm_1.png

Steven Hardy (shardy) wrote :

ipxe-roms-qemu-20160127-1.git6366fa7a.el7.noarch is the package required on the virt host
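
A rough sketch of how the installed package string (for example, the output of `rpm -q ipxe-roms-qemu` on the virt host) could be compared against the commit named above. The helper names are hypothetical, and the exact-match test is a simplification: later iPXE builds may work too, which this thread does not confirm.

```python
import re

# Commit hash named in this bug as the required iPXE build.
REQUIRED_COMMIT = "6366fa7a"


def ipxe_commit(package):
    """Extract the git commit suffix from a package name such as
    ipxe-roms-qemu-20160127-1.git6366fa7a.el7.noarch (None if absent)."""
    match = re.search(r"\.git([0-9a-f]+)\.", package)
    return match.group(1) if match else None


def has_required_ipxe(package):
    """True only when the package carries exactly the required commit."""
    return ipxe_commit(package) == REQUIRED_COMMIT


installed = "ipxe-roms-qemu-20160127-1.git6366fa7a.el7.noarch"
status = "OK" if has_required_ipxe(installed) else "needs update"
print(f"{installed}: {status}")
```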

Alan Pevec (apevec) wrote :

https://review.openstack.org/345410 updates ipxe on the virthost

This issue was fixed in the openstack/instack-undercloud 5.0.0.0b3 development milestone.
