tripleo-quickstart fails at overcloud_prep_images.sh with Exception introspecting nodes

Bug #1785089 reported by David Rabel on 2018-08-02
This bug affects 1 person
Affects: tripleo
Importance: High
Assigned to: Unassigned

Bug Description

I did this on a fresh KVM virtual machine with nested virtualization enabled, 16 GB RAM, a 50 GB HDD, 4 cores, and CentOS 7 installed:

$ curl -O https://raw.githubusercontent.com/openstack/tripleo-quickstart/master/quickstart.sh
$ bash quickstart.sh --install-deps
$ bash quickstart.sh 127.0.0.2

After some hours it fails at overcloud_prep_images.sh with Exception introspecting nodes:

$ sudo -u stack virt-cat -d undercloud /home/stack/overcloud_prep_images.log

2018-08-02 15:01:54 | + source /home/stack/stackrc
2018-08-02 15:01:54 | +++ set
2018-08-02 15:01:54 | +++ awk '{FS="="} /^OS_/ {print $1}'
2018-08-02 15:01:54 | ++ NOVA_VERSION=1.1
2018-08-02 15:01:54 | ++ export NOVA_VERSION
2018-08-02 15:01:54 | ++ OS_PASSWORD=67012aa9c45bdce89538978188b1765e486be2ce
2018-08-02 15:01:54 | ++ export OS_PASSWORD
2018-08-02 15:01:54 | ++ OS_AUTH_TYPE=password
2018-08-02 15:01:54 | ++ export OS_AUTH_TYPE
2018-08-02 15:01:54 | ++ OS_AUTH_URL=https://192.168.24.2:13000/
2018-08-02 15:01:54 | ++ PYTHONWARNINGS='ignore:Certificate has no, ignore:A true SSLContext object is not available'
2018-08-02 15:01:54 | ++ export OS_AUTH_URL
2018-08-02 15:01:54 | ++ export PYTHONWARNINGS
2018-08-02 15:01:54 | ++ OS_USERNAME=admin
2018-08-02 15:01:54 | ++ OS_PROJECT_NAME=admin
2018-08-02 15:01:54 | ++ COMPUTE_API_VERSION=1.1
2018-08-02 15:01:54 | ++ IRONIC_API_VERSION=1.34
2018-08-02 15:01:54 | ++ OS_BAREMETAL_API_VERSION=1.34
2018-08-02 15:01:54 | ++ OS_NO_CACHE=True
2018-08-02 15:01:54 | ++ OS_CLOUDNAME=undercloud
2018-08-02 15:01:54 | ++ export OS_USERNAME
2018-08-02 15:01:54 | ++ export OS_PROJECT_NAME
2018-08-02 15:01:54 | ++ export COMPUTE_API_VERSION
2018-08-02 15:01:54 | ++ export IRONIC_API_VERSION
2018-08-02 15:01:54 | ++ export OS_BAREMETAL_API_VERSION
2018-08-02 15:01:54 | ++ export OS_NO_CACHE
2018-08-02 15:01:54 | ++ export OS_CLOUDNAME
2018-08-02 15:01:54 | ++ OS_IDENTITY_API_VERSION=3
2018-08-02 15:01:54 | ++ export OS_IDENTITY_API_VERSION
2018-08-02 15:01:54 | ++ OS_PROJECT_DOMAIN_NAME=Default
2018-08-02 15:01:54 | ++ export OS_PROJECT_DOMAIN_NAME
2018-08-02 15:01:54 | ++ OS_USER_DOMAIN_NAME=Default
2018-08-02 15:01:54 | ++ export OS_USER_DOMAIN_NAME
2018-08-02 15:01:54 | ++ '[' -z '' ']'
2018-08-02 15:01:54 | ++ export PS1=
2018-08-02 15:01:54 | ++ PS1=
2018-08-02 15:01:54 | ++ export 'PS1=${OS_CLOUDNAME:+($OS_CLOUDNAME)} '
2018-08-02 15:01:54 | ++ PS1='${OS_CLOUDNAME:+($OS_CLOUDNAME)} '
2018-08-02 15:01:54 | ++ export CLOUDPROMPT_ENABLED=1
2018-08-02 15:01:54 | ++ CLOUDPROMPT_ENABLED=1
2018-08-02 15:01:54 | + openstack overcloud image upload
2018-08-02 15:03:07 | Image "overcloud-full-vmlinuz" was uploaded.
2018-08-02 15:03:07 | +--------------------------------------+------------------------+-------------+---------+--------+
2018-08-02 15:03:07 | | ID | Name | Disk Format | Size | Status |
2018-08-02 15:03:07 | +--------------------------------------+------------------------+-------------+---------+--------+
2018-08-02 15:03:07 | | a940326a-7020-4f9a-9520-679e4fbf4773 | overcloud-full-vmlinuz | aki | 6234048 | active |
2018-08-02 15:03:07 | +--------------------------------------+------------------------+-------------+---------+--------+
2018-08-02 15:03:07 | Image "overcloud-full-initrd" was uploaded.
2018-08-02 15:03:07 | +--------------------------------------+-----------------------+-------------+----------+--------+
2018-08-02 15:03:07 | | ID | Name | Disk Format | Size | Status |
2018-08-02 15:03:07 | +--------------------------------------+-----------------------+-------------+----------+--------+
2018-08-02 15:03:07 | | b76caa28-d7d6-4654-96a6-cd710659a713 | overcloud-full-initrd | ari | 54712255 | active |
2018-08-02 15:03:07 | +--------------------------------------+-----------------------+-------------+----------+--------+
2018-08-02 15:03:07 | Image "overcloud-full" was uploaded.
2018-08-02 15:03:07 | +--------------------------------------+----------------+-------------+------------+--------+
2018-08-02 15:03:07 | | ID | Name | Disk Format | Size | Status |
2018-08-02 15:03:07 | +--------------------------------------+----------------+-------------+------------+--------+
2018-08-02 15:03:07 | | 60524702-fb98-4f67-aa2a-04348e1205a6 | overcloud-full | qcow2 | 1432289280 | active |
2018-08-02 15:03:07 | +--------------------------------------+----------------+-------------+------------+--------+
2018-08-02 15:03:07 | Image "bm-deploy-kernel" was uploaded.
2018-08-02 15:03:07 | +--------------------------------------+------------------+-------------+---------+--------+
2018-08-02 15:03:07 | | ID | Name | Disk Format | Size | Status |
2018-08-02 15:03:07 | +--------------------------------------+------------------+-------------+---------+--------+
2018-08-02 15:03:07 | | ea62719e-5f71-4cc2-901a-a739fbfa206b | bm-deploy-kernel | aki | 6234048 | active |
2018-08-02 15:03:07 | +--------------------------------------+------------------+-------------+---------+--------+
2018-08-02 15:03:07 | Image "bm-deploy-ramdisk" was uploaded.
2018-08-02 15:03:07 | +--------------------------------------+-------------------+-------------+-----------+--------+
2018-08-02 15:03:07 | | ID | Name | Disk Format | Size | Status |
2018-08-02 15:03:07 | +--------------------------------------+-------------------+-------------+-----------+--------+
2018-08-02 15:03:07 | | 448134e5-a07d-45c4-aef2-16573a0ea959 | bm-deploy-ramdisk | ari | 390504215 | active |
2018-08-02 15:03:07 | +--------------------------------------+-------------------+-------------+-----------+--------+
2018-08-02 15:03:07 | + openstack overcloud node import instackenv.json
2018-08-02 15:03:19 | Waiting for messages on queue 'tripleo' with no timeout.
2018-08-02 15:03:54 | Started Mistral Workflow tripleo.baremetal.v1.register_or_update. Execution ID: a5d74d3a-a8ba-4c06-884c-48c2f98a6483
2018-08-02 15:03:54 |
2018-08-02 15:03:54 |
2018-08-02 15:03:54 | 2 node(s) successfully moved to the "manageable" state.
2018-08-02 15:03:54 | Successfully registered node UUID e5a28fa9-b91c-49b7-81a7-840df045a1ea
2018-08-02 15:03:54 | Successfully registered node UUID 49ab1038-e3c2-4b5e-b6e1-f394d748d404
2018-08-02 15:03:54 | + openstack overcloud node introspect --all-manageable
2018-08-02 15:04:03 | Waiting for messages on queue 'tripleo' with no timeout.
2018-08-02 16:06:22 | Exception introspecting nodes: {u'status': u'RUNNING', u'node_uuids': [u'e5a28fa9-b91c-49b7-81a7-840df045a1ea', u'49ab1038-e3c2-4b5e-b6e1-f394d748d404'], u'failed_introspection': [u'e5a28fa9-b91c-49b7-81a7-840df045a1ea', u'49ab1038-e3c2-4b5e-b6e1-f394d748d404'], u'result': None, u'introspected_nodes': {u'49ab1038-e3c2-4b5e-b6e1-f394d748d404': {u'uuid': u'49ab1038-e3c2-4b5e-b6e1-f394d748d404', u'links': [{u'href': u'http://192.168.24.2:13050/v1/introspection/49ab1038-e3c2-4b5e-b6e1-f394d748d404', u'rel': u'self'}], u'finished_at': None, u'state': u'waiting', u'finished': False, u'error': None, u'started_at': u'2018-08-02T15:04:12'}, u'e5a28fa9-b91c-49b7-81a7-840df045a1ea': {u'uuid': u'e5a28fa9-b91c-49b7-81a7-840df045a1ea', u'links': [{u'href': u'http://192.168.24.2:13050/v1/introspection/e5a28fa9-b91c-49b7-81a7-840df045a1ea', u'rel': u'self'}], u'finished_at': None, u'state': u'waiting', u'finished': False, u'error': None, u'started_at': u'2018-08-02T15:04:09'}}, u'message': u'Retrying 2 nodes that failed introspection. Attempt 2 of 3 ', u'introspection_attempt': 2}
2018-08-02 16:06:22 | Waiting for introspection to finish...
2018-08-02 16:06:22 | Started Mistral Workflow tripleo.baremetal.v1.introspect_manageable_nodes. Execution ID: d7040ff7-9cb6-4843-bbc3-86edf247769a
2018-08-02 16:06:22 | Introspection of node 49ab1038-e3c2-4b5e-b6e1-f394d748d404 timed out.
2018-08-02 16:06:22 | Introspection of node e5a28fa9-b91c-49b7-81a7-840df045a1ea timed out.
2018-08-02 16:06:22 | Retrying 2 nodes that failed introspection. Attempt 2 of 3
2018-08-02 16:06:22 | Introspection of node e5a28fa9-b91c-49b7-81a7-840df045a1ea timed out.
2018-08-02 16:06:22 | Introspection of node 49ab1038-e3c2-4b5e-b6e1-f394d748d404 timed out.
2018-08-02 16:06:22 | Retrying 2 nodes that failed introspection. Attempt 3 of 3
2018-08-02 16:06:22 | Introspection of node 49ab1038-e3c2-4b5e-b6e1-f394d748d404 timed out.
2018-08-02 16:06:22 | Introspection of node e5a28fa9-b91c-49b7-81a7-840df045a1ea timed out.
2018-08-02 16:06:22 | Retry limit reached with 2 nodes still failing introspection
2018-08-02 16:06:22 | {u'status': u'RUNNING', u'node_uuids': [u'e5a28fa9-b91c-49b7-81a7-840df045a1ea', u'49ab1038-e3c2-4b5e-b6e1-f394d748d404'], u'failed_introspection': [u'e5a28fa9-b91c-49b7-81a7-840df045a1ea', u'49ab1038-e3c2-4b5e-b6e1-f394d748d404'], u'result': None, u'introspected_nodes': {u'49ab1038-e3c2-4b5e-b6e1-f394d748d404': {u'uuid': u'49ab1038-e3c2-4b5e-b6e1-f394d748d404', u'links': [{u'href': u'http://192.168.24.2:13050/v1/introspection/49ab1038-e3c2-4b5e-b6e1-f394d748d404', u'rel': u'self'}], u'finished_at': None, u'state': u'waiting', u'finished': False, u'error': None, u'started_at': u'2018-08-02T15:04:12'}, u'e5a28fa9-b91c-49b7-81a7-840df045a1ea': {u'uuid': u'e5a28fa9-b91c-49b7-81a7-840df045a1ea', u'links': [{u'href': u'http://192.168.24.2:13050/v1/introspection/e5a28fa9-b91c-49b7-81a7-840df045a1ea', u'rel': u'self'}], u'finished_at': None, u'state': u'waiting', u'finished': False, u'error': None, u'started_at': u'2018-08-02T15:04:09'}}, u'message': u'Retrying 2 nodes that failed introspection. Attempt 2 of 3 ', u'introspection_attempt': 2}
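
The repeated "timed out" lines above can be reduced to the set of affected nodes. A minimal sketch, assuming the log has been copied to a local file; the function name timed_out_nodes is made up for illustration:

```shell
#!/usr/bin/env bash
# Print the unique UUIDs of nodes whose introspection timed out,
# reading a log in the format shown above from the file given as $1.
timed_out_nodes() {
    grep -oE 'Introspection of node [0-9a-f-]{36} timed out' "$1" \
        | awk '{print $4}' \
        | sort -u
}
```

For example, after `sudo -u stack virt-cat -d undercloud /home/stack/overcloud_prep_images.log > prep.log`, running `timed_out_nodes prep.log` lists each failing node once. For the log above that is both registered nodes, so the failure is not specific to a single node.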

Bogdan Dobrelya (bogdando) wrote :

What does the generated undercloud-parameter-defaults.yaml contain (you can find it in the stack user's home dir, or in the tarball generated at the end of the undercloud deployment)? It should look like http://logs.openstack.org/18/589218/1/check/tripleo-ci-centos-7-undercloud-containers/21b7cda/logs/undercloud/home/zuul/undercloud-parameter-defaults.yaml.txt.gz

Does the undercloud have its provisioning interface (likely eth1) included in the br-ctlplane OVS bridge?
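
One way to answer the bridge question, as a sketch: `ovs-vsctl list-ports` prints one port name per line, so its output can be checked for the interface. The helper name has_port is hypothetical, and eth1 is only the likely interface name mentioned above:

```shell
#!/usr/bin/env bash
# has_port PORT: read port names (one per line) on stdin and succeed
# only if PORT is among them as an exact, whole-line match.
has_port() {
    grep -qxF -- "$1"
}
```

On the undercloud this could be used as `sudo ovs-vsctl list-ports br-ctlplane | has_port eth1 && echo attached || echo missing`.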

There have been a few containerized Ironic related fixes in the tripleo heat templates master branch (Rocky), so you may want to retry with the latest t-h-t packages (once we have a promotion build with the recent t-h-t patches...).

Changed in tripleo:
status: New → Incomplete
milestone: none → rocky-rc1
importance: Undecided → High
David Rabel (rabel-b1) wrote :

So the current quickstart.sh is not working?

What would be the exact step to retry with the latest t-h-t packages?

Bogdan Dobrelya (bogdando) wrote :

The latest packages should be automatically picked up via quickstart. But you can also tweak it via -e release=current

Bogdan Dobrelya (bogdando) wrote :

Sorry, I think the right arguments to pick the most recent packages and container images are
quickstart.sh ... -R master -e dlrn_hash_tag=current

David Rabel (rabel-b1) wrote :

With those parameters I get a different error:

$ bash quickstart.sh -R master -e dlrn_hash_tag=current 127.0.0.2
[...]
TASK [fetch-images : Get image expected checksum] ******************************
task path: /home/centos/.quickstart/tripleo-quickstart/roles/fetch-images/tasks/fetch.yml:70
Monday 13 August 2018 14:11:41 +0000 (0:00:00.245) 0:03:59.092 *********
[DEPRECATION WARNING]: Using tests as filters is deprecated. Instead of using
`result|failed` instead use `result is failed`. This feature will be removed in
 version 2.9. Deprecation warnings can be disabled by setting
deprecation_warnings=False in ansible.cfg.
fatal: [127.0.0.2]: FAILED! => {"changed": true, "cmd": ["curl", "-sfL", "https://images.rdoproject.org/master/rdo_trunk/488107cdd0ae6fd9a6e51741c4bdd7cd5fb34cdb_d4217a48/overcloud-full.tar.md5"], "delta": "0:00:00.919391", "end": "2018-08-13 14:11:42.685405", "msg": "non-zero return code", "rc": 22, "start": "2018-08-13 14:11:41.766014", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}
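
The task output above is empty apart from "rc": 22 because curl was run with `-s` (silent) and `-f` (fail on HTTP errors). A small sketch that maps the curl exit codes most relevant here to their documented meaning; the function name explain_curl_rc is hypothetical and the table is deliberately incomplete:

```shell
#!/usr/bin/env bash
# explain_curl_rc CODE: print a short explanation for a curl exit code.
# Only a handful of codes are covered; anything else falls through
# to a pointer at the curl manual.
explain_curl_rc() {
    case "$1" in
        0)  echo "success" ;;
        6)  echo "could not resolve host" ;;
        7)  echo "failed to connect to host" ;;
        22) echo "HTTP error: server returned status >= 400 (reported because of -f)" ;;
        28) echo "operation timed out" ;;
        *)  echo "see the EXIT CODES section of the curl man page" ;;
    esac
}
```

Here `explain_curl_rc 22` points at the server answering with an error status for the .md5 URL, which would fit images not being published for that hash (an assumption, not confirmed in this thread). Re-running the request as `curl -sL -o /dev/null -w '%{http_code}\n' <url>` shows the actual status code.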

David Rabel (rabel-b1) wrote :

Destroyed everything and ran quickstart.sh again without those parameters.

undercloud-parameter-defaults.yaml looks like this:

# sudo -u stack virt-cat -d undercloud /home/stack/undercloud-parameter-defaults.yaml
{
    "parameter_defaults": {},
    "resource_registry": {
        "OS::TripleO::Undercloud::Net::SoftwareConfig": "/usr/share/openstack-tripleo-heat-templates/net-config-undercloud.yaml"
    }
}

I'd have a closer look at the undercloud VM, but I can't SSH into it:
# ssh -i /home/centos/.quickstart/id_rsa_undercloud stack@192.168.23.30
ssh_exchange_identification: read: Connection reset by peer

Bogdan Dobrelya (bogdando) wrote :

Good news is that undercloud-parameter-defaults.yaml looks correct :)
I'll try to reproduce that on my local env.

Note, you can try virsh console or virt-manager GUI to access VMs with root creds (those can be set via -e modify_image_vc_root_password=r00tme or the like, I think)

David Rabel (rabel-b1) wrote :

:)

Something else seems to be wrong with my undercloud VM. It somehow crashed, and now when I start it, it stays in "paused" state forever. I still have 4 GB of free memory, so that shouldn't be the problem.

Changed in tripleo:
milestone: rocky-rc1 → stein-1
David Rabel (rabel-b1) wrote :

Meanwhile: Could you tell me any parameters or older versions of the quickstart script so I can use it anyway?

Bogdan Dobrelya (bogdando) wrote :

I couldn't reproduce that on my local devbox libvirt setup, introspection passed:

http://paste.openstack.org/show/wVpkQp4hG78x3On3q0gI/

Note, I usually run quickstart from a wrapper centos7:latest container, as I do not have CentOS installed on my devbox, but that does not really matter. My setup basically repeats the command

quickstart.sh ... -R master -e dlrn_hash_tag=current-tripleo

but with:
* custom libvirt provisioning params,
* custom (non stack) user,
* custom local_working and working directories,
* custom patches to support SSH-less localhost deployments [0]
* config/environments/dev_privileged_libvirt.yml for privileged libvirt mode
* custom vbmc_libvirt_uri, which I needed in order to SSH from undercloud to my virthost with HOST_BREXT_IP=192.168.23.1

So you can probably just ignore all of that and keep using virthost 127.0.0.2 instead of localhost.

Anyway, here is the command and deployment logs I was testing with.

A) Libvirt provisioning finished, but the run was interrupted because of the way I ran it from a container (which skips direct editing of the virthost's authorized_hosts) - see the attached _quickstart.log tarball

B) Restarted with no teardown, which also checks idempotency (after I manually updated the virthost's authorized_hosts) - see the attached _quickstart_continue.log tarball
...which is basically the original command from A with the following added:
<...>
-v -e undercloud_use_custom_boot_images=true \
-e undercloud_custom_initrd=${IMAGECACHE}/overcloud-full.initrd \
-e undercloud_custom_vmlinuz=${IMAGECACHE}/overcloud-full.vmlinuz \
-e force_cached_images=true -e image_cache_expire_days=300 \
-T none localhost

Step B actually failed to run overcloud-prep-images.sh because stackrc was missing from my custom working_dir. I fixed that manually by copying it to the expected path, then retried overcloud-prep-images.sh, and it passed.

[0] https://review.openstack.org/#/q/topic:localcon+(status:open+OR+status:merged)

Bogdan Dobrelya (bogdando) wrote :

I wonder if mismatching flavors could be the cause of timing out introspection? See https://bugs.launchpad.net/tripleo/+bug/1788875

What are the outputs of

 openstack flavor list
 openstack baremetal node list
 openstack baremetal node show <insert_controller/compute_name>
?
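
To make the requested outputs easy to attach to the bug, they can be collected into one file. A sketch only: capture_outputs is a hypothetical helper, and it assumes the commands are run on the undercloud after `source ~/stackrc`:

```shell
#!/usr/bin/env bash
# capture_outputs OUTFILE CMD...: run each command string and append its
# output (stdout and stderr) to OUTFILE under a header naming the command.
# A failing command is recorded with its exit status instead of aborting.
capture_outputs() {
    local outfile=$1; shift
    local cmd
    for cmd in "$@"; do
        printf '### %s\n' "$cmd" >> "$outfile"
        bash -c "$cmd" >> "$outfile" 2>&1 \
            || printf '(exit status %s)\n' "$?" >> "$outfile"
    done
}
```

For instance: `capture_outputs diag.txt 'openstack flavor list' 'openstack baremetal node list'`, followed by one `openstack baremetal node show <name>` entry per node.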

Changed in tripleo:
milestone: stein-1 → stein-2
Changed in tripleo:
milestone: stein-2 → stein-3
Changed in tripleo:
status: Incomplete → Invalid