Problems provisioning the nodes when deploying with tripleo-quickstart

Bug #1951260 reported by Juan Badia Payno
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
tripleo | New | Undecided | Unassigned |

Bug Description

Provisioning the nodes fails:

2021-11-17 12:08:35.787312 | 002d6d46-e31e-c464-dba0-00000000000e | OK | stat overcloud-full.initrd | localhost | result={
    "changed": false,
    "invocation": {
        "module_args": {
            "checksum_algorithm": "sha1",
            "follow": false,
            "get_attributes": true,
            "get_checksum": false,
            "get_md5": false,
            "get_mime": true,
            "path": "/var/lib/ironic/images/overcloud-full.initrd"
        }
    },
    "stat": {
        "exists": false
    }
}
2021-11-17 12:08:35.788437 | 002d6d46-e31e-c464-dba0-00000000000e | TIMING | stat overcloud-full.initrd | localhost | 0:00:00.752711 | 0.18s
2021-11-17 12:08:35.791406 | 002d6d46-e31e-c464-dba0-00000000000f | TASK | Set partition file based default image
2021-11-17 12:08:35.816927 | 002d6d46-e31e-c464-dba0-00000000000f | SKIPPED | Set partition file based default image | localhost
2021-11-17 12:08:35.817931 | 002d6d46-e31e-c464-dba0-00000000000f | TIMING | Set partition file based default image | localhost | 0:00:00.782212 | 0.03s
2021-11-17 12:08:35.821422 | 002d6d46-e31e-c464-dba0-000000000010 | TASK | Set whole-disk file based default image
2021-11-17 12:08:35.846808 | 002d6d46-e31e-c464-dba0-000000000010 | SKIPPED | Set whole-disk file based default image | localhost
2021-11-17 12:08:35.847729 | 002d6d46-e31e-c464-dba0-000000000010 | TIMING | Set whole-disk file based default image | localhost | 0:00:00.812011 | 0.03s
2021-11-17 12:08:35.850917 | 002d6d46-e31e-c464-dba0-000000000011 | TASK | Set glance based default image
2021-11-17 12:08:35.875978 | 002d6d46-e31e-c464-dba0-000000000011 | SKIPPED | Set glance based default image | localhost
2021-11-17 12:08:35.876863 | 002d6d46-e31e-c464-dba0-000000000011 | TIMING | Set glance based default image | localhost | 0:00:00.841131 | 0.03s
2021-11-17 12:08:35.879619 | 002d6d46-e31e-c464-dba0-000000000013 | TASK | Expand roles
2021-11-17 12:08:35.905693 | 002d6d46-e31e-c464-dba0-000000000013 | FATAL | Expand roles | localhost | error={
    "msg": "The task includes an option with an undefined variable. The error was: 'default_image' is undefined\n\nThe error appears to be in '/usr/share/ansible/tripleo-playbooks/cli-overcloud-node-provision.yaml': line 92, column 7, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n - name: Expand roles\n ^ here\n"
}
2021-11-17 12:08:35.906712 | 002d6d46-e31e-c464-dba0-000000000013 | TIMING | Expand roles | localhost | 0:00:00.870989 | 0.03s

NO MORE HOSTS LEFT *************************************************************

PLAY RECAP *********************************************************************
localhost : ok=3 changed=0 unreachable=0 failed=1 skipped=6 rescued=0 ignored=0
2021-11-17 12:08:35.910427 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Summary Information ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2021-11-17 12:08:35.911290 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Total Tasks: 10 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2021-11-17 12:08:35.911755 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Elapsed Time: 0:00:00.876042 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2021-11-17 12:08:35.912154 | UUID | Info | Host | Task Name | Run Time
2021-11-17 12:08:35.912860 | 002d6d46-e31e-c464-dba0-00000000000c | SUMMARY | localhost | stat overcloud-full.raw | 0.26s
2021-11-17 12:08:35.913642 | 002d6d46-e31e-c464-dba0-00000000000e | SUMMARY | localhost | stat overcloud-full.initrd | 0.18s
2021-11-17 12:08:35.914487 | 002d6d46-e31e-c464-dba0-00000000000d | SUMMARY | localhost | stat overcloud-hardened-uefi-full.raw | 0.17s
2021-11-17 12:08:35.915000 | 002d6d46-e31e-c464-dba0-000000000013 | SUMMARY | localhost | Expand roles | 0.03s
2021-11-17 12:08:35.915601 | 002d6d46-e31e-c464-dba0-00000000000f | SUMMARY | localhost | Set partition file based default image | 0.03s
2021-11-17 12:08:35.916116 | 002d6d46-e31e-c464-dba0-000000000010 | SUMMARY | localhost | Set whole-disk file based default image | 0.03s
2021-11-17 12:08:35.916592 | 002d6d46-e31e-c464-dba0-000000000008 | SUMMARY | localhost | fail | 0.03s
2021-11-17 12:08:35.917117 | 002d6d46-e31e-c464-dba0-000000000011 | SUMMARY | localhost | Set glance based default image | 0.03s
2021-11-17 12:08:35.917644 | 002d6d46-e31e-c464-dba0-000000000009 | SUMMARY | localhost | fail | 0.02s
2021-11-17 12:08:35.917905 | 002d6d46-e31e-c464-dba0-00000000000a | SUMMARY | localhost | fail | 0.02s
2021-11-17 12:08:35.918130 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ End Summary Information ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2021-11-17 12:08:35.918371 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ State Information ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2021-11-17 12:08:35.918584 | ~~~~~~~~~~~~~~~~~~ Number of nodes which did not deploy successfully: 1 ~~~~~~~~~~~~~~~~~
2021-11-17 12:08:35.919183 | The following node(s) had failures: localhost
2021-11-17 12:08:35.919506 | ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Temporary directory [ /tmp/tripleoqs681tve ] cleaned up
Ansible execution failed. playbook: /usr/share/ansible/tripleo-playbooks/cli-overcloud-node-provision.yaml, Run Status: failed, Return Code: 2, To rerun the failed command manually execute the following script: /tmp/tripleo0udgz6a7/ansible-playbook-command.sh
Temporary directory [ /tmp/tripleo0udgz6a7 ] cleaned up
Exception occured while running the command
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/tripleoclient/command.py", line 34, in run
    super(Command, self).run(parsed_args)
  File "/usr/lib/python3.6/site-packages/osc_lib/command/command.py", line 39, in run
    return super(Command, self).run(parsed_args)
  File "/usr/lib/python3.6/site-packages/cliff/command.py", line 186, in run
    return_code = self.take_action(parsed_args) or 0
  File "/usr/lib/python3.6/site-packages/tripleoclient/v2/overcloud_node.py", line 334, in take_action
    extra_vars=extra_vars,
  File "/usr/lib/python3.6/site-packages/tripleoclient/utils.py", line 723, in run_ansible_playbook
    raise RuntimeError(err_msg)
RuntimeError: Ansible execution failed. playbook: /usr/share/ansible/tripleo-playbooks/cli-overcloud-node-provision.yaml, Run Status: failed, Return Code: 2, To rerun the failed command manually execute the following script: /tmp/tripleo0udgz6a7/ansible-playbook-command.sh
Ansible execution failed. playbook: /usr/share/ansible/tripleo-playbooks/cli-overcloud-node-provision.yaml, Run Status: failed, Return Code: 2, To rerun the failed command manually execute the following script: /tmp/tripleo0udgz6a7/ansible-playbook-command.sh
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/cliff/app.py", line 407, in run_subcommand
    result = cmd.run(parsed_args)
  File "/usr/lib/python3.6/site-packages/tripleoclient/command.py", line 34, in run
    super(Command, self).run(parsed_args)
  File "/usr/lib/python3.6/site-packages/osc_lib/command/command.py", line 39, in run
    return super(Command, self).run(parsed_args)
  File "/usr/lib/python3.6/site-packages/cliff/command.py", line 186, in run
    return_code = self.take_action(parsed_args) or 0
  File "/usr/lib/python3.6/site-packages/tripleoclient/v2/overcloud_node.py", line 334, in take_action
    extra_vars=extra_vars,
  File "/usr/lib/python3.6/site-packages/tripleoclient/utils.py", line 723, in run_ansible_playbook
    raise RuntimeError(err_msg)
RuntimeError: Ansible execution failed. playbook: /usr/share/ansible/tripleo-playbooks/cli-overcloud-node-provision.yaml, Run Status: failed, Return Code: 2, To rerun the failed command manually execute the following script: /tmp/tripleo0udgz6a7/ansible-playbook-command.sh
clean_up ProvisionNode: Ansible execution failed. playbook: /usr/share/ansible/tripleo-playbooks/cli-overcloud-node-provision.yaml, Run Status: failed, Return Code: 2, To rerun the failed command manually execute the following script: /tmp/tripleo0udgz6a7/ansible-playbook-command.sh
END return value: 1
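
The log shows all three "Set ... default image" tasks being skipped, after which "Expand roles" fails because default_image was never set. A minimal sketch of that pattern follows, with an explicit failure added for the case where nothing matches. The function name and the exact conditions are assumptions for illustration and are not taken from cli-overcloud-node-provision.yaml; the Glance case is left out.

```shell
#!/bin/sh
# Hypothetical sketch: pick a default image from the files the playbook
# stats. The conditions are assumed, not copied from the playbook.
select_default_image() {
    dir="$1"
    default_image=""
    if [ -e "$dir/overcloud-hardened-uefi-full.raw" ]; then
        # Assumed: a whole-disk image is usable on its own.
        default_image="$dir/overcloud-hardened-uefi-full.raw"
    elif [ -e "$dir/overcloud-full.raw" ] && [ -e "$dir/overcloud-full.initrd" ]; then
        # Assumed: a partition image needs its ramdisk next to it.
        default_image="$dir/overcloud-full.raw"
    fi
    if [ -z "$default_image" ]; then
        # Without this check the caller would see an unset value,
        # which is the situation reported in the log.
        echo "no overcloud image found in $dir" >&2
        return 1
    fi
    echo "$default_image"
}
```

Calling select_default_image /var/lib/ironic/images on the undercloud would then report the missing images directly instead of failing later on an undefined variable.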

How to reproduce:

./quickstart.sh -n -X -R master-tripleo-ci --tags all -T none --extra-vars undercloud_disk=70 --nodes config/nodes/3ctlr_1comp_3ceph.yml -p quickstart.yml 127.0.0.2 2>&1 | tee 1_provisioning.log
./quickstart.sh -R master-tripleo-ci --no-clone --tags all -I -T none --nodes config/nodes/3ctlr_1comp_3ceph.yml -p quickstart-extras-undercloud.yml 127.0.0.2 2>&1 | tee 2_undercloud.log
./quickstart.sh -R master-tripleo-ci --no-clone --tags all -I -T none --nodes config/nodes/3ctlr_1comp_3ceph.yml -p quickstart-extras-overcloud-prep.yml 127.0.0.2 2>&1 | tee 3_pre_overcloud.log
./quickstart.sh -R master-tripleo-ci --no-clone --tags overcloud-scripts -I -T none --nodes config/nodes/3ctlr_1comp_3ceph.yml -p quickstart-extras-overcloud.yml 127.0.0.2 2>&1 | tee 4_scripts.log

Log in to the undercloud and run:

$ export OS_CLOUD=undercloud
$ openstack overcloud network provision -y -o overcloud-networks-deployed.yaml /usr/share/openstack-tripleo-heat-templates/network-data-samples/default-network-isolation.yaml
$ cp /usr/share/openstack-tripleo-heat-templates/network-data-samples/vip-data-default-network-isolation.yaml /home/stack/
$ sed -i 's/- network: external/- ip_address: 10.0.0.5\n network: external/g' vip-data-default-network-isolation.yaml
$ openstack overcloud network vip provision -y -o overcloud-vips-deployed.yaml --stack overcloud /home/stack/vip-data-default-network-isolation.yaml
$ openstack overcloud node provision -o overcloud-baremetal-deployed.yaml --stack overcloud overcloud_baremetal_deploy.yaml

=======
I just found out that the required images are not in /var/lib/ironic/images/.
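
A quick way to confirm this on the undercloud is to check for the three files that the "stat" tasks in the log look at. This is only a sketch: the function name is made up, and the file list is limited to the names that appear in the log.

```shell
#!/bin/sh
# Report which of the image files stat-ed by the provisioning playbook
# exist in a directory (default: /var/lib/ironic/images).
# check_overcloud_images is a hypothetical helper name.
check_overcloud_images() {
    dir="${1:-/var/lib/ironic/images}"
    missing=0
    for f in overcloud-hardened-uefi-full.raw overcloud-full.raw overcloud-full.initrd; do
        if [ -e "$dir/$f" ]; then
            echo "found:   $f"
        else
            echo "missing: $f"
            missing=$((missing + 1))
        fi
    done
    # Exit status is non-zero when at least one file is absent.
    [ "$missing" -eq 0 ]
}
```

Run it as check_overcloud_images with no argument to inspect the default directory, or pass another path as the first argument.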

======

The whole_disk value was not set properly in the task "Setup overcloud image upload facts":

TASK [tripleo.operator.tripleo_overcloud_image_upload : Setup overcloud image upload facts] **************************************************************************************************************************************************
task path: /root/.quickstart/share/ansible/collections/ansible_collections/tripleo/operator/roles/tripleo_overcloud_image_upload/tasks/main.yml:3
Wednesday 17 November 2021 06:40:10 -0500 (0:00:00.062) 0:00:14.538 ****
ok: [undercloud] => {
    "ansible_facts": {
        "_image_upload_cmd": " openstack overcloud image upload --http-boot $UPLOAD_HTTP_BOOT --whole-disk --local >$UPLOAD_LOG 2>&1",
        "_image_upload_env": {
            "OS_CLOUD": "undercloud",
            "UPLOAD_ARCHITECTURE": null,
            "UPLOAD_HTTP_BOOT": "/var/lib/ironic/httpboot",
            "UPLOAD_IMAGE_PATH": null,
            "UPLOAD_IMAGE_TYPE": null,
            "UPLOAD_IPA_NAME": null,
            "UPLOAD_LOCAL_PATH": null,
            "UPLOAD_LOG": "/home/stack/overcloud_image_upload.log",
            "UPLOAD_OS_IMAGE_NAME": null,
            "UPLOAD_PLATFORM": null
        }
    },
    "changed": false
}

WORKAROUND:
============

Upload the image without the --whole-disk flag:
$ source /home/stack/stackrc
$ openstack overcloud image upload --http-boot=/var/lib/ironic/httpboot --local

Juan Badia Payno (jbadiapa) wrote :

Well, the workaround makes the job fail later (still during provisioning):

PLAY [Overcloud Node Grow Volumes] *********************************************
2021-11-17 12:37:33.240702 | 002d6d46-e31e-3db7-c63e-00000000000f | TASK | Wait for provisioned nodes to boot
<overcloud-controller-1> Attempting python interpreter discovery
<overcloud-controller-0> Attempting python interpreter discovery
<overcloud-controller-2> Attempting python interpreter discovery
<192.168.24.24> ESTABLISH LOCAL CONNECTION FOR USER: stack
<192.168.24.8> ESTABLISH LOCAL CONNECTION FOR USER: stack
<192.168.24.8> EXEC /bin/sh -c 'echo PLATFORM; uname; echo FOUND; command -v '"'"'/usr/bin/python'"'"'; command -v '"'"'python3.7'"'"'; command -v '"'"'python3.6'"'"'; command -v '"'"'python3.5'"'"'; command -v '"'"'python2.7'"'"'; command -v '"'"'python2.6'"'"'; command -v '"'"'/usr/libexec/platform-python'"'"'; command -v '"'"'/usr/bin/python3'"'"'; command -v '"'"'python'"'"'; echo ENDFOUND && sleep 0'
<192.168.24.24> EXEC /bin/sh -c 'echo PLATFORM; uname; echo FOUND; command -v '"'"'/usr/bin/python'"'"'; command -v '"'"'python3.7'"'"'; command -v '"'"'python3.6'"'"'; command -v '"'"'python3.5'"'"'; command -v '"'"'python2.7'"'"'; command -v '"'"'python2.6'"'"'; command -v '"'"'/usr/libexec/platform-python'"'"'; command -v '"'"'/usr/bin/python3'"'"'; command -v '"'"'python'"'"'; echo ENDFOUND && sleep 0'
<192.168.24.16> ESTABLISH LOCAL CONNECTION FOR USER: stack
<192.168.24.16> EXEC /bin/sh -c 'echo PLATFORM; uname; echo FOUND; command -v '"'"'/usr/bin/python'"'"'; command -v '"'"'python3.7'"'"'; command -v '"'"'python3.6'"'"'; command -v '"'"'python3.5'"'"'; command -v '"'"'python2.7'"'"'; command -v '"'"'python2.6'"'"'; command -v '"'"'/usr/libexec/platform-python'"'"'; command -v '"'"'/usr/bin/python3'"'"'; command -v '"'"'python'"'"'; echo ENDFOUND && sleep 0'
<192.168.24.8> EXEC /bin/sh -c '/usr/bin/python3.6 && sleep 0'
<192.168.24.24> EXEC /bin/sh -c '/usr/bin/python3.6 && sleep 0'
<192.168.24.16> EXEC /bin/sh -c '/usr/bin/python3.6 && sleep 0'
Using module file /usr/lib/python3.6/site-packages/ansible/modules/utilities/logic/wait_for.py
Using module file /usr/lib/python3.6/site-packages/ansible/modules/utilities/logic/wait_for.py
Pipelining is enabled.
<192.168.24.24> EXEC /bin/sh -c '/usr/libexec/platform-python && sleep 0'
Using module file /usr/lib/python3.6/site-packages/ansible/modules/utilities/logic/wait_for.py
Pipelining is enabled.
<192.168.24.16> EXEC /bin/sh -c '/usr/libexec/platform-python && sleep 0'
Pipelining is enabled.
<192.168.24.8> EXEC /bin/sh -c '/usr/libexec/platform-python && sleep 0'

2021-11-17 12:47:35.540329 | 002d6d46-e31e-3db7-c63e-00000000000f | FATAL | Wait for provisioned nodes to boot | overcloud-controller-0 | error={
    "ansible_facts": {
        "discovered_interpreter_python": "/usr/libexec/platform-python"
    },
    "changed": false,
    "elapsed": 601,
    "invocation": {
        "module_args": {
            "active_connection_states": [
                "ESTABLISHED",
                "FIN_WAIT1",
                "FIN_WAIT2",
                "SYN_RECV",
                "SYN_SENT",
                "TIME_WAIT"
   ...


Ronelle Landy (rlandy) wrote :