OpenStack overcloud node provision fails with "Failed to connect to the host via ssh: ssh: Could not resolve hostname" for a node that was undeployed using the metalsmith undeploy command
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
tripleo | New | Undecided | Unassigned |
Bug Description
After running metalsmith undeploy <failed node>, when I run "openstack overcloud node provision --stack overcloud --network-config --overcloud-ssh-key /home/stack/
the node provision process starts and then fails after some time in the growvols section:
PLAY [Overcloud Node Grow Volumes] *******
2022-04-18 18:34:36.538550 | 48d539a1-
[WARNING]: Reset is not implemented for this connection
[WARNING]: Reset is not implemented for this connection
[WARNING]: Reset is not implemented for this connection
[WARNING]: Reset is not implemented for this connection
[WARNING]: Reset is not implemented for this connection
[WARNING]: Reset is not implemented for this connection
2022-04-18 18:34:47.491774 | 48d539a1-
2022-04-18 18:34:47.494887 | 48d539a1-
2022-04-18 18:34:47.546795 | 48d539a1-
2022-04-18 18:34:47.548397 | 48d539a1-
2022-04-18 18:34:47.588085 | 48d539a1-
2022-04-18 18:34:47.589687 | 48d539a1-
2022-04-18 18:34:47.611106 | 48d539a1-
2022-04-18 18:34:47.612727 | 48d539a1-
2022-04-18 18:34:47.723571 | 48d539a1-
2022-04-18 18:34:47.725272 | 48d539a1-
2022-04-18 18:34:47.769155 | 48d539a1-
2022-04-18 18:34:47.770876 | 48d539a1-
2022-04-18 18:34:47.784141 | 48d539a1-
2022-04-18 18:34:48.861226 | 48d539a1-
2022-04-18 18:34:48.864844 | 48d539a1-
2022-04-18 18:34:48.898703 | 48d539a1-
2022-04-18 18:34:48.900942 | 48d539a1-
2022-04-18 18:34:48.903167 | 48d539a1-
2022-04-18 18:34:48.905050 | 48d539a1-
2022-04-18 18:34:48.931621 | 48d539a1-
2022-04-18 18:34:48.933842 | 48d539a1-
[WARNING]: Unhandled error in Python interpreter discovery for host overcloud-
novacompute-5: Failed to connect to the host via ssh: ssh: Could not resolve
hostname overcloud-
[WARNING]: Unhandled error in Python interpreter discovery for host overcloud-
novacompute-1: Failed to connect to the host via ssh: ssh: Could not resolve
hostname overcloud-
2022-04-18 18:34:57.088190 | 48d539a1-
2022-04-18 18:34:57.089999 | 48d539a1-
2022-04-18 18:35:01.245531 | 48d539a1-
2022-04-18 18:35:01.247346 | 48d539a1-
NO MORE HOSTS LEFT *******
PLAY RECAP *******
overcloud-
overcloud-
overcloud-
overcloud-
overcloud-
overcloud-
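For triage, the hosts that Ansible could not resolve can be pulled out of a saved copy of the output. The following is a minimal sketch, not part of TripleO: the function name `unresolved_hosts` and the sample host names are hypothetical, and it assumes the warnings are wrapped by the terminal in the same way as in the output above.

```python
import re

# Matches the Ansible warning emitted when a host name cannot be resolved.
_UNRESOLVED = re.compile(
    r"interpreter discovery for host (\S+): "
    r"Failed to connect to the host via ssh: ssh: Could not resolve hostname"
)


def unresolved_hosts(log_text):
    """Return the sorted, de-duplicated host names that could not be resolved."""
    # Undo terminal wrapping: a line ending in "-" is assumed to continue a
    # host name; any other line break is treated as a single space.
    joined = re.sub(r"-\n", "-", log_text)
    joined = re.sub(r"\s*\n\s*", " ", joined)
    return sorted(set(_UNRESOLVED.findall(joined)))


# Made-up sample in the shape of the warnings above (host names are hypothetical).
sample = (
    "[WARNING]: Unhandled error in Python interpreter discovery for host overcloud-\n"
    "example-2: Failed to connect to the host via ssh: ssh: Could not resolve\n"
    "hostname overcloud-example-2\n"
    "[WARNING]: Unhandled error in Python interpreter discovery for host overcloud-\n"
    "example-1: Failed to connect to the host via ssh: ssh: Could not resolve\n"
    "hostname overcloud-example-1\n"
)

print(unresolved_hosts(sample))
```

The hyphen-joining rule is a heuristic for wrapped output only; on a log that was not wrapped it has no effect.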
In this scenario I used the metalsmith undeploy command for openstack-
This issue should not occur, as I have removed the two compute nodes mentioned above (novacompute5 and novacompute1) from the overcloud-
overcloud-
- name: Controller
  count: 2
  defaults:
    networks:
    - network: ctlplane
      vif: true
    - network: external
      subnet: external_subnet
    - network: internal_api
      subnet: internal_api_subnet
    - network: storage
      subnet: storage_subnet
    - network: storage_mgmt
      subnet: storage_mgmt_subnet
    - network: tenant
      subnet: tenant_subnet
    network_config:
      template: /home/stack/
      default_
      - external
  instances:
  #- hostname: overcloud-
  #  name: dc2-controller1
  - hostname: overcloud-
    name: dc2-controller2
  #- hostname: overcloud-
  #  name: dc1-controller1
  - hostname: overcloud-
    name: dc1-controller2
- name: Compute
  count: 4
  defaults:
    networks:
    - network: ctlplane
      vif: true
    - network: internal_api
      subnet: internal_api_subnet
    - network: tenant
      subnet: tenant_subnet
    - network: storage
      subnet: storage_subnet
    network_config:
      template: /home/stack/
  instances:
  - hostname: overcloud-
    name: dc2-compute1
  - hostname: overcloud-
    name: dc2-compute2
  - hostname: overcloud-
    name: dc1-compute1
  - hostname: overcloud-
    name: dc1-compute2
- name: CephStorage
  count: 3
  defaults:
    networks:
    - network: ctlplane
      vif: true
    - network: internal_api
      subnet: internal_api_subnet
    - network: storage
      subnet: storage_subnet
    - network: storage_mgmt
      subnet: storage_mgmt_subnet
    network_config:
      template: /home/stack/
  instances:
  - hostname: overcloud-
    name: dc2-ceph1
  # - hostname: overcloud-
  #   name: dc2-ceph2
  - hostname: overcloud-
    name: dc1-ceph1
  - hostname: overcloud-
    name: dc1-ceph2
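Whether the failing hosts are really absent from the definition above can be checked mechanically. The sketch below is only an illustration: it works on the definition already loaded into plain Python lists and dicts (loading the YAML is left out), the helper names `defined_hostnames` and `stale_hosts` are hypothetical, and so are the host and node names in the sample data.

```python
def defined_hostnames(roles):
    """Collect the hostname of every instance entry across all roles."""
    return {
        inst["hostname"]
        for role in roles
        for inst in role.get("instances", [])
        if "hostname" in inst
    }


def stale_hosts(failing_hosts, roles):
    """Return failing hosts that do not appear anywhere in the definition."""
    return sorted(set(failing_hosts) - defined_hostnames(roles))


# Hypothetical data shaped like the role definition above.
roles = [
    {
        "name": "Compute",
        "count": 2,
        "instances": [
            {"hostname": "overcloud-example-0", "name": "node-a"},
            {"hostname": "overcloud-example-1", "name": "node-b"},
        ],
    },
]
failing = ["overcloud-example-1", "overcloud-example-9"]

print(stale_hosts(failing, roles))
```

A non-empty result would only show that the name is coming from somewhere other than this file; it does not by itself identify where.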
I have also tried adding the failed nodes as mentioned below in the yaml file:
- hostname: overcloud-
  name: dc2-compute3
  provisioned: false
- hostname: overcloud-
  name: dc1-controller1
  provisioned: false
Still the issue persists.
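When entries are marked provisioned: false, it may be worth confirming that each role's count still lines up with the remaining entries. The sketch below assumes, without confirmation from this report, that count is expected to equal the number of instance entries not marked provisioned: false and that a missing count means 1. The function name `count_mismatches` and all sample names are hypothetical.

```python
def count_mismatches(roles):
    """Return {role name: (count, active instances)} for roles where they differ."""
    mismatches = {}
    for role in roles:
        instances = role.get("instances", [])
        # An entry counts as active unless it is explicitly provisioned: false.
        active = sum(1 for inst in instances if inst.get("provisioned", True))
        count = role.get("count", 1)  # assumed default when count is omitted
        # Roles without an instances list are skipped: nothing to compare.
        if instances and active != count:
            mismatches[role["name"]] = (count, active)
    return mismatches


# Hypothetical data: one consistent role and one inconsistent role.
roles = [
    {
        "name": "Compute",
        "count": 2,
        "instances": [
            {"hostname": "overcloud-example-0", "name": "node-a"},
            {"hostname": "overcloud-example-1", "name": "node-b"},
            {"hostname": "overcloud-example-2", "name": "node-c", "provisioned": False},
        ],
    },
    {
        "name": "Controller",
        "count": 2,
        "instances": [
            {"hostname": "overcloud-example-3", "name": "node-d"},
            {"hostname": "overcloud-example-4", "name": "node-e", "provisioned": False},
        ],
    },
]

print(count_mismatches(roles))
```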
When I try openstack overcloud node delete or unprovision, I receive output that says 'No nodes to unprovision'.
Steps to Reproduce:
1. Install OpenStack TripleO for Wallaby.
2. Introspect the nodes.
3. Provision the network using openstack overcloud network provision.
4. Provision the nodes using openstack overcloud node provision (if any node fails, or if you stop the command halfway, follow the next step).
5. Run metalsmith undeploy <failed node> (metalsmith list should show deploying for that particular node).