Scale down unreachable host fails
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
tripleo | Fix Released | Medium | Brendan Shephard |
Bug Description
Summary:
Scaling down an unreachable compute node fails.
Details:
The first failure comes during gather_facts:
TASK [Gathering Facts] *******
Sunday 31 May 2020 11:42:20 +1000 (0:00:00.125) 0:00:00.125 ************
[WARNING]: Unhandled error in Python interpreter discovery for host overcloud-
fatal: [overcloud-
msg: |-
Data could not be sent to remote host "192.168.24.13". Make sure this host can be reached over ssh: ssh: connect to host 192.168.24.13 port 22: No route to host
skip_reason: Host overcloud-
unreachable: true
This one needs to have any_errors_fatal: no
- hosts: "{{ deploy_target_host }}"
  name: Gather facts from overcloud
  gather_facts: yes
  any_errors_fatal: no
  ignore_
  tags:
    - facts
The next error occurs during the task "ensure we get the right selinux context".
This one needs both:
any_errors_fatal: no
and
ignore_unreachable: "{{ ignore_unreachable | default(false) }}"
TASK [ensure we get the right selinux context] *******
Sunday 31 May 2020 11:19:13 +1000 (0:00:00.121) 0:00:25.502 ************
[WARNING]: Unhandled error in Python interpreter discovery for host overcloud-
fatal: [overcloud-
msg: |-
Data could not be sent to remote host "192.168.24.13". Make sure this host can be reached over ssh: ssh: connect to host 192.168.24.13 port 22: No route to host
skip_reason: Host overcloud-
unreachable: true
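As a minimal sketch of the two settings named above, a play that keeps going when one host cannot be reached might look like the following. The play name, the `hosts` pattern and the task body are assumptions made for illustration; the report does not show the real play that contains this task.

```yaml
# Illustrative only: play name, hosts pattern and task body are hypothetical.
- hosts: overcloud
  name: Example play that tolerates an unreachable host
  gather_facts: false
  # Do not abort the play on every host because one host failed.
  any_errors_fatal: no
  # Unreachable errors are ignored only when the variable is set to true;
  # the default(false) filter keeps the stock behaviour otherwise.
  ignore_unreachable: "{{ ignore_unreachable | default(false) }}"
  tasks:
    - name: ensure we get the right selinux context
      # Placeholder body; the actual module call is not shown in this report.
      ansible.builtin.debug:
        msg: "the selinux context task would run here"
```

Because the variable defaults to false, this sketch changes nothing unless `ignore_unreachable` is explicitly set to true for the scale-down run.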
Then the scaling playbook:
This one also needs any_errors_fatal: no
and
ignore_unreachable: "{{ ignore_unreachable | default(false) }}"
This is what it needs to look like:
- hosts: overcloud
  name: Scaling
  # NOTE(cloudnull): This is set to true explicitly so that we have up-to-date facts
  #                  on all overcloud when performing a scaling operation.
  #                  Without up-to-date facts, we're creating a potential failure
  #                  scenario.
  gather_facts: true
  any_errors_fatal: no
  ignore_
  become: false
  vars:
    ignore_offline: true
PLAY [Scaling] *******
TASK [Gathering Facts] *******
Sunday 31 May 2020 11:19:41 +1000 (0:00:00.055) 0:00:52.763 ************
[WARNING]: Unhandled error in Python interpreter discovery for host overcloud-
fatal: [overcloud-
msg: |-
Data could not be sent to remote host "192.168.24.13". Make sure this host can be reached over ssh: ssh: connect to host 192.168.24.13 port 22: No route to host
unreachable: true
Reproducer steps:
In my case, I deleted a Compute node VM and then tried to scale down:
1. Deploy with 2 compute nodes
2. Manually delete a Compute node, or ensure it is unreachable to simulate actual hardware failure
3. Try to scale down the node
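For step 2, one way to confirm that the node really is unreachable before attempting the scale down is a small probe play run from the undercloud. This is only a sketch: the play is not part of TripleO, the variable name `target_node_ip` is hypothetical, and the address is the one that appears in the error output above.

```yaml
# Hypothetical helper play, not part of the TripleO playbooks.
- hosts: localhost
  name: Confirm the compute node is unreachable before scaling down
  gather_facts: false
  vars:
    # Assumption: the address taken from the error messages in this report.
    target_node_ip: 192.168.24.13
  tasks:
    - name: Probe the SSH port on the target node
      ansible.builtin.wait_for:
        host: "{{ target_node_ip }}"
        port: 22
        timeout: 5
      register: ssh_probe
      # A timeout is the expected outcome here, so do not stop the play.
      ignore_errors: true

    - name: Report the probe result
      ansible.builtin.debug:
        msg: "SSH reachable: {{ ssh_probe is succeeded }}"
```

If the probe reports that SSH is not reachable, the node is in the state this bug describes and the scale down in step 3 should hit the failures shown above.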
Actual Results:
Scale down fails as shown above.
Expected Results:
Scale down needs to work even if the node is unreachable. Otherwise we can't scale down failed hardware.
Changed in tripleo:
milestone: none → victoria-1
importance: Undecided → Medium
Changed in tripleo:
milestone: victoria-1 → victoria-3
Changed in tripleo:
milestone: victoria-3 → wallaby-1
Changed in tripleo:
milestone: wallaby-1 → wallaby-2
Changed in tripleo:
milestone: wallaby-2 → wallaby-3
Changed in tripleo:
milestone: wallaby-3 → wallaby-rc1
Changed in tripleo:
milestone: wallaby-rc1 → xena-1
Changed in tripleo:
status: In Progress → Fix Released
Fix proposed to branch: master
Review: https://review.opendev.org/732000