Scale down unreachable host fails

Bug #1881452 reported by Brendan Shephard
This bug affects 1 person
Affects: tripleo
Status: Fix Released
Importance: Medium
Assigned to: Brendan Shephard

Bug Description

Summary:
Scaling down an unreachable compute node fails.

Details:

The first failure comes during gather_facts:
TASK [Gathering Facts] ***********************************************************************************************************************************************************************************************************************
Sunday 31 May 2020 11:42:20 +1000 (0:00:00.125) 0:00:00.125 ************
[WARNING]: Unhandled error in Python interpreter discovery for host overcloud-novacompute-1: Failed to connect to the host via ssh: ssh: connect to host 192.168.24.13 port 22: No route to host
fatal: [overcloud-novacompute-1]: UNREACHABLE! => changed=false
  msg: |-
    Data could not be sent to remote host "192.168.24.13". Make sure this host can be reached over ssh: ssh: connect to host 192.168.24.13 port 22: No route to host
  skip_reason: Host overcloud-novacompute-1 is unreachable
  unreachable: true

This play needs to have any_errors_fatal: no set:
- hosts: "{{ deploy_target_host }}"
  name: Gather facts from overcloud
  gather_facts: yes
  any_errors_fatal: no
  ignore_unreachable: "{{ ignore_unreachable | default(false) }}"
  tags:
    - facts
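
For reference, a minimal sketch of how the ignore_unreachable variable could be supplied so that the default(false) fallback above is overridden. The file name is hypothetical, and this assumes the variable is passed as an extra var (for example with ansible-playbook -e @scale-down-vars.yaml); how TripleO itself sets it is not shown in this report.

```yaml
# scale-down-vars.yaml (hypothetical file name, not part of tripleo-heat-templates)
# Assumption: loaded as extra vars so the play-level expression
#   ignore_unreachable: "{{ ignore_unreachable | default(false) }}"
# evaluates to true instead of falling back to false.
ignore_unreachable: true
```

If the variable is left unset, the expression falls back to false and unreachable hosts are handled as before.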

The next error occurs in the task "ensure we get the right selinux context".
The play containing it needs both:
any_errors_fatal: no
and
ignore_unreachable: "{{ ignore_unreachable | default(false) }}"

TASK [ensure we get the right selinux context] ***********************************************************************************************************************************************************************************************
Sunday 31 May 2020 11:19:13 +1000 (0:00:00.121) 0:00:25.502 ************
[WARNING]: Unhandled error in Python interpreter discovery for host overcloud-novacompute-1: Failed to connect to the host via ssh: ssh: connect to host 192.168.24.13 port 22: No route to host
fatal: [overcloud-novacompute-1]: UNREACHABLE! => changed=false
  msg: |-
    Data could not be sent to remote host "192.168.24.13". Make sure this host can be reached over ssh: ssh: connect to host 192.168.24.13 port 22: No route to host
  skip_reason: Host overcloud-novacompute-1 is unreachable
  unreachable: true

Then the Scaling play fails.
It also needs both any_errors_fatal: no
and
ignore_unreachable: "{{ ignore_unreachable | default(false) }}"

This is what it needs to look like:

- hosts: overcloud
  name: Scaling
  # NOTE(cloudnull): This is set to true explicitly so that we have up-to-date facts
  # on all overcloud when performing a scaling operation.
  # Without up-to-date facts, we're creating a potential failure
  # scenario.
  gather_facts: true
  any_errors_fatal: no
  ignore_unreachable: "{{ ignore_unreachable | default(false) }}"
  become: false
  vars:
    ignore_offline: true

  PLAY [Scaling] *******************************************************************************************************************************************************************************************************************************

TASK [Gathering Facts] ***********************************************************************************************************************************************************************************************************************
Sunday 31 May 2020 11:19:41 +1000 (0:00:00.055) 0:00:52.763 ************
[WARNING]: Unhandled error in Python interpreter discovery for host overcloud-novacompute-1: Failed to connect to the host via ssh: ssh: connect to host 192.168.24.13 port 22: No route to host
fatal: [overcloud-novacompute-1]: UNREACHABLE! => changed=false
  msg: |-
    Data could not be sent to remote host "192.168.24.13". Make sure this host can be reached over ssh: ssh: connect to host 192.168.24.13 port 22: No route to host
  unreachable: true

Reproducer steps:
In my case, I deleted a Compute node VM and then tried to scale down:

1. Deploy with 2 compute nodes
2. Manually delete a Compute node, or ensure it is unreachable to simulate actual hardware failure
3. Try to scale down the node
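
Outside of a full TripleO deployment, the unreachable-host condition from step 2 can probably be approximated with a small Ansible inventory in which one host points at an address that nothing answers on. This is only a sketch: the host names and the file name are made up, and 192.0.2.13 is taken from the range reserved for documentation, not from the environment in this report.

```yaml
# inventory.yaml (hypothetical; host names and addresses are placeholders)
all:
  children:
    overcloud:
      hosts:
        # Stands in for a healthy node: tasks run on the local machine.
        compute-0:
          ansible_connection: local
        # Stands in for the deleted/failed node: the address is from the
        # reserved documentation range, so SSH connections should not succeed.
        compute-1:
          ansible_host: 192.0.2.13
```

Running any play against the overcloud group with this inventory should report compute-1 as UNREACHABLE, similar to the output above.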

Actual Results:
Scale down fails as shown above.

Expected Results:
Scale down needs to work even if the node is unreachable. Otherwise we can't scale down failed hardware.
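
The expected behaviour can be sketched with a stand-alone play, independent of the TripleO templates. Everything here (play name, host pattern, tasks) is illustrative; only the two keywords any_errors_fatal and ignore_unreachable come from this report.

```yaml
# Illustrative play only, not taken from tripleo-heat-templates.
- hosts: overcloud
  name: Continue when a host is unreachable
  gather_facts: false
  # Do not abort the play for every host when one host fails.
  any_errors_fatal: false
  # Treat an UNREACHABLE result as ignorable instead of dropping the host.
  ignore_unreachable: true
  tasks:
    - name: Check connectivity to each host
      ansible.builtin.ping:

    - name: Later tasks are still attempted for every host
      ansible.builtin.debug:
        msg: "continuing on {{ inventory_hostname }}"
```

With ignore_unreachable enabled, Ansible keeps an unreachable host in the play and continues with subsequent tasks, which is the behaviour a scale-down of failed hardware depends on.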

OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-heat-templates (master)

Fix proposed to branch: master
Review: https://review.opendev.org/732000

Changed in tripleo:
assignee: nobody → Brendan Shephard (bshephar)
status: New → In Progress
Changed in tripleo:
milestone: none → victoria-1
importance: Undecided → Medium
Rabi Mishra (rabi) wrote :

I would prefer we review/merge https://review.opendev.org/#/c/723382/ instead.

OpenStack Infra (hudson-openstack) wrote : Change abandoned on tripleo-heat-templates (master)

Change abandoned by Emilien Macchi (<email address hidden>) on branch: master
Review: https://review.opendev.org/732000
Reason: thanks for your contribution Brendan, let's use the other patch mentioned earlier.

Changed in tripleo:
milestone: victoria-1 → victoria-3
Changed in tripleo:
milestone: victoria-3 → wallaby-1
Changed in tripleo:
milestone: wallaby-1 → wallaby-2
Changed in tripleo:
milestone: wallaby-2 → wallaby-3
Changed in tripleo:
milestone: wallaby-3 → wallaby-rc1
Changed in tripleo:
milestone: wallaby-rc1 → xena-1
Changed in tripleo:
status: In Progress → Fix Released