tripleo

Scale down unreachable host fails

Bug #1881452 reported by Brendan Shephard on 2020-05-31

This bug affects 1 person

Affects	Status	Importance	Assigned to	Milestone
tripleo	Fix Released	Medium	Brendan Shephard	tripleo xena-1

Bug Description

Summary:
Scaling down an unreachable compute node fails.

Details:

The first failure comes during gather_facts:
TASK [Gathering Facts] ***********************************************************************************************************************************************************************************************************************
Sunday 31 May 2020 11:42:20 +1000 (0:00:00.125) 0:00:00.125 ************
[WARNING]: Unhandled error in Python interpreter discovery for host overcloud-novacompute-1: Failed to connect to the host via ssh: ssh: connect to host 192.168.24.13 port 22: No route to host
fatal: [overcloud-novacompute-1]: UNREACHABLE! => changed=false
  msg: |-
    Data could not be sent to remote host "192.168.24.13". Make sure this host can be reached over ssh: ssh: connect to host 192.168.24.13 port 22: No route to host
  skip_reason: Host overcloud-novacompute-1 is unreachable
  unreachable: true

This play needs to have any_errors_fatal: no set:
- hosts: "{{ deploy_target_host }}"
  name: Gather facts from overcloud
  gather_facts: yes
  any_errors_fatal: no
  ignore_unreachable: "{{ ignore_unreachable | default(false) }}"
  tags:
    - facts

The next error occurs during the task "ensure we get the right selinux context".
The play containing this task needs both:
any_errors_fatal: no
and
ignore_unreachable: "{{ ignore_unreachable | default(false) }}"

TASK [ensure we get the right selinux context] ***********************************************************************************************************************************************************************************************
Sunday 31 May 2020 11:19:13 +1000 (0:00:00.121) 0:00:25.502 ************
[WARNING]: Unhandled error in Python interpreter discovery for host overcloud-novacompute-1: Failed to connect to the host via ssh: ssh: connect to host 192.168.24.13 port 22: No route to host
fatal: [overcloud-novacompute-1]: UNREACHABLE! => changed=false
  msg: |-
    Data could not be sent to remote host "192.168.24.13". Make sure this host can be reached over ssh: ssh: connect to host 192.168.24.13 port 22: No route to host
  skip_reason: Host overcloud-novacompute-1 is unreachable
  unreachable: true
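
For illustration, a play carrying both settings might look like the sketch below. The host pattern, play name, and task body are placeholders assumed for this example. The report does not show the actual play that contains this task, so only the two keywords and the task name are taken from the report.

```yaml
# Hypothetical play: hosts, name, and the task action are placeholders,
# not the actual tripleo-heat-templates content.
- hosts: "{{ deploy_target_host }}"
  name: Example play containing the failing task
  gather_facts: no
  # Do not abort the play for every host when one host fails.
  any_errors_fatal: no
  # Ignore UNREACHABLE results only when the caller opts in.
  ignore_unreachable: "{{ ignore_unreachable | default(false) }}"
  tasks:
    - name: ensure we get the right selinux context
      # Placeholder action; the real task body is not shown in this report.
      debug:
        msg: "placeholder for the real task"
```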

The next failure is in the Scaling play.
This one also needs any_errors_fatal: no
and
ignore_unreachable: "{{ ignore_unreachable | default(false) }}"

This is what it needs to look like:

- hosts: overcloud
  name: Scaling
  # NOTE(cloudnull): This is set to true explicitly so that we have up-to-date facts
  # on all overcloud when performing a scaling operation.
  # Without up-to-date facts, we're creating a potential failure
  # scenario.
  gather_facts: true
  any_errors_fatal: no
  ignore_unreachable: "{{ ignore_unreachable | default(false) }}"
  become: false
  vars:
    ignore_offline: true

PLAY [Scaling] *******************************************************************************************************************************************************************************************************************************

TASK [Gathering Facts] ***********************************************************************************************************************************************************************************************************************
Sunday 31 May 2020 11:19:41 +1000 (0:00:00.055) 0:00:52.763 ************
[WARNING]: Unhandled error in Python interpreter discovery for host overcloud-novacompute-1: Failed to connect to the host via ssh: ssh: connect to host 192.168.24.13 port 22: No route to host
fatal: [overcloud-novacompute-1]: UNREACHABLE! => changed=false
  msg: |-
    Data could not be sent to remote host "192.168.24.13". Make sure this host can be reached over ssh: ssh: connect to host 192.168.24.13 port 22: No route to host
  unreachable: true
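
Because the snippets above default ignore_unreachable to false, the variable has to be set to true for unreachable hosts to be ignored. A minimal sketch, assuming the variable is supplied as an Ansible extra var (the file name is hypothetical, and how the TripleO tooling would pass it is not covered in this report):

```yaml
# extra-vars.yaml (hypothetical file name)
# Opt in to ignoring unreachable hosts for plays that template
# ignore_unreachable from this variable.
ignore_unreachable: true
```

With plain Ansible, a file like this can be passed using ansible-playbook -e @extra-vars.yaml.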

Reproducer steps:
In my case, I deleted a Compute node VM and then tried to scale down:

1. Deploy with 2 compute nodes
2. Manually delete a Compute node, or ensure it is unreachable to simulate actual hardware failure
3. Try to scale down the node

Actual Results:
Scale down fails as shown above.

Expected Results:
Scale down needs to work even if the node is unreachable. Otherwise, we can't scale down failed hardware.


OpenStack Infra (hudson-openstack) wrote on 2020-05-31: Fix proposed to tripleo-heat-templates (master)

Fix proposed to branch: master
Review: https://review.opendev.org/732000

Changed in tripleo:
assignee:	nobody → Brendan Shephard (bshephar)
status:	New → In Progress

Emilien Macchi (emilienm) on 2020-06-01

Changed in tripleo:
milestone:	none → victoria-1
importance:	Undecided → Medium


Rabi Mishra (rabi) wrote on 2020-06-01:

I would prefer we review/merge https://review.opendev.org/#/c/723382/ instead.


OpenStack Infra (hudson-openstack) wrote on 2020-06-01: Change abandoned on tripleo-heat-templates (master)

Change abandoned by Emilien Macchi (<email address hidden>) on branch: master
Review: https://review.opendev.org/732000
Reason: thanks for your contribution Brendan, let's use the other patch mentioned earlier.

Emilien Macchi (emilienm) on 2020-07-28

Changed in tripleo:
milestone:	victoria-1 → victoria-3

Marios Andreou (marios-b) on 2020-11-03

Changed in tripleo:
milestone:	victoria-3 → wallaby-1

Marios Andreou (marios-b) on 2020-12-08

Changed in tripleo:
milestone:	wallaby-1 → wallaby-2

Marios Andreou (marios-b) on 2021-01-29

Changed in tripleo:
milestone:	wallaby-2 → wallaby-3

Marios Andreou (marios-b) on 2021-03-17

Changed in tripleo:
milestone:	wallaby-3 → wallaby-rc1

Marios Andreou (marios-b) on 2021-05-06

Changed in tripleo:
milestone:	wallaby-rc1 → xena-1

Brendan Shephard (bshephar) on 2021-06-04

Changed in tripleo:
status:	In Progress → Fix Released
