GetOvercloudConfig fails when a node is blacklisted: server_id isn't found

Bug #1793605 reported by Emilien Macchi on 2018-09-20
This bug affects 2 people
Affects: tripleo
Importance: High
Assigned to: Emilien Macchi

Bug Description

Source: https://bugzilla.redhat.com/show_bug.cgi?id=1631395

Description of problem:
While testing a stack update with some nodes blacklisted, the stack update fails.

Steps to Reproduce:
1. Deploy any kind of overcloud.
2. Create blacklist.yaml that includes at least one node.
3. Trigger stack update.

Actual results:

The action raised an exception [action_ex_id=cc1e4b01-b516-4ebf-983a-b79d40d4060b, action_cls='<class 'mistral.actions.action_factory.GetOvercloudConfig'>', attributes='{}', params='{u'container_config': u'overcloud-config', u'container': u'overcloud'}']
 u'fd6d150b-4086-4d97-8f80-5e8b295f5242'
Warning: Permanently added '192.168.24.13' (ECDSA) to the list of known hosts.

Expected results:

UPDATE_COMPLETE

Complete trace:
2018-09-13 04:54:56.958 1 WARNING mistral.executors.default_executor [req-18751b69-fb57-4b2a-bb03-36d737c97a50 c4cd5fba81064d849c2f6639f8460129 c0c3e696aeac4513b936f36cddf89371 - default default] The action raised an exception [action_ex_id=cc1e4b01-b516-4ebf-983a-b79d40d4060b, action_cls='<class 'mistral.actions.action_factory.GetOvercloudConfig'>', attributes='{}', params='{u'container_config': u'overcloud-config', u'container': u'overcloud'}']
 u'fd6d150b-4086-4d97-8f80-5e8b295f5242': KeyError: u'fd6d150b-4086-4d97-8f80-5e8b295f5242'
2018-09-13 04:54:56.958 1 ERROR mistral.executors.default_executor Traceback (most recent call last):
2018-09-13 04:54:56.958 1 ERROR mistral.executors.default_executor File "/usr/lib/python2.7/site-packages/mistral/executors/default_executor.py", line 114, in run_action
2018-09-13 04:54:56.958 1 ERROR mistral.executors.default_executor result = action.run(action_ctx)
2018-09-13 04:54:56.958 1 ERROR mistral.executors.default_executor File "/usr/lib/python2.7/site-packages/tripleo_common/actions/config.py", line 76, in run
2018-09-13 04:54:56.958 1 ERROR mistral.executors.default_executor commit_message=message)
2018-09-13 04:54:56.958 1 ERROR mistral.executors.default_executor File "/usr/lib/python2.7/site-packages/tripleo_common/utils/config.py", line 424, in download_config
2018-09-13 04:54:56.958 1 ERROR mistral.executors.default_executor self.write_config(stack, name, config_dir, config_type)
2018-09-13 04:54:56.958 1 ERROR mistral.executors.default_executor File "/usr/lib/python2.7/site-packages/tripleo_common/utils/config.py", line 298, in write_config
2018-09-13 04:54:56.958 1 ERROR mistral.executors.default_executor server_names[server_id],
2018-09-13 04:54:56.958 1 ERROR mistral.executors.default_executor KeyError: u'fd6d150b-4086-4d97-8f80-5e8b295f5242'
2018-09-13 04:54:56.958 1 ERROR mistral.executors.default_executor
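
The last frame of the trace is a plain dictionary lookup, server_names[server_id]. A minimal sketch of that failure mode follows. It assumes (the report does not confirm this) that the name mapping is built without the blacklisted server while a deployment still references its ID. The data and the resolve_names helper are hypothetical and are not the tripleo-common code.

```python
# Hypothetical name mapping: the blacklisted server has no entry.
server_names = {"aaaa-1111": "overcloud-controller-0"}

# Hypothetical deployments: one still points at the blacklisted server.
deployment_server_ids = [
    "aaaa-1111",
    "fd6d150b-4086-4d97-8f80-5e8b295f5242",
]


def resolve_names(names, server_ids):
    """Strict lookup: raises KeyError on the first unknown server_id."""
    return [names[server_id] for server_id in server_ids]


try:
    resolve_names(server_names, deployment_server_ids)
    failed_id = None
except KeyError as exc:
    # The KeyError carries the missing key, matching the ID in the trace.
    failed_id = exc.args[0]
```

With this data, failed_id ends up holding the ID that had no entry in the mapping, which is the same shape as the KeyError reported above.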

Changed in tripleo:
assignee: nobody → Emilien Macchi (emilienm)

Fix proposed to branch: master
Review: https://review.openstack.org/604483

Changed in tripleo:
status: Triaged → In Progress

Reviewed: https://review.openstack.org/604483
Committed: https://git.openstack.org/cgit/openstack/tripleo-common/commit/?id=272bd17c304d7d047ed75679568a09e9ebf7865b
Submitter: Zuul
Branch: master

commit 272bd17c304d7d047ed75679568a09e9ebf7865b
Author: Emilien Macchi <email address hidden>
Date: Fri Sep 21 17:32:18 2018 -0400

    config: ignore missing server_id from the stack

    When blacklisting nodes on the overcloud, we don't want to generate
    a configuration with these servers.
    This patch ignores the server when server_id can't be found in the stack
    when generating the configuration of the overcloud.
    A warning is shown so the operator knows this server isn't part of the
    configuration, probably due to blacklisting.
    If getting the server name fails for a reason other than a KeyError,
    we fail the configuration generation and raise an exception with the
    error message.

    Change-Id: Ie7660894050e5eca251aaf8c10f0cc7e7d837dfc
    Closes-Bug: #1793605
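
The behaviour described in the commit message can be sketched roughly as below. This is an illustration under assumptions, not the code from the actual patch: the function name names_for_deployments, the returned skipped list, and the RuntimeError type are hypothetical choices made for the example.

```python
import logging

LOG = logging.getLogger(__name__)


def names_for_deployments(server_names, server_ids):
    """Map server IDs to names, skipping IDs missing from server_names.

    Hypothetical helper: returns (resolved, skipped) so the caller can
    see which servers were left out of the generated configuration.
    """
    resolved = {}
    skipped = []
    for server_id in server_ids:
        try:
            resolved[server_id] = server_names[server_id]
        except KeyError:
            # Missing ID: warn and carry on, the server may be blacklisted.
            LOG.warning(
                "No server name found for server_id %s; skipping it "
                "(it may be blacklisted)", server_id)
            skipped.append(server_id)
        except Exception as err:
            # Any other failure aborts the configuration generation.
            raise RuntimeError(
                "Error fetching server name for %s: %s" % (server_id, err))
    return resolved, skipped
```

Catching KeyError separately keeps the tolerant path narrow: only an ID that is absent from the mapping is skipped, while any other error still stops the run with a message.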

Changed in tripleo:
status: In Progress → Fix Released

Reviewed: https://review.openstack.org/605412
Committed: https://git.openstack.org/cgit/openstack/tripleo-common/commit/?id=f86b2b4c55b2aa10612388e75f604faae969971a
Submitter: Zuul
Branch: stable/rocky

commit f86b2b4c55b2aa10612388e75f604faae969971a
Author: Emilien Macchi <email address hidden>
Date: Fri Sep 21 17:32:18 2018 -0400

    config: ignore missing server_id from the stack

    When blacklisting nodes on the overcloud, we don't want to generate
    a configuration with these servers.
    This patch ignores the server when server_id can't be found in the stack
    when generating the configuration of the overcloud.
    A warning is shown so the operator knows this server isn't part of the
    configuration, probably due to blacklisting.
    If getting the server name fails for a reason other than a KeyError,
    we fail the configuration generation and raise an exception with the
    error message.

    Change-Id: Ie7660894050e5eca251aaf8c10f0cc7e7d837dfc
    Closes-Bug: #1793605
    (cherry picked from commit 272bd17c304d7d047ed75679568a09e9ebf7865b)

tags: added: in-stable-rocky

This issue was fixed in the openstack/tripleo-common 10.0.0 release.

James Bagwell (jimbagwell) wrote :

Hello, I am still encountering this issue:

The action raised an exception [action_ex_id=bc158ba8-ea5e-42d0-9712-30d24698a364, action_cls='<class 'mistral.actions.action_factory.GetOvercloudConfig'>', attributes='{}', params='{u'container_config': u'overcloud-config', u'container': u'overcloud'}']
 u'23f1abe0-aa3f-46c3-a712-d3216b9b297d'

[stack@undercloud (stackrc) ~]$ mistral execution-get dc662a9e-91e6-45a2-a033-0300e7bdc386
+--------------------+---------------------------------------+
| Field              | Value                                 |
+--------------------+---------------------------------------+
| ID                 | dc662a9e-91e6-45a2-a033-0300e7bdc386  |
| Workflow ID        | f3401518-86d3-4c8a-aad7-0507362bcf97  |
| Workflow name      | tripleo.messaging.v1.send             |
| Workflow namespace |                                       |
| Description        | sub-workflow execution                |
| Task Execution ID  | 9e060c2a-f011-40ff-bab2-18de8a85fa1d  |
| Root Execution ID  | bd5a619c-4ba4-495e-938d-f238f027e250  |
| State              | ERROR                                 |
| State info         | Workflow failed due to message status |
| Created at         | 2018-11-21 18:47:35                   |
| Updated at         | 2018-11-21 18:47:39                   |
+--------------------+---------------------------------------+
[stack@undercloud (stackrc) ~]$ mistral task-get 9e060c2a-f011-40ff-bab2-18de8a85fa1d
+-----------------------+----------------------------------------------+
| Field                 | Value                                        |
+-----------------------+----------------------------------------------+
| ID                    | 9e060c2a-f011-40ff-bab2-18de8a85fa1d         |
| Name                  | send_message                                 |
| Workflow name         | tripleo.deployment.v1.config_download_deploy |
| Workflow namespace    |                                              |
| Workflow Execution ID | bd5a619c-4ba4-495e-938d-f238f027e250         |
| State                 | ERROR                                        |
| State info            | Workflow failed due to message status        |
| Created at            | 2018-11-21 18:47:35                          |
| Updated at            | 2018-11-21 18:47:39                          |
+-----------------------+----------------------------------------------+
[stack@undercloud (stackrc) ~]$ mistral execution-get bd5a619c-4ba4-495e-938d-f238f027e250
+--------------------+-----------------------------------------------------------------------------------------------------------+
| Field | Value |
+--------------------+-----------------------------------------------------------------------------------------------------------+
| ID | bd5a619c-4ba4-495e-938d-f238f027e250 |
| Workflow ID | 29d5cd71-4b32-4dfc-9710-38a213b2e8e6 |
| Workflow n...


James Bagwell (jimbagwell) wrote :

[stack@undercloud (stackrc) mistral]$ rpm -qa openstack-tripleo-common
openstack-tripleo-common-10.0.1-0.20181112071049.b8bfff8.el7.noarch

This issue was fixed in the openstack/tripleo-common 9.5.0 release.
