GetOvercloudConfig fails when a node is blacklisted: server_id isn't found

Bug #1793605 reported by Emilien Macchi
This bug affects 2 people
Affects    Status         Importance   Assigned to      Milestone
tripleo    Fix Released   High         Emilien Macchi

Bug Description

Source: https://bugzilla.redhat.com/show_bug.cgi?id=1631395

Description of problem:
While testing stack updates with some nodes blacklisted, running a stack update fails.

Steps to Reproduce:
1. Deploy any kind of overcloud.
2. Create blacklist.yaml that includes at least one node.
3. Trigger stack update.

Actual results:

The action raised an exception [action_ex_id=cc1e4b01-b516-4ebf-983a-b79d40d4060b, action_cls='<class 'mistral.actions.action_factory.GetOvercloudConfig'>', attributes='{}', params='{u'container_config': u'overcloud-config', u'container': u'overcloud'}']
 u'fd6d150b-4086-4d97-8f80-5e8b295f5242'
Warning: Permanently added '192.168.24.13' (ECDSA) to the list of known hosts.

Expected results:

UPDATE_COMPLETE

Complete trace:
2018-09-13 04:54:56.958 1 WARNING mistral.executors.default_executor [req-18751b69-fb57-4b2a-bb03-36d737c97a50 c4cd5fba81064d849c2f6639f8460129 c0c3e696aeac4513b936f36cddf89371 - default default] The action raised an exception [action_ex_id=cc1e4b01-b516-4ebf-983a-b79d40d4060b, action_cls='<class 'mistral.actions.action_factory.GetOvercloudConfig'>', attributes='{}', params='{u'container_config': u'overcloud-config', u'container': u'overcloud'}']
 u'fd6d150b-4086-4d97-8f80-5e8b295f5242': KeyError: u'fd6d150b-4086-4d97-8f80-5e8b295f5242'
2018-09-13 04:54:56.958 1 ERROR mistral.executors.default_executor Traceback (most recent call last):
2018-09-13 04:54:56.958 1 ERROR mistral.executors.default_executor File "/usr/lib/python2.7/site-packages/mistral/executors/default_executor.py", line 114, in run_action
2018-09-13 04:54:56.958 1 ERROR mistral.executors.default_executor result = action.run(action_ctx)
2018-09-13 04:54:56.958 1 ERROR mistral.executors.default_executor File "/usr/lib/python2.7/site-packages/tripleo_common/actions/config.py", line 76, in run
2018-09-13 04:54:56.958 1 ERROR mistral.executors.default_executor commit_message=message)
2018-09-13 04:54:56.958 1 ERROR mistral.executors.default_executor File "/usr/lib/python2.7/site-packages/tripleo_common/utils/config.py", line 424, in download_config
2018-09-13 04:54:56.958 1 ERROR mistral.executors.default_executor self.write_config(stack, name, config_dir, config_type)
2018-09-13 04:54:56.958 1 ERROR mistral.executors.default_executor File "/usr/lib/python2.7/site-packages/tripleo_common/utils/config.py", line 298, in write_config
2018-09-13 04:54:56.958 1 ERROR mistral.executors.default_executor server_names[server_id],
2018-09-13 04:54:56.958 1 ERROR mistral.executors.default_executor KeyError: u'fd6d150b-4086-4d97-8f80-5e8b295f5242'
2018-09-13 04:54:56.958 1 ERROR mistral.executors.default_executor

Changed in tripleo:
assignee: nobody → Emilien Macchi (emilienm)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-common (master)

Fix proposed to branch: master
Review: https://review.openstack.org/604483

Changed in tripleo:
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-common (master)

Reviewed: https://review.openstack.org/604483
Committed: https://git.openstack.org/cgit/openstack/tripleo-common/commit/?id=272bd17c304d7d047ed75679568a09e9ebf7865b
Submitter: Zuul
Branch: master

commit 272bd17c304d7d047ed75679568a09e9ebf7865b
Author: Emilien Macchi <email address hidden>
Date: Fri Sep 21 17:32:18 2018 -0400

    config: ignore missing server_id from the stack

    When blacklisting nodes on the overcloud, we don't want to generate
    a configuration with these servers.
    This patch ignores the server when server_id can't be found in the stack
    when generating the configuration of the overcloud.
    A warning is shown so the operator knows this server isn't part of the
    configuration, probably due to blacklisting.
    If getting the server name fails for another reason than a KeyError,
    we fail the configuration generation and raise an exception with the
    error message.

    Change-Id: Ie7660894050e5eca251aaf8c10f0cc7e7d837dfc
    Closes-Bug: #1793605
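
For readers following along, the behaviour described in this commit message can be sketched roughly as below. This is a minimal illustration only, not the merged tripleo-common code: the function name resolve_server_names, its arguments, and the log wording are hypothetical, and the sketch assumes Python 3, whereas the trace above comes from Python 2.7. It assumes server_names is a plain dict mapping server IDs to names, as the server_names[server_id] lookup in the traceback suggests.

```python
import logging

LOG = logging.getLogger(__name__)


def resolve_server_names(server_ids, server_names):
    """Map server IDs to names, skipping IDs that are absent from the stack.

    Hypothetical helper: `server_names` stands in for the ID-to-name
    mapping that the traceback shows being indexed with `server_id`.
    """
    resolved = {}
    for server_id in server_ids:
        try:
            resolved[server_id] = server_names[server_id]
        except KeyError:
            # The ID is unknown to the stack, most likely because the node
            # is blacklisted: warn the operator and leave it out.
            LOG.warning(
                "Server %s not found in the stack, skipping it "
                "(it may be blacklisted)", server_id)
            continue
        except Exception as err:
            # Any other failure is unexpected, so stop the configuration
            # generation and surface the original error message.
            raise RuntimeError(
                "Could not get the name of server %r: %s" % (server_id, err)
            ) from err
    return resolved
```

With a mapping that lacks the blacklisted node's ID, the lookup no longer raises the KeyError shown in the trace; the node is left out of the result and a warning is logged instead.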

Changed in tripleo:
status: In Progress → Fix Released
OpenStack Infra (hudson-openstack) wrote : Fix proposed to tripleo-common (stable/rocky)

Fix proposed to branch: stable/rocky
Review: https://review.openstack.org/605412

OpenStack Infra (hudson-openstack) wrote : Fix merged to tripleo-common (stable/rocky)

Reviewed: https://review.openstack.org/605412
Committed: https://git.openstack.org/cgit/openstack/tripleo-common/commit/?id=f86b2b4c55b2aa10612388e75f604faae969971a
Submitter: Zuul
Branch: stable/rocky

commit f86b2b4c55b2aa10612388e75f604faae969971a
Author: Emilien Macchi <email address hidden>
Date: Fri Sep 21 17:32:18 2018 -0400

    config: ignore missing server_id from the stack

    When blacklisting nodes on the overcloud, we don't want to generate
    a configuration with these servers.
    This patch ignores the server when server_id can't be found in the stack
    when generating the configuration of the overcloud.
    A warning is shown so the operator knows this server isn't part of the
    configuration, probably due to blacklisting.
    If getting the server name fails for another reason than a KeyError,
    we fail the configuration generation and raise an exception with the
    error message.

    Change-Id: Ie7660894050e5eca251aaf8c10f0cc7e7d837dfc
    Closes-Bug: #1793605
    (cherry picked from commit 272bd17c304d7d047ed75679568a09e9ebf7865b)

tags: added: in-stable-rocky
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-common 10.0.0

This issue was fixed in the openstack/tripleo-common 10.0.0 release.

James Bagwell (jimbagwell) wrote :

Hello, I am still encountering this issue:

The action raised an exception [action_ex_id=bc158ba8-ea5e-42d0-9712-30d24698a364, action_cls='<class 'mistral.actions.action_factory.GetOvercloudConfig'>', attributes='{}', params='{u'container_config': u'overcloud-config', u'container': u'overcloud'}']
 u'23f1abe0-aa3f-46c3-a712-d3216b9b297d'

[stack@undercloud (stackrc) ~]$ mistral execution-get dc662a9e-91e6-45a2-a033-0300e7bdc386
+--------------------+---------------------------------------+
| Field              | Value                                 |
+--------------------+---------------------------------------+
| ID                 | dc662a9e-91e6-45a2-a033-0300e7bdc386  |
| Workflow ID        | f3401518-86d3-4c8a-aad7-0507362bcf97  |
| Workflow name      | tripleo.messaging.v1.send             |
| Workflow namespace |                                       |
| Description        | sub-workflow execution                |
| Task Execution ID  | 9e060c2a-f011-40ff-bab2-18de8a85fa1d  |
| Root Execution ID  | bd5a619c-4ba4-495e-938d-f238f027e250  |
| State              | ERROR                                 |
| State info         | Workflow failed due to message status |
| Created at         | 2018-11-21 18:47:35                   |
| Updated at         | 2018-11-21 18:47:39                   |
+--------------------+---------------------------------------+
[stack@undercloud (stackrc) ~]$ mistral task-get 9e060c2a-f011-40ff-bab2-18de8a85fa1d
+-----------------------+----------------------------------------------+
| Field                 | Value                                        |
+-----------------------+----------------------------------------------+
| ID                    | 9e060c2a-f011-40ff-bab2-18de8a85fa1d         |
| Name                  | send_message                                 |
| Workflow name         | tripleo.deployment.v1.config_download_deploy |
| Workflow namespace    |                                              |
| Workflow Execution ID | bd5a619c-4ba4-495e-938d-f238f027e250         |
| State                 | ERROR                                        |
| State info            | Workflow failed due to message status        |
| Created at            | 2018-11-21 18:47:35                          |
| Updated at            | 2018-11-21 18:47:39                          |
+-----------------------+----------------------------------------------+
[stack@undercloud (stackrc) ~]$ mistral execution-get bd5a619c-4ba4-495e-938d-f238f027e250
+--------------------+-----------------------------------------------------------------------------------------------------------+
| Field              | Value                                                                                                     |
+--------------------+-----------------------------------------------------------------------------------------------------------+
| ID                 | bd5a619c-4ba4-495e-938d-f238f027e250                                                                      |
| Workflow ID        | 29d5cd71-4b32-4dfc-9710-38a213b2e8e6                                                                      |
| Workflow n...

James Bagwell (jimbagwell) wrote :

[stack@undercloud (stackrc) mistral]$ rpm -qa openstack-tripleo-common
openstack-tripleo-common-10.0.1-0.20181112071049.b8bfff8.el7.noarch

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/tripleo-common 9.5.0

This issue was fixed in the openstack/tripleo-common 9.5.0 release.
