Environment: Stein on RHEL8 (Python3).
The "openstack overcloud deploy" command fails at around step 1:
Exception occured while running the command
Traceback (most recent call last):
File "/usr/lib/python3.6/site-packages/tripleoclient/command.py", line 29, in run
super(Command, self).run(parsed_args)
File "/usr/lib/python3.6/site-packages/osc_lib/command/command.py", line 41, in run
return super(Command, self).run(parsed_args)
File "/usr/lib/python3.6/site-packages/cliff/command.py", line 184, in run
return_code = self.take_action(parsed_args) or 0
File "/usr/lib/python3.6/site-packages/tripleoclient/v1/overcloud_deploy.py", line 949, in take_action
verbosity=self.app_args.verbose_level)
File "/usr/lib/python3.6/site-packages/tripleoclient/workflows/deployment.py", line 323, in config_download
raise exceptions.DeploymentError("Overcloud configuration failed.")
tripleoclient.exceptions.DeploymentError: Overcloud configuration failed.
Overcloud configuration failed.
But the actual deployment is still running (in a Mistral workflow) and eventually completes successfully.
However, the operator gets an error back from tripleoclient.
Looking at Mistral Engine logs:
engine.log:2019-03-15 20:35:38.214 1 INFO mistral.engine.engine_server [req-b75cf539-88db-4419-9d44-588660cb26a2 - - - - -] Received RPC request 'report_running_actions'[action_ex_ids=['11d76cda-4906-442b-bcc5-f3e118890e55']]
engine.log:2019-03-15 20:40:58.388 1 INFO mistral.services.action_execution_checker [req-940d159f-4ca6-4d06-8351-fe9b6c118840 - - - - -] Actions executions to transit to error, because heartbeat wasn't received
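Assuming the 20:35:38 'report_running_actions' request was the last heartbeat Mistral recorded for that action, the gap between the two log lines can be compared with the default staleness window, check_interval * max_missed_heartbeats = 20 * 15 = 300 seconds (using the default values listed in the patch). A quick sketch of that check:

```python
from datetime import datetime

# Timestamps copied from the two engine.log lines above.
LOG_TS_FORMAT = "%Y-%m-%d %H:%M:%S.%f"
last_report = datetime.strptime("2019-03-15 20:35:38.214", LOG_TS_FORMAT)
sent_to_error = datetime.strptime("2019-03-15 20:40:58.388", LOG_TS_FORMAT)

# Default values listed in the tripleo-heat-templates patch.
CHECK_INTERVAL = 20          # seconds between checker runs
MAX_MISSED_HEARTBEATS = 15   # missed heartbeats tolerated
threshold = CHECK_INTERVAL * MAX_MISSED_HEARTBEATS

gap = (sent_to_error - last_report).total_seconds()
print(f"gap: {gap:.3f} s, default threshold: {threshold} s")
# prints "gap: 320.174 s, default threshold: 300 s"
```

The gap exceeds the 300 s window by roughly one check interval (about 20 s), which is consistent with the checker catching the stale action on its next pass after the window expired.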
Workaround: increase the heartbeat parameters in Mistral:
max_missed_heartbeats = 30
check_interval = 40
first_heartbeat_timeout = 7200
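Reading the options as the patch below describes them (a running action is moved to ERROR once its last heartbeat is older than check_interval * max_missed_heartbeats seconds, and 0 disables the feature), the workaround widens that window from 300 to 1200 seconds and doubles the first-heartbeat grace period from one hour to two. The helper below is a hypothetical illustration of that arithmetic, not part of Mistral:

```python
from typing import Optional


def staleness_window(check_interval: int, max_missed_heartbeats: int) -> Optional[int]:
    """Hypothetical helper: seconds a running action may go without a
    heartbeat before the checker moves it to ERROR.
    Returns None when either option is 0 (feature disabled)."""
    if check_interval == 0 or max_missed_heartbeats == 0:
        return None
    return check_interval * max_missed_heartbeats


# Defaults from the patch vs. the workaround values above:
# (label, check_interval, max_missed_heartbeats, first_heartbeat_timeout)
for label, interval, missed, first in [("default", 20, 15, 3600),
                                       ("workaround", 40, 30, 7200)]:
    window = staleness_window(interval, missed)
    print(f"{label}: window={window} s, first heartbeat grace={first / 3600:g} h")
```

Doubling both options quadruples the window, so an action can now go 20 minutes without reporting before it is failed.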
Reviewed: https://review.openstack.org/647597
Committed: https://git.openstack.org/cgit/openstack/tripleo-heat-templates/commit/?id=374fafd66afa792ba197403b479dadbfa3055bce
Submitter: Zuul
Branch: master
commit 374fafd66afa792ba197403b479dadbfa3055bce
Author: Emilien Macchi <email address hidden>
Date: Mon Mar 25 15:48:47 2019 -0400
mistral: configure heartbeat parameters to avoid action timeout
This patch configures and increases the default heartbeat parameters in
Mistral so we don't hit timeouts when an action in a workflow takes
time to reply back to Mistral when deploying an Overcloud.
Parameters added:
  MistralMaxMissedHeartbeats:
    type: number
    default: 15
    description: >
      The maximum amount of missed heartbeats to be allowed.
      If set to 0 then this feature is disabled. See check_interval for more
      details.
    constraints:
      - range: { min: 0 }
  MistralCheckInterval:
    type: number
    default: 20
    description: >
      How often (in seconds) action executions are checked.
      For example when check_interval is 10, check action
      executions every 10 seconds. When the checker runs it will
      transit all running action executions to error if the last
      heartbeat received is older than 10 * max_missed_heartbeats
      seconds. If set to 0 then this feature is disabled.
    constraints:
      - range: { min: 0 }
  MistralFirstHeartbeatTimeout:
    type: number
    default: 3600
    description: >
      The first heartbeat is handled differently, to provide a
      grace period in case there is no available executor to handle
      the action execution. For example when
      first_heartbeat_timeout = 3600, wait 3600 seconds before
      closing the action executions that never received a heartbeat.
    constraints:
      - range: { min: 0 }
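Read together, the three parameter descriptions suggest how the checker decides, on each pass, whether a running action execution has gone stale. The function below is a hypothetical model written only from those descriptions; the names HeartbeatConfig and should_transit_to_error are illustrative, and this is not Mistral's actual action_execution_checker code:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HeartbeatConfig:
    # Defaults match the parameter defaults in the patch.
    max_missed_heartbeats: int = 15
    check_interval: int = 20
    first_heartbeat_timeout: int = 3600


def should_transit_to_error(now: float, started_at: float,
                            last_heartbeat: Optional[float],
                            cfg: HeartbeatConfig) -> bool:
    """Hypothetical model of one checker pass over a RUNNING action
    execution. All times are in seconds."""
    if cfg.max_missed_heartbeats == 0 or cfg.check_interval == 0:
        return False  # feature disabled, per the descriptions
    if last_heartbeat is None:
        # No heartbeat received yet: apply the first-heartbeat grace period.
        return now - started_at > cfg.first_heartbeat_timeout
    # Otherwise fail the action if its last heartbeat is older than
    # check_interval * max_missed_heartbeats seconds.
    return now - last_heartbeat > cfg.check_interval * cfg.max_missed_heartbeats


cfg = HeartbeatConfig()
# Last heartbeat 320 s ago: past the 300 s default window.
print(should_transit_to_error(now=1000.0, started_at=0.0, last_heartbeat=680.0, cfg=cfg))
# Started 30 minutes ago, never reported: still inside the 1 h grace period.
print(should_transit_to_error(now=1800.0, started_at=0.0, last_heartbeat=None, cfg=cfg))
```

The first call prints True and the second False, which mirrors the split in the descriptions between an action that stopped reporting and one that never got picked up by an executor.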
Configuration applied to the Undercloud:
Maximum missed heartbeats: 30
Time between interval checks: 40 seconds
First heartbeat timeout: 7200 seconds
Depends-On: I7a2313bed58485e077ae210d222902f4f997f0f0
Change-Id: Id8663e76b61c9e09547c228da226b706383a3e20
Closes-Bug: #1821611