stx-openstack apply failed due to CallledProcessError puppet-manifest-apply

Bug #1918703 reported by Yvonne Ding
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
StarlingX
Invalid
Low
chen haochuan

Bug Description

Brief Description
-----------------
stx-openstack apply failed due to CalledProcessError: Command '['/usr/local/bin/puppet-manifest-apply.sh' '/var/run/platform/puppet/20.12/hieradata', '192.168.204.2', 'controller', 'runtime', '/tmp/tmp_3nV4d.yaml']'

Severity
--------
Major

Expected Behavior
------------------
stx-openstack apply success

Actual Behavior
----------------
stx-openstack apply failed

Reproducibility
---------------
reproducible

System Configuration
--------------------
Regular standard 2+2

Branch/Pull Time/Commit
-----------------------
BUILD_ID="20210306T023358Z"

Timestamp/Logs
--------------
sysinv 2021-03-11 03:44:31.079 2688 ERROR sysinv.puppet.common [-] Failed to execute runtime manifest for host 192.168.204.2: CalledProcessError: Command '['/usr/local/bin/puppet-manifest-apply.sh', '/var/run/platform/puppet/20.12/hieradata', '192.168.204.2', 'controller', 'runtime', '/tmp/tmp_3nV4d.yaml']' returned non-zero exit status 1
2021-03-11 03:44:31.079 2688 ERROR sysinv.puppet.common Traceback (most recent call last):
2021-03-11 03:44:31.079 2688 ERROR sysinv.puppet.common File "/usr/lib64/python2.7/site-packages/sysinv/puppet/common.py", line 76, in puppet_apply_manifest
2021-03-11 03:44:31.079 2688 ERROR sysinv.puppet.common subprocess.check_call(cmd, stdout=fnull, stderr=fnull) # pylint: disable=not-callable
2021-03-11 03:44:31.079 2688 ERROR sysinv.puppet.common File "/usr/lib64/python2.7/subprocess.py", line 542, in check_call
2021-03-11 03:44:31.079 2688 ERROR sysinv.puppet.common raise CalledProcessError(retcode, cmd)
2021-03-11 03:44:31.079 2688 ERROR sysinv.puppet.common CalledProcessError: Command '['/usr/local/bin/puppet-manifest-apply.sh', '/var/run/platform/puppet/20.12/hieradata', '192.168.204.2', 'controller', 'runtime', '/tmp/tmp_3nV4d.yaml']' returned non-zero exit status 1
2021-03-11 03:44:31.079 2688 ERROR sysinv.puppet.common
sysinv 2021-03-11 03:44:31.082 2688 ERROR sysinv.agent.manager [-] failed to apply runtime manifest: SysinvException: Failed to execute runtime manifest for host 192.168.204.2
2021-03-11 03:44:31.082 2688 ERROR sysinv.agent.manager Traceback (most recent call last):
2021-03-11 03:44:31.082 2688 ERROR sysinv.agent.manager File "/usr/lib64/python2.7/site-packages/sysinv/agent/manager.py", line 1775, in _apply_runtime_manifest
2021-03-11 03:44:31.082 2688 ERROR sysinv.agent.manager hieradata_path=hieradata_path)
2021-03-11 03:44:31.082 2688 ERROR sysinv.agent.manager File "/usr/lib64/python2.7/site-packages/sysinv/puppet/common.py", line 81, in puppet_apply_manifest
2021-03-11 03:44:31.082 2688 ERROR sysinv.agent.manager raise exception.SysinvException(_(msg))
2021-03-11 03:44:31.082 2688 ERROR sysinv.agent.manager SysinvException: Failed to execute runtime manifest for host 192.168.204.2
2021-03-11 03:44:31.082 2688 ERROR sysinv.agent.manager
sysinv 2021-03-11 03:44:31.172 2688 INFO sysinv.agent.manager [-] Caught exception. Retrying... Exception: Failed to execute runtime manifest for host 192.168.204.2
sysinv 2021-03-11 03:44:31.173 2688 ERROR sysinv.openstack.common.rpc.amqp [-] Exception during message handling: SysinvException: Failed to execute runtime manifest for host 192.168.204.2
2021-03-11 03:44:31.173 2688 ERROR sysinv.openstack.common.rpc.amqp Traceback (most recent call last):
2021-03-11 03:44:31.173 2688 ERROR sysinv.openstack.common.rpc.amqp File "/usr/lib64/python2.7/site-packages/sysinv/openstack/common/rpc/amqp.py", line 437, in _process_data
2021-03-11 03:44:31.173 2688 ERROR sysinv.openstack.common.rpc.amqp **args)
2021-03-11 03:44:31.173 2688 ERROR sysinv.openstack.common.rpc.amqp File "/usr/lib64/python2.7/site-packages/sysinv/openstack/common/rpc/dispatcher.py", line 172, in dispatch
2021-03-11 03:44:31.173 2688 ERROR sysinv.openstack.common.rpc.amqp result = getattr(proxyobj, method)(ctxt, **kwargs)
2021-03-11 03:44:31.173 2688 ERROR sysinv.openstack.common.rpc.amqp File "/usr/lib/python2.7/site-packages/retrying.py", line 68, in wrapped_f
2021-03-11 03:44:31.173 2688 ERROR sysinv.openstack.common.rpc.amqp return Retrying(*dargs, **dkw).call(f, *args, **kw)
2021-03-11 03:44:31.173 2688 ERROR sysinv.openstack.common.rpc.amqp File "/usr/lib/python2.7/site-packages/retrying.py", line 223, in call
2021-03-11 03:44:31.173 2688 ERROR sysinv.openstack.common.rpc.amqp return attempt.get(self._wrap_exception)
2021-03-11 03:44:31.173 2688 ERROR sysinv.openstack.common.rpc.amqp File "/usr/lib/python2.7/site-packages/retrying.py", line 261, in get
2021-03-11 03:44:31.173 2688 ERROR sysinv.openstack.common.rpc.amqp six.reraise(self.value[0], self.value[1], self.value[2])
2021-03-11 03:44:31.173 2688 ERROR sysinv.openstack.common.rpc.amqp File "/usr/lib/python2.7/site-packages/retrying.py", line 217, in call
2021-03-11 03:44:31.173 2688 ERROR sysinv.openstack.common.rpc.amqp attempt = Attempt(fn(*args, **kwargs), attempt_number, False)
2021-03-11 03:44:31.173 2688 ERROR sysinv.openstack.common.rpc.amqp File "/usr/lib/python2.7/site-packages/oslo_concurrency/lockutils.py", line 328, in inner
2021-03-11 03:44:31.173 2688 ERROR sysinv.openstack.common.rpc.amqp return f(*args, **kwargs)
2021-03-11 03:44:31.173 2688 ERROR sysinv.openstack.common.rpc.amqp File "/usr/lib64/python2.7/site-packages/sysinv/agent/manager.py", line 1706, in config_apply_runtime_manifest
2021-03-11 03:44:31.173 2688 ERROR sysinv.openstack.common.rpc.amqp self._apply_runtime_manifest(config_dict, hieradata_path=hieradata_path)
2021-03-11 03:44:31.173 2688 ERROR sysinv.openstack.common.rpc.amqp File "/usr/lib64/python2.7/site-packages/sysinv/agent/manager.py", line 1775, in _apply_runtime_manifest
2021-03-11 03:44:31.173 2688 ERROR sysinv.openstack.common.rpc.amqp hieradata_path=hieradata_path)
2021-03-11 03:44:31.173 2688 ERROR sysinv.openstack.common.rpc.amqp File "/usr/lib64/python2.7/site-packages/sysinv/puppet/common.py", line 81, in puppet_apply_manifest
2021-03-11 03:44:31.173 2688 ERROR sysinv.openstack.common.rpc.amqp raise exception.SysinvException(_(msg))
2021-03-11 03:44:31.173 2688 ERROR sysinv.openstack.common.rpc.amqp SysinvException: Failed to execute runtime manifest for host 192.168.204.2
2021-03-11 03:44:31.173 2688 ERROR sysinv.openstack.common.rpc.amqp
sysinv 2021-03-11 03:44:35.204 2688 INFO sysinv.agent.manager [-] Agent config applied 748150cb-d1ac-42e9-895b-1aee23c6ffc1
sysinv 2021-03-11 03:50:19.078 2688 INFO sysinv.agent.manager [-] config_apply_runtime_manifest: 3cf332ae-6ff2-4644-ae16-1e3a3b4a2639 {u'classes': [u'platform::mtce::runtime'], u'force': False, u'personalities': [u'controller']} controller
sysinv 2021-03-11 03:50:19.079 2688 INFO sysinv.agent.manager [-] controller-standby or storage, mount /var/run/platform
sysinv 2021-03-11 03:50:19.109 2688 INFO sysinv.agent.manager [-] _apply_runtime_manifest with hieradata_path = '/var/run/platform/puppet/20.12/hieradata'
sysinv 2021-03-11 03:50:26.878 2688 ERROR sysinv.puppet.common [-] Failed to execute runtime manifest for host 192.168.204.2: CalledProcessError: Command '['/usr/local/bin/puppet-manifest-apply.sh', '/var/run/platform/puppet/20.12/hieradata', '192.168.204.2', 'controller', 'runtime', '/tmp/tmpEHuxMT.yaml']' returned non-zero exit status 1
2021-03-11 03:50:26.878 2688 ERROR sysinv.puppet.common Traceback (most recent call last):
2021-03-11 03:50:26.878 2688 ERROR sysinv.puppet.common File "/usr/lib64/python2.7/site-packages/sysinv/puppet/common.py", line 76, in puppet_apply_manifest
2021-03-11 03:50:26.878 2688 ERROR sysinv.puppet.common subprocess.check_call(cmd, stdout=fnull, stderr=fnull) # pylint: disable=not-callable
2021-03-11 03:50:26.878 2688 ERROR sysinv.puppet.common File "/usr/lib64/python2.7/subprocess.py", line 542, in check_call
2021-03-11 03:50:26.878 2688 ERROR sysinv.puppet.common raise CalledProcessError(retcode, cmd)
2021-03-11 03:50:26.878 2688 ERROR sysinv.puppet.common CalledProcessError: Command '['/usr/local/bin/puppet-manifest-apply.sh', '/var/run/platform/puppet/20.12/hieradata', '192.168.204.2', 'controller', 'runtime', '/tmp/tmpEHuxMT.yaml']' returned non-zero exit status 1
2021-03-11 03:50:26.878 2688 ERROR sysinv.puppet.common
sysinv 2021-03-11 03:50:26.879 2688 ERROR sysinv.agent.manager [-] failed to apply runtime manifest: SysinvException: Failed to execute runtime manifest for host 192.168.204.2
2021-03-11 03:50:26.879 2688 ERROR sysinv.agent.manager Traceback (most recent call last):
2021-03-11 03:50:26.879 2688 ERROR sysinv.agent.manager File "/usr/lib64/python2.7/site-packages/sysinv/agent/manager.py", line 1775, in _apply_runtime_manifest
2021-03-11 03:50:26.879 2688 ERROR sysinv.agent.manager hieradata_path=hieradata_path)
2021-03-11 03:50:26.879 2688 ERROR sysinv.agent.manager File "/usr/lib64/python2.7/site-packages/sysinv/puppet/common.py", line 81, in puppet_apply_manifest
2021-03-11 03:50:26.879 2688 ERROR sysinv.agent.manager raise exception.SysinvException(_(msg))
2021-03-11 03:50:26.879 2688 ERROR sysinv.agent.manager SysinvException: Failed to execute runtime manifest for host 192.168.204.2
2021-03-11 03:50:26.879 2688 ERROR sysinv.agent.manager

Revision history for this message
Yvonne Ding (yding) wrote :
description: updated
Ghada Khalil (gkhalil)
tags: added: stx.distro.openstack
Revision history for this message
Frank Miller (sensfan22) wrote :

@yvonne : What steps did you execute before applying stx-openstack? Did you enable swift or make any other chart changes?

Revision history for this message
Bob Church (rchurch) wrote :

Looks like the puppet error is related to mtcAgent:

2021-03-11T03:50:25.476 Info: 2021-03-11 03:50:25 +0000 /Stage[main]/Platform::Mtce/File[/etc/mtc.ini]: Filebucketed /etc/mtc.ini to puppet with sum 24f47dd95e4524c22f9afe6de16cb663
2021-03-11T03:50:25.478 Notice: 2021-03-11 03:50:25 +0000 /Stage[main]/Platform::Mtce/File[/etc/mtc.ini]/content: content changed '{md5}24f47dd95e4524c22f9afe6de16cb663' to '{md5}14d458d809427a2c2581959bb38834c1'
2021-03-11T03:50:25.480 Debug: 2021-03-11 03:50:25 +0000 /Stage[main]/Platform::Mtce/File[/etc/mtc.ini]: The container Class[Platform::Mtce] will propagate my refresh event
2021-03-11T03:50:25.482 Debug: 2021-03-11 03:50:25 +0000 Class[Platform::Mtce]: The container Stage[main] will propagate my refresh event
2021-03-11T03:50:25.484 Debug: 2021-03-11 03:50:25 +0000 Executing: '/bin/systemctl is-active crond'
2021-03-11T03:50:25.486 Debug: 2021-03-11 03:50:25 +0000 Executing: '/bin/systemctl is-enabled crond'
2021-03-11T03:50:25.488 Debug: 2021-03-11 03:50:25 +0000 Exec[signal-mtc-agent](provider=posix): Executing 'pkill -HUP mtcAgent'
2021-03-11T03:50:25.490 Debug: 2021-03-11 03:50:25 +0000 Executing: 'pkill -HUP mtcAgent'
2021-03-11T03:50:25.492 Error: 2021-03-11 03:50:25 +0000 pkill -HUP mtcAgent returned 1 instead of one of [0]
2021-03-11T03:50:25.494 /usr/share/ruby/vendor_ruby/puppet/util/errors.rb:106:in `fail'
2021-03-11T03:50:25.496 /usr/share/ruby/vendor_ruby/puppet/type/exec.rb:160:in `sync'
2021-03-11T03:50:25.498 /usr/share/ruby/vendor_ruby/puppet/transaction/resource_harness.rb:236:in `sync'
2021-03-11T03:50:25.500 /usr/share/ruby/vendor_ruby/puppet/transaction/resource_harness.rb:134:in `sync_if_needed'
2021-03-11T03:50:25.502 /usr/share/ruby/vendor_ruby/puppet/transaction/resource_harness.rb:88:in `block in perform_changes'

Revision history for this message
Yvonne Ding (yding) wrote :

@Frank, swift was not enabled in this case. It happened some time after STX openstack nova/neutron regression.

Revision history for this message
Yvonne Ding (yding) wrote :

@Bob, runtime manifest error please check LP https://bugs.launchpad.net/starlingx/+bug/1918722

Revision history for this message
Yvonne Ding (yding) wrote :

The issue can also be reproduced on stx "20210327T013338Z" load via following testcase,
test_openstack_services.py::test_reapply_stx_openstack_no_change[controller-0]

Revision history for this message
Ghada Khalil (gkhalil) wrote :

Asked Eric MacDonald to have a look at the mtce errors reported here

Changed in starlingx:
assignee: nobody → Eric MacDonald (rocksolidmtce)
Revision history for this message
Eric MacDonald (rocksolidmtce) wrote :

So the issue I'm looking into is why the mtcAgent sighup config reload failed during a manifest apply.

2021-03-11T03:50:25.492 Error: 2021-03-11 03:50:25 +0000 pkill -HUP mtcAgent returned 1 instead of one of [0]

Revision history for this message
Eric MacDonald (rocksolidmtce) wrote :

The user is modifying the maintenance service parameter 'mnfa_timeout' value from the active controller-1. Which is normal and the only place it can be modified from. This change triggers the mtc in-service manifest to run to apply that change to /etc/mtc.ini

When a change like that is made, the mtcAgent on the active controller needs its configuration reloaded so that it learns of the change. This is fine on the active controller. However, the in-service manifest is also run on the standby controller so its /etc/mtc.ini is also updated.

However, the mtcAgent does not run on the standby controller therefore the manifest does not need to and should not be trying to sig-hup the mtcAgent there. That's why the manifest is reporting an error on the standby controller.

Here are the relevant logs:

2021-03-11T03:50:25.561 [2178865.00460] controller-1 mtcAgent sig daemon_signal.cpp ( 147) daemon_signal_hdlr : Info : Received SIGHUP ; Reloading Config

[cut out other config reload logs]

# here is the config reload logs that produce a customer log

2021-03-11T03:50:25.578 fmAlarmUtils.cpp(624): Sending FM raise alarm request: alarm_id (200.021), entity_id (host=controller-1.config=mnfa_timeout)
2021-03-11T03:50:25.625 fmAlarmUtils.cpp(658): FM Response for raise alarm: (0), alarm_id (200.021), entity_id (host=controller-1.config=mnfa_timeout)

| 2021-03-11T03:50:25.619752| log | 200.021 | controller-1 platform maintenance service parameter 'mnfa_timeout' changed from 120 to 0 | host=controller-1.config=mnfa_timeout | not-applicable | :50:25.619752 |

Next Steps:
1. Determine if this manifest has always behaved this way.
2. Determine if this potentially silent error has any consequence in this issue.

Revision history for this message
Eric MacDonald (rocksolidmtce) wrote :

The same behavior over a mtc service parameter modify exists in (at least) the last 3 releases.
This behavior is inconsequential and does not lead to a manifest apply error nor the root cause of this reported issue.

Need then to look back in the logs for other causes/sources of the reported issue.

Revision history for this message
Ghada Khalil (gkhalil) wrote :

Unassigning given Eric's comments that the mtcAgent is not causing the failure reported here. The distro.opesntack will need to look into this further.

Changed in starlingx:
assignee: Eric MacDonald (rocksolidmtce) → nobody
Revision history for this message
Austin Sun (sunausti) wrote :

@martin:
   could you triage this issue firstly ?
Thanks.
BR
Austin Sun.

Changed in starlingx:
assignee: nobody → chen haochuan (martin1982)
Revision history for this message
Austin Sun (sunausti) wrote :

@martin:
   any update ?

Changed in starlingx:
importance: Undecided → Low
Revision history for this message
Thales Elero Cervi (tcervi) wrote :

Closing it now due to inactivity.

Changed in starlingx:
status: New → Invalid
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.