Comment 15 for bug 1653737

Vladimir Sharshov (vsharshov) wrote :

I've checked the log and see a serious connection problem between the master node and the node. Astute tries to connect and check the status of the node, starting from:

2017-01-27 14:55:55 INFO [29965] Start puppet with timeout 5400 sec. Node 1, task provision_1, manifest /etc/puppet/shell_manifests/provision_1_manifest.pp
2017-01-27 14:56:06 DEBUG [29965] Retry #1 to run mcollective agent on nodes: '1'
2017-01-27 14:56:17 DEBUG [29965] Retry #2 to run mcollective agent on nodes: '1'
2017-01-27 14:56:28 DEBUG [29965] Retry #3 to run mcollective agent on nodes: '1'
2017-01-27 14:56:38 DEBUG [29965] Retry #4 to run mcollective agent on nodes: '1'
2017-01-27 14:56:48 DEBUG [29965] Retry #5 to run mcollective agent on nodes: '1'
2017-01-27 14:56:58 DEBUG [29965] Retry #6 to run mcollective agent on nodes: '1'

It repeats these tries 3 times, with 6 tries inside each.

2017-01-27 15:02:09 DEBUG [29965] Puppet on node has undefined status. 2 retries remained. Node 1, task provision_1, manifest /etc/puppet/shell_manifests/provision_1_manifest.pp
2017-01-27 15:07:03 DEBUG [29965] Puppet on node has undefined status. 1 retries remained. Node 1, task provision_1, manifest /etc/puppet/shell_manifests/provision_1_manifest.pp
2017-01-27 15:11:57 DEBUG [29965] Puppet on node has undefined status. 0 retries remained. Node 1, task provision_1, manifest /etc/puppet/shell_manifests/provision_1_manifest.pp
2017-01-27 15:16:48 ERROR [29965] Node 1, task provision_1, manifest /etc/puppet/shell_manifests/provision_1_manifest.pp, status: undefined

Same behavior for all 4 nodes.

Resolution: Astute made 12 tries for each of the 4 nodes, which took 21 minutes. The nodes did not answer.
Looks like we have a serious problem with network/provision client/mcollective.
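
To double-check the timing, here is a small sketch that computes the gaps between the Astute log lines quoted above for node 1. The timestamps are copied from the log; the event labels and the helper names (parse, gaps_seconds, total_seconds) are only illustrative and are not part of Astute.

```python
from datetime import datetime

# Timestamps copied from the Astute log lines for node 1 quoted above.
# The labels are illustrative only.
EVENTS = [
    ("puppet start", "2017-01-27 14:55:55"),
    ("undefined status, 2 retries remained", "2017-01-27 15:02:09"),
    ("undefined status, 1 retries remained", "2017-01-27 15:07:03"),
    ("undefined status, 0 retries remained", "2017-01-27 15:11:57"),
    ("task failed, status: undefined", "2017-01-27 15:16:48"),
]


def parse(ts):
    """Parse a timestamp in the format used by the Astute log."""
    return datetime.strptime(ts, "%Y-%m-%d %H:%M:%S")


def gaps_seconds(events):
    """Return the number of seconds between each pair of consecutive events."""
    times = [parse(ts) for _, ts in events]
    return [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]


def total_seconds(events):
    """Return the number of seconds between the first and the last event."""
    return int((parse(events[-1][1]) - parse(events[0][1])).total_seconds())


if __name__ == "__main__":
    for (label, _), gap in zip(EVENTS[1:], gaps_seconds(EVENTS)):
        print(f"{gap:4d} s until: {label}")
    total = total_seconds(EVENTS)
    print(f"total: {total} s ({total // 60} min {total % 60} s)")
```

From the first line to the final error this gives 1253 seconds (20 min 53 s), which matches the roughly 21 minutes mentioned above. Each round after the first one takes a bit under 5 minutes.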

Interesting detail: there is an error in the mcollective log on the node:
2017-01-27T15:15:36.255295+00:00 debug: 15:15:35.807222 #1693] DEBUG -- : runner.rb:54:in `block in run' PLMC6: Message does not pass filters, ignoring

Looking at the mcollective code: https://github.com/puppetlabs/marionette-collective/blob/master/lib/mcollective/runner.rb#L196 it looks like we can get such an error if the node id somehow changed to an unexpected value.

Can you provide access to the env where the error was reproduced?