Observing concurrent ansible puppet invocations

Bug #1906625 reported by Michele Baldessari
Affects: tripleo
Status: Triaged
Importance: High
Assigned to: Unassigned

Bug Description

So in https://7f1d3bc811eac15dd3d0-a12e67e84744b622e6a13b507a2faa27.ssl.cf1.rackcdn.com/764782/2/check/tripleo-ci-centos-8-containers-multinode/b779d9b/logs/undercloud/home/zuul/overcloud_deploy.log we observe an odd thing. The same puppet code from the tripleo_ha_wrapper role gets invoked twice:
Dec 02 19:11:29 centos-8-ovh-gra1-0022018729 ansible-command[86253]: Invoked with _raw_params=puppet apply --detailed-exitcodes --summarize --color=false --modulepath '/etc/puppet/modules:/opt/stack/puppet-modules:/usr/share/openstack-puppet/modules' --tags 'pacemaker::resource::bundle,pacemaker::property,pacemaker::resource::ip,pacemaker::resource::ocf,pacemaker::constraint::order,pacemaker::constraint::colocation' -e 'include ::tripleo::profile::base::pacemaker; include ::tripleo::profile::pacemaker::ovn_dbs_bundle'

and then shortly after

Dec 02 19:12:16 centos-8-ovh-gra1-0022018729 ansible-command[88379]: Invoked with _raw_params=puppet apply --detailed-exitcodes --summarize --color=false --modulepath '/etc/puppet/modules:/opt/stack/puppet-modules:/usr/share/openstack-puppet/modules' --tags 'pacemaker::resource::bundle,pacemaker::property,pacemaker::resource::ip,pacemaker::resource::ocf,pacemaker::constraint::order,pacemaker::constraint::colocation' -e 'include ::tripleo::profile::base::pacemaker; include ::tripleo::profile::pacemaker::ovn_dbs_bundle'

This should not really happen: ovn-dbs-pacemaker-puppet.yaml invokes the role only once, and the role itself calls puppet only once.

This then leads to the following failure:
Error: pcs -f /var/lib/pacemaker/cib/puppet-cib-backup20201202-88391-bdv7s2 resource create ip-192.168.24.7 IPaddr2 ip=192.168.24.7 cidr_netmask=32 meta resource-stickiness=INFINITY --disabled failed: Error: 'ip-192.168.24.7' already exists. Too many tries

Revision history for this message
Alex Schultz (alex-schultz) wrote :

If a command's connection is dropped while it is executing, ansible automatically retries the command. That might be what happened in this case.
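For reference, the connection-level retry Alex describes is controlled by the ssh connection plugin's `retries` option in ansible.cfg. A sketch of where it lives (the value shown is illustrative, not a TripleO default):

```ini
# ansible.cfg -- illustrative value, not the TripleO default
[ssh_connection]
# How many times Ansible re-attempts the SSH connection when it drops;
# per the behaviour described in this bug, the task's command is re-run
# on the retried connection without any log message at default verbosity.
retries = 3
```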

Revision history for this message
Michele Baldessari (michele) wrote :

Thanks Alex, let's see how often we see this in the wild. I'd hope ansible would scream when doing a retry like that though?

Revision history for this message
Alex Schultz (alex-schultz) wrote :

No, it's a silent thing. I was confused the first time I saw it too.

Revision history for this message
Bogdan Dobrelya (bogdando) wrote :

What do you mean by a "disconnected" command? The "Run init bundle puppet on the host for ..." task looks like an ordinary shell task.

Changed in tripleo:
milestone: none → wallaby-2
Revision history for this message
Michele Baldessari (michele) wrote :

It means the command (shell, whatever) was started on the node, but the ansible connection from the undercloud to the overcloud got killed/reset somehow. Ansible then opens another ssh connection and reruns the command, without logging anything and without checking whether the previous run of the same command is still in progress.
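One generic way to defend against exactly this race (not something the tripleo_ha_wrapper role does today) is to serialize the puppet step behind flock(1), so a silently retried copy refuses to start while the first invocation is still running. A hedged sketch; the lock path and `run_once` helper are hypothetical:

```shell
#!/bin/sh
# Sketch only -- not actual TripleO code. LOCKFILE and run_once are
# hypothetical; the guarded command stands in for "puppet apply ...".
LOCKFILE="${LOCKFILE:-/tmp/tripleo_ha_wrapper.lock}"

run_once() {
    # Open the lock file on fd 9 and try to take an exclusive lock
    # without blocking; a concurrent (retried) copy will fail here.
    exec 9>"$LOCKFILE"
    if ! flock -n 9; then
        echo "puppet step already running under $LOCKFILE, skipping" >&2
        return 0
    fi
    "$@"            # e.g. puppet apply --detailed-exitcodes ...
    status=$?
    exec 9>&-       # close fd 9, releasing the lock
    return $status
}

run_once echo "guarded command ran"
```

With this guard, the second, silently retried invocation exits cleanly instead of racing pcs and hitting the "'ip-192.168.24.7' already exists" error.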

Revision history for this message
Bogdan Dobrelya (bogdando) wrote :

Raised to High, since this corner case is really subtle and hard to debug and/or root-cause. Also, I think https://bugs.launchpad.net/tripleo/+bug/1912184 is quite a similar issue, but for the async module, when it experiences similar "disconnects" (um, async disconnects? :D)
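For the async case Bogdan mentions, the task shape in question is Ansible's built-in `async`/`poll` mechanism, which lets the command keep running on the node while the controller reconnects periodically to poll its status. An illustrative sketch (task name and values are made up, not taken from the role):

```yaml
# Illustrative task only -- not the actual tripleo_ha_wrapper task.
- name: Run init bundle puppet on the host
  command: puppet apply --detailed-exitcodes --summarize --color=false ...
  async: 3600   # keep the command running on the node for up to an hour
  poll: 10      # controller reconnects every 10s to poll status
```

Note that this changes the failure mode rather than eliminating it: a dropped connection during polling is what bug 1912184 appears to describe.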

Changed in tripleo:
importance: Medium → High
Changed in tripleo:
milestone: wallaby-2 → wallaby-3
Changed in tripleo:
milestone: wallaby-3 → wallaby-rc1
Changed in tripleo:
milestone: wallaby-rc1 → xena-1
Changed in tripleo:
milestone: xena-1 → xena-2
Changed in tripleo:
milestone: xena-2 → xena-3