tripleo

Restart chronyd execution is skipped on overcloud-novacompute-0

Bug #1821018 reported by Ronelle Landy on 2019-03-20


Affects    Status          Importance    Assigned to    Milestone
tripleo    Fix Released    Undecided     Unassigned     tripleo stein-rc1

Bug Description

This problem was initially raised in https://bugs.launchpad.net/tripleo/+bug/1820580, but that bug was marked fixed when multiple NTP servers were passed. The underlying problem remains: the chronyd service is not restarted after configuration on novacompute nodes.

In an OVB overcloud_deploy.log, RUNNING HANDLER [chrony : Restart chronyd] appears twice. The first time, it runs on the controller nodes after chrony is configured, and it executes. The second time, it is supposed to run on the novacompute node after chrony is configured, but it is skipped:

2019-03-19 15:54:10 | RUNNING HANDLER [chrony : Restart chronyd] *************************************
2019-03-19 15:54:10 | Tuesday 19 March 2019 15:53:40 +0000 (0:00:00.327) 0:04:12.296 *********
2019-03-19 15:54:10 | skipping: [overcloud-novacompute-0] => {
2019-03-19 15:54:10 | "changed": false,
2019-03-19 15:54:10 | "skip_reason": "Conditional result was False"
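
To check a full deploy log for this pattern, the handler output can be scanned per host. The following is a minimal sketch, not part of the original report: it assumes the log uses the standard Ansible result prefixes (ok, changed, skipping, fatal, failed) followed by the host name in square brackets, as in the excerpt above. The function name handler_results and the controller line in the sample are hypothetical, added only for illustration.

```python
import re

# Header lines that start a new play, task, or handler section.
HEADER_RE = re.compile(r"(?:PLAY|TASK|RUNNING HANDLER) \[(?P<name>[^\]]+)\]")
# Per-host result lines, e.g. "skipping: [overcloud-novacompute-0] => {".
RESULT_RE = re.compile(
    r"\b(?P<status>ok|changed|skipping|fatal|failed): \[(?P<host>[^\]]+)\]"
)


def handler_results(lines, handler="chrony : Restart chronyd"):
    """Return {host: status} for result lines that follow the named handler.

    If the handler section appears more than once for the same host,
    the last status seen wins.
    """
    results = {}
    in_handler = False
    for line in lines:
        header = HEADER_RE.search(line)
        if header:
            # Any new header either enters or leaves the handler section.
            in_handler = header.group("name") == handler
            continue
        if in_handler:
            result = RESULT_RE.search(line)
            if result:
                results[result.group("host")] = result.group("status")
    return results


# Sample input: the novacompute lines come from the excerpt above;
# the controller line is a hypothetical example of a handler that ran.
sample = [
    "2019-03-19 15:53:10 | RUNNING HANDLER [chrony : Restart chronyd] ****",
    "2019-03-19 15:53:10 | changed: [overcloud-controller-0]",
    "2019-03-19 15:54:10 | RUNNING HANDLER [chrony : Restart chronyd] ****",
    "2019-03-19 15:54:10 | skipping: [overcloud-novacompute-0] => {",
    '2019-03-19 15:54:10 | "changed": false,',
]

for host, status in handler_results(sample).items():
    print(f"{host}: {status}")
```

Any host reported as skipping here did not get its chronyd restarted. To scan a real log, pass an open file object for overcloud_deploy.log as the lines argument.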

Skipping the restart causes the overcloud deployment on baremetal (an internal system) to fail: without the restart, the custom servers in the configuration are never picked up, and the default servers are blocked from internal systems.

See the logs linked below (from OVB, where the deploy passes):

Runs on controller nodes:
https://logs.rdoproject.org/60/636860/5/openstack-check/tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001/c731cdf/logs/undercloud/home/zuul/overcloud_deploy.log.txt.gz#_2019-03-20_05_23_46

Is skipped on novacompute node:
https://logs.rdoproject.org/60/636860/5/openstack-check/tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001/c731cdf/logs/undercloud/home/zuul/overcloud_deploy.log.txt.gz#_2019-03-20_05_24_47

On baremetal, the failure is shown (note that multiple NTP servers and DNS servers are passed to this overcloud deployment):
https://sf.hosted.upshift.rdu2.redhat.com/logs/25/165325/13/check/periodic-tripleo-ci-centos-7-baremetal-single_nic-3ctlr_1comp-featureset001-master/d299874/logs/undercloud/home/zuul/overcloud_deploy.log.txt.gz#_2019-03-19_00_52_24

Since the restart is called via a handler from an included role:

https://github.com/openstack/ansible-role-chrony/blob/master/handlers/main.yml
https://github.com/openstack/tripleo-heat-templates/blob/master/deployment/timesync/chrony-baremetal-ansible.yaml#L121

this is possibly the result of an Ansible bug in the version used:

[zuul@undercloud ~]$ ansible --version
ansible 2.6.14
  config file = /etc/ansible/ansible.cfg
  configured module search path = [u'/home/zuul/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python2.7/site-packages/ansible
  executable location = /usr/bin/ansible
  python version = 2.7.5 (default, Oct 30 2018, 23:45:53) [GCC 4.8.5 20150623 (Red Hat 4.8.5-36)]