Chronyd sync fails in overcloud deployment

Bug #1820580 reported by Sagi (Sergey) Shnaidman
Affects: tripleo
Status: Fix Released
Importance: Critical
Assigned to: Unassigned
Milestone: (none listed)

Bug Description

Chronyd sync fails intermittently in multiple jobs. The latest failure was in the OVB featureset001 master promotion job:

http://logs.rdoproject.org/openstack-periodic/git.openstack.org/openstack-infra/tripleo-ci/master/periodic-tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001-master/4978205/logs/undercloud/home/zuul/overcloud_deploy.log.txt.gz

2019-03-18 08:34:51 |
2019-03-18 08:34:51 | TASK [Ensure system is NTP time synced] ****************************************
2019-03-18 08:34:51 | Monday 18 March 2019 08:31:39 +0000 (0:00:00.842) 0:03:45.959 **********
2019-03-18 08:34:51 | skipping: [overcloud-novacompute-0] => {
2019-03-18 08:34:51 | "changed": false,
2019-03-18 08:34:51 | "skip_reason": "Conditional result was False"
2019-03-18 08:34:51 | }
2019-03-18 08:34:51 | changed: [overcloud-controller-1] => {
2019-03-18 08:34:51 | "changed": true,
2019-03-18 08:34:51 | "cmd": [
2019-03-18 08:34:51 | "chronyc",
2019-03-18 08:34:51 | "waitsync",
2019-03-18 08:34:51 | "20"
2019-03-18 08:34:51 | ],
2019-03-18 08:34:51 | "delta": "0:00:10.021030",
2019-03-18 08:34:51 | "end": "2019-03-18 08:31:49.485257",
2019-03-18 08:34:51 | "rc": 0,
2019-03-18 08:34:51 | "start": "2019-03-18 08:31:39.464227"
2019-03-18 08:34:51 | }
2019-03-18 08:34:51 |
2019-03-18 08:34:51 | STDOUT:
2019-03-18 08:34:51 |
2019-03-18 08:34:51 | try: 1, refid: 00000000, correction: 0.000000000, skew: 0.000
2019-03-18 08:34:51 | try: 2, refid: CDCE4602, correction: 0.000001539, skew: 1.234
2019-03-18 08:34:51 | changed: [overcloud-controller-2] => {
2019-03-18 08:34:51 | "changed": true,
2019-03-18 08:34:51 | "cmd": [
2019-03-18 08:34:51 | "chronyc",
2019-03-18 08:34:51 | "waitsync",
2019-03-18 08:34:51 | "20"
2019-03-18 08:34:51 | ],
2019-03-18 08:34:51 | "delta": "0:00:10.033575",
2019-03-18 08:34:51 | "end": "2019-03-18 08:31:49.541771",
2019-03-18 08:34:51 | "rc": 0,
2019-03-18 08:34:51 | "start": "2019-03-18 08:31:39.508196"
2019-03-18 08:34:51 | }
2019-03-18 08:34:51 |
2019-03-18 08:34:51 | STDOUT:
2019-03-18 08:34:51 |
2019-03-18 08:34:51 | try: 1, refid: 00000000, correction: 0.000000000, skew: 0.000
2019-03-18 08:34:51 | try: 2, refid: CDCE4602, correction: 0.000001224, skew: 0.375
2019-03-18 08:34:51 | fatal: [overcloud-controller-0]: FAILED! => {
2019-03-18 08:34:51 |
2019-03-18 08:34:51 | "changed": true,
2019-03-18 08:34:51 | "cmd": [
2019-03-18 08:34:51 | "chronyc",
2019-03-18 08:34:51 | "waitsync",
2019-03-18 08:34:51 | "20"
2019-03-18 08:34:51 | ],
2019-03-18 08:34:51 | "delta": "0:03:10.205168",
2019-03-18 08:34:51 | "end": "2019-03-18 08:34:49.741343",
2019-03-18 08:34:51 | "rc": 1,
2019-03-18 08:34:51 | "start": "2019-03-18 08:31:39.536175"
2019-03-18 08:34:51 | }
2019-03-18 08:34:51 |
2019-03-18 08:34:51 | STDOUT:
2019-03-18 08:34:51 |
2019-03-18 08:34:51 | try: 1, refid: 00000000, correction: 0.000000000, skew: 0.000
2019-03-18 08:34:51 | try: 2, refid: 00000000, correction: 0.000000000, skew: 0.000
2019-03-18 08:34:51 | try: 3, refid: 00000000, correction: 0.000000000, skew: 0.000
2019-03-18 08:34:51 | try: 4, refid: 00000000, correction: 0.000000000, skew: 0.000
2019-03-18 08:34:51 | try: 5, refid: 00000000, correction: 0.000000001, skew: 0.000
2019-03-18 08:34:51 | try: 6, refid: 00000000, correction: 0.000000001, skew: 0.000
2019-03-18 08:34:51 | try: 7, refid: 00000000, correction: 0.000000001, skew: 0.000
2019-03-18 08:34:51 | try: 8, refid: 00000000, correction: 0.000000001, skew: 0.000
2019-03-18 08:34:51 | try: 9, refid: 00000000, correction: 0.000000001, skew: 0.000
2019-03-18 08:34:51 | try: 10, refid: 00000000, correction: 0.000000001, skew: 0.000
2019-03-18 08:34:51 | try: 11, refid: 00000000, correction: 0.000000002, skew: 0.000
2019-03-18 08:34:51 | try: 12, refid: 00000000, correction: 0.000000002, skew: 0.000
2019-03-18 08:34:51 | try: 13, refid: 00000000, correction: 0.000000002, skew: 0.000
2019-03-18 08:34:51 | try: 14, refid: 00000000, correction: 0.000000002, skew: 0.000
2019-03-18 08:34:56 | try: 15, refid: 00000000, correction: 0Exception occured while running the command

Maybe this is because chronyd is not reloaded after it is configured during the overcloud deployment. It is already installed and running on the hosts, so it is neither reloaded nor restarted when it is reconfigured.

2019-03-18 08:34:56 | Traceback (most recent call last):
2019-03-18 08:34:56 | File "/usr/lib/python2.7/site-packages/tripleoclient/command.py", line 29, in run
2019-03-18 08:34:56 | super(Command, self).run(parsed_args)
2019-03-18 08:34:56 | File "/usr/lib/python2.7/site-packages/osc_lib/command/command.py", line 41, in run
2019-03-18 08:34:56 | return super(Command, self).run(parsed_args)
2019-03-18 08:34:56 | File "/usr/lib/python2.7/site-packages/cliff/command.py", line 184, in run
2019-03-18 08:34:56 | return_code = self.take_action(parsed_args) or 0
2019-03-18 08:34:56 | File "/usr/lib/python2.7/site-packages/tripleoclient/v1/overcloud_deploy.py", line 949, in take_action
2019-03-18 08:34:56 | verbosity=self.app_args.verbose_level)
2019-03-18 08:34:56 | File "/usr/lib/python2.7/site-packages/tripleoclient/workflows/deployment.py", line 323, in config_download
2019-03-18 08:34:56 | raise exceptions.DeploymentError("Overcloud configuration failed.")
2019-03-18 08:34:56 | DeploymentError: Overcloud configuration failed.
2019-03-18 08:34:56 | Overcloud configuration failed.
2019-03-18 08:34:56 | .000000002, skew: 0.000
2019-03-18 08:34:56 | try: 16, refid: 00000000, correction: 0.000000002, skew: 0.000
2019-03-18 08:34:56 | try: 17, refid: 00000000, correction: 0.000000002, skew: 0.000
2019-03-18 08:34:56 | try: 18, refid: 00000000, correction: 0.000000003, skew: 0.000
2019-03-18 08:34:56 | try: 19, refid: 00000000, correction: 0.000000003, skew: 0.000
2019-03-18 08:34:56 | try: 20, refid: 00000000, correction: 0.000000003, skew: 0.000
2019-03-18 08:34:56 |
2019-03-18 08:34:56 |
2019-03-18 08:34:56 | MSG:
2019-03-18 08:34:56 |
2019-03-18 08:34:56 | non-zero return code
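
The log above can be triaged mechanically. The following is a minimal sketch, not part of the original report, that parses the `try: N, refid: ..., correction: ..., skew: ...` lines printed by `chronyc waitsync` and flags a run in which the refid stayed at 00000000 on every attempt, as happened on overcloud-controller-0. The helper names `parse_waitsync` and `never_selected_source` are hypothetical, and reading an all-zero refid as "no source selected" is an assumption based on the output shown here.

```python
import re

# Matches one progress line printed by `chronyc waitsync`.
# A leading log prefix such as "2019-03-18 08:34:51 | " is tolerated
# because the pattern is applied with search(), not match().
TRY_LINE = re.compile(
    r"try:\s*(?P<n>\d+),\s*refid:\s*(?P<refid>[0-9A-Fa-f]{8}),\s*"
    r"correction:\s*(?P<correction>-?\d+\.\d+),\s*skew:\s*(?P<skew>-?\d+\.\d+)"
)


def parse_waitsync(stdout):
    """Return a list of (try_number, refid, correction, skew) tuples."""
    tries = []
    for line in stdout.splitlines():
        m = TRY_LINE.search(line)
        if m:
            tries.append(
                (
                    int(m.group("n")),
                    m.group("refid").upper(),
                    float(m.group("correction")),
                    float(m.group("skew")),
                )
            )
    return tries


def never_selected_source(tries):
    """True if at least one try was parsed and all reported refid 00000000.

    Assumption: an all-zero refid means chronyd had not selected any
    time source at that moment.
    """
    return bool(tries) and all(refid == "00000000" for _, refid, _, _ in tries)


# Samples modelled on the log in this report.
good = (
    "2019-03-18 08:34:51 | try: 1, refid: 00000000, correction: 0.000000000, skew: 0.000\n"
    "2019-03-18 08:34:51 | try: 2, refid: CDCE4602, correction: 0.000001539, skew: 1.234\n"
)
bad = "\n".join(
    "try: %d, refid: 00000000, correction: 0.000000000, skew: 0.000" % i
    for i in range(1, 21)
)

if __name__ == "__main__":
    print(never_selected_source(parse_waitsync(good)))
    print(never_selected_source(parse_waitsync(bad)))
```

A line that is split by interleaved output, such as try 15 in the log above, simply does not match and is skipped, so the check still works on the remaining lines.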

Tags: alert ci
tags: added: ci
tags: added: promotion-blocker
Marios Andreou (marios-b) wrote :

Removed the promotion-blocker tag, as there is a green run right now in https://review.rdoproject.org/zuul/builds?job_name=periodic-tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001-master

via sshnaidm|rover just now in #oooq

18:36 < sshnaidm|rover> marios_|ruck, well, now it doesn't fail, seems like random error
18:36 < sshnaidm|rover> marios_|ruck, I mean it doesn't block promotion currently, but should be fixed of course

tags: removed: promotion-blocker
Bogdan Dobrelya (bogdando) wrote :

So this boils down to http://paste.openstack.org/show/747980/, which seems like total nonsense.

wes hayutin (weshayutin)
tags: added: alert
Alex Schultz (alex-schultz) wrote :

After talking it over on IRC: it is likely a job config that causes the job to hit something like https://bugs.launchpad.net/tripleo/+bug/1806521, because only a single server is configured. Chrony needs multiple servers configured (if you don't use pools), as it doesn't try to look up the server again on failure the way NTP did. This is a change from NTP. As a best practice, it's a good idea to provide multiple NTP servers.
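
To illustrate the recommendation above, here is a minimal sketch that renders the source lines of a chrony configuration from a list of hosts and warns when only one `server` entry is given without a pool. It is not TripleO's implementation: the function `render_chrony_sources` and the `example.com` hostnames are hypothetical. The `server` and `pool` directives and the `iburst` option are standard chrony.conf syntax.

```python
def render_chrony_sources(servers, use_pool=False):
    """Render chrony.conf source lines for the given hosts.

    Returns (config_text, warnings). This is an illustrative helper,
    not the template TripleO actually uses.
    """
    # Drop duplicates while keeping the caller's order.
    hosts = list(dict.fromkeys(servers))
    if not hosts:
        raise ValueError("at least one NTP source is required")

    warnings = []
    if not use_pool and len(hosts) < 2:
        # Per the comment above: with a single `server` entry there is
        # no other configured source to fall back on if it fails.
        warnings.append(
            "only one NTP server configured; provide several servers "
            "or use a pool"
        )

    directive = "pool" if use_pool else "server"
    lines = ["%s %s iburst" % (directive, host) for host in hosts]
    return "\n".join(lines) + "\n", warnings


if __name__ == "__main__":
    # Hypothetical hostnames, for illustration only.
    conf, warns = render_chrony_sources(
        ["ntp1.example.com", "ntp2.example.com", "ntp3.example.com"]
    )
    print(conf, end="")
    print(warns)
```

Returning the warnings instead of raising keeps the decision with the caller: a CI job could fail hard on a single server, while an interactive deploy might only log it.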

Sagi (Sergey) Shnaidman (sshnaidm) wrote :
Marios Andreou (marios-b) wrote :

https://review.openstack.org/#/c/644378/ merged; closing out for now. Change it back if we see this again and more is needed.

Changed in tripleo:
status: Triaged → Fix Released