Master/stein check and promotion OVB jobs are randomly giving ansible time out while overcloud deploy
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
tripleo | Fix Released | Critical | Ronelle Landy |
Bug Description
We are seeing random Ansible timeouts on the following jobs while deploying the overcloud.
In fs02:
https:/
2019-08-14 04:11:40 | changed: [overcloud-
2019-08-14 04:11:40 | "changed": true,
2019-08-14 04:11:40 | "cmd": "/bin/rsync --delay-updates -F --compress --archive --out-format=
2019-08-14 04:11:40 | "rc": 0
2019-08-14 04:11:40 | }
2019-08-14 04:11:40 |
2019-08-14 04:11:40 | MSG:
2019-08-14 04:11:40 |
2019-08-14 04:11:40 | .d..t...... ./
2019-08-14 04:11:40 | cd+++++++++ facter/
2019-08-14 04:11:40 | cd+++++++++ facter/cache/
2019-08-14 04:11:40 | cd+++++++++ facter/
2019-08-14 04:11:40 | >f+++++++++ facter/
2019-08-14 04:11:40 | >f+++++++++ facter/
2019-08-14 04:11:40 | >f+++++++++ facter/
2019-08-14 04:11:40 | >f+++++++++ facter/
2019-08-14 04:11:40 | >f+++++++++ facter/
2019-08-14 04:11:40 |
2019-08-14 04:11:40 |
2019-08-14 04:11:40 | TASK [Run container-puppet tasks (generate config) during step 1] **************
2019-08-14 04:11:40 | Wednesday 14 August 2019 03:55:36 +0000 (0:00:00.647) 0:44:07.399 ******
2019-08-14 04:11:40 | ok: [overcloud-
2019-08-14 04:11:40 |
2019-08-14 04:11:40 | "censored": "the output has been hidden due to the fact that 'no_log: true' was specified for this result",
2019-08-14 04:11:40 | "changed": false
2019-08-14 04:11:40 | }
2019-08-14 04:11:40 |
2019-08-14 04:11:40 | Ansible timed out at 3612 seconds.
2019-08-14 04:11:40 | + status_code=1
2019-08-14 04:11:40 | + openstack stack list
2019-08-14 03:57:14 | TASK [Run container-puppet tasks (generate config) during step 1] **************
2019-08-14 03:57:14 | Wednesday 14 August 2019 03:40:59 +0000 (0:00:00.893) 0:44:34.074 ******
2019-08-14 03:57:14 | ok: [overcloud-
2019-08-14 03:57:14 |
2019-08-14 03:57:14 | "censored": "the output has been hidden due to the fact that 'no_log: true' was specified for this result",
2019-08-14 03:57:14 | "changed": false
2019-08-14 03:57:14 | }
2019-08-14 03:57:14 | ok: [overcloud-
2019-08-14 03:57:14 |
2019-08-14 03:57:14 | "censored": "the output has been hidden due to the fact that 'no_log: true' was specified for this result",
2019-08-14 03:57:14 | "changed": false
2019-08-14 03:57:14 | }
2019-08-14 03:57:14 | ok: [overcloud-
2019-08-14 03:57:14 | "censored": "the output has been hidden due to the fact that 'no_log: true' was specified for this result",
2019-08-14 03:57:14 | "changed": false
2019-08-14 03:57:14 | }
2019-08-14 03:57:14 |
2019-08-14 03:57:14 | Ansible timed out at 3652 seconds.
and fs01 in check job:
We need to find out why the Ansible timeout is happening randomly.
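As a triage aid, the excerpts above can be scanned for the last `TASK [...]` header logged before the `Ansible timed out at N seconds.` line. The following is a minimal sketch, assuming the deploy log keeps the line format shown above; the function name `task_at_timeout` is hypothetical and not part of any TripleO tooling. Note that the last header printed is only the last task that produced output, which is not necessarily the task that was hanging.

```python
import re

# Line shapes taken from the log excerpts in this report.
TASK_RE = re.compile(r"TASK \[(.+?)\] \*+")
TIMEOUT_RE = re.compile(r"Ansible timed out at (\d+) seconds")


def task_at_timeout(lines):
    """Return (last_task_name, timeout_seconds), or None if no timeout is logged.

    last_task_name is None when the timeout appears before any TASK header.
    """
    last_task = None
    for line in lines:
        task = TASK_RE.search(line)
        if task:
            last_task = task.group(1)
            continue
        timeout = TIMEOUT_RE.search(line)
        if timeout:
            return last_task, int(timeout.group(1))
    return None


# Sample lines copied from the fs02 excerpt above.
sample_log = [
    "2019-08-14 04:11:40 | TASK [Run container-puppet tasks (generate config) during step 1] **************",
    '2019-08-14 04:11:40 | "changed": false',
    "2019-08-14 04:11:40 | }",
    "2019-08-14 04:11:40 | Ansible timed out at 3612 seconds.",
]

print(task_at_timeout(sample_log))
```

Running the same function over several failed jobs would show whether the timeout always follows the same task, as it does in both excerpts here.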
summary: | Master check and promotion jobs are giving ansible time out while overcloud deploy → Master check and promotion jobs are giving ansible time out while overcloud deploy in fs01/02/35 |
summary: | Master check and promotion jobs are giving ansible time out while overcloud deploy in fs01/02/35 → Master check and promotion OVB jobs are randomly giving ansible time out while overcloud deploy |
Changed in tripleo: | |
milestone: | train-3 → train-rc1 |
Changed in tripleo: | |
milestone: | train-rc1 → ussuri-1 |
Changed in tripleo: | |
milestone: | ussuri-1 → ussuri-2 |
Changed in tripleo: | |
status: | Triaged → Fix Released |
I noticed that network operations (for example, docker pull) are slow on an overcloud node.
For some images the pull takes more than 10 minutes; ideally it should take a few seconds, since this is a local network.
On an affected node, every docker pull is slow.
It would be good to check whether this is master-only or affects all releases, to properly isolate the issue.
Example 1 (14 minutes 21 seconds): https://logs.rdoproject.org/openstack-periodic-master/opendev.org/openstack/tripleo-ci/master/periodic-tripleo-ci-centos-7-ovb-1ctlr_1comp-featureset002-master-upload/edd7af2/logs/undercloud/home/zuul/overcloud_deploy.log.txt.gz#_2019-08-14_03_45_18
2019-08-14 03:45:18 | changed: [overcloud-controller-0] => {
2019-08-14 03:45:18 |
2019-08-14 03:45:18 | "changed": true,
2019-08-14 03:45:18 | "cmd": "docker pull 192.168.24.1:8787/tripleomaster/centos-binary-cinder-volume:dff0f7b2c1ce50d9929321f65f1ab290f2d042a9_e8dbf8af-updated-20190814013900",
2019-08-14 03:45:18 | "delta": "0:14:21.982103",
2019-08-14 03:45:18 | "end": "2019-08-14 03:39:27.913156",
2019-08-14 03:45:18 | "rc": 0,
2019-08-14 03:45:18 | "start": "2019-08-14 03:25:05.931053"
2019-08-14 03:45:18 | }
Example 2 (10 minutes 32 seconds): https://logs.rdoproject.org/64/676364/1/openstack-check/tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset001/19bb5b1/logs/undercloud/home/zuul/overcloud_deploy.log.txt.gz#_2019-08-15_16_38_15
2019-08-15 16:38:15 | changed: [overcloud-controller-2] => {
2019-08-15 16:38:15 |
2019-08-15 16:38:15 | "changed": true,
2019-08-15 16:38:15 | "cmd": "docker pull 192.168.24.1:8787/tripleomaster/centos-binary-cinder-volume:6cd9e55d69e90ea73219abced4e9a3f2372b204e_10e135ca-updated-20190815144647",
2019-08-15 16:38:15 | "delta": "0:10:32.870128",
2019-08-15 16:38:15 | "end": "2019-08-15 16:37:34.438094",
2019-08-15 16:38:15 | "rc": 0,
2019-08-15 16:38:15 | "start": "2019-08-15 16:27:01.567966"
2019-08-15 16:38:15 | }
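To find every slow pull in a deploy log rather than spotting them by hand, the `"delta"` field that follows each `docker pull` command can be parsed and compared against a threshold. This is a rough sketch, assuming the `timestamp | payload` line layout of the excerpts above; the helper name `slow_docker_pulls` and the 10 minute default are illustrative choices, not part of any existing tool.

```python
import re
from datetime import timedelta

# Matches the "delta" field of a command result, e.g. "delta": "0:14:21.982103",
DELTA_RE = re.compile(r'"delta":\s*"(\d+):(\d{2}):(\d{2})(?:\.(\d+))?"')


def slow_docker_pulls(lines, threshold=timedelta(minutes=10)):
    """Yield (log_timestamp, duration) for docker pull results slower than threshold."""
    in_pull = False
    for line in lines:
        stamp, _, payload = line.partition(" | ")
        if '"cmd": "docker pull' in payload:
            in_pull = True  # the delta for this command follows on a later line
            continue
        match = DELTA_RE.search(payload)
        if match and in_pull:
            hours, minutes, seconds, micros = match.groups()
            duration = timedelta(
                hours=int(hours),
                minutes=int(minutes),
                seconds=int(seconds),
                microseconds=int((micros or "0").ljust(6, "0")[:6]),
            )
            if duration > threshold:
                yield stamp, duration
            in_pull = False
        elif payload.strip() == "}":
            in_pull = False  # end of the result block without a delta


# Sample lines copied from Example 1 (the cmd value is cut short here on purpose).
sample = [
    '2019-08-14 03:45:18 | "changed": true,',
    '2019-08-14 03:45:18 | "cmd": "docker pull 192.168.',
    '2019-08-14 03:45:18 | "delta": "0:14:21.982103",',
    '2019-08-14 03:45:18 | "rc": 0,',
    "2019-08-14 03:45:18 | }",
]

for stamp, duration in slow_docker_pulls(sample):
    print(stamp, duration)  # prints: 2019-08-14 03:45:18 0:14:21.982103
```

Running this per release against the same job's `overcloud_deploy.log` would be one way to check whether slow pulls are specific to master.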