te-broker loses a lot of time preparing the environment
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
tripleo | Won't Fix | High | Unassigned |
Bug Description
2018-11-02 15:09:33.314607 | primary | +(/home/
2018-11-02 15:09:33.314736 | primary | +(/home/
2018-11-02 15:29:33.316802 | primary | +(/home/
2018-11-02 15:29:33.317601 | primary | +(/home/
The job sent a request for an environment at 15:09 and aborted after its 1200-second timeout without receiving any answer.
Meanwhile, in the te-broker log:
2018-11-02 15:14:57,484 - testenv-
2018-11-02 15:14:57,485 - testenv-
2018-11-02 15:14:57,499 - testenv-
2018-11-02 15:14:57,503 - testenv-
"", "job_identifier": "614633: ovb-3ctlr_
2018-11-02 15:31:34,346 - testenv-
730 seconds passed
ERROR (ClientException): Failed to attach network adapter device to d772ba2f-
This means that te-broker received the request only at 15:14, spent about 17 minutes creating the environment, and finished at 15:31. By then the job was no longer waiting (it aborted at 15:29), so te-broker failed to attach a network adapter to the undercloud machine, which had already been aborted and killed.
So there are two problems:
1. te-broker receives the request about 5 minutes after the job sent it.
2. Creating the environment takes too long, and the job's timeout is too short for it.
As a workaround, we can increase the job's wait timeout to 1500 seconds.
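The delays in the list above can be checked directly against the timestamps quoted in this report. A minimal Python sketch, at one-second precision, assuming the 15:09:33 job line is the moment the request was sent and the 15:31:34 te-broker line is the moment the environment was finished:

```python
from datetime import datetime

# Timestamps quoted in this report, at one-second precision.
FMT = "%Y-%m-%d %H:%M:%S"
sent = datetime.strptime("2018-11-02 15:09:33", FMT)      # job sends the request
received = datetime.strptime("2018-11-02 15:14:57", FMT)  # te-broker gets it
aborted = datetime.strptime("2018-11-02 15:29:33", FMT)   # job gives up
finished = datetime.strptime("2018-11-02 15:31:34", FMT)  # environment done (assumed)


def seconds(start: datetime, end: datetime) -> float:
    """Elapsed wall-clock time between two log timestamps."""
    return (end - start).total_seconds()


pickup = seconds(sent, received)        # 324 s: broker pickup delay
creation = seconds(received, finished)  # 997 s: environment creation
needed = seconds(sent, finished)        # 1321 s: what the job had to wait
waited = seconds(sent, aborted)         # 1200 s: the job's timeout

for label, timeout in (("current", 1200), ("workaround", 1500)):
    verdict = "enough" if needed <= timeout else "too short"
    print(f"{label} timeout {timeout} s: {verdict} (needed {needed:.0f} s)")
```

On these numbers the 1500-second workaround would have covered this particular run (1321 s needed), but with only about three minutes to spare.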
Changed in tripleo:
milestone: stein-2 → stein-3
https://logs.rdoproject.org/58/615358/1/openstack-check/tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset053/6f4d470/job-output.txt.gz#_2018-11-02_23_44_00_157569
testenv-worker-24201
2018-11-02 23:44:00.157569 | primary | +(/home/zuul/src/git.openstack.org/openstack/tripleo-ci/toci_gate_test.sh:171) : ./testenv-client -b 192.168.100.250:4730 -t 14400 --envsize 4 --ucinstance 73fb20c2-00d7-4a54-86fe-28ccc4520877 --net-iso multi-nic -- ./toci_quickstart.sh
2018-11-02 23:44:00.157704 | primary | +(/home/zuul/src/git.openstack.org/openstack/tripleo-ci/toci_gate_test.sh:165) : sleep 1500
2018-11-03 00:09:00.159711 | primary | +(/home/zuul/src/git.openstack.org/openstack/tripleo-ci/toci_gate_test.sh:165) : '[' '!' -e /tmp/toci.started ']'
2018-11-03 00:09:00.160669 | primary | +(/home/zuul/src/git.openstack.org/openstack/tripleo-ci/toci_gate_test.sh:165) : sudo kill -9 4741
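The job-side trace shows a simple watchdog in toci_gate_test.sh: it starts testenv-client, sleeps 1500 seconds, checks for /tmp/toci.started, and then runs `kill -9` on PID 4741. A minimal Python sketch of that pattern, assuming PID 4741 is the testenv-client process and that the marker file means the job already got its environment; `run_with_watchdog` is a hypothetical helper, not part of tripleo-ci:

```python
import os
import subprocess
import time


def run_with_watchdog(cmd, marker, timeout, poll=0.1):
    """Hypothetical helper mirroring the sleep/check/kill pattern in the trace.

    Starts `cmd` and waits up to `timeout` seconds. If the process is still
    running and `marker` does not exist by then, it is killed (SIGKILL on
    POSIX). Returns True if the watchdog killed the process, else False.
    """
    proc = subprocess.Popen(cmd)
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if proc.poll() is not None:
            return False  # client exited on its own before the deadline
        time.sleep(poll)
    if proc.poll() is not None:
        return False  # exited right at the deadline; nothing to kill
    if os.path.exists(marker):
        # Assumption: marker present means the job started, so the client
        # is left running, as in the shell version.
        proc.wait()
        return False
    proc.kill()  # counterpart of `sudo kill -9 <pid>`
    proc.wait()  # reap the killed process
    return True
```

For example, `run_with_watchdog(["./testenv-client", ...], "/tmp/toci.started", 1500)` would behave like the traced shell code; raising `timeout` is the Python equivalent of the 1500-second workaround.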
2018-11-02 23:38:08,574 - testenv-worker-24201 - INFO - Starting test-env worker with data ['/opt/stack/tripleo-ci/scripts/te-broker/create-env', '/opt/stack/tripleo-ci/scripts/te-broker/destroy-env']
2018-11-02 23:38:08,574 - testenv-worker-24201 - INFO - running TE worker
2018-11-02 23:38:08,579 - testenv-worker-24201 - INFO - Getting new job...
2018-11-02 23:44:00,244 - testenv-worker-24201 - INFO - Received job : {"callback_name": "callback_6e93b5f1716142988629dd6ff9d0f8b0", "extra_nodes": "0", "envsize": "4", "timeout": "14400", "create_undercloud": "", "job_identifier": "615358: ovb-3ctlr_1comp-featureset053", "compute_envsize": "0", "net_iso": "multi-nic", "ssh_key": "", "ucinstance": "73fb20c2-00d7-4a54-86fe-28ccc4520877"}
.. 1100 seconds
2018-11-03 00:11:42,114 - testenv-worker-24201 - INFO - + ENVNUM=24201
...
It took too much time again; maybe the problem is in the Gearman server or in the communication with it (?)
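The timestamps of the second run can be checked in a few lines of Python. The two logs use different formats: the job log separates microseconds with ".", the te-broker log separates milliseconds with ",". A minimal sketch that normalizes both, assuming the testenv-client line marks the request, the 00:11:42 "+ ENVNUM=24201" line marks the point where the environment was ready (the log is elided, so this is a guess), and that the clocks on both hosts agree:

```python
from datetime import datetime


def parse(stamp: str) -> datetime:
    """Parse both log styles: job '...:00.157569' and te-broker '...:42,114'."""
    return datetime.strptime(stamp.replace(",", "."), "%Y-%m-%d %H:%M:%S.%f")


def seconds(start: datetime, end: datetime) -> float:
    """Elapsed time in seconds, rounded to milliseconds."""
    return round((end - start).total_seconds(), 3)


requested = parse("2018-11-02 23:44:00.157569")  # job: ./testenv-client ...
killed = parse("2018-11-03 00:09:00.160669")     # job: sudo kill -9 4741
env_ready = parse("2018-11-03 00:11:42,114")     # worker: + ENVNUM=24201 (assumed end)

waited = seconds(requested, killed)      # how long the job actually waited
needed = seconds(requested, env_ready)   # how long it would have had to wait
late_by = seconds(killed, env_ready)     # how long after the kill it was ready

print(f"job waited {waited:.0f} s, needed {needed:.0f} s, "
      f"environment ready {late_by:.0f} s after the kill")
```

Under these assumptions the job waited its full 1500 s, while the environment was ready only about 1662 s after the request, roughly 162 s after the job had been killed, so the 1500-second workaround was not enough for this run.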