te-broker loses a lot of time for preparing environment

Bug #1801430 reported by Sagi (Sergey) Shnaidman
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
tripleo
Won't Fix
High
Unassigned

Bug Description

https://logs.rdoproject.org/33/614633/3/openstack-check/tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset035/c5677eb/job-output.txt.gz#_2018-11-02_15_09_33_314607

2018-11-02 15:09:33.314607 | primary | +(/home/zuul/src/git.openstack.org/openstack/tripleo-ci/toci_gate_test.sh:171): ./testenv-client -b 192.168.100.250:4730 -t 10800 --envsize 4 --ucinstance d772ba2f-052c-43fa-a44e-5cc1c1df6261 --net-iso multi-nic -- ./toci_quickstart.sh
2018-11-02 15:09:33.314736 | primary | +(/home/zuul/src/git.openstack.org/openstack/tripleo-ci/toci_gate_test.sh:165): sleep 1200
2018-11-02 15:29:33.316802 | primary | +(/home/zuul/src/git.openstack.org/openstack/tripleo-ci/toci_gate_test.sh:165): '[' '!' -e /tmp/toci.started ']'
2018-11-02 15:29:33.317601 | primary | +(/home/zuul/src/git.openstack.org/openstack/tripleo-ci/toci_gate_test.sh:165): sudo kill -9 4731

jobs sent request for environment at 15:09 and aborted after a timeout (1200 sec) while not getting any answer

In te-broker meanwhile:

2018-11-02 15:14:57,484 - testenv-worker-23915 - INFO - Starting test-env worker with data ['/opt/stack/tripleo-ci/scripts/te-broker/create-env', '/opt/stack/tripleo-ci/scripts/te-broker/destroy-env']
2018-11-02 15:14:57,485 - testenv-worker-23915 - INFO - running TE worker
2018-11-02 15:14:57,499 - testenv-worker-23915 - INFO - Getting new job...
2018-11-02 15:14:57,503 - testenv-worker-23915 - INFO - Received job : {"callback_name": "callback_0cf84ffddc6f49b2889489c41e8c78a8", "extra_nodes": "0", "envsize": "4", "timeout": "10800", "create_undercloud":
"", "job_identifier": "614633: ovb-3ctlr_1comp-featureset035", "compute_envsize": "0", "net_iso": "multi-nic", "ssh_key": "", "ucinstance": "d772ba2f-052c-43fa-a44e-5cc1c1df6261"}
2018-11-02 15:31:34,346 - testenv-worker-23915 - ERROR - + ENVNUM=23915

730 seconds passed

ERROR (ClientException): Failed to attach network adapter device to d772ba2f-052c-43fa-a44e-5cc1c1df6261 (HTTP 500) (Request-ID: req-7e857300-0b93-47c4-a61c-33ea87c1d9a2)

And it means that te-broker got request only in 15:14 while creating environment for 17 minutes and finishing in 15:31. In this time job doesn't wait anymore (aborted in 15.29) and te-broker fail to connect to undercloud machine (which is aborted and killed).

1. Te-broker gets request 5 minutes after job sent it
2. Too much time for creating environment and short timeout in the job

As a workaround we can increase job wait timeout to 1500.

Revision history for this message
Sagi (Sergey) Shnaidman (sshnaidm) wrote :

https://logs.rdoproject.org/58/615358/1/openstack-check/tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset053/6f4d470/job-output.txt.gz#_2018-11-02_23_44_00_157569

testenv-worker-24201

2018-11-02 23:44:00.157569 | primary | +(/home/zuul/src/git.openstack.org/openstack/tripleo-ci/toci_gate_test.sh:171): ./testenv-client -b 192.168.100.250:4730 -t 14400 --envsize 4 --ucinstance 73fb20c2-00d7-4a54-86fe-28ccc4520877 --net-iso multi-nic -- ./toci_quickstart.sh
2018-11-02 23:44:00.157704 | primary | +(/home/zuul/src/git.openstack.org/openstack/tripleo-ci/toci_gate_test.sh:165): sleep 1500
2018-11-03 00:09:00.159711 | primary | +(/home/zuul/src/git.openstack.org/openstack/tripleo-ci/toci_gate_test.sh:165): '[' '!' -e /tmp/toci.started ']'
2018-11-03 00:09:00.160669 | primary | +(/home/zuul/src/git.openstack.org/openstack/tripleo-ci/toci_gate_test.sh:165): sudo kill -9 4741

2018-11-02 23:38:08,574 - testenv-worker-24201 - INFO - Starting test-env worker with data ['/opt/stack/tripleo-ci/scripts/te-broker/create-env', '/opt/stack/tripleo-ci/scripts/te-broker/destroy-env']
2018-11-02 23:38:08,574 - testenv-worker-24201 - INFO - running TE worker
2018-11-02 23:38:08,579 - testenv-worker-24201 - INFO - Getting new job...
2018-11-02 23:44:00,244 - testenv-worker-24201 - INFO - Received job : {"callback_name": "callback_6e93b5f1716142988629dd6ff9d0f8b0", "extra_nodes": "0", "envsize": "4", "timeout": "14400", "create_undercloud": "", "job_identifier": "615358: ovb-3ctlr_1comp-featureset053", "compute_envsize": "0", "net_iso": "multi-nic", "ssh_key": "", "ucinstance": "73fb20c2-00d7-4a54-86fe-28ccc4520877"}

.. 1100 seconds

2018-11-03 00:11:42,114 - testenv-worker-24201 - INFO - + ENVNUM=24201
...

it took too much time again, maybe problem in gearman server or its communication(?)

Changed in tripleo:
milestone: stein-2 → stein-3
Revision history for this message
wes hayutin (weshayutin) wrote :

te-broker has EOL

Changed in tripleo:
status: Triaged → Won't Fix
To post a comment you must log in.
This report contains Public information  
Everyone can see this information.

Other bug subscribers

Remote bug watches

Bug watches keep track of this bug in other bug trackers.