te-broker loses a lot of time for preparing environment

Bug #1801430 reported by Sagi (Sergey) Shnaidman on 2018-11-02
6
This bug affects 1 person
Affects Status Importance Assigned to Milestone
tripleo
High
Unassigned

Bug Description

https://logs.rdoproject.org/33/614633/3/openstack-check/tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset035/c5677eb/job-output.txt.gz#_2018-11-02_15_09_33_314607

2018-11-02 15:09:33.314607 | primary | +(/home/zuul/src/git.openstack.org/openstack/tripleo-ci/toci_gate_test.sh:171): ./testenv-client -b 192.168.100.250:4730 -t 10800 --envsize 4 --ucinstance d772ba2f-052c-43fa-a44e-5cc1c1df6261 --net-iso multi-nic -- ./toci_quickstart.sh
2018-11-02 15:09:33.314736 | primary | +(/home/zuul/src/git.openstack.org/openstack/tripleo-ci/toci_gate_test.sh:165): sleep 1200
2018-11-02 15:29:33.316802 | primary | +(/home/zuul/src/git.openstack.org/openstack/tripleo-ci/toci_gate_test.sh:165): '[' '!' -e /tmp/toci.started ']'
2018-11-02 15:29:33.317601 | primary | +(/home/zuul/src/git.openstack.org/openstack/tripleo-ci/toci_gate_test.sh:165): sudo kill -9 4731

jobs sent request for environment at 15:09 and aborted after a timeout (1200 sec) while not getting any answer

In te-broker meanwhile:

2018-11-02 15:14:57,484 - testenv-worker-23915 - INFO - Starting test-env worker with data ['/opt/stack/tripleo-ci/scripts/te-broker/create-env', '/opt/stack/tripleo-ci/scripts/te-broker/destroy-env']
2018-11-02 15:14:57,485 - testenv-worker-23915 - INFO - running TE worker
2018-11-02 15:14:57,499 - testenv-worker-23915 - INFO - Getting new job...
2018-11-02 15:14:57,503 - testenv-worker-23915 - INFO - Received job : {"callback_name": "callback_0cf84ffddc6f49b2889489c41e8c78a8", "extra_nodes": "0", "envsize": "4", "timeout": "10800", "create_undercloud":
"", "job_identifier": "614633: ovb-3ctlr_1comp-featureset035", "compute_envsize": "0", "net_iso": "multi-nic", "ssh_key": "", "ucinstance": "d772ba2f-052c-43fa-a44e-5cc1c1df6261"}
2018-11-02 15:31:34,346 - testenv-worker-23915 - ERROR - + ENVNUM=23915

730 seconds passed

ERROR (ClientException): Failed to attach network adapter device to d772ba2f-052c-43fa-a44e-5cc1c1df6261 (HTTP 500) (Request-ID: req-7e857300-0b93-47c4-a61c-33ea87c1d9a2)

And it means that te-broker got request only in 15:14 while creating environment for 17 minutes and finishing in 15:31. In this time job doesn't wait anymore (aborted in 15.29) and te-broker fail to connect to undercloud machine (which is aborted and killed).

1. Te-broker gets request 5 minutes after job sent it
2. Too much time for creating environment and short timeout in the job

As a workaround we can increase job wait timeout to 1500.

https://logs.rdoproject.org/58/615358/1/openstack-check/tripleo-ci-centos-7-ovb-3ctlr_1comp-featureset053/6f4d470/job-output.txt.gz#_2018-11-02_23_44_00_157569

testenv-worker-24201

2018-11-02 23:44:00.157569 | primary | +(/home/zuul/src/git.openstack.org/openstack/tripleo-ci/toci_gate_test.sh:171): ./testenv-client -b 192.168.100.250:4730 -t 14400 --envsize 4 --ucinstance 73fb20c2-00d7-4a54-86fe-28ccc4520877 --net-iso multi-nic -- ./toci_quickstart.sh
2018-11-02 23:44:00.157704 | primary | +(/home/zuul/src/git.openstack.org/openstack/tripleo-ci/toci_gate_test.sh:165): sleep 1500
2018-11-03 00:09:00.159711 | primary | +(/home/zuul/src/git.openstack.org/openstack/tripleo-ci/toci_gate_test.sh:165): '[' '!' -e /tmp/toci.started ']'
2018-11-03 00:09:00.160669 | primary | +(/home/zuul/src/git.openstack.org/openstack/tripleo-ci/toci_gate_test.sh:165): sudo kill -9 4741

2018-11-02 23:38:08,574 - testenv-worker-24201 - INFO - Starting test-env worker with data ['/opt/stack/tripleo-ci/scripts/te-broker/create-env', '/opt/stack/tripleo-ci/scripts/te-broker/destroy-env']
2018-11-02 23:38:08,574 - testenv-worker-24201 - INFO - running TE worker
2018-11-02 23:38:08,579 - testenv-worker-24201 - INFO - Getting new job...
2018-11-02 23:44:00,244 - testenv-worker-24201 - INFO - Received job : {"callback_name": "callback_6e93b5f1716142988629dd6ff9d0f8b0", "extra_nodes": "0", "envsize": "4", "timeout": "14400", "create_undercloud": "", "job_identifier": "615358: ovb-3ctlr_1comp-featureset053", "compute_envsize": "0", "net_iso": "multi-nic", "ssh_key": "", "ucinstance": "73fb20c2-00d7-4a54-86fe-28ccc4520877"}

.. 1100 seconds

2018-11-03 00:11:42,114 - testenv-worker-24201 - INFO - + ENVNUM=24201
...

it took too much time again, maybe problem in gearman server or its communication(?)

Changed in tripleo:
milestone: stein-2 → stein-3
wes hayutin (weshayutin) wrote :

te-broker has EOL

Changed in tripleo:
status: Triaged → Won't Fix
To post a comment you must log in.
This report contains Public information  Edit
Everyone can see this information.

Other bug subscribers