Comment 4 for bug 1638908

Alfredo Moralejo (amoralej) wrote :

There was an attempt to fix it for the swift upload case by increasing haproxy timeouts in:

https://review.openstack.org/#/c/389737/

That apparently helped, but may not have fixed the issue completely.

In the examples posted in the description, the error occurred when:

- tripleoclient tried to connect to the heat service using https through haproxy.
- heat tried to connect to nova using http through haproxy.

Digging a bit into the error timestamps in the logs at https://ci.centos.org/artifacts/rdo/jenkins-tripleo-quickstart-promote-master-delorean-minimal-651/undercloud/var/log/heat/heat-engine.log.gz:

2016-11-03 04:48:46.279 20747 DEBUG heat.engine.scheduler [req-05836d4e-16ec-4fa8-904c-ffde74d58b61 a0374c3772a64c76a96872a6190f03e6 2316dad07f6f48049d1cb84ab7a453ee - - -] Task stack_task from Stack "overcloud-Controller-3sktk7mhr5kq-0-7mi5p4zfscxn" [75257dba-11c9-4571-90ea-bb51dbd2ac6e] running step /usr/lib/python2.7/site-packages/heat/engine/scheduler.py:215

...

2016-11-03 04:49:57.695 20746 DEBUG heat.engine.scheduler [req-05836d4e-16ec-4fa8-904c-ffde74d58b61 a0374c3772a64c76a96872a6190f03e6 2316dad07f6f48049d1cb84ab7a453ee - - -] Task stack_task from Stack "overcloud-Compute-iqruwqc7bwfk" [8e3bedf5-503a-4718-bab7-c21a173bc0af] sleeping _sleep /usr/lib/python2.7/site-packages/heat/engine/scheduler.py:156
2016-11-03 04:49:46.283 20747 INFO heat.engine.resource [req-05836d4e-16ec-4fa8-904c-ffde74d58b61 a0374c3772a64c76a96872a6190f03e6 2316dad07f6f48049d1cb84ab7a453ee - - -] CREATE: ServerUpdateAllowed "Controller" [da8fcc62-7c62-4e31-96ce-5896d038dec7] Stack "overcloud-Controller-3sktk7mhr5kq-0-7mi5p4zfscxn" [75257dba-11c9-4571-90ea-bb51dbd2ac6e]
2016-11-03 04:49:46.283 20747 ERROR heat.engine.resource ConnectFailure: Unable to establish connection to http://192.168.24.3:8774/v2.1/servers/da8fcc62-7c62-4e31-96ce-5896d038dec7: ('Connection aborted.', BadStatusLine("''",))
2016-11-03 04:49:58.161 20747 DEBUG oslo_messaging._drivers.amqpdriver [-] received message msg_id: 2e9d442ef54649fca87418dd0ba0af52 reply to reply_5fe11ae54cb2455f87b25fa4e66fcb3f __call__ /usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py:194

We could infer that it's hitting a 10s timeout in haproxy. Looking at https://github.com/openstack/puppet-tripleo/blob/master/manifests/haproxy.pp#L38-L40 , both the connect and http-request timeouts are 10s by default. I'd suggest increasing them to something like 20 seconds and testing, wdyt?
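
For illustration, a rough sketch of how the suggested values might look in the defaults section of the generated haproxy.cfg. Only the two timeouts discussed above are shown. The placement in the defaults section and the rest of the file are assumptions, not taken from the actual puppet-tripleo output:

```
# Sketch only: assumes these timeouts end up in the "defaults" section.
defaults
  timeout connect 20s        # was 10s: time allowed to establish a connection to a backend
  timeout http-request 20s   # was 10s: time allowed for a client to send a complete HTTP request
```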