There was an attempt to fix it for the swift upload case by increasing haproxy timeouts in:
https://review.openstack.org/#/c/389737/
That, apparently, fixed the issue, but maybe not completely.
In the examples posted in the description, the error occurred when:
- tripleoclient tried to connect to the heat service using https on haproxy.
- heat tried to connect to nova using http on haproxy.
Digging a bit into the error timestamps in the logs at https://ci.centos.org/artifacts/rdo/jenkins-tripleo-quickstart-promote-master-delorean-minimal-651/undercloud/var/log/heat/heat-engine.log.gz:
2016-11-03 04:48:46.279 20747 DEBUG heat.engine.scheduler [req-05836d4e-16ec-4fa8-904c-ffde74d58b61 a0374c3772a64c76a96872a6190f03e6 2316dad07f6f48049d1cb84ab7a453ee - - -] Task stack_task from Stack "overcloud-Controller-3sktk7mhr5kq-0-7mi5p4zfscxn" [75257dba-11c9-4571-90ea-bb51dbd2ac6e] running step /usr/lib/python2.7/site-packages/heat/engine/scheduler.py:215
...
2016-11-03 04:49:57.695 20746 DEBUG heat.engine.scheduler [req-05836d4e-16ec-4fa8-904c-ffde74d58b61 a0374c3772a64c76a96872a6190f03e6 2316dad07f6f48049d1cb84ab7a453ee - - -] Task stack_task from Stack "overcloud-Compute-iqruwqc7bwfk" [8e3bedf5-503a-4718-bab7-c21a173bc0af] sleeping _sleep /usr/lib/python2.7/site-packages/heat/engine/scheduler.py:156
2016-11-03 04:49:46.283 20747 INFO heat.engine.resource [req-05836d4e-16ec-4fa8-904c-ffde74d58b61 a0374c3772a64c76a96872a6190f03e6 2316dad07f6f48049d1cb84ab7a453ee - - -] CREATE: ServerUpdateAllowed "Controller" [da8fcc62-7c62-4e31-96ce-5896d038dec7] Stack "overcloud-Controller-3sktk7mhr5kq-0-7mi5p4zfscxn" [75257dba-11c9-4571-90ea-bb51dbd2ac6e]
2016-11-03 04:49:46.283 20747 ERROR heat.engine.resource ConnectFailure: Unable to establish connection to http://192.168.24.3:8774/v2.1/servers/da8fcc62-7c62-4e31-96ce-5896d038dec7: ('Connection aborted.', BadStatusLine("''",))
2016-11-03 04:49:58.161 20747 DEBUG oslo_messaging._drivers.amqpdriver [-] received message msg_id: 2e9d442ef54649fca87418dd0ba0af52 reply to reply_5fe11ae54cb2455f87b25fa4e66fcb3f __call__ /usr/lib/python2.7/site-packages/oslo_messaging/_drivers/amqpdriver.py:194
We could infer that it's hitting a 10s timeout in haproxy. Looking at https://github.com/openstack/puppet-tripleo/blob/master/manifests/haproxy.pp#L38-L40, both connect and http-request are 10s by default. I'd suggest increasing them to something like 20 seconds and testing, wdyt?
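To sanity-check the timing against the log, here is a minimal Python sketch that computes the gap between two of the timestamps quoted above. The names (STEP_STARTED, CONNECT_FAILED, elapsed_seconds) are made up for illustration, and pairing these two entries is my assumption; the excerpt elides the lines in between, so the result is only an upper bound on how long the call to nova was pending.

```python
from datetime import datetime

# Timestamps copied from the heat-engine.log excerpt above (both from pid 20747).
# Pairing these two entries is an assumption made for illustration.
STEP_STARTED = "2016-11-03 04:48:46.279"    # stack_task "running step"
CONNECT_FAILED = "2016-11-03 04:49:46.283"  # ConnectFailure talking to nova

# oslo.log style timestamp: date, time, fractional seconds.
FMT = "%Y-%m-%d %H:%M:%S.%f"


def elapsed_seconds(start, end):
    """Return the number of seconds between two log timestamps."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return delta.total_seconds()


gap = elapsed_seconds(STEP_STARTED, CONNECT_FAILED)
print("%.3f" % gap)  # prints 60.004
```

Running the same helper over the entries hidden behind the "..." in the full log would narrow down when the request to nova actually started, and so which haproxy timeout is the one being hit.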