We are hitting the same issue in the RDO-CI promotion pipeline. In this case we are seeing it in overcloud_deploy_post.log [1], but I'd say it's the same issue. We are failing to collect logs from the compute and controller nodes, so it's not very helpful. The differences between the dlrn repos in the last passing jobs and the first failing one are in http://paste.openstack.org/show/599614/.
And as Juan Antonio mentioned, the issue seems to occur when glance tries to connect to keystone for token validation. Note that similar errors appear in other service logs, such as the gnocchi log [2]:
2017-02-19 15:04:14.259 188952 ERROR keystonemiddleware.auth_token [-] Bad response code while validating token: 504
2017-02-19 15:04:14.260 188952 WARNING keystonemiddleware.auth_token [-] Identity response: <html><body><h1>504 Gateway Time-out</h1>
The server didn't respond in time.
</body></html>
In the logs mentioned in comment #1 I see the following error in the apache log:
[Sun Feb 19 14:46:05.764384 2017] [mpm_prefork:error] [pid 188948] AH00161: server reached MaxRequestWorkers setting, consider raising the MaxRequestWorkers setting
I can't find the httpd configuration in the job logs, so I'm not sure whether it is undersized in terms of max processes or whether there is something else.
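To check whether the 504s during token validation line up with apache running out of workers, a rough sketch like the following could be run against whatever logs do get collected. The patterns are copied from the log excerpts above; the function name `count_matches` and the command-line usage are my own illustration and are not part of any CI tooling.

```python
import re
import sys

# Patterns taken from the log lines quoted in this comment.
MAX_WORKERS_RE = re.compile(r"AH00161: server reached MaxRequestWorkers setting")
TOKEN_504_RE = re.compile(
    r"keystonemiddleware\.auth_token .*Bad response code while validating token: 504"
)


def count_matches(lines):
    """Return (worker_exhaustion_count, token_504_count) for an iterable of log lines.

    Hypothetical helper, written only for this illustration.
    """
    workers = 0
    token_504 = 0
    for line in lines:
        if MAX_WORKERS_RE.search(line):
            workers += 1
        if TOKEN_504_RE.search(line):
            token_504 += 1
    return workers, token_504


if __name__ == "__main__":
    # Assumed usage: pass one or more uncompressed log files as arguments.
    for path in sys.argv[1:]:
        with open(path, errors="replace") as fh:
            workers, token_504 = count_matches(fh)
        print(f"{path}: MaxRequestWorkers hits={workers}, token validation 504s={token_504}")
```

If the MaxRequestWorkers messages show up shortly before the 504s, that would point towards httpd sizing rather than something else, but that is only a guess until the configuration is available.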
About https://review.rdoproject.org/r/#/c/5390/, jobs in the ocata release are passing after this change [3]. Additionally, tests in https://review.openstack.org/#/c/435693/ ran with a similar package combination in terms of heat-templates/heat-agent and they passed, so I'm inclined to think that it's unrelated.
[1] https://ci.centos.org/artifacts/rdo/jenkins-tripleo-quickstart-promote-master-delorean-minimal_pacemaker-489/undercloud/home/stack/
[2] http://logs.openstack.org/15/359215/62/check-tripleo/gate-tripleo-ci-centos-7-ovb-updates/d048040/logs/overcloud-controller-0/var/log/gnocchi/app.txt.gz
[3] https://ci.centos.org/view/rdo/view/promotion-pipeline/job/rdo_trunk-promote-ocata-current-tripleo/27/