Comment 23 for bug 1589433

Dina Belova (dbelova) wrote : Re: Deployment of env with mongodb and ceilometer is failed

OK, it looks like we were able to find the root cause of the issue.

================
What's happening
================

Here is the sequence of actions we see on the failed environments:
- we call glance to upload an image
- when Keystone v3 is used, glance uses the swift client to get Swift's endpoint - https://github.com/openstack/glance_store/blob/master/glance_store/_drivers/swift/connection_manager.py#L150
- the swift client looks up the endpoint in the service catalog - https://github.com/openstack/python-swiftclient/blob/master/swiftclient/client.py#L583-L589 - however, service catalog extraction is cached (10-minute expiration time), so a stale snapshot of the service catalog may be used
- if the hardware is fast enough, the deployment completes very quickly (on our test environments a full redeploy takes ~25 minutes, so the cache expires at most 1-2 times during the whole deployment) -> the cache may not expire and refresh the service catalog before the image upload starts -> the upload fails
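To make the timing argument above concrete, here is a minimal Python sketch of a catalog cache with a fixed 10-minute expiration. `CachedCatalog`, the fake clock and the `*.example` URLs are hypothetical; this is not the actual keystone or swiftclient caching code, it only models the behaviour described in the flow: a snapshot taken before Swift is registered keeps being served until the TTL passes, and a lookup miss does not force a refresh.

```python
import time
from typing import Callable, Dict, Optional

# Hypothetical model of a service catalog cache with a fixed expiration
# time. NOT the real keystone/swiftclient code; it only shows how a
# snapshot taken before Swift is registered keeps being served.
CATALOG_TTL = 10 * 60  # 10-minute expiration, in seconds


class CachedCatalog:
    def __init__(self, fetch: Callable[[], Dict[str, str]],
                 clock: Callable[[], float] = time.monotonic,
                 ttl: float = CATALOG_TTL) -> None:
        self._fetch = fetch          # returns {service_type: endpoint_url}
        self._clock = clock          # injectable so the demo can fake time
        self._ttl = ttl
        self._snapshot: Optional[Dict[str, str]] = None
        self._fetched_at = 0.0

    def endpoint(self, service_type: str) -> Optional[str]:
        now = self._clock()
        # Re-fetch only when there is no snapshot or it is older than the
        # TTL; a lookup miss does NOT trigger a refresh.
        if self._snapshot is None or now - self._fetched_at >= self._ttl:
            self._snapshot = self._fetch()
            self._fetched_at = now
        return self._snapshot.get(service_type)


if __name__ == "__main__":
    now = {"t": 0.0}
    registry = {"identity": "http://keystone.example:5000/v3"}
    # dict(registry) copies, so the cached snapshot is frozen at fetch time.
    catalog = CachedCatalog(lambda: dict(registry), clock=lambda: now["t"])

    catalog.endpoint("identity")       # t=0: snapshot taken, no Swift yet
    now["t"] = 5 * 60
    registry["object-store"] = "http://swift.example:8080/v1"  # Swift deployed

    now["t"] = 8 * 60                  # image upload starts
    print(catalog.endpoint("object-store"))  # None -> upload fails
    now["t"] = 11 * 60                 # TTL has passed, snapshot refreshed
    print(catalog.endpoint("object-store"))  # http://swift.example:8080/v1

    # A ~25-minute deployment spans at most 25 // 10 = 2 expirations.
    print(25 * 60 // CATALOG_TTL)      # 2
```

The injectable clock is only there so the race can be reproduced deterministically; with real time the same thing happens whenever the upload starts less than 10 minutes after the last catalog fetch.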

=================
What can be done?
=================

1) set a shorter service catalog expiration time in the keystone config - *No! Let's not do it!* The service catalog is needed during token creation -> this would affect token management time -> that's really a bad idea
2) add more retry logic to the image upload - this is a good place to start. Let's do it
3) find out why the service catalog is not invalidated when we ask for an endpoint that is missing from it, and fix it - this is also a good idea, let's do it as well. This will prevent similar issues in the future.
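Options 2) and 3) can be sketched together in Python. `RefreshingCatalog`, `EndpointNotFound`, `retry` and `upload_image` are hypothetical names for illustration only, not the glance_store or python-swiftclient API: the catalog wrapper drops its cached snapshot and re-fetches once when a service type is missing (idea 3), and `retry` wraps the upload with exponential backoff (idea 2).

```python
import time
from typing import Callable, Dict, Optional, Tuple, Type, TypeVar

T = TypeVar("T")


class EndpointNotFound(Exception):
    """Hypothetical error: service type missing from the catalog."""


class RefreshingCatalog:
    """Hypothetical catalog wrapper for idea 3): on a lookup miss, drop the
    cached snapshot and fetch once more before giving up."""

    def __init__(self, fetch: Callable[[], Dict[str, str]]) -> None:
        self._fetch = fetch
        self._snapshot: Optional[Dict[str, str]] = None

    def invalidate(self) -> None:
        self._snapshot = None

    def endpoint(self, service_type: str) -> str:
        if self._snapshot is None:
            self._snapshot = self._fetch()
        url = self._snapshot.get(service_type)
        if url is None:
            # The snapshot may be stale: invalidate it and re-fetch once.
            self.invalidate()
            self._snapshot = self._fetch()
            url = self._snapshot.get(service_type)
        if url is None:
            raise EndpointNotFound(service_type)
        return url


def retry(op: Callable[[], T], attempts: int = 5, delay: float = 1.0,
          backoff: float = 2.0,
          retry_on: Tuple[Type[BaseException], ...] = (Exception,),
          sleep: Callable[[float], None] = time.sleep) -> T:
    """Idea 2): call op() up to `attempts` times, sleeping with exponential
    backoff between failures; the last error is re-raised."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        try:
            return op()
        except retry_on:
            if attempt == attempts:
                raise
            sleep(delay)
            delay *= backoff
    raise AssertionError("unreachable")


if __name__ == "__main__":
    registry = {"identity": "http://keystone.example:5000/v3"}
    catalog = RefreshingCatalog(lambda: dict(registry))
    catalog.endpoint("identity")  # snapshot cached before Swift exists
    registry["object-store"] = "http://swift.example:8080/v1"

    # Hypothetical stand-in for the glance image upload.
    def upload_image() -> str:
        return "uploaded via " + catalog.endpoint("object-store")

    # The miss on "object-store" triggers a refresh, so this succeeds.
    print(retry(upload_image, delay=0.01))
```

Note that a re-fetch on miss only helps if the fetch path itself bypasses the stale cache; finding where that cache lives is exactly what 3) has to answer. Retrying alone just waits for the expiration, so it needs a total backoff window comparable to the 10-minute TTL.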