Comment 7 for bug 1634823

Gabriele Cerami (gcerami) wrote :

I tried this patch (before seeing the ones proposed here):

diff --git a/tripleoclient/v1/overcloud_deploy.py b/tripleoclient/v1/overcloud_deploy.py
index 11d236a..144ba87 100644
--- a/tripleoclient/v1/overcloud_deploy.py
+++ b/tripleoclient/v1/overcloud_deploy.py
@@ -984,8 +984,19 @@ class DeployOvercloud(command.Command):
             utils.create_overcloudrc(stack, parsed_args.no_proxy)
             utils.create_tempest_deployer_input()

-            if (stack_create or parsed_args.force_postconfig):
-                self._deploy_postconfig(stack, parsed_args)
+            max_retries = 20
+            retries = 0
+            delay = 30
+            while retries < max_retries:
+                try:
+                    if (stack_create or parsed_args.force_postconfig):
+                        self._deploy_postconfig(stack, parsed_args)
+                    break
+                except Exception:
+                    print(sys.exc_info()[0])
+                    print("postconfig attempt {0} failed. Waiting {1} second ".format(retries, delay))
+                    retries += 1
+                    time.sleep(delay)

             overcloud_endpoint = utils.get_overcloud_endpoint(stack)
             print("Overcloud Endpoint: {0}".format(overcloud_endpoint))

It does essentially the same thing as the proposed patch, but retries the entire postconfig step instead of only the keystone part.

In the CI devenv this does NOT fix anything. This is what happens:

2016-10-26 20:26:33 [overcloud-BlockStorageNodesPostDeployment-szuwqeh4o3xp]: CREATE_COMPLETE Stack CREATE completed successfully
Stack overcloud CREATE_COMPLETE
<class 'keystoneauth1.exceptions.http.ServiceUnavailable'>
postconfig attempt 0 failed. Waiting 30 second
<class 'keystoneauth1.exceptions.http.ServiceUnavailable'>
postconfig attempt 1 failed. Waiting 30 second
<class 'keystoneauth1.exceptions.http.ServiceUnavailable'>
postconfig attempt 2 failed. Waiting 30 second
<class 'keystoneauth1.exceptions.http.ServiceUnavailable'>
postconfig attempt 3 failed. Waiting 30 second
<class 'keystoneauth1.exceptions.http.ServiceUnavailable'>
postconfig attempt 4 failed. Waiting 30 second
<class 'keystoneauth1.exceptions.http.ServiceUnavailable'>
postconfig attempt 5 failed. Waiting 30 second
<class 'keystoneauth1.exceptions.http.ServiceUnavailable'>
postconfig attempt 6 failed. Waiting 30 second
<class 'keystoneauth1.exceptions.http.ServiceUnavailable'>
postconfig attempt 7 failed. Waiting 30 second
<class 'keystoneauth1.exceptions.http.ServiceUnavailable'>
postconfig attempt 8 failed. Waiting 30 second
<class 'keystoneauth1.exceptions.http.ServiceUnavailable'>
postconfig attempt 9 failed. Waiting 30 second
<class 'keystoneauth1.exceptions.http.ServiceUnavailable'>
postconfig attempt 10 failed. Waiting 30 second
<class 'keystoneauth1.exceptions.http.ServiceUnavailable'>
postconfig attempt 11 failed. Waiting 30 second
<class 'keystoneauth1.exceptions.http.ServiceUnavailable'>
postconfig attempt 12 failed. Waiting 30 second
<class 'keystoneauth1.exceptions.http.ServiceUnavailable'>
postconfig attempt 13 failed. Waiting 30 second
<class 'keystoneauth1.exceptions.http.ServiceUnavailable'>
postconfig attempt 14 failed. Waiting 30 second
<class 'keystoneauth1.exceptions.http.ServiceUnavailable'>
postconfig attempt 15 failed. Waiting 30 second
<class 'keystoneauth1.exceptions.http.ServiceUnavailable'>
postconfig attempt 16 failed. Waiting 30 second
<class 'keystoneauth1.exceptions.http.ServiceUnavailable'>
postconfig attempt 17 failed. Waiting 30 second
<class 'keystoneauth1.exceptions.http.ServiceUnavailable'>
postconfig attempt 18 failed. Waiting 30 second
<class 'keystoneauth1.exceptions.http.ServiceUnavailable'>
postconfig attempt 19 failed. Waiting 30 second

But while it was looping, I ran a direct command:

keystone endpoint-list

and it returned the keystone endpoint, which means that keystone really is UP.
It also suggests that some of the objects passed to _keystone_init (like stack) may be invalid from the start, so trying to get information from them fails every time, no matter how long we wait.
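
If that guess is right, a retry only helps when its inputs are rebuilt on every attempt rather than created once before the loop. Here is a minimal standalone sketch of that idea. It is not tripleoclient code: retry_with_refresh, action and refresh are hypothetical names, and in the real client refresh would be whatever re-fetches the stack (or any other object derived from it).

```python
import time


def retry_with_refresh(action, refresh, max_retries=20, delay=30,
                       sleep=time.sleep):
    """Call action(refresh()) until it succeeds or max_retries is reached.

    refresh() runs before every attempt, so each attempt works on freshly
    built inputs instead of objects created once before the loop.
    All names here are hypothetical; this is only an illustration.
    """
    last_error = None
    for attempt in range(max_retries):
        try:
            return action(refresh())
        except Exception as exc:  # broad on purpose, like the patch above
            last_error = exc
            print("attempt {0} failed ({1}). Waiting {2} seconds".format(
                attempt, type(exc).__name__, delay))
            sleep(delay)
    # Every attempt failed: report it instead of silently continuing.
    raise RuntimeError(
        "gave up after {0} attempts".format(max_retries)) from last_error
```

Unlike the loop in my patch, this raises once the attempts are exhausted, so the deploy would not carry on to print the overcloud endpoint as if postconfig had worked.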