After looking further in the code and chatting with Chris Behrens about this, I would like to offer the following summary:
- nova.compute.manager.init_host() calls driver.list_instances(). This needs to happen during start of n-cpu.
- nova.virt.ironic.driver includes a _retry_if_service_is_unavailable() method to retry if the ir-api service is not available.
- the problem you have is actually that the "ironic" service user in _keystone_ is not yet created at this point, and the error being raised is not HTTPServiceUnavailable, so it's not getting retried.
Robert, your bug report didn't include the actual exception class being raised -- could you attach that? While I think that bringing the n-cpu and ironic services online before their keystone service accounts are created is a bug in tripleo's tooling, I also think it's reasonable for ironic.driver to retry on any transitory failure. I'm fine with adding that exception to the list of exceptions for which it retries.
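
To illustrate what I mean by retrying on a list of exceptions, here is a rough sketch. It is not the actual driver code: ServiceUnavailable and Unauthorized are local stand-in classes (the real exception class is still unknown until it's attached to the bug), and call_with_retry, max_attempts and delay are hypothetical names chosen for the example.

```python
import time


class ServiceUnavailable(Exception):
    """Local stand-in for the client's HTTPServiceUnavailable (assumption)."""


class Unauthorized(Exception):
    """Local stand-in for whatever is raised while the keystone
    service user does not exist yet (assumption; real class unknown)."""


# Exceptions treated as transitory. Extending this tuple is the
# kind of change being proposed.
RETRYABLE = (ServiceUnavailable, Unauthorized)


def call_with_retry(func, *args, max_attempts=5, delay=0.0,
                    retryable=RETRYABLE, **kwargs):
    """Call func, retrying only on exceptions listed in `retryable`."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func(*args, **kwargs)
        except retryable:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            time.sleep(delay)
```

Anything not in the tuple still propagates immediately, so a genuine programming error would not be masked by the retry loop.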