Comment 16 for bug 1682423

Matt Riedemann (mriedem) wrote :

There is some confusion here: a few different issues are being mixed together, along with how exactly this is failing.

The sleep workaround that Bernhard is using is totally unrelated to anything with service versions or message version pins in the services that Mark is talking about.

What Bernhard (and others) are likely hitting, with the 404 immediately after the server is created, stems from what we now do in Ocata when creating a server:

1. The API service creates a build request (in the nova_api.build_requests table) which is temporary until the instance is created in a cell database.
2. The API casts to conductor to continue building the instance.
3. The conductor service asks the scheduler for a host on which to build the instance, and that host is mapped to a cell via the nova_api.host_mappings table.
4. Once a host is chosen, conductor creates the instance in the cell database and deletes the build request record in the API database.
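To make the ordering of those four steps concrete, here is a toy in-memory sketch. It is not Nova code: the dictionaries only stand in for the nova_api.build_requests table, the nova_api.host_mappings table and a cell database, and the function names (new_state, api_create_server, conductor_build) and the pick_host callback are hypothetical.

```python
def new_state():
    # Stand-ins for the API database tables and one cell database.
    return {
        "build_requests": {},                    # nova_api.build_requests
        "host_mappings": {"compute1": "cell1"},  # nova_api.host_mappings
        "cells": {"cell1": {}},                  # per-cell instance tables
    }


def api_create_server(state, uuid, name):
    # Step 1: record a temporary build request in the API database.
    state["build_requests"][uuid] = {"uuid": uuid, "name": name, "status": "BUILD"}
    # Step 2: "cast" to conductor, modelled here as a returned message.
    return {"method": "build_instance", "uuid": uuid}


def conductor_build(state, message, pick_host):
    uuid = message["uuid"]
    # Step 3: ask the scheduler (pick_host) for a host, then map it to a cell.
    host = pick_host()
    cell = state["host_mappings"][host]
    # Step 4: create the instance in the cell database, then delete the
    # build request from the API database.
    request = state["build_requests"][uuid]
    state["cells"][cell][uuid] = {"uuid": uuid, "name": request["name"], "host": host}
    del state["build_requests"][uuid]
    return cell
```

In this sketch the instance is written to the cell before the build request is removed, so one of the two records exists at every point in the flow.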

While this is happening, if requests are coming into the API to get the instance, the API code is going to look up the instance in the cell database via its nova_api.instance_mappings record. If the instance is not mapped to a cell yet, the API will look up the build request record and use that to return details about the instance while it's building.
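The lookup order described above can be sketched roughly as follows. This is a simplified model and not the actual API code: get_instance, NotFound and the plain dictionaries are hypothetical stand-ins for the nova_api.instance_mappings table, the nova_api.build_requests table and the cell databases.

```python
class NotFound(Exception):
    """Stands in for the 404 the API would return."""


def get_instance(uuid, instance_mappings, build_requests, cell_dbs):
    # instance_mappings maps an instance uuid to a cell name, or to None
    # while the instance has not been mapped to a cell yet.
    cell = instance_mappings.get(uuid)
    if cell is not None:
        instance = cell_dbs[cell].get(uuid)
        if instance is not None:
            return instance
        raise NotFound(uuid)
    # Not mapped to a cell yet: fall back to the build request record.
    request = build_requests.get(uuid)
    if request is not None:
        return request
    # Neither record exists: this is the case that produces a 404.
    raise NotFound(uuid)
```

For example, get_instance("u1", {"u1": None}, {"u1": {"status": "BUILD"}}, {}) returns the build request, while the same call with an empty build_requests dictionary raises NotFound.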

There should be no window of time where neither exists and you get a 404.

I do remember there being some intermittent Tempest failures because of a window of time where the build request was gone but the instance wasn't mapped to the cell, but we fixed that and the fix should already be in Ocata.

On what version of Ocata are people experiencing this issue? If you're still at 15.0.0 then you need to upgrade the controller services to make sure you have the latest fixes.