Comment 0 for bug 1903210

Rodrigo Barbieri (rodrigo-barbieri2010) wrote :

When deploying a Stein multi-region environment, 2 issues are observed:

2020-09-08 15:03:01.590 15355 INFO nova.compute.resource_tracker [req-1e5b881d-8c0e-4213-a500-8bea79877f92 - - - - -] Compute node record created for compute1.maas:compute1.maas with uuid: 246ba562-1f2e-4296-a69a-53bceda49739
2020-09-08 15:03:02.358 15355 ERROR [req-1e5b881d-8c0e-4213-a500-8bea79877f92 - - - - -] [req-af77bcff-3de0-47ef-98ac-ff4447d9aee3] Failed to create resource provider record in placement API for UUID 246ba562-1f2e-4296-a69a-53bceda49739. Got 409: {"errors": [{"status": 409, "title": "Conflict", "detail": "There was a conflict when trying to complete your request.\n\n Conflicting resource provider name: compute1.maas already exists. ", "request_id": "req-af77bcff-3de0-47ef-98ac-ff4447d9aee3"}]}.

Investigation revealed that the compute node was registering itself in the placement database of another region, specifically the first one listed by "openstack endpoint list --service placement".

It turns out that, according to the Rocky release notes:

"The following deprecated options have been removed from the placement group of nova.conf:

    os_region_name (use region_name instead)"

Replacing the os_region_name option with region_name allowed the compute node to talk to the correct endpoint and register itself in the correct placement database. This leads to problem #2.
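The endpoint mix-up can be illustrated with a simplified model of service catalog lookup. This is a hypothetical sketch, not keystoneauth's actual code: it assumes that when no region filter is given, the first catalog entry matching the service type and interface is used, which is consistent with the behavior observed above. The `Endpoint` class, the `select_endpoint` function, the region names and the URLs are all made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Endpoint:
    # Hypothetical, minimal stand-in for a service catalog entry.
    service_type: str
    interface: str
    region: str
    url: str


def select_endpoint(catalog: List[Endpoint], service_type: str,
                    interface: str = "internal",
                    region_name: Optional[str] = None) -> Optional[str]:
    """Return the URL of the first catalog entry matching the filters.

    Simplified model (an assumption, not keystoneauth's code): when
    region_name is None no region filter is applied, so whichever
    matching endpoint appears first in the catalog wins.
    """
    for ep in catalog:
        if ep.service_type != service_type or ep.interface != interface:
            continue
        if region_name is not None and ep.region != region_name:
            continue
        return ep.url
    return None


# Hypothetical two-region catalog; region names and URLs are invented.
catalog = [
    Endpoint("placement", "internal", "RegionOne", "http://placement.region-one:8778"),
    Endpoint("placement", "internal", "RegionTwo", "http://placement.region-two:8778"),
]

# A RegionTwo compute node with no region filter lands on RegionOne.
print(select_endpoint(catalog, "placement"))
# → http://placement.region-one:8778

# With region_name set, it gets its own region's endpoint.
print(select_endpoint(catalog, "placement", region_name="RegionTwo"))
# → http://placement.region-two:8778
```

In this model, a removed option such as os_region_name behaves exactly like an unset region: the filter is simply absent, so the lookup silently falls back to catalog order instead of failing loudly.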

While testing migrations, it was noticed that migrations started with "openstack server migrate --live-migration" produced the following log entry:

2020-11-05 14:34:22.723 1993 ERROR [req-0170510a-264b-441d-84ab-211ac89c5f5f dedbb29c90e94218a838fd7c6bdc8a44 7118e587a8be4a2c81eff9429e8bd249 - ae745ee07d0445e4a11ef46e1cfff59c ae745ee07d0445e4a11ef46e1cfff59c] [instance: 967c9efa-eb81-4892-af15-e148b3ab838b] Binding failed for port 3d82d547-454d-4ee7-ad5c-9d834e3afb9a and host Error: (404 {"NeutronError": {"type": "PortNotFound", "message": "Port 3d82d547-454d-4ee7-ad5c-9d834e3afb9a could not be found.", "detail": ""}})

while migrations started with "openstack server migrate --live <host>" failed with the CLI error:

Migration pre-check error: Binding failed for port d92c626a-25d9-4ef3-981a-15c430cdf9c8, please check neutron logs for more information. (HTTP 400) (Request-ID: req-9445bf06-f034-4b39-ac5b-e29485c9f5d2)

Investigating this revealed that the conductor was talking to the wrong neutron-api endpoint. Adding region_name to the [neutron] section of nova.conf on the nova-cloud-controller units addressed the problem, but led to another one later in the migration:

2020-11-05 20:19:16.238 15881 ERROR oslo_messaging.rpc.server [req-c878d5a1-4167-4aa9-8a88-c77dfe77940a dedbb29c90e94218a838fd7c6bdc8a44 7118e587a8be4a2c81eff9429e8bd249 - ae745ee07d0445e4a11ef46e1cfff59c ae745ee07d0445e4a11ef46e1cfff59c] Exception during message handling: keystoneauth1.exceptions.connection.ConnectFailure: Unable to establish connection to ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response',))

Investigating that revealed that region_name also needed to be added to the [neutron] section of nova.conf on the nova-compute units. With that change in place, the problem is addressed and migrations succeed.

However, I did not find any specific mention of the [neutron] region_name option in the Nova release notes. The only relevant mention of it I found in the code is [0], but that code was removed in Train. Moreover, that code is not invoked in the first error (the port binding one). Instead, the call goes through [1], which hasn't changed since Stein but does pick up the region_name parameter once it is added to [neutron].

Therefore, to address this issue, region_name must be added to the [placement] section of nova.conf on the nova-compute units, and to the [neutron] section of nova.conf on both the nova-compute and nova-cloud-controller units.
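Deployed units can be audited against these requirements with a small checker. This is a minimal sketch using Python's standard configparser; the function name `check_region_settings`, the role labels "compute" and "controller", and the sample region and URL values are hypothetical. It only checks that the options are present, not that their values match the unit's actual region.

```python
import configparser
from typing import List


def check_region_settings(conf_text: str, role: str) -> List[str]:
    """Return a list of region-related problems found in a nova.conf.

    role is a hypothetical label: "compute" for nova-compute units,
    "controller" for nova-cloud-controller units.
    """
    if role not in ("compute", "controller"):
        raise ValueError(f"unknown role: {role!r}")

    # interpolation=None: config values may legitimately contain '%'.
    # strict=False: tolerate duplicate sections/options instead of raising.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)
    problems: List[str] = []

    # Both roles need [neutron] region_name; computes also need [placement].
    required = [("neutron", "region_name")]
    if role == "compute":
        required.append(("placement", "region_name"))

    for section, option in required:
        # has_option() returns False (no exception) if the section is absent.
        if not parser.has_option(section, option):
            problems.append(f"[{section}] {option} is missing")

    # os_region_name was removed from [placement] in Rocky and is not read.
    if parser.has_option("placement", "os_region_name"):
        problems.append("[placement] os_region_name is no longer read; use region_name")

    return problems


# Example: a compute unit still carrying the pre-Rocky option.
compute_conf = """
[placement]
os_region_name = RegionTwo

[neutron]
url = http://neutron.example:9696
"""

for problem in check_region_settings(compute_conf, "compute"):
    print(problem)
# → [neutron] region_name is missing
# → [placement] region_name is missing
# → [placement] os_region_name is no longer read; use region_name
```

A limitation of this sketch: configparser treats [DEFAULT] values as fallbacks for every section, so a region_name placed under [DEFAULT] would mask a missing per-section option.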

