Activity log for bug #1903210

Date Who What changed Old value New value Message
2020-11-05 21:23:29 Rodrigo Barbieri bug added bug
2020-11-05 21:23:52 Rodrigo Barbieri bug task added charm-nova-cloud-controller
2020-11-05 22:27:17 Rodrigo Barbieri charm-nova-cloud-controller: assignee Rodrigo Barbieri (rodrigo-barbieri2010)
2020-11-05 22:27:19 Rodrigo Barbieri charm-nova-compute: assignee Rodrigo Barbieri (rodrigo-barbieri2010)
2020-11-07 20:33:31 Rodrigo Barbieri description (old and new values are near-identical; the edit fixed the typo "database of another reason" to "database of another region" and refers to region_name as a config option; the resulting description follows)

When deploying a Stein multi-region environment, 2 issues are observed:

1) The following appears when the compute node starts:

2020-09-08 15:03:01.590 15355 INFO nova.compute.resource_tracker [req-1e5b881d-8c0e-4213-a500-8bea79877f92 - - - - -] Compute node record created for compute1.maas:compute1.maas with uuid: 246ba562-1f2e-4296-a69a-53bceda49739
2020-09-08 15:03:02.358 15355 ERROR nova.scheduler.client.report [req-1e5b881d-8c0e-4213-a500-8bea79877f92 - - - - -] [req-af77bcff-3de0-47ef-98ac-ff4447d9aee3] Failed to create resource provider record in placement API for UUID 246ba562-1f2e-4296-a69a-53bceda49739. Got 409: {"errors": [{"status": 409, "title": "Conflict", "detail": "There was a conflict when trying to complete your request.\n\n Conflicting resource provider name: compute1.maas already exists. ", "request_id": "req-af77bcff-3de0-47ef-98ac-ff4447d9aee3"}]}.

Investigating the issue, it was found that the compute node was registering itself in the placement database of another region; more specifically, the first one listed by "openstack endpoint list --service placement". It turns out that, according to the Rocky release notes: "The following deprecated options have been removed from the placement group of nova.conf: os_region_name (use region_name instead)". Replacing the os_region_name config with region_name allowed the compute node to talk to the correct endpoint and register itself against the correct placement database. Which leads to problem #2.

2) While testing migrations, it was noticed that migrations started with "openstack server migrate --live-migration" would result in the logs:

2020-11-05 14:34:22.723 1993 ERROR nova.network.neutronv2.api [req-0170510a-264b-441d-84ab-211ac89c5f5f dedbb29c90e94218a838fd7c6bdc8a44 7118e587a8be4a2c81eff9429e8bd249 - ae745ee07d0445e4a11ef46e1cfff59c ae745ee07d0445e4a11ef46e1cfff59c] [instance: 967c9efa-eb81-4892-af15-e148b3ab838b] Binding failed for port 3d82d547-454d-4ee7-ad5c-9d834e3afb9a and host juju-60a5dc-bionic-stein-federated-01-7.cloud.sts. Error: (404 {"NeutronError": {"type": "PortNotFound", "message": "Port 3d82d547-454d-4ee7-ad5c-9d834e3afb9a could not be found.", "detail": ""}})

while migrations with "openstack server migrate --live <host>" would result in the CLI error:

Migration pre-check error: Binding failed for port d92c626a-25d9-4ef3-981a-15c430cdf9c8, please check neutron logs for more information. (HTTP 400) (Request-ID: req-9445bf06-f034-4b39-ac5b-e29485c9f5d2)

Investigating this, it was found that the conductor was talking to the wrong neutron-api endpoint. Adding region_name to the [neutron] section in nova.conf of the nova-cloud-controllers addressed that problem, but then led to another one later in the migration:

2020-11-05 20:19:16.238 15881 ERROR oslo_messaging.rpc.server [req-c878d5a1-4167-4aa9-8a88-c77dfe77940a dedbb29c90e94218a838fd7c6bdc8a44 7118e587a8be4a2c81eff9429e8bd249 - ae745ee07d0445e4a11ef46e1cfff59c ae745ee07d0445e4a11ef46e1cfff59c] Exception during message handling: keystoneauth1.exceptions.connection.ConnectFailure: Unable to establish connection to http://10.5.2.231:9696/v2.0/ports/1eaef00a-e73f-4c04-a60d-0fd5438ea807: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response',))

Investigating that, it was found that region_name needed to be added to the [neutron] section in nova.conf of the nova-computes as well; with that in place the problem is addressed and migrations succeed.

However, I did not find any specific mention of the region_name config of the [neutron] section in the Nova release notes. The only relevant mention I found of it in the code is in [0], but that has been removed in Train. Moreover, that code is not invoked in the first error (the port binding one). Instead, the code goes through [1], which hasn't changed since Stein but picks up the region_name parameter added to [neutron]. Therefore the parameter region_name must be added to the [placement] section of nova-computes, and to the [neutron] section of nova-computes and nova-cloud-controllers, to address this issue.

[0] https://github.com/openstack/nova/blob/cde42879a497cd2b91f0cf926e0417fda07b3c31/nova/network/neutronv2/api.py#L193
[1] https://github.com/openstack/nova/blob/cde42879a497cd2b91f0cf926e0417fda07b3c31/nova/network/neutronv2/api.py#L214
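For illustration, a minimal sketch of the [placement] change described above for the nova-compute units' nova.conf; "RegionTwo" is a placeholder region name and the auth options are abbreviated, neither is taken from this bug:

    [placement]
    # os_region_name was removed from this section in Rocky; region_name replaces it
    # ("RegionTwo" is a placeholder; use the region this compute node belongs to)
    region_name = RegionTwo
    # ... existing keystone auth options (auth_url, username, password, ...) unchanged ...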
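Similarly, a sketch of the [neutron] change applied to nova.conf on both the nova-cloud-controller and nova-compute units (same placeholder region name as above):

    [neutron]
    # pin nova's neutron client to the local region's neutron-api endpoint,
    # so the conductor and computes stop picking an endpoint from another region
    region_name = RegionTwo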
2020-11-10 12:13:07 Edward Hope-Morley charm-nova-cloud-controller: milestone 21.01
2020-11-10 12:13:11 Edward Hope-Morley charm-nova-compute: milestone 21.01
2020-11-10 12:33:36 OpenStack Infra charm-nova-compute: status In Progress Fix Committed
2020-11-10 12:34:52 OpenStack Infra charm-nova-cloud-controller: status In Progress Fix Committed
2020-11-24 11:33:55 Aurelien Lourot charm-nova-cloud-controller: milestone 21.01 20.10
2020-11-24 11:33:56 Aurelien Lourot charm-nova-compute: milestone 21.01 20.10
2020-11-24 11:35:24 Aurelien Lourot charm-nova-cloud-controller: status Fix Committed Fix Released
2020-11-24 11:35:29 Aurelien Lourot charm-nova-cloud-controller: status Fix Released Fix Committed
2020-11-24 11:35:31 Aurelien Lourot charm-nova-compute: status Fix Committed Fix Released
2020-11-24 15:36:59 Aurelien Lourot charm-nova-cloud-controller: status Fix Committed Fix Released