missing region config

Bug #1903210 reported by Rodrigo Barbieri
This bug affects 1 person
Affects | Status | Importance | Assigned to | Milestone
OpenStack Nova Cloud Controller Charm | Fix Released | Undecided | Rodrigo Barbieri |
OpenStack Nova Compute Charm | Fix Released | Undecided | Rodrigo Barbieri |

Bug Description

When deploying a Stein multi-region environment, 2 issues are observed:

1)
2020-09-08 15:03:01.590 15355 INFO nova.compute.resource_tracker [req-1e5b881d-8c0e-4213-a500-8bea79877f92 - - - - -] Compute node record created for compute1.maas:compute1.maas with uuid: 246ba562-1f2e-4296-a69a-53bceda49739
2020-09-08 15:03:02.358 15355 ERROR nova.scheduler.client.report [req-1e5b881d-8c0e-4213-a500-8bea79877f92 - - - - -] [req-af77bcff-3de0-47ef-98ac-ff4447d9aee3] Failed to create resource provider record in placement API for UUID 246ba562-1f2e-4296-a69a-53bceda49739. Got 409: {"errors": [{"status": 409, "title": "Conflict", "detail": "There was a conflict when trying to complete your request.\n\n Conflicting resource provider name: compute1.maas already exists. ", "request_id": "req-af77bcff-3de0-47ef-98ac-ff4447d9aee3"}]}.

Investigation showed that the compute node was registering itself in the database of another region: specifically, the region of the first endpoint returned by "openstack endpoint list --service placement".
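
To illustrate why the first listed endpoint ends up being used, here is a minimal Python sketch of endpoint selection with and without a region filter. It is a simplified model that matches the behaviour observed above, not the actual keystoneauth or Nova code; the catalog entries, URLs, region names and the pick_endpoint helper are all hypothetical.

```python
from typing import Optional

# Hypothetical service catalog: two regions, each with its own placement endpoint.
CATALOG = [
    {"service": "placement", "region": "RegionOne", "url": "http://10.0.1.10:8778"},
    {"service": "placement", "region": "RegionTwo", "url": "http://10.0.2.10:8778"},
]


def pick_endpoint(catalog, service: str, region_name: Optional[str] = None) -> str:
    """Return the first endpoint matching the service and, if given, the region."""
    for entry in catalog:
        if entry["service"] != service:
            continue
        # Without a region filter every endpoint of the service matches,
        # so the first one in the catalog wins regardless of its region.
        if region_name is not None and entry["region"] != region_name:
            continue
        return entry["url"]
    raise LookupError(f"no {service} endpoint for region {region_name!r}")


# A compute node in RegionTwo with no region configured gets RegionOne's endpoint.
unfiltered = pick_endpoint(CATALOG, "placement")
# With the region set, it gets the endpoint of its own region.
filtered = pick_endpoint(CATALOG, "placement", region_name="RegionTwo")
print(unfiltered)
print(filtered)
```

In this model the unfiltered lookup returns the RegionOne URL even though the caller lives in RegionTwo, which mirrors the compute node registering against another region's placement database.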

It turns out that, according to the Rocky release notes:

"The following deprecated options have been removed from the placement group of nova.conf:

    os_region_name (use region_name instead)"

Replacing the os_region_name option with region_name allowed the compute node to talk to the correct endpoint and register the node against the correct placement database, which leads to problem 2.
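
The rename described above can be sketched in Python with the standard configparser module. This is only an illustration on an in-memory sample, assuming the file is close enough to plain INI for configparser to read it; the rename_placement_region helper and the sample values are hypothetical, and configparser does not preserve comments when writing.

```python
import configparser
import io


def rename_placement_region(conf_text: str) -> str:
    """Move [placement] os_region_name to region_name and return the new text."""
    # interpolation=None keeps '%' literal; strict=False tolerates repeated keys.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)
    if parser.has_section("placement") and parser.has_option("placement", "os_region_name"):
        region = parser.get("placement", "os_region_name")
        # Do not overwrite a region_name that is already set explicitly.
        if not parser.has_option("placement", "region_name"):
            parser.set("placement", "region_name", region)
        parser.remove_option("placement", "os_region_name")
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


# Hypothetical fragment of a compute node's nova.conf.
SAMPLE = """\
[placement]
auth_type = password
os_region_name = RegionTwo
"""

updated = rename_placement_region(SAMPLE)
print(updated)
```

The sketch drops the old option entirely. The merged charm change states that its [placement] template keeps both the old and the new option, because the same template is applied to every release.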

2)
While testing migrations, it was noticed that migrations started with "openstack server migrate --live-migration" would produce the following log entry:

2020-11-05 14:34:22.723 1993 ERROR nova.network.neutronv2.api [req-0170510a-264b-441d-84ab-211ac89c5f5f dedbb29c90e94218a838fd7c6bdc8a44 7118e587a8be4a2c81eff9429e8bd249 - ae745ee07d0445e4a11ef46e1cfff59c ae745ee07d0445e4a11ef46e1cfff59c] [instance: 967c9efa-eb81-4892-af15-e148b3ab838b] Binding failed for port 3d82d547-454d-4ee7-ad5c-9d834e3afb9a and host juju-60a5dc-bionic-stein-federated-01-7.cloud.sts. Error: (404 {"NeutronError": {"type": "PortNotFound", "message": "Port 3d82d547-454d-4ee7-ad5c-9d834e3afb9a could not be found.", "detail": ""}})

while migrations with "openstack server migrate --live <host>" would result in the CLI error:

Migration pre-check error: Binding failed for port d92c626a-25d9-4ef3-981a-15c430cdf9c8, please check neutron logs for more information. (HTTP 400) (Request-ID: req-9445bf06-f034-4b39-ac5b-e29485c9f5d2)

Investigation showed that the conductor was talking to the wrong neutron-api endpoint. Adding region_name to the [neutron] section in nova.conf on the nova-cloud-controller units addressed that problem, but led to another one later in the migration:

2020-11-05 20:19:16.238 15881 ERROR oslo_messaging.rpc.server [req-c878d5a1-4167-4aa9-8a88-c77dfe77940a dedbb29c90e94218a838fd7c6bdc8a44 7118e587a8be4a2c81eff9429e8bd249 - ae745ee07d0445e4a11ef46e1cfff59c ae745ee07d0445e4a11ef46e1cfff59c] Exception during message handling: keystoneauth1.exceptions.connection.ConnectFailure: Unable to establish connection to http://10.5.2.231:9696/v2.0/ports/1eaef00a-e73f-4c04-a60d-0fd5438ea807: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response',))

Further investigation showed that region_name also needed to be added to the [neutron] section in nova.conf on the nova-compute units. With that change the problem is addressed and migrations succeed.

However, I did not find any specific mention of the region_name option for the [neutron] section in the Nova release notes. The only relevant mention of it I found in the code is in [0], but that has been removed in Train. Moreover, that code is not invoked in the first error (the port binding one). Instead, the code goes through [1], which has not changed since Stein but picks up the added region_name parameter in [neutron].

Therefore, the region_name parameter must be added to the [placement] section on nova-compute units, and to the [neutron] section on both nova-compute and nova-cloud-controller units, to address this issue.
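
As a rough way to check a deployment against this summary, the following Python sketch reports which sections of a nova.conf still lack region_name for a given role. The role names in REQUIRED, the missing_region_sections helper and the sample configuration are assumptions made for illustration; the file is again assumed to be readable by configparser.

```python
import configparser

# Sections that need region_name per role, as summarised in the description.
REQUIRED = {
    "nova-compute": ["placement", "neutron"],
    "nova-cloud-controller": ["neutron"],
}


def missing_region_sections(conf_text: str, role: str) -> list:
    """Return the sections of conf_text that lack region_name for this role."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(conf_text)
    # has_option() returns False for a missing section as well as a missing key.
    return [
        section
        for section in REQUIRED[role]
        if not parser.has_option(section, "region_name")
    ]


# Hypothetical compute node config that still uses the removed option.
COMPUTE_CONF = """\
[placement]
os_region_name = RegionTwo

[neutron]
auth_type = password
"""

print(missing_region_sections(COMPUTE_CONF, "nova-compute"))
```

For the sample above both sections are reported, since os_region_name does not count as region_name and [neutron] has no region set at all.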

[0] https://github.com/openstack/nova/blob/cde42879a497cd2b91f0cf926e0417fda07b3c31/nova/network/neutronv2/api.py#L193

[1] https://github.com/openstack/nova/blob/cde42879a497cd2b91f0cf926e0417fda07b3c31/nova/network/neutronv2/api.py#L214

Revision history for this message
Rodrigo Barbieri (rodrigo-barbieri2010) wrote :

fix to nova-cloud-controller: https://review.opendev.org/#/c/761672

fix to nova-compute: https://review.opendev.org/#/c/761671

Changed in charm-nova-cloud-controller:
assignee: nobody → Rodrigo Barbieri (rodrigo-barbieri2010)
Changed in charm-nova-compute:
assignee: nobody → Rodrigo Barbieri (rodrigo-barbieri2010)
Revision history for this message
Rodrigo Barbieri (rodrigo-barbieri2010) wrote :

The region_name property for the [neutron] section comes from https://github.com/openstack/keystoneauth

Revision history for this message
Rodrigo Barbieri (rodrigo-barbieri2010) wrote :

fixed typos in description

description: updated
Changed in charm-nova-cloud-controller:
milestone: none → 21.01
Changed in charm-nova-compute:
milestone: none → 21.01
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-nova-compute (master)

Reviewed: https://review.opendev.org/761671
Committed: https://git.openstack.org/cgit/openstack/charm-nova-compute/commit/?id=835d4b167ab7db41962239eaabb56f492fa34cde
Submitter: Zuul
Branch: master

commit 835d4b167ab7db41962239eaabb56f492fa34cde
Author: Rodrigo Barbieri <email address hidden>
Date: Thu Nov 5 18:32:10 2020 -0300

    Add/update region config in nova.conf

    On multi-region deployments, Nova may talk to the wrong
    neutron endpoint (from the wrong region) if the region
    is unspecified.

    On Rocky+ it will also require updating the
    os_region_name config to region_name, as os_region_name
    has been deprecated, otherwise Nova will talk to the wrong
    placement endpoint as well.

    This fix addresses the issue where nova-compute will not
    register the node to the correct nova_api/placement
    database, and will also not be able to complete live-migrations.

    Given that the template for the [placement] section is
    applied to every release, it is included both old and
    new config options.

    Change-Id: I9500ba400d55e6f1bc11f2ba05b25b4714cda578
    Closes-bug: #1903210

Changed in charm-nova-compute:
status: In Progress → Fix Committed
Changed in charm-nova-cloud-controller:
status: In Progress → Fix Committed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-nova-cloud-controller (master)

Reviewed: https://review.opendev.org/761672
Committed: https://git.openstack.org/cgit/openstack/charm-nova-cloud-controller/commit/?id=25da3180b53abc9843cba37b12e08258de8644bf
Submitter: Zuul
Branch: master

commit 25da3180b53abc9843cba37b12e08258de8644bf
Author: Rodrigo Barbieri <email address hidden>
Date: Thu Nov 5 18:26:51 2020 -0300

    Add region config to [neutron] in nova.conf

    On multi-region deployments, Nova may talk to the wrong
    neutron endpoint (from the wrong region) if the region
    is unspecified.

    The issue that requires this fix is most apparent when
    doing live migrations, as the Conductor tries to call
    Neutron to perform port bindings.

    Closes-bug: #1903210
    Change-Id: Id118f6a5794de298c31debf6e31ffe92271982d1

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-nova-cloud-controller (stable/20.10)

Fix proposed to branch: stable/20.10
Review: https://review.opendev.org/762152

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-nova-compute (stable/20.10)

Fix proposed to branch: stable/20.10
Review: https://review.opendev.org/762153

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-nova-compute (stable/20.10)

Reviewed: https://review.opendev.org/762153
Committed: https://opendev.org/openstack/charm-nova-compute/commit/7a178053f1cb915f3fe50f328c7a2cf90c03f565
Submitter: Zuul
Branch: stable/20.10

commit 7a178053f1cb915f3fe50f328c7a2cf90c03f565
Author: Rodrigo Barbieri <email address hidden>
Date: Thu Nov 5 18:32:10 2020 -0300

    Add/update region config in nova.conf

    On multi-region deployments, Nova may talk to the wrong
    neutron endpoint (from the wrong region) if the region
    is unspecified.

    On Rocky+ it will also require updating the
    os_region_name config to region_name, as os_region_name
    has been deprecated, otherwise Nova will talk to the wrong
    placement endpoint as well.

    This fix addresses the issue where nova-compute will not
    register the node to the correct nova_api/placement
    database, and will also not be able to complete live-migrations.

    Given that the template for the [placement] section is
    applied to every release, it is included both old and
    new config options.

    Change-Id: I9500ba400d55e6f1bc11f2ba05b25b4714cda578
    Closes-bug: #1903210
    (cherry picked from commit 835d4b167ab7db41962239eaabb56f492fa34cde)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-nova-cloud-controller (stable/20.10)

Reviewed: https://review.opendev.org/762152
Committed: https://opendev.org/openstack/charm-nova-cloud-controller/commit/420da9c6e791368364ae750b801d08973a34460e
Submitter: Zuul
Branch: stable/20.10

commit 420da9c6e791368364ae750b801d08973a34460e
Author: Rodrigo Barbieri <email address hidden>
Date: Thu Nov 5 18:26:51 2020 -0300

    Add region config to [neutron] in nova.conf

    On multi-region deployments, Nova may talk to the wrong
    neutron endpoint (from the wrong region) if the region
    is unspecified.

    The issue that requires this fix is most apparent when
    doing live migrations, as the Conductor tries to call
    Neutron to perform port bindings.

    Closes-bug: #1903210
    Change-Id: Id118f6a5794de298c31debf6e31ffe92271982d1
    (cherry picked from commit 25da3180b53abc9843cba37b12e08258de8644bf)

Changed in charm-nova-cloud-controller:
milestone: 21.01 → 20.10
Changed in charm-nova-compute:
milestone: 21.01 → 20.10
Changed in charm-nova-cloud-controller:
status: Fix Committed → Fix Released
status: Fix Released → Fix Committed
Changed in charm-nova-compute:
status: Fix Committed → Fix Released
Changed in charm-nova-cloud-controller:
status: Fix Committed → Fix Released