test_*_with_qos_min_bw_allocation fails in the nova-multi-cell job with: nova.exception.MigrationPreCheckError: Migration pre-check error: Failed to create port bindings for host <host>

Bug #1907522 reported by melanie witt
This bug affects 2 people
Affects                     Status   Importance   Assigned to      Milestone
OpenStack Compute (nova)             High         Balazs Gibizer
  Ussuri                             Medium       Balazs Gibizer
  Victoria                           Medium       Balazs Gibizer

Bug Description

Seen in the gate on the master branch, the test_migrate_with_qos_min_bw_allocation and test_resize_with_qos_min_bw_allocation tests that were recently added to tempest [1] are failing in the nova-multi-cell job with:

"ERROR oslo_messaging.rpc.server nova.exception.MigrationPreCheckError: Migration pre-check error: Failed to create port bindings for host ubuntu-focal-ovh-bhs1-0022118086"

with the following traceback in screen-n-super-cond.txt [2]:

Dec 08 19:11:12.292344 ubuntu-focal-ovh-bhs1-0022118085 nova-conductor[93226]: ERROR oslo_messaging.rpc.server Traceback (most recent call last):
Dec 08 19:11:12.292344 ubuntu-focal-ovh-bhs1-0022118085 nova-conductor[93226]: ERROR oslo_messaging.rpc.server File "/opt/stack/nova/nova/conductor/tasks/cross_cell_migrate.py", line 266, in _create_port_bindings
Dec 08 19:11:12.292344 ubuntu-focal-ovh-bhs1-0022118085 nova-conductor[93226]: ERROR oslo_messaging.rpc.server self._bindings_by_port_id = self.network_api.bind_ports_to_host(
Dec 08 19:11:12.292344 ubuntu-focal-ovh-bhs1-0022118085 nova-conductor[93226]: ERROR oslo_messaging.rpc.server File "/opt/stack/nova/nova/network/neutron.py", line 1352, in bind_ports_to_host
Dec 08 19:11:12.292344 ubuntu-focal-ovh-bhs1-0022118085 nova-conductor[93226]: ERROR oslo_messaging.rpc.server raise exception.PortBindingFailed(port_id=port_id)
Dec 08 19:11:12.292344 ubuntu-focal-ovh-bhs1-0022118085 nova-conductor[93226]: ERROR oslo_messaging.rpc.server nova.exception.PortBindingFailed: Binding failed for port d37d5be3-df95-4842-bddf-2058d8e452fc, please check neutron logs for more information.
Dec 08 19:11:12.292344 ubuntu-focal-ovh-bhs1-0022118085 nova-conductor[93226]: ERROR oslo_messaging.rpc.server
Dec 08 19:11:12.292344 ubuntu-focal-ovh-bhs1-0022118085 nova-conductor[93226]: ERROR oslo_messaging.rpc.server During handling of the above exception, another exception occurred:
Dec 08 19:11:12.292344 ubuntu-focal-ovh-bhs1-0022118085 nova-conductor[93226]: ERROR oslo_messaging.rpc.server
Dec 08 19:11:12.292344 ubuntu-focal-ovh-bhs1-0022118085 nova-conductor[93226]: ERROR oslo_messaging.rpc.server Traceback (most recent call last):
Dec 08 19:11:12.292344 ubuntu-focal-ovh-bhs1-0022118085 nova-conductor[93226]: ERROR oslo_messaging.rpc.server File "/opt/stack/nova/nova/conductor/manager.py", line 354, in _cold_migrate
Dec 08 19:11:12.292344 ubuntu-focal-ovh-bhs1-0022118085 nova-conductor[93226]: ERROR oslo_messaging.rpc.server task.execute()
Dec 08 19:11:12.292344 ubuntu-focal-ovh-bhs1-0022118085 nova-conductor[93226]: ERROR oslo_messaging.rpc.server File "/opt/stack/nova/nova/conductor/tasks/base.py", line 26, in wrap
Dec 08 19:11:12.292344 ubuntu-focal-ovh-bhs1-0022118085 nova-conductor[93226]: ERROR oslo_messaging.rpc.server self.rollback(ex)
Dec 08 19:11:12.292344 ubuntu-focal-ovh-bhs1-0022118085 nova-conductor[93226]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.8/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
Dec 08 19:11:12.292344 ubuntu-focal-ovh-bhs1-0022118085 nova-conductor[93226]: ERROR oslo_messaging.rpc.server self.force_reraise()
Dec 08 19:11:12.292344 ubuntu-focal-ovh-bhs1-0022118085 nova-conductor[93226]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.8/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
Dec 08 19:11:12.292344 ubuntu-focal-ovh-bhs1-0022118085 nova-conductor[93226]: ERROR oslo_messaging.rpc.server six.reraise(self.type_, self.value, self.tb)
Dec 08 19:11:12.292344 ubuntu-focal-ovh-bhs1-0022118085 nova-conductor[93226]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.8/dist-packages/six.py", line 703, in reraise
Dec 08 19:11:12.292344 ubuntu-focal-ovh-bhs1-0022118085 nova-conductor[93226]: ERROR oslo_messaging.rpc.server raise value
Dec 08 19:11:12.292344 ubuntu-focal-ovh-bhs1-0022118085 nova-conductor[93226]: ERROR oslo_messaging.rpc.server File "/opt/stack/nova/nova/conductor/tasks/base.py", line 23, in wrap
Dec 08 19:11:12.292344 ubuntu-focal-ovh-bhs1-0022118085 nova-conductor[93226]: ERROR oslo_messaging.rpc.server return original(self)
Dec 08 19:11:12.292344 ubuntu-focal-ovh-bhs1-0022118085 nova-conductor[93226]: ERROR oslo_messaging.rpc.server File "/opt/stack/nova/nova/conductor/tasks/base.py", line 40, in execute
Dec 08 19:11:12.292344 ubuntu-focal-ovh-bhs1-0022118085 nova-conductor[93226]: ERROR oslo_messaging.rpc.server return self._execute()
Dec 08 19:11:12.292344 ubuntu-focal-ovh-bhs1-0022118085 nova-conductor[93226]: ERROR oslo_messaging.rpc.server File "/opt/stack/nova/nova/conductor/tasks/migrate.py", line 295, in _execute
Dec 08 19:11:12.292344 ubuntu-focal-ovh-bhs1-0022118085 nova-conductor[93226]: ERROR oslo_messaging.rpc.server task.execute()
Dec 08 19:11:12.292344 ubuntu-focal-ovh-bhs1-0022118085 nova-conductor[93226]: ERROR oslo_messaging.rpc.server File "/opt/stack/nova/nova/conductor/tasks/base.py", line 26, in wrap
Dec 08 19:11:12.292344 ubuntu-focal-ovh-bhs1-0022118085 nova-conductor[93226]: ERROR oslo_messaging.rpc.server self.rollback(ex)
Dec 08 19:11:12.292344 ubuntu-focal-ovh-bhs1-0022118085 nova-conductor[93226]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.8/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
Dec 08 19:11:12.292344 ubuntu-focal-ovh-bhs1-0022118085 nova-conductor[93226]: ERROR oslo_messaging.rpc.server self.force_reraise()
Dec 08 19:11:12.292344 ubuntu-focal-ovh-bhs1-0022118085 nova-conductor[93226]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.8/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
Dec 08 19:11:12.292344 ubuntu-focal-ovh-bhs1-0022118085 nova-conductor[93226]: ERROR oslo_messaging.rpc.server six.reraise(self.type_, self.value, self.tb)
Dec 08 19:11:12.292344 ubuntu-focal-ovh-bhs1-0022118085 nova-conductor[93226]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.8/dist-packages/six.py", line 703, in reraise
Dec 08 19:11:12.292344 ubuntu-focal-ovh-bhs1-0022118085 nova-conductor[93226]: ERROR oslo_messaging.rpc.server raise value
Dec 08 19:11:12.292344 ubuntu-focal-ovh-bhs1-0022118085 nova-conductor[93226]: ERROR oslo_messaging.rpc.server File "/opt/stack/nova/nova/conductor/tasks/base.py", line 23, in wrap
Dec 08 19:11:12.292344 ubuntu-focal-ovh-bhs1-0022118085 nova-conductor[93226]: ERROR oslo_messaging.rpc.server return original(self)
Dec 08 19:11:12.292344 ubuntu-focal-ovh-bhs1-0022118085 nova-conductor[93226]: ERROR oslo_messaging.rpc.server File "/opt/stack/nova/nova/conductor/tasks/base.py", line 40, in execute
Dec 08 19:11:12.292344 ubuntu-focal-ovh-bhs1-0022118085 nova-conductor[93226]: ERROR oslo_messaging.rpc.server return self._execute()
Dec 08 19:11:12.292344 ubuntu-focal-ovh-bhs1-0022118085 nova-conductor[93226]: ERROR oslo_messaging.rpc.server File "/opt/stack/nova/nova/conductor/tasks/cross_cell_migrate.py", line 838, in _execute
Dec 08 19:11:12.292344 ubuntu-focal-ovh-bhs1-0022118085 nova-conductor[93226]: ERROR oslo_messaging.rpc.server target_cell_migration = self._prep_resize_at_dest(
Dec 08 19:11:12.292344 ubuntu-focal-ovh-bhs1-0022118085 nova-conductor[93226]: ERROR oslo_messaging.rpc.server File "/opt/stack/nova/nova/conductor/tasks/cross_cell_migrate.py", line 727, in _prep_resize_at_dest
Dec 08 19:11:12.292344 ubuntu-focal-ovh-bhs1-0022118085 nova-conductor[93226]: ERROR oslo_messaging.rpc.server target_cell_migration_context = verify_task.execute()
Dec 08 19:11:12.292344 ubuntu-focal-ovh-bhs1-0022118085 nova-conductor[93226]: ERROR oslo_messaging.rpc.server File "/opt/stack/nova/nova/conductor/tasks/base.py", line 26, in wrap
Dec 08 19:11:12.292344 ubuntu-focal-ovh-bhs1-0022118085 nova-conductor[93226]: ERROR oslo_messaging.rpc.server self.rollback(ex)
Dec 08 19:11:12.292344 ubuntu-focal-ovh-bhs1-0022118085 nova-conductor[93226]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.8/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
Dec 08 19:11:12.292344 ubuntu-focal-ovh-bhs1-0022118085 nova-conductor[93226]: ERROR oslo_messaging.rpc.server self.force_reraise()
Dec 08 19:11:12.292344 ubuntu-focal-ovh-bhs1-0022118085 nova-conductor[93226]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.8/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
Dec 08 19:11:12.292344 ubuntu-focal-ovh-bhs1-0022118085 nova-conductor[93226]: ERROR oslo_messaging.rpc.server six.reraise(self.type_, self.value, self.tb)
Dec 08 19:11:12.292344 ubuntu-focal-ovh-bhs1-0022118085 nova-conductor[93226]: ERROR oslo_messaging.rpc.server File "/usr/local/lib/python3.8/dist-packages/six.py", line 703, in reraise
Dec 08 19:11:12.292344 ubuntu-focal-ovh-bhs1-0022118085 nova-conductor[93226]: ERROR oslo_messaging.rpc.server raise value
Dec 08 19:11:12.292344 ubuntu-focal-ovh-bhs1-0022118085 nova-conductor[93226]: ERROR oslo_messaging.rpc.server File "/opt/stack/nova/nova/conductor/tasks/base.py", line 23, in wrap
Dec 08 19:11:12.292344 ubuntu-focal-ovh-bhs1-0022118085 nova-conductor[93226]: ERROR oslo_messaging.rpc.server return original(self)
Dec 08 19:11:12.292344 ubuntu-focal-ovh-bhs1-0022118085 nova-conductor[93226]: ERROR oslo_messaging.rpc.server File "/opt/stack/nova/nova/conductor/tasks/base.py", line 40, in execute
Dec 08 19:11:12.292344 ubuntu-focal-ovh-bhs1-0022118085 nova-conductor[93226]: ERROR oslo_messaging.rpc.server return self._execute()
Dec 08 19:11:12.292344 ubuntu-focal-ovh-bhs1-0022118085 nova-conductor[93226]: ERROR oslo_messaging.rpc.server File "/opt/stack/nova/nova/conductor/tasks/cross_cell_migrate.py", line 327, in _execute
Dec 08 19:11:12.292344 ubuntu-focal-ovh-bhs1-0022118085 nova-conductor[93226]: ERROR oslo_messaging.rpc.server self._create_port_bindings()
Dec 08 19:11:12.297758 ubuntu-focal-ovh-bhs1-0022118085 nova-conductor[93226]: ERROR oslo_messaging.rpc.server File "/opt/stack/nova/nova/conductor/tasks/cross_cell_migrate.py", line 269, in _create_port_bindings
Dec 08 19:11:12.297758 ubuntu-focal-ovh-bhs1-0022118085 nova-conductor[93226]: ERROR oslo_messaging.rpc.server raise exception.MigrationPreCheckError(reason=_(
Dec 08 19:11:12.297758 ubuntu-focal-ovh-bhs1-0022118085 nova-conductor[93226]: ERROR oslo_messaging.rpc.server nova.exception.MigrationPreCheckError: Migration pre-check error: Failed to create port bindings for host ubuntu-focal-ovh-bhs1-0022118086
Dec 08 19:11:12.297758 ubuntu-focal-ovh-bhs1-0022118085 nova-conductor[93226]: ERROR oslo_messaging.rpc.server

and the corresponding neutron failure:

"ERROR neutron.plugins.ml2.managers neutron_lib.exceptions.placement.UnknownResourceProvider: No such resource provider known by Neutron"

with the traceback in screen-q-svc.txt [3]:

Dec 08 19:11:11.521492 ubuntu-focal-ovh-bhs1-0022118085 neutron-server[85124]: ERROR neutron.plugins.ml2.managers [req-1d6e98ea-4f98-4a73-a4f7-470561c7e03b req-0389a6b9-c4f4-448e-baa9-8f31bf231926 service neutron] Failed to bind port d37d5be3-df95-4842-bddf-2058d8e452fc on host ubuntu-focal-ovh-bhs1-0022118086 allocated on resource provider c7942897-51a0-56e7-86e1-4727af85911d, because no mechanism driver reports being responsible
Dec 08 19:11:11.526334 ubuntu-focal-ovh-bhs1-0022118085 neutron-server[85124]: ERROR neutron.plugins.ml2.managers [req-1d6e98ea-4f98-4a73-a4f7-470561c7e03b req-0389a6b9-c4f4-448e-baa9-8f31bf231926 service neutron] Mechanism driver openvswitch failed in bind_port: neutron_lib.exceptions.placement.UnknownResourceProvider: No such resource provider known by Neutron: c7942897-51a0-56e7-86e1-4727af85911d
Dec 08 19:11:11.526334 ubuntu-focal-ovh-bhs1-0022118085 neutron-server[85124]: ERROR neutron.plugins.ml2.managers Traceback (most recent call last):
Dec 08 19:11:11.526334 ubuntu-focal-ovh-bhs1-0022118085 neutron-server[85124]: ERROR neutron.plugins.ml2.managers File "/opt/stack/neutron/neutron/plugins/ml2/managers.py", line 923, in _bind_port_level
Dec 08 19:11:11.526334 ubuntu-focal-ovh-bhs1-0022118085 neutron-server[85124]: ERROR neutron.plugins.ml2.managers drivers=[self._infer_driver_from_allocation(
Dec 08 19:11:11.526334 ubuntu-focal-ovh-bhs1-0022118085 neutron-server[85124]: ERROR neutron.plugins.ml2.managers File "/opt/stack/neutron/neutron/plugins/ml2/managers.py", line 973, in _infer_driver_from_allocation
Dec 08 19:11:11.526334 ubuntu-focal-ovh-bhs1-0022118085 neutron-server[85124]: ERROR neutron.plugins.ml2.managers raise place_exc.UnknownResourceProvider(
Dec 08 19:11:11.526334 ubuntu-focal-ovh-bhs1-0022118085 neutron-server[85124]: ERROR neutron.plugins.ml2.managers neutron_lib.exceptions.placement.UnknownResourceProvider: No such resource provider known by Neutron: c7942897-51a0-56e7-86e1-4727af85911d
Dec 08 19:11:11.526334 ubuntu-focal-ovh-bhs1-0022118085 neutron-server[85124]: ERROR neutron.plugins.ml2.managers

[1] https://review.opendev.org/c/openstack/tempest/+/694539
[2] https://zuul.opendev.org/t/openstack/build/d7c15477b23e4a28b400c30277b3b3e3/log/controller/logs/screen-n-super-cond.txt?severity=4#4180
[3] https://zuul.opendev.org/t/openstack/build/d7c15477b23e4a28b400c30277b3b3e3/log/controller/logs/screen-q-svc.txt#72367

melanie witt (melwitt) wrote :
Lajos Katona (lajos-katona) wrote :

I'll check the neutron side, thanks for reporting.

Balazs Gibizer (balazs-gibizer) wrote :

The test result in https://review.opendev.org/c/openstack/nova/+/766364 shows that after fixing https://bugs.launchpad.net/nova/+bug/1907511 on stable, we now hit this bug on stable too.

Balazs Gibizer (balazs-gibizer) wrote :

Quick analysis:

The run I used for the analysis [1]:

Port binding on the source host before the migration:

Attempting to bind port 0f0f2b61-ea5b-4183-af5e-7c71901283cf on host ubuntu-focal-ovh-bhs1-0022143433 for vnic_type normal with profile {"allocation": "c0328279-3748-5bb9-9c42-1ae95a6f8537"}

Failing port binding on the dest host:

Attempting to bind port 0f0f2b61-ea5b-4183-af5e-7c71901283cf on host ubuntu-focal-ovh-bhs1-0022143424 for vnic_type normal with profile {"allocation": "c0328279-3748-5bb9-9c42-1ae95a6f8537"}

From this it is clear that nova does not update the "allocation" key of the binding profile (the UUID of the resource provider, RP, that the bandwidth is allocated from) when it tries to bind the port to the destination host. Neutron therefore cannot find that RP on the host the binding points to, and the port binding fails.
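A simplified model of the failing neutron-side check may make the failure mode concrete. This is a sketch under stated assumptions, not neutron's actual `_infer_driver_from_allocation`: it assumes each host reports a set of resource provider UUIDs, and that binding succeeds only if the RP named in the profile's "allocation" key belongs to the target host. `KNOWN_RPS_BY_HOST`, the host names, the destination RP UUID and the local `UnknownResourceProvider` class are hypothetical stand-ins; only the source RP UUID is taken from the analysed run.

```python
class UnknownResourceProvider(Exception):
    """Hypothetical stand-in for neutron_lib's UnknownResourceProvider."""


# Hypothetical data: the resource providers each host's agent reports.
KNOWN_RPS_BY_HOST = {
    "source-host": {"c0328279-3748-5bb9-9c42-1ae95a6f8537"},
    "dest-host": {"11111111-2222-3333-4444-555555555555"},
}


def infer_driver_from_allocation(host, binding_profile, known_rps_by_host):
    """Return the allocated RP UUID if it belongs to `host`, else raise.

    Simplified model: binding only succeeds if the RP named in the
    binding profile's "allocation" key is reported for the target host.
    """
    rp_uuid = binding_profile.get("allocation")
    if rp_uuid not in known_rps_by_host.get(host, set()):
        raise UnknownResourceProvider(
            "No such resource provider known by Neutron: %s" % rp_uuid)
    return rp_uuid


# The profile nova sent for both the source and the dest binding.
profile = {"allocation": "c0328279-3748-5bb9-9c42-1ae95a6f8537"}

# Binding on the source host succeeds...
infer_driver_from_allocation("source-host", profile, KNOWN_RPS_BY_HOST)

# ...but re-using the unchanged profile for the destination host fails.
try:
    infer_driver_from_allocation("dest-host", profile, KNOWN_RPS_BY_HOST)
except UnknownResourceProvider as exc:
    # prints: No such resource provider known by Neutron: c0328279-3748-5bb9-9c42-1ae95a6f8537
    print(exc)
```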

[1] https://zuul.opendev.org/t/openstack/build/94291078ad4b4df99922be7e41ffdee6/logs

Balazs Gibizer (balazs-gibizer) wrote :

It seems that the qos handling is missing from the cross cell resize code path. The qos support and the cross cell support were implemented in parallel, so the qos work missed the cross cell case.

The scheduling and placement resource allocation handling is common to same cell and cross cell resize, but after that point the two code paths diverge. For same cell resize, the neutron port binding is properly updated in ComputeManager.prep_resize(). During cross cell resize, however, the PrepResizeAtDestTask conductor task does not consider the destination host placement allocation when it creates the port binding for the dest host in neutron.
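The step the cross cell path skips can be sketched as rewriting the binding profile from the destination allocation. This is a hypothetical helper, not nova's code: it assumes a mapping from each port UUID (the requester of the port's resource request group) to the list of RP UUIDs that fulfil that group in the destination allocation, and that a port's bandwidth comes from a single RP. The destination RP UUID is a placeholder.

```python
import copy


def build_dest_binding_profile(port_id, source_profile, provider_mappings):
    """Return a binding profile suitable for the destination host.

    `provider_mappings` (assumed shape) maps a port UUID to the list of
    resource provider UUIDs fulfilling its request group in the
    *destination* allocation. Ports without a resource request keep
    their profile unchanged.
    """
    # Copy so the source host's binding profile is left untouched.
    profile = copy.deepcopy(source_profile)
    if port_id in provider_mappings:
        # Assumption: the port's bandwidth request is served by one RP.
        profile["allocation"] = provider_mappings[port_id][0]
    return profile


port_id = "0f0f2b61-ea5b-4183-af5e-7c71901283cf"
source_profile = {"allocation": "c0328279-3748-5bb9-9c42-1ae95a6f8537"}
# Hypothetical destination RP chosen by the scheduler.
dest_mappings = {port_id: ["11111111-2222-3333-4444-555555555555"]}

print(build_dest_binding_profile(port_id, source_profile, dest_mappings))
# {'allocation': '11111111-2222-3333-4444-555555555555'}
```

Copying the profile rather than mutating it keeps the source binding valid, which matters if the resize is later reverted.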

The proposed way forward is to:
1) document that qos is not supported in the cross cell resize case
2) disable the qos resize tempest test for the nova-multi-cell job

then later:

3) implement qos handling for cross cell resize
4) re-enable the tempest test

Balazs Gibizer (balazs-gibizer) wrote :

A patch has been proposed to master to unblock the gate: https://review.opendev.org/c/openstack/nova/+/766471

Changed in nova:
assignee: nobody → Balazs Gibizer (balazs-gibizer)
importance: Undecided → High
status: New → Triaged
Balazs Gibizer (balazs-gibizer) wrote :

The gate unblocking patch has been merged: https://review.opendev.org/c/openstack/nova/+/766471

The next step is to avoid scheduling the migration to another cell if there are qos ports attached to the server. I will work on this.

Balazs Gibizer (balazs-gibizer) wrote :
Balazs Gibizer (balazs-gibizer) wrote :

Two patches have been proposed on master to settle this:
functional reproduction: https://review.opendev.org/c/openstack/nova/+/766791
a fix that forces nova to fall back to same-cell resize if there are ports with a resource request attached to the server: https://review.opendev.org/c/openstack/nova/+/766925
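The fallback decision can be sketched roughly as follows. This is a hypothetical illustration, not the code in the linked review: `allow_cross_cell_resize` and its parameters are assumed names, and ports are modelled as neutron port dicts where a non-empty "resource_request" (for example from a QoS minimum bandwidth rule) keeps the resize in the source cell.

```python
def allow_cross_cell_resize(cross_cell_allowed_by_policy, ports):
    """Decide whether a resize may be scheduled to another cell.

    `ports` is assumed to be a list of neutron port dicts. Any port with
    a non-empty "resource_request" forces a same-cell resize, because
    the cross cell path does not update the binding "allocation".
    """
    if not cross_cell_allowed_by_policy:
        return False
    return not any(port.get("resource_request") for port in ports)


plain_port = {"id": "p1", "resource_request": None}
qos_port = {
    "id": "p2",
    "resource_request": {
        "resources": {"NET_BW_EGR_KILOBIT_PER_SEC": 1000},
        "required": ["CUSTOM_PHYSNET_PUBLIC", "CUSTOM_VNIC_TYPE_NORMAL"],
    },
}

print(allow_cross_cell_resize(True, [plain_port]))            # True
print(allow_cross_cell_resize(True, [plain_port, qos_port]))  # False
```

Falling back to same-cell resize keeps the migration possible while avoiding the unsupported path, rather than failing the request outright.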

I will close this bug with this fix, as adding full support for cross-cell resize with qos would be more of a feature than a bug fix.

Changed in nova:
status: Triaged → In Progress
Balazs Gibizer (balazs-gibizer) wrote :
Changed in nova:
status: In Progress → Fix Released
Balazs Gibizer (balazs-gibizer) wrote :

victoria backport merged
ussuri backport proposed https://review.opendev.org/c/openstack/nova/+/773932/1

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 22.1.0

This issue was fixed in the openstack/nova 22.1.0 release.

Balazs Gibizer (balazs-gibizer) wrote :
Balazs Gibizer (balazs-gibizer) wrote :

ussuri backport landed.

The cross cell resize feature was finished in ussuri, so there is no need to backport this fix to train or earlier.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 21.2.0

This issue was fixed in the openstack/nova 21.2.0 release.

OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/nova 23.0.0.0rc1

This issue was fixed in the openstack/nova 23.0.0.0rc1 release candidate.
