[ovn] IPv6 VIPs broken with ML2/OVN

Bug #2028651 reported by Gregory Thiemonge
Affects: neutron
Status: Fix Released
Importance: High
Assigned to: Rodolfo Alonso

Bug Description

Originally reported in the Octavia launchpad: https://bugs.launchpad.net/octavia/+bug/2028524

The commit https://review.opendev.org/c/openstack/neutron/+/882588 introduced a regression in Octavia.

It adds a validate_port_binding_and_virtual_port function that raises an exception when a port:
- has a non-empty binding:host_id
- has fixed_ips/subnets
- has the VIRTUAL type (in OVN)
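The three conditions above can be sketched as a small predicate. This is a hypothetical simplification of what validate_port_binding_and_virtual_port checks, not the actual Neutron code; the dict keys mirror the Neutron port API fields:

```python
# Hypothetical simplification of the conditions checked by
# validate_port_binding_and_virtual_port(); illustrative only.
OVN_VIRTUAL = "virtual"

def fails_virtual_port_validation(port, ovn_port_type):
    """True when all three conditions above hold, i.e. when the
    validation function would raise BadRequest."""
    has_host = bool(port.get("binding:host_id"))
    has_fixed_ips = bool(port.get("fixed_ips"))
    return has_host and has_fixed_ips and ovn_port_type == OVN_VIRTUAL
```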

When we create a load balancer in Octavia (with an IPv6 VIP):

$ openstack loadbalancer create --vip-subnet ipv6-public-subnet --name lb1
+---------------------+--------------------------------------+
| Field | Value |
+---------------------+--------------------------------------+
| admin_state_up | True |
| availability_zone | None |
| created_at | 2023-07-25T07:11:25 |
| description | |
| flavor_id | None |
| id | 75cf51d2-4576-4878-8bfe-ad55584a7d76 |
| listeners | |
| name | lb1 |
| operating_status | OFFLINE |
| pools | |
| project_id | 86f57e2e56874381a0d586263fc8d900 |
| provider | amphora |
| provisioning_status | PENDING_CREATE |
| updated_at | None |
| vip_address | 2001:db8::b1 |
| vip_network_id | 2d16ac53-8438-435d-a787-e5ceb4b783be |
| vip_port_id | 83e51017-8f02-4916-bcd2-ebe0475b1ce6 |
| vip_qos_policy_id | None |
| vip_subnet_id | 813adce0-21de-44c9-958a-6967441b8623 |
| tags | |
| additional_vips | [] |
+---------------------+--------------------------------------+

The VIP port contains:

$ openstack port show 83e51017-8f02-4916-bcd2-ebe0475b1ce6
+-------------------------+--------------------------------------------------------------------------------------------------------+
| Field | Value |
+-------------------------+--------------------------------------------------------------------------------------------------------+
| admin_state_up | DOWN |
| allowed_address_pairs | |
| binding_host_id | gthiemon-devstack |
| binding_profile | |
| binding_vif_details | |
| binding_vif_type | unbound |
| binding_vnic_type | normal |
| created_at | 2023-07-25T07:11:25Z |
| data_plane_status | None |
| description | |
| device_id | lb-75cf51d2-4576-4878-8bfe-ad55584a7d76 |
| device_owner | Octavia |
| device_profile | None |
| dns_assignment | fqdn='host-2001-db8--b1.openstackgate.local.', hostname='host-2001-db8--b1', ip_address='2001:db8::b1' |
| dns_domain | |
| dns_name | |
| extra_dhcp_opts | |
| fixed_ips | ip_address='2001:db8::b1', subnet_id='813adce0-21de-44c9-958a-6967441b8623' |
| id | 83e51017-8f02-4916-bcd2-ebe0475b1ce6 |
| ip_allocation | None |
| mac_address | fa:16:3e:c9:4f:7e |
| name | octavia-lb-75cf51d2-4576-4878-8bfe-ad55584a7d76 |
| network_id | 2d16ac53-8438-435d-a787-e5ceb4b783be |
| numa_affinity_policy | None |
| port_security_enabled | True |
| project_id | 86f57e2e56874381a0d586263fc8d900 |
| propagate_uplink_status | None |
| qos_network_policy_id | None |
| qos_policy_id | None |
| resource_request | None |
| revision_number | 10 |
| security_group_ids | 7c8d8935-9445-4e74-a815-a24246af757a |
| status | DOWN |
| tags | |
| trunk_details | None |
| updated_at | 2023-07-25T07:12:14Z |
+-------------------------+--------------------------------------------------------------------------------------------------------+

The port is not bound yet has a binding_host_id, it has fixed_ips with a subnet, and another port has an allowed_address_pair with the VIP port's allocated address (so the port is a virtual port in OVN).
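The "virtual port" classification described above can be illustrated with a small helper (this mirrors the behavior described here, not Neutron's actual implementation): a port is treated as an OVN virtual port when one of its fixed IPs appears in another port's allowed_address_pairs.

```python
# Illustrative helper, not Neutron code: OVN treats a port as
# "virtual" when one of its fixed IP addresses shows up in another
# port's allowed_address_pairs.
def is_ovn_virtual_port(port, all_ports):
    fixed = {ip["ip_address"] for ip in port.get("fixed_ips", [])}
    for other in all_ports:
        if other["id"] == port["id"]:
            continue  # skip the port itself
        for pair in other.get("allowed_address_pairs", []):
            if pair["ip_address"] in fixed:
                return True
    return False
```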

Any update of this port results in a BadRequest exception:

Jul 24 03:08:07 gthiemon-devstack octavia-worker[97901]: ERROR octavia.common.base_taskflow Traceback (most recent call last):
Jul 24 03:08:07 gthiemon-devstack octavia-worker[97901]: ERROR octavia.common.base_taskflow File "/opt/stack/octavia/octavia/network/drivers/neutron/base.py", line 129, in _add_security_group_to_port
Jul 24 03:08:07 gthiemon-devstack octavia-worker[97901]: ERROR octavia.common.base_taskflow self.network_proxy.update_port(
Jul 24 03:08:07 gthiemon-devstack octavia-worker[97901]: ERROR octavia.common.base_taskflow File "/opt/stack/openstacksdk/openstack/network/v2/_proxy.py", line 2979, in update_port
Jul 24 03:08:07 gthiemon-devstack octavia-worker[97901]: ERROR octavia.common.base_taskflow return self._update(_port.Port, port, if_revision=if_revision, **attrs)
Jul 24 03:08:07 gthiemon-devstack octavia-worker[97901]: ERROR octavia.common.base_taskflow File "/opt/stack/openstacksdk/openstack/proxy.py", line 64, in check
Jul 24 03:08:07 gthiemon-devstack octavia-worker[97901]: ERROR octavia.common.base_taskflow return method(self, expected, actual, *args, **kwargs)
Jul 24 03:08:07 gthiemon-devstack octavia-worker[97901]: ERROR octavia.common.base_taskflow File "/opt/stack/openstacksdk/openstack/network/v2/_proxy.py", line 189, in _update
Jul 24 03:08:07 gthiemon-devstack octavia-worker[97901]: ERROR octavia.common.base_taskflow return res.commit(self, base_path=base_path, if_revision=if_revision)
Jul 24 03:08:07 gthiemon-devstack octavia-worker[97901]: ERROR octavia.common.base_taskflow File "/opt/stack/openstacksdk/openstack/resource.py", line 1794, in commit
Jul 24 03:08:07 gthiemon-devstack octavia-worker[97901]: ERROR octavia.common.base_taskflow return self._commit(
Jul 24 03:08:07 gthiemon-devstack octavia-worker[97901]: ERROR octavia.common.base_taskflow File "/opt/stack/openstacksdk/openstack/resource.py", line 1839, in _commit
Jul 24 03:08:07 gthiemon-devstack octavia-worker[97901]: ERROR octavia.common.base_taskflow self._translate_response(response, has_body=has_body)
Jul 24 03:08:07 gthiemon-devstack octavia-worker[97901]: ERROR octavia.common.base_taskflow File "/opt/stack/openstacksdk/openstack/resource.py", line 1278, in _translate_response
Jul 24 03:08:07 gthiemon-devstack octavia-worker[97901]: ERROR octavia.common.base_taskflow exceptions.raise_from_response(response, error_message=error_message)
Jul 24 03:08:07 gthiemon-devstack octavia-worker[97901]: ERROR octavia.common.base_taskflow File "/opt/stack/openstacksdk/openstack/exceptions.py", line 263, in raise_from_response
Jul 24 03:08:07 gthiemon-devstack octavia-worker[97901]: ERROR octavia.common.base_taskflow raise cls(
Jul 24 03:08:07 gthiemon-devstack octavia-worker[97901]: ERROR octavia.common.base_taskflow openstack.exceptions.BadRequestException: BadRequestException: 400: Client Error for url: http://192.168.1.101:9696/networking/v2.0/ports/618567c4-78c7-4398-b889-b567f6fd6aeb, Bad port request: A virtual logical switch port cannot be bound to a host.

The goal of this validation function seems to be to raise an exception when the binding_host_id is not empty, but the PortBindingUpdateVirtualPortsEvent class itself sets the binding_host_id of virtual ports:
https://opendev.org/openstack/neutron/src/commit/58c8493ff9defbb4544803ec3fc0432c0685c592/neutron/plugins/ml2/drivers/ovn/mech_driver/ovsdb/ovsdb_monitor.py#L532-L537

Interestingly, it's not 100% reproducible (around 90%), and it is not reproducible with IPv4 VIPs: with IPv4 ports, the binding_host_id is always empty.

I have a couple of questions:
- is the validation function correct? should the binding_host_id be empty for VIP ports?
- why is binding_host_id set for IPv6 VIPs but not for IPv4 VIPs?

I can provide more logs if needed

Tags: ovn
Revision history for this message
Gregory Thiemonge (gthiemonge) wrote :

In IPv4, there's a PortBindingUpdateVirtualPortsEvent DELETE event that clears the binding_host_id when the allowed_address_pair is added to the second port:

Jul 25 03:31:48 gthiemon-devstack neutron-server[549213]: DEBUG ovsdbapp.backend.ovs_idl.event [None req-57e56002-2327-4537-90cf-61b0c904fe13 None None] Matched DELETE: PortBindingUpdateVirtualPortsEvent(events=('update', 'delete'), table='Port_Binding', conditions=None, old_conditions=None), priority=20 to row=Port_Binding(mac=['fa:16:3e:c8:2f:45 172.24.4.35'], port_security=['fa:16:3e:c8:2f:45 172.24.4.35'], nat_addresses=[], type=, up=[False], virtual_parent=[], parent_port=[], requested_additional_chassis=[], options={'mcast_flood_reports': 'true', 'requested-chassis': ''}, external_ids={'name': 'octavia-lb-36cab1a4-4a6b-487d-84d0-ba037e5565ea', 'neutron:cidrs': '172.24.4.35/24', 'neutron:device_id': 'lb-36cab1a4-4a6b-487d-84d0-ba037e5565ea', 'neutron:device_owner': 'Octavia', 'neutron:network_name': 'neutron-2d16ac53-8438-435d-a787-e5ceb4b783be', 'neutron:port_capabilities': '', 'neutron:port_name': 'octavia-lb-36cab1a4-4a6b-487d-84d0-ba037e5565ea', 'neutron:project_id': '86f57e2e56874381a0d586263fc8d900', 'neutron:revision_number': '2', 'neutron:security_group_ids': '6511f970-de29-48fe-b87f-18f6988d00e8', 'neutron:subnet_pool_addr_scope4': '', 'neutron:subnet_pool_addr_scope6': '', 'neutron:vnic_type': 'normal'}, ha_chassis_group=[], additional_chassis=[], tag=[], additional_encap=[], mirror_rules=[], encap=[], datapath=6fd08134-6556-41b1-83d4-f80c4a08505f, chassis=[], tunnel_key=12, gateway_chassis=[], requested_chassis=[], logical_port=2a1763e0-5ca4-4442-aee6-7fb55eefa020) old= {{(pid=549213) matches /usr/local/lib/python3.9/site-packages/ovsdbapp/backend/ovs_idl/event.py:43}}

In IPv6, we see a similar DELETE event, but there are also additional PortBindingUpdateVirtualPortsEvent UPDATE events that occur after the last request to the neutron API:

Jul 25 03:30:07 gthiemon-devstack neutron-server[549209]: DEBUG ovsdbapp.backend.ovs_idl.event [None req-abafc63b-58db-4175-9a20-bd5b15776ef0 None None] Matched DELETE: PortBindingUpdateVirtualPortsEvent(events=('update', 'delete'), table='Port_Binding', conditions=None, old_conditions=None), priority=20 to row=Port_Binding(mac=['fa:16:3e:84:42:0c 2001:db8::172'], port_security=['fa:16:3e:84:42:0c 2001:db8::172'], nat_addresses=[], type=, up=[False], virtual_parent=[], parent_port=[], requested_additional_chassis=[], options={'mcast_flood_reports': 'true', 'requested-chassis': ''}, external_ids={'name': 'octavia-lb-9fff8992-f29e-4caa-828e-6a8b9ff494bf', 'neutron:cidrs': '2001:db8::172/64', 'neutron:device_id': 'lb-9fff8992-f29e-4caa-828e-6a8b9ff494bf', 'neutron:device_owner': 'Octavia', 'neutron:network_name': 'neutron-2d16ac53-8438-435d-a787-e5ceb4b783be', 'neutron:port_capabilities': '', 'neutron:port_name': 'octavia-lb-9fff8992-f29e-4caa-828e-6a8b9ff494bf', 'neutron:project_id': '86f57e2e56874381a0d586263fc8d900', 'neutron:revision_number': '2', 'neutron:security_group_ids': '1adcc745-52df-4577-994a-a36f2cc1c5fe', 'neutron:subnet_pool_addr_scope4': '', 'neutron:subnet_pool_addr_scope6': '', 'neutron:vnic_type': 'normal'}, ha_chassis_g...


tags: added: ovn
Revision history for this message
Bence Romsics (bence-romsics) wrote :

Hi,

Thank you for the report!

Do you maybe have a reproduction directly using the Neutron API (by directly I mean something lightweight like the openstack CLI, curl, or openstacksdk)? That would help a lot. It would also be great if we could have API request/response logs for both the working IPv4 and the problematic IPv6 cases. IIUIC we even have two cases with IPv6: one where it works and one where it fails.

Revision history for this message
Gregory Thiemonge (gthiemonge) wrote :

Hi, my "it's not 100% reproducible (around 90%)" statement was not accurate; I ran new tests.
On a VIP:
- with IPv4: no issue
- with IPv6: 100% reproducible
- with both IPv4 and IPv6 on the same port: ~50% reproducible

I now have a reproducer with openstacksdk.

Please note that the reproducer script deletes any VMs named "server0", any SGs named "sg0" or "sg1" and any ports named "vip-port0" or "vm-port0".

Basically, it:
- creates a VIP port A (down and not bound)
- creates a cirros server
- creates another port B
- attaches port B to the server
- adds an allowed_address_pair (with ip_address = port A's (VIP) address) to port B
- checks port A: binding_host_id is empty
- connects to the cirros guest and configures the VIP address on the new interface
- checks port A again: binding_host_id is set
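The first steps of that script can be sketched with openstacksdk (the SSH into the cirros guest is omitted; the resource names match the script above, while the function name and the id arguments are illustrative, not taken from the original script):

```python
def reproduce(conn, network_id, subnet_id, image_id, flavor_id):
    """Sketch of the reproducer; ``conn`` is an
    openstack.connection.Connection. The id arguments come from your
    cloud and are illustrative."""
    # 1. VIP port A: created DOWN and never bound
    vip = conn.network.create_port(
        network_id=network_id, name="vip-port0",
        fixed_ips=[{"subnet_id": subnet_id}])
    # 2./3./4. a cirros server with a second port B attached
    vm_port = conn.network.create_port(network_id=network_id,
                                       name="vm-port0")
    server = conn.compute.create_server(
        name="server0", image_id=image_id, flavor_id=flavor_id,
        networks=[{"port": vm_port.id}])
    conn.compute.wait_for_server(server)
    # 5. allowed_address_pair on B pointing at A's address makes A an
    #    OVN virtual port
    vip_ip = vip.fixed_ips[0]["ip_address"]
    conn.network.update_port(
        vm_port, allowed_address_pairs=[{"ip_address": vip_ip}])
    # 6. binding_host_id of A is still empty here; it only gets set
    #    after the guest configures vip_ip on its interface (the SSH
    #    step omitted from this sketch)
    return conn.network.get_port(vip.id)
```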

So it seems clear that an IPv6 neighbor advertisement is sent by cirros when the interface is configured; OVN catches this advertisement and sends an event to Neutron, and Neutron then sets the binding_host_id of the VIP port.

Is this the expected behavior? If so, I think the validation function is incorrect.

Revision history for this message
Gregory Thiemonge (gthiemonge) wrote :

I forgot one thing:
After running the script, vip-port0 cannot be updated:

$ openstack port set --name newname vip-port0
BadRequestException: 400: Client Error for url: http://192.168.1.101:9696/networking/v2.0/ports/01c2f81e-5cad-4c23-b61e-851e52bdbd59, Bad port request: A virtual logical switch port cannot be bound to a host.

summary: - IPv6 VIPs broken with ML2/OVN
+ [ovn] IPv6 VIPs broken with ML2/OVN
Changed in neutron:
status: New → Triaged
importance: Undecided → High
Revision history for this message
Rodolfo Alonso (rodolfo-alonso-hernandez) wrote :

Hello Gregory:

The issue is legit: the check in [1] reads the current port binding profile (once updated). If the port has a host associated, the check fails and prevents any update.

The port host is provided by the ``PortBindingUpdateVirtualPortsEvent`` class. With IPv6 addresses this is easier because, as you commented, OVN reads the ND packet. For IPv4 you can create another VM and ping the VIP of the first VM. That will trigger the ``PortBindingUpdateVirtualPortsEvent`` event and the assignment of the port host.

I'll push an update in order to run this check **only if the port host is updated**.

Regards.

[1] https://github.com/openstack/neutron/blob/d1bfc3d70ada49c9d764565a67e41491e36d44e0/neutron/common/ovn/utils.py#L1064
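The intent of the proposed fix can be sketched as a pure predicate (a rough sketch of the idea, not the actual patch): reject the update only when it sets both the device ID and the binding host at once, which is the signature of Nova binding the port to a VM.

```python
# Sketch of the fix's intent, not the actual Neutron patch: only a
# simultaneous device_id + binding:host_id update is a VM bind
# attempt. Host-only updates (OVN learning where the VIP lives) and
# device-id-only updates (Octavia tagging its load balancer) pass.
def update_is_vm_bind_attempt(original, update):
    sets_device = ("device_id" in update and
                   update["device_id"] != original.get("device_id"))
    sets_host = ("binding:host_id" in update and
                 update["binding:host_id"] != original.get("binding:host_id"))
    return sets_device and sets_host
```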

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/openstack/neutron/+/892564

Changed in neutron:
status: Triaged → In Progress
Changed in neutron:
assignee: nobody → Rodolfo Alonso (rodolfo-alonso-hernandez)
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (master)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/892564
Committed: https://opendev.org/openstack/neutron/commit/a3b00768d648742034a4e834875fc4586655787c
Submitter: "Zuul (22348)"
Branch: master

commit a3b00768d648742034a4e834875fc4586655787c
Author: Rodolfo Alonso Hernandez <email address hidden>
Date: Wed Aug 23 00:19:24 2023 +0000

    Check the device ID and host ID during virtual port binding

    If a port receives a device ID and a binding profile host ID
    fields update, at the same time, this is because Nova is trying
    to bind the port to a VM (device ID) in a host (host ID). In
    ML2/OVN, a virtual port cannot be bound to a VM.

    NOTE:
    * A virtual port can receive a host ID update. That happens when
      the fixed IP port that has the virtual port IP address as
      allowed address pair is bound.
    * A virtual port can receive a device ID update. Octavia uses
      the device ID to identify to which load balancer the virtual
      port belongs.

    This check was introduced in [1].

    [1]https://review.opendev.org/c/openstack/neutron/+/882588

    Closes-Bug: #2028651
    Related-Bug: #2018529
    Change-Id: I8784c6716f5a53b91d43323771e6f30fa8e8e506

Changed in neutron:
status: In Progress → Fix Released
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 23.0.0.0rc1

This issue was fixed in the openstack/neutron 23.0.0.0rc1 release candidate.

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to neutron (stable/2023.1)

Fix proposed to branch: stable/2023.1
Review: https://review.opendev.org/c/openstack/neutron/+/895433

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to neutron (stable/2023.1)

Reviewed: https://review.opendev.org/c/openstack/neutron/+/895433
Committed: https://opendev.org/openstack/neutron/commit/4adbc85de70f106e7b358a62f2d9a715bc8701ca
Submitter: "Zuul (22348)"
Branch: stable/2023.1

commit 4adbc85de70f106e7b358a62f2d9a715bc8701ca
Author: Rodolfo Alonso Hernandez <email address hidden>
Date: Wed Aug 23 00:19:24 2023 +0000

    Check the device ID and host ID during virtual port binding

    If a port receives a device ID and a binding profile host ID
    fields update, at the same time, this is because Nova is trying
    to bind the port to a VM (device ID) in a host (host ID). In
    ML2/OVN, a virtual port cannot be bound to a VM.

    NOTE:
    * A virtual port can receive a host ID update. That happens when
      the fixed IP port that has the virtual port IP address as
      allowed address pair is bound.
    * A virtual port can receive a device ID update. Octavia uses
      the device ID to identify to which load balancer the virtual
      port belongs.

    This check was introduced in [1].

    [1]https://review.opendev.org/c/openstack/neutron/+/882588

    Closes-Bug: #2028651
    Related-Bug: #2018529
    Change-Id: I8784c6716f5a53b91d43323771e6f30fa8e8e506
    (cherry picked from commit a3b00768d648742034a4e834875fc4586655787c)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix included in openstack/neutron 22.1.0

This issue was fixed in the openstack/neutron 22.1.0 release.
