Random failures of failovers with IPv6 VIP

Bug #2028524 reported by Gregory Thiemonge
This bug affects 1 person
Affects: octavia
Status: New
Importance: Undecided
Assigned to: Unassigned

Bug Description

A failover of a LB with an IPv6 VIP randomly fails.

Create a LB with an IPv6 VIP:

$ openstack loadbalancer create --vip-subnet ipv6-public-subnet --name lb1

(Not reproducible with IPv4.)

$ vip_port_id=$(openstack loadbalancer show -c vip_port_id -f value lb1)

Check the VIP port:

$ openstack port show $vip_port_id | grep bind
| binding_host_id | gthiemon-devstack |
| binding_profile | |
| binding_vif_details | |
| binding_vif_type | unbound |
| binding_vnic_type | normal |

binding_host_id should be empty because the port is not bound.
(In some cases it is empty at first and only gets set after a first failover, so at least 2 failovers are needed to trigger the bug.)
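
The same check can be scripted. This is a minimal sketch in Python, assuming openstacksdk is installed and a clouds.yaml entry named "devstack" exists; the cloud name and both helper names are hypothetical. It flags a port whose binding_vif_type is "unbound" while binding_host_id is still set, which is the state shown in the output above.

```python
from types import SimpleNamespace


def has_stale_host_binding(port):
    """Return True if the port is unbound but still carries a host ID."""
    unbound = port.binding_vif_type == "unbound"
    return unbound and bool(port.binding_host_id)


def check_vip_port(vip_port_id, cloud="devstack"):
    """Fetch the port from Neutron and report a stale host binding.

    The cloud name is an assumption; adjust it to your clouds.yaml.
    """
    import openstack  # imported lazily so the predicate works without the SDK

    conn = openstack.connect(cloud=cloud)
    port = conn.network.get_port(vip_port_id)
    return has_stale_host_binding(port)


# Offline example mirroring the "openstack port show" output above.
example = SimpleNamespace(binding_vif_type="unbound",
                          binding_host_id="gthiemon-devstack")
print(has_stale_host_binding(example))
```

Running check_vip_port() before and after each failover should show when binding_host_id gets set on the VIP port.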

Perform a failover:

$ openstack loadbalancer failover lb1

It fails:

Jul 24 03:08:07 gthiemon-devstack octavia-worker[97901]: |__Flow 'octavia-failover-loadbalancer-flow': octavia.network.base.PlugVIPException: BadRequestException: 400: Client Error for url: http://192.168.1.101:9696/networking/v2.0/ports/618567c4-78c7-4398-b889-b567f6fd6aeb, Bad port request: A virtual logical switch port cannot be bound to a host.
Jul 24 03:08:07 gthiemon-devstack octavia-worker[97901]: ERROR octavia.common.base_taskflow Traceback (most recent call last):
Jul 24 03:08:07 gthiemon-devstack octavia-worker[97901]: ERROR octavia.common.base_taskflow File "/opt/stack/octavia/octavia/network/drivers/neutron/base.py", line 129, in _add_security_group_to_port
Jul 24 03:08:07 gthiemon-devstack octavia-worker[97901]: ERROR octavia.common.base_taskflow self.network_proxy.update_port(
Jul 24 03:08:07 gthiemon-devstack octavia-worker[97901]: ERROR octavia.common.base_taskflow File "/opt/stack/openstacksdk/openstack/network/v2/_proxy.py", line 2979, in update_port
Jul 24 03:08:07 gthiemon-devstack octavia-worker[97901]: ERROR octavia.common.base_taskflow return self._update(_port.Port, port, if_revision=if_revision, **attrs)
Jul 24 03:08:07 gthiemon-devstack octavia-worker[97901]: ERROR octavia.common.base_taskflow File "/opt/stack/openstacksdk/openstack/proxy.py", line 64, in check
Jul 24 03:08:07 gthiemon-devstack octavia-worker[97901]: ERROR octavia.common.base_taskflow return method(self, expected, actual, *args, **kwargs)
Jul 24 03:08:07 gthiemon-devstack octavia-worker[97901]: ERROR octavia.common.base_taskflow File "/opt/stack/openstacksdk/openstack/network/v2/_proxy.py", line 189, in _update
Jul 24 03:08:07 gthiemon-devstack octavia-worker[97901]: ERROR octavia.common.base_taskflow return res.commit(self, base_path=base_path, if_revision=if_revision)
Jul 24 03:08:07 gthiemon-devstack octavia-worker[97901]: ERROR octavia.common.base_taskflow File "/opt/stack/openstacksdk/openstack/resource.py", line 1794, in commit
Jul 24 03:08:07 gthiemon-devstack octavia-worker[97901]: ERROR octavia.common.base_taskflow return self._commit(
Jul 24 03:08:07 gthiemon-devstack octavia-worker[97901]: ERROR octavia.common.base_taskflow File "/opt/stack/openstacksdk/openstack/resource.py", line 1839, in _commit
Jul 24 03:08:07 gthiemon-devstack octavia-worker[97901]: ERROR octavia.common.base_taskflow self._translate_response(response, has_body=has_body)
Jul 24 03:08:07 gthiemon-devstack octavia-worker[97901]: ERROR octavia.common.base_taskflow File "/opt/stack/openstacksdk/openstack/resource.py", line 1278, in _translate_response
Jul 24 03:08:07 gthiemon-devstack octavia-worker[97901]: ERROR octavia.common.base_taskflow exceptions.raise_from_response(response, error_message=error_message)
Jul 24 03:08:07 gthiemon-devstack octavia-worker[97901]: ERROR octavia.common.base_taskflow File "/opt/stack/openstacksdk/openstack/exceptions.py", line 263, in raise_from_response
Jul 24 03:08:07 gthiemon-devstack octavia-worker[97901]: ERROR octavia.common.base_taskflow raise cls(
Jul 24 03:08:07 gthiemon-devstack octavia-worker[97901]: ERROR octavia.common.base_taskflow openstack.exceptions.BadRequestException: BadRequestException: 400: Client Error for url: http://192.168.1.101:9696/networking/v2.0/ports/618567c4-78c7-4398-b889-b567f6fd6aeb, Bad port request: A virtual logical switch port cannot be bound to a host.
Jul 24 03:08:07 gthiemon-devstack octavia-worker[97901]: ERROR octavia.common.base_taskflow
Jul 24 03:08:07 gthiemon-devstack octavia-worker[97901]: ERROR octavia.common.base_taskflow During handling of the above exception, another exception occurred:
Jul 24 03:08:07 gthiemon-devstack octavia-worker[97901]: ERROR octavia.common.base_taskflow
Jul 24 03:08:07 gthiemon-devstack octavia-worker[97901]: ERROR octavia.common.base_taskflow Traceback (most recent call last):
Jul 24 03:08:07 gthiemon-devstack octavia-worker[97901]: ERROR octavia.common.base_taskflow File "/opt/stack/octavia/octavia/network/drivers/neutron/allowed_address_pairs.py", line 271, in _add_vip_security_group_to_port
Jul 24 03:08:07 gthiemon-devstack octavia-worker[97901]: ERROR octavia.common.base_taskflow self._add_security_group_to_port(sec_grp_id, port_id)
Jul 24 03:08:07 gthiemon-devstack octavia-worker[97901]: ERROR octavia.common.base_taskflow File "/opt/stack/octavia/octavia/network/drivers/neutron/base.py", line 134, in _add_security_group_to_port
Jul 24 03:08:07 gthiemon-devstack octavia-worker[97901]: ERROR octavia.common.base_taskflow raise base.NetworkException(str(e))
Jul 24 03:08:07 gthiemon-devstack octavia-worker[97901]: ERROR octavia.common.base_taskflow octavia.network.base.NetworkException: BadRequestException: 400: Client Error for url: http://192.168.1.101:9696/networking/v2.0/ports/618567c4-78c7-4398-b889-b567f6fd6aeb, Bad port request: A virtual logical switch port cannot be bound to a host.
Jul 24 03:08:07 gthiemon-devstack octavia-worker[97901]: ERROR octavia.common.base_taskflow
Jul 24 03:08:07 gthiemon-devstack octavia-worker[97901]: ERROR octavia.common.base_taskflow During handling of the above exception, another exception occurred:
Jul 24 03:08:07 gthiemon-devstack octavia-worker[97901]: ERROR octavia.common.base_taskflow
Jul 24 03:08:07 gthiemon-devstack octavia-worker[97901]: ERROR octavia.common.base_taskflow Traceback (most recent call last):
Jul 24 03:08:07 gthiemon-devstack octavia-worker[97901]: ERROR octavia.common.base_taskflow File "/opt/stack/taskflow/taskflow/engines/action_engine/executor.py", line 52, in _execute_task
Jul 24 03:08:07 gthiemon-devstack octavia-worker[97901]: ERROR octavia.common.base_taskflow result = task.execute(**arguments)
Jul 24 03:08:07 gthiemon-devstack octavia-worker[97901]: ERROR octavia.common.base_taskflow File "/opt/stack/octavia/octavia/controller/worker/v2/tasks/network_tasks.py", line 530, in execute
Jul 24 03:08:07 gthiemon-devstack octavia-worker[97901]: ERROR octavia.common.base_taskflow sg_id = self.network_driver.update_vip_sg(db_lb, db_lb.vip)
Jul 24 03:08:07 gthiemon-devstack octavia-worker[97901]: ERROR octavia.common.base_taskflow File "/opt/stack/octavia/octavia/network/drivers/neutron/allowed_address_pairs.py", line 411, in update_vip_sg
Jul 24 03:08:07 gthiemon-devstack octavia-worker[97901]: ERROR octavia.common.base_taskflow self._add_vip_security_group_to_port(load_balancer.id, vip.port_id,
Jul 24 03:08:07 gthiemon-devstack octavia-worker[97901]: ERROR octavia.common.base_taskflow File "/opt/stack/octavia/octavia/network/drivers/neutron/allowed_address_pairs.py", line 275, in _add_vip_security_group_to_port
Jul 24 03:08:07 gthiemon-devstack octavia-worker[97901]: ERROR octavia.common.base_taskflow raise base.PlugVIPException(str(e))
Jul 24 03:08:07 gthiemon-devstack octavia-worker[97901]: ERROR octavia.common.base_taskflow octavia.network.base.PlugVIPException: BadRequestException: 400: Client Error for url: http://192.168.1.101:9696/networking/v2.0/ports/618567c4-78c7-4398-b889-b567f6fd6aeb, Bad port request: A virtual logical switch port cannot be bound to a host.
Jul 24 03:08:07 gthiemon-devstack octavia-worker[97901]: ERROR octavia.common.base_taskflow
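
The log contains three chained tracebacks because each layer catches the lower-level error and raises its own exception type from inside the except block. The sketch below reproduces that pattern with stand-in classes and functions (all names here are hypothetical simplifications, not Octavia's code) and walks the __context__ chain back to the root cause, which in the log is the BadRequestException returned by Neutron.

```python
class BadRequestException(Exception):
    """Stand-in for openstack.exceptions.BadRequestException."""


class NetworkException(Exception):
    """Stand-in for octavia.network.base.NetworkException."""


class PlugVIPException(Exception):
    """Stand-in for octavia.network.base.PlugVIPException."""


def update_port():
    # Simulates the 400 response from the port update.
    raise BadRequestException(
        "A virtual logical switch port cannot be bound to a host.")


def add_security_group_to_port():
    try:
        update_port()
    except BadRequestException as e:
        # Raising inside "except" sets __context__ on the new exception.
        raise NetworkException(str(e))


def add_vip_security_group_to_port():
    try:
        add_security_group_to_port()
    except NetworkException as e:
        raise PlugVIPException(str(e))


def exception_chain(exc):
    """Return the exception type names from outermost to root cause."""
    names = []
    while exc is not None:
        names.append(type(exc).__name__)
        exc = exc.__context__
    return names


try:
    add_vip_security_group_to_port()
except PlugVIPException as e:
    chain = exception_chain(e)
    print(" <- ".join(chain))
```

The last name in the chain corresponds to the first traceback in the log, so the port update rejected by Neutron is the failure to investigate; the two later exceptions only re-wrap its message.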

The following neutron commit introduced this regression:

https://review.opendev.org/c/openstack/neutron/+/882588

That change rejects a virtual port that is bound to a host. In our case the VIP port has binding_host_id set even though it is unbound (which may be a separate bug in neutron).

Next steps:

1. The exception is raised when Octavia updates the security groups of the VIP port. Why does it do that? The port is not bound, so it doesn't need SGs. Octavia should update only the VRRP port.
2. We need to figure out why neutron sets binding_host_id on a VIP port. I'm working on a reproducer with openstacksdk for the neutron folks.
