[ovn-octavia-provider] Deleting or updating Load Balancer members in one pool can affect other pools or the same pool if the same IP is used

Bug #2051172 reported by Jay Rhine
Affects: neutron | Status: New | Importance: Undecided | Assigned to: Unassigned

Bug Description

* High level description:
We have discovered that when using Octavia OVN Load Balancers, if you have two or more pools whose members share the same IP addresses (but use different ports), removing a member from one pool removes that IP from the "ip_port_mappings" field of the OVN Load Balancer even if another pool is still using it. Additionally, if you update the pool members to point to the same member IPs but a different port, the "ip_port_mappings" entries for those IPs are also cleared. The update behavior occurs because the provider performs the additions first and the deletions afterwards, so when the IP addresses are the same, the final deletions clear out the IPs.
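The add-then-delete ordering can be illustrated with a minimal, self-contained Python sketch. The dict, the `apply_update` function, and the member tuples here are illustrative stand-ins, not the provider's actual code:

```python
# Sketch of the provider's order of operations on a member update:
# additions are applied first, deletions second, and deletions are
# keyed only by member IP (not by IP:port).

def apply_update(ip_port_mappings, members_to_add, members_to_delete):
    """Apply adds, then deletes, to a {ip: lsp_name} mapping dict."""
    for ip, lsp in members_to_add:
        ip_port_mappings[ip] = lsp      # add/refresh the mapping
    for ip, _lsp in members_to_delete:
        ip_port_mappings.pop(ip, None)  # delete purely by IP
    return ip_port_mappings

# Pool members move from 192.0.2.10:8080 to 192.0.2.10:9090 -- same IP,
# different port. The add runs first, then the delete of the "old"
# member removes the very mapping the add just created.
mappings = {}
apply_update(mappings,
             members_to_add=[("192.0.2.10", "lsp-1")],
             members_to_delete=[("192.0.2.10", "lsp-1")])
print(mappings)  # {} -- mapping gone although a member still uses the IP
```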

We experienced this problem when using Magnum with OVN load balancers: Magnum in some cases updates the load balancers to ensure proper configuration, and this resulted in load balancers not functioning even though the final Octavia configuration was correct (since the ip_port_mappings table was cleared, the LBs were not operational).

This behavior was likely introduced by the fix for bug:

https://bugs.launchpad.net/neutron/+bug/2007835

Which resulted in commit:
https://opendev.org/openstack/ovn-octavia-provider/commit/e40e0d62ac089afd14c03102d80990e792211af3

This commit updates the OVS / OVN databases in a more efficient way (adding or deleting only what is needed). However, since it no longer recomputes all ip_port_mappings on every update, it is possible to hit this scenario.

The "ovn-octavia-provider/ovn_octavia_provider/helper.py" code already includes a solution to this scenario in the "_clean_ip_port_mappings" method: it verifies that the IP address is not in use by any other pool before removing it from the "ip_port_mappings" table. The same logic can be applied to "_update_ip_port_mappings" to prevent this issue in the pool member update and delete cases. I have tried making such a change, and it works well for us.

However, it should be noted that the logic in "_clean_ip_port_mappings" only looks at other pools, not at the current pool. Therefore, if pool members are updated from one port to another port with the same IP, my patch applying the same logic to "_update_ip_port_mappings" may not cover all scenarios.

Therefore, it may be better to make the checks more thorough. If the maintainers of this project agree with the logic of my proposed changes, I can work on submitting an official patch for this issue.
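A more thorough check along these lines could count every remaining use of the backend IP across all pools, including the pool being updated, after the update has been applied. A minimal sketch, where the `pools` dict and `should_delete_mapping` are hypothetical illustrations rather than existing helpers:

```python
# Before deleting an ip_port_mappings entry, count how many members
# across *all* pools (including the current pool, post-update) still
# use the backend IP; delete the mapping only when that count is zero.

def should_delete_mapping(pools, backend_ip):
    """pools: {pool_key: [(ip, port), ...]} reflecting post-update state.
    Return True only when no member in any pool still uses backend_ip."""
    remaining = sum(1 for members in pools.values()
                    for ip, _port in members if ip == backend_ip)
    return remaining == 0

pools = {
    "pool_a": [("192.0.2.10", 9090)],  # updated to a new port, same IP
    "pool_b": [("192.0.2.10", 8080)],  # another pool also uses the IP
}
print(should_delete_mapping(pools, "192.0.2.10"))  # False -> keep mapping
pools["pool_a"] = []
pools["pool_b"] = []
print(should_delete_mapping(pools, "192.0.2.10"))  # True -> safe to delete
```

This avoids both failure modes: an IP shared with another pool, and an IP retained by the current pool after a port-only member update.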

* Version:
  ** OpenStack version
Openstack Antelope (2023.1)
  ** Linux distro, kernel.
Ubuntu 22.04
  ** DevStack or other _deployment_ mechanism?
Kolla Ansible

Proposed patch to fix the issue:
```
--- helper.py.new_orig 2024-01-21 03:05:27.000000000 +0000
+++ helper.py.proposed1 2024-01-24 19:12:17.843028336 +0000
@@ -2441,7 +2441,7 @@
         self._execute_commands(commands)
         return True
 
-    def _update_ip_port_mappings(self, ovn_lb, backend_ip, port_name, src_ip,
+    def _update_ip_port_mappings(self, ovn_lb, backend_ip, port_name, src_ip, pool_key,
                                  delete=False):
 
         # ip_port_mappings:${MEMBER_IP}=${LSP_NAME_MEMBER}:${HEALTH_SRC}
@@ -2449,10 +2449,45 @@
         # MEMBER_IP: IP of member_lsp
         # LSP_NAME_MEMBER: Logical switch port
         # HEALTH_SRC: source IP of hm_port
-
         if delete:
-            self.ovn_nbdb_api.lb_del_ip_port_mapping(ovn_lb.uuid,
-                                                     backend_ip).execute()
+            # NOTE(jrhine): This is essentially the same logic as in the
+            # _clean_ip_port_mappings method, but for a single member IP.
+            # Just as in that method, before removing a member from the
+            # ip_port_mappings list we need to ensure that the address is
+            # not in use by another pool, to prevent accidentally removing
+            # it. We can use neutron:member_status to search for any other
+            # members with the same address.
+            other_members = []
+            for k, v in ovn_lb.external_ids.items():
+                if ovn_const.LB_EXT_IDS_POOL_PREFIX in k and k != pool_key:
+                    other_members.extend(self._extract_member_info(
+                        ovn_lb.external_ids[k]))
+
+            member_statuses = ovn_lb.external_ids.get(
+                ovn_const.OVN_MEMBER_STATUS_KEY)
+
+            try:
+                member_statuses = jsonutils.loads(member_statuses)
+            except TypeError:
+                LOG.debug("no member status on external_ids: %s",
+                          str(member_statuses))
+                member_statuses = {}
+
+            execute_delete = True
+            for member_id in [item[3] for item in other_members
+                              if item[0] == backend_ip]:
+                if member_statuses.get(
+                        member_id, '') != constants.NO_MONITOR:
+                    # same address being monitored by another HM
+                    execute_delete = False
+                    LOG.debug("Member %s still uses address %s, "
+                              "not deleting its ip_port_mapping",
+                              member_id, backend_ip)
+            if execute_delete:
+                LOG.debug("Address %s is not used by members of other "
+                          "pools, deleting its ip_port_mapping", backend_ip)
+                self.ovn_nbdb_api.lb_del_ip_port_mapping(ovn_lb.uuid,
+                                                         backend_ip).execute()
         else:
             self.ovn_nbdb_api.lb_add_ip_port_mapping(ovn_lb.uuid,
                                                      backend_ip,
@@ -2542,7 +2577,7 @@
                                'pool': pool_key})
                     return None
                 self._update_ip_port_mappings(ovn_lb, backend_ip,
-                                              member_lsp.name, hm_source_ip,
+                                              member_lsp.name, hm_source_ip, pool_key,
                                               delete)
                 return constants.ONLINE
```
