host not removed from table ml2_vxlan_endpoints with the agent delete

Bug #2040517 reported by yatin
This bug affects 1 person
| Affects | Status | Importance | Assigned to | Milestone |
| neutron | Triaged | Medium | Miro Tomaska | |

Bug Description

After deleting an agent, a stale entry for its host remains in the table 'ml2_vxlan_endpoints'. One use case is node scale-down: an agent is deleted, but the host entry is not removed from ml2_vxlan_endpoints.

I have not checked other topologies, but the same should apply to the similar tables 'ml2_gre_endpoints' and 'ml2_geneve_endpoints'.

# Ensure agent is stopped or node is removed.

$ openstack network agent show 338d13fc-3483-414f-bc55-5b2cbb0db189 --fit-width
+-------------------+----------------------------------------------------------------------------------------------------------------------------------------+
| Field | Value |
+-------------------+----------------------------------------------------------------------------------------------------------------------------------------+
| admin_state_up | UP |
| agent_type | Open vSwitch agent |
| alive | XXX |
| availability_zone | None |
| binary | neutron-openvswitch-agent |
| configuration | {'arp_responder_enabled': True, 'baremetal_smartnic': False, 'bridge_mappings': {'public': 'br-ex'}, 'datapath_type': 'system', |
| | 'devices': 0, 'enable_distributed_routing': True, 'extensions': [], 'in_distributed_mode': True, 'integration_bridge': 'br-int', |
| | 'l2_population': True, 'log_agent_heartbeats': False, 'ovs_capabilities': {'datapath_types': ['netdev', 'system'], 'iface_types': |
| | ['bareudp', 'erspan', 'geneve', 'gre', 'gtpu', 'internal', 'ip6erspan', 'ip6gre', 'lisp', 'patch', 'stt', 'system', 'tap', 'vxlan']}, |
| | 'ovs_hybrid_plug': False, 'resource_provider_bandwidths': {'br-ex': {'egress': 1000000, 'ingress': 1000000}}, |
| | 'resource_provider_hypervisors': {'br-ex': 'ykarel-temp3', 'rp_tunnelled': 'ykarel-temp3'}, 'resource_provider_inventory_defaults': |
| | {'allocation_ratio': 1.0, 'min_unit': 1, 'step_size': 1, 'reserved': 0}, 'resource_provider_packet_processing_inventory_defaults': |
| | {'allocation_ratio': 1.0, 'min_unit': 1, 'step_size': 1, 'reserved': 0}, 'resource_provider_packet_processing_with_direction': {}, |
| | 'resource_provider_packet_processing_without_direction': {}, 'tunnel_types': ['vxlan'], 'tunneling_ip': '10.0.109.173', |
| | 'vhostuser_socket_dir': '/var/run/openvswitch'} |
| created_at | 2023-10-25 14:30:17 |
| description | None |
| ha_state | None |
| host | ykarel-temp3 |
| id | 338d13fc-3483-414f-bc55-5b2cbb0db189 |
| last_heartbeat_at | 2023-10-25 14:30:17 |
| resources_synced | None |
| started_at | 2023-10-25 14:30:17 |
| topic | N/A |
+-------------------+----------------------------------------------------------------------------------------------------------------------------------------+

$ openstack network agent delete 338d13fc-3483-414f-bc55-5b2cbb0db189

mysql> select * from ml2_vxlan_endpoints;
+--------------+----------+--------------+
| ip_address | udp_port | host |
+--------------+----------+--------------+
| 10.0.109.173 | 4789 | ykarel-temp3 | <- Host/IP entry still exists after the agent delete
| 10.0.109.224 | 4789 | ykarel-temp2 |
| 10.0.109.60 | 4789 | ykarel-temp1 |
+--------------+----------+--------------+
3 rows in set (0.00 sec)

The stale entry doesn't cause any issues, but it's good to get rid of references to hosts/nodes that have been removed from the cluster. Stale entries are also seen in the table 'segmenthostmappings', which is another known, unfixed issue: https://bugs.launchpad.net/neutron/+bug/1621717
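One way to spot such leftovers is to look for endpoint rows whose host no longer has any agent. The snippet below is only a sketch of that check, run against an in-memory SQLite database rather than a real Neutron database. The schemas are simplified assumptions (only the columns needed here), the `agents.host` join column is assumed to match the real table, and `find_stale_endpoints`, `build_demo_db` and the `agent-1`/`agent-2` ids are hypothetical names made up for illustration.

```python
import sqlite3

# Tunnel endpoint tables named in this report.
ENDPOINT_TABLES = (
    "ml2_vxlan_endpoints",
    "ml2_gre_endpoints",
    "ml2_geneve_endpoints",
)


def find_stale_endpoints(conn):
    """Return (table, ip_address, host) for endpoint rows whose host has no agent."""
    stale = []
    for table in ENDPOINT_TABLES:
        # Table names come from the fixed tuple above, never from user input.
        rows = conn.execute(
            f"SELECT e.ip_address, e.host FROM {table} AS e "
            "LEFT JOIN agents AS a ON a.host = e.host "
            "WHERE a.host IS NULL ORDER BY e.ip_address"
        ).fetchall()
        stale.extend((table, ip, host) for ip, host in rows)
    return stale


def build_demo_db():
    """Build a demo database mirroring the state shown in this report.

    Assumption: schemas are reduced to the columns this check needs.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE agents (id TEXT PRIMARY KEY, host TEXT)")
    conn.execute(
        "CREATE TABLE ml2_vxlan_endpoints "
        "(ip_address TEXT PRIMARY KEY, udp_port INTEGER, host TEXT)"
    )
    conn.execute(
        "CREATE TABLE ml2_gre_endpoints (ip_address TEXT PRIMARY KEY, host TEXT)"
    )
    conn.execute(
        "CREATE TABLE ml2_geneve_endpoints (ip_address TEXT PRIMARY KEY, host TEXT)"
    )
    # Agents remain for temp1 and temp2; the agent on ykarel-temp3 was deleted.
    conn.executemany(
        "INSERT INTO agents VALUES (?, ?)",
        [("agent-1", "ykarel-temp1"), ("agent-2", "ykarel-temp2")],
    )
    conn.executemany(
        "INSERT INTO ml2_vxlan_endpoints VALUES (?, ?, ?)",
        [
            ("10.0.109.173", 4789, "ykarel-temp3"),
            ("10.0.109.224", 4789, "ykarel-temp2"),
            ("10.0.109.60", 4789, "ykarel-temp1"),
        ],
    )
    return conn


if __name__ == "__main__":
    for table, ip, host in find_stale_endpoints(build_demo_db()):
        print(f"{table}: {ip} ({host}) has no matching agent")
```

The same LEFT JOIN could be run by hand in the mysql client against each of the three tables; it only reports rows and does not delete anything.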

These leftovers were noticed in an older release train (https://bugzilla.redhat.com/show_bug.cgi?id=2242298), but they still exist in master, as seen above.

Miguel Lavalle (minsel)
Changed in neutron:
status: New → Triaged
importance: Undecided → Medium
Miro Tomaska (mtomaska)
Changed in neutron:
assignee: nobody → Miro Tomaska (mtomaska)
Miro Tomaska (mtomaska) wrote :

I also found this old bug [1], which should have fixed this issue. However, there have been a few refactors since, and the issue might have been reintroduced. I will look into it and see whether there is a way to write a test to prevent this from regressing again.

[1] https://bugs.launchpad.net/neutron/+bug/1179223
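As a rough illustration of the kind of regression test mentioned above, the sketch below asserts that deleting an agent also removes its host's tunnel endpoint. It is not Neutron's actual test code: `FakeTunnelStore` is a hypothetical in-memory stand-in for the agents table and the ml2_*_endpoints tables, and its `delete_agent` encodes the behaviour this report expects, not what master currently does. It also assumes one agent per host, which a real fix would have to handle more carefully.

```python
import unittest


class FakeTunnelStore:
    """Hypothetical in-memory stand-in for agents and tunnel endpoint tables."""

    def __init__(self):
        self.agents = {}     # agent_id -> host
        self.endpoints = {}  # tunnel_type -> {ip_address: host}

    def add_agent(self, agent_id, host, tunnel_type, ip_address):
        """Register an agent and the tunnel endpoint it reports."""
        self.agents[agent_id] = host
        self.endpoints.setdefault(tunnel_type, {})[ip_address] = host

    def delete_agent(self, agent_id):
        """Delete the agent and, as this report expects, its host's endpoints."""
        host = self.agents.pop(agent_id)
        for table in self.endpoints.values():
            for ip in [ip for ip, h in table.items() if h == host]:
                del table[ip]

    def hosts(self, tunnel_type):
        """Return the sorted hosts that still have an endpoint of this type."""
        return sorted(self.endpoints.get(tunnel_type, {}).values())


class TestEndpointCleanup(unittest.TestCase):
    def setUp(self):
        self.store = FakeTunnelStore()
        self.store.add_agent("agent-3", "ykarel-temp3", "vxlan", "10.0.109.173")
        self.store.add_agent("agent-2", "ykarel-temp2", "vxlan", "10.0.109.224")

    def test_delete_agent_removes_host_endpoint(self):
        self.store.delete_agent("agent-3")
        self.assertNotIn("ykarel-temp3", self.store.hosts("vxlan"))

    def test_delete_agent_keeps_other_hosts(self):
        self.store.delete_agent("agent-3")
        self.assertEqual(["ykarel-temp2"], self.store.hosts("vxlan"))


if __name__ == "__main__":
    unittest.main()
```

A real test would exercise the ML2 plugin and type drivers against the test database instead of a fake, and would cover the gre and geneve tables as well.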
