Potential race condition in l3-ha handling of l2pop initial master selection

Bug #1488015 reported by Ihar Hrachyshka
This bug affects 1 person
Affects: neutron
Status: Won't Fix
Importance: Low
Assigned to: venkata anil

Bug Description

In _ensure_host_set_on_port, if no master has been elected for the router yet, get_active_host_for_ha_router returns None. In that case we set the reporting agent's host as the active host in the port bindings, assuming that it will be reset to the proper value later, when a master is elected.

The race can occur as follows: we first fetch the active host for the port and get None because no master has been elected yet; the master is then elected; and only after that do we write our arbitrary host to the database. In the end, the port binding contains a host that may not be the master (assuming the agent that sent sync_routers() is not the one that became the master).
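The interleaving described above can be reproduced deterministically with a small in-memory sketch. Everything here is a hypothetical stand-in (FakeDb, elect, the between_read_and_write hook, the agent names); it is not neutron code and only mirrors the read-then-write pattern from the description.

```python
# Hypothetical in-memory stand-ins; none of these are neutron APIs.

class FakeDb:
    def __init__(self):
        self.active_host = None        # elected master, None until election
        self.port_binding_host = None  # host stored in the port binding


def get_active_host(db):
    return db.active_host


def elect(db, host):
    # Election records the master and resets the binding to it.
    db.active_host = host
    db.port_binding_host = host


def ensure_host_set_on_port(db, reporting_host, between_read_and_write=None):
    # Step 1: read the active host (may be None before election).
    active = get_active_host(db)
    # Simulates another actor running between our read and our write.
    if between_read_and_write is not None:
        between_read_and_write()
    # Step 2: write a choice that may already be stale.
    host = active if active is not None else reporting_host
    db.port_binding_host = host
    return host


db = FakeDb()
# agent-b reports in, but agent-a is elected between the read and the write.
ensure_host_set_on_port(
    db, "agent-b", between_read_and_write=lambda: elect(db, "agent-a"))
print(db.active_host, db.port_binding_host)  # prints: agent-a agent-b
```

Without the hook (election strictly after the write), the binding ends up on the master, which is the case the current code relies on.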

Tags: l2-pop l3-ha
Assaf Muller (amuller)
Changed in neutron:
status: New → Confirmed
tags: added: l2-pop l3-ha
Changed in neutron:
assignee: nobody → Ann Kamyshnikova (akamyshnikova)
Changed in neutron:
importance: Undecided → Medium
Ann Taraday (akamyshnikova) wrote :

As this issue has not been detected in real deployments so far, I am lowering its importance.

Changed in neutron:
importance: Medium → Low
Changed in neutron:
assignee: Ann Taraday (akamyshnikova) → venkata anil (anil-venkata)
status: Confirmed → In Progress
venkata anil (anil-venkata) wrote :

In https://review.openstack.org/#/c/323314/ we follow DVR's distributed port binding approach for HA.

1) For each l3 HA agent, _ensure_host_set_on_port is called, which creates a port entry in the ml2_distributed_port_bindings table with the port binding set to this agent. So for each l3 HA agent there is an entry for the port in this table. As we have a separate entry for each agent, there won't be a race.

2) Then each l3 HA agent will plug the HA router port on its node, and the corresponding l2 agent will wire up the port (this is existing behavior). This wiring will result in port status update calls, and the new plugin implementation will update the port status in the ml2_distributed_port_bindings table. This helps l2pop notify other agents about port status updates for this agent.
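A rough sketch of why one row per agent removes the race, assuming a table keyed by (port, host). The DistributedBindings class, its method names, and the "DOWN"/"ACTIVE" values are illustrative only; this is not the real ml2_distributed_port_bindings schema or the plugin code from the review.

```python
# Hypothetical model of a per-host binding table; not the real schema.

class DistributedBindings:
    def __init__(self):
        self.rows = {}  # (port_id, host) -> status

    def ensure_host_set_on_port(self, port_id, host):
        # Each agent creates (or keeps) its own row and never touches
        # another agent's row, so there is no shared value to overwrite.
        self.rows.setdefault((port_id, host), "DOWN")

    def update_port_status(self, port_id, host, status):
        # Called when the l2 agent on `host` finishes wiring the port.
        self.rows[(port_id, host)] = status

    def hosts_with_status(self, port_id, status):
        return sorted(
            h for (p, h), s in self.rows.items()
            if p == port_id and s == status)


bindings = DistributedBindings()
for host in ("agent-a", "agent-b"):
    bindings.ensure_host_set_on_port("ha-port-1", host)

# Only agent-a reports the port as wired up.
bindings.update_port_status("ha-port-1", "agent-a", "ACTIVE")
print(bindings.hosts_with_status("ha-port-1", "ACTIVE"))  # prints: ['agent-a']
```

Because the order in which agents call ensure_host_set_on_port no longer matters, the election timing cannot leave a stale host in a single shared binding.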

OpenStack Infra (hudson-openstack) wrote : Change abandoned on neutron (master)

Change abandoned by venkata anil (<email address hidden>) on branch: master
Review: https://review.openstack.org/323314
Reason: Preferring patch 255237 over this
1) Backporting alembic migration may not be allowed
2) To avoid special handling for TOR drivers

note: We may revisit these patches (340031, 324302, 323314) later to solve 1488015.

OpenStack Infra (hudson-openstack) wrote :

Change abandoned by Armando Migliaccio (<email address hidden>) on branch: master
Review: https://review.openstack.org/285773
Reason: This review is > 4 weeks without comment, and failed Jenkins the last time it was checked. We are abandoning this for now. Feel free to reactivate the review by pressing the restore button and leaving a 'recheck' comment to get fresh test results.

Changed in neutron:
status: In Progress → Won't Fix