Comment 1 for bug 1525901

Miguel Angel Ajo (mangelajo) wrote :

I've triaged this bug myself; you can reproduce it by:

1) starting 2 or 3 network nodes and setting up HA routers
2) creating a few HA routers (10 would suffice)
3) stopping the ovs-agent, l3-agent and dhcp-agent on all the nodes for a time T > agent_down_time
4) starting them all at once.
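
Step 3 matters because the agents have to be down long enough for the server to consider all of them dead. A minimal sketch of that kind of liveness check, assuming the server simply compares the last heartbeat timestamp against agent_down_time (the function name, the timestamps and the 75-second value are illustrative assumptions, not taken from this bug):

```python
from datetime import datetime, timedelta

AGENT_DOWN_TIME = 75  # seconds; assumed value, for illustration only


def is_agent_down(last_heartbeat, now, agent_down_time=AGENT_DOWN_TIME):
    """Return True if the last heartbeat is older than agent_down_time."""
    return now - last_heartbeat > timedelta(seconds=agent_down_time)


# Arbitrary reference time, so the example is deterministic.
now = datetime(2016, 1, 1, 12, 0, 0)
recent = now - timedelta(seconds=30)   # heartbeat seen 30s ago
stale = now - timedelta(seconds=120)   # heartbeat seen 120s ago

print(is_agent_down(recent, now))
print(is_agent_down(stale, now))
```

With every agent stopped for longer than the threshold, a check like this reports all of them as down until each one sends a new heartbeat.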

About 50% of the time:

1) The l3-agent tries to rebind some of the router ports before any ovs-agent has reported itself (via heartbeat) as UP.
2) As a result, the port is moved into the "binding failed" status.
3) Then the ovs-agent boots up and puts those ports on the dead internal VLAN (4095).
4) This recovers if you restart the l3-agent again, because the restart makes it try to rebind the port again, and by then some agent is up.
[5) I'm not sure now whether you also needed to restart the OVS agent again or not]
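
The ordering problem above can be illustrated with a toy model. This is only a sketch under stated assumptions: ToyServer, bind_port, heartbeat and local_vlan_for are hypothetical names invented for the illustration, not the real Neutron code, and the "ovs" / "binding_failed" strings are just labels for the two outcomes.

```python
DEAD_VLAN = 4095  # the dead internal VLAN mentioned above


class ToyServer:
    """Hypothetical stand-in for the server-side port binding logic."""

    def __init__(self):
        self.alive_hosts = set()  # hosts whose ovs-agent has sent a heartbeat
        self.binding = {}         # port id -> binding result

    def heartbeat(self, host):
        self.alive_hosts.add(host)

    def bind_port(self, port, host):
        # Binding only succeeds if an agent on that host is known to be alive.
        ok = host in self.alive_hosts
        self.binding[port] = "ovs" if ok else "binding_failed"
        return self.binding[port]


def local_vlan_for(server, port, allocated_vlan=1):
    """Toy ovs-agent wiring: ports without a good binding get the dead VLAN."""
    if server.binding.get(port) == "ovs":
        return allocated_vlan
    return DEAD_VLAN


server = ToyServer()

# l3-agent starts first and rebinds before any ovs-agent heartbeat arrives.
first_bind = server.bind_port("ha-port", "net-node-1")

# ovs-agent comes up afterwards and wires the port from the stale binding.
server.heartbeat("net-node-1")
vlan_after_boot = local_vlan_for(server, "ha-port")

# Restarting the l3-agent triggers another bind attempt, which now succeeds.
second_bind = server.bind_port("ha-port", "net-node-1")
vlan_after_retry = local_vlan_for(server, "ha-port")

print(first_bind, vlan_after_boot, second_bind, vlan_after_retry)
```

In this model the port only leaves the dead VLAN once something triggers a second bind attempt after the heartbeat, which matches the recovery described in 4).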