Comment 3 for bug 1794991

Manuel Rodriguez (manuel.rodriguez) wrote :

Just adding more information about issue number [1], which Gaëtan described in the initial comment:

These are the flows for a VXLAN network with segmentation ID 97 (0x61 in hexadecimal), listed from one of our compute nodes:
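For anyone repeating this on another network, the value to grep for is simply the segmentation ID in hexadecimal. A minimal sketch (the helper name `tun_id_for` is made up for illustration):

```python
def tun_id_for(segmentation_id: int) -> str:
    """Return the tunnel ID in the lowercase 0x-prefixed form seen in the dump-flows output."""
    return hex(segmentation_id)


print(tun_id_for(97))  # 0x61
```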

# dex -u0 openvswitch_vswitchd ovs-ofctl dump-flows br-tun | grep 0x61
 cookie=0xde6f920d0d405dbc, duration=500977.143s, table=4, n_packets=427, n_bytes=41999, priority=1,tun_id=0x61 actions=mod_vlan_vid:8,resubmit(,9)

As you can see, there is only one flow, in table 4. Two instances on that network were already running on the compute node, one created a few days ago and another a few minutes ago. The latter never got a DHCP IP because there were no flows to reach the DHCP namespaces running on three different controllers. To fix this we have found two workarounds:

1. Add the flows manually
2. Modify the network so it will trigger a flow addition across computes running instances from that network.
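As a rough illustration of the first workaround, the sketch below only builds and prints an `ovs-ofctl add-flow` command. It does not run anything. The flow shape is modelled on the table 20 unicast entries in the second dump in this comment, and the MAC, VLAN and port values are taken from that dump. The helper name `unicast_flow_cmd` is hypothetical. It is an assumption that the installed OVS accepts quoted port names in `output:`, as it prints them. A manually added flow will not carry the agent's cookie, and we have not checked how the agent treats such flows.

```python
import shlex


def unicast_flow_cmd(bridge, vlan, dst_mac, tun_id, port):
    """Build (but do not execute) an ovs-ofctl command adding one table 20 flow."""
    flow = (
        f"table=20,priority=2,dl_vlan={vlan},dl_dst={dst_mac},"
        f'actions=strip_vlan,load:{tun_id}->NXM_NX_TUN_ID[],output:"{port}"'
    )
    return ["ovs-ofctl", "add-flow", bridge, flow]


# Values copied from the dump in this comment; adjust for your own network.
cmd = unicast_flow_cmd("br-tun", 8, "fa:16:3e:4c:67:28", "0x61", "vxlan-0a83083e")
print(shlex.join(cmd))  # shell-quoted form, to review before running by hand
```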

We tried the second option by adding a second subnet to the network, and we could see log events about update_fdb_entries for the DHCP agents. The flows now look like this, even after deleting the second subnet:

# dex -u0 openvswitch_vswitchd ovs-ofctl dump-flows br-tun | grep 0x61
 cookie=0xde6f920d0d405dbc, duration=501247.247s, table=4, n_packets=436, n_bytes=44143, priority=1,tun_id=0x61 actions=mod_vlan_vid:8,resubmit(,9)
 cookie=0xde6f920d0d405dbc, duration=15.405s, table=20, n_packets=1, n_bytes=42, priority=2,dl_vlan=8,dl_dst=fa:16:3e:4c:67:28 actions=strip_vlan,load:0x61->NXM_NX_TUN_ID[],output:"vxlan-0a83083e"
 cookie=0xde6f920d0d405dbc, duration=14.697s, table=20, n_packets=2, n_bytes=84, priority=2,dl_vlan=8,dl_dst=fa:16:3e:5b:63:e3 actions=strip_vlan,load:0x61->NXM_NX_TUN_ID[],output:"vxlan-0a83083d"
 cookie=0xde6f920d0d405dbc, duration=14.711s, table=22, n_packets=9, n_bytes=1818, priority=1,dl_vlan=8 actions=strip_vlan,load:0x61->NXM_NX_TUN_ID[],output:"vxlan-0a83083d",output:"vxlan-0a83083e"

From the flows above we identify two things:
- They only cover communication with the DHCP agents; in other words, none of the VXLAN tunnel outputs for other computes running instances in that network are added in table 22.
- We are still missing a flow from one of the DHCP agents.
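To see which hosts the two tunnel ports in table 22 point at, the port names can be decoded. This assumes the names follow the usual neutron convention of `vxlan-` followed by the remote IPv4 address in hexadecimal. We have not verified that against this deployment. The helper name is made up for illustration.

```python
import ipaddress


def tunnel_port_to_ip(port_name: str) -> str:
    """Decode a port name like 'vxlan-0a83083e' into a dotted IPv4 address."""
    _prefix, _, hex_ip = port_name.partition("-")
    return str(ipaddress.IPv4Address(int(hex_ip, 16)))


for port in ("vxlan-0a83083d", "vxlan-0a83083e"):
    print(port, "->", tunnel_port_to_ip(port))
# vxlan-0a83083d -> 10.131.8.61
# vxlan-0a83083e -> 10.131.8.62
```

If the assumption holds, only two remote endpoints are present, which matches the DHCP agents running on three controllers with one still missing.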

If we restart the instance that didn't get a DHCP lease, it now gets an IP. However, once we restart the neutron openvswitch agent on the compute, we lose the flows from table 20 and table 22. What we don't understand is why it always returns to the same unhealthy state, with only one flow in table 4.
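To spot the unhealthy state quickly after an agent restart, something like the sketch below could be run over the grepped `dump-flows` output. It counts flows per table and reports which of tables 4, 20 and 22 have none. The function names and the choice of expected tables are ours, based only on the dumps in this comment.

```python
import re
from collections import Counter

# The single flow seen in the unhealthy state, copied from the first dump.
SAMPLE = """\
 cookie=0xde6f920d0d405dbc, duration=500977.143s, table=4, n_packets=427, n_bytes=41999, priority=1,tun_id=0x61 actions=mod_vlan_vid:8,resubmit(,9)
"""

TABLE_RE = re.compile(r"\btable=(\d+)")


def flows_per_table(dump: str) -> Counter:
    """Count flow lines per OpenFlow table number."""
    counts = Counter()
    for line in dump.splitlines():
        match = TABLE_RE.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts


def missing_tables(dump: str, expected=(4, 20, 22)):
    """Return the expected tables that have no flow in the dump."""
    counts = flows_per_table(dump)
    return [table for table in expected if counts[table] == 0]


print(missing_tables(SAMPLE))  # [20, 22]
```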