During large-scale VM provisioning, several percent of the VMs start up without network connectivity. We found that the cause of the faulty networking is an incorrect state in the OpenFlow table: there is no connectivity over VXLAN between the affected compute nodes and the controllers.
A correct OpenFlow table shows the complete list of VXLAN interfaces to all compute nodes and controllers:
cookie=0x98572d2e5f45dc06, duration=2639.513s, table=22, n_packets=212, n_bytes=57108, priority=1,dl_vlan=10 actions=strip_vlan,load:0x30->NXM_NX_TUN_ID[],output:"vxlan-0afe0c74",output:"vxlan-0afe0c80",output:"vxlan-0afe0c0b",output:"vxlan-0afe0c0c",output:"vxlan-0afe0c0d",output:"vxlan-0afe0c7d",output:"vxlan-0afe0c66",output:"vxlan-0afe0c81",output:"vxlan-0afe0c6d",output:"vxlan-0afe0c6c",output:"vxlan-0afe0c69",output:"vxlan-0afe0c7a",output:"vxlan-0afe0c79",output:"vxlan-0afe0c78",output:"vxlan-0afe0c7f",output:"vxlan-0afe0c7e",output:"vxlan-0afe0c67",output:"vxlan-0afe0c7c",output:"vxlan-0afe0c83",output:"vxlan-0afe0c86",output:"vxlan-0afe0c87",output:"vxlan-0afe0c76",output:"vxlan-0afe0c84",output:"vxlan-0afe0c85",output:"vxlan-0afe0c75",output:"vxlan-0afe0c72",output:"vxlan-0afe0c73",output:"vxlan-0afe0c71",output:"vxlan-0afe0c6f",output:"vxlan-0afe0c7b",output:"vxlan-0afe0c6b",output:"vxlan-0afe0c6a",output:"vxlan-0afe0c6e",output:"vxlan-0afe0c77",output:"vxlan-0afe0c65",output:"vxlan-0afe0c70"
In the incorrect state, the OpenFlow table is missing the VXLAN interfaces to the controllers:
cookie=0xeee71baa637a6dde, duration=754.490s, table=22, n_packets=147, n_bytes=39834, priority=1,dl_vlan=10 actions=strip_vlan,load:0x30->NXM_NX_TUN_ID[],output:"vxlan-0afe0c74",output:"vxlan-0afe0c80",output:"vxlan-0afe0c7d",output:"vxlan-0afe0c66",output:"vxlan-0afe0c81",output:"vxlan-0afe0c6d",output:"vxlan-0afe0c6c",output:"vxlan-0afe0c69",output:"vxlan-0afe0c7a",output:"vxlan-0afe0c79",output:"vxlan-0afe0c78",output:"vxlan-0afe0c7f",output:"vxlan-0afe0c7e",output:"vxlan-0afe0c67",output:"vxlan-0afe0c7c",output:"vxlan-0afe0c86",output:"vxlan-0afe0c87",output:"vxlan-0afe0c76",output:"vxlan-0afe0c84",output:"vxlan-0afe0c85",output:"vxlan-0afe0c75",output:"vxlan-0afe0c72",output:"vxlan-0afe0c73",output:"vxlan-0afe0c71",output:"vxlan-0afe0c6f",output:"vxlan-0afe0c7b",output:"vxlan-0afe0c6b",output:"vxlan-0afe0c6a",output:"vxlan-0afe0c6e",output:"vxlan-0afe0c65",output:"vxlan-0afe0c70"
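The two flow entries are long enough that the difference is hard to spot by eye. A minimal sketch of how the output ports could be compared, assuming each flow entry is available as a single text line in the format shown above (the helper names `output_ports` and `missing_ports` are made up for this illustration, and the sample lines are shortened):

```python
import re

def output_ports(flow_line):
    """Return the port names from the output actions of one flow entry, in order."""
    # Port names may or may not be quoted, depending on the OVS version.
    return re.findall(r'output:"?([\w.-]+)"?', flow_line)

def missing_ports(good_flow, bad_flow):
    """Return ports that the good entry outputs to but the bad entry does not."""
    bad = set(output_ports(bad_flow))
    return [port for port in output_ports(good_flow) if port not in bad]

# Shortened sample lines for illustration; paste the full entries in practice.
good = 'table=22, priority=1,dl_vlan=10 actions=strip_vlan,output:"vxlan-0afe0c74",output:"vxlan-0afe0c0b",output:"vxlan-0afe0c0c"'
bad = 'table=22, priority=1,dl_vlan=10 actions=strip_vlan,output:"vxlan-0afe0c74"'

print(missing_ports(good, bad))
```

Run against the full entries from a healthy and an affected node, this lists every port present in the first entry but absent from the second.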
Restarting the neutron_openvswitch_agent container fixes the problem on the affected compute node by adding the missing VXLAN interfaces.
Missing output ports:
output:"vxlan-0afe0c0b" - 10.254.12.11 ctrl1
output:"vxlan-0afe0c0c" - 10.254.12.12 ctrl2
output:"vxlan-0afe0c0d" - 10.254.12.13 ctrl3
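The interface names appear to encode the remote tunnel endpoint's IPv4 address as eight hex digits, which matches the three controller addresses listed above (0afe0c0b corresponds to 10.254.12.11). A small sketch for converting a port name back to an address, assuming that naming scheme holds for every port (the function name `vxlan_port_to_ip` is hypothetical):

```python
import ipaddress

def vxlan_port_to_ip(port_name):
    """Decode a port name like 'vxlan-0afe0c0b' into a dotted IPv4 address.

    Assumes the suffix is the remote IPv4 address written as 8 hex digits.
    """
    prefix, _, hex_ip = port_name.partition("-")
    if prefix != "vxlan" or len(hex_ip) != 8:
        raise ValueError("unexpected port name: %r" % port_name)
    return str(ipaddress.IPv4Address(int(hex_ip, 16)))

for name in ("vxlan-0afe0c0b", "vxlan-0afe0c0c", "vxlan-0afe0c0d"):
    print(name, vxlan_port_to_ip(name))
```

This makes it easier to map any port missing from a flow entry to the host it belongs to.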
When the issue arises, all controllers are always missing from the output port list.