Hi all,
We have deployed several OpenStack platforms at my company.
We used Kolla Ansible to deploy them.
Here is the configuration that we applied:
kolla_base_distro: "centos"
kolla_install_type: "binary"
openstack_version: "stein"
Recently, we upgraded a master region from Rocky to Stein using the Kolla Ansible upgrade procedure.
Since the upgrade, the Open vSwitch agent sometimes loses its connection to ovsdb.
We found this error in neutron-openvswitch-agent.log: "tcp:127.0.0.1:6640: send error: Broken pipe".
And we found these errors in ovsdb-server.log:
2020-02-24T23:13:22.644Z|00009|reconnect|ERR|tcp:127.0.0.1:50260: no response to inactivity probe after 5 seconds, disconnecting
2020-02-25T04:10:55.893Z|00010|reconnect|ERR|tcp:127.0.0.1:58544: no response to inactivity probe after 5 seconds, disconnecting
2020-02-25T07:21:12.301Z|00011|reconnect|ERR|tcp:127.0.0.1:34918: no response to inactivity probe after 5 seconds, disconnecting
2020-02-25T09:21:45.533Z|00012|reconnect|ERR|tcp:127.0.0.1:37782: no response to inactivity probe after 5 seconds, disconnecting
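To line these disconnects up with the "Broken pipe" errors in the agent log, the timestamps can be pulled out of ovsdb-server.log. This is only a sketch: the function name probe_timeout_times is made up for illustration, and the log file path is passed as an argument because its location depends on the deployment.

```shell
#!/bin/sh
# Print "<timestamp> <peer>" for every inactivity-probe disconnect
# found in an ovsdb-server.log file.
# Hypothetical helper; the log format is assumed to match the
# "time|seq|module|level|message" lines quoted above.
probe_timeout_times() {
    awk -F'|' '
        $3 == "reconnect" && $4 == "ERR" && $5 ~ /no response to inactivity probe/ {
            # The message starts with "tcp:<ip>:<port>: ...", keep only the peer.
            split($5, msg, ": ")
            print $1, msg[1]
        }
    ' "$1"
}
```

Usage: probe_timeout_times /path/to/ovsdb-server.log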
When we experience this issue, all the "NORMAL" type flows inside br-ex stop letting traffic out.
Example of stuck flows:
(neutron-openvswitch-agent)[root@cnp69s12p07 /]# ovs-ofctl dump-flows br-ex | grep NORMAL
cookie=0x7adbd675f988912b, duration=72705.077s, table=0, n_packets=185, n_bytes=16024, idle_age=65534, hard_age=65534, priority=0 actions=NORMAL
cookie=0x7adbd675f988912b, duration=72695.007s, table=2, n_packets=11835702, n_bytes=5166123797, idle_age=0, hard_age=65534, priority=4,in_port=5,dl_vlan=1 actions=mod_vlan_vid:12,NORMAL
cookie=0x7adbd675f988912b, duration=72694.928s, table=2, n_packets=4133243, n_bytes=349654412, idle_age=0, hard_age=65534, priority=4,in_port=5,dl_vlan=9 actions=mod_vlan_vid:18,NORMAL
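One way to check whether these flows are still passing traffic is to compare their n_packets counters between two dumps taken a few seconds apart. The sketch below only extracts the counters; the function name normal_flow_counters is hypothetical, and it assumes the dump-flows output format shown above.

```shell
#!/bin/sh
# Read "ovs-ofctl dump-flows" output on stdin and print
# "<table> <n_packets>" for each flow whose actions include NORMAL.
# Hypothetical helper for illustration only.
normal_flow_counters() {
    grep 'actions=.*NORMAL' |
        sed -n 's/.*table=\([0-9]*\), n_packets=\([0-9]*\),.*/\1 \2/p'
}
```

Usage: ovs-ofctl dump-flows br-ex | normal_flow_counters
Running it twice and comparing the second column shows which flows are still counting packets.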
Neutron architecture:
- HA L3 enabled
- DVR enabled
- SNAT enabled
- multiple VLAN providers: True
Note: our platforms are multi-region.
Workaround to solve this issue:
- stop openvswitch_db openvswitch_vswitchd neutron_openvswitch_agent neutron_l3_agent (containers)
- start containers: openvswitch_db openvswitch_vswitchd
- start neutron_l3_agent neutron_openvswitch_agent
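The workaround steps can be sketched as a small shell function. It assumes the containers are managed with the docker CLI; the DOCKER variable and the function name restart_ovs_stack are made up for illustration, and setting DOCKER=echo gives a dry run that only prints the commands.

```shell
#!/bin/sh
# Container CLI to use. Assumption: docker. Set DOCKER=echo for a dry run.
DOCKER="${DOCKER:-docker}"

# Hypothetical helper: restart OVS and the Neutron agents in the order
# described in the workaround.
restart_ovs_stack() {
    # 1. Stop the OVS containers and the Neutron agents.
    "$DOCKER" stop openvswitch_db openvswitch_vswitchd \
        neutron_openvswitch_agent neutron_l3_agent || return 1
    # 2. Bring OVS back first so the agents have something to connect to.
    "$DOCKER" start openvswitch_db openvswitch_vswitchd || return 1
    # 3. Then start the Neutron agents.
    "$DOCKER" start neutron_l3_agent neutron_openvswitch_agent
}
```

Usage: call restart_ovs_stack on the affected node after sourcing the snippet.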
Note: we have kept the OVS connection timeout options at their defaults:
- of_connect_timeout: 300
- of_request_timeout: 300
- of_inactivity_probe: 10
Thank you in advance for your help.