ovs-vswitchd thread consuming 100% CPU
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
openvswitch (Ubuntu) | Confirmed | Undecided | Unassigned |
Bug Description
I have an ovs-vswitchd process consuming 100% CPU in a very lightly used OpenStack Rocky cloud running on Bionic. The version in question is openvswitch-switch 2.10.0-
ovs-vswitchd is running alongside various neutron processes (lbaasv2-agent, metadata-agent, l3-agent, dhcp-agent, openvswitch-agent) inside an LXC container on a physical host. There is a single neutron router, and the entire environment including br-tun, br-ex, and br-int traffic barely goes over 200KiB/s TX/RX combined.
The issue appears to have arisen on its own after the host was hard rebooted. On restart ovs-vswitchd came up with high load and it has not diminished since then.
The thread that is consuming CPU looks like this in ps and shows up with the name 'handler89':
UID PID SPID PPID C STIME TTY TIME CMD
root 7267 7454 1 99 Apr23 ? 8-03:38:14 ovs-vswitchd unix:/var/
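To gather the same evidence on another host, the per-thread listing can be filtered mechanically. The following is a minimal sketch, not part of the original report: it assumes the column layout shown in the header above, and the helper names `busy_threads` and `thread_name` are hypothetical. The thread name (such as 'handler89') is read from `/proc/<pid>/task/<spid>/comm`, which is a standard Linux facility.

```python
from pathlib import Path

# Column layout taken from the ps header quoted in the report.
PS_HEADER = "UID PID SPID PPID C STIME TTY TIME CMD"


def busy_threads(ps_text, min_cpu=90):
    """Return (pid, spid, cpu, cmd) for threads whose C column is >= min_cpu."""
    rows = []
    for line in ps_text.strip().splitlines():
        if line.split() == PS_HEADER.split():
            continue  # skip the header row
        # Split into at most 9 fields so CMD keeps its embedded spaces.
        fields = line.split(None, 8)
        if len(fields) < 9:
            continue  # not a complete row
        _uid, pid, spid, _ppid, cpu, _stime, _tty, _time, cmd = fields
        if int(cpu) >= min_cpu:
            rows.append((int(pid), int(spid), int(cpu), cmd))
    return rows


def thread_name(pid, spid):
    """Read the kernel-visible name of one thread (Linux only)."""
    return Path(f"/proc/{pid}/task/{spid}/comm").read_text().strip()


# Sample copied from the report; the CMD column is truncated there as well.
SAMPLE = """\
UID PID SPID PPID C STIME TTY TIME CMD
root 7267 7454 1 99 Apr23 ? 8-03:38:14 ovs-vswitchd unix:/var/
"""

if __name__ == "__main__":
    for pid, spid, cpu, cmd in busy_threads(SAMPLE):
        print(f"pid={pid} spid={spid} cpu={cpu}% cmd={cmd}")
```

On a live system, `thread_name(7267, 7454)` would be the call that maps the SPID back to a handler name, assuming the process is still running with those IDs.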
Logs in /var/log/
2019-05-
2019-05-
Logs in /var/log/
2019-05-01 18:35:31.174 13621 DEBUG neutron.
2019-05-01 18:35:31.177 13621 DEBUG neutron.
2019-05-01 18:35:31.179 13621 DEBUG neutron.
I'm not sure what other information to add here, but I am happy to gather more diagnostic data to try to pin this down. I did come across https:/
Below are various ovs-dpctl/ofctl show reports:
root@juju-
system@ovs-system:
lookups: hit:223561120 missed:5768546 lost:798
flows: 131
masks: hit:2284286371 total:15 hit/pkt:9.96
port 0: ovs-system (internal)
port 1: br-ex (internal)
port 2: eth1
port 3: gre_sys (gre: packet_type=ptap)
port 4: br-tun (internal)
port 5: br-int (internal)
port 6: tapa062d6f1-40
port 7: tapf90e3ab6-13
port 8: tap45ba891c-4c
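The datapath counters above can be turned into ratios to check whether the flow cache itself looks unhealthy. This is a rough sketch under stated assumptions: the function name `parse_dpctl_stats` is hypothetical, and it relies on the ovs-dpctl convention that hit plus missed equals the number of packets the datapath processed. With the numbers from this report, dividing the masks hit count by that packet total gives the same 9.96 printed as hit/pkt.

```python
import re


def parse_dpctl_stats(text):
    """Extract lookup and mask counters from 'ovs-dpctl show' style text."""
    lookups = re.search(r"lookups: hit:(\d+) missed:(\d+) lost:(\d+)", text)
    masks = re.search(r"masks: hit:(\d+) total:(\d+)", text)
    if not lookups or not masks:
        raise ValueError("lookups/masks lines not found")
    hit, missed, lost = map(int, lookups.groups())
    mask_hit, mask_total = map(int, masks.groups())
    packets = hit + missed  # total packets seen by the datapath
    return {
        "packets": packets,
        "miss_ratio": missed / packets,         # share sent to userspace
        "lost": lost,                           # upcalls dropped before userspace
        "masks_per_packet": mask_hit / packets, # average masks tried per packet
        "mask_total": mask_total,
    }


# Counters copied from the report.
SAMPLE = """\
lookups: hit:223561120 missed:5768546 lost:798
flows: 131
masks: hit:2284286371 total:15 hit/pkt:9.96
"""

if __name__ == "__main__":
    stats = parse_dpctl_stats(SAMPLE)
    print(f"miss ratio: {stats['miss_ratio']:.4f}")
    print(f"masks tried per packet: {stats['masks_per_packet']:.2f}")
```

A miss ratio of roughly 2.5% with only 131 flows does not by itself explain a thread pinned at 100% CPU, which is consistent with the report's description of a lightly loaded environment.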
root@juju-
OFPT_FEATURES_REPLY (xid=0x2): dpid:00003643ed
n_tables:254, n_buffers:0
capabilities: FLOW_STATS TABLE_STATS PORT_STATS QUEUE_STATS ARP_MATCH_IP
actions: output enqueue set_vlan_vid set_vlan_pcp strip_vlan mod_dl_src mod_dl_dst mod_nw_src mod_nw_dst mod_nw_tos mod_tp_src mod_tp_dst
1(int-br-ex): addr:4a:
config: 0
state: 0
speed: 0 Mbps now, 0 Mbps max
2(patch-tun): addr:7a:
config: 0
state: 0
speed: 0 Mbps now, 0 Mbps max
3(tapa062d6f1-40): addr:92:
config: 0
state: 0
current: 10GB-FD COPPER
speed: 10000 Mbps now, 0 Mbps max
4(tapf90e3ab6-13): addr:9e:
config: 0
state: 0
current: 10GB-FD COPPER
speed: 10000 Mbps now, 0 Mbps max
5(tap45ba891c-4c): addr:76:
config: 0
state: 0
current: 10GB-FD COPPER
speed: 10000 Mbps now, 0 Mbps max
LOCAL(br-int): addr:36:
config: PORT_DOWN
state: LINK_DOWN
speed: 0 Mbps now, 0 Mbps max
OFPT_GET_
root@juju-
OFPT_FEATURES_REPLY (xid=0x2): dpid:000092c6c8
n_tables:254, n_buffers:0
capabilities: FLOW_STATS TABLE_STATS PORT_STATS QUEUE_STATS ARP_MATCH_IP
actions: output enqueue set_vlan_vid set_vlan_pcp strip_vlan mod_dl_src mod_dl_dst mod_nw_src mod_nw_dst mod_nw_tos mod_tp_src mod_tp_dst
1(patch-int): addr:02:
config: 0
state: 0
speed: 0 Mbps now, 0 Mbps max
2(gre-0a30029b): addr:fe:
config: 0
state: 0
speed: 0 Mbps now, 0 Mbps max
3(gre-0a3002a2): addr:a2:
config: 0
state: 0
speed: 0 Mbps now, 0 Mbps max
4(gre-0a30029d): addr:12:
config: 0
state: 0
speed: 0 Mbps now, 0 Mbps max
5(gre-0a3002ce): addr:ca:
config: 0
state: 0
speed: 0 Mbps now, 0 Mbps max
6(gre-0a3002a1): addr:de:
config: 0
state: 0
speed: 0 Mbps now, 0 Mbps max
7(gre-0a30029e): addr:5a:
config: 0
state: 0
speed: 0 Mbps now, 0 Mbps max
8(gre-0a3002c3): addr:b6:
config: 0
state: 0
speed: 0 Mbps now, 0 Mbps max
10(gre-0a3002a0): addr:ce:
config: 0
state: 0
speed: 0 Mbps now, 0 Mbps max
14(gre-0a30029f): addr:b6:
config: 0
state: 0
speed: 0 Mbps now, 0 Mbps max
18(gre-0a30029c): addr:da:
config: 0
state: 0
speed: 0 Mbps now, 0 Mbps max
LOCAL(br-tun): addr:92:
config: PORT_DOWN
state: LINK_DOWN
speed: 0 Mbps now, 0 Mbps max
OFPT_GET_
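The tunnel port names on br-tun can be mapped back to their remote endpoints when correlating with other hosts. The sketch below assumes that the suffix after `gre-` (or `vxlan-`) is the remote IPv4 address written as eight hexadecimal digits, which is how the Neutron Open vSwitch agent is understood to name tunnel ports; treat that as an assumption rather than something this report confirms. The function name `tunnel_remote_ip` is hypothetical.

```python
import ipaddress
import re

# Assumed naming scheme: <tunnel type>-<remote IPv4 as 8 hex digits>.
TUNNEL_NAME = re.compile(r"(gre|vxlan|geneve)-([0-9a-f]{8})")


def tunnel_remote_ip(port_name):
    """Return the dotted-quad remote IP encoded in a tunnel port name, or None."""
    match = TUNNEL_NAME.fullmatch(port_name)
    if match is None:
        return None  # not a tunnel port under the assumed scheme
    return str(ipaddress.IPv4Address(int(match.group(2), 16)))


# Port names copied from the br-tun listing in the report.
BR_TUN_PORTS = [
    "patch-int", "gre-0a30029b", "gre-0a3002a2", "gre-0a30029d",
    "gre-0a3002ce", "gre-0a3002a1", "gre-0a30029e", "gre-0a3002c3",
    "gre-0a3002a0", "gre-0a30029f", "gre-0a30029c",
]

if __name__ == "__main__":
    for name in BR_TUN_PORTS:
        print(name, tunnel_remote_ip(name))
```

Under that assumption, the ten GRE ports all decode to peers in 10.48.2.0/24.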
root@juju-
OFPT_FEATURES_REPLY (xid=0x2): dpid:000000163e
n_tables:254, n_buffers:0
capabilities: FLOW_STATS TABLE_STATS PORT_STATS QUEUE_STATS ARP_MATCH_IP
actions: output enqueue set_vlan_vid set_vlan_pcp strip_vlan mod_dl_src mod_dl_dst mod_nw_src mod_nw_dst mod_nw_tos mod_tp_src mod_tp_dst
1(eth1): addr:00:
config: 0
state: 0
current: 10GB-FD COPPER
speed: 10000 Mbps now, 0 Mbps max
2(phy-br-ex): addr:ce:
config: 0
state: 0
speed: 0 Mbps now, 0 Mbps max
LOCAL(br-ex): addr:00:
config: 0
state: 0
speed: 0 Mbps now, 0 Mbps max
OFPT_GET_
I have the same issue with high CPU load on a network node running OpenStack Rocky on Ubuntu 18.04.
Kernel 4.15.0-48-generic #51-Ubuntu
Package:
ii neutron-openvswitch-agent 2:13.0.2-0ubuntu3.1~cloud0 all Neutron is a virtual network service for Openstack - Open vSwitch plugin agent
ii openvswitch-common 2.10.0-0ubuntu2~cloud0 amd64 Open vSwitch common components
ii openvswitch-switch 2.10.0-0ubuntu2~cloud0 amd64 Open vSwitch switch implementations
The system was not rebooted.
From ovs-vswitchd.log:
2019-05-08T15:30:13.280Z|00726|connmgr|INFO|br-tun<->tcp:127.0.0.1:6633: 12 flow_mods in the 9 s starting 10 s ago (9 adds, 3 deletes)
2019-05-08T15:30:14.232Z|00727|bridge|INFO|bridge br-tun: added interface vxlan-0ac84a99 on port 55
2019-05-08T15:30:17.456Z|00002|poll_loop(handler68)|INFO|wakeup due to [POLLIN] on fd 29 (unknown anon_inode:[eventpoll]) at ../lib/dpif-netlink.c:2786 (99% CPU usage)
2019-05-08T15:30:17.456Z|00003|poll_loop(handler68)|INFO|wakeup due to [POLLIN] on fd 29 (unknown anon_inode:[eventpoll]) at ../lib/dpif-netlink.c:2786 (99% CPU usage)
2019-05-08T15:30:17.456Z|00004|poll_loop(handler68)|INFO|wakeup due to [POLLIN] on fd 29 (unknown anon_inode:[eventpoll]) at ../lib/dpif-netlink.c:2786 (99% CPU usage)
2019-05-08T15:30:17.456Z|00005|poll_loop(handler68)|INFO|wakeup due to [POLLIN] on fd 29 (unknown anon_inode:[eventpoll]) at ../lib/dpif-netlink.c:2786 (99% CPU usage)
2019-05-08T15:30:17.456Z|00006|poll_loop(handler68)|INFO|wakeup due to [POLLIN] on fd 29 (unknown anon_inode:[eventpoll]) at ../lib/dpif-netlink.c:2786 (99% CPU usage)
2019-05-08T15:30:17.456Z|00007|poll_loop(handler68)|INFO|wakeup due to [POLLIN] on fd 29 (unknown anon_inode:[eventpoll]) at ../lib/dpif-netlink.c:2786 (99% CPU usage)
2019-05-08T15:30:17.456Z|00008|poll_loop(handler68)|INFO|wakeup due to [POLLIN] on fd 29 (unknown anon_inode:[eventpoll]) at ../lib/dpif-netlink.c:2786 (99% CPU usage)
2019-05-08T15:30:17.456Z|00009|poll_loop(handler68)|INFO|wakeup due to [POLLIN] on fd 29 (unknown anon_inode:[eventpoll]) at ../lib/dpif-netlink.c:2786 (99% CPU usage)
2019-05-08T15:30:17.456Z|00010|poll_loop(handler68)|INFO|wakeup due to [POLLIN] on fd 29 (unknown anon_inode:[eventpoll]) at ../lib/dpif-netlink.c:2786 (99% CPU usage)
2019-05-08T15:30:17.456Z|00011|poll_loop(handler68)|INFO|wakeup due to [POLLIN] on fd 29 (unknown anon_inode:[eventpoll]) at ../lib/dpif-netlink.c:2786 (99% CPU usage)
2019-05-08T15:30:23.456Z|00012|poll_loop(handler68)|INFO|Dropped 2841636 log messages in last 6 seconds (most recently, 0 seconds ago) due to excessive rate
2019-05-08T15:30:23.456Z|00013|poll_loop(handler68)|INFO|wakeup due to [POLLIN] on fd 29 (unknown anon_inode:[eventpoll]) at ../lib/dpif-netlink.c:2786 (99% CPU usage)
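The poll_loop messages in this log can be summarised to show which thread, file descriptor, and call site are spinning, and how many messages the rate limiter suppressed per second. The following is a sketch only: the sample lines come from the excerpt in this comment, normalised to the usual ovs-vswitchd.log layout, and the function name `summarize` and both regular expressions are illustrative choices rather than any Open vSwitch tooling.

```python
import re
from collections import Counter

# "wakeup due to [POLLIN] on fd N (...) at FILE:LINE (P% CPU usage)"
WAKEUP = re.compile(
    r"\|poll_loop\((?P<thread>[^)]+)\)\|INFO\|wakeup due to \[POLLIN\] "
    r"on fd (?P<fd>\d+) .* at (?P<site>\S+) \((?P<cpu>\d+)% CPU usage\)"
)
# "Dropped N log messages in last S seconds ..."
DROPPED = re.compile(
    r"\|poll_loop\((?P<thread>[^)]+)\)\|INFO\|Dropped (?P<count>\d+) "
    r"log messages in last (?P<secs>\d+) seconds"
)


def summarize(lines):
    """Count logged wakeups per (thread, fd, call site) and suppressed rates."""
    wakeups = Counter()
    suppressed = []  # (thread, suppressed messages per second)
    for line in lines:
        match = WAKEUP.search(line)
        if match:
            key = (match["thread"], int(match["fd"]), match["site"])
            wakeups[key] += 1
            continue
        match = DROPPED.search(line)
        if match:
            rate = int(match["count"]) / int(match["secs"])
            suppressed.append((match["thread"], rate))
    return wakeups, suppressed


SAMPLE = [
    "2019-05-08T15:30:17.456Z|00002|poll_loop(handler68)|INFO|wakeup due to "
    "[POLLIN] on fd 29 (unknown anon_inode:[eventpoll]) at "
    "../lib/dpif-netlink.c:2786 (99% CPU usage)",
    "2019-05-08T15:30:23.456Z|00012|poll_loop(handler68)|INFO|Dropped 2841636 "
    "log messages in last 6 seconds (most recently, 0 seconds ago) due to "
    "excessive rate",
    "2019-05-08T15:30:23.456Z|00013|poll_loop(handler68)|INFO|wakeup due to "
    "[POLLIN] on fd 29 (unknown anon_inode:[eventpoll]) at "
    "../lib/dpif-netlink.c:2786 (99% CPU usage)",
]

if __name__ == "__main__":
    wakeups, suppressed = summarize(SAMPLE)
    for (thread, fd, site), count in wakeups.items():
        print(f"{thread}: {count} logged wakeups on fd {fd} at {site}")
    for thread, rate in suppressed:
        print(f"{thread}: about {rate:.0f} suppressed messages per second")
```

If the suppressed messages are mostly the same wakeup notice, 2841636 messages in 6 seconds works out to roughly 473,606 wakeups per second on handler68, all on the same eventpoll descriptor.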