ovn-controller on Wallaby creates high CPU usage after moving port

Bug #1963698 reported by DUFOUR Olivier
This bug affects 2 people
Affects        Status  Importance  Assigned to  Milestone
ovn (Ubuntu)   New     Undecided   Unassigned

Bug Description

We are deploying Focal Wallaby for a customer.
Neutron package version: 2:18.2.0-0ubuntu1~cloud0, GLIBC 2.31-0ubuntu9.7

When running rally/tempest tests that create some VMs, the following symptoms occur:
1) A large increase in the size of, and the write load on, /var/lib/openvswitch/conf.db
(If ovsdb-server is restarted while the OVS database is a few GB in size, the unit can fail to start.)

2) Very high CPU usage by the following processes:
* neutron-ovn-metadata-agent
* nova-compute
* ovn-controller
* ovsdb-server

3) The Nova compute node may experience severe delays and may time out when creating any instance (for Nova or Octavia Amphora) on it.
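
Since symptom 1 is about conf.db growing to a few GB, a small check like the following could flag the problem before ovsdb-server is restarted. This is only a sketch: the function name `db_size_exceeds` and the choice of threshold are hypothetical and not part of any OVS or OVN tooling; only the database path comes from this report.

```python
import os

# Path taken from this report; adjust for other deployments.
CONF_DB_PATH = "/var/lib/openvswitch/conf.db"


def db_size_exceeds(path, limit_bytes):
    """Return (size_in_bytes, exceeded) for the file at `path`.

    `limit_bytes` is an arbitrary, caller-chosen threshold (an assumption,
    not an OVS-defined limit).
    """
    size = os.path.getsize(path)
    return size, size > limit_bytes
```

For example, `db_size_exceeds(CONF_DB_PATH, 1024 ** 3)` would report whether the database has grown past 1 GiB on the hypervisor being inspected.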

A temporary workaround is to restart the ovn-controller service.
The issue then reproduces after some time on a different hypervisor.

So far it has been reproducible only on a customer deployment with many nova-compute units.

Ovn-controller.log on the hypervisor:
2022-03-04T12:54:43.065Z|00479|binding|INFO|Changing chassis for lport cr-lrp-f741e3f2-4708-4091-841d-4a9c05f09b53 from comp04.maas to comp18.maas.
2022-03-04T12:54:43.065Z|00480|binding|INFO|cr-lrp-f741e3f2-4708-4091-841d-4a9c05f09b53: Claiming fa:16:3e:15:1f:a6 10.218.131.106/18
2022-03-04T12:54:43.077Z|00481|binding|INFO|Releasing lport cr-lrp-f741e3f2-4708-4091-841d-4a9c05f09b53 from this chassis.
2022-03-04T12:54:46.798Z|00482|poll_loop|INFO|wakeup due to [POLLIN] on fd 13 (<->/var/run/openvswitch/db.sock) at lib/stream-fd.c:157 (64% CPU usage)
2022-03-04T12:54:46.799Z|00483|poll_loop|INFO|wakeup due to [POLLIN] on fd 13 (<->/var/run/openvswitch/db.sock) at lib/stream-fd.c:157 (64% CPU usage)
2022-03-04T12:54:46.799Z|00484|poll_loop|INFO|wakeup due to [POLLIN] on fd 13 (<->/var/run/openvswitch/db.sock) at lib/stream-fd.c:157 (64% CPU usage)
2022-03-04T12:54:46.799Z|00485|poll_loop|INFO|wakeup due to [POLLIN] on fd 13 (<->/var/run/openvswitch/db.sock) at lib/stream-fd.c:157 (64% CPU usage)
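
To gauge how often ovn-controller is being woken up by the local OVS database socket, the poll_loop lines above can be tallied with a short script. This is a rough sketch, not an official OVN tool: the regular expression is modelled only on the lines quoted in this report, and the function name `summarize_wakeups` is made up for illustration.

```python
import re
from collections import Counter

# Modelled on the poll_loop lines quoted in this report; other message
# formats are simply skipped.
WAKEUP_RE = re.compile(
    r"^(?P<ts>\S+)\|(?P<seq>\d+)\|poll_loop\|INFO\|"
    r"wakeup due to \[(?P<event>\w+)\] on fd (?P<fd>\d+) "
    r"\((?P<peer>[^)]*)\) at (?P<src>\S+) \((?P<cpu>\d+)% CPU usage\)"
)


def summarize_wakeups(lines):
    """Count wakeups per (peer, source location) and track the peak CPU %."""
    counts = Counter()
    peak_cpu = 0
    for line in lines:
        match = WAKEUP_RE.match(line.strip())
        if match is None:
            continue  # not a poll_loop wakeup line
        counts[(match["peer"], match["src"])] += 1
        peak_cpu = max(peak_cpu, int(match["cpu"]))
    return counts, peak_cpu
```

Feeding it the lines of ovn-controller.log (for instance via `open(path)`) gives a per-socket wakeup count that can be compared between an affected hypervisor and a healthy one.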

The full ovn-controller log is available here:
https://private-fileshare.canonical.com/~alitvinov/random/ovn-controller.txt

The bundle is available here as well:
https://private-fileshare.canonical.com/~alitvinov/random/bundle-ovn-controller.txt

DUFOUR Olivier (odufourc) wrote :

subscribed ~field-high

Launchpad Janitor (janitor) wrote :

Status changed to 'Confirmed' because the bug affects multiple users.

Changed in ovn (Ubuntu):
status: New → Confirmed
Nobuto Murata (nobuto)
affects: networking-ovn → ovn (Ubuntu)
description: updated
Nobuto Murata (nobuto) wrote :

In this specific case (the environment Olivier described), we tested focal-xena and the issue was NOT reproducible. We've decided to go with Xena, so field-high can be dropped (I'm not able to remove the subscription myself here).

Assuming that it might be focal-wallaby specific, since we haven't seen this kind of issue at other customers running Ussuri, there may be some patches that need to be backported. For example, other distributions seem to have backported the following:
https://github.com/ovn-org/ovn/commit/c83294970c62f662015a7979b12250580bee3001
(no idea whether it's connected to the issue, though)

DUFOUR Olivier (odufourc) wrote :

unsubscribed ~field-high
