ovn-northd not always restarted after certificates written

Bug #1895303 reported by Frode Nordahl
This bug affects 2 people
Affects: charm-ovn-central
Status: Fix Committed
Importance: High
Assigned to: Frode Nordahl

Bug Description

The symptom is:
2020-09-10T17:16:41.349Z|00011|ovsdb_idl|WARN|transaction error: {"details":"RBAC rules for client \"node4.maas\" role \"ovn-controller\" prohibit row insertion into table \"Chassis\".","error":"permission error"}
2020-09-10T17:48:03.688Z|162519|main|INFO|OVNSB commit failed, force recompute next time.

in /var/log/ovn-controller.log on hypervisors combined with

2020-09-11T05:35:21.981Z|139255|reconnect|WARN|ssl:10.0.0.166:45482: connection dropped (Protocol error)

in /var/log/ovsdb-server-sb.log on the ovnsb_db leader.

Normally this means a mismatch between the host FQDN and what is configured in the Open_vSwitch table and/or the CN in the hypervisor's certificate, but in this case those look correct. What I do see is that none of the ovn-central units claims to have an active ovn-northd, and /var/log/ovn/ovn-northd.log on one of the central units confirms that it is not able to talk to the databases.

The root cause is that the ovn-northd service was not restarted after the certificates were written to disk.

systemctl status ovn-northd indicates that the service started before the certificate files in /etc/ovn were created.
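
One way to confirm this ordering on a unit is to compare the service's start time with the modification times of the files in /etc/ovn. The following is a rough sketch and not part of the charm: the function names are made up for illustration, and it assumes GNU stat and date, and that date can parse the timestamp systemctl prints (timezone abbreviations other than UTC may not parse).

```shell
# Print regular files directly under directory $1 that were modified
# after epoch second $2. (Hypothetical helper, GNU stat assumed.)
files_newer_than() {
    dir=$1
    since=$2
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        mtime=$(stat -c %Y "$f") || continue
        if [ "$mtime" -gt "$since" ]; then
            printf '%s\n' "$f"
        fi
    done
}

# Print the epoch second at which systemd unit $1 last became active.
# (Hypothetical helper; relies on GNU date parsing systemd's timestamp.)
unit_start_epoch() {
    ts=$(systemctl show "$1" --property=ActiveEnterTimestamp --value)
    date -d "$ts" +%s
}

# Usage on an ovn-central unit:
#   files_newer_than /etc/ovn "$(unit_start_epoch ovn-northd)"
```

Any file printed was written after the service last started, which matches the situation described here.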

It is ovn-northd's responsibility to create the RBAC rules in the database. If it has never connected, the rules are not there, and a chassis cannot register itself.

Frode Nordahl (fnordahl)
Changed in charm-ovn-central:
status: New → Triaged
importance: Undecided → High
Peter Matulis (petermatulis) wrote :

To be clear, the workaround is to restart the ovn-northd daemon on each ovn-central unit. For example:

juju ssh ovn-central/0 sudo systemctl restart ovn-northd
juju ssh ovn-central/1 sudo systemctl restart ovn-northd
juju ssh ovn-central/2 sudo systemctl restart ovn-northd
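
After the restarts it may be worth checking that one unit now reports an active ovn-northd. A minimal sketch, assuming the `status` command of ovn-northd is reachable through `ovn-appctl -t ovn-northd` on the units and that it prints a line of the form `Status: active`; the helper names are hypothetical.

```shell
# Count how many lines on stdin report an active ovn-northd.
count_active_northd() {
    awk '/Status: active/ { n++ } END { print n + 0 }'
}

# Print "<unit>: <status>" for each unit given as an argument.
# Assumes ovn-appctl can reach the ovn-northd control socket on the unit.
northd_status() {
    for unit in "$@"; do
        status=$(juju ssh "$unit" sudo ovn-appctl -t ovn-northd status </dev/null)
        printf '%s: %s\n' "$unit" "$status"
    done
}

# Usage:
#   northd_status ovn-central/0 ovn-central/1 ovn-central/2 | count_active_northd
```

A count of 0 would match the symptom in the description, where no unit claims to have an active ovn-northd.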

Peter Matulis (petermatulis) wrote :

So far this bug manifests only during a manual cloud install. It has not been observed with a bundle install.

Frode Nordahl (fnordahl)
Changed in charm-ovn-central:
assignee: nobody → Frode Nordahl (fnordahl)
OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-ovn-central (master)

Fix proposed to branch: master
Review: https://review.opendev.org/753541

Changed in charm-ovn-central:
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-ovn-central (master)

Reviewed: https://review.opendev.org/753541
Committed: https://git.openstack.org/cgit/x/charm-ovn-central/commit/?id=71dd75c4cde3cbc16a5fd958b8e47edd60c98068
Submitter: Zuul
Branch: master

commit 71dd75c4cde3cbc16a5fd958b8e47edd60c98068
Author: Frode Nordahl <email address hidden>
Date: Wed Sep 23 10:09:51 2020 +0200

    Reload `ovn-northd` service on certificate data change

    The `ovn-northd` daemon does not detect certificate data changes,
    reload the service when certificate data changes.

    Change-Id: I37c6ff2c90f94ea0e77b27a9b28dc9dd0770b97e
    Closes-Bug: #1895303

Changed in charm-ovn-central:
status: In Progress → Fix Committed
Changed in charm-ovn-central:
milestone: none → 20.10
Changed in charm-ovn-central:
status: Fix Committed → Fix Released
Junien F (axino) wrote :

This doesn't work (on PS5 at least):

2022-03-21 06:51:10 WARNING unit.ovn-central/0.certificates-relation-changed logger.go:60 Failed to reload ovn-northd.service: Job type reload is not applicable for unit ovn-northd.service.

When this happens, ovn units can't communicate properly, and this doesn't appear to raise any alert.
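
To illustrate the failure mode: systemd exposes a `CanReload` property per unit, so a script can pick `restart` when `reload` is not applicable and report an error instead of only logging a warning. This is a sketch under those assumptions, not the charm's code; the function names are hypothetical, and whether a reload would make ovn-northd re-read its certificates is a separate question it does not answer.

```shell
# Map a unit's CanReload value ("yes" or "no") to the systemctl verb to use.
cert_change_verb() {
    case "$1" in
        yes) echo reload ;;
        *)   echo restart ;;
    esac
}

# Apply the chosen verb to unit $1 and return non-zero if systemctl fails,
# so the caller can raise an alert rather than continue silently.
refresh_unit() {
    unit=$1
    can_reload=$(systemctl show "$unit" --property=CanReload --value)
    verb=$(cert_change_verb "$can_reload")
    if ! systemctl "$verb" "$unit"; then
        echo "ERROR: systemctl $verb $unit failed" >&2
        return 1
    fi
}

# Usage: refresh_unit ovn-northd.service
```

`systemctl reload-or-restart` performs a similar fallback in a single command.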

Changed in charm-ovn-central:
status: Fix Released → Triaged
Frode Nordahl (fnordahl) wrote :

This was fixed in OVN proper for versions 21.06 and up [0].

So I guess the proper thing to do would be to change reload to restart in the charm on the relevant stable branches and remove the workaround from newer charms.

0: https://github.com/ovn-org/ovn/commit/8de4f8005f21014a7ff588bd803460900288d7fd
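
The version split described above can be sketched as a small check: OVN releases older than 21.06 need a restart after a certificate change, newer ones need nothing. The function names are hypothetical, the version string is passed in by the caller (how it is obtained on a given system is left open), and GNU `sort -V` is assumed.

```shell
# Succeed when version $1 is greater than or equal to version $2.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Print the action needed for ovn-northd after a certificate change,
# given the OVN version in $1: "restart" before 21.06, "none" otherwise.
northd_cert_action() {
    if version_ge "$1" 21.06; then
        echo none
    else
        echo restart
    fi
}

# Usage: northd_cert_action 20.03.2
```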

OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-ovn-central (master)

Fix proposed to branch: master
Review: https://review.opendev.org/c/x/charm-ovn-central/+/849108

Changed in charm-ovn-central:
status: Triaged → In Progress
OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-ovn-central (stable/20.03)

Fix proposed to branch: stable/20.03
Review: https://review.opendev.org/c/x/charm-ovn-central/+/849111

OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-ovn-central (master)

Reviewed: https://review.opendev.org/c/x/charm-ovn-central/+/849108
Committed: https://opendev.org/x/charm-ovn-central/commit/5d5d089cd475ab4968a20d378df42b60931577e8
Submitter: "Zuul (22348)"
Branch: master

commit 5d5d089cd475ab4968a20d378df42b60931577e8
Author: Frode Nordahl <email address hidden>
Date: Fri Jul 8 15:27:01 2022 +0200

    Drop configure_tls northd workaround

    The ovn-northd daemon has gained support for runtime reload of
    certificate data [0] and is now on par with the other OVS/OVN
    daemons.

    Remove the workaround from the charm.

    0: https://github.com/ovn-org/ovn/commit/8de4f8005f210
    Closes-Bug: #1895303
    Change-Id: I7f45b36e03b985ba2d170ead391615f9ef9dad8e

Changed in charm-ovn-central:
status: In Progress → Fix Committed
OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-ovn-central (stable/20.03)

Reviewed: https://review.opendev.org/c/x/charm-ovn-central/+/849111
Committed: https://opendev.org/x/charm-ovn-central/commit/87fe7cc6aefce1bc0d4dc875c8dc1928bf65af6c
Submitter: "Zuul (22348)"
Branch: stable/20.03

commit 87fe7cc6aefce1bc0d4dc875c8dc1928bf65af6c
Author: Frode Nordahl <email address hidden>
Date: Fri Jul 8 15:30:36 2022 +0200

    [stable/20.03] Fix ovn-northd certificate reload workaround

    The charm currently tries to reload the ovn-northd service, but
    that is unfortunately not implemented by the init/systemd script.

    Restart the service instead.

    Note that this is a stable/20.03-only fix as this was fixed in
    OVN upstream at 21.06 [0].

    0: https://github.com/ovn-org/ovn/commit/8de4f8005f210
    Closes-Bug: #1895303
    Change-Id: I50f2e76a42f7d8f305dda216048c6c5ec62a6c0e

OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-ovn-central (stable/22.03)

Fix proposed to branch: stable/22.03
Review: https://review.opendev.org/c/x/charm-ovn-central/+/851381

OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-ovn-central (stable/21.09)

Fix proposed to branch: stable/21.09
Review: https://review.opendev.org/c/x/charm-ovn-central/+/851382

OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-ovn-central (stable/20.12)

Fix proposed to branch: stable/20.12
Review: https://review.opendev.org/c/x/charm-ovn-central/+/851383
