Comment 3 for bug 1911225

Billy Olsen (billy-olsen) wrote:

@modern911 - hmm, so the symptoms are similar here, but there was distinct evidence in the previous version and in comment #1 indicating that this was clearly bug 1906280. That problem was introduced by a change in systemd behavior after which mlock could no longer allocate the memory it required. It caused the openvswitch-vswitchd process to die and then fail to start up again, due to limitations within the container.

However, the data that you've provided presents similar symptoms ('ovsdb' relation incomplete) but an altogether different problem. This is unfortunate, as we will likely see the same symptom in other scenarios whenever the <relation-name> relation is not considered to be complete.

Rather than reusing this bug, we should spin out a new one, since the initial data provided with this bug has been positively identified as a duplicate. @modern911, if you could please raise a new bug, that would be extremely helpful.

As for the data analysis, the single unit ovn-chassis/2 is waiting on the ovsdb.available flag to be set for the reactive handlers. However, only the ovsdb.connected flag is set. At a cursory glance, this indicates that the expected number of remote units is not yet available [0][1][2], which depends on Juju's goal state returning the correct target. This appears to work on the other ovn-chassis units, but something about this particular unit prevents it from seeing the right number of units. Unfortunately, logging (FFDC) is sparse in this particular path.
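To make the failure mode concrete, here is a minimal, self-contained Python sketch of the kind of readiness check described above. It is NOT the code from charm-interface-ovsdb or charm-helpers: `expected_remote_units`, `update_flags`, and the goal-state dictionary layout are hypothetical stand-ins assumed for illustration. It shows how a unit can end up with only `ovsdb.connected` set: if goal-state reports more remote units than have actually joined, `ovsdb.available` is never set.

```python
from typing import Dict, Iterable, Set


def expected_remote_units(goal_state: Dict, relation_name: str) -> Set[str]:
    """Return the remote unit names that goal-state expects on a relation.

    ASSUMPTION: goal_state is shaped like Juju's `goal-state` output,
    i.e. relations -> <relation name> -> {<app or unit name>: {...}}.
    """
    entries = goal_state.get("relations", {}).get(relation_name, {})
    # Unit names look like "app/N"; bare application names have no "/".
    return {name for name in entries if "/" in name}


def update_flags(goal_state: Dict, relation_name: str,
                 joined_units: Iterable[str], flags: Set[str]) -> Set[str]:
    """HYPOTHETICAL helper: set '<relation>.connected' once any remote unit
    has joined, and '<relation>.available' only once the number of joined
    units reaches the number goal-state expects."""
    joined = set(joined_units)
    if joined:
        flags.add(f"{relation_name}.connected")
        expected = expected_remote_units(goal_state, relation_name)
        if len(joined) >= len(expected):
            flags.add(f"{relation_name}.available")
    return flags


# Usage: goal-state expects three ovn-central units, but only two have joined.
goal = {"relations": {"ovsdb": {
    "ovn-central": {"status": "joined"},
    "ovn-central/0": {"status": "active"},
    "ovn-central/1": {"status": "active"},
    "ovn-central/2": {"status": "active"},
}}}
print(sorted(update_flags(goal, "ovsdb", ["ovn-central/0", "ovn-central/1"], set())))
```

In this sketch the unit stays at `ovsdb.connected` until the third unit joins, which matches the symptom seen on ovn-chassis/2. If goal-state itself returned an inflated target, the same stall would occur even with every real unit joined.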

[0] - https://opendev.org/x/charm-interface-ovsdb/src/branch/master/src/ovsdb/requires.py#L42
[1] - https://opendev.org/x/charm-interface-ovsdb/src/branch/master/src/lib/ovsdb.py#L150
[2] - https://github.com/juju/charm-helpers/blob/master/charmhelpers/core/hookenv.py#L594