syslog-relation-departed hook fails if self.get_local("path") returns None

Bug #1917818 reported by Paul Goins
This bug affects 1 person
Affects: PostgreSQL Charm
Status: Fix Released
Importance: Undecided
Assigned to: Unassigned
Milestone: (none)

Bug Description

I have an environment where I have a relation between the postgresql and rsyslog-forwarder-ha charms via the syslog interface. rsyslog-forwarder-ha is running as a subordinate of a different app.

Upon removing the unit of the other app, and in turn its rsyslog-forwarder-ha subordinate, the charm hits this error:

2021-03-04 19:19:39 ERROR juju-log syslog:31: Hook error:
Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-landscape-postgresql-0/.venv/lib/python3.6/site-packages/charms/reactive/__init__.py", line 74, in main
    bus.dispatch(restricted=restricted_mode)
  File "/var/lib/juju/agents/unit-landscape-postgresql-0/.venv/lib/python3.6/site-packages/charms/reactive/bus.py", line 379, in dispatch
    _invoke(hook_handlers)
  File "/var/lib/juju/agents/unit-landscape-postgresql-0/.venv/lib/python3.6/site-packages/charms/reactive/bus.py", line 359, in _invoke
    handler.invoke()
  File "/var/lib/juju/agents/unit-landscape-postgresql-0/.venv/lib/python3.6/site-packages/charms/reactive/bus.py", line 181, in invoke
    self._action(*args)
  File "/var/lib/juju/agents/unit-landscape-postgresql-0/charm/hooks/relations/syslog/provides.py", line 33, in departed
    if os.path.exists(path):
  File "/var/lib/juju/agents/unit-landscape-postgresql-0/.venv/lib/python3.6/genericpath.py", line 19, in exists
    os.stat(path)
TypeError: stat: path should be string, bytes, os.PathLike or integer, not NoneType

Examining the .unit-state.db file, I ran the query "select key from kv where key like '%local-data%';" and got zero rows back. charms.reactive.relations.Conversation.get_local() and set_local() use the local-data substring in the keys they read and write. In other words: there's no apparent use of set_local(), yet the code tries to do a get_local(). (Grepping the charm also confirms there is no use of set_local().)

This seems to be an older bit of code, from December 2015, and I don't understand the context of what it is trying to do here - but as far as I can see, in current versions of this charm, it's going to fail - get_local() will return None as a default, and that'll get passed into os.path.exists(), etc., resulting in the error.
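
For illustration, here is a minimal sketch of the kind of guard that would avoid the TypeError in the traceback above. This is not the charm's actual code or the released fix; the function name remove_config_if_present is hypothetical, and the only assumption is that the stored path may come back as None.

```python
import os
from typing import Optional


def remove_config_if_present(path: Optional[str]) -> bool:
    """Delete the file at `path` if a path was recorded and the file exists.

    Returns True if a file was removed, False otherwise.
    (Hypothetical helper, not part of the charm.)
    """
    # get_local() returns None when nothing was stored, and passing None to
    # os.path.exists() raises the TypeError seen in the traceback, so bail
    # out before touching the filesystem.
    if not path:
        return False
    if os.path.exists(path):
        os.remove(path)
        return True
    return False
```

In the departed handler, the value from get_local("path") would be passed through a check like this instead of going straight to os.path.exists().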

Paul Goins (vultaire) wrote :

My best guess at a workaround, based on the code and what it seems to be doing, is as follows:

* "fake" the depart hook:
  * Identify the remote unit ID.
  * SSH into the blocked unit.
  * Remove the associated file from /etc/rsyslog.d/. (e.g. /etc/rsyslog.d/juju-landscape-postgresql_1-rsyslog-forwarder-ha_12.conf)
  * Restart rsyslog.
  * CD to the charm directory. (e.g. /var/lib/juju/agents/unit-landscape-postgresql-1/charm/)
  * Install sqlite3.
  * Via an sqlite3 shell (sqlite3 .unit-state.db):
    * Dump all the keys for conversation records: select key from kv where key like '%conversation%'
    * Print the data for the associated record, e.g.: select data from kv where key='reactive.conversations.syslog:31.rsyslog-forwarder-ha/12';
      * If there's only one unit in the "units" list from the previous command, delete the record, e.g.: delete from kv where key='reactive.conversations.syslog:31.rsyslog-forwarder-ha/12';
      * Else, if more than one unit, manually update the record to remove the unit. (I did not see this, so no example here.)
* Finally, tell Juju to skip retrying the hook: juju resolved --no-retry <unit-blocked-by-this-bug>
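
The sqlite3 steps above could be sketched in Python roughly as follows. This is untested against a real charm: it assumes the data column holds a JSON object with a "units" list (as described above), and the helper name drop_unit_from_conversation is made up for illustration. Back up .unit-state.db before trying anything like this.

```python
import json
import sqlite3


def drop_unit_from_conversation(conn, key, unit):
    """Remove `unit` from the conversation record stored under `key`.

    Deletes the row when no units remain, otherwise rewrites it.
    Returns "missing", "deleted", or "updated".
    (Hypothetical helper; assumes kv.data is JSON with a "units" list.)
    """
    row = conn.execute("SELECT data FROM kv WHERE key = ?", (key,)).fetchone()
    if row is None:
        return "missing"
    record = json.loads(row[0])
    remaining = [u for u in record.get("units", []) if u != unit]
    if not remaining:
        # Only the departing unit was listed: drop the whole record.
        conn.execute("DELETE FROM kv WHERE key = ?", (key,))
        result = "deleted"
    else:
        # Other units are still in the conversation: keep them.
        record["units"] = remaining
        conn.execute(
            "UPDATE kv SET data = ? WHERE key = ?", (json.dumps(record), key)
        )
        result = "updated"
    conn.commit()
    return result
```

It would be called with a connection from sqlite3.connect(".unit-state.db") in the charm directory, the conversation key, and the departed unit name, e.g. "rsyslog-forwarder-ha/12". The "updated" branch covers the multi-unit case for which no example was available.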

Note: I have not tried this yet, although I am considering doing this tomorrow; it seems like it's likely the right thing...

Paul Goins (vultaire) wrote :

For additional context re: my specific case and why I'm considering doing the above:

* I have a landscape deployment where I'm trying to redeploy the landscape-server units.
* The landscape-server units have rsyslog-forwarder-ha as a subordinate, related to landscape-postgresql via the syslog relation.
* Upon removing a landscape-server unit, the landscape-postgresql units hit this bug in the syslog-relation-departed hook, which is triggered when the rsyslog-forwarder-ha subordinate on the landscape-server unit is removed.

description: updated
Drew Freiberger (afreiberger) wrote :

This environment has an incorrectly deployed relation:

landscape-postgresql:syslog rsyslog-forwarder-ha:syslog syslog regular

Should be:

landscape-postgresql:syslog rsyslog-forwarder-ha:juju-info juju-info subordinate

You're going to have to do some juju resolved --no-retry runs or redeploy that postgres cluster to fix the charm status implications.

Paul Goins (vultaire) wrote :

I've reviewed this together with Drew, and indeed for my case, things are not deployed correctly. We've worked out a fix for our case.

Nevertheless, I suspect a bug does exist here for a peer that legitimately uses this charm's syslog relation, for the reasons previously explained. Thus, I'm not going to mark this as invalid; I think it's still worth looking into. (Again, I see where we call get_local() in this charm, but I see no corresponding set_local() calls, so this functionality may be legitimately broken, or I'm just missing something in my code review, which is also entirely possible.)

Stuart Bishop (stub)
Changed in postgresql-charm:
status: New → Triaged
Stuart Bishop (stub) wrote :
Stuart Bishop (stub)
Changed in postgresql-charm:
status: Triaged → Fix Released