Deleted k8s pods remain stuck in hook failed: "db-relation-broken"
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Canonical Juju | Triaged | High | Unassigned |
Bug Description
In a Kubernetes environment running the ch:discourse-k8s charm with cross-model relations, Juju seems to have trouble clearing a unit when its pod is deleted or replaced (for example, after a juju config or juju refresh).
Logs from discourse-
2022-02-14 08:23:47 ERROR juju.worker.
2022-02-14 08:24:43 INFO juju.worker.
2022-02-14 08:24:51 ERROR juju-log db:0: Uncaught exception while in charm code:
Traceback (most recent call last):
File "/var/lib/
result = run(args, **kwargs)
File "/usr/lib/
raise CalledProcessEr
subprocess.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "./src/charm.py", line 421, in <module>
main(
File "/var/lib/
_emit_
File "/var/lib/
event_
File "/var/lib/
framework.
File "/var/lib/
self.
File "/var/lib/
custom_
File "/var/lib/
self.
File "/var/lib/
framework.
File "/var/lib/
self.
File "/var/lib/
custom_
File "./src/charm.py", line 402, in on_database_changed
if event.master is None:
File "/var/lib/
conn_str = _master(self.log, self.relation, self._local_unit)
File "/var/lib/
conn_str = reldata.
File "/usr/lib/
return self[key]
File "/var/lib/
return self._data[key]
File "/var/lib/
data = self._lazy_data = self._load()
File "/var/lib/
return self._backend.
File "/var/lib/
return self._run(*args, return_output=True, use_json=True)
File "/var/lib/
raise ModelError(
ops.model.
2022-02-14 08:24:52 ERROR juju.worker.
The unit can be manually cleared with: juju resolve --no-retry discourse/11
We have seen lingering units like these on several Juju controllers running versions 2.9.21 and 2.9.18. As far as I know it also happened on version 2.8.9, but I can't find an example at the moment.
This bug might be related to https:/
Thank you,
Loïc
Changed in juju:
milestone: none → 2.9.28
importance: Undecided → High
status: New → Triaged
Changed in juju:
milestone: 2.9.28 → 2.9.29
Changed in juju:
milestone: 2.9.29 → 2.9.30
Changed in juju:
status: In Progress → Incomplete
milestone: 2.9.30 → none
Changed in juju:
assignee: Yang Kelvin Liu (kelvin.liu) → nobody
Can you include the output of juju status --format yaml to show what units juju thinks should exist in the model?
You are right; there have been issues in the past where a pod upgrade confused Juju. Some fixes have landed since, but we may need to revisit this.
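To find every lingering unit like discourse/11 in a model, a small helper can scan the status output instead of reading it by eye. This is a minimal sketch, not part of Juju: it uses `juju status --format json` (JSON rather than the YAML requested above, purely because Python's standard library can parse it), and it assumes the status JSON nests units under `applications` → `<app>` → `units` → `<unit>` → `workload-status`, with `current` and `message` fields. The function names `find_failed_hook_units` and `resolve_commands` are hypothetical. The helper only builds the `juju resolve --no-retry` commands shown in the report; it does not run them.

```python
import json
from typing import Any, Dict, List, Tuple


def find_failed_hook_units(status: Dict[str, Any], hook: str = "") -> List[Tuple[str, str]]:
    """Return (unit, message) pairs for units whose workload status is "error".

    Assumed layout (from `juju status --format json`):
    applications -> <app> -> units -> <unit> -> workload-status -> {current, message}.
    If `hook` is non-empty, only units whose status message mentions it are kept.
    """
    failed: List[Tuple[str, str]] = []
    for app in status.get("applications", {}).values():
        for unit_name, unit in app.get("units", {}).items():
            workload = unit.get("workload-status", {})
            message = workload.get("message", "")
            # An empty `hook` matches every message, so all error units are kept.
            if workload.get("current") == "error" and hook in message:
                failed.append((unit_name, message))
    return failed


def resolve_commands(raw_status_json: str, hook: str = "") -> List[str]:
    """Build (but do not execute) the manual clean-up command for each stuck unit."""
    status = json.loads(raw_status_json)
    return [
        f"juju resolve --no-retry {unit_name}"
        for unit_name, _ in find_failed_hook_units(status, hook)
    ]


# Usage with a hypothetical, heavily trimmed status document:
sample = {
    "applications": {
        "discourse": {
            "units": {
                "discourse/11": {
                    "workload-status": {
                        "current": "error",
                        "message": 'hook failed: "db-relation-broken"',
                    }
                },
                "discourse/12": {"workload-status": {"current": "active", "message": ""}},
            }
        }
    }
}
for cmd in resolve_commands(json.dumps(sample), hook="db-relation-broken"):
    print(cmd)  # prints: juju resolve --no-retry discourse/11
```

Printing the commands rather than running them keeps a human in the loop, since `--no-retry` skips the failed hook instead of re-running it.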