Unit fails to start complaining there are members in the relation
Bug #1910958 reported by Andrey Grebennikov
Affects | Status | Importance | Assigned to | Milestone
---|---|---|---|---
Canonical Juju | Fix Released | High | Ian Booth |
Bug Description
juju 2.8.7 on focal.
Deployed CDK, then later added Ceph units and relations between them. Something went wrong, an attempt was made to remove the relations, and now all kubernetes-master units are in an error state with the "update-status" hook failed. The logs show a recurring message:
> ERROR juju.worker.
Attaching a crashdump and the dump_db.
https:/
Changed in juju:
assignee: nobody → Ian Booth (wallyworld)
status: Triaged → In Progress

Changed in juju:
status: In Progress → Fix Committed

Changed in juju:
status: Fix Committed → Fix Released
The database dump shows that the kubernetes-master/0 unit has recorded some stale relation information. It is currently in relations with these ids:
2, 3, 6, 10, 11, 13, 14, 15, 16
but the unit state claims that these relations are in play:
2, 3, 6, 10, 11, 13, 14, 15, 16, 25, 26
The relation ids 25 and 26 were for ceph-mon units 3, 4, 5.
But currently there's only ceph-mon units 9, 10, 11.
So ceph-mon units 3, 4, 5 were deleted, and the unit agent for kubernetes-master/0 either was not notified to clean up or the cleanup failed.
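A minimal sketch of the comparison involved, using the ids from this dump (the function and types are illustrative, not Juju's actual code):

```go
package main

import "fmt"

// staleRelationIDs returns the relation ids recorded in the unit's local
// state that the controller no longer knows about. Illustrative only; the
// real unit agent works with richer state documents than plain ints.
func staleRelationIDs(unitStateIDs, currentIDs []int) []int {
	current := make(map[int]bool, len(currentIDs))
	for _, id := range currentIDs {
		current[id] = true
	}
	var stale []int
	for _, id := range unitStateIDs {
		if !current[id] {
			stale = append(stale, id)
		}
	}
	return stale
}

func main() {
	unitState := []int{2, 3, 6, 10, 11, 13, 14, 15, 16, 25, 26}
	current := []int{2, 3, 6, 10, 11, 13, 14, 15, 16}
	fmt.Println(staleRelationIDs(unitState, current)) // prints [25 26]
}
```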
The unit agent startup needs to be made more robust so that if it sees relation ids that no longer exist, it purges them from its state without complaining. A rough sketch of what that purge could look like follows; the names and signature are assumptions for illustration, not Juju's real API.
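```go
package agent // illustrative package, not Juju's actual layout

import "log"

// RelationState stands in for the per-relation data a unit agent keeps
// locally (members, change versions, and so on); the real structure is richer.
type RelationState struct {
	Members map[string]int64
}

// purgeStaleRelations drops local state for relations the controller no
// longer knows about instead of treating them as a fatal error.
func purgeStaleRelations(local map[int]*RelationState, known map[int]bool) {
	for id := range local {
		if !known[id] {
			log.Printf("relation %d no longer exists; purging stale local state", id)
			delete(local, id)
		}
	}
}
```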
The next thing to figure out is how a relation got deleted without the unit agent cleaning up.