pod-spec-set destroys relation data

Bug #1830745 reported by Stuart Bishop on 2019-05-28
This bug affects 1 person
Affects: juju    Status: Fix Released    Importance: High    Assigned to: Yang Kelvin Liu    Milestone: 2.6.4

Bug Description

pod-spec-set will cause a new pod to be spun up and a new unit to be created, and the old pod and unit to be torn down. A side effect of this is that all relation data is also lost.

As an example, a k8s charm negotiates database credentials with a PostgreSQL server using interface:pgsql, including specifying the database name to use. The k8s charm calls pod-spec-set with the new pod spec and configuration, and terminates. The pod is spun up and a new unit created, and when the new unit's hooks are run its relation data is empty and the database name is unset. The PostgreSQL charm at the other end sees the old unit leave and the new unit join, and, finding the database name unset, revokes access to the specified database and grants access to a default database instead.
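The failure mode described above can be sketched as a toy simulation (hypothetical names; this is not Juju code): a unit-scoped relation databag is discarded with its unit, so the replacement unit starts empty and the remote side observes the database name as unset.

```python
# Toy model of unit-scoped relation data (illustrative only, not Juju internals).
class Relation:
    def __init__(self):
        self.unit_data = {}  # relation databags keyed by unit name

    def join(self, unit):
        self.unit_data[unit] = {}

    def depart(self, unit):
        # All relation data set by this unit is discarded with it.
        self.unit_data.pop(unit, None)

rel = Relation()
rel.join("myapp/0")
rel.unit_data["myapp/0"]["database"] = "myappdb"  # negotiated via interface:pgsql

# pod-spec-set tears down the old pod/unit and creates a new one:
rel.depart("myapp/0")
rel.join("myapp/1")

# The PostgreSQL side now sees no database name from the new unit,
# so it falls back to granting access to a default database.
new_unit_db = rel.unit_data["myapp/1"].get("database")  # None
```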

Ideally, a new unit would not be created and the existing relation data would be preserved: only the pod would change, and relations would be updated with the new IP/network data. The charm would then not need to renegotiate its relations every time the config changes and a new pod is spun up. This would also avoid loops in which renegotiating the relation gives different results, requiring a new pod to be spun up with those results as config, which triggers the relation data loss again and forces yet another renegotiation.

Stuart Bishop (stub) on 2019-05-28
tags: added: canonical-is
Changed in juju:
assignee: nobody → Ian Booth (wallyworld)
Ian Booth (wallyworld) wrote :

See the discussion here


The issue is that for stateless deployments, k8s itself treats pods as disposable and that interferes with Juju's more stateful assumptions about unit behaviour.

The TL;DR is:

- charms now have a capability to specifically request a stateful deployment if they care about preserving relation data, see "Charms Can Specify Deployment Type"

- in juju 2.7, we are moving to support a more application-centric view and adding an application-level relation databag. This is managed independently of any unit lifecycle, so units can come and go and the data is preserved.
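A minimal sketch of the first option (assuming the metadata.yaml deployment stanza described in the linked "Charms Can Specify Deployment Type" discussion; check the Juju docs for the exact field names in your version):

```yaml
# metadata.yaml (k8s charm) - request pods managed by a StatefulSet so that
# unit identity, and the unit's relation data, survives pod restarts.
deployment:
  type: stateful
```

For the second option, the 2.7 application databag is reportedly exposed through an `--app` scope on the hook tools (e.g. `relation-set --app database=myappdb`), so the data is owned by the application rather than by any one disposable unit.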

Changed in juju:
assignee: Ian Booth (wallyworld) → nobody
Ian Booth (wallyworld) wrote :

Marking as "Won't Fix" simply because we have an approach for now that gets around the issue and we have plans to develop features in 2.7 that make the issue moot.
Feel free to reopen if there's an issue with this.

Changed in juju:
status: New → Won't Fix
Stuart Bishop (stub) wrote :

In both stateful and stateless deployments the relation data is lost, so there does not appear to be a workaround (beyond rewriting charms and interfaces and redesigning the relation protocols to support the different k8s and traditional charm lifecycles).

The Juju 2.7 updates will certainly help, but making use of the feature will require rewriting charms and interfaces, redesigning relation protocols, and updating existing production deployments (which isn't as bad as it sounds, given this already needs to happen to support cross-model relations, which are required to relate k8s charms to traditional charms).

Changed in juju:
status: Won't Fix → New
Stuart Bishop (stub) wrote :

It's worth noting that charms.reactive state is preserved when a new pod is spun up, but the relation data is lost. Some of the problems I'm seeing occur because the charms.reactive state gets out of sync with the relations once the relation data is lost.
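This desync can be illustrated with a toy model (not charms.reactive itself): a persisted flag survives the pod replacement while the unit relation data does not, leaving the charm believing a negotiation completed whose results no longer exist.

```python
# Toy illustration of charms.reactive-style state drifting from relation data.
flags = set()                              # persisted on charm storage; survives the new pod
relation_data = {"database": "myappdb"}    # unit-scoped; does NOT survive

# The charm negotiates the database and records the fact as a flag:
flags.add("db.master.available")

# pod-spec-set replaces the pod/unit: relation data is reset, flags are not.
relation_data = {}

# The charm still believes the database is available, but the data is gone:
inconsistent = "db.master.available" in flags and "database" not in relation_data
```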

Ian Booth (wallyworld) on 2019-05-30
Changed in juju:
milestone: none → 2.6.4
importance: Undecided → High
status: New → Triaged
Changed in juju:
importance: High → Medium
importance: Medium → High
Ian Booth (wallyworld) on 2019-05-30
Changed in juju:
assignee: nobody → Yang Kelvin Liu (kelvin.liu)
status: Triaged → In Progress
Ian Booth (wallyworld) wrote :

Juju is supposed to retain relation data for units/pods managed by a StatefulSet, and it mostly does.
The root cause here appears to be a race condition: Juju polls the k8s cluster and, as a result of a scale change, is told that 0 pods are running. This causes Juju to delete the unit(s) in state, so when the pods are reported as running again a short time later, a new unit is created and the relation data is reset. There's a simple fix on the Juju side to account for this.
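The fix described above can be sketched as a debounce (illustrative only; the actual change is in the Juju PR linked below): a transient zero-pod poll result during a scale change is ignored unless it repeats, so units are not deleted on a single stale reading.

```python
# Sketch of tolerating a transient "0 pods running" poll result
# (hypothetical; not the actual Juju implementation).
class UnitReconciler:
    def __init__(self, confirmations_needed=2):
        self.confirmations_needed = confirmations_needed
        self.zero_streak = 0
        self.units = ["myapp/0"]

    def observe(self, running_pods):
        if running_pods == 0:
            self.zero_streak += 1
            # Only remove units once the empty result is confirmed.
            if self.zero_streak >= self.confirmations_needed:
                self.units = []
        else:
            self.zero_streak = 0

r = UnitReconciler()
r.observe(0)   # transient blip during the scale change
r.observe(1)   # pod reported running again shortly after
# r.units is still ["myapp/0"]: the unit and its relation data survive
```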

Ian Booth (wallyworld) on 2019-06-07
Changed in juju:
status: In Progress → Fix Committed
Joel Sing (jsing) on 2019-06-17
tags: added: juju-k8s
Joel Sing (jsing) on 2019-06-17
tags: added: k8s
removed: juju-k8s
Yang Kelvin Liu (kelvin.liu) wrote :

https://github.com/juju/juju/pull/10270 will be released in 2.6.4 to fix this issue

Changed in juju:
status: Fix Committed → Fix Released