Juju doesn't mount storage after lxd container restart

Bug #1999758 reported by Marcelo Henrique Neppel
This bug affects 3 people
Affects          Status          Importance   Assigned to         Milestone
Canonical Juju   Fix Released    High         Simon Richardson
3.0              Fix Released    High         Unassigned
3.1              Fix Committed   High         Unassigned

Bug Description

If we deploy an operator framework charm that has storage defined in metadata.yaml, like the edge version of the PostgreSQL charm (https://charmhub.io/postgresql?channel=edge), the storage is mounted and available, as can be verified with lsblk.

But if the host machine (the one where lxd and the Juju agent are installed and which runs the lxd containers) is restarted, or if one of the unit containers is restarted manually with lxc restart, sometimes the storage is not mounted again (and sometimes it is).

To reproduce this issue, you can do the following in a fresh Juju model:

juju deploy postgresql --channel edge
juju model-config update-status-hook-interval=10s # to speed up the deployment
juju ssh postgresql/0
lsblk # check that /var/lib/postgresql/data is shown in the list

Then restart the host machine or the unit container (you may need to do this multiple times, as the issue occurs unpredictably):

lxc restart juju-machine-id
juju ssh postgresql/0
lsblk # and check that /var/lib/postgresql/data is not shown (or restart the container again and check again)

Versions used: Ubuntu 22.04, Juju 2.9.37 and lxd 5.9

Revision history for this message
Marcelo Henrique Neppel (neppel) wrote (last edit ):

One more detail that may be helpful:

I noticed that when the issue happens (the storage is not mounted after a restart), the ctx.filesystems map is empty (the filesystem for the storage is missing from it), and then https://github.com/juju/juju/blob/juju-2.9.37/worker/storageprovisioner/storageprovisioner.go#L339 is called, which doesn't mount the storage because the filesystem is missing. The other times, ctx.filesystems is populated with the needed filesystem, so the storage is mounted.
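
For illustration, here is a minimal, hypothetical Go sketch of the pattern described above; it is not actual Juju code, and the type and function names are invented. The point is only that attachment handling consults a filesystems map on the context and skips the mount when the entry is missing:

    package main

    import "fmt"

    // filesystem is a stand-in for the provisioner's view of a filesystem.
    type filesystem struct {
        mountPoint string
    }

    // provisionerContext mirrors the idea of ctx above; filesystems is
    // keyed by filesystem ID, e.g. "0/0".
    type provisionerContext struct {
        filesystems map[string]filesystem
    }

    // processFilesystemAttachment is a hypothetical stand-in for the
    // attachment handling: if the filesystem is not in the map, no mount
    // happens.
    func (ctx *provisionerContext) processFilesystemAttachment(id string) {
        fs, ok := ctx.filesystems[id]
        if !ok {
            fmt.Printf("filesystem %q unknown; skipping mount\n", id)
            return
        }
        fmt.Printf("mounting filesystem %q at %s\n", id, fs.mountPoint)
    }

    func main() {
        ctx := &provisionerContext{filesystems: map[string]filesystem{}}

        // ctx.filesystems is empty: this is the failure mode after a restart.
        ctx.processFilesystemAttachment("0/0")

        // Once the filesystem change has been observed, the mount proceeds.
        ctx.filesystems["0/0"] = filesystem{mountPoint: "/var/lib/postgresql/data"}
        ctx.processFilesystemAttachment("0/0")
    }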

summary: - Juju doesn't mount storage after lxd container after restart
+ Juju doesn't mount storage after lxd container restart
Revision history for this message
Paulo Machado (paulomachado) wrote :

Observed the same behavior with lxd 5.0.1 on jammy and Juju 2.9.37.

Changed in juju:
importance: Undecided → High
status: New → Triaged
assignee: nobody → Simon Richardson (simonrichardson)
milestone: none → 2.9.39
Revision history for this message
Simon Richardson (simonrichardson) wrote :

Digging into this, it seems there is a sequencing problem when enqueuing onto the scheduler. The code expects that a filesystem change always precedes a filesystem attachment change[1]. In normal day-to-day operation this is the case, but after a hard restart it isn't always so.

Unfortunately, the case statements of a select are shuffled on every entry. When all the information arrives at once, as it does when the restart has completed, there is no guarantee of the correct order[2]. In most cases the ordering is valid:

    filesystems changed: []string{"0/0"}
    filesystem attachments changed: []watcher.MachineStorageId{watcher.MachineStorageId{MachineTag:"machine-0", AttachmentTag:"filesystem-0-0"}}

When an attachment is not successful, the attachment change arrives before the filesystem change:

    filesystem attachments changed: []watcher.MachineStorageId{watcher.MachineStorageId{MachineTag:"machine-0", AttachmentTag:"filesystem-0-0"}}
    filesystems changed: []string{"0/0"}

The solution is non-obvious, as you can't always guarantee that a subsequent filesystem change will follow an attachment change.

 1. https://github.com/juju/juju/blob/2.9/worker/storageprovisioner/filesystem_events.go#L165
 2. https://github.com/juju/juju/blob/2.9/worker/storageprovisioner/storageprovisioner.go#L304-L358
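
As a minimal, self-contained illustration of the select behaviour described above (not Juju code; the channel names are invented): when both channels are ready, the Go runtime picks a case pseudo-randomly, so handling the attachment event before the filesystem event is a legal outcome.

    package main

    import "fmt"

    func main() {
        filesystemsChanged := make(chan []string, 1)
        attachmentsChanged := make(chan []string, 1)

        // Simulate both watcher events being delivered together after a
        // hard restart of the container or host.
        filesystemsChanged <- []string{"0/0"}
        attachmentsChanged <- []string{"machine-0/filesystem-0-0"}

        for i := 0; i < 2; i++ {
            // Both channels are ready, so the runtime chooses a case
            // pseudo-randomly: "attachments before filesystems" can happen.
            select {
            case ids := <-filesystemsChanged:
                fmt.Println("filesystems changed:", ids)
            case ids := <-attachmentsChanged:
                fmt.Println("filesystem attachments changed:", ids)
            }
        }
    }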

Changed in juju:
milestone: 2.9.39 → 2.9.40
Changed in juju:
status: Triaged → Fix Committed
Changed in juju:
milestone: 2.9.40 → 2.9.42
Changed in juju:
status: Fix Committed → Fix Released