[Azure] Storage may be added with wrong device path and breaks charm

Bug #1936752 reported by Pedro Guimarães
Affects: Canonical Juju
Status: Fix Released
Importance: High
Assigned to: Ian Booth

Bug Description

Juju 2.9.8

Generally, the storage path passed to charms follows the pattern:

/dev/disk/azure/scsi1/lun%d

As described in: https://github.com/juju/juju/blob/8c18538dac15ecb56c8baaa31104cf8d99c8e760/provider/azure/storage.go#L403

However, sometimes the path that comes up is:

# storage-get -s data/0
kind: block
location: /dev/disk/by-uuid/7flWaV-Rxiv-mGF6-p9sk-sEe1-mwnl-7zjOd3

That path does not exist on the unit.
The available volumes and their symlinks on the same unit are listed at: https://pastebin.ubuntu.com/p/hgK4NCNmm9/
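
For anyone trying to confirm the symptom on a unit, a minimal Python sketch of the check is below. It only inspects the filesystem; the helper names (describe_location, azure_lun_path) are made up for illustration and are not part of Juju.

```python
import os


def azure_lun_path(lun):
    # Pattern quoted in this report: /dev/disk/azure/scsi1/lun%d
    return "/dev/disk/azure/scsi1/lun%d" % lun


def describe_location(path):
    """Describe what `path` is on this machine (hypothetical helper)."""
    return {
        "path": path,
        "exists": os.path.exists(path),      # False for a missing or dangling path
        "is_symlink": os.path.islink(path),  # udev entries under /dev/disk are normally links
        "resolved": os.path.realpath(path),  # final target when the path is a link
    }


if __name__ == "__main__":
    # Compare the expected Azure path with whatever storage-get reported.
    for candidate in (
        azure_lun_path(0),
        "/dev/disk/by-uuid/7flWaV-Rxiv-mGF6-p9sk-sEe1-mwnl-7zjOd3",
    ):
        print(describe_location(candidate))
```

On a broken unit, the by-uuid entry would be expected to show "exists" and "is_symlink" as false.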

Ian Booth (wallyworld) wrote :

I tried a simple experiment to reproduce this. I had a simple demo charm with storage

storage:
    data:
        type: block
        multiple:
          range: 1+

In the data-storage-attached and data-storage-detaching hooks I logged the output of "storage-get" to a file.
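
A hook along those lines could look roughly like the Python sketch below. This is not the demo charm's actual code: the function name, the log format and the explicit storage ID argument are assumptions, and only the "storage-get -s <id>" invocation is taken from this report. The command runner is injectable so the function can be exercised outside a hook context.

```python
import subprocess
from datetime import datetime, timezone


def log_storage_get(storage_id, log_path, run=subprocess.check_output):
    """Append the output of `storage-get -s <storage_id>` to log_path.

    `run` defaults to subprocess.check_output; pass a stub to test
    without the Juju hook tools installed.
    """
    output = run(["storage-get", "-s", storage_id]).decode("utf-8")
    stamp = datetime.now(timezone.utc).isoformat()
    with open(log_path, "a", encoding="utf-8") as fh:
        # One timestamped header line, then the raw tool output.
        fh.write("%s %s\n%s\n" % (stamp, storage_id, output.rstrip()))
    return output
```

Calling it from both the attached and detaching hooks gives a record of what location Juju handed to the charm at each point.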

After deploying the charm, "juju storage --format yaml" and the storage location made available to the attached hook via storage-get both had "/dev/disk/azure/scsi1/lun1".

Then I ran "juju detach-storage data/1" and the detaching hook was also supplied the expected location "/dev/disk/azure/scsi1/lun1".

The bug says that "sometimes" the wrong path is used. How often does this happen? In such cases, what does "juju storage --format yaml" show before the storage is detached? Is it possible to get the storage, volume and volume attachment collection data from the "juju dump-db" output?
Is there a particular charm for which the issue occurs regularly?

Pedro Guimarães (pguimaraes) wrote :

Hi @wallyworld,

I know it is a bit vague, but it only happens sometimes. I can deploy several times in a day and not see it happen, or see it once.

I've collected the logs you've requested for one of the broken runs: https://drive.google.com/drive/folders/1II5SCglZahH-UE3UKUYLCysLl4OV7NT4

That run corresponds to this bundle: https://pastebin.canonical.com/p/dJFb2JGMsm/
The logs .0, .1 ... .3 correspond to the 4x minio units, respectively.

In that specific run, all 4x units broke with the same issue.

Whenever that happens, the device (e.g. /dev/disk/by-uuid/7flWaV-Rxiv-mGF6-p9sk-sEe1-mwnl-7zjOd3) will not be a symlink.

The only exception I've noticed was in LP #1936876, where I saw the same by-uuid naming but it resolved to an actual device.

Ian Booth (wallyworld) wrote :

The minio charm in the bundle is a local charm - we need the charm source to try and reproduce.

Also, we need the config of the "juju-lma-storage-pool" used in the bundle.

Ian Booth (wallyworld) wrote :

To gather all the required info on what block devices there are and how they are mounted, juju runs these commands:

lsblk -b -P -o KNAME,SIZE,LABEL,UUID,FSTYPE,TYPE,MOUNTPOINT,MAJ:MIN

and then, for each block device in that list:

udevadm info -q property --name <devicename>

It would be good to get the result of the lsblk command above, plus the udevadm output for /dev/sdc or whatever the block device is for the Azure volume created for the charm storage (and any sdc1, sdc2, etc. devices). Note it may not be sdc; use whatever Azure mounts it as.
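
To make the lsblk output easier to scan for devices that carry a UUID, something like the following Python sketch may help. It relies only on the KEY="value" pair format that "lsblk -P" emits; the function names and the sample lines are invented for illustration and are not real output from an affected unit.

```python
import shlex

# Invented sample in `lsblk -b -P -o KNAME,SIZE,...` format, not real data.
SAMPLE = (
    'KNAME="sdc" SIZE="107374182400" LABEL="" UUID="" FSTYPE="" '
    'TYPE="disk" MOUNTPOINT="" MAJ:MIN="8:32"\n'
    'KNAME="sdc1" SIZE="107373133824" LABEL="" '
    'UUID="0a1b2c3d-0000-4000-8000-000000000001" FSTYPE="ext4" '
    'TYPE="part" MOUNTPOINT="" MAJ:MIN="8:33"\n'
)


def parse_lsblk_pairs(text):
    """Turn `lsblk -P` output into a list of dicts, one per device line."""
    devices = []
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = {}
        # shlex handles the quoting, including values that contain spaces.
        for token in shlex.split(line):
            key, _, value = token.partition("=")
            fields[key] = value
        devices.append(fields)
    return devices


def devices_with_uuid(devices):
    """Return only the entries whose UUID field is non-empty."""
    return [d for d in devices if d.get("UUID")]


if __name__ == "__main__":
    for dev in devices_with_uuid(parse_lsblk_pairs(SAMPLE)):
        print(dev["KNAME"], dev["TYPE"], dev["UUID"])
```

Any entry listed for the Azure data disk is worth attaching to the bug along with its udevadm properties.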

Also, turn on TRACE logging for that module (juju.apiserver.storagecommon=TRACE) before deploying the charm.

There are a couple of possible code paths that could explain the sometimes-wrong behaviour. One is that we are using the wrong block device. The only way Juju would tell a charm that the location of a block storage is /dev/disk/by-uuid/xxxx is if the block device record Juju has contains a UUID. But having a UUID is only expected for devices which are (for example) partitions, not straight-up block devices.
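
As a rough illustration of that rule, here is a simplified Python model. It is not the Juju implementation: the record type, its field names and the fallback order when there is no UUID are all assumptions. The only behaviour taken from the description above is that a by-uuid location is produced when the record has a UUID.

```python
from dataclasses import dataclass


@dataclass
class BlockDeviceRecord:
    """Hypothetical stand-in for the block device record Juju keeps."""
    device_name: str
    uuid: str = ""
    device_links: tuple = ()


def reported_location(record):
    """Pick the location a charm would be told about (simplified model)."""
    if record.uuid:
        # A UUID on the record leads to a by-uuid location.
        return "/dev/disk/by-uuid/" + record.uuid
    if record.device_links:
        # Assumed fallback: the first known udev link for the device.
        return record.device_links[0]
    # Assumed last resort: the bare kernel device name.
    return "/dev/" + record.device_name


if __name__ == "__main__":
    plain = BlockDeviceRecord("sdc", device_links=("/dev/disk/azure/scsi1/lun1",))
    tagged = BlockDeviceRecord("sdc", uuid="7flWaV-Rxiv-mGF6-p9sk-sEe1-mwnl-7zjOd3")
    print(reported_location(plain))
    print(reported_location(tagged))
```

Under this model, a whole-disk record that somehow picks up a UUID would produce the path seen in the bug description.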

As well as the above, can we get a "juju dump-db" with the content of the blockdevices, volumes and volumeattachment collections?

I've tried several times to reproduce without luck.

Ian Booth (wallyworld) wrote :

The only way I can see the issue happening is if a partition is created on the block device outside of Juju. I've done a small PR to account for that; it would be good to test it on the customer test bed.

https://github.com/juju/juju/pull/13201

Ian Booth (wallyworld)
Changed in juju:
milestone: none → 2.9.10
assignee: nobody → Ian Booth (wallyworld)
importance: Undecided → High
status: New → In Progress
Ian Booth (wallyworld) wrote :

Marking as Fix Committed, but we can revisit if needed.

Changed in juju:
status: In Progress → Fix Committed
Changed in juju:
status: Fix Committed → Fix Released