Juju storage-get may report incorrect device with OpenStack provider

Bug #1790728 reported by David Ames
This bug affects 2 people
Affects: Canonical Juju
Status: Fix Released
Importance: High
Assigned to: Harry Pidcock

Bug Description

Juju storage-get is reporting the incorrect device name for a volume mount with the OpenStack provider.

This has been seen on stable, candidate and edge. It is intermittent, but consistent enough that it affects OpenStack charm engineering CI, forcing unnecessary retries.

Steps to reproduce:

Model snippet:

    ceph-osd:
      charm: ceph-osd
      num_units: 3
      storage:
        osd-devices: cinder,10G

$ openstack volume list
+--------------------------------------+------------------------------+--------+------+------------------------------------------------+
| ID                                   | Name                         | Status | Size | Attached to                                    |
+--------------------------------------+------------------------------+--------+------+------------------------------------------------+
| 657de65d-22b4-44d3-9347-b1a562525ac0 | juju-58b38a-ceph-fs-volume-1 | in-use |   10 | Attached to juju-58b38a-ceph-fs-5 on /dev/vdd  |
| 6004573f-bbd3-4a7a-83b1-934ba782eef2 | juju-58b38a-ceph-fs-volume-2 | in-use |   10 | Attached to juju-58b38a-ceph-fs-6 on /dev/vdd  |
| 1d083c8d-1125-46b0-be5c-1fae2b10e3d0 | juju-58b38a-ceph-fs-volume-0 | in-use |   10 | Attached to juju-58b38a-ceph-fs-4 on /dev/vdd  |
+--------------------------------------+------------------------------+--------+------+------------------------------------------------+

On a unit where the mapping is disordered, storage-get and the actual devices disagree. In the case below, vdc is the recently attached volume and vdd is the swap device, but storage-get reports the volume at /dev/vdd.

root@juju-58b38a-ceph-fs-5:/var/lib/juju/agents/unit-ceph-osd-1/charm# storage-get -s osd-devices/1
kind: block
location: /dev/vdd

root@juju-58b38a-ceph-fs-5:/var/lib/juju/agents/unit-ceph-osd-1/charm# parted -l
Model: Virtio Block Device (virtblk)
Disk /dev/vdd: 4295MB
Sector size (logical/physical): 512B/512B
Partition Table: loop
Disk Flags:

Number Start End Size File system Flags
 1 0.00B 4295MB 4295MB linux-swap(v1)

Model: Virtio Block Device (virtblk)
Disk /dev/vdb: 16.1GB
Sector size (logical/physical): 512B/512B
Partition Table: loop
Disk Flags:

Number Start End Size File system Flags
 1 0.00B 16.1GB 16.1GB ext4

Error: /dev/vdc: unrecognised disk label
Model: Virtio Block Device (virtblk)
Disk /dev/vdc: 10.7GB
Sector size (logical/physical): 512B/512B
Partition Table: unknown
Disk Flags:

Model: Virtio Block Device (virtblk)
Disk /dev/vda: 16.1GB
Sector size (logical/physical): 512B/512B
Partition Table: gpt
Disk Flags:

Number Start End Size File system Name Flags
14 1049kB 5243kB 4194kB bios_grub
15 5243kB 116MB 111MB fat32 boot, esp
 1 116MB 16.1GB 16.0GB ext4

Tags: uosci
Ryan Beisner (1chb1n)
tags: added: uosci
John A Meinel (jameinel) wrote :

This seems suspicious:
Error: /dev/vdc: unrecognised disk label

We do want to make sure that we have the right "chain of custody" from what we requested to be provisioned from the underlying provider to how that actually gets mounted on the machine. OpenStack is clearly telling us that it put the storage we requested at /dev/vdd and we're passing that information on to the charm.

Offhand it feels like it would be an OpenStack bug if it is telling us "on the instance I just created, you should expect this storage to be available at /dev/XXX" but the storage isn't actually there.

Now if OpenStack was just telling us "your device ID is ABCDEF-123-456" and we're looking at mounted device labels and not lining them up correctly, that would be our bug.

But given your "openstack volume list" command above, it does seem relevant.

I wonder if the issue is that somehow the device isn't getting the right label, and thus whatever fstab entry tells it where it should be mounted isn't matching.

John A Meinel (jameinel) wrote :

It seems this is more a bug in OpenStack, in that it is reporting the wrong information to us.

Changed in juju:
importance: Undecided → High
status: New → Incomplete
Launchpad Janitor (janitor) wrote :

[Expired for juju because there has been no activity for 60 days.]

Changed in juju:
status: Incomplete → Expired
James Page (james-page) wrote :

Re-opening this bug as we're seeing a mismatch between the block device path presented by Juju and the actual block device associated with a cinder volume.

The 'device' attribute of the attachment that is created for a volume in OpenStack is foobar; there is no way that Nova/Libvirt/Qemu can enforce this in the guest (think Windows guests), and the block device path is only allocated when the udev events fire as the block device is presented.

That said, Nova does make an effort to inject some metadata into the block device name via libvirt/qemu (this only works for virtio devices):

$ sudo udevadm info --query=all --name=/dev/vdc
P: /devices/pci0000:00/0000:00:06.0/virtio3/block/vdc
N: vdc
S: disk/by-id/virtio-abf836a3-0dcf-48d9-9
S: disk/by-path/pci-0000:00:06.0
S: disk/by-path/virtio-pci-0000:00:06.0
S: disk/by-uuid/305ec52e-4e50-4a51-ac35-85dd0a84a232
E: DEVLINKS=/dev/disk/by-uuid/305ec52e-4e50-4a51-ac35-85dd0a84a232 /dev/disk/by-path/pci-0000:00:06.0 /dev/disk/by-path/virtio-pci-0000:00:06.0 /dev/disk/by-id/virtio-abf836a3-0dcf-48d9-9
E: DEVNAME=/dev/vdc
E: DEVPATH=/devices/pci0000:00/0000:00:06.0/virtio3/block/vdc
E: DEVTYPE=disk
E: ID_FS_TYPE=xfs
E: ID_FS_USAGE=filesystem
E: ID_FS_UUID=305ec52e-4e50-4a51-ac35-85dd0a84a232
E: ID_FS_UUID_ENC=305ec52e-4e50-4a51-ac35-85dd0a84a232
E: ID_PATH=pci-0000:00:06.0
E: ID_PATH_TAG=pci-0000_00_06_0
E: ID_SERIAL=abf836a3-0dcf-48d9-9
E: MAJOR=252
E: MINOR=32
E: SUBSYSTEM=block
E: TAGS=:systemd:
E: USEC_INITIALIZED=2059336

$ openstack volume list

| abf836a3-0dcf-48d9-9cb0-ff679665ba8b | tools | in-use | 200 | Attached to jamespage-bastion on /dev/vdc |

The ID_SERIAL of the block device (abf836a3-0dcf-48d9-9) matches the first 20 characters of the UUID allocated for the volume by cinder.
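
As a rough illustration of that check, the Python sketch below parses `udevadm info --query=all` output and tests whether ID_SERIAL is a prefix of a cinder volume UUID. The function names are hypothetical and this is not Juju's code; the sample text is trimmed from the output above.

```python
def udev_properties(udevadm_output: str) -> dict:
    """Collect the 'E: KEY=VALUE' lines of `udevadm info --query=all` output."""
    props = {}
    for line in udevadm_output.splitlines():
        if line.startswith("E: ") and "=" in line:
            key, _, value = line[3:].partition("=")
            props[key] = value
    return props


def serial_matches_volume(serial: str, volume_id: str) -> bool:
    """True if a non-empty ID_SERIAL is a prefix of the cinder volume UUID."""
    return bool(serial) and volume_id.startswith(serial)


# Trimmed sample taken from the udevadm output quoted in this report.
SAMPLE = """\
N: vdc
E: DEVNAME=/dev/vdc
E: ID_SERIAL=abf836a3-0dcf-48d9-9
"""

if __name__ == "__main__":
    props = udev_properties(SAMPLE)
    volume_id = "abf836a3-0dcf-48d9-9cb0-ff679665ba8b"
    print(props["DEVNAME"], serial_matches_volume(props["ID_SERIAL"], volume_id))
```

A prefix comparison is used because the serial seen in the guest is shorter than the full UUID.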

We think this is the only reliable way to map a cinder volume to the actual block device presented within an OpenStack Cloud.
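
A minimal sketch of such a mapping, assuming a Linux guest with virtio disks where udev creates /dev/disk/by-id/virtio-<serial> links as in the udevadm output above, and assuming the serial is the first 20 characters of the volume UUID (the virtio-blk serial length limit). The helper names are hypothetical, and this is not necessarily how Juju implements its lookup.

```python
import os
from typing import Optional

# Assumption: the virtio-blk serial is capped at 20 bytes, so only the
# first 20 characters of the cinder UUID are visible inside the guest.
VIRTIO_SERIAL_LEN = 20


def by_id_link(volume_id: str, by_id_dir: str = "/dev/disk/by-id") -> str:
    """Build the expected udev by-id symlink path for a cinder volume UUID."""
    return os.path.join(by_id_dir, "virtio-" + volume_id[:VIRTIO_SERIAL_LEN])


def device_for_volume(volume_id: str,
                      by_id_dir: str = "/dev/disk/by-id") -> Optional[str]:
    """Resolve the by-id symlink to the real device node, or None if absent."""
    link = by_id_link(volume_id, by_id_dir)
    if not os.path.exists(link):
        return None
    return os.path.realpath(link)


if __name__ == "__main__":
    # Volume UUID taken from the `openstack volume list` output above.
    print(device_for_volume("abf836a3-0dcf-48d9-9cb0-ff679665ba8b"))
```

Resolving the symlink rather than trusting the attachment's 'device' attribute means the result reflects what udev actually saw in the guest.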

Changed in juju:
status: Expired → New
James Page (james-page) wrote :

It's possible that the block device mapping data presented via the metadata service might be useful:

  http://169.254.169.254/2009-04-04/meta-data/block-device-mapping

but I suspect this is just an internal view of the same attachment information that Juju sees via the Cinder API so is probably not useful.

James Page (james-page) wrote :

More useful context for why the device attribute is foobar:

  https://docs.openstack.org/nova/queens/user/block-device-mapping.html

James Page (james-page) wrote :

K8S hit the same issue:

  https://github.com/kubernetes/kubernetes/issues/33128

and there is a Cinder bug about this issue:

  https://bugs.launchpad.net/cinder/+bug/1387945

James Page (james-page) wrote :

Any chance we can bump the priority on this issue? Currently this causes a number of issues with automated testing of charm deployments that make use of block devices.

Ian Booth (wallyworld)
Changed in juju:
milestone: none → 2.7-beta1
status: New → Triaged
Harry Pidcock (hpidcock)
Changed in juju:
assignee: nobody → Harry Pidcock (hpidcock)
Harry Pidcock (hpidcock) wrote :
Harry Pidcock (hpidcock)
Changed in juju:
status: Triaged → In Progress
Changed in juju:
status: In Progress → Fix Committed
Changed in juju:
status: Fix Committed → Fix Released