vaultlocker does not ensure that udev is triggered to create /dev/disk/by-uuid/<uuid-in-luks-header> symlink and fails

Bug #1780332 reported by Dmitrii Shcherbakov
This bug affects 1 person
Affects              Status        Importance  Assigned to  Milestone
vaultlocker          Fix Released  Critical    James Page
cryptsetup (Ubuntu)  Incomplete    Undecided   Unassigned
lvm2 (Ubuntu)        Incomplete    Undecided   Unassigned
systemd (Ubuntu)     Invalid       Undecided   Unassigned

Bug Description

When an encrypted device is set up, a UUID (osd_fsid) is passed from the charm to the cryptsetup command, which places it into the LUKS header (shown by cryptsetup luksDump <path-to-block-device>).

https://github.com/openstack/charm-ceph-osd/blob/stable/18.05/lib/ceph/utils.py#L1788-L1804
UUID comes from osd_fsid

https://github.com/openstack-charmers/vaultlocker/blob/8c9cb85dc3ed5dbf18c66a810d189a5230d85c34/vaultlocker/shell.py#L69-L80
# the else branch is taken here (args.uuid is set by the charm)
     block_uuid = str(uuid.uuid4()) if not args.uuid else args.uuid

     dmcrypt.luks_format(key, block_device, block_uuid) # creates a LUKS header
# ...
     dmcrypt.luks_open(key, block_uuid) # sets up a device with device mapper decrypting it via dmcrypt

https://github.com/openstack-charmers/vaultlocker/blob/d813233179bdf2eec8ed101c702a8e552a966f44/vaultlocker/dmcrypt.py#L44-L56
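
For readers who do not want to follow the links, here is a minimal sketch of the two cryptsetup invocations involved. The open command mirrors the one shown in the logs in this report; the luksFormat argument order and the helper names (luks_format_cmd, luks_open_cmd, run_with_key) are assumptions for illustration, not a copy of vaultlocker's code.

```python
import subprocess


def luks_format_cmd(device, block_uuid):
    """Build a luksFormat command that writes block_uuid into the LUKS header."""
    return [
        'cryptsetup', '--batch-mode',
        '--uuid', block_uuid,
        '--key-file', '-',  # read the key from stdin
        'luksFormat', device,
    ]


def luks_open_cmd(block_uuid):
    """Build the open command; the device is addressed as UUID=<uuid>."""
    return [
        'cryptsetup', '--batch-mode',
        '--key-file', '-',
        'open', 'UUID={}'.format(block_uuid),
        'crypt-{}'.format(block_uuid),
        '--type', 'luks',
    ]


def run_with_key(cmd, key):
    """Run cmd, feeding the key on stdin (needs root and a real device)."""
    subprocess.run(cmd, input=key.encode('UTF-8'), check=True)
```

The point of the sketch is that the format step addresses the device by path while the open step addresses it by UUID, so the open step depends on something outside cryptsetup having published that UUID.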

This UUID is visible in blkid output

/dev/sdc: UUID="<luks-header-uuid>" TYPE="crypto_LUKS"

and a udev rule exists to create a /dev/disk/by-uuid/<luks-header-uuid> symlink (which is normally used for filesystem -> block device resolution)

https://git.launchpad.net/~usd-import-team/ubuntu/+source/lvm2/tree/udev/13-dm-disk.rules.in#n25
ENV{ID_FS_USAGE}=="filesystem|other|crypto", ENV{ID_FS_UUID_ENC}=="?*", SYMLINK+="disk/by-uuid/$env{ID_FS_UUID_ENC}"

vaultlocker fails in the luks_open command (right after luks_format):

 # cryptsetup --batch-mode --key-file - open UUID=<luks-header-uuid> crypt-<luks-header-uuid> --type luks

because it tries to access /dev/disk/by-uuid/<luks-header-uuid> which does not exist.
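
A quick way to check this condition from Python, as a rough sketch: the helper names are hypothetical, and the base directory is a parameter only so that the check can be exercised outside /dev.

```python
import os


def by_uuid_path(block_uuid, base='/dev/disk/by-uuid'):
    """Path at which udev is expected to publish the symlink."""
    return os.path.join(base, block_uuid)


def resolve_by_uuid(block_uuid, base='/dev/disk/by-uuid'):
    """Return the resolved target of the by-uuid symlink, or None if missing."""
    path = by_uuid_path(block_uuid, base)
    if not os.path.islink(path):
        return None
    return os.path.realpath(path)
```

A None result right after luks_format corresponds to the failure described here: the header carries the UUID, but the symlink has not been created.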

This happens since udev rules are not re-triggered to create this symlink after a LUKS device is created.

Solution: call the command below after luks_format before luks_open

udevadm settle --exit-if-exists=/dev/disk/by-uuid/<luks-header-uuid-equal-to-osd-fsid>
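
A minimal sketch of that call from Python, assuming it is placed between the format and open steps. The function names and the optional timeout argument are illustrative and not taken from vaultlocker.

```python
import subprocess


def settle_cmd(block_uuid, timeout=None):
    """Build the udevadm settle command for the expected by-uuid symlink."""
    cmd = [
        'udevadm', 'settle',
        '--exit-if-exists=/dev/disk/by-uuid/{}'.format(block_uuid),
    ]
    if timeout is not None:
        # udevadm settle accepts --timeout=SECONDS
        cmd.append('--timeout={}'.format(int(timeout)))
    return cmd


def udevadm_settle(block_uuid, timeout=None):
    """Wait for the udev event queue to drain or for the symlink to appear."""
    subprocess.check_call(settle_cmd(block_uuid, timeout))
```

Note that settle only waits for events that are already queued; it returns once the queue is empty whether or not the path exists, so it cannot help if no event was generated in the first place.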

Revision history for this message
Dmitrii Shcherbakov (dmitriis) wrote :

Subscribed ~field-critical as this blocks a deployment in-progress.

The bug has been discussed in field-support (status added for awareness/filtering).

David Ames (thedac)
Changed in vaultlocker:
status: New → Confirmed
assignee: nobody → David Ames (thedac)
Revision history for this message
David Ames (thedac) wrote :

This is my proposed fix. I'd like several people to check it.

https://github.com/openstack-charmers/vaultlocker/pull/1

Changed in vaultlocker:
status: Confirmed → In Progress
Revision history for this message
Ashley Lai (alai) wrote :

Deployed with the fix and got the error below from vault.

juju status:
vault/0 error idle 5/lxd/5 10.216.5.56 8200/tcp hook failed: "update-status"

juju log:
2018-07-06 00:05:39 INFO juju-log Invoking reactive handler: reactive/vault_handlers.py:482:prime_assess_status
2018-07-06 00:05:39 DEBUG update-status active
2018-07-06 00:05:39 DEBUG update-status Traceback (most recent call last):
2018-07-06 00:05:39 DEBUG update-status File "/var/lib/juju/agents/unit-vault-0/charm/hooks/update-status", line 19, in <module>
2018-07-06 00:05:39 DEBUG update-status main()
2018-07-06 00:05:39 DEBUG update-status File "/var/lib/juju/agents/unit-vault-0/.venv/lib/python3.5/site-packages/charms/reactive/__init__.py", line 82, in main
2018-07-06 00:05:39 DEBUG update-status hookenv._run_atexit()
2018-07-06 00:05:39 DEBUG update-status File "/var/lib/juju/agents/unit-vault-0/.venv/lib/python3.5/site-packages/charmhelpers/core/hookenv.py", line 1128, in _run_atexit
2018-07-06 00:05:39 DEBUG update-status callback(*args, **kwargs)
2018-07-06 00:05:39 DEBUG update-status File "/var/lib/juju/agents/unit-vault-0/charm/reactive/vault_handlers.py", line 585, in _assess_status
2018-07-06 00:05:39 DEBUG update-status application_version_set(health.get('version'))
2018-07-06 00:05:39 DEBUG update-status File "/var/lib/juju/agents/unit-vault-0/.venv/lib/python3.5/site-packages/charmhelpers/core/hookenv.py", line 970, in application_version_set
2018-07-06 00:05:39 DEBUG update-status subprocess.check_call(cmd)
2018-07-06 00:05:39 DEBUG update-status File "/usr/lib/python3.5/subprocess.py", line 576, in check_call
2018-07-06 00:05:39 DEBUG update-status retcode = call(*popenargs, **kwargs)
2018-07-06 00:05:39 DEBUG update-status File "/usr/lib/python3.5/subprocess.py", line 557, in call
2018-07-06 00:05:39 DEBUG update-status with Popen(*popenargs, **kwargs) as p:
2018-07-06 00:05:39 DEBUG update-status File "/usr/lib/python3.5/subprocess.py", line 947, in __init__
2018-07-06 00:05:39 DEBUG update-status restore_signals, start_new_session)
2018-07-06 00:05:39 DEBUG update-status File "/usr/lib/python3.5/subprocess.py", line 1490, in _execute_child
2018-07-06 00:05:39 DEBUG update-status restore_signals, start_new_session, preexec_fn)
2018-07-06 00:05:39 DEBUG update-status TypeError: Can't convert 'NoneType' object to str implicitly
2018-07-06 00:05:39 ERROR juju.worker.uniter.operation runhook.go:113 hook "update-status" failed: exit status 1

Revision history for this message
Ashley Lai (alai) wrote :

Despite the error in vault, we initialized and unsealed vault, and it showed ready status. The error in the juju log shows UUID d045f2c8-b705-4113-9291-2ef203600fb0, but this UUID does not exist. Where does the charm get d045f2c8-b705-4113-9291-2ef203600fb0 from?

It seems that with thedac's proposed fix the correct symlink existed for UUID 4c151bc2-8ad8-4c18-b4e8-58b22a66a6b2, but when the charm ran it used UUID d045f2c8-b705-4113-9291-2ef203600fb0.

/etc/udev/rules.d# cat bcache1.rules
SUBSYSTEM=="block", ACTION=="add|change", ENV{CACHED_UUID}=="4c151bc2-8ad8-4c18-b4e8-58b22a66a6b2", SYMLINK+="disk/by-dname/bcache1"

hook failed: "secrets-storage-relation-changed"

2018-07-06 07:54:37 DEBUG secrets-storage-relation-changed vaultlocker: Command '['cryptsetup', '--batch-mode', '--key-file', '-', 'open', 'UUID=d045f2c8-b705-4113-9291-2ef203600fb0', 'crypt-d045f2c8-b705-4113-9291-2ef203600fb0', '--type', 'luks']' returned non-zero exit status 4
2018-07-06 07:54:37 DEBUG secrets-storage-relation-changed Traceback (most recent call last):
2018-07-06 07:54:37 DEBUG secrets-storage-relation-changed File "/var/lib/juju/agents/unit-nova-compute-kvm-5/charm/hooks/secrets-storage-relation-changed", line 579, in <module>
2018-07-06 07:54:37 DEBUG secrets-storage-relation-changed main()
2018-07-06 07:54:37 DEBUG secrets-storage-relation-changed File "/var/lib/juju/agents/unit-nova-compute-kvm-5/charm/hooks/secrets-storage-relation-changed", line 572, in main
2018-07-06 07:54:37 DEBUG secrets-storage-relation-changed hooks.execute(sys.argv)
2018-07-06 07:54:37 DEBUG secrets-storage-relation-changed File "/var/lib/juju/agents/unit-nova-compute-kvm-5/charm/hooks/charmhelpers/core/hookenv.py", line 823, in execute
2018-07-06 07:54:37 DEBUG secrets-storage-relation-changed self._hooks[hook_name]()
2018-07-06 07:54:37 DEBUG secrets-storage-relation-changed File "/var/lib/juju/agents/unit-nova-compute-kvm-5/charm/hooks/secrets-storage-relation-changed", line 556, in secrets_storage_changed
2018-07-06 07:54:37 DEBUG secrets-storage-relation-changed configure_local_ephemeral_storage()
2018-07-06 07:54:37 DEBUG secrets-storage-relation-changed File "/var/lib/juju/agents/unit-nova-compute-kvm-5/charm/hooks/nova_compute_utils.py", line 883, in configure_local_ephemeral_storage
2018-07-06 07:54:37 DEBUG secrets-storage-relation-changed dev])
2018-07-06 07:54:37 DEBUG secrets-storage-relation-changed File "/usr/lib/python3.5/subprocess.py", line 581, in check_call
2018-07-06 07:54:37 DEBUG secrets-storage-relation-changed raise CalledProcessError(retcode, cmd)
2018-07-06 07:54:37 DEBUG secrets-storage-relation-changed subprocess.CalledProcessError: Command '['vaultlocker', 'encrypt', '--uuid', 'd045f2c8-b705-4113-9291-2ef203600fb0', '/dev/disk/by-dname/bcache1']' returned non-zero exit status 1

===================

# vaultlocker encrypt --uuid d045f2c8-b705-4113-9291-2ef203600fb0 /dev/disk/by-dname/bcache1
DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): 10.216.2.23
DEBUG:urllib3.connectionpool:http://10.216.2.23:8200 "POST /v1/auth/approle/login HTTP/1.1" 200 437
INFO:vaultlocker.dmcrypt:LUKS formatti...


Revision history for this message
Dmitrii Shcherbakov (dmitriis) wrote :

osd_fsid gets generated in the ceph lib called from the ceph-osd charm. It's just a version 4 (randomized) uuid - nothing special.

_ceph_volume -> fsid UUID generated -> _allocate_logical_volume -> vaultlocker -> LUKS header

lib/ceph/utils.py:
def _ceph_volume(dev, osd_journal, encrypt=False, bluestore=False,
                 key_manager=CEPH_KEY_MANAGER):
    """
    Prepare and activate a device for usage as a Ceph OSD using ceph-volume.

    # ...
    osd_fsid = str(uuid.uuid4())
    cmd.append('--osd-fsid')
    cmd.append(osd_fsid)
    # ...

    # calls to _allocate_logical_volume

def _allocate_logical_volume(dev, lv_type, osd_fsid,
                             size=None, shared=False,
                             encrypt=False,
                             key_manager=CEPH_KEY_MANAGER):
    """
    Allocate a logical volume from a block device, ensuring any
    required initialization and setup of PV's and VG's to support
    the LV.
    # ...

    lv_name = "osd-{}-{}".format(lv_type, osd_fsid)
    current_volumes = lvm.list_logical_volumes()
    if shared:
        dev_uuid = str(uuid.uuid4())
    else:
        dev_uuid = osd_fsid # the uuid is reused here
    pv_dev = _initialize_disk(dev, dev_uuid, encrypt, key_manager) # calls vaultlocker

Revision history for this message
Liam Young (gnuoy) wrote :

dmitriis, fwiw alai's stack trace is not from ceph, it's from nova-compute. Your point still stands: it's a random uuid.

Revision history for this message
Ashley Lai (alai) wrote :

With the redeployment, uuid d045f2c8-b705-4113-9291-2ef203600fb0 does not show in blkid.

# blkid
/dev/sdg: UUID="b454f3b4-b17d-4ba8-b13a-001b9aedfbbe" TYPE="bcache"
/dev/sdc: UUID="8c7cb4cb-0e29-44e3-87ab-339c64400b9c" TYPE="bcache"
/dev/sda1: UUID="05EC-7CF7" TYPE="vfat" PARTUUID="a564aba8-cb1d-4097-bc2d-88dad4d8b581"
/dev/sda2: UUID="b1940fc7-80aa-49cc-b528-d46dc05fe7b5" TYPE="ext4" PARTUUID="788adb5d-a99c-46ba-ac69-e9236f40379a"
/dev/sda3: UUID="fd6432a5-e436-450c-a240-3098024351d5" TYPE="bcache" PARTUUID="8ced2496-b16e-442a-81f6-771f7e1da1dd"
/dev/sda4: UUID="4c151bc2-8ad8-4c18-b4e8-58b22a66a6b2" TYPE="bcache" PARTUUID="5a15df04-9bcc-4ff4-9060-eecec0d09a44"
/dev/sdd: UUID="58ca75fe-0bf2-4543-9a7e-151414193814" TYPE="bcache"
/dev/sdf: UUID="2d8a4580-df8a-4e79-9189-8931f66dc978" TYPE="bcache"
/dev/sde: UUID="cd955f04-0cbe-4f66-8ba9-198d3058f098" TYPE="bcache"
/dev/sdb: UUID="fac76349-eab1-4574-9fda-455126075b7f" TYPE="bcache"
/dev/nvme0n1p1: UUID="38be3b4b-90dd-4bc3-8030-668042b4d572" TYPE="bcache" PARTUUID="4b370214-32eb-4b86-bb8e-b3b99f78ac8e"
/dev/nvme0n1p2: UUID="576c447b-1e76-438a-8158-0b316e4f5c12" TYPE="bcache" PARTUUID="07a37bb1-a994-4266-8be3-e25826c1e01f"
/dev/bcache7: UUID="d0fbbf40-5a04-4152-9937-a2164c95bb8e" TYPE="ext4"
/dev/nvme0n1: PTUUID="691a9c94-0796-4b88-a8fd-340b9fbb4674" PTTYPE="gpt"
/dev/nvme0n1p3: PARTUUID="053ee5c6-3a8c-4fd4-bcec-a8411d931b86"
/dev/bcache6: UUID="bf5fc98c-a32f-4d08-8f0c-99af2569f247" TYPE="crypto_LUKS"
/dev/bcache5: UUID="fd832a6c-aa3c-4542-bb3d-e8faede5b3d0" TYPE="crypto_LUKS"

Revision history for this message
David Ames (thedac) wrote :

From the most recent deployment we see the settle runs but is insufficient:

The format completes, the settle completes but open fails due to a missing symlink.

From nova-compute-kvm/0

2018-07-06 18:17:55 DEBUG secrets-storage-relation-changed INFO:vaultlocker.dmcrypt:LUKS formatting /dev/disk/by-dname/bcache1 using UUID:74d5e66b-c3bc-4380-a0d3-eb0878827df9
2018-07-06 18:17:59 DEBUG secrets-storage-relation-changed INFO:vaultlocker.dmcrypt:udevadm settle /dev/disk/by-uuid/74d5e66b-c3bc-4380-a0d3-eb0878827df9
2018-07-06 18:17:59 DEBUG secrets-storage-relation-changed DEBUG:urllib3.connectionpool:http://10.216.2.23:8200 "PUT /v1/charm-vaultlocker/DCS1-CLP-NOD7/74d5e66b-c3bc-4380-a0d3-eb0878827df9 HTTP/1.1" 204 0
2018-07-06 18:17:59 DEBUG secrets-storage-relation-changed DEBUG:urllib3.connectionpool:http://10.216.2.23:8200 "GET /v1/charm-vaultlocker/DCS1-CLP-NOD7/74d5e66b-c3bc-4380-a0d3-eb0878827df9 HTTP/1.1" 200 866
2018-07-06 18:17:59 DEBUG secrets-storage-relation-changed INFO:vaultlocker.dmcrypt:LUKS opening 74d5e66b-c3bc-4380-a0d3-eb0878827df9
2018-07-06 18:17:59 DEBUG secrets-storage-relation-changed Device /dev/disk/by-uuid/74d5e66b-c3bc-4380-a0d3-eb0878827df9 doesn't exist or access denied.
2018-07-06 18:17:59 DEBUG secrets-storage-relation-changed vaultlocker: Command '['cryptsetup', '--batch-mode', '--key-file', '-', 'open', 'UUID=74d5e66b-c3bc-4380-a0d3-eb0878827df9', 'crypt-74d5e66b-c3bc-4380-a0d3-eb0878827df9', '--type', 'luks']' returned non-zero exit status 4

I spent most of my afternoon trying to reproduce in our lab. This still feels like a race but I don't have a solution or proof yet.

James Page (james-page)
Changed in vaultlocker:
status: In Progress → Fix Released
Revision history for this message
James Page (james-page) wrote :

OK I hacked in a test of:

    cmd = [
        'udevadm', 'trigger',
        '--subsystem-match=block', '--action=add'
    ]

before the settle, and the by-uuid device does appear. I'm still not sure why that does not happen automatically but that is a definite fix we can apply in vaultlocker.
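
For reference, a sketch of how the trigger and the settle could be chained. The trigger arguments are the ones quoted above; the settle arguments come from the bug description. The helper names and the injectable run parameter are assumptions for illustration, not the code that landed in vaultlocker.

```python
import subprocess


def trigger_block_add_cmd():
    """Ask udev to replay 'add' events for all block devices."""
    return ['udevadm', 'trigger', '--subsystem-match=block', '--action=add']


def settle_for_uuid_cmd(block_uuid):
    """Wait for pending udev events, exiting early once the symlink exists."""
    return [
        'udevadm', 'settle',
        '--exit-if-exists=/dev/disk/by-uuid/{}'.format(block_uuid),
    ]


def trigger_then_settle(block_uuid, run=subprocess.check_call):
    """Run the trigger first so that settle has events to wait for."""
    run(trigger_block_add_cmd())
    run(settle_for_uuid_cmd(block_uuid))
```

The order matters: a settle without a preceding trigger has nothing to wait for when the format step did not generate an event.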

James Page (james-page)
Changed in vaultlocker:
status: Fix Released → In Progress
assignee: David Ames (thedac) → James Page (james-page)
importance: Undecided → Critical
James Page (james-page)
Changed in vaultlocker:
status: In Progress → Fix Released
Revision history for this message
Chris MacNaughton (chris.macnaughton) wrote :

Marking the cryptsetup task as invalid because this is handled entirely in vaultlocker

Changed in cryptsetup (Ubuntu):
status: New → Invalid
Revision history for this message
Ryan Beisner (1chb1n) wrote :

In discussion with James, we still want cryptsetup triage on this.

Changed in cryptsetup (Ubuntu):
status: Invalid → New
Revision history for this message
Chris MacNaughton (chris.macnaughton) wrote :

Incorrectly marked it invalid, as there is a bug in cryptsetup NOT creating the block symlink

Revision history for this message
Dimitri John Ledkov (xnox) wrote :

I'm not sure what vaultlocker is.

trigger might be appropriate together with a '--settle' flag, if/where available.

Instead of manually opening things, I'd expect crypttab to be adjusted and `systemctl daemon-reload` re-run to regenerate the systemd-cryptsetup@ units, and the encrypted device then opened using the systemd-cryptsetup@ instance unit. However, that might be "too" systemd specific for vaultlocker upstream...

If one injects uuid onto a drive, it might not generate any kernel udev events for udevd to react to and create/update symlinks.

It would be nice to have an interleaved log of udev events from udevadm monitor, to see if there are any events emitted after "format" action is done. If there are none, then vaultlocker needs to be fixed to trigger things.
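
One way to collect such a log, sketched under assumptions: the function names, the warm-up and cool-down delays, and the choice to filter on the block subsystem are all illustrative. The action command (for example a luksFormat call) needs root and a disposable device.

```python
import subprocess
import time


def monitor_cmd():
    """udevadm monitor showing kernel uevents and udev events for block devices."""
    return ['udevadm', 'monitor', '--kernel', '--udev',
            '--subsystem-match=block']


def capture_events(action_cmd, monitor=None, warmup=1.0, cooldown=2.0):
    """Run action_cmd while a monitor process is running; return its output."""
    proc = subprocess.Popen(
        monitor or monitor_cmd(),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        universal_newlines=True,
    )
    try:
        time.sleep(warmup)       # give the monitor time to start listening
        subprocess.check_call(action_cmd)
        time.sleep(cooldown)     # leave room for late events
    finally:
        proc.terminate()
        output, _ = proc.communicate()
    return output
```

If the captured output contains no events for the formatted device after the action completes, that would support the theory that nothing prompts udevd to update the symlinks.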

Changed in cryptsetup (Ubuntu):
status: New → Incomplete
Changed in systemd (Ubuntu):
status: New → Incomplete
tags: added: id-5b9a8473a5864f4a45e1c7b6
Revision history for this message
James Page (james-page) wrote :

Dimitri

vaultlocker is a helper tool for managing dmcrypt/LUKS keys in Vault.

vaultlocker provides commands to format a block device using an encryption key, and to unlock a block device previously encrypted by vaultlocker (along with some systemd unit configurations to perform this action on reboots).

Currently vaultlocker performs targeted rescan and settle udevadm operations post luksFormat to wait for the appropriate /dev entries to be created. I consider this a workaround, as it would be generally nicer if this was actually performed as part of the luksFormat operation in cryptsetup: once the format operation completes, block devices would be ready for use without any need to udevadm trigger/settle in other tooling.

Revision history for this message
James Page (james-page) wrote :

tl;dr you don't need vaultlocker to reproduce this issue :-)

Changed in cryptsetup (Ubuntu):
status: Incomplete → New
Revision history for this message
Dmitrii Shcherbakov (dmitriis) wrote :

As a hint on reproducing it, it may be a problem on xenial only (which we used for that deployment) for clean devices.

On a bionic VM it seems like the symlink is created but it's not clear if luksFormat returns before that symlink gets created or after - this is the important part because the automation tries to use that symlink right after `cryptsetup luksFormat <dev>` exits.
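
To narrow that down, one could poll for the symlink immediately after the format command exits and record how long it takes to appear. This is only a sketch with a hypothetical helper name; polling cannot prove the link existed before exit, it can only show that it was already there at the first check.

```python
import os
import time


def wait_for_path(path, timeout=10.0, interval=0.01):
    """Poll until path exists; return seconds waited, or None on timeout."""
    start = time.monotonic()
    while True:
        # lexists is true for a symlink even if its target is missing
        if os.path.lexists(path):
            return time.monotonic() - start
        if time.monotonic() - start >= timeout:
            return None
        time.sleep(interval)
```

A result close to zero on every run would suggest the symlink is in place by the time luksFormat returns; a clearly non-zero or None result would point to a window in which automation can fail.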

tree /sys/class/block
https://paste.ubuntu.com/p/vhjjzdytH7/

uname -a
Linux maas-vhost6 4.15.0-34-generic #37-Ubuntu SMP Mon Aug 27 15:21:48 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

ubuntu@maas-vhost6:~$ tree /dev/disk/by-uuid/
/dev/disk/by-uuid/
└── d26a75c9-15f7-41de-8c0e-20f795ed5729 -> ../../sda1

0 directories, 1 file
ubuntu@maas-vhost6:~$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 64G 0 disk
└─sda1 8:1 0 64G 0 part /
sdb 8:16 0 8G 0 disk
sdc 8:32 0 102.4M 0 disk
sdd 8:48 0 102.4M 0 disk
sde 8:64 0 102.4M 0 disk
vda 252:0 0 102.4M 0 disk
vdb 252:16 0 102.4M 0 disk
nvme0n1 259:0 0 20G 0 disk
nvme1n1 259:1 0 20G 0 disk
ubuntu@maas-vhost6:~$ sudo cryptsetup luksFormat /dev/sdb

WARNING!
========
This will overwrite data on /dev/sdb irrevocably.

Are you sure? (Type uppercase yes): YES
Enter passphrase for /dev/sdb:
Verify passphrase:

ubuntu@maas-vhost6:~$ tree /dev/disk/by-uuid/
/dev/disk/by-uuid/
├── a82ddfe0-7de8-4c6c-aaca-a074f000b746 -> ../../sdb
└── d26a75c9-15f7-41de-8c0e-20f795ed5729 -> ../../sda1

0 directories, 2 files

sudo cryptsetup luksFormat /dev/sdb

WARNING!
========
This will overwrite data on /dev/sdb irrevocably.

Are you sure? (Type uppercase yes): YES
Enter passphrase for /dev/sdb:
Verify passphrase:

ubuntu@maas-vhost6:~$ sudo cryptsetup luksDump /dev/sdb | grep UUID
UUID: 21bacaf0-9eea-4809-9b1c-6f4d7e614f5b

ubuntu@maas-vhost6:~$ tree /dev/disk/by-uuid/
/dev/disk/by-uuid/
├── 21bacaf0-9eea-4809-9b1c-6f4d7e614f5b -> ../../sdb
└── d26a75c9-15f7-41de-8c0e-20f795ed5729 -> ../../sda1

0 directories, 2 files

Revision history for this message
asi (gmazyland) wrote :

If you have properly installed device-mapper udev rules, cryptsetup is internally synchronized with the creation of all symlinks and nodes (it should return only when udev is finished; that's why libdevmapper internally uses semaphores and cookies per device).

Calling udev settle is just a workaround for broken udev rules (or the order of rules); you should never need that here...

Revision history for this message
Dmitrii Shcherbakov (dmitriis) wrote :

asi,

I would rule out the lack of rules because on both xenial and bionic we have LVM udev rules with the following line that is supposed to create a LUKS UUID-based symlink:

ENV{ID_FS_USAGE}=="filesystem|other|crypto", ENV{ID_FS_UUID_ENC}=="?*", SYMLINK+="disk/by-uuid/$env{ID_FS_UUID_ENC}"

https://git.launchpad.net/~usd-import-team/ubuntu/+source/lvm2/tree/udev/13-dm-disk.rules.in?h=ubuntu/xenial
https://git.launchpad.net/~usd-import-team/ubuntu/+source/lvm2/tree/udev/13-dm-disk.rules.in?h=ubuntu/bionic

The above test has shown that the symlink gets created on bionic.

For xenial it seems to be the same but this test is different in terms of CPU load present on a machine (there is no load in my tests now in comparison to the situations based on which we filed this bug):

uname -a
Linux maas-vhost6 4.4.0-135-generic #161-Ubuntu SMP Mon Aug 27 10:45:01 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

grep -RP ID_FS_UUID_ENC /lib/udev/rules.d/
/lib/udev/rules.d/60-persistent-storage-dm.rules:ENV{ID_FS_USAGE}=="filesystem|other|crypto", ENV{ID_FS_UUID_ENC}=="?*", SYMLINK+="disk/by-uuid/$env{ID_FS_UUID_ENC}"
/lib/udev/rules.d/60-persistent-storage.rules:ENV{ID_FS_USAGE}=="filesystem|other|crypto", ENV{ID_FS_UUID_ENC}=="?*", SYMLINK+="disk/by-uuid/$env{ID_FS_UUID_ENC}"
/lib/udev/rules.d/69-bcache.rules:ENV{ID_FS_UUID_ENC}=="?*", SYMLINK+="disk/by-uuid/$env{ID_FS_UUID_ENC}"
/lib/udev/rules.d/69-lvm-metad.rules:ENV{ID_FS_UUID_ENC}=="?*", SYMLINK+="disk/by-id/lvm-pv-uuid-$env{ID_FS_UUID_ENC}"
/lib/udev/rules.d/63-md-raid-arrays.rules:ENV{ID_FS_USAGE}=="filesystem|other|crypto", ENV{ID_FS_UUID_ENC}=="?*", SYMLINK+="disk/by-uuid/$env{ID_FS_UUID_ENC}"

lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 64G 0 disk
`-sda1 8:1 0 64G 0 part /
sdb 8:16 0 8G 0 disk
sdc 8:32 0 102.4M 0 disk
sdd 8:48 0 102.4M 0 disk
sde 8:64 0 102.4M 0 disk
vda 253:0 0 102.4M 0 disk
vdb 253:16 0 102.4M 0 disk
nvme0n1 259:0 0 20G 0 disk
nvme1n1 259:1 0 20G 0 disk

tree /dev/disk/by-uuid/
/dev/disk/by-uuid/
└── d26a75c9-15f7-41de-8c0e-20f795ed5729 -> ../../sda1

0 directories, 1 file

sudo cryptsetup luksFormat /dev/sdb

WARNING!
========
This will overwrite data on /dev/sdb irrevocably.

Are you sure? (Type uppercase yes): YES
Enter passphrase:
Verify passphrase:

ubuntu@maas-vhost6:~$ tree /dev/disk/by-uuid/
/dev/disk/by-uuid/
├── 42bf3808-9987-454f-be27-5d6c9b9c5c12 -> ../../sdb
└── d26a75c9-15f7-41de-8c0e-20f795ed5729 -> ../../sda1

0 directories, 2 files

sudo cryptsetup luksDump /dev/sdb | grep UUID
UUID: 42bf3808-9987-454f-be27-5d6c9b9c5c12

Revision history for this message
asi (gmazyland) wrote :

Look for "dmsetup udevcomplete" call in udev rules. This is the sync point when libdevmapper continues. This must be the last call in udev chain rules related to device-mapper devices.
(Run cryptsetup with --debug and you will see that sync point.)

Sometimes it is hidden by the fact that libdevmapper could fall back to internal device node creation because it verifies that udev nodes were created (the old way used when you compile it without udev support).

Just saying how upstream is designed to work.
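
A small sketch for locating that sync point across rule directories. The function name and the default directory list are assumptions; it does the same job as a recursive grep.

```python
import glob
import os


def find_udevcomplete_rules(rules_dirs=('/lib/udev/rules.d',
                                        '/etc/udev/rules.d')):
    """Return (file name, line number, line) for each udevcomplete rule."""
    hits = []
    for rules_dir in rules_dirs:
        pattern = os.path.join(rules_dir, '*.rules')
        for path in sorted(glob.glob(pattern)):
            with open(path, errors='replace') as rules:
                for lineno, line in enumerate(rules, 1):
                    stripped = line.strip()
                    if stripped.startswith('#'):
                        continue  # skip commented-out rules
                    if 'udevcomplete' in stripped:
                        hits.append((os.path.basename(path), lineno, stripped))
    return hits
```

Because udev processes rule files in lexical order of their names, the file name in each hit gives a rough idea of whether the call sits late enough in the chain.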

Revision history for this message
Dmitrii Shcherbakov (dmitriis) wrote :

asi,

Thanks for the guidance.

Attached some outputs.

> Look for "dmsetup udevcomplete" call in udev rules.

ubuntu@maas-vhost6:/lib/udev/rules.d$ grep -RiP udevcomplete
55-dm.rules:ENV{DM_COOKIE}=="?*", RUN+="/sbin/dmsetup udevcomplete $env{DM_COOKIE}"

> Sometimes it is hidden by the fact that libdevmapper could fall back to internal device node creation because it verifies that udev nodes were created (the old way used when you compile it without udev support).

At least I can see that it is not disabled explicitly for xenial in the build scripts:

https://git.launchpad.net/~usd-import-team/ubuntu/+source/cryptsetup/tree/configure?h=ubuntu/xenial-updates#n15870

# Check whether --enable-udev was given.
if test "${enable_udev+set}" = set; then :
  enableval=$enable_udev;
else
  enable_udev=yes
fi

https://git.launchpad.net/~usd-import-team/ubuntu/+source/cryptsetup/tree/debian/rules?h=ubuntu/xenial-updates#n43

> This must be the last call in udev chain rules related to device-mapper devices. (Run cryptsetup with --debug and you will see that sync point.)

Do you remember anything specific? I do not see anything above this

# Key length 32, device size 16777216 sectors, header size 2050 sectors.
# Releasing crypt device /dev/sdb context.
# Releasing device-mapper backend.
# Unlocking memory.
Command successful.
14:09:07.630619660

in the attached output that would definitively resemble a sync point ("Releasing context..." could be it but I am not sure without looking at the code).

journalctl -u systemd-udevd.service -f -o short-precise

# ...
Sep 15 14:09:07.634172 maas-vhost6 systemd-udevd[3793]: IMPORT builtin 'blkid' /lib/udev/rules.d/60-persistent-storage.rules:76
Sep 15 14:09:07.634285 maas-vhost6 systemd-udevd[3793]: probe /dev/sdb raid offset=0
Sep 15 14:09:07.634399 maas-vhost6 systemd-udevd[3793]: LINK 'disk/by-uuid/fcdd1397-8fb7-410c-b343-a7bb1a2f83d0' /lib/udev/rules.d/60-persistent-storage.rules:79

# ...

Sep 15 14:09:07.635434 maas-vhost6 systemd-udevd[3793]: found 'b8:16' claiming '/run/udev/links/\x2fdisk\x2fby-uuid\x2ffcdd1397-8fb7-410c-b343-a7bb1a2f83d0'
Sep 15 14:09:07.635547 maas-vhost6 systemd-udevd[3793]: creating link '/dev/disk/by-uuid/fcdd1397-8fb7-410c-b343-a7bb1a2f83d0' to '/dev/sdb'
Sep 15 14:09:07.635659 maas-vhost6 systemd-udevd[3793]: preserve already existing symlink '/dev/disk/by-uuid/fcdd1397-8fb7-410c-b343-a7bb1a2f83d0' to '../../sdb'

Revision history for this message
Dimitri John Ledkov (xnox) wrote :

95-dm-notify.rules in dmsetup is only shipped in bionic and up. So I suspect that xenial does not have udev synchronisation in the dmsetup package.

xenial: https://packages.ubuntu.com/search?suite=xenial&arch=any&searchon=contents&keywords=95-dm-notify.rules

bionic: https://packages.ubuntu.com/search?suite=bionic&arch=any&searchon=contents&keywords=95-dm-notify.rules

Revision history for this message
Dimitri John Ledkov (xnox) wrote :

do we need and want to enable udev synchronisation in dmsetup package in xenial?

Revision history for this message
Dmitrii Shcherbakov (dmitriis) wrote : Re: [Bug 1780332] Re: vaultlocker does not ensure that udev is triggered to create /dev/disk/by-uuid/<uuid-in-luks-header> symlink and fails

We still have some clouds getting deployed with xenial that rely on this,
however, the workaround works for our purposes.

The only downside is that the workaround will be triggered on Bionic as
well unless a version dependent call will be added to vaultlocker.
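
A rough sketch of what such a conditional call could key on. It assumes that the presence of 95-dm-notify.rules is a usable proxy for working udev synchronisation, which has not been established in this report; the function names and the directory list are hypothetical.

```python
import os

DEFAULT_RULES_DIRS = (
    '/lib/udev/rules.d',
    '/usr/lib/udev/rules.d',
    '/etc/udev/rules.d',
)


def has_dm_notify_rules(rules_dirs=DEFAULT_RULES_DIRS):
    """True if 95-dm-notify.rules is installed in any rules directory."""
    return any(
        os.path.isfile(os.path.join(rules_dir, '95-dm-notify.rules'))
        for rules_dir in rules_dirs
    )


def needs_udev_workaround(rules_dirs=DEFAULT_RULES_DIRS):
    """Assumption: without the notify rules, trigger/settle is still needed."""
    return not has_dm_notify_rules(rules_dirs)
```

Checking for the rules file rather than the release name avoids hard-coding a list of Ubuntu versions, but it would need testing on both releases before being relied upon.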

On Wed, Oct 10, 2018, 16:31 Dimitri John Ledkov, <email address hidden>
wrote:

> do we need and want to enable udev synchronisation in dmsetup package in
> xenial?
--
Best Regards,
Dmitrii Shcherbakov
Field Software Engineer

Revision history for this message
James Page (james-page) wrote :

@xnox

As dmitriis states, we've worked around this issue in vaultlocker; however, I do think that fixing this more generally would be a nice step for the broader ecosystem. Fixing an underlying race rather than papering over it in tooling higher up the stack is always preferable.

Revision history for this message
Dimitri John Ledkov (xnox) wrote :

Also reading above comments, did we verify that Bionic is correctly synchronized, and needs no workaround? It would be nice to reliably confirm that these things operate race-free in bionic and up, and then mark this bug to affect only xenial and below.

Revision history for this message
Dimitri John Ledkov (xnox) wrote :

Ping, does vaultlocker work, _without_ work arounds, race free on bionic and up?

Changed in cryptsetup (Ubuntu):
status: New → Incomplete
Changed in lvm2 (Ubuntu):
status: New → Incomplete
Revision history for this message
Dmitrii Shcherbakov (dmitriis) wrote :

xnox,

Hmm, based on my previous notes in #18 it is not clear whether that's the case.

We would have to test with vaultlocker patched to have the workaround removed.

tags: added: fr-630
Revision history for this message
Brian Murray (brian-murray) wrote :

Could somebody please test this _without_ work arounds on Bionic and later?

Revision history for this message
Nick Rosbrook (enr0n) wrote :

Marking as invalid for systemd since the bug is stale and it remains unclear if there is a bug in systemd.

Changed in systemd (Ubuntu):
status: Incomplete → Invalid