Missing udev rules break locally attached encrypted Ceph volumes

Bug #1884114 reported by Stephen Finucane
Affects    Status         Importance  Assigned to       Milestone
os-brick   Fix Committed  Medium      Stephen Finucane

Bug Description

Duplicating from https://bugzilla.redhat.com/show_bug.cgi?id=1848610

---

Nova now supports locally attaching Ceph volumes via os-brick through a combination of two config options: '[workarounds] disable_native_luksv1' (which disables native attachment using QEMU so that os-brick is used instead) and '[workarounds] rbd_volume_local_attach' (which enables local attachment). This is broken on an OSP 16.1 deployment, apparently because the symlink that os-brick expects at '/dev/rbd/{pool}/{device}' (which points to '/dev/rbdN') isn't being created. This symlink should be created by the udev rules that Ceph provides [1]. Since udev isn't run within the 'nova_compute' container, these rules must be present on the host in order to function. This was the case in OSP 13, but it is not in OSP 16.1.
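
To confirm that a given compute node actually has both options switched on, something like the following could be used. This is only a minimal sketch: it uses the stdlib configparser as a stand-in for oslo.config (which nova really uses), and the function name 'local_luks_attach_enabled' is made up for illustration. Only the section and option names come from the description above.

```python
import configparser


def local_luks_attach_enabled(nova_conf_text):
    """Return True if both [workarounds] options named above are enabled.

    Hypothetical helper: parses nova.conf-style text with configparser
    rather than oslo.config, so it is only an approximation.
    """
    # strict=False tolerates repeated keys; interpolation=None leaves any
    # '%' characters in other option values alone.
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.read_string(nova_conf_text)
    return all(
        parser.getboolean("workarounds", option, fallback=False)
        for option in ("disable_native_luksv1", "rbd_volume_local_attach")
    )


example = """
[DEFAULT]
debug = true

[workarounds]
disable_native_luksv1 = true
rbd_volume_local_attach = true
"""
print(local_luks_attach_enabled(example))
```

A missing section or option is treated as disabled here, which is an assumption about the defaults rather than something stated in this report.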

On an OSP 13 node:

  [heat-admin@compute-0 ~]$ ls /usr/lib/udev/rules.d/50-rbd.rules
  /usr/lib/udev/rules.d/50-rbd.rules
  $ rpm -qf /usr/lib/udev/rules.d/50-rbd.rules
  ceph-common-12.2.12-115.el7cp.x86_64
  [heat-admin@compute-0 ~]$ sudo yum list 'ceph*' -q
  Installed Packages
  ceph-common.x86_64 2:12.2.12-115.el7cp @rhelosp-ceph-3-mon
  ...

On an OSP 16.1 (beta) node:

  [heat-admin@compute-0 ~]$ ls /usr/lib/udev/rules.d/50-rbd.rules
  ls: cannot access '/usr/lib/udev/rules.d/50-rbd.rules': No such file or directory
  [stack@undercloud-0 ~]$ sudo dnf list 'ceph*' --installed
  Installed Packages
  ceph-ansible.noarch

The absence of this file means the symlink is not created, and nova/os-brick raises an exception when trying to decrypt the non-existent path.

  2020-06-18 15:38:19.175 8 DEBUG os_brick.encryptors.luks [req-foo bar baz - default default] opening encrypted volume /dev/rbd/volumes/volume-38e1e0ee-ac91-4d1d-be84-6dc8fb29292d _open_volume /usr/lib/python3.6/site-packages/os_brick/encryptors/luks.py:109
  2020-06-18 15:38:19.192 8 ERROR nova.virt.libvirt.driver [req-foo bar baz - default default] [instance: foo] Failure attaching encryptor; rolling back volume connection: oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while running command.
  Command: cryptsetup luksOpen --key-file=- /dev/rbd/volumes/volume-38e1e0ee-ac91-4d1d-be84-6dc8fb29292d crypt-volume-38e1e0ee-ac91-4d1d-be84-6dc8fb29292d
  Exit code: 4
  Stdout: ''
  Stderr: "Device /dev/rbd/volumes/volume-38e1e0ee-ac91-4d1d-be84-6dc8fb29292d doesn't exist or access denied.\n"
  2020-06-18 15:38:19.192 8 ERROR nova.virt.libvirt.driver [instance: foo] Traceback (most recent call last):
  2020-06-18 15:38:19.192 8 ERROR nova.virt.libvirt.driver [instance: foo] File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/driver.py", line 1588, in _connect_volume
  2020-06-18 15:38:19.192 8 ERROR nova.virt.libvirt.driver [instance: foo] self._attach_encryptor(context, connection_info, encryption)
  2020-06-18 15:38:19.192 8 ERROR nova.virt.libvirt.driver [instance: foo] File "/usr/lib/python3.6/site-packages/nova/virt/libvirt/driver.py", line 1733, in _attach_encryptor
  2020-06-18 15:38:19.192 8 ERROR nova.virt.libvirt.driver [instance: foo] encryptor.attach_volume(context, **encryption)
  2020-06-18 15:38:19.192 8 ERROR nova.virt.libvirt.driver [instance: foo] File "/usr/lib/python3.6/site-packages/os_brick/encryptors/luks.py", line 167, in attach_volume
  2020-06-18 15:38:19.192 8 ERROR nova.virt.libvirt.driver [instance: foo] self._open_volume(passphrase, **kwargs)
  2020-06-18 15:38:19.192 8 ERROR nova.virt.libvirt.driver [instance: foo] File "/usr/lib/python3.6/site-packages/os_brick/encryptors/luks.py", line 113, in _open_volume
  2020-06-18 15:38:19.192 8 ERROR nova.virt.libvirt.driver [instance: foo] root_helper=self._root_helper)
  2020-06-18 15:38:19.192 8 ERROR nova.virt.libvirt.driver [instance: foo] File "/usr/lib/python3.6/site-packages/os_brick/executor.py", line 52, in _execute
  2020-06-18 15:38:19.192 8 ERROR nova.virt.libvirt.driver [instance: foo] result = self.__execute(*args, **kwargs)
  2020-06-18 15:38:19.192 8 ERROR nova.virt.libvirt.driver [instance: foo] File "/usr/lib/python3.6/site-packages/os_brick/privileged/rootwrap.py", line 169, in execute
  2020-06-18 15:38:19.192 8 ERROR nova.virt.libvirt.driver [instance: foo] return execute_root(*cmd, **kwargs)
  2020-06-18 15:38:19.192 8 ERROR nova.virt.libvirt.driver [instance: foo] File "/usr/lib/python3.6/site-packages/oslo_privsep/priv_context.py", line 245, in _wrap
  2020-06-18 15:38:19.192 8 ERROR nova.virt.libvirt.driver [instance: foo] return self.channel.remote_call(name, args, kwargs)
  2020-06-18 15:38:19.192 8 ERROR nova.virt.libvirt.driver [instance: foo] File "/usr/lib/python3.6/site-packages/oslo_privsep/daemon.py", line 204, in remote_call
  2020-06-18 15:38:19.192 8 ERROR nova.virt.libvirt.driver [instance: foo] raise exc_type(*result[2])
  2020-06-18 15:38:19.192 8 ERROR nova.virt.libvirt.driver [instance: foo] oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while running command.
  2020-06-18 15:38:19.192 8 ERROR nova.virt.libvirt.driver [instance: foo] Command: cryptsetup luksOpen --key-file=- /dev/rbd/volumes/volume-38e1e0ee-ac91-4d1d-be84-6dc8fb29292d crypt-volume-38e1e0ee-ac91-4d1d-be84-6dc8fb29292d
  2020-06-18 15:38:19.192 8 ERROR nova.virt.libvirt.driver [instance: foo] Exit code: 4
  2020-06-18 15:38:19.192 8 ERROR nova.virt.libvirt.driver [instance: foo] Stdout: ''
  2020-06-18 15:38:19.192 8 ERROR nova.virt.libvirt.driver [instance: foo] Stderr: "Device /dev/rbd/volumes/volume-38e1e0ee-ac91-4d1d-be84-6dc8fb29292d doesn't exist or access denied.\n"
  2020-06-18 15:38:19.192 8 ERROR nova.virt.libvirt.driver [instance: foo]
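
The manual checks above (is '50-rbd.rules' installed, does the symlink exist) can be rolled into one helper when triaging a node. A minimal sketch, with assumptions: the function name and return shape are invented here, and '/etc/udev/rules.d' is included as a second search location on the assumption that local overrides may live there; only '/usr/lib/udev/rules.d' appears in the transcripts.

```python
import os

# The first directory is the one checked in the transcripts above; the
# second is an assumed location for locally installed rules.
RULES_DIRS = ("/usr/lib/udev/rules.d", "/etc/udev/rules.d")


def diagnose_rbd_symlink(pool, volume, dev_root="/dev/rbd",
                         rules_dirs=RULES_DIRS):
    """Report whether the symlink os-brick expects exists, and why it may not.

    Hypothetical helper for triage; not part of os-brick.
    """
    symlink = os.path.join(dev_root, pool, volume)
    rules_present = any(
        os.path.isfile(os.path.join(d, "50-rbd.rules")) for d in rules_dirs
    )
    # Only resolve the target if the path really is a symlink.
    target = os.path.realpath(symlink) if os.path.islink(symlink) else None
    return {"symlink": symlink, "target": target,
            "rules_present": rules_present}


print(diagnose_rbd_symlink("volumes", "volume-foo"))
```

On an affected node one would expect 'target' to be None and 'rules_present' to be False, matching the OSP 16.1 output above.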

I see three possible solutions at the moment:

1. These udev rules should be present on the host. This could be as simple as installing the 'ceph-common' package, though that's pretty leaky.
2. The udev daemon should be present in the 'nova_compute' container.
3. os-brick should be enhanced to not require these symlinks.
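
To give a rough idea of what option 3 might involve: the kernel rbd driver exposes each mapped image under '/sys/bus/rbd/devices/<id>/' with 'pool' and 'name' attributes, so the '/dev/rbdN' device could in principle be found without any udev-created symlink. The following is a sketch under that assumption only; the function name is hypothetical, it ignores snapshots and namespaces, and it is not what os-brick does today.

```python
import os


def find_rbd_device(pool, image, sysfs_root="/sys/bus/rbd/devices"):
    """Return '/dev/rbdN' for a mapped image, or None if it is not mapped.

    Hypothetical lookup via sysfs; assumes the directory name under
    sysfs_root is the N in /dev/rbdN.
    """
    try:
        dev_ids = os.listdir(sysfs_root)
    except FileNotFoundError:
        return None  # rbd module not loaded, so nothing can be mapped
    for dev_id in sorted(dev_ids):
        entry = os.path.join(sysfs_root, dev_id)
        try:
            with open(os.path.join(entry, "pool")) as f:
                dev_pool = f.read().strip()
            with open(os.path.join(entry, "name")) as f:
                dev_image = f.read().strip()
        except OSError:
            continue  # entry vanished or is not a device directory
        if (dev_pool, dev_image) == (pool, image):
            return "/dev/rbd" + dev_id
    return None


print(find_rbd_device("volumes", "volume-foo"))
```

The LUKS encryptor would still need somewhere to place its own symlink to the decrypted device, so this covers only the lookup half of the problem.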

Stephen Finucane (stephenfinucane) wrote :

Not sure why this isn't linking to https://review.opendev.org/#/c/736758/

Changed in os-brick:
assignee: nobody → Stephen Finucane (stephenfinucane)
OpenStack Infra (hudson-openstack) wrote : Related fix proposed to os-brick (master)

Related fix proposed to branch: master
Review: https://review.opendev.org/742383

OpenStack Infra (hudson-openstack) wrote : Change abandoned on os-brick (master)

Change abandoned by Stephen Finucane (<email address hidden>) on branch: master
Review: https://review.opendev.org/742383
Reason: Whoops, this should be a new PS for https://review.opendev.org/#/c/736758/

OpenStack Infra (hudson-openstack) wrote : Related fix merged to os-brick (master)

Reviewed: https://review.opendev.org/736758
Committed: https://git.openstack.org/cgit/openstack/os-brick/commit/?id=ee34d925ff8a8a83345941b7876b09f2c0396864
Submitter: Zuul
Branch: master

commit ee34d925ff8a8a83345941b7876b09f2c0396864
Author: Stephen Finucane <email address hidden>
Date: Wed Jul 22 11:07:19 2020 +0100

    rbd: Warn if ceph udev rules are not configured

    The LUKS encryptor feature expects devices to have a symbolic link that
    it can overwrite in order to enable transparent encryption/decryption
    for instances [1]. This is generally the case for RBD volumes, as Ceph
    uses udev rules [2] to create a '/dev/rbd/{pool}/{device}' ->
    '/dev/rbdN' symlink. However, in an environment where udev daemon is not
    present or configured correctly, this symlink will never be configured.
    This causes things to crash and burn in a rather non-obvious manner when
    locally attaching an encrypted RBD volume:

      oslo_concurrency.processutils.ProcessExecutionError: Unexpected error while running command.
      Command: cryptsetup luksOpen --key-file=- /dev/rbd/volumes/volume-foo crypt-volume-foo
      Exit code: 4
      Stdout: ''
      Stderr: "Device /dev/rbd/volumes/foo doesn't exist or access denied.\n"

    ('foo' being a stand-in for a very long 'device-$UUID' name)

    The long term fix here is to probably stop relying on the side effects
    of these udev rules, i.e. the symlinks, but that is a far more involved
    fix that would not be backportable. Instead, for now we simply leave a
    breadcrumb for the user, informing them as to what's gone wrong and
    encouraging them to look at the bug report for more information.

    [1] https://github.com/openstack/os-brick/blob/3.1.0/os_brick/encryptors/luks.py#L191-L195
    [2] https://github.com/ceph/ceph/blob/v14.0.0/udev/50-rbd.rules

    Change-Id: I2775f55039695c7ec029106c0dafe4d46255b336
    Signed-off-by: Stephen Finucane <email address hidden>
    Related-Bug: #1884114
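
For readers who have not opened the review, the "breadcrumb" idea amounts to checking for the expected symlink and logging a pointer to this bug when it is absent. The snippet below is an illustrative sketch only, not the merged change: the function name, logger setup, and message wording are all assumptions.

```python
import logging
import os

LOG = logging.getLogger(__name__)


def warn_if_symlink_missing(symlink_path):
    """Log a hint and return False if the expected RBD symlink is absent.

    Hypothetical helper; the real os-brick change may differ in where
    and how it emits the warning.
    """
    if os.path.islink(symlink_path):
        return True
    LOG.warning(
        "Expected symlink %s does not exist; the Ceph udev rules may not "
        "be installed, or udev may not be running on the host. "
        "See bug #1884114.",
        symlink_path,
    )
    return False


warn_if_symlink_missing("/dev/rbd/volumes/volume-foo")
```

The check is advisory: it returns a boolean instead of raising, so the caller can still attempt the attach and surface the original cryptsetup error.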

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to os-brick (stable/ussuri)

Related fix proposed to branch: stable/ussuri
Review: https://review.opendev.org/748660

OpenStack Infra (hudson-openstack) wrote : Related fix proposed to os-brick (stable/train)

Related fix proposed to branch: stable/train
Review: https://review.opendev.org/748661

OpenStack Infra (hudson-openstack) wrote : Related fix merged to os-brick (stable/train)

Reviewed: https://review.opendev.org/748661
Committed: https://git.openstack.org/cgit/openstack/os-brick/commit/?id=9905455da6d7031eb04e209f8e2225880de01913
Submitter: Zuul
Branch: stable/train

commit 9905455da6d7031eb04e209f8e2225880de01913
Author: Stephen Finucane <email address hidden>
Date: Wed Jul 22 11:07:19 2020 +0100

    rbd: Warn if ceph udev rules are not configured

    Change-Id: I2775f55039695c7ec029106c0dafe4d46255b336
    Signed-off-by: Stephen Finucane <email address hidden>
    Related-Bug: #1884114
    (cherry picked from commit ee34d925ff8a8a83345941b7876b09f2c0396864)
    (cherry picked from commit 1eeffd986dd8d5a192c7af272fb5eefb0ce43da2)

tags: added: in-stable-train
OpenStack Infra (hudson-openstack) wrote : Related fix merged to os-brick (stable/ussuri)

Reviewed: https://review.opendev.org/748660
Committed: https://git.openstack.org/cgit/openstack/os-brick/commit/?id=d3ee7a076be6d3397946f17f15bee4bda2968459
Submitter: Zuul
Branch: stable/ussuri

commit d3ee7a076be6d3397946f17f15bee4bda2968459
Author: Stephen Finucane <email address hidden>
Date: Wed Jul 22 11:07:19 2020 +0100

    rbd: Warn if ceph udev rules are not configured

    Change-Id: I2775f55039695c7ec029106c0dafe4d46255b336
    Signed-off-by: Stephen Finucane <email address hidden>
    Related-Bug: #1884114
    (cherry picked from commit ee34d925ff8a8a83345941b7876b09f2c0396864)

tags: added: in-stable-ussuri
Changed in os-brick:
status: In Progress → Fix Committed
importance: Undecided → Medium