Install fails due to udevadm in LXD with rev 262

Bug #1776713 reported by Chris Sanders
This bug affects 4 people
Affects: Ceph OSD Charm
Status: Fix Released
Importance: High
Assigned to: Unassigned
Milestone: 18.08

Bug Description

Starting with ceph-osd-262, the addition of a udevadm call during install.real causes the install hook to fail when run in an LXD container. Testing with ceph-osd-261 confirms that the openstack-on-lxd install succeeds.

The return code from 'udevadm control --reload-rules' is 2 when run in an LXD container, even with the container set to privileged and with nesting enabled.
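
For clarity, the failure boils down to the following pattern (a minimal sketch of what the charm does, not its exact code):

    import subprocess

    # Inside an LXD container there is normally no udev daemon managing
    # devices, so this command exits non-zero (status 2 here) and
    # check_call() raises subprocess.CalledProcessError, aborting the hook.
    subprocess.check_call(['udevadm', 'control', '--reload-rules'])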

Debug Log showing the error:
unit-ceph-osd-0: 16:33:15 DEBUG unit.ceph-osd/0.install Traceback (most recent call last):
unit-ceph-osd-0: 16:33:15 DEBUG unit.ceph-osd/0.install File "/var/lib/juju/agents/unit-ceph-osd-0/charm/hooks/install.real", line 630, in <module>
unit-ceph-osd-0: 16:33:15 DEBUG unit.ceph-osd/0.install hooks.execute(sys.argv)
unit-ceph-osd-0: 16:33:15 DEBUG unit.ceph-osd/0.install File "/var/lib/juju/agents/unit-ceph-osd-0/charm/hooks/charmhelpers/core/hookenv.py", line 823, in execute
unit-ceph-osd-0: 16:33:15 DEBUG unit.ceph-osd/0.install self._hooks[hook_name]()
unit-ceph-osd-0: 16:33:15 DEBUG unit.ceph-osd/0.install File "/var/lib/juju/agents/unit-ceph-osd-0/charm/hooks/charmhelpers/contrib/hardening/harden.py", line 79, in _harden_inner2
unit-ceph-osd-0: 16:33:15 DEBUG unit.ceph-osd/0.install return f(*args, **kwargs)
unit-ceph-osd-0: 16:33:15 DEBUG unit.ceph-osd/0.install File "/var/lib/juju/agents/unit-ceph-osd-0/charm/hooks/install.real", line 242, in install
unit-ceph-osd-0: 16:33:15 DEBUG unit.ceph-osd/0.install install_udev_rules()
unit-ceph-osd-0: 16:33:15 DEBUG unit.ceph-osd/0.install File "/var/lib/juju/agents/unit-ceph-osd-0/charm/hooks/install.real", line 231, in install_udev_rules
unit-ceph-osd-0: 16:33:15 DEBUG unit.ceph-osd/0.install '--reload-rules'])
unit-ceph-osd-0: 16:33:15 DEBUG unit.ceph-osd/0.install File "/usr/lib/python3.5/subprocess.py", line 581, in check_call
unit-ceph-osd-0: 16:33:15 DEBUG unit.ceph-osd/0.install raise CalledProcessError(retcode, cmd)
unit-ceph-osd-0: 16:33:15 DEBUG unit.ceph-osd/0.install subprocess.CalledProcessError: Command '['udevadm', 'control', '--reload-rules']' returned non-zero exit status 2
unit-ceph-osd-0: 16:33:15 ERROR juju.worker.uniter.operation hook "install" failed: exit status 1

Revision history for this message
Chris Sanders (chris.sanders) wrote :

Here's the debug log in a format that's less painful to read:
https://pastebin.ubuntu.com/p/6dXPjbt652/

Revision history for this message
Sean Feole (sfeole) wrote :

I ran into this today too, on 18.04 + LXD 3.0.0-0ubuntu4 (arm64).

2018-06-20 14:00:36 DEBUG install Hit:1 http://ports.ubuntu.com/ubuntu-ports bionic InRelease
2018-06-20 14:00:36 DEBUG install Hit:2 http://ports.ubuntu.com/ubuntu-ports bionic-updates InRelease
2018-06-20 14:00:36 DEBUG install Hit:3 http://ports.ubuntu.com/ubuntu-ports bionic-backports InRelease
2018-06-20 14:00:37 DEBUG install Hit:4 http://ports.ubuntu.com/ubuntu-ports bionic-security InRelease
2018-06-20 14:00:39 DEBUG install Reading package lists...
2018-06-20 14:00:39 DEBUG install lxc
2018-06-20 14:00:39 DEBUG worker.uniter.jujuc server.go:181 running hook tool "juju-log"
2018-06-20 14:00:39 INFO juju-log Installing ['ceph', 'gdisk', 'btrfs-tools', 'python-ceph', 'radosgw', 'xfsprogs', 'python-pyudev', 'lvm2', 'parted'] with options: ['--option=Dpkg::Options::=--force-confold']
2018-06-20 14:00:39 DEBUG install Reading package lists...
2018-06-20 14:00:39 DEBUG install Building dependency tree...
2018-06-20 14:00:39 DEBUG install Reading state information...
2018-06-20 14:00:40 DEBUG install btrfs-tools is already the newest version (4.15.1-1build1).
2018-06-20 14:00:40 DEBUG install gdisk is already the newest version (1.0.3-1).
2018-06-20 14:00:40 DEBUG install lvm2 is already the newest version (2.02.176-4.1ubuntu3).
2018-06-20 14:00:40 DEBUG install parted is already the newest version (3.2-20).
2018-06-20 14:00:40 DEBUG install xfsprogs is already the newest version (4.9.0+nmu1ubuntu2).
2018-06-20 14:00:40 DEBUG install python-pyudev is already the newest version (0.21.0-1).
2018-06-20 14:00:40 DEBUG install ceph is already the newest version (12.2.4-0ubuntu1.1).
2018-06-20 14:00:40 DEBUG install python-ceph is already the newest version (12.2.4-0ubuntu1.1).
2018-06-20 14:00:40 DEBUG install radosgw is already the newest version (12.2.4-0ubuntu1.1).
2018-06-20 14:00:40 DEBUG install 0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
2018-06-20 14:00:40 DEBUG install Traceback (most recent call last):
2018-06-20 14:00:40 DEBUG install File "/var/lib/juju/agents/unit-ceph-osd-0/charm/hooks/install.real", line 630, in <module>
2018-06-20 14:00:40 DEBUG install hooks.execute(sys.argv)
2018-06-20 14:00:40 DEBUG install File "/var/lib/juju/agents/unit-ceph-osd-0/charm/hooks/charmhelpers/core/hookenv.py", line 823, in execute
2018-06-20 14:00:40 DEBUG install self._hooks[hook_name]()
2018-06-20 14:00:40 DEBUG install File "/var/lib/juju/agents/unit-ceph-osd-0/charm/hooks/charmhelpers/contrib/hardening/harden.py", line 79, in _harden_inner2
2018-06-20 14:00:40 DEBUG install return f(*args, **kwargs)
2018-06-20 14:00:40 DEBUG install File "/var/lib/juju/agents/unit-ceph-osd-0/charm/hooks/install.real", line 242, in install
2018-06-20 14:00:40 DEBUG install install_udev_rules()
2018-06-20 14:00:40 DEBUG install File "/var/lib/juju/agents/unit-ceph-osd-0/charm/hooks/install.real", line 231, in install_udev_rules
2018-06-20 14:00:40 DEBUG install '--reload-rules'])
2018-06-20 14:00:40 DEBUG install File "/usr/lib/python3.6/subprocess.py", line 291, in check_call
2018-06-20 14:00:40 DEBUG install raise CalledProcessError(retcode,...

Changed in charm-ceph-osd:
status: New → Confirmed
Revision history for this message
Sean Feole (sfeole) wrote :

This is fixed with LXD version 3.1:

$ lxc --version
3.1

ubuntu@d0$ sudo snap list
sudo: unable to resolve host d05-4: Resource temporarily unavailable
Name Version Rev Tracking Developer Notes
core 16-2.33 4833 stable canonical core
juju 2.4-rc1+develop-474e0ab 4557 edge canonical classic
lxd 3.1 7424 stable canonical -

$ juju status
Model Controller Cloud/Region Version SLA Timestamp
default localhost-localhost localhost/localhost 2.4-rc1 unsupported 20:45:45Z

...

ceph-mon 12.2.4 active 3 ceph-mon jujucharms 25 ubuntu
ceph-osd 12.2.4 active 3 ceph-osd jujucharms 262 ubuntu

...

ceph-osd/0* active idle 4 10.58.90.155 Unit is ready (1 OSD)
ceph-osd/1 active idle 5 10.58.90.7 Unit is ready (1 OSD)
ceph-osd/2 active idle 6 10.58.90.128 Unit is ready (1 OSD)

ubuntu@juju-6186a6-1:~$ sudo ceph health
HEALTH_OK
ubuntu@juju-6186a6-1:~$ sudo ceph pg stat
104 pgs: 104 active+clean; 1127 bytes data, 819 GB used, 10178 GB / 10998 GB avail; 1786 B/s rd, 1 op/s
ubuntu@juju-6186a6-1:~$ sudo ceph osd stat
3 osds: 3 up, 3 in
ubuntu@juju-6186a6-1:~$ sudo ceph mon stat
e2: 3 mons at {juju-6186a6-1=10.58.90.105:6789/0,juju-6186a6-2=10.58.90.94:6789/0,juju-6186a6-3=10.58.90.112:6789/0}, election epoch 12, leader 0 juju-6186a6-2, quorum 0,1,2 juju-6186a6-2,juju-6186a6-1,juju-6186a6-3
ubuntu@juju-6186a6-1:~$

Revision history for this message
Goran Miskovic (schkovich) wrote :

I'm running LXD 3.1 and stumbled on the same problem.

lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 16.04.4 LTS
Release: 16.04
Codename: xenial

uname -r
4.4.0-128-generic

lxc version
Client version: 3.1
Server version: 3.1

2018-06-27 12:51:10 DEBUG install Traceback (most recent call last):
2018-06-27 12:51:10 DEBUG install File "/var/lib/juju/agents/unit-ceph-osd-0/charm/hooks/install.real", line 630, in <module>
2018-06-27 12:51:10 DEBUG install hooks.execute(sys.argv)
2018-06-27 12:51:10 DEBUG install File "/var/lib/juju/agents/unit-ceph-osd-0/charm/hooks/charmhelpers/core/hookenv.py", line 823, in execute
2018-06-27 12:51:10 DEBUG install self._hooks[hook_name]()
2018-06-27 12:51:10 DEBUG install File "/var/lib/juju/agents/unit-ceph-osd-0/charm/hooks/charmhelpers/contrib/hardening/harden.py", line 79, in _harden_inner2
2018-06-27 12:51:10 DEBUG install return f(*args, **kwargs)
2018-06-27 12:51:10 DEBUG install File "/var/lib/juju/agents/unit-ceph-osd-0/charm/hooks/install.real", line 242, in install
2018-06-27 12:51:10 DEBUG install install_udev_rules()
2018-06-27 12:51:10 DEBUG install File "/var/lib/juju/agents/unit-ceph-osd-0/charm/hooks/install.real", line 231, in install_udev_rules
2018-06-27 12:51:10 DEBUG install '--reload-rules'])
2018-06-27 12:51:10 DEBUG install File "/usr/lib/python3.5/subprocess.py", line 581, in check_call
2018-06-27 12:51:10 DEBUG install raise CalledProcessError(retcode, cmd)
2018-06-27 12:51:10 DEBUG install subprocess.CalledProcessError: Command '['udevadm', 'control', '--reload-rules']' returned non-zero exit status 2
2018-06-27 12:51:10 ERROR juju.worker.uniter.operation runhook.go:113 hook "install" failed: exit status 1

Revision history for this message
Goran Miskovic (schkovich) wrote :

The problem does not appear to be specific to the ceph-osd charm. Every container deployed as part of the OpenStack on LXD bundle has the same problem: udevadm control --reload-rules exits with status code 2. This is confirmed on both Pike and Queens deployments. Non-Juju LXC deployments do not suffer from this problem.

What is specific to ceph-osd 262 is the install_udev_rules method introduced in commit 901b873 (https://github.com/openstack/charm-ceph-osd/commit/901b8731d4d236191a3e3e123e508dcdaab08c93#diff-9bb63b127ff9b4717f77c840a30821d3R192), which calls subprocess.check_call and therefore raises an exception when the sub-process returns a non-zero exit code.

I refactored the method to use subprocess.call(...) instead, which led to a successful deployment of the Queens bundle. Since the bug is a showstopper, the patch might be good enough for now; it could be improved by handling the exception raised by subprocess.check_call(...) accordingly. Either way it is hackish and does not address the source of the issue.
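
For reference, the workaround amounts to something like this (illustrative only, not the exact charm code):

    import subprocess

    def install_udev_rules():
        # ... copy the charm's udev rule files into /lib/udev/rules.d ...
        # Using call() instead of check_call() means a non-zero exit from
        # udevadm (as seen inside LXD containers) no longer aborts the
        # install hook; the failure is simply ignored.
        subprocess.call(['udevadm', 'control', '--reload-rules'])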

The only problem noticeable in journalctl is "mount: can't find LABEL=cloudimg-rootfs". Related bug: https://github.com/lxc/lxd/issues/2319#issuecomment-245549555

Running udevadm test /sys/class/block/sdb shows the problem when 60-ceph-by-parttypeuuid.rules is executed:
'/sbin/blkid -o udev -p /dev/sdb'(err) 'error: /dev/sdb: No such file or directory'
Process '/sbin/blkid -o udev -p /dev/sdb' failed with exit code 2.

Let me know if I can provide further information.

Revision history for this message
James Page (james-page) wrote :

The fix here is to just skip this call in LXD containers; charmhelpers has a function to determine whether the unit is running in a container.

The udev rules support LVM use for Ceph; that won't work in a container anyway, so it's pretty pointless to fix the reload. Let's just do the one-liner to avoid this!
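
Something like the following guard should do it, assuming the helper in question is charmhelpers.core.host.is_container() (a sketch, not the actual patch):

    import subprocess

    from charmhelpers.core.host import is_container

    def install_udev_rules():
        # udev does not manage block devices inside a container, and
        # LVM-backed OSDs are unsupported there anyway, so skip entirely.
        if is_container():
            return
        # ... install the charm's udev rule files ...
        subprocess.check_call(['udevadm', 'control', '--reload-rules'])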

Revision history for this message
James Page (james-page) wrote :

For reference, ceph-osd in LXD containers can only use directory-based OSDs.

Changed in charm-ceph-osd:
status: Confirmed → Triaged
importance: Undecided → High
milestone: none → 18.08
Revision history for this message
Goran Miskovic (schkovich) wrote :

I will submit a PR tomorrow morning. Stéphane has an idea that I'm going to try first; see his comment on the related LXD issue.

Revision history for this message
Goran Miskovic (schkovich) wrote :

Done. The issue wasn't updated automatically since I omitted the hash in front of the issue number in the commit message. :(

https://github.com/openstack/charm-ceph-osd/pull/3

Revision history for this message
Goran Miskovic (schkovich) wrote :

openstack-gerrit was not happy and closed the pull request. I did not touch the file that our bot friend is complaining about. :)

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-ceph-osd (master)

Reviewed: https://review.openstack.org/580026
Committed: https://git.openstack.org/cgit/openstack/charm-ceph-osd/commit/?id=dd426903471f28eff8e357bac2ca0889ffcff4b9
Submitter: Zuul
Branch: master

commit dd426903471f28eff8e357bac2ca0889ffcff4b9
Author: James Page <email address hidden>
Date: Wed Jul 4 04:57:33 2018 +0100

    Skip udev rule install in containers

    Ensure that udev rules are not installed and reloaded
    when running in a container; this is not permitted and
    the udev rules are used for block devices, which are
    not supported within container based deployments.

    Change-Id: I9a580172fcbbf8cec63af7adccb0808915184658
    Closes-Bug: 1776713

Changed in charm-ceph-osd:
status: Triaged → Fix Committed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-ceph-osd (stable/18.05)

Fix proposed to branch: stable/18.05
Review: https://review.openstack.org/585897

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-ceph-osd (stable/18.05)

Reviewed: https://review.openstack.org/585897
Committed: https://git.openstack.org/cgit/openstack/charm-ceph-osd/commit/?id=0cdc1ab94d49f8fa305f2a9fb4dd93ff8cb4cf57
Submitter: Zuul
Branch: stable/18.05

commit 0cdc1ab94d49f8fa305f2a9fb4dd93ff8cb4cf57
Author: James Page <email address hidden>
Date: Wed Jul 4 04:57:33 2018 +0100

    Skip udev rule install in containers

    Ensure that udev rules are not installed and reloaded
    when running in a container; this is not permitted and
    the udev rules are used for block devices, which are
    not supported within container based deployments.

    Change-Id: I9a580172fcbbf8cec63af7adccb0808915184658
    Closes-Bug: 1776713
    (cherry picked from commit dd426903471f28eff8e357bac2ca0889ffcff4b9)

James Page (james-page)
Changed in charm-ceph-osd:
status: Fix Committed → Fix Released