No OSDs have been initialized on random units with "No block devices detected using current configuration"

Bug #1812925 reported by Vladimir Grevtsev
Affects                Status        Importance  Assigned to  Milestone
Ceph OSD Charm         Fix Released  High        Liam Young   19.04
bcache-tools (Ubuntu)  Invalid      Undecided   Unassigned

Bug Description

When deploying a bundle with Ceph and encryption enabled, some random units fail with "No block devices detected using current configuration", and some units are missing OSDs that should be present.

bundle: https://pastebin.canonical.com/p/qg2Trzs3bj/
juju status: http://paste.ubuntu.com/p/9p7qyxNHzw/
juju config ceph-osd: http://paste.ubuntu.com/p/QJkGgfNfW8/
juju crashdump: https://drive.google.com/open?id=16c1rLD9FAViWWs28Q2ATHhkFI5lcXNjF

### logs from ceph-osd/0, which has one OSD missing
sosreport: https://drive.google.com/open?id=1QGrnIeH_a39z0LZYLjXwUSYFqG46IWKw
machine curtin config: http://paste.ubuntu.com/p/R82rf759wF/
juju debug-log ceph-osd/0 --replay: http://paste.ubuntu.com/p/RmZ35WqPPP/
aa-status: http://paste.ubuntu.com/p/sfWrpkRVS6/
lsblk: http://paste.ubuntu.com/p/HZWHsFJvbc/
blkid: http://paste.ubuntu.com/p/FhsDxtRZb3/

# one OSD log is missing - so it looks like its creation has not been triggered
$ sudo ls -lah /var/log/ceph
total 1.4M
drwxrws--T 2 ceph ceph 4.0K Jan 22 15:40 .
drwxrwxr-x 21 root syslog 4.0K Jan 22 15:17 ..
-rw-r--r-- 1 ceph ceph 93K Jan 22 15:39 ceph-osd.16.log
-rw-r--r-- 1 ceph ceph 91K Jan 22 15:39 ceph-osd.23.log
-rw-r--r-- 1 ceph ceph 96K Jan 22 15:39 ceph-osd.30.log
-rw-r--r-- 1 ceph ceph 96K Jan 22 15:40 ceph-osd.37.log
-rw-r--r-- 1 ceph ceph 96K Jan 22 15:40 ceph-osd.44.log
-rw-r--r-- 1 ceph ceph 94K Jan 22 15:40 ceph-osd.51.log
-rw-r--r-- 1 ceph ceph 94K Jan 22 15:38 ceph-osd.9.log
-rw-r--r-- 1 root ceph 697K Jan 22 15:40 ceph-volume.log

/var/log/ceph from ceph-osd/0: https://drive.google.com/open?id=1nHJW7VfVXd3Zhvnrpat44ugBMevo3qUQ

### logs from ceph-osd/3, which failed to initialize all OSDs
sosreport: https://drive.google.com/open?id=1wi02ORdv30Xw8AeIR9ta0dNq93q9X2K3
machine curtin config: http://paste.ubuntu.com/p/yyDQjCfqfg/
juju debug-log -i ceph-osd/3 --replay: http://paste.ubuntu.com/p/vK6Xmb9bsT/
lsblk: http://paste.ubuntu.com/p/pTpBFj7Hgd/
blkid: http://paste.ubuntu.com/p/T2C9FMCtxT/

$ sudo ls -lah /var/log/ceph
total 8.0K
drwxrws--T 2 ceph ceph 4.0K Oct 9 08:27 .
drwxrwxr-x 21 root syslog 4.0K Jan 22 15:17 ..

The root cause _may_ lie around https://bugs.launchpad.net/vaultlocker/+bug/1780332 - at least the symptoms are the same: udev symlinks are recreated at exactly the same time as the charm-authorize action (required by the ceph charm to start encrypting partitions) is fired.

At the time of filing this bug the system had been "idle" for several hours, but something was still triggering symlink recreation:

$ ls -lah /dev/disk/by-dname
total 0
drwxr-xr-x 2 root root 400 Jan 22 21:54 .
drwxr-xr-x 9 root root 180 Jan 22 15:09 ..
lrwxrwxrwx 1 root root 13 Jan 22 21:54 bcache0 -> ../../bcache5
lrwxrwxrwx 1 root root 13 Jan 22 21:54 bcache1 -> ../../bcache8
lrwxrwxrwx 1 root root 13 Jan 22 21:54 bcache2 -> ../../bcache9
lrwxrwxrwx 1 root root 13 Jan 22 21:54 bcache3 -> ../../bcache6
lrwxrwxrwx 1 root root 13 Jan 22 21:54 bcache4 -> ../../bcache7
lrwxrwxrwx 1 root root 13 Jan 22 21:54 bcache5 -> ../../bcache1
lrwxrwxrwx 1 root root 13 Jan 22 21:54 bcache6 -> ../../bcache0
lrwxrwxrwx 1 root root 13 Jan 22 21:54 bcache7 -> ../../bcache4
lrwxrwxrwx 1 root root 13 Jan 22 21:54 bcache8 -> ../../bcache3
lrwxrwxrwx 1 root root 13 Jan 22 21:54 bcache9 -> ../../bcache2

$ date
Tue Jan 22 22:01:43 UTC 2019
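
One way to pin down when the links get rewritten - a sketch assuming the inotify-tools package is installed (the same tool is used for the reproduction attempt later in this thread) - is to watch the directory and timestamp every event:

# Print a timestamped line for each event in /dev/disk/by-dname
$ inotifywait -m --timefmt '%F %T' --format '%T %w%f %e' /dev/disk/by-dname/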

Revision history for this message
Vladimir Grevtsev (vlgrevtsev) wrote :

Subscribed field-critical as this blocks us from starting handover preparations.

tags: added: field-critical
Revision history for this message
Felipe Reyes (freyes) wrote :

in ceph-osd/0:

/dev/mapper:
total 0
drwxr-xr-x 2 0 0 640 Jan 22 15:40 .
drwxr-xr-x 28 0 0 5440 Jan 22 15:40 ..
lrwxrwxrwx 1 0 0 8 Jan 22 15:40 ceph--2c8226f5--4f8a--4bbf--ab8c--de48a2b99bb7-osd--block--2c8226f5--4f8a--4bbf--ab8c--de48a2b99bb7 -> ../dm-10
lrwxrwxrwx 1 0 0 8 Jan 22 15:40 ceph--439883e1--2062--4633--b83b--5e076bf70c86-osd--block--439883e1--2062--4633--b83b--5e076bf70c86 -> ../dm-22
lrwxrwxrwx 1 0 0 8 Jan 22 15:40 ceph--83e8d721--fef1--4aa2--8d69--6ffb7d28d4b2-osd--block--83e8d721--fef1--4aa2--8d69--6ffb7d28d4b2 -> ../dm-14
lrwxrwxrwx 1 0 0 7 Jan 22 15:40 ceph--8fd627b6--b32e--4766--b622--9e5366b3e307-osd--block--8fd627b6--b32e--4766--b622--9e5366b3e307 -> ../dm-6
lrwxrwxrwx 1 0 0 7 Jan 22 15:40 ceph--9db96aa6--5dbe--4399--b597--746680ae7367-osd--block--9db96aa6--5dbe--4399--b597--746680ae7367 -> ../dm-1
lrwxrwxrwx 1 0 0 8 Jan 22 15:40 ceph--b7244c2f--2ea0--49f7--bb12--70058d31d681-osd--block--b7244c2f--2ea0--49f7--bb12--70058d31d681 -> ../dm-26
lrwxrwxrwx 1 0 0 8 Jan 22 15:40 ceph--f3145cff--807c--4d94--a9aa--abfd19d0322c-osd--block--f3145cff--807c--4d94--a9aa--abfd19d0322c -> ../dm-18
lrwxrwxrwx 1 0 0 8 Jan 22 15:40 ceph--wal--fcd660db--5e72--41fa--9c13--dabe2347e3f4-osd--db--2c8226f5--4f8a--4bbf--ab8c--de48a2b99bb7 -> ../dm-12
lrwxrwxrwx 1 0 0 8 Jan 22 15:40 ceph--wal--fcd660db--5e72--41fa--9c13--dabe2347e3f4-osd--db--439883e1--2062--4633--b83b--5e076bf70c86 -> ../dm-24
lrwxrwxrwx 1 0 0 8 Jan 22 15:40 ceph--wal--fcd660db--5e72--41fa--9c13--dabe2347e3f4-osd--db--83e8d721--fef1--4aa2--8d69--6ffb7d28d4b2 -> ../dm-16
lrwxrwxrwx 1 0 0 7 Jan 22 15:40 ceph--wal--fcd660db--5e72--41fa--9c13--dabe2347e3f4-osd--db--8fd627b6--b32e--4766--b622--9e5366b3e307 -> ../dm-8
lrwxrwxrwx 1 0 0 7 Jan 22 15:40 ceph--wal--fcd660db--5e72--41fa--9c13--dabe2347e3f4-osd--db--9db96aa6--5dbe--4399--b597--746680ae7367 -> ../dm-4
lrwxrwxrwx 1 0 0 8 Jan 22 15:40 ceph--wal--fcd660db--5e72--41fa--9c13--dabe2347e3f4-osd--db--b7244c2f--2ea0--49f7--bb12--70058d31d681 -> ../dm-28
lrwxrwxrwx 1 0 0 8 Jan 22 15:40 ceph--wal--fcd660db--5e72--41fa--9c13--dabe2347e3f4-osd--db--f3145cff--807c--4d94--a9aa--abfd19d0322c -> ../dm-20
lrwxrwxrwx 1 0 0 8 Jan 22 15:40 ceph--wal--fcd660db--5e72--41fa--9c13--dabe2347e3f4-osd--wal--2c8226f5--4f8a--4bbf--ab8c--de48a2b99bb7 -> ../dm-11
lrwxrwxrwx 1 0 0 8 Jan 22 15:40 ceph--wal--fcd660db--5e72--41fa--9c13--dabe2347e3f4-osd--wal--439883e1--2062--4633--b83b--5e076bf70c86 -> ../dm-23
lrwxrwxrwx 1 0 0 8 Jan 22 15:40 ceph--wal--fcd660db--5e72--41fa--9c13--dabe2347e3f4-osd--wal--83e8d721--fef1--4aa2--8d69--6ffb7d28d4b2 -> ../dm-15
lrwxrwxrwx 1 0 0 7 Jan 22 15:40 ceph--wal--fcd660db--5e72--41fa--9c13--dabe2347e3f4-osd--wal--8fd627b6--b32e--4766--b622--9e5366b3e307 -> ../dm-7
lrwxrwxrwx 1 0 0 7 Jan 22 15:40 ceph--wal--fcd660db--5e72--41fa--9c13--dabe2347e3f4-osd--wal--9db96aa6--5dbe--4399--b597--746680ae7367 -> ../dm-3
lrwxrwxrwx 1 0 0 8 Jan 22 15:40 ceph--wal--fcd660db--5e72--41fa--9c13--dabe2347e3f4-osd--wal--b7244c2f--2ea0--49f7--bb12--70058d31d681 -> ../dm-27
lrwxrwxrwx 1 0 0 8 Jan 22 1...


tags: added: sts
Revision history for this message
Felipe Reyes (freyes) wrote :

in var/log/juju/unit-ceph-osd-0.log:

2019-01-22 15:38:03 DEBUG juju-log secrets-storage:118: Checking for pristine devices: "['/dev/disk/by-dname/bcache3', '/dev/disk/by-dname/bcache4', '/dev/disk/by-dname/bcache5', '/dev/disk/by-dname/bcache6', '/dev/disk/by-dname/bcache7', '/dev/disk/by-dname/bcache8', '/dev/disk/by-dname/bcache9']"

while in var/log/juju/unit-ceph-osd-3.log:

2019-01-22 15:37:39 DEBUG juju-log secrets-storage:118: Checking for pristine devices: "[]"

Revision history for this message
Felipe Reyes (freyes) wrote :

ceph-osd-3.log:
2019-01-22 15:37:39 INFO juju-log secrets-storage:118: Skipping osd devices previously processed by this unit: []
2019-01-22 15:37:39 DEBUG juju-log secrets-storage:118: Checking for pristine devices: "[]"
2019-01-22 15:37:39 INFO juju-log secrets-storage:118: ceph bootstrapped, rescanning disks

So these filters are removing all the devices - https://github.com/openstack/charm-ceph-osd/blob/master/hooks/ceph_hooks.py#L468-L476
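
For illustration only - a hypothetical shell approximation of those filters, not the charm's actual code - the net effect is roughly "drop devices whose path is missing, then drop devices that are not pristine":

# Hypothetical manual re-check of the charm's per-device filtering
$ for dev in /dev/disk/by-dname/bcache*; do
    if [ ! -e "$dev" ]; then
      echo "$dev: symlink missing - dropped from candidate list"
    elif sudo blkid "$dev" >/dev/null 2>&1; then
      echo "$dev: carries a signature - not pristine"
    else
      echo "$dev: pristine candidate"
    fi
  done

On ceph-osd/3, where all the by-dname symlinks were absent, every configured device would fall into the first branch - matching the empty "[]" lists in the log above.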

Revision history for this message
Vladimir Grevtsev (vlgrevtsev) wrote :

After triggering config-changed ("juju config ceph-osd use-syslog=true", nothing important) the situation became a little better: https://paste.ubuntu.com/p/5jFd6Dg6xg/

However, there is still one OSD not being picked up by ceph-osd/0, and "ceph -s" shows a warning: https://paste.ubuntu.com/p/v2GXTCMgpf/

Revision history for this message
Liam Young (gnuoy) wrote :

When the secrets-storage hook fires, some /dev/disk/by-dname/ symlinks are missing and the setup fails. Triggering a subsequent config-changed hook seems to fix the issue, as by then the symlinks have appeared. The missing symlinks can be seen in the juju unit log: https://pastebin.canonical.com/p/R2nrsFzB9g/

This should probably have the field-critical tag removed, as triggering a config-changed hook appears to be a workaround.

Revision history for this message
Vladimir Grevtsev (vlgrevtsev) wrote :

Changed to field-high since further testing is unblocked, but the provided workaround (triggering config-changed) is not applicable for the Bootstack handover.

tags: added: field-high
removed: field-critical
Liam Young (gnuoy)
Changed in charm-ceph-osd:
assignee: nobody → Liam Young (gnuoy)
importance: Undecided → Critical
status: New → Confirmed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-ceph-osd (master)

Fix proposed to branch: master
Review: https://review.openstack.org/632738

Changed in charm-ceph-osd:
status: Confirmed → In Progress
Revision history for this message
James Page (james-page) wrote :

OK, so some digging later - we held a deployment to inspect the state of the by-dname directory before the ceph-osd charm tries to actually use the links. A comparison of the number of links in by-dname on identical servers:

- Stdout: |
    19
  UnitId: ceph-osd/8
- Stdout: |
    19
  UnitId: ceph-osd/5
- Stdout: |
    18
  UnitId: ceph-osd/0
- Stdout: |
    19
  UnitId: ceph-osd/1
- Stdout: |
    19
  UnitId: ceph-osd/7
- Stdout: |
    19
  UnitId: ceph-osd/4
- Stdout: |
    9
  UnitId: ceph-osd/2
- Stdout: |
    19
  UnitId: ceph-osd/6
- Stdout: |
    19
  UnitId: ceph-osd/3

ceph-osd/2 has no bcache entries in by-dname, but does have bcache devices:

ubuntu@ln-sv-ostk01:~$ ls -l /dev/disk/by-dname/
total 0
lrwxrwxrwx 1 root root 15 Jan 24 13:06 nvme0n1-part1 -> ../../nvme0n1p1
lrwxrwxrwx 1 root root 15 Jan 24 13:06 nvme0n1-part2 -> ../../nvme0n1p2
lrwxrwxrwx 1 root root 15 Jan 24 13:06 nvme0n1-part3 -> ../../nvme0n1p3
lrwxrwxrwx 1 root root 10 Jan 24 13:06 sda-part1 -> ../../sda1
lrwxrwxrwx 1 root root 10 Jan 24 13:06 sdi-part1 -> ../../sdi1
lrwxrwxrwx 1 root root 10 Jan 24 13:06 sdi-part2 -> ../../sdi2
lrwxrwxrwx 1 root root 10 Jan 24 13:06 sdi-part3 -> ../../sdi3
lrwxrwxrwx 1 root root 10 Jan 24 13:06 sdi-part4 -> ../../sdi4
ubuntu@ln-sv-ostk01:~$ ls -l /dev/bcache*
brw-rw---- 1 root disk 252, 0 Jan 24 13:06 /dev/bcache0
brw-rw---- 1 root disk 252, 128 Jan 24 13:06 /dev/bcache1
brw-rw---- 1 root disk 252, 256 Jan 24 13:06 /dev/bcache2
brw-rw---- 1 root disk 252, 384 Jan 24 13:06 /dev/bcache3
brw-rw---- 1 root disk 252, 512 Jan 24 13:06 /dev/bcache4
brw-rw---- 1 root disk 252, 640 Jan 24 13:06 /dev/bcache5
brw-rw---- 1 root disk 252, 768 Jan 24 13:06 /dev/bcache6
brw-rw---- 1 root disk 252, 896 Jan 24 13:06 /dev/bcache7
brw-rw---- 1 root disk 252, 1024 Jan 24 13:06 /dev/bcache8
brw-rw---- 1 root disk 252, 1152 Jan 24 13:06 /dev/bcache9

and ceph-osd/0 has one missing symlink:

ubuntu@ln-sv-ostk09:~$ ls -l /dev/disk/by-dname/
total 0
lrwxrwxrwx 1 root root 13 Jan 24 13:09 bcache0 -> ../../bcache8
lrwxrwxrwx 1 root root 13 Jan 24 13:09 bcache1 -> ../../bcache9
lrwxrwxrwx 1 root root 13 Jan 24 13:09 bcache3 -> ../../bcache0
lrwxrwxrwx 1 root root 13 Jan 24 13:09 bcache4 -> ../../bcache1
lrwxrwxrwx 1 root root 13 Jan 24 13:09 bcache5 -> ../../bcache2
lrwxrwxrwx 1 root root 13 Jan 24 13:09 bcache6 -> ../../bcache5
lrwxrwxrwx 1 root root 13 Jan 24 13:09 bcache7 -> ../../bcache3
lrwxrwxrwx 1 root root 13 Jan 24 13:09 bcache8 -> ../../bcache6
lrwxrwxrwx 1 root root 13 Jan 24 13:09 bcache9 -> ../../bcache4
lrwxrwxrwx 1 root root 15 Jan 24 13:09 nvme0n1-part1 -> ../../nvme0n1p1
lrwxrwxrwx 1 root root 15 Jan 24 13:09 nvme0n1-part2 -> ../../nvme0n1p2
lrwxrwxrwx 1 root root 15 Jan 24 13:09 nvme0n1-part3 -> ../../nvme0n1p3
lrwxrwxrwx 1 root root 10 Jan 24 13:09 sda-part1 -> ../../sda1
lrwxrwxrwx 1 root root 10 Jan 24 13:09 sdi-part1 -> ../../sdi1
lrwxrwxrwx 1 root root 10 Jan 24 13:09 sdi-part2 -> ../../sdi2
lrwxrwxrwx 1 root root 10 Jan 24 13:09 sdi-part3 -> ../../sdi3
lrwxrwxrwx 1 root root 10 Jan 24 13:09 sdi-part4 -> ../../sdi4
ubuntu@ln-sv-ostk09:~$ ls -l /dev/bcache*
brw-rw---- 1 root disk 252, 0 Jan 24 13:09 /dev/bcache0
brw-rw---- 1 r...


Revision history for this message
James Page (james-page) wrote :

Raising a bug task for bcache-tools

Revision history for this message
James Page (james-page) wrote :

An obvious workaround would be to trigger a re-scan before the ceph-osd charm tries to osdize the bcache devices - I would consider this a workaround only, as the by-dname devices really should be present before the charm executes.

Revision history for this message
James Page (james-page) wrote :

Re-triggering udev outside of the charm creates the missing by-dname entries:

sudo udevadm trigger --subsystem-match=block --action=add
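
To confirm the result, the trigger can be followed by a settle and a re-listing (a sketch; the timeout value is arbitrary):

# After the trigger above, wait for the udev event queue to drain
$ sudo udevadm settle --timeout=30
$ ls -l /dev/disk/by-dname/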

James Page (james-page)
Changed in charm-ceph-osd:
importance: Critical → High
status: In Progress → Incomplete
Revision history for this message
Ryan Harper (raharper) wrote :

Looking at the syslog and kern.log, the kernel has emitted the change events which trigger the bcache rules, for all 12 devices, each time. So what remains to be understood is whether udevd ran the hook (69-bcache.rules), and I see no reason it wouldn't.

% grep kernel.*register_bcache /tmp/bcache-syslog.raw | head -n 13
Jan 22 14:37:54 ln-sv-ostk08 kernel: [ 14.478597] bcache: register_bcache() error /dev/sdh: device already registered (emitting change event)
Jan 22 14:37:54 ln-sv-ostk08 kernel: [ 14.478608] bcache: register_bcache() error /dev/sdg: device already registered (emitting change event)
Jan 22 14:37:54 ln-sv-ostk08 kernel: [ 14.478673] bcache: register_bcache() error /dev/sdb: device already registered (emitting change event)
Jan 22 14:37:54 ln-sv-ostk08 kernel: [ 14.478684] bcache: register_bcache() error /dev/sde: device already registered (emitting change event)
Jan 22 14:37:54 ln-sv-ostk08 kernel: [ 14.478695] bcache: register_bcache() error /dev/sdc: device already registered (emitting change event)
Jan 22 14:37:54 ln-sv-ostk08 kernel: [ 14.478843] bcache: register_bcache() error /dev/sdf: device already registered (emitting change event)
Jan 22 14:37:54 ln-sv-ostk08 kernel: [ 14.492735] bcache: register_bcache() error /dev/sdd: device already registered (emitting change event)
Jan 22 14:37:54 ln-sv-ostk08 kernel: [ 14.522047] bcache: register_bcache() error /dev/nvme0n1p3: device already registered
Jan 22 14:37:54 ln-sv-ostk08 kernel: [ 14.529593] bcache: register_bcache() error /dev/sda1: device already registered (emitting change event)
Jan 22 14:37:54 ln-sv-ostk08 kernel: [ 14.537218] bcache: register_bcache() error /dev/nvme0n1p1: device already registered
Jan 22 14:37:54 ln-sv-ostk08 kernel: [ 14.537606] bcache: register_bcache() error /dev/sdi4: device already registered (emitting change event)
Jan 22 14:37:54 ln-sv-ostk08 kernel: [ 14.557305] bcache: register_bcache() error /dev/sdi3: device already registered (emitting change event)

I didn't find the actual systemd journal (/run/log/journal/<hash>) in the sosreport, so we cannot see much of the udevd logging. It's likely that one will need to enable verbose udevd debugging, reproduce, and capture the logs to see what may be going on there.
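
For reference, a minimal way to do that (a sketch; the runtime toggle does not survive a reboot, while the udev.conf entry does):

# Raise the udevd log level at runtime
$ sudo udevadm control --log-priority=debug
# Or persistently, then reboot and reproduce
$ echo 'udev_log=debug' | sudo tee -a /etc/udev/udev.conf
# Afterwards, capture the udevd logging for the current boot
$ journalctl -b -u systemd-udevd > udevd-debug.log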

Revision history for this message
Edward Hope-Morley (hopem) wrote :

@raharper the /dev/disk/by-dname/* entries are created by the /etc/udev/rules.d/bcache*.rules written by MAAS, whereas the /dev/bcache/by-* entries are created by /lib/udev/rules.d/69-bcache.rules from bcache-tools. The deployment in this bug is not using /dev/bcache/by-* and only uses the by-dname bcache pointers, so the question is really why the MAAS rules didn't run, and whether they are sometimes conflicting/racing with the bcache-tools rules such that they never complete.
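
For context, an illustrative shape of such a MAAS/curtin-written dname rule (not the verbatim rule from these hosts) - note that it keys the symlink off the same CACHED_UUID property that the bcache-tools rules export, so if a change event is processed while that property is unset, the symlink is simply not (re)created:

# /etc/udev/rules.d/bcache0.rules - illustrative only
SUBSYSTEM=="block", ACTION=="add|change", ENV{CACHED_UUID}=="<backing-device-uuid>", SYMLINK+="disk/by-dname/bcache0"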

Revision history for this message
Vladimir Grevtsev (vlgrevtsev) wrote :

After the systems have been idle for ~2h, it currently looks like they have all the symlinks in place:

$ juju run --application ceph-osd 'ls -lah /dev/disk/by-dname | wc -l' | pastebinit
http://paste.ubuntu.com/p/rDH57vGjKP/

$ ls -l /dev/disk/by-dname/
total 0
lrwxrwxrwx 1 root root 13 Jan 24 15:16 bcache0 -> ../../bcache8
lrwxrwxrwx 1 root root 13 Jan 24 15:16 bcache1 -> ../../bcache9
lrwxrwxrwx 1 root root 13 Jan 24 15:16 bcache2 -> ../../bcache7
lrwxrwxrwx 1 root root 13 Jan 24 15:16 bcache3 -> ../../bcache2
lrwxrwxrwx 1 root root 13 Jan 24 15:16 bcache4 -> ../../bcache3
lrwxrwxrwx 1 root root 13 Jan 24 15:16 bcache5 -> ../../bcache1
lrwxrwxrwx 1 root root 13 Jan 24 15:16 bcache6 -> ../../bcache0
lrwxrwxrwx 1 root root 13 Jan 24 15:16 bcache7 -> ../../bcache5
lrwxrwxrwx 1 root root 13 Jan 24 15:16 bcache8 -> ../../bcache6
lrwxrwxrwx 1 root root 13 Jan 24 15:16 bcache9 -> ../../bcache4
lrwxrwxrwx 1 root root 15 Jan 24 15:16 nvme0n1-part1 -> ../../nvme0n1p1
lrwxrwxrwx 1 root root 15 Jan 24 15:16 nvme0n1-part2 -> ../../nvme0n1p2
lrwxrwxrwx 1 root root 15 Jan 24 15:16 nvme0n1-part3 -> ../../nvme0n1p3
lrwxrwxrwx 1 root root 10 Jan 24 15:16 sda-part1 -> ../../sda1
lrwxrwxrwx 1 root root 10 Jan 24 15:16 sdi-part1 -> ../../sdi1
lrwxrwxrwx 1 root root 10 Jan 24 15:16 sdi-part2 -> ../../sdi2
lrwxrwxrwx 1 root root 10 Jan 24 15:16 sdi-part3 -> ../../sdi3
lrwxrwxrwx 1 root root 10 Jan 24 15:16 sdi-part4 -> ../../sdi4

So something triggered udevadm to re-create those links?
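
One way to catch that in the act (a sketch) - udevadm monitor prints kernel uevents and processed udev events as they happen, and the DEVLINKS property in the output shows which symlinks udev assigns for each event:

$ sudo udevadm monitor --kernel --udev --property --subsystem-match=block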

Revision history for this message
Vladimir Grevtsev (vlgrevtsev) wrote :

Just did two deployments of a simple single charm in a row, to exclude any OpenStack charm influence:

Pass 1: https://paste.ubuntu.com/p/qCt7WhHbzv/, http://paste.ubuntu.com/p/cxQ9qgY7gZ/
Pass 2: http://paste.ubuntu.com/p/Yn7ZYVSNzH/ http://paste.ubuntu.com/p/Hp5zcdGZKH/, http://paste.ubuntu.com/p/XP3fNbdv34/

So it looks like the nodes originally come up with the symlinks in place; however, something forces them to disappear later.

@raharper,

> It's likely that one will need to enable udevd verbose debugging and reproduce and capture to see what may be going on there.

We are ready to do this - please advise what exactly should be enabled, and where, so you can get more insight into this.

Revision history for this message
Vladimir Grevtsev (vlgrevtsev) wrote :

I just captured this once more after starting an openstack bundle deployment: https://pastebin.canonical.com/p/dhc8tBqt8Q/

So the workaround of inserting udevadm settle/trigger into cloudinit-userdata will not help here, as that is executed on initial node deployment, when the nodes already have all the necessary symlinks created.

I will capture the same output before Vault unlocking and let's see the difference.

Revision history for this message
Vladimir Grevtsev (vlgrevtsev) wrote :

It's failing somewhere in between: https://pastebin.canonical.com/p/bBSkf7mh9f/ (take a look at ln-sv-ostk05 - it's already missing some bcache links); however, they were there before (see the previous comment).

Revision history for this message
Vladimir Grevtsev (vlgrevtsev) wrote :

Third redeployment:

$ juju status | pastebinit
http://paste.ubuntu.com/p/Y2Sdv4QJFm/

$ juju run --application ceph-osd 'ls -lah /dev/disk/by-dname | wc -l' | pastebinit
http://paste.ubuntu.com/p/7CxBPzrqhY/

Revision history for this message
Vladimir Grevtsev (vlgrevtsev) wrote :

Just ran 'udevadm trigger' before Vault unlocking, and it looks like it's getting those links back:

$ juju run --application ceph-osd 'sudo udevadm trigger --subsystem-match=block --action=add'
- Stdout: ""
  UnitId: ceph-osd/1
- Stdout: ""
  UnitId: ceph-osd/2
- Stdout: ""
  UnitId: ceph-osd/3
- Stdout: ""
  UnitId: ceph-osd/6
- Stdout: ""
  UnitId: ceph-osd/7
- Stdout: ""
  UnitId: ceph-osd/8
- Stdout: ""
  UnitId: ceph-osd/0
- Stdout: ""
  UnitId: ceph-osd/4
- Stdout: ""
  UnitId: ceph-osd/5

And links are there:

$ juju run --application ceph-osd 'ls -lah /dev/disk/by-dname | wc -l'
- Stdout: |
    21
  UnitId: ceph-osd/0
- Stdout: |
    21
  UnitId: ceph-osd/1
- Stdout: |
    21
  UnitId: ceph-osd/2
- Stdout: |
    21
  UnitId: ceph-osd/3
- Stdout: |
    21
  UnitId: ceph-osd/4
- Stdout: |
    21
  UnitId: ceph-osd/5
- Stdout: |
    21
  UnitId: ceph-osd/6
- Stdout: |
    21
  UnitId: ceph-osd/7
- Stdout: |
    21
  UnitId: ceph-osd/8

Also - later, when Vault unseal is in progress, I did this:

$ for x in $(seq 0 8); do juju ssh $x 'hostname; ls -lah /dev/disk/by-dname/' >> out.txt; done
http://paste.ubuntu.com/p/5Mb9Hz8b7s/

All OSDs are up: https://paste.ubuntu.com/p/ffnbGcBXTN/
But some of the links are not back after charm execution: https://paste.ubuntu.com/p/TMdHKS84jf/

Revision history for this message
James Page (james-page) wrote :

Workaround for the time being until the actual root cause is identified:

 juju run --application ceph-osd "sudo udevadm trigger --subsystem-match=block --action=add"

before unsealing and authorising the vault charm. Charm task marked incomplete pending diagnosis of what's actually mangling the by-dname links.

Note that standalone ceph-* deployments are not reported to have this issue.
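
Put together, a sketch of the interim sequence (the settle step is an optional addition here, to let the event queues drain before checking):

# 1. Re-trigger block events on every ceph-osd unit
$ juju run --application ceph-osd "sudo udevadm trigger --subsystem-match=block --action=add"
# 2. Optionally wait for the udev queues to drain
$ juju run --application ceph-osd "sudo udevadm settle"
# 3. Verify the expected link count on each unit
$ juju run --application ceph-osd "ls /dev/disk/by-dname | wc -l"
# 4. Only then unseal and authorize the vault charm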

Revision history for this message
Robie Basak (racb) wrote :

AIUI there is no action for bcache-tools right now so I'm marking as Incomplete.

Changed in bcache-tools (Ubuntu):
status: New → Incomplete
Revision history for this message
Ryan Harper (raharper) wrote :

I spent time looking at a udev debug log and I can confirm that during a cold plug of the storage subsystem the bcache symlinks (both the kernel /dev/bcache/by-{uuid,label} links and the dname /dev/disk/by-dname/ links) will be removed. This is due to how the bcache driver works: it binds a backing device and a cache device, and while they aren't bound the symlinks have no reference in the kernel, so udev removes them.

The total time for a cold plug to settle was on the order of several minutes. The system will do a cold plug very early after transitioning from initramfs to rootfs; there is nothing that *forces* udev to settle - it will over time. However, if the charms are running before udev has settled all of the uevents from the kernel, it's possible to find that bcache-related symlinks have not yet been processed.
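
The queue state can be checked directly: udevadm settle blocks until the event queue is empty or the timeout expires, and its exit status says which happened (a sketch; the timeout value is arbitrary):

$ sudo udevadm settle --timeout=120 && echo "udev settled" || echo "events still pending"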

I believe we're waiting on capturing the entire system journal (/var/log/journal/*) from a failing boot, and then we'd like to confirm at what point the ceph charms start interacting with the system.

Revision history for this message
Ryan Harper (raharper) wrote :

After looking at the journal I can conclude a few things.

1) The bcache devices are found, the rules run, and change events are emitted fairly early during boot:

[ 8.873909] ubuntu kernel: bcache: bch_journal_replay() journal replay done, 673 keys in 35 entries, seq 724
[ 8.875099] ubuntu kernel: i40e 0000:5e:00.3 eth3: NIC Link is Up, 10 Gbps Full Duplex, Flow Control: RX/TX
[ 8.894272] ubuntu kernel: bcache: bch_cached_dev_attach() Caching sdi3 as bcache8 on set ab6e6820-99ba-4438-9df5-3f0f02fe8515
[ 8.908230] ubuntu kernel: bcache: bch_cached_dev_attach() Caching sdi4 as bcache7 on set ab6e6820-99ba-4438-9df5-3f0f02fe8515
[ 8.919554] ubuntu kernel: bcache: register_cache() registered cache device nvme0n1p1
[ 8.976751] ubuntu kernel: bcache: bch_journal_replay() journal replay done, 0 keys in 2 entries, seq 41

2) After cold plug (which happens very early, after rootfs is mounted and the systemd-udevd.service is started), we see a repeat of the registration, which re-emits the change event; that event exports the CACHED_UUID/CACHED_LABEL values from the kernel to userspace for the bcache rules to run and generate symlinks in /dev/bcache/by-uuid (and for the dname rules, which depend on the same values being in the ENV when they run).

[ 13.320515] ubuntu systemd[1]: Started udev Coldplug all Devices.
...
[ 14.405166] ubuntu kernel: bcache: register_bcache() error /dev/sdc: device already registered (emitting change event)
[ 14.405280] ubuntu kernel: bcache: register_bcache() error /dev/sdg: device already registered (emitting change event)
(skipping the additional ones)

3) Quite some time after the system is up and running, we see repeat registrations which are the result of something invoking udevadm trigger.

[ 837.100772] ln-sv-ostk01 kernel: bcache: register_bcache() error /dev/sdb: device already registered (emitting change event)
[ 837.117295] ln-sv-ostk01 kernel: bcache: register_bcache() error /dev/sdd: device already registered (emitting change event)
[ 837.120357] ln-sv-ostk01 kernel: bcache: register_bcache() error /dev/sdh: device already registered (emitting change event)
[ 837.122496] ln-sv-ostk01 kernel: bcache: register_bcache() error /dev/sde: device already registered (emitting change event)
[ 837.122765] ln-sv-ostk01 kernel: bcache: register_bcache() error /dev/sdf: device already registered (emitting change event)
[ 837.132009] ln-sv-ostk01 kernel: bcache: register_bcache() error /dev/sdc: device already registered (emitting change event)
[ 837.160226] ln-sv-ostk01 kernel: bcache: register_bcache() error /dev/sdg: device already registered (emitting change event)
[ 837.177862] ln-sv-ostk01 kernel: bcache: register_bcache() error /dev/nvme0n1p3: device already registered
[ 837.178724] ln-sv-ostk01 kernel: bcache: register_bcache() error /dev/nvme0n1p1: device already registered
[ 837.195827] ln-sv-ostk01 kernel: bcache: register_bcache() error /dev/sdg: device already registered (emitting change event)
[ 837.196084] ln-sv-ostk01 kernel: bcache: register_bcache() error /dev/sdh: device already registered (emitting change event)
[ 837.196107] ln-sv-ostk01 kernel: bcache: register_bcache() error /dev/...


Revision history for this message
James Page (james-page) wrote : Re: [Bug 1812925] Re: No OSDs has been initialized in random unit with "No block devices detected using current configuration"

On Tue, Jan 29, 2019 at 5:50 PM Ryan Harper <email address hidden>
wrote:

> I'm not 100% sure at which point ceph/charms make use of the dname
> symlinks, but ceph appears to be up much earlier than this.
>
> $ journalctl -o short-monotonic -D fd6c366f929d47a195590b3c6dc9df5a -u
> ceph*
> -- Logs begin at Tue 2019-01-29 14:20:56 UTC, end at Tue 2019-01-29
> 15:23:10 UTC. --
> [ 290.386737] ln-sv-ostk01 systemd[1]: Reached target ceph target
> allowing to start/stop all ceph*@.service instances at once.
> [ 307.411830] ln-sv-ostk01 systemd[1]: Reached target ceph target
> allowing to start/stop all ceph-radosgw@.service instances at once.
> [ 308.321612] ln-sv-ostk01 systemd[1]: Reached target ceph target
> allowing to start/stop all ceph-osd@.service instances at once.
> [ 309.064386] ln-sv-ostk01 systemd[1]: Reached target ceph target
> allowing to start/stop all ceph-mon@.service instances at once.
> [ 309.911307] ln-sv-ostk01 systemd[1]: Reached target ceph target
> allowing to start/stop all ceph-mgr@.service instances at once.
>
>
Those are the top level target systemd configurations that get enabled and
started on package install so I think they can be ignored in the context of
this bug.

A ceph-osd@<UUID> service is an actual OSD starting up, associated with a specific block device.

Revision history for this message
Ryan Harper (raharper) wrote :

I've recreated a VM instance that uses the curtin config attached to this bug.

I can recreate the additional registrations and the delete/create of the dname symlinks to the bcache devices by simply running 'udevadm trigger' in a loop and watching the dname directory with inotifywait (inotifywait -mr /dev/disk/by-dname).

During a 'udevadm trigger', I can observe 10 DELETE operations (bcache0..9), and then 10 CREATE operations (bcache0..9).

The symlinks appear almost immediately.

Next, I increased the IO load on the system by spinning up fio against several of the bcacheN devices; this increased the time between removal and creation, but only on the order of milliseconds.

I've not been able to induce a state where only some of the bcache* symlinks are missing, so we're still looking for what else is going on in the system that leaves pending uevents waiting to be processed.
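
A sketch of that reproduction setup, per the description above (the device selection for the IO load is illustrative, and the fio writes are destructive to the target device, so this is for a throwaway VM only):

# Terminal 1: watch the dname directory for DELETE/CREATE events
$ inotifywait -mr /dev/disk/by-dname
# Terminal 2: repeatedly cold-plug the block devices
$ for i in $(seq 1 100); do sudo udevadm trigger; done
# Optional extra IO load against a bcache device while the loop runs
$ sudo fio --name=load --filename=/dev/bcache0 --rw=randwrite --direct=1 --runtime=60 --time_based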

Do we have any insight into the program using the dname symlinks?

Revision history for this message
Vladimir Grevtsev (vlgrevtsev) wrote :

For information: I have redeployed the same bundle [the only difference being the removed Livepatch references] on similar hardware, and there is no reproducer here - all symlinks are in place.

Changed in charm-ceph-osd:
status: Incomplete → In Progress
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-ceph-osd (master)

Reviewed: https://review.openstack.org/632738
Committed: https://git.openstack.org/cgit/openstack/charm-ceph-osd/commit/?id=8655dbd9e34a31b11b276e34118e190f6bf8467e
Submitter: Zuul
Branch: master

commit 8655dbd9e34a31b11b276e34118e190f6bf8467e
Author: Liam Young <email address hidden>
Date: Wed Jan 23 14:46:26 2019 +0000

    udevadm settle before processing osd-devices

    Run udevadm settle before processing osd-devices as there may by
    udev events that have been queued for processing on server startup.

    Includes charms.ceph sync

    Corresponding charms.ceph change is:
    Iec5932a4d819ad87e54c2af391abe1befe84f164

    Change-Id: Ia4210d6a5c2c8a0b0e2038b25f24d2d82600e10a
    Closes-Bug: #1812925

Changed in charm-ceph-osd:
status: In Progress → Fix Committed
Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix proposed to charm-ceph-osd (stable/18.11)

Fix proposed to branch: stable/18.11
Review: https://review.openstack.org/644503

Revision history for this message
OpenStack Infra (hudson-openstack) wrote : Fix merged to charm-ceph-osd (stable/18.11)

Reviewed: https://review.openstack.org/644503
Committed: https://git.openstack.org/cgit/openstack/charm-ceph-osd/commit/?id=c8124598617f68023ed59dc207d2d4fbe25a6004
Submitter: Zuul
Branch: stable/18.11

commit c8124598617f68023ed59dc207d2d4fbe25a6004
Author: Liam Young <email address hidden>
Date: Wed Jan 23 14:46:26 2019 +0000

    udevadm settle before processing osd-devices

    Run udevadm settle before processing osd-devices as there may by
    udev events that have been queued for processing on server startup.

    Includes charms.ceph sync

    Corresponding charms.ceph change is:
    Iec5932a4d819ad87e54c2af391abe1befe84f164

    Change-Id: Ia4210d6a5c2c8a0b0e2038b25f24d2d82600e10a
    Closes-Bug: #1812925
    (cherry picked from commit 8655dbd9e34a31b11b276e34118e190f6bf8467e)

James Page (james-page)
Changed in charm-ceph-osd:
milestone: none → 19.04
David Ames (thedac)
Changed in charm-ceph-osd:
status: Fix Committed → Fix Released
Revision history for this message
Joshua Powers (powersj) wrote :

I do not think there was any remaining work for bcache-tools and this is now fixed, so I am marking the bcache-tools task as Invalid. If my assumption is incorrect and there is open work remaining, please feel free to respond to this bug. Thanks!

Changed in bcache-tools (Ubuntu):
status: Incomplete → Invalid